Server Specs - A SearchDataCenter.com blog



The blog for all things data center, including design and infrastructure, Unix, Linux, mainframes and x86 servers, power and cooling efficiency, information technology (IT) service management, server consolidation and virtualization, and more.

High-performance computing, mondo memory and new style applications

I was at TheServerSide.com’s Java Symposium last week and got a fascinating perspective on where high-end Java apps are headed, and the infrastructure that will be needed to support them.

Hedge funds and more traditional financial service firms are all deep into creating what they call grids (they’re not talking about time sharing across occasionally idle computers) for doing performance-intensive stuff like programmed trading. Imagine hundreds of motherboards ripped out of servers and velcroed into racks, all running stripped-down Linux cores with highly tuned Java Virtual Machines (JVMs) on top.

Kirk Pepperdine, veteran Java performance tuning consultant, discussed the growing reliance on non-volatile memory over disk (you’ll have to register to download the talk, but it’s pretty easy) to reduce latency in these applications. NVRAM can mean high-memory footprint motherboards, as we are now in the hundreds of gigabytes for some systems. But it can also mean solid state disk, which may be undergoing one of its periodic surges. Not only are people trying to put entire programs in memory, but as much data as they can, too.

Typical databases of, say, 1990 would easily fit into today’s NVRAM. That is not necessarily true of today’s databases, which have grown into millions and millions of rows. But objects and services give developers an option they are starting to explore: stuffing the data into the objects themselves and stuffing the objects into solid state caches, whether that’s onboard or outboard.

One thing that is making this possible is the grid. Some shops are using what is effectively distributed memory, as Iona Technologies’ technical director, John Davies, pointed out. Products like GigaSpaces, Oracle’s Coherence and GemFire create and manage a memory space across many machines.
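To make the idea of a memory space spread across many machines a bit more concrete, here is a minimal Python sketch. It is not how GigaSpaces, Coherence or GemFire are actually implemented. The class name `PartitionedMemorySpace`, the node names and the `trade:` keys are all made up for illustration, and each "node" is just a dictionary in one process rather than RAM on a separate machine. The only thing it shows is the core trick: hash each key to decide which node owns it, so the data set can be larger than any one machine's memory.

```python
import hashlib


class PartitionedMemorySpace:
    """Toy stand-in for a data grid: keys are spread across several nodes.

    Assumption: each node is a plain dict in this process. In a real grid
    each partition would live in another machine's RAM and be reached over
    the network, with replication and failover on top.
    """

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("need at least one node")
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def owner(self, key):
        # Hash the key so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return self.node_names[index]

    def put(self, key, value):
        self.nodes[self.owner(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self.owner(key)].get(key, default)

    def total_entries(self):
        return sum(len(store) for store in self.nodes.values())


# Hypothetical usage: spread 1,000 small objects over three nodes.
grid = PartitionedMemorySpace(["node-a", "node-b", "node-c"])
for i in range(1000):
    grid.put(f"trade:{i}", {"id": i, "qty": 100 + i})
```

Callers only ever see `put` and `get`. Which node holds a given object is hidden behind the hash, which is roughly the abstraction these products sell, minus the hard parts (networking, replication, rebalancing when nodes come and go).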

Another take comes from the Azul Compute Appliance, your classic black box. The company has attacked a specific and troublesome problem with Java apps: garbage collection. The JVM can pause for as long as 30 seconds every few minutes to do its thing. Even if garbage collection isn’t that extreme, brief pauses are not acceptable for high-performance trading apps. So Azul designed its own chips that use a proprietary instruction set to make garbage collection non-disruptive. They stuff up to 768 of them in a box, with up to 768GB RAM. Software on the hosts redirects calls to the JVM to the Azul box, where it runs as if it were on the host.

Azul’s boxes run in the tens and hundreds of thousands of dollars, but they can run many JVMs at once. According to benchmarks run by Pepperdine, they definitely turn up the heat on Java apps.

Server managers in the big Wall Street firms are already dealing with these new concepts, and you can expect them to migrate outward in coming years. Just as storage and networking have been disaggregated from the computer, some amount of memory and processing, at least for specialized purposes, may also migrate onto the network (probably 10GigE or InfiniBand). One of the big reasons pointed out by Pepperdine: with multicore processors, clock speeds are not increasing. Therefore, app developers must seek other ways to increase performance.

Given that parallelism is still only minimally doable for all but the rocket scientists, techniques like greater use of caching are bound to gain popularity, so it’s probably worth your while to start investigating this whole technology area. At the architecture level, there’s plenty to understand: should you virtualize and cluster at the app level, or go the route of Virtual Iron or ScaleMP, which allow you to concatenate multiple physical machines into a single large VM?

Never a dull moment.

IBM dominates latest TOP500 supercomputing list

The twice-yearly TOP500 list of the world’s fastest supercomputers released November 12 is dominated by IBM in number of systems and performance.

The TOP500 list was released at SC07, the international conference on high performance computing, networking, storage and analysis, in Reno, Nevada.

The top spot is held by the reigning world champion, the Blue Gene/L system. Pictured below, the system is a joint development of IBM and the Department of Energy’s (DOE) National Nuclear Security Administration (NNSA) and is installed at DOE’s Lawrence Livermore National Laboratory in Livermore, Calif. The system was upgraded recently and now achieves a Linpack benchmark performance of 478.2 teraflops (TFlop/s), or trillions of calculations per second.
The world's fastest supercomputer from IBM at Livermore National Laboratory in California

Coming in at No. 2 is a brand-new installation of the same type of IBM system, albeit a newer version. It is a Blue Gene/P system installed in Germany at the Forschungszentrum Juelich (FZJ) and it achieved performance of 167.3 TFlop/s.

It’s not until No. 3 in the TOP500 list that you find a non-IBM system. That honor is reserved for the New Mexico Computing Applications Center (NMCAC), which has a Silicon Graphics Inc. (SGI) system built on the Altix ICE 8200 model that posted a speed of 126.9 TFlop/s.

In the latest list, IBM regained a solid lead in the total number of systems with 232 (46.4%) over Hewlett-Packard (HP) with 166 systems (33.2%).
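For anyone who wants to check the share figures, the arithmetic is simple enough to sketch in a few lines of Python. The system counts come straight from the paragraph above. The variable names are just illustrative, and the "everyone else" figure is derived here rather than taken from the list itself.

```python
# System counts quoted above; the list always has 500 entries.
TOTAL_SYSTEMS = 500
vendor_systems = {"IBM": 232, "HP": 166}

# Share of the list per vendor, as a percentage rounded to one decimal place.
shares = {
    vendor: round(100.0 * count / TOTAL_SYSTEMS, 1)
    for vendor, count in vendor_systems.items()
}

# Whatever is left belongs to all other vendors combined (derived, not quoted).
other_systems = TOTAL_SYSTEMS - sum(vendor_systems.values())
other_share = round(100.0 * other_systems / TOTAL_SYSTEMS, 1)
```

Put another way, the two biggest vendors together account for 398 of the 500 systems.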

Amazon EC2 users lose data due to “growing pains”

Amazon Elastic Compute Cloud (EC2), a web service that provides resizable compute capacity in the “cloud,” went down Saturday and a bunch of customers lost their application data. We saw this info in Data Center Knowledge and thought it was interesting enough to post here.

Amazon EC2 is basically a virtual data center that allows developers to increase or decrease capacity — from one to even thousands of server instances simultaneously — within moments.

Using Xen virtualization, each virtual machine is the “equivalent of a system with a 1.7Ghz x86 processor, 1.75GB of RAM, 160GB of local disk, and 250Mb/s of network bandwidth.”

The incident understandably ticked off EC2 users who lost their data, but it doesn’t look like they have much recourse, since this service is still considered beta and lacks service level agreements.

The outage signals a serious need for backup.

One user, Reuven, posted a comment saying, “To be blunt, this scares the hell out of me. What kind of redundancy does the current EC2 API have to avoid this from happening again? Does EC2 practice what it preaches and use SQS or some other queue service?”

Amazon considers the incident part of the growing pains of the EC2 service, which is about a year old now.

Not too long ago I wrote a blog post about Sun’s CIO saying corporate-owned and operated data centers will be a thing of the past by 2015.

But virtualization and cloud computing issues like this do nothing to win the confidence of conservative data center managers, who likely sigh a collective “I told you so” from the safety of their brick-and-mortar facilities full of physical machines and backups.

Windows makes Top 500 list

Microsoft introduced its first Windows operating system for high performance computing clusters last year, and it has already achieved two spots on the Top 500 Supercomputers List.

Windows Compute Cluster Server 2003 appeared on the computing industry’s semiannual top 500 list of the world’s most powerful supercomputers this week.

The operating system ran on a new HPC cluster at Mitsubishi UFJ Securities of Japan, which placed at 193 on the list.

The top 500 benchmark was run on a 448-node IBM BladeCenter HS21 cluster with 1,760 processors. The benchmark result was 6.52 trillion computations per second (teraflops).

Windows also served as the operating system for a new HPC cluster at Microsoft’s datacenter in Tukwila, Wash., which ranked 106 in the top 500. This system achieved 8.99 teraflops on 256 compute nodes and 2,048 processing cores of 64-bit Intel Xeon 5300 quad-core processors, powering Dell blade servers.

I spoke with Microsoft this week for a story on their emergence into the high performance computing market, and they were pretty psyched about making this list.

Look out, Linux.

IBM expands HPC cluster offerings

IBM announced today the availability of Windows Compute Cluster Server 2003 for the IBM System Cluster 1350, giving mid-market and enterprise clients a familiar operating system to work with in addition to Linux.

IBM also announced today expanded server, storage and networking options for clusters.

High performance computing clusters can range from a few up to thousands of servers woven together to deliver high-speed performance.

The clustered system is also designed to leverage Novell SUSE Linux Enterprise Server (SLES), and now supports the SLES10 operating system.

To read more, go to IBM’s press room.

PNNL quantifies energy savings of liquid cooling

RICHLAND, WASH. – Pacific Northwest National Laboratory is putting its shoulder to the wheel in the effort to reduce data center energy consumption. The Dept. of Energy’s high performance computing lab has launched a program to measure the effectiveness and possible energy savings that would come from liquid cooling in the data center. The lab will focus on SprayCool, a liquid cooling spray technology developed by Liberty Lake, Wash.-based ISR.

PNNL will run the experiment on an eight-rack system, a computer rated at 14 teraflops peak and 9 teraflops sustained, in a very small space: 800 square feet. The computer won’t just sit there. It will run codes, mainly computational fluid dynamics, while the lab measures performance, processor temperatures and overall room temperature.

According to Dr. Moe A. Khaleel, Director, Computational Sciences and Mathematics Division at PNNL, the lab will be able to conduct “what-if” experiments, like turning off the room’s air conditioning to see what happens to the room temperature and the temperature of the processor.

In addition to measuring the effectiveness of liquid cooling in high density environments, the study will also measure performance per watt. The facility is outfitted to meter the electrical input to the room.
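Performance per watt is just sustained throughput divided by metered power, and a small Python sketch shows how the comparison might be worked out. The 14 teraflop peak and 9 teraflop sustained figures are the ones quoted for the PNNL system above. The two power readings are purely hypothetical placeholders, since PNNL has not published any measurements yet, and the function names are made up for this example.

```python
def sustained_efficiency(sustained_tflops, peak_tflops):
    """Fraction of peak performance the machine actually sustains."""
    return sustained_tflops / peak_tflops


def gflops_per_watt(sustained_tflops, power_kw):
    """Sustained gigaflops delivered per watt of metered room power."""
    gflops = sustained_tflops * 1000.0
    watts = power_kw * 1000.0
    return gflops / watts


def energy_savings(baseline_kw, candidate_kw):
    """Fractional power reduction of the candidate versus the baseline."""
    return 1.0 - candidate_kw / baseline_kw


# Figures quoted in this post for the PNNL system.
PEAK_TFLOPS = 14.0
SUSTAINED_TFLOPS = 9.0

# HYPOTHETICAL meter readings, for illustration only. Not PNNL data.
HYPOTHETICAL_AIR_KW = 400.0
HYPOTHETICAL_LIQUID_KW = 320.0

efficiency = sustained_efficiency(SUSTAINED_TFLOPS, PEAK_TFLOPS)
air_ppw = gflops_per_watt(SUSTAINED_TFLOPS, HYPOTHETICAL_AIR_KW)
liquid_ppw = gflops_per_watt(SUSTAINED_TFLOPS, HYPOTHETICAL_LIQUID_KW)
savings = energy_savings(HYPOTHETICAL_AIR_KW, HYPOTHETICAL_LIQUID_KW)
```

The point of metering the whole room rather than just the racks is that the denominator then includes cooling overhead, which is exactly the part liquid cooling is supposed to shrink.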

“We’ll be able to see how much energy savings you can have if there are any, over air cooling. We’ll report on it nationally and will publish results month by month,” Khaleel said. “This is a national lab, we have to serve the national mission. Energy efficiency of data centers is one of the things we need to be doing. We believe the results will be positive, but we want to quantify things.”

PNNL’s first liquid cooling report will come out in June or July.

IBM rolls out new HPC cluster offerings

Big Blue is rolling out new high performance computing (HPC) initiatives targeted at commercial users. IBM dominates the Top 500 with its Blue Gene technology, but according to Charles King at Pund-IT Research in Hayward, Calif., IBM is missing the long tail of more mid-level HPC users. Those shops are predominantly HP and Dell shops with x86-based HPC systems.

From King’s analyst report: “Though IBM delivers a wide variety of innovative x86 solutions, including rack- and blade-based servers with Intel and AMD processors, the company trails HP and Dell in overall x86 market share and in clustered HPC sales. How can IBM make up that ground? By pursuing a three-prong strategy that leverages the company’s extensive supercomputing experience, provides benefits to its ISV and business partners, and makes life easier for potential customers.”

IBM’s plan includes selling preconfigured clusters, offering more options for Microsoft Compute Cluster Server and supporting more try-and-buy programs for customers.

High performance computing failover tool from Scali

High performance computing startup Scali rolled out a high availability extension for its HPC management platform, called Scali Manage/HA. The new extension plugs into Scali’s Linux cluster management software, Scali Manage 5.4. It’s designed for fault tolerance on the Scali Manage server, providing access to the management console, configuration states and images in the event of a failure. It also provides access to cluster gateways enabling access to all nodes within clusters. You can check out Scali’s customer list for more info on the company.

Roadtrip: Pacific Northwest National Lab

Next week I’m headed out to the Pacific Northwest National Laboratory in Richland, Washington. I’ll be crossing the snowy Cascades into Eastern Oregon and finally crossing the Columbia (the data center Promised Land) into Washington. It’s about a six-hour drive from Eugene, Ore., but it’s going to be worth it.

I’m going to be meeting with scientists and IT staff to talk about high performance computing, energy efficiency in the data center, and the projects underway at PNNL in the healthcare and airline sectors.