Supercomputing For The Masses

Original Article Date: 2005-03-11


The new ANDROMEDA Supercomputer

Once the preserve of CRAY and SGI, supercomputing typically required a budget the size of a small country's GDP. But with the introduction of the 800 Series AMD Opteron CPUs, affordable supercomputing is now a reality.

Electronics Nexus brings you the ANDROMEDA 5U rackmount server, which houses no less than EIGHT Opteron CPUs on two mainboards connected via AMD HyperTransport links, providing seamless parallel computing for the heaviest enterprise server workloads.

With 8 CPUs at your disposal in a single machine, addressable by a single operating system, new possibilities open up in the realm of enterprise-class servers, research computing, engineering modelling, data centers and media streaming. With the equivalent processing power of FOUR dual-CPU servers in a single box, you avoid the hassle of clustering, or of running different applications on different servers.

The ANDROMEDA rackmount server comes with all the expected enterprise server features, except in this case the dial is turned up to "11"! 24 hot-swap SCSI drive bays with RAID 0/1/10/5/50/JBOD support, four independent 133MHz PCI-X slots, capacity for up to 64GB of RAM (128GB with 4GB DIMMs), 8 Gigabit LAN ports and 1350W of power to support all this muscle. You simply won't find a more powerful server built on the x86 architecture. And because it IS built on the x86 architecture, the price is within reach of small-to-medium sized businesses.

But how is this possible? How do you knit together multiple x86 CPUs into a seamless parallel processing environment? To get the correct perspective on that answer, we first need to digress with a short history lesson...

A Brief History of x86 Multi-Processor CPUs

When AMD began working on their next generation server CPU some years ago, they wanted to design it from the bottom up to support future technologies, such as 64-bit registers, direct on-die memory addressing, and networking of multiple CPUs on the same connective transport.

Because AMD had not previously had a serious dual or multi-processor CPU (there was the Athlon MP, but it never really made its mark against Intel's offerings), they had the luxury of starting from scratch in developing their new server CPU.

Intel, on the other hand, was tied to evolving its existing Xeon dual/multi-processor CPU, which was originally designed around the Front-Side Bus (FSB) approach to addressing memory and system buses, an approach that was already showing signs of age.

The problem with the front-side bus architecture was that every CPU had to go through the memory controller hub (MCH), housed off the CPU die in the "Northbridge" chipset, to reach memory. The MCH is often referred to as the Northbridge, although this chipset also handles the AGP port, PCI-X buses and so on.

The MCH runs at a fixed frequency, once 100MHz and now 200MHz, but always below the clock speed of the CPU. And because the bus has a fixed address width, only so many GB of RAM or system device space can be addressed at any one time. This is fine for a single CPU, such as the Pentium or Athlon, but when two or more CPUs have to request memory or data from other devices in the system through this single hub, it becomes a bottleneck. And so the MCH/FSB architecture is simply not scalable to multi-processing environments.
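
To make the bottleneck argument concrete, here is a deliberately simplified toy model, a sketch rather than a performance simulation. It assumes that a shared bus offers one fixed amount of bandwidth that every CPU must split, and that an on-die controller gives each CPU its own bandwidth. The function names and bandwidth figures are hypothetical, chosen only for illustration, and are not measurements of any real Intel or AMD system.

```python
# Toy model only: the bandwidth values used below are hypothetical inputs,
# not measured figures for any real chipset or CPU.

def shared_bus_per_cpu(bus_bandwidth_gbs: float, num_cpus: int) -> float:
    """All CPUs contend for one shared bus, so each gets an equal slice."""
    if num_cpus < 1:
        raise ValueError("need at least one CPU")
    return bus_bandwidth_gbs / num_cpus


def per_cpu_controller_aggregate(controller_bandwidth_gbs: float, num_cpus: int) -> float:
    """Each CPU brings its own memory controller, so total bandwidth grows
    with the CPU count (assuming, optimistically, that every access is local)."""
    if num_cpus < 1:
        raise ValueError("need at least one CPU")
    return controller_bandwidth_gbs * num_cpus


if __name__ == "__main__":
    HYPOTHETICAL_GBS = 6.4  # same illustrative figure for both designs
    for n in (1, 2, 4, 8):
        print(
            f"{n} CPU(s): shared bus gives {shared_bus_per_cpu(HYPOTHETICAL_GBS, n):.1f} GB/s per CPU; "
            f"per-CPU controllers give {per_cpu_controller_aggregate(HYPOTHETICAL_GBS, n):.1f} GB/s in total"
        )
```

The point of the model is the shape of the curves, not the numbers: with a shared bus, adding CPUs divides a fixed resource, while with per-CPU controllers each new CPU adds capacity.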

Realising this, AMD decided on a radically different architecture for addressing memory and system buses and sharing loads between processors in a multi-CPU system.

The Opteron's Secret - "HyperTransport"

At the core of the Opteron's design is the HyperTransport (HT) bus. It is essentially a high-bandwidth, point-to-point interconnect running between two or more Opteron CPUs in a multi-processor system. Because each HT link is a dedicated connection between two devices rather than a single bus shared by every CPU, there is plenty of bandwidth available for sharing memory and other data between CPUs. This makes multi-processor systems easy to scale, with the 800 Series Opterons capable of operating with 7 other CPUs (8 in total per system).
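
Because each Opteron has only a few HyperTransport links (three per CPU), an 8-CPU system cannot connect every CPU directly to every other one, so some traffic has to hop through intermediate CPUs. The sketch below counts those hops with a breadth-first search over a hypothetical 2 x 4 "ladder" layout. That layout is an assumption chosen for illustration: split down the middle, it gives two 4-CPU boards joined by two links, but the actual ANDROMEDA wiring is not described in this article.

```python
from collections import deque


def build_ladder(rungs):
    """Hypothetical 2 x `rungs` ladder: CPUs 0..rungs-1 on one rail,
    CPUs rungs..2*rungs-1 on the other, with a rung joining i and i + rungs."""
    links = {cpu: [] for cpu in range(2 * rungs)}

    def connect(a, b):
        links[a].append(b)
        links[b].append(a)

    for i in range(rungs):
        connect(i, i + rungs)                      # rung between the two rails
        if i + 1 < rungs:
            connect(i, i + 1)                      # next CPU along rail A
            connect(i + rungs, i + 1 + rungs)      # next CPU along rail B
    return links


def hop_counts(links, start):
    """Breadth-first search: fewest link hops from `start` to every CPU."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cpu = queue.popleft()
        for neighbour in links[cpu]:
            if neighbour not in dist:
                dist[neighbour] = dist[cpu] + 1
                queue.append(neighbour)
    return dist


if __name__ == "__main__":
    topology = build_ladder(4)  # 8 CPUs in the hypothetical layout
    # No CPU in this layout uses more than three CPU-to-CPU links.
    assert max(len(n) for n in topology.values()) <= 3
    for cpu in sorted(topology):
        print(cpu, hop_counts(topology, cpu))
```

Running it shows that neighbouring CPUs are one hop apart while opposite corners of the ladder are several hops apart, which is exactly why memory access times in such a system are "non-uniform".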

Together with the Opteron's on-die Integrated Memory Controller and AMD's Direct Connect Architecture, this allows each CPU to address its own bank of memory directly. If, during multi-processor operation, one CPU needs more memory than is available in its own bank, it can request memory from other CPUs if available, and this is done across the high-bandwidth HyperTransport links using the industry-standard NUMA (Non-Uniform Memory Access) architecture typically found in much higher-end computing environments.
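
The "use your own bank first, then borrow from other CPUs" behaviour described above can be sketched as a local-first allocator with nearest-node fallback. This is a simplified, hypothetical model of the idea, not AMD's hardware logic or any operating system's actual allocator: the function `allocate`, the 4-CPU hop table `HOPS` and the free-memory figures are all illustrative assumptions.

```python
# Hypothetical 4-CPU square (0-1, 1-3, 3-2, 2-0): HOPS[a][b] is the number of
# HyperTransport hops between CPU a and CPU b. Illustrative values only.
HOPS = [
    [0, 1, 1, 2],
    [1, 0, 2, 1],
    [1, 2, 0, 1],
    [2, 1, 1, 0],
]


def allocate(request_gb, home_cpu, free_gb, hops):
    """Local-first allocation with fallback (simplified sketch).

    Takes memory from the requesting CPU's own bank first, then from other
    CPUs' banks in order of increasing hop distance. Returns a list of
    (cpu, gigabytes) grants and updates `free_gb` in place.
    """
    if request_gb > sum(free_gb.values()):
        # Refuse up front so a failed request leaves `free_gb` untouched.
        raise MemoryError(f"only {sum(free_gb.values())} GB free, {request_gb} GB requested")
    grants = []
    remaining = request_gb
    # Home bank (0 hops) first, then nearer banks before farther ones.
    for cpu in sorted(free_gb, key=lambda c: (hops[home_cpu][c], c)):
        if remaining == 0:
            break
        take = min(remaining, free_gb[cpu])
        if take > 0:
            free_gb[cpu] -= take
            grants.append((cpu, take))
            remaining -= take
    return grants


if __name__ == "__main__":
    free = {0: 4, 1: 8, 2: 8, 3: 8}  # hypothetical free GB per CPU's bank
    print(allocate(10, 0, free, HOPS))  # CPU 0 exhausts its own bank, then borrows
    print(free)
```

In this example, CPU 0 first uses all 4GB of its local bank and then takes the rest from a CPU one hop away, rather than from the CPU two hops away.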

This architecture ensures that even in 8-CPU systems, the shared-bus bottleneck of the FSB design does not occur. And so, with relatively cheap x86 processors, AMD are able to bring about large multi-processor "supercomputers" at a fraction of the cost of other proprietary systems.

If you think you might need your own bit of "supercomputing", feel free to give us a call. We'll be happy to help you decide whether the ANDROMEDA or its popular "baby brother", the 4-CPU ORION system, is the right solution for your needs!

Best regards,

Ben Ranson
Chief Systems Engineer