Supercomputing For The Masses
Original Article Date: 2005-03-11
The new ANDROMEDA Supercomputer
Once the preserve of CRAY and SGI, supercomputing
would typically require a budget the size of a small
country's GDP. But with the introduction of the 800 Series AMD Opteron
CPUs, affordable supercomputing is now a reality.
Electronics Nexus brings you the
ANDROMEDA 5U rackmount server, which houses no fewer
than EIGHT Opteron CPUs on two mainboards connected via AMD
HyperTransport links, providing seamless parallel computing for the heaviest
enterprise server workloads.
With 8 CPUs at your disposal in a single machine, addressable by a single
operating system, new possibilities open up in the realm of enterprise class
servers, research computing, engineering modelling, data centers and
media streaming. With the equivalent processing power of FOUR dual-CPU
servers in a single box, the hassle of clustering, or of running
different applications on different servers, can be avoided.
The ANDROMEDA
rackmount server comes with all the expected enterprise server features, except
in this case the dial is turned up to "11"! 24 SCSI Hotswap Drive Bays
with RAID 0/1/10/5/50/JBOD support, four
independent PCI-X 133MHz slots, capacity for up to 64GB of
RAM (128GB with 4GB DIMMs), 8 Gigabit LAN ports and 1350W
of power to support all this muscle. You simply won't find a more powerful
server built using x86 architecture. And because it IS built on x86
architecture, the price makes it affordable to the small-to-medium sized
business.
But how is this possible? How do you knit together multiple x86 CPUs into a
seamless parallel processing environment? To get the correct perspective on
that answer, we first need to digress with a short history lesson...
A Brief History of x86 Multi-Processor CPUs
When AMD began working on their next-generation server CPU some
years ago, they wanted to design it from the ground up to support future
technologies, such as 64-bit registers, direct on-die memory addressing, and
linking of multiple CPUs over a common interconnect.
Because AMD did not have a serious dual or multi-processor CPU up to that
time (there was the Athlon MP, but it never really made its
mark against Intel's offerings), they had the luxury of starting from scratch in
developing their new server CPU.
Intel, on the other hand, had the legacy of evolving their Xeon
dual/multi-processor CPU, which was originally designed around the Front-Side
Bus (FSB) concept of addressing memory and system buses, a concept that was
already showing signs of age.
The problem with the front-side bus architecture was that any CPU had to go
through the memory controller hub (MCH), housed off the
CPU die in the "Northbridge" chipset, to reach memory. The MCH is often
referred to as the Northbridge, although this chipset also handles the AGP
port, PCI-X buses and so on.
The MCH would run at a certain frequency, once 100MHz, and now at 200MHz, but
always below the clock speed of the CPU. Its fixed
address width also meant that only so many GB of RAM or system device
information could be addressed at any one time. This is fine for a single
CPU, such as the Pentium or Athlon, but when two or more CPUs have to ask for
memory or data from other devices in the system through this single hub, it can
become a bottleneck. And so the MCH/FSB architecture is simply
not scalable to multi-processing environments.
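To put a rough shape on that bottleneck, here is a minimal Python sketch. It assumes the simplest possible model, in which every CPU is busy and the single hub's bandwidth is split evenly between them. The function name and the 8.0 GB/s figure are hypothetical, chosen only to make the arithmetic easy to follow; they are not measurements of any real chipset.

```python
def shared_bus_share(bus_gb_per_s, n_cpus):
    """Bandwidth each CPU gets when all CPUs contend evenly for one shared hub."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return bus_gb_per_s / n_cpus


# Hypothetical hub bandwidth, for illustration only (not a real chipset figure).
HYPOTHETICAL_BUS_GB_PER_S = 8.0

for n_cpus in (1, 2, 4, 8):
    share = shared_bus_share(HYPOTHETICAL_BUS_GB_PER_S, n_cpus)
    print(f"{n_cpus} CPU(s): {share:.1f} GB/s each")
```

In this toy model the total stays fixed while each CPU's share shrinks as CPUs are added, which is the scaling problem described above.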
Realising this, AMD decided on a radically different architecture
for addressing memory and system buses and sharing loads between processors in
a multi-CPU system.
The Opteron's Secret - "HyperTransport"
At the core of the Opteron's design is the HyperTransport (HT) bus.
It's essentially a high-bandwidth interconnect running between 2 or more
Opteron CPUs in a multi-processor system. Because each HyperTransport link
is a dedicated point-to-point connection rather than a shared bus, there is
plenty of bandwidth available for sharing memory and other data between CPUs.
This enables easy scalability of multi-processor systems, with
the 800 Series Opterons capable of operating with 7 other CPUs
(8 total per system).
Together with the Opteron's on-die Integrated Memory Controller
and the Direct Connect Architecture, memory is addressed
directly by each CPU. If, during multi-processor operations, one CPU needs more
memory than is available in its own bank, it can request memory from other
CPUs if available. This is done across the high-bandwidth HyperTransport
using the industry standard NUMA (Non-Uniform Memory
Access) architecture found typically in much higher-end computing
environments.
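The "local bank first, then borrow from another CPU" idea can be sketched in a few lines of Python. This is a toy model only: the Node class, the allocate function and the 8GB-per-CPU bank size are all hypothetical, and it ignores how far apart two CPUs are on the HyperTransport links, which a real operating system would take into account.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One CPU and the memory bank attached to it (hypothetical model)."""
    cpu_id: int
    free_gb: int


def allocate(nodes, requesting_cpu, size_gb):
    """Local-first allocation: returns a list of (cpu_id, gb_taken) pairs."""
    if size_gb > sum(node.free_gb for node in nodes):
        raise MemoryError("not enough free memory in the whole system")
    # The requesting CPU's own bank sorts first; the rest keep their order.
    order = sorted(nodes, key=lambda node: node.cpu_id != requesting_cpu)
    placement = []
    remaining = size_gb
    for node in order:
        if remaining == 0:
            break
        take = min(node.free_gb, remaining)
        if take:
            node.free_gb -= take
            placement.append((node.cpu_id, take))
            remaining -= take
    return placement


# Eight CPUs with a hypothetical 8GB bank each; CPU 2 asks for 12GB.
nodes = [Node(cpu_id=i, free_gb=8) for i in range(8)]
print(allocate(nodes, requesting_cpu=2, size_gb=12))  # [(2, 8), (0, 4)]
```

Here CPU 2 uses all 8GB of its own bank and borrows the remaining 4GB from CPU 0. Access to the borrowed portion travels over an interconnect link, which is why access times are "non-uniform".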
This architecture helps avoid data bottlenecks even in systems where all 8 CPUs
are operating. And so, with relatively cheap x86
processors, AMD are able to bring about large multi-processor "supercomputers"
at a fraction of the cost of other proprietary systems.
If you think you might have a need for your own bit of "supercomputing", feel
free to give us a call. We'll be happy to see if either the
ANDROMEDA or its popular "baby" brother 4-CPU
ORION system is the right solution for your
needs!
Best regards,
Ben Ranson
Chief Systems Engineer