General Purpose GPUs - 60x the Power of Today's CPUs?

Original Article Date: 2009-02-17


The nVidia Tesla S1070 - 960 cores and 4TFLOPS in a 1U. Enough for you?

A couple of years ago I had a hunch, and I passed it on to a number of you. Basically, I said that a third manufacturer would arise to challenge Intel and AMD in the CPU market. I had seen the phenomenal growth and maturity of nVidia, the graphics chip manufacturer, over the previous half-decade. I'd noted that a new powerhouse within the PC was rising to rival the computing power of the main CPU. That powerhouse was the Graphics Processing Unit (or "GPU").

Fuelled by the millions of dollars from insatiable gamers in need of faster frame-rates while playing Doom, Quake, Half-Life and Far Cry (in that chronological order...), nVidia developed ever more powerful chips to perform the complex series of floating-point calculations needed to render a virtual world of objects and scenery within the PC.

Their great rival, ATi, was also responding to gamers' needs by developing faster and more powerful chips, but it was clear that they were no match for the technological brains and management acumen of nVidia. As many of you know, ATi was acquired by AMD in 2006, but that's another story. It's just worth noting the irony, in an article about nVidia morphing into a manufacturer of high-performance "CPUs", that nVidia's greatest rival was bought by a CPU manufacturer.

The graphics company paralleled their success in the gaming market with display cards aimed at the workstation sector, specifically the QuadroFX series, which is optimised for the line- and layer-based rendering common to 3D design, engineering and scientific displays, as opposed to heavily textured gaming graphics.

From Processing Graphics to Processing... Anything

Several years ago, out of the GeForce and QuadroFX product lines, a feature was quietly spawned that enabled programmers to write code in the C programming language that could run on the graphics card's GPU. And that code could be anything, not just graphics routines. It didn't occur to anyone at the time, perhaps, that this feature would lead to a revolution in the power of scientific and high-performance computing that is still in its early stages, but which promises to potentially change the world. For it seemed that graphics processors were particularly good at performing certain types of massively-parallel calculations. Much better, in fact, than a conventional CPU, which had evolved along a very different path due to the more general demands of managing an entire PC.

Perhaps this remarkable performance isn't surprising when one considers the architecture of the typical GPU. Multi-core CPUs are now the standard, but have only become so in the last couple of years, and in any case are limited currently to four cores. GPUs on the other hand, perhaps due to the unique problems of solving large-scale 3D graphics problems, evolved with multiple cores at a relatively early stage. Today, nVidia have a GPU with no less than 240 cores! That's sixty times that of a conventional quad-core CPU!

Couple this with the super-high memory bandwidth between a GPU and its RAM, and you have something that can theoretically deliver orders of magnitude of performance gain over a traditional CPU. To compare, the nVidia GeForce GTX285 advertises 159GB/s of bandwidth between its 240 cores and 1GB of GDDR3 RAM, against the 15-20GB/s achievable with the fastest conventional CPUs today. So as long as you can fit your working data within the RAM of the graphics card, you're looking at an order of magnitude increase in data-transfer speed.
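
As a rough worked example (using the quoted peak figures, and assuming a task limited purely by memory bandwidth): streaming the GTX285's full 1GB of on-card data once at 159GB/s takes roughly 1/159 of a second, about 6ms, whereas moving the same gigabyte through a 16GB/s host memory system takes roughly 1/16 of a second, about 63ms - around ten times longer.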

So GPUs are no longer just for graphics, it seems. To reflect the difference in purpose between GPUs as graphics processors and GPUs as simple number crunchers for general computing, the term GPGPU (General Purpose Graphics Processing Unit) has been coined. But could GPGPUs really replace the CPU we know and love?

60x the Performance of a CPU? Seriously?

Could one really expect sixty times the performance in parallel computing applications? In an industry where claims of performance gains of 20% or 50% get noticed, to suggest that something would be an order of magnitude (i.e. ten times) faster would seem wild indeed. And yet this is exactly what nVidia are claiming. Performance improvements of 30x, 40x, even 100x are not only claimed by the manufacturer, but appear to be backed up by many end-users in the scientific community who have switched to running their simulations on GPGPUs.

Sounds unbelievable, doesn't it? And yet it appears that orders of magnitude of performance gain are indeed achievable. And since we're talking about using commodity graphics cards, the price tag is quite reasonable. So what's the catch?

Well, firstly, to really get the benefit of the multi-core architecture of a GPGPU, your code has to be written in a massively multi-threaded (i.e. parallel) way. So general computing, which relies on complex interactions and wait states between different devices and modules, is not especially suited to this architecture. So don't say goodbye to your beloved CPU. But applications which allow massive delegation of repetitive code tasks out to multiple execution threads could benefit enormously (a minimal code sketch follows the list below). These include:

  • Scientific and engineering modelling of real-world phenomena, such as fluid dynamics, quantum mechanics, astrophysics, weather patterns, protein and molecular simulation etc.
  • Image and video processing and rendering - this is a special topic, which I'll cover in its own section below 
  • Financial modelling and economic systems analysis and prediction.
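
To give a flavour of what "massive delegation to execution threads" looks like in practice, here is a minimal CUDA C sketch (a hypothetical illustration of mine, not taken from nVidia's documentation). Each GPU thread computes exactly one element of the result, so thousands of threads can execute the same few lines at once:

    // saxpy.cu - hypothetical sketch of a massively parallel kernel (y = a*x + y)
    // Each of the many GPU threads handles exactly one array element.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's unique index
        if (i < n)                                       // guard against spare threads
            y[i] = a * x[i] + y[i];
    }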

The second "catch" is that you have to write your code especially for the GPU. A regular compiler would simply run any code on the main system CPU and ignore all that power sitting on your graphics card. To make things easier, nVidia have developed a special C compiler that will run the code on the GPU, using a software framework called CUDA. Full details of this framework can be found at nVidia's CUDA website.

So, with a small number of exceptions (see below), it's not something you can just plug in overnight and expect to suddenly deliver 60x the power of a traditional CPU to your software. You have to do some work. But with performance gains like this, most scientists, engineers, analysts and video professionals would think it worth at least investigating. And besides, scientists are used to writing their own code!

Lastly, those concerned with working in double-precision floating-point math shouldn't get too excited yet - the quoted performance figures are for single-precision numerics; double-precision performance is significantly lower (about one-tenth of that). nVidia say they are addressing this issue, and future GPGPUs are expected to perform much better in double precision. That said, even the double-precision power of a GPGPU may well exceed that of a traditional CPU.

Video and Image Rendering

The application of GPGPUs to video and image rendering is somewhat different to scientific computing in that it lends itself to commercial, off-the-shelf software solutions that allow the user to leverage the power of the GPGPU.

Adobe Creative Suite 4 has a special plug-in for CUDA-enabled GPGPU rendering of video, which is included with the nVidia Quadro CX graphics card. This card is essentially a Quadro FX4800 with the plug-in software included. According to nVidia and Adobe, HD video can be rendered with Premiere Pro in only 25% of the time taken by a traditional dual-core CPU, suggesting that the Quadro CX has rendering power equivalent to about eight traditional CPU cores. That's impressive. Adding a Quadro CX to an eight-core system (i.e. 2 x quad-core Opterons or Xeons) would therefore enable rendering of HD video in about half the time.

The Quadro CX provides additional benefits to other titles in Adobe's Creative Suite - nVidia have a web page describing these features.

Adobe CS4 is just one off-the-shelf application that provides GPU acceleration to video professionals right now. But expect more video and effects software vendors to follow this lead in the coming months and years: video processing lends itself extremely well to the massively parallel computations that GPGPUs excel at, and it is in a software vendor's interest to offer packaged GPU-computing solutions that make their products more attractive to demanding video professionals.

Hardware Solutions for GPU Computing


nVidia Tesla C1060 - 240 cores and 4GB RAM. Where's the DVI connector?

In addition to the Quadro CX mentioned above, nVidia offer two hardware solutions specifically for General Purpose GPU Computing, under the Tesla product brand.

The first solution is for workstations. The Tesla C1060 is basically a Quadro FX5800 without the video output, or a GeForce GTX285 with extra RAM. It comes with 240 cores, 4GB of GDDR3 RAM, 102GB/s of memory bandwidth and single-precision computing power of approximately 1000GFLOPS (one trillion floating-point operations per second, near enough). That is some serious number crunching when compared with a current conventional quad-core CPU delivering, at most, 100GFLOPS. The C1060 mounts into a standard PCI Express x16 slot. Multiple C1060s can be incorporated into a workstation, the only limitations being the number of available PCI Express x16 slots on the motherboard and physical space (the card is double-width).
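
If you would rather read those numbers from software than from the datasheet, the CUDA runtime will report them. Below is a small hypothetical sketch; on the GT200 generation used in the C1060 and GTX285, each multiprocessor contains 8 cores, so the 30 multiprocessors it reports correspond to the quoted 240 cores:

    // devquery.cu - hypothetical sketch: list each CUDA device's multiprocessors and RAM
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("Device %d: %s, %d multiprocessors, %lu MB of RAM\n",
                   d, prop.name, prop.multiProcessorCount,
                   (unsigned long)(prop.totalGlobalMem >> 20));
        }
        return 0;
    }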

The main advantage of using a C1060 over the equivalent (and much cheaper) GeForce GTX285 card is that the C1060 has four times the RAM available. This allows more room for the RAM-hungry, implicit, matrix-based calculations common to simulations. Explicit code, however, which computes on-the-fly, requires less RAM, so if you can fit your code into a smaller memory footprint you could do well on a budget with 3 or 4 GTX285s in your workstation (about the cost of a single C1060). A quick way to check how much of a card's memory is free is sketched below.
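
Before settling on a card, it is easy to check how much of its RAM is actually free for your data. A minimal hypothetical sketch using the CUDA runtime:

    // memcheck.cu - hypothetical sketch: report free vs. total RAM on the current GPU
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);
        printf("GPU RAM: %lu MB free of %lu MB total\n",
               (unsigned long)(free_bytes >> 20), (unsigned long)(total_bytes >> 20));
        return 0;
    }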

For rack-based compute clusters, nVidia have the 1U Tesla S1070 box (see picture at the head of the article), which is four C1060s mounted horizontally on two central PCI Express buses. These buses connect to your main server's PCI Express slots using special cabling, and the cards are powered by a power supply within the 1U box. With four times the power of a C1060, the figures are impressive: 960 cores, 16GB of GDDR3 RAM, 410GB/s of aggregate bandwidth and 4TFLOPS of computing power. Pretty beefy, and all within just 1U.

Both Tesla solutions, plus the lower-cost GTX285, are now available on many of our workstation and server lines. Check out the new GPU Computing section, for instance, on our two biggest-selling items - the CADIZ Dual Opteron workstation and the VEGA Dual Xeon rack server. Perhaps more importantly, we have specially specced out our MESSINA Workstation with a 4-way PCI Express 2.0 slot board, enabling you to run up to FOUR GPGPUs in a single box!

Summary

As it stands currently, one has to either develop one's own code to run on the massively-parallel GPGPUs in nVidia's graphics cores, or pick through many of the open-source libraries appearing in the fledgling CUDA community. Over time, however, I expect that more and more software vendors will package functionality in off-the-shelf software that will unlock the power of GPGPUs. Adobe, for example, already has the CUDA-enabled plug-in for Creative Suite 4, and the popular math suite MATLAB has a software development kit (SDK) for CUDA.

GPU computing is one of those rare technological advances that does not come with a hefty price tag. So whilst GPU computing is currently mostly in the hands of specialist programmers, rapid adoption of this new technology is likely as it is disseminated more broadly. I don't think it'll be too long, therefore, before most workstation and high-performance computing users refer to the graphics chip as often as the main processor when it comes to "getting work done".

With the advent of computing power an order of magnitude greater than what was previously available, the door opens to many more scientists, engineers, video professionals and statisticians doing things with numbers that were previously impossible on their budgets. More creative, realistic and gorgeous-looking video effects could entertain us at the movies or on TV. And as simulation plays a pivotal role in scientific advance today, a quantum leap in computing power leads to more complex and more realistic simulations of physical phenomena. These in turn could lead directly to new discoveries, new understandings and new inventions that could change our lives and, quite possibly, the world.

Best regards,


Ben Ranson
Chief Systems Engineer
Electronics Nexus
http://elnexus.com
ben@elnexus.com
1-877-773-5366