Quad-Core Xeon and Opteron go Head to Head

Original Article Date: 2008-06-20

This month, I decided that now that AMD's quad-core Opteron ("Barcelona") CPUs are shipping in volume, it was time to run a head-to-head comparison between them and Intel's equivalent, "Harpertown".

Most IT industry commentators hold that when Intel released their new-architecture Xeons (Dempsey/Woodcrest) in 2006, they won back their crown as the price/performance leaders in the server/workstation CPU market. Intel had slipped behind AMD for several years after the release of the latter's Opteron CPU in 2003.

Now, the question is, with AMD's Barcelona finally shipping in volume, do AMD have what it takes to unseat Chipzilla, just like they did five years ago?

Comparison of Features

To determine this, let's start with a comparison of features between Intel's current 2-way Xeon processor 5400 series (aka "Harpertown") and AMD's Barcelona.

Feature                             Intel Xeon "Harpertown"         AMD Opteron "Barcelona"
Clock Speed Range                   2.0-3.2GHz                      1.8-2.5GHz
L1 Cache                            32KB                            64KB
L2 Cache                            12MB (2x6MB)                    2MB (4x512KB)
L3 Cache                            n/a                             2MB
Max Supported Memory Clock          667MHz (800MHz on 54X2 models)  667MHz
Process Technology                  45nm                            65nm
Power Consumption (TDP)             50, 80, 120 or 150W             55, 95 or 120W
Number of Cores                     4                               4
Number of Simultaneous CPU Sockets  1 or 2 (4 with X7XXX series)    1 or 2 (4 or 8 with 83XX series)

The first noticeable difference is that Barcelona has a limited clock speed range at this time, maxing out at 2.5GHz, whereas Intel have clocked Harpertowns at 3.2GHz. Whilst like-for-like comparisons of clock speed between Intel and AMD chips are always somewhat dangerous (AMD traditionally deliver more instructions per clock cycle), it is interesting that AMD are struggling to raise core frequency. AMD have promised that higher clock speeds are on the way, and let's hope that this happens sooner rather than later.

Another key difference is cache size and architecture. Intel have simply thrown silicon at the problem, putting two giant 6MB L2 caches on their CPU, which actually take up more than 50% of the chip's real estate. This large cache size is one key reason for Harpertown's increased performance over the previous generation (Clovertown), which had only 2x4MB of cache. As you may know, Harpertown (like Clovertown before it) is not a true quad-core, but two dual-core dies packaged together. Whilst AMD like to decry this, citing their chip as being the only "true" quad-core on a single die, no-one I know who uses these Xeons is complaining, as it does not appear to impact performance.

Whereas Intel use a 6MB shared L2 cache for each pair of cores, AMD use a 512KB L2 cache dedicated to each core. Whilst this produces a much smaller overall L2 cache of 2MB (compared with 12MB), the design is considered to be superior, and as we'll see later, the Barcelona is more than a match for Harpertown on key performance benchmarks. Barcelona also has a shared L3 cache of 2MB, but the jury is out amongst commentators on how significant an impact this extra block of on-chip memory has.

Intel have successfully made the move to a smaller 45nm process with the introduction of Harpertown, whereas AMD are still on 65nm. For Intel, a smaller chip process has meant more transistors per square millimeter, resulting in improvements in performance over 65nm, reduced power consumption, and a consequent higher clock speed boundary. AMD plan to move to 45nm sometime in the next 12-18 months, but still being at 65nm definitely leaves them with an extra hill to climb to match Intel's performance.

Lastly, AMD's trump card is Barcelona's ability to operate in 4-way or 8-way configurations (with the 83XX series), whereas Harpertown is restricted to 2-way operation. Intel do have the X7000 series processors that can run in 4-way operation, but these are comparatively expensive and, in my opinion, their performance does not justify their use, as the benchmarks below bear out.

Comparison of Pricing and Product Range

The following table shows the spread of pricing and clock speeds for the two rival CPU series.

Approx Retail Price  Intel Xeon                       AMD Opteron
$300                 E5405 (2.0GHz)                   2346HE (1.8GHz), 2350 (2.0GHz)
$400                 E5410 (2.33GHz), E5420 (2.5GHz)  2352 (2.1GHz)
$500                 -                                2347HE (1.9GHz)
$600                 E5430 (2.66GHz)                  2354 (2.2GHz)
$700                 -                                -
$800                 -                                2356 (2.3GHz)
$900                 E5440 (2.83GHz)                  -
$1,000               E5462 (2.8GHz, 1600FSB)          2358SE (2.4GHz)
$1,100               -                                -
$1,200               E5450 (3.0GHz)                   -
$1,300               E5472 (3.0GHz, 1600FSB)          -
$1,400               -                                2360SE (2.5GHz)
$1,500               X5460 (3.16GHz)                  -
$1,600               X5482 (3.2GHz, 1600FSB)          -

My main comment on this table goes back to my earlier point about AMD lacking a good spread of clock speeds. And yet, despite this, there is a huge price jump when going up just 300MHz, from 2.2GHz ($600) to 2.5GHz ($1,400). I'd be very surprised if anyone wanted to pay 2.3 times the price (an extra 133%) just to get a 14% boost in clock speed. The corresponding spread on Intel's side seems more justified, and indeed we have already had quite a number of sales of the X5482 to those who absolutely must have the fastest!
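To make the size of that jump concrete, here is a minimal sketch of the arithmetic in Python, using the approximate retail prices and clock speeds from the table above. The helper name percent_increase is my own, and clock speed is only a rough stand-in for performance.

```python
# Approximate retail prices (USD) and clock speeds (GHz) from the pricing table.
low_price, low_clock = 600, 2.2      # Opteron 2354
high_price, high_clock = 1400, 2.5   # Opteron 2360SE


def percent_increase(old, new):
    """Return the increase from old to new as a percentage of old."""
    return (new - old) / old * 100


price_premium = percent_increase(low_price, high_price)
clock_gain = percent_increase(low_clock, high_clock)
price_ratio = high_price / low_price

print(f"Price ratio:   {price_ratio:.2f}x")
print(f"Price premium: {price_premium:.0f}%")
print(f"Clock gain:    {clock_gain:.0f}%")
```

In other words, the 2360SE costs roughly 2.3 times as much as the 2354 for a clock speed gain in the low teens of percent.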

Early Benchmark Comparisons

Compared to their desktop counterparts, benchmarks for workstation and server CPUs on popular websites are much harder to come by. Thankfully, however, we do have SPEC, an industry-led but manufacturer-neutral organisation that provides performance evaluation of computer hardware through system vendor sponsorship. In my March article, I presented performance and price-performance data on the recent evolution of Intel CPUs. In this article, I've kept in a number of the current Harpertown benchmarks so as to provide a direct comparison between those and AMD's Barcelona.

Even SPEC was thin on the ground in terms of benchmark data for the Opteron, however. Only the 2356 and 8356 had extensive testing done on them. The reason for this is that vendors have only been able to get a hold of them in the last month or so. And it's important to note that any benchmarks carried out with the "B2" revision were withdrawn, because of the flaw in that chip, which has been covered extensively elsewhere in the IT press. These benchmarks are from the "B3" chip, which has the flaw corrected.

There are four main benchmarks that we're interested in:

  • single-threaded integer
  • single-threaded floating-point
  • multi-threaded integer
  • multi-threaded floating-point.

Integer calculations are important in typical server applications such as mail, web and database servers, and are also important in 2D video and image rendering. Floating point, on the other hand, dominates real world modelling, which includes any 3D visual effects rendering, engineering design and simulation and scientific modelling. So when looking at these benchmarks, consider what your main application is!

The single-threaded benchmarks show the raw computing power of a single core. The multi-threaded (or "Rates") benchmarks assess the overall performance of the CPU and memory system when solving a multi-threaded problem. These are usually the best guide to overall system performance in a server/workstation environment.

A final note on these charts relates to the estimated benchmarks I added in for the Opteron 2350. These were estimated by taking the 2356 benchmarks and scaling them by a factor of 2.0/2.3 (i.e. 87%), on the assumption that a reduction in clock speed results in a linear reduction in performance. This might not be exactly the case, but it will be close to the mark, and it becomes important in assessing the price-performance options of Opteron (see later).
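The pro-rating described above can be sketched as follows. The function name and the score of 100.0 are hypothetical placeholders for illustration, not published SPEC results, and the linear relationship is the same assumption stated in the text.

```python
def scale_by_clock(score, measured_ghz, target_ghz):
    """Estimate a score at target_ghz, assuming performance is linear in clock speed."""
    return score * target_ghz / measured_ghz


# Placeholder score for the 2356 (2.3GHz); NOT a real SPEC figure.
score_2356 = 100.0

scaling_factor = 2.0 / 2.3
estimate_2350 = scale_by_clock(score_2356, measured_ghz=2.3, target_ghz=2.0)

print(f"Scaling factor:       {scaling_factor:.2f}")
print(f"Estimated 2350 score: {estimate_2350:.1f}")
```

Because memory bandwidth and cache speed do not scale with core clock in the same way, an estimate of this kind is a rough approximation and may be off in either direction.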


Image (c)2008 Electronics Nexus, Inc.
Source data (c)2003-2008 courtesy of Standard Performance Evaluation Corporation


The single-threaded graph shows that Intel do better than AMD on raw processor core performance.

The more relevant (to server/workstation apps) multi-threaded "Rates" benchmark confirms that AMD are still king of floating point, whilst Intel beat AMD on integer performance. Particularly, note that the Opteron 2356 beats all Intel 2-way CPUs on floating point, whilst even the low end Intel E5410 is comparable to the top end AMD on integer.

Although not strictly within the bounds of this discussion, which focuses on comparing 2-way Opteron with 2-way Xeon, I decided to include the 4-way Opteron 8356 and compare it with Intel's X7350 CPU. The benchmarks show that, as I expected, the Opteron just runs away from the Xeon on floating point, and even slightly outperforms it on integer. This is mainly due to AMD's superior inter-CPU links (HyperTransport) and direct memory access through the on-chip RAM controller, an architecture designed from the ground up to work with 4 or 8 CPUs. Intel's 4-way solution, in my opinion, is a bit of a kludge (which is why we don't sell it), and these figures indicate that it's not a strong performer, especially when taking into account the very expensive CPU pricing.

Price vs Performance

Just like in March's article, I've incorporated the pricing of each CPU into the benchmarks to show a comparison of price vs performance. The two charts below have been "normalised" at the Opteron 8356, and reflect how well a CPU performs for every dollar spent on it, compared with the 8356. A longer line indicates better price/performance.
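For readers who want to reproduce this kind of normalisation with their own figures, here is a minimal sketch. The function names are my own, and every score and price in the example is a made-up placeholder rather than a SPEC result or a quoted price.

```python
def price_performance(score, price):
    """Benchmark score delivered per dollar of CPU price."""
    return score / price


def normalise(cpus, reference):
    """Express each CPU's score-per-dollar relative to the reference CPU (which becomes 1.0)."""
    ref_score, ref_price = cpus[reference]
    ref_value = price_performance(ref_score, ref_price)
    return {
        name: price_performance(score, price) / ref_value
        for name, (score, price) in cpus.items()
    }


# name -> (benchmark score, price in USD). Placeholder values only.
cpus = {
    "Opteron 8356": (100.0, 1000.0),
    "CPU A": (90.0, 450.0),
    "CPU B": (120.0, 1500.0),
}

relative = normalise(cpus, "Opteron 8356")
for name, value in sorted(relative.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

A value above 1.0 means the CPU delivers more benchmark score per dollar than the reference chip, which corresponds to a longer line on the charts.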


Image (c)2008 Electronics Nexus, Inc.
Source data (c)2003-2008 courtesy of Standard Performance Evaluation Corporation


The winners in the assessment of "best bang for the buck" are the two low end CPUs from both AMD and Intel (the Opteron 2350 and the Intel E5410). [Note that the 2350 is an estimated benchmark, pro-rated from that of the 2356.] In terms of multi-threaded floating-point price-performance, in fact, the 2350 runs away with it. So, at least at this clock frequency, AMD are offering a good deal for those working on 3D rendering or engineering/scientific simulation, and compare equally with Intel on integer price-performance.

For raw performance, however, you will need to consider spending a bit more and getting the Intel X5482 for maximum integer performance, or the AMD 2356 for maximum floating-point performance.

Early Conclusions

It must be stressed that the SPEC integer and floating-point benchmarks, single- and multi-threaded, are only one measure of performance, and that real-world database and web serving applications may give different results. Even the SPEC results for 2-way Barcelona cover only a single CPU model (the 2356), but one can fairly safely extrapolate that the 2350-2354 chips will benchmark similarly, scaled down in proportion to their lower clock speeds.

Because AMD's Barcelona exceeds Intel's Harpertown on floating point, even when CPU price is taken into consideration, AMD represent a better deal for anyone who needs an engineering/scientific modelling or 3D special effects workstation or render server. These applications depend very heavily on CPU floating-point performance. Alternatively, if your workstation is focused on HD video editing or Photoshop, you're working mainly in integer space, where Intel continue to shine.

All web, email, volume and database servers depend upon integer performance, and so it's really a no-brainer to stick with Intel Xeon for these applications, especially when considering the greater dependability and warranty support of all-Intel server solutions.

If you have High-Performance Computing requirements, my advice is to consider going with either a cluster of 2-way Intel Xeon machines or a single AMD 8-way box. Intel represent better value, not to mention priceless reliability, and the extra hassle of setting up the cluster may well be worth it. That said, AMD are very strong on floating-point performance (as they always have been). Combined with the convenience of a single 4-way or 8-way package, which will also yield even stronger performance than a cluster, the extra money needed for such an AMD solution may be well spent indeed, especially considering most HPC applications are floating-point intensive.

My final two cents is that AMD need to ramp up their clock speeds, and provide more reasonable pricing on the higher end as they do so. If this doesn't happen soon, AMD may slip further behind in the server/workstation space as their perceived value is currently weak.

In short, if you're a workstation user, chances are you're dependent upon AMD's strong suit, floating point, so try your luck with the Opteron. I think you'll be in for a pleasant surprise. If you're sticking with standard server applications, however, there would be little justification in moving away from dependable Chipzilla.

Best regards,


Ben Ranson
Chief Systems Engineer
Electronics Nexus
http://elnexus.com
ben@elnexus.com
1-877-773-5366