Quad-Core Xeon and Opteron go Head to Head
Original Article Date: 2008-06-20
This month, with AMD's quad-core Opteron ("Barcelona") CPUs now shipping in volume,
I decided it was time to run a head-to-head comparison between them and Intel's
"Harpertown" equivalent.
Most IT industry commentators agree that when Intel released their
new architecture Xeons (Dempsey/Woodcrest) in 2006, they won back their crown as
the price/performance leaders in the server/workstation CPU market. Intel had slipped
behind AMD for several years after the release of the latter's Opteron CPU in 2003.
Now, the question is, with AMD's Barcelona finally shipping in volume,
do AMD have what it takes to unseat Chipzilla, just like they did five years ago?
Comparison of Features
To determine this, let's start with a comparison of features between Intel's current
2-way Xeon processor 5400 series (aka "Harpertown") and AMD's Barcelona.
| Feature | Intel Xeon "Harpertown" | AMD Opteron "Barcelona" |
|---|---|---|
| Clock Speed Range | 2.0-3.2GHz | 1.8-2.5GHz |
| L1 Cache | 32KB | 64KB |
| L2 Cache | 12MB (2x6MB) | 2MB (4x512KB) |
| L3 Cache | n/a | 2MB |
| Max Supported Memory Clock | 667MHz (800MHz on 54X2 models) | 667MHz |
| Process Technology | 45nm | 65nm |
| Power Consumption (TDP) | 50, 80, 120 or 150W | 55, 95 or 120W |
| Number of Cores | 4 | 4 |
| Number of Simultaneous CPU Sockets | 1 or 2 (4 with X7XXX series) | 1 or 2 (4 or 8 with 83XX series) |
The first noticeable difference is that Barcelona has a limited clock speed range
at this time, maxing out at 2.5GHz, whereas Intel have clocked Harpertowns at 3.2GHz.
Whilst like-for-like comparisons of clock speed between Intel and AMD chips are always
somewhat dangerous (AMD traditionally deliver more instructions per clock cycle),
it is interesting that AMD are struggling to keep up when it comes to core
frequency. AMD have promised that higher clock speeds are on the way, and let's hope
that this happens sooner rather than later.
Another key difference is cache size and architecture. Intel have simply thrown
silicon at the problem, putting two giant 6MB L2 caches on their CPU die, which
actually take up more than 50% of the chip's real estate. These large caches
are one key reason for Harpertown's increased performance over the previous quad-core
generation (Clovertown), which had only 2x4MB caches. As you may know, Harpertown
(like Clovertown before it) is not a true quad-core, but two dual-core dies stuck
together. Whilst AMD like to decry this, citing their chip as being the
only "true" quad-core on a single die, no-one I know who is
using these Intel quad-core chips is complaining about this, as it does
not appear to impact performance.
Whereas Intel use a 6MB shared L2 cache for each pair of cores, AMD use a 512KB
L2 cache dedicated to each core. Whilst this produces a much smaller overall L2 cache
of 2MB (compared with 12MB), the design is considered to be superior, and as we'll
see later, the Barcelona is more than a match for Harpertown on key performance
benchmarks. Barcelona also has a shared L3 cache of 2MB, but the jury is out amongst
commentators on how significant an impact this extra block of on-chip memory has.
Intel have successfully made the move to a smaller 45nm process with the introduction
of Harpertown, whereas AMD are still on 65nm. For Intel, a smaller chip process has
meant more transistors per square millimeter, resulting in improved performance over
65nm, reduced power consumption, and consequently a higher clock speed ceiling. AMD
plan to move to 45nm sometime in the next 12-18 months, but still being at 65nm
definitely leaves them with an extra hill to climb to match Intel's performance.
Lastly, AMD's trump card is Barcelona's ability to operate as a set
of 4 or 8 processors (with the 83XX series), whereas Harpertown is restricted to
2-way operation. Intel do have the X7000 series processors that can run in 4-way
operation, but these are comparatively expensive, and in my opinion their performance
does not justify their use, a view that is backed up by the benchmarks that follow.
Comparison of Pricing and Product Range
The following table shows the spread of pricing and clock speeds for the two rival
CPU series.
| Approx Retail Price | Intel Xeon | AMD Opteron |
|---|---|---|
| $300 | E5405 (2.0GHz) | 2346HE (1.8GHz), 2350 (2.0GHz) |
| $400 | E5410 (2.33GHz), E5420 (2.5GHz) | 2352 (2.1GHz) |
| $500 | | 2347HE (1.9GHz) |
| $600 | E5430 (2.66GHz) | 2354 (2.2GHz) |
| $700 | | |
| $800 | | 2356 (2.3GHz) |
| $900 | E5440 (2.83GHz) | |
| $1,000 | E5462 (2.8GHz, 1600FSB) | 2358SE (2.4GHz) |
| $1,100 | | |
| $1,200 | E5450 (3.0GHz) | |
| $1,300 | E5472 (3.0GHz, 1600FSB) | |
| $1,400 | | 2360SE (2.5GHz) |
| $1,500 | X5460 (3.16GHz) | |
| $1,600 | X5482 (3.2GHz, 1600FSB) | |
My main comment on this table goes back to my earlier mention of AMD lacking a good
spread of clock speeds. And yet, despite this, there is a huge price jump
when going up just 300MHz from 2.2GHz ($600) to 2.5GHz ($1,400). I'd be very surprised
if anyone wanted to pay around 2.3 times the price (an extra 133%) just to get a
roughly 14% boost in core clock speed. The corresponding spread on Intel's side seems
more justified, and indeed we have had quite a number of sales already on the X5482
for those who absolutely must have the fastest!
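As a quick check of that arithmetic, here is a minimal Python sketch. The prices and clock speeds are the approximate figures from the pricing table above; the helper name `percent_increase` is just for illustration.

```python
def percent_increase(old, new):
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100.0

# Approximate retail prices and clock speeds from the pricing table.
price_2354, clock_2354 = 600, 2.2        # Opteron 2354
price_2360se, clock_2360se = 1400, 2.5   # Opteron 2360SE

price_premium = percent_increase(price_2354, price_2360se)
clock_gain = percent_increase(clock_2354, clock_2360se)

print(f"Extra cost:  {price_premium:.0f}%")
print(f"Extra clock: {clock_gain:.0f}%")
```

By this calculation the 2360SE costs about 133% more than the 2354 (roughly 2.3 times the price) for a clock speed gain of about 14%.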
Early Benchmark Comparisons
Compared to their desktop counterparts, benchmarks for workstation and server CPUs
are much harder to come by on popular websites. Thankfully, however, we do have SPEC,
an industry-led but manufacturer-neutral organisation that provides performance
evaluation of computer hardware through system vendor sponsorship. In my March article,
I presented performance and price-performance data on the recent evolution of Intel
CPUs. In this article, I've kept in a number of the current Harpertown benchmarks
so as to provide a direct comparison between those and AMD's Barcelona.
Even SPEC was thin on the ground in terms of benchmark data for the Opteron, however.
Only the 2356 and 8356 had extensive testing done on them. The reason for this is
that vendors have only been able to get hold of them in the last month or so.
It's also important to note that any benchmarks carried out with the "B2" revision
were withdrawn because of the flaw in that chip, which has been covered extensively
elsewhere in the IT press. These benchmarks are from the "B3" chip, which has the
flaw corrected.
There are four main benchmarks that we're interested in:
- single-threaded integer
- single-threaded floating-point
- multi-threaded integer
- multi-threaded floating-point
Integer calculations are important in typical server
applications such as mail, web and database servers, and are also important in 2D
video and image rendering. Floating point, on the other hand, dominates
real world modelling, which includes any 3D visual effects rendering,
engineering design and simulation and scientific modelling. So when looking at these
benchmarks, consider what your main
application is!
The single-threaded benchmarks show the raw computing power of
a single core. The multi-threaded benchmarks (the "Rates" benchmarks)
assess the overall performance of the CPU and memory system when solving a multi-threaded
problem. These latter benchmarks are usually the best guide to overall system performance
in a server/workstation environment.
A final note on these charts relates to the estimated benchmarks I added in for the
Opteron 2350. These were estimated by taking the 2356 benchmarks and scaling them
by a factor of 2.0/2.3 (i.e. 87%), on the assumption that a reduction in
clock speed results in a linear reduction in performance. This might not be exactly
the case, but it will be close to the mark, and it becomes important in assessing the
price-performance options of Opteron (see later).
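The pro-rating described above can be sketched in a few lines of Python. The function name `scale_by_clock` and the score of 100.0 are hypothetical placeholders for illustration only, not actual SPEC results; only the 2.3GHz and 2.0GHz clock speeds come from the article.

```python
def scale_by_clock(measured_score, measured_ghz, target_ghz):
    """Estimate a score at target_ghz, assuming performance scales
    linearly with clock speed (an approximation, as noted above)."""
    return measured_score * target_ghz / measured_ghz

# Clock speeds of the Opteron 2356 (measured) and 2350 (estimated).
factor = 2.0 / 2.3

# Hypothetical placeholder score, NOT a real SPEC figure.
example_score_2356 = 100.0
estimated_score_2350 = scale_by_clock(example_score_2356, 2.3, 2.0)

print(f"Scaling factor: {factor:.2f}")
print(f"Estimated 2350 score: {estimated_score_2350:.1f}")
```

In practice, memory bandwidth and cache behaviour do not change with the core clock, so a simple linear estimate like this is only a rough guide.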
[Charts: single-threaded and multi-threaded ("Rates") SPEC benchmark results]
Images (c)2008 Electronics Nexus, Inc.
Source data (c)2003-2008 courtesy of Standard Performance Evaluation Corporation
The single-threaded graph shows that Intel do better than AMD on raw processor core
performance.
The more relevant (to server/workstation apps) multi-threaded "Rates" benchmark
confirms that AMD are still king of floating point, whilst
Intel beat AMD on integer performance. In particular, note that the
Opteron 2356 beats all Intel 2-way CPUs on floating point, whilst even
the low-end Intel E5410 is comparable to the top-end AMD on integer.
Although not strictly within the bounds of this discussion, which focuses on comparing
2-way Opteron with 2-way Xeon, I decided to include the 4-way Opteron 8356
and compare it with Intel's X7350 CPU. The benchmarks show that,
as I expected, the Opteron just runs away from the Xeon on floating point, and even
slightly outperforms it on integer. This is mainly due to the superior inter-CPU
link (HyperTransport) and the on-chip memory controller,
an architecture that was designed from the ground up to work with 4 or 8 CPUs. Intel's
4-way solution, in my opinion, is a bit of a kludge (which is why we don't sell it),
and these figures indicate that it's not a strong performer, especially when taking
into account the very expensive CPU pricing.
Price vs Performance
Just like in March's article, I've incorporated the pricing of each CPU into the
benchmarks to show a comparison of price vs performance. The two charts below have
been "normalised" at the Opteron 8356, and reflect how well a CPU performs for every
dollar spent on it, compared with the 8356. A longer line indicates better price/performance.
[Charts: price vs performance, normalised to the Opteron 8356]
Images (c)2008 Electronics Nexus, Inc.
Source data (c)2003-2008 courtesy of Standard Performance Evaluation Corporation
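For readers who want to reproduce this kind of normalised comparison with their own numbers, here is a minimal Python sketch. It assumes that "price/performance" means benchmark score divided by CPU price, which is then divided by the same ratio for a baseline CPU. The CPU names, scores and prices in it are hypothetical placeholders, not the data behind the charts.

```python
def performance_per_dollar(score, price):
    """Benchmark score obtained per dollar of CPU price."""
    return score / price

def normalise(cpus, baseline):
    """Express each CPU's performance per dollar relative to a baseline CPU.

    cpus maps a CPU name to a (score, price) tuple. The baseline comes
    out as 1.0, and larger values mean better value for money.
    """
    base = performance_per_dollar(*cpus[baseline])
    return {name: performance_per_dollar(score, price) / base
            for name, (score, price) in cpus.items()}

# Hypothetical placeholder data: (benchmark score, price in dollars).
cpus = {
    "baseline_cpu": (100.0, 1000.0),
    "cheaper_cpu": (80.0, 400.0),
}

relative = normalise(cpus, "baseline_cpu")
for name, value in relative.items():
    print(f"{name}: {value:.2f}")
```

With these placeholder figures the cheaper CPU delivers 80% of the baseline's performance at 40% of the price, so it scores 2.0 against the baseline's 1.0.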
The winners in the assessment of "best bang for the buck" are the two low-end CPUs
from AMD and Intel (the Opteron 2350 and the Intel E5410). [Note that the 2350
is an estimated benchmark, pro-rated from that of the 2356.] In terms of
multi-threaded floating-point price-performance, in fact, the 2350 runs away with
it. So, at least at this clock frequency, AMD are offering a good deal for those
working on 3D rendering or engineering/scientific simulation, and they compare equally
with Intel on integer price-performance.
For raw performance, however, you will need to consider spending a bit more and
getting the Intel X5482 for maximum integer performance, or the AMD 2356 for maximum
floating-point performance.
Early Conclusions
It must be stressed that the SPEC floating-point, integer, single-threaded and
multi-threaded benchmarks are only one measure of performance, and that real-world
database and web serving applications may lead to differing results. The SPEC
benchmarks for 2-way Barcelona also still amount to only a single CPU model (the 2356),
although one can relatively safely extrapolate that the 2350-2354 chips will benchmark
similarly, with results reduced pro rata according to their lower clock speeds.
Because AMD's Barcelona exceeds Intel's Harpertown on floating point, even
when CPU price is taken into consideration, AMD represent a better deal for anyone
who needs an engineering/scientific modelling or 3D special
effects workstation or render server. This is because these applications
are very heavily dependent upon CPU floating-point performance.
Alternatively, if your workstation is focused on HD video editing or Photoshop,
you're working mainly in integer space where Intel continue to
shine.
All web, email, volume and database servers depend upon integer
performance, and so it's really a no-brainer to stick with Intel Xeon for
these applications, especially when considering the greater dependability and warranty
support of all-Intel server solutions.
If you have High-Performance Computing requirements, my advice
is to consider going with either an Intel Xeon 2-way based cluster, or a single
AMD 8-way box. Intel represent better value, not to mention priceless reliability,
and the extra hassle
of setting up the cluster may well be worth it. That said, AMD are very strong on
floating-point performance (as they always have been). Combined with the convenience
of a single 4-way or 8-way package, which will also yield even stronger performance
than a cluster, the extra money needed for such an AMD solution may be well spent
indeed, especially considering that most HPC applications are floating-point
intensive.
My final two cents is that AMD need to ramp up their clock speeds,
and provide more reasonable pricing on the higher end as they do so. If
this doesn't happen soon, AMD may slip further behind in the server/workstation
space as their perceived value is currently weak.
In short, if you're a workstation user, chances are you're dependent upon AMD's
strong suit - floating point, so try your luck with the Opteron - I think you'll
be in for a pleasant surprise. If you're sticking with standard server applications,
however, there would be little
justification for moving away from dependable Chipzilla.
Best regards,
Ben Ranson
Chief Systems Engineer
Electronics Nexus
http://elnexus.com
ben@elnexus.com
1-877-773-5366