Best Practices

KINGSTON.COM
Server: Performance Benchmark
Memory channels, frequency and performance
Although most people do not realize it, the world runs on many different types of
databases, all of which have one thing in common: the need for high-performance
memory to deliver data quickly and reliably.
From the moment we wake up to a phone call handled through our cellular service
provider's customer record database, to a weekly online shopping payment processed
by a financial institution's transaction database, to a late-night movie stream with
recommendations drawn from a database of our viewing habits, databases serve many
of our daily queries. They need to perform consistently fast and scale dynamically
to meet customer demand. [1]
Serving data with consistent performance and transaction integrity is no easy task and
often requires in-memory databases to serve viewing recommendations and relational
data near instantaneously to multiple users.
In-memory databases (IMDB) rely primarily on high-capacity and, most importantly,
high-performance DRAM (Dynamic Random Access Memory). They can service a high
volume of requests up to x times faster than traditional disk-bound databases.
They serve as the backbone of any scenario that requires fast response times when
querying data, and they can also complement big data applications.
DDR3 SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory)
DIMMs (Dual In-line Memory Modules) are available in different capacities and
speeds. The speed of a memory module is often referred to as its memory
frequency and is expressed in megahertz (MHz).
Memory frequency has a direct relationship with memory performance: as the
memory frequency increases, so does the memory performance.
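This relationship can be made concrete with a back-of-the-envelope calculation. The sketch below is not part of the benchmark. It assumes a 64-bit (8-byte) data bus per DDR3 channel and treats the module speed quoted in MHz as millions of transfers per second, which is how DDR3 speed grades are normally specified. The function and constant names are illustrative.

```python
# Assumption: each DDR3 channel has a 64-bit (8-byte) data bus.
BUS_WIDTH_BYTES = 8


def peak_channel_bandwidth_gbs(speed_mts: float) -> float:
    """Theoretical peak bandwidth of one memory channel in GB/s.

    speed_mts is the module speed in millions of transfers per second
    (the figure this paper quotes as MHz).
    """
    return speed_mts * 1e6 * BUS_WIDTH_BYTES / 1e9


# Speeds named in this paper.
for speed in (800, 1066, 1600):
    peak = peak_channel_bandwidth_gbs(speed)
    print(f"DDR3-{speed}: {peak:.1f} GB/s theoretical peak per channel")
```

Under these assumptions a single channel at 1600 MHz peaks at 12.8 GB/s, twice the 6.4 GB/s available at 800 MHz. Sustained figures measured by a benchmark will always sit below these theoretical limits.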
DRAM is, however, only one piece of the pie in achieving optimal memory subsystem
performance. A memory controller is needed to manage the memory subsystem, and
the population rules governing that controller affect the frequency and latency
at which a memory module can be addressed.
Newer-generation memory controllers are embedded in the processor for best
performance, but they require attention because some memory controllers can only
run the memory subsystem at a maximum memory frequency of 800 MHz.
Using the 24 DIMM sockets available on the Intel® Romley platform, connected to
the Intel® Xeon® E5 family memory subsystem, we can gauge the sustained memory
bandwidth of different memory configurations. We use the STREAM memory benchmark
integrated into SiSoft Sandra 2012, varying the memory channel population and the
memory clock speed. [2]
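The idea behind a sustained-bandwidth measurement can be sketched in a few lines. This is not the SiSoft Sandra or STREAM implementation: it is a single-threaded approximation of a STREAM-style Copy kernel using only the Python standard library, and the function name and default sizes are assumptions chosen for illustration. A single Python thread will not saturate a multi-channel memory subsystem, so its output should not be compared with the figures in this paper.

```python
import time


def copy_bandwidth_gbs(size_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Estimate copy bandwidth in GB/s from the fastest of several runs."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy: reads size_bytes and writes size_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    # Like STREAM's Copy kernel, count both the bytes read and the bytes written.
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"Approximate copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

Taking the fastest run rather than the average reduces the influence of one-off effects such as page faults on the first pass. The buffers are deliberately much larger than typical processor caches so that the copy is served from main memory.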
The Intel® Xeon® E5 family features numerous performance improvements over the
previous generation of Xeon® 5500 and Xeon® 5600 server processors. Two of these
performance-related upgrades are discussed in this paper: quad channel memory
addressing and support for 1600 MHz DDR3 memory speeds. The family also offers a
faster 8 GT/s (gigatransfers per second) QuickPath Interconnect (QPI), which
increases the available connectivity bandwidth and reduces latency to the memory
array. [3]
Channel population performance
Figure 1 Channel population performance measured using SiSoft Sandra 2012
Test configuration included SiSoftware Sandra 2012 Memory benchmark on Intel® Romley platform S2600GZ with two
Xeon E5-2665 2.40GHz processors and 64GB of memory (2 x KVR16R11D4K4/32 @1600 MHz) installed. CPU Hyper-threading
and power saving features disabled.
As seen in Figure 1, the performance of the memory subsystem increases near-linearly
with the number of populated memory channels. The slowest configuration populates a
single memory channel on each Xeon processor's memory controller with a single
8 gigabyte (GB) DDR3 1600 MHz memory module. The fastest is a quad channel
configuration at 1 DIMM per channel (DPC), with four 8 GB 1600 MHz memory modules
per processor, one in the first available memory bank of each channel.
Even with the increased electrical load of a quad channel configuration (1 DPC), we
observe a near fourfold increase in memory subsystem performance, to ~70 GB/s, compared
to a single channel configuration. This makes it an ideal solution for resource-intensive
applications such as IMDB.
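To put the ~70 GB/s figure in context, the sketch below estimates the theoretical peak of the tested two-processor system as channels are populated. It assumes an 8-byte data bus per channel and 1600 million transfers per second, so the peaks are calculated values rather than measurements. Only the ~70 GB/s quad channel result comes from Figure 1. All names are illustrative.

```python
def system_peak_gbs(channels_per_cpu: int, cpus: int = 2,
                    speed_mts: float = 1600, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s across all populated channels."""
    return cpus * channels_per_cpu * speed_mts * 1e6 * bus_bytes / 1e9


MEASURED_QUAD_GBS = 70.0  # approximate sustained result reported for Figure 1

for channels in range(1, 5):
    peak = system_peak_gbs(channels)
    print(f"{channels} channel(s) per processor: {peak:.1f} GB/s theoretical peak")

efficiency = MEASURED_QUAD_GBS / system_peak_gbs(4)
print(f"Quad channel sustained/peak ratio: {efficiency:.0%}")
```

Under these assumptions the theoretical peak grows from 25.6 GB/s with one channel per processor to 102.4 GB/s with four, so the measured ~70 GB/s corresponds to roughly 68% of the calculated peak.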
Memory frequency performance
Figure 2 Relative memory frequency performance measured using SiSoft Sandra 2012
Test configuration included SiSoftware Sandra 2012 Memory benchmark on Intel® Romley platform S2600GZ with two
Xeon E5-2665 2.40GHz processors and 64GB of memory (2 x KVR16R11D4K4/32) installed. CPU Hyper-threading and
power saving features disabled.
In Figure 2 we use the same eight 8 GB DDR3 memory modules, running at four
different memory speeds (MHz) and populated symmetrically across both Intel® Xeon® E5
family memory subsystems. This balanced configuration shows the best-case performance
at each memory speed.
Running the memory modules at 800 MHz gives the slowest performance, with ~40 GB/s
sustained transfer speeds measured using the STREAM memory benchmark integrated into
SiSoft Sandra 2012.
As we scale the frequency higher, memory performance increases near-linearly up to
the maximum of ~70 GB/s at 1600 MHz, ideal for workloads that depend on the highest
achievable memory performance to remain efficient.
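How close to linear the scaling is can be checked from the two endpoints quoted above. The sketch below uses only the approximate ~40 GB/s and ~70 GB/s figures from the text. The function name is illustrative, and the intermediate speeds are left out because their results are not stated in the text.

```python
def scaling_efficiency(freq_low: float, bw_low: float,
                       freq_high: float, bw_high: float) -> float:
    """Ratio of the observed bandwidth gain to the frequency gain.

    1.0 means bandwidth scaled perfectly linearly with frequency.
    """
    return (bw_high / bw_low) / (freq_high / freq_low)


# Approximate endpoints reported for Figure 2.
speedup = 70.0 / 40.0
efficiency = scaling_efficiency(800, 40.0, 1600, 70.0)
print(f"Bandwidth gain: {speedup:.2f}x for a 2x increase in frequency")
print(f"Scaling efficiency: {efficiency:.1%}")
```

Doubling the frequency yields about 1.75 times the sustained bandwidth, which is a scaling efficiency of 87.5% relative to a perfectly linear increase.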
Memory capacities versus frequency performance
Figure 3 Memory capacities versus frequency performance measured using SiSoft Sandra 2012
Test configuration included SiSoftware Sandra 2012 Memory benchmark on Intel® Romley platform S2600GZ with two
Xeon E5-2665 2.40GHz processors and 192GB of memory (KVR16R11D4K4/32) installed. CPU Hyper-threading and power
saving features disabled.
To conclude our research into memory performance, Figure 3 compares a memory subsystem
populated with 192 GB of memory running at 1066 MHz against configurations using
128 GB and 64 GB, both running at 1600 MHz.
At the same 1600 MHz memory speed, both 128 GB (16x 8GB) and 64 GB (8x 8GB), spread
symmetrically across both memory subsystems, show approximately the same ~70 GB/s
sustained performance.
The larger 192 GB capacity (24x 8GB), running at a slower 1066 MHz, shows a ~17 GB/s
drop in sustained performance, roughly a quarter of the 1600 MHz result, as the
trade-off for the increased memory capacity.
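The capacity trade-off can be tabulated with a short script. The sketch below assumes the tested layout of two processors with four memory channels each and 8 GB modules. The ~53 GB/s value for the 192 GB configuration is derived here as ~70 GB/s minus the ~17 GB/s drop, not read from Figure 3. All names are illustrative.

```python
CPUS = 2
CHANNELS_PER_CPU = 4
MODULE_GB = 8
BASELINE_GBS = 70.0  # approximate sustained result at 1600 MHz


def dimms_per_channel(modules: int) -> int:
    """DIMMs per channel (DPC) for a symmetric population."""
    channels = CPUS * CHANNELS_PER_CPU
    if modules % channels != 0:
        raise ValueError("modules must be spread evenly across all channels")
    return modules // channels


def drop_percent(bandwidth_gbs: float, baseline_gbs: float = BASELINE_GBS) -> float:
    """Bandwidth lost relative to the baseline, as a percentage."""
    return (baseline_gbs - bandwidth_gbs) / baseline_gbs * 100


# (module count, speed in MHz, approximate sustained GB/s)
CONFIGS = [
    (8, 1600, BASELINE_GBS),
    (16, 1600, BASELINE_GBS),
    (24, 1066, BASELINE_GBS - 17.0),  # derived from the ~17 GB/s drop
]

for modules, speed, bandwidth in CONFIGS:
    print(f"{modules * MODULE_GB:>3} GB at {speed} MHz, "
          f"{dimms_per_channel(modules)} DPC: ~{bandwidth:.0f} GB/s "
          f"({drop_percent(bandwidth):.0f}% below baseline)")
```

Filling all 24 sockets corresponds to 3 DPC and triples the capacity of the 64 GB configuration, at a cost of about 24% of the sustained bandwidth under these assumptions.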
Conclusion
Obeying the channel population rules specific to the server processor and memory
controller makes it easy to strike the right balance when optimizing memory for best
performance. Simple steps such as populating all four memory channels can increase
memory performance up to four times, increasing the ROI (return on investment) while
reducing the TCO (total cost of ownership) over the life cycle of the server.
References:
[1] Predicting User Preference for Movies using NetFlix database, Department of Electrical
and Computer Engineering Carnegie Mellon University
http://users.ece.cmu.edu/~dbatra/publications/assets/goel_batra_netflix.pdf
[2] SiSoft Sandra Q & A - Memory Benchmark, SiSoftware
http://www.sisoftware.co.uk/?d=qa&f=ben_mem&l=en&a=
[3] Intel® Xeon® Processor E5-2600 Product Family News Fact Sheet, Intel®
http://download.intel.com/newsroom/kits/xeon/e5/pdfs/Intel_Xeon_E5_Factsheet.pdf
©2013 Kingston Technology Corporation, 17600 Newhope Street, Fountain Valley, CA 92708 USA.
All rights reserved. All trademarks and registered trademarks are the property of their respective owners. MKF-549