How to Quantify Oracle Database Scalability: Examples
Dr. Neil J. Gunther, Perfdynamics
Peter Stalder
[email protected]
Hotsos Symposium, March 7 – 11, 2010
Basel · Baden · Bern · Lausanne · Zurich · Düsseldorf · Frankfurt/M. · Freiburg i. Br. · Hamburg · Munich · Stuttgart · Vienna

About me
Senior Consultant at Trivadis AG in Zurich, Switzerland
→ ZH-IMS (Infrastructure Managed Services)
→ [email protected]
Focus
→ Application Performance Management (APM)
→ Predictive Performance Management (PPM)
→ Capacity Management
Recent presentations
→ DOAG, Dec 2009 – Server Consolidation using analytical modeling
→ DOAG ITIL Days, Sept. 2009 – Ressourcen und Kapazitätsanalysen im Oracle-Umfeld (Resource and capacity analyses in the Oracle environment)
→ ukCMG, May 2009 – Oracle Metrics for Sizing

Universal Law of Computational Scaling © 2010

How to Quantify Oracle Database Scalability
Why quantifying?
How quantifying?
Response Time Scalability
Conclusion
Data are always part of the game.

Swingbench example from a blog
Results are taken from
→ http://oracledoug.com/.../1470-Time-Matters-Throughput-vs.-Response-Time.html

Concurrent Sessions   Avg. Response Time (ms)   Transactions Completed
 1                     79                        4,203
 2                    108                        6,772
 4                    133                       10,481
 8                    198                       13,346
12                    244                       13,639
16                    310                       14,798
20                    337                       14,749
24                    369                       14,176
28                    428                       15,181
32                    563                       13,278
36                    533                       14,151
40                    587                       13,302

Blog comments, opinions and discussions (1)
“The results for the throughput of the system are not consistent”
“Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory”
“Once the individual response time issue is improved, the overall throughput also improves.”
“The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
“Do I care more about throughput or response times?”

Efficiency - consistent and valid
The efficiency does not exceed 100%
→ This gives us a first confidence in the measurements

Users (N)   Measured Trx / Sec X(N)   RelCap C = X(N)/X(1)   Efficiency C/N
 1           4'203                    1.00                   1.00
 2           6'772                    1.61                   0.81
 4          10'481                    2.49                   0.62
 8          13'346                    3.18                   0.40
12          13'639                    3.25                   0.27
16          14'798                    3.52                   0.22
20          14'749                    3.51                   0.18
24          14'176                    3.37                   0.14
28          15'181                    3.61                   0.13
32          13'278                    3.16                   0.10
36          14'151                    3.37                   0.09
40          13'302                    3.16                   0.08

Error Bars - consistent and valid
The error bars show the error spread in the data
→ The spread gives us confidence in the measurements

Deviation - consistent and valid
A deviation of < 10% is fair
→ This gives us further confidence in the measurements

Users (N)   Measured Trx / Sec X(N)   Capacity Modeled   Error %
 1           4203                      4203               0.00
 2           6772                      6961               2.80
 4          10481                     10270              -2.01
 8          13346                     13168              -1.33
12          13639                     14221               4.27
16          14798                     14566              -1.56
20          14749                     14585              -1.11
24          14176                     14437               1.84
28          15181                     14200              -6.46
32          13278                     13916               4.81
36          14151                     13608              -3.83
40          13302                     13290              -0.08

Ideal Load Test Case
Both throughput data X and latency data R are nonlinear functions of load N.
Load N is the independent variable, representing the number of sessions.
[Figure: throughput (X) data and latency (R) data plotted against load N, annotated "Saturation", "Queueing really kicks in" and "Oracle data should look like this"]
$R(N) = \frac{N}{X(N)} - Z$
Near the origin, X and R appear to be independent. But this is an illusion due to no queueing (waiting).
Graph provided by Neil Gunther. Thanks!

Real data from the blog
Throughput and response time data from load-limited systems must have this characteristic
→ Throughput is limited and represented by a concave curve
→ Response Time is unlimited and represented by a convex curve

Relationship between X and R
Therefore, reducing R means increasing X at constant N
→ Tuning an individual session results in a higher throughput
→ A higher throughput can be achieved by tuning individual sessions or parts of the application, e.g. the interest engine

Really fairly typical?
Contention α: 20%
→ Very high
→ Starving on CPU in this case?
→ Not responsible for a retrograde throughput

Trendline Parameters
Quadratic Coefficients   Values
a                        2.40E-03
b                        0.2051
c                        0.0000

Super Serial Parameter   Values
α                        0.2027
β                        0.0118
Nmax                     20
Nopt                     5

Yes, fairly typical, but ..
Coherency β: 0.0118
→ Fair in this case
→ Responsible for the retrograde throughput

This is the real core of the Blog
α = 0.2027, β = 0.0118, Nmax = 20, Nopt = 5

Recap
The USL tells us
→ If the benchmark (b/m) is consistent and valid
→ If the workload is contention-limited and / or coherency-limited
→ The theoretical maximum throughput
→ The optimal number of sessions
→ The maximal number of sessions
The USL quantification reduces qualitative discussion
We are forced to explain the shape of the curves
We are forced to explain the size of α and β
Equation: USL = BAAG (Battle against any guess)
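
The "Efficiency", "Deviation" and "Trendline Parameters" numbers above can be cross-checked with a few lines of Python. This is a minimal sketch, not the speaker's spreadsheet. It assumes the super-serial form C(N) = N / (1 + α(N − 1) + αβ N(N − 1)) and the mapping α = b − a, β = a / (b − a) from the quadratic coefficients; both are assumptions that happen to be consistent with the values shown on the slides. The function names are illustrative.

```python
# Swingbench throughput from the blog post: concurrent sessions -> transactions.
MEASURED = {1: 4203, 2: 6772, 4: 10481, 8: 13346, 12: 13639, 16: 14798,
            20: 14749, 24: 14176, 28: 15181, 32: 13278, 36: 14151, 40: 13302}

# Quadratic trendline coefficients a and b as shown on the slides.
A, B = 2.40e-3, 0.2051


def usl_parameters(a, b):
    """ASSUMED mapping from trendline coefficients to (alpha, beta)."""
    alpha = b - a        # contention
    beta = a / alpha     # coherency
    return alpha, beta


def relative_capacity(n, alpha, beta):
    """C(N) in the ASSUMED super-serial form of the USL."""
    return n / (1 + alpha * (n - 1) + alpha * beta * n * (n - 1))


def deviation_table(measured, alpha, beta):
    """Return rows of (N, X(N), efficiency C/N, modeled X(N), error %)."""
    x1 = measured[1]
    rows = []
    for n in sorted(measured):
        x = measured[n]
        efficiency = (x / x1) / n                    # should never exceed 1
        modeled = x1 * relative_capacity(n, alpha, beta)
        error_pct = (modeled - x) / x * 100          # sign as on the slide
        rows.append((n, x, efficiency, modeled, error_pct))
    return rows


if __name__ == "__main__":
    alpha, beta = usl_parameters(A, B)
    print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
    for n, x, eff, modeled, err in deviation_table(MEASURED, alpha, beta):
        print(f"{n:3d} {x:6d} {eff:5.2f} {modeled:8.0f} {err:6.2f}")
```

Because the coefficients are taken from the slides in rounded form, the modeled values may differ from the "Capacity Modeled" column in the last digit.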
Case study – another Swingbench experiment
Transactions
→ Customer Registration   Load Ratio 10
→ Browse Products         Load Ratio 50
→ Order Products          Load Ratio 10
→ Process Orders          Load Ratio 10
→ Browse Orders           Load Ratio 50
Think Time: 3 ms

UserLoad   TPS      RTT (s)   Think (s)
 1         132.61   0.0070    0.0030
 2         239.31   0.0079    0.0030
 3         358.18   0.0080    0.0030
 6         602.09   0.0099    0.0030
 9         732.31   0.0120    0.0030
12         779.26   0.0154    0.0030
15         807.16   0.0180    0.0030
24         803.63   0.0290    0.0030
36         815.00   0.0440    0.0030

Interactive Response Time Law
$R(N) = \frac{N}{X(N)} - Z$
[Diagram: users 1 … N with think time Z and a DB with response time R]
Z is included in Swingbench's R (it's the round trip time or RTT)

Response Time Measurements – User Load 1

Users (N)   Measured RTT   Measured RTT - Z
 1          0.0070         0.0040
 2          0.0079         0.0049
 3          0.0080         0.0050
 6          0.0099         0.0069
 9          0.0120         0.0090
12          0.0154         0.0124
15          0.0180         0.0150
24          0.0290         0.0260
36          0.0440         0.0410

It's not the R in the DB, it's the RTT!
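
The two ways of getting at the database response time can be sketched in Python: subtracting the think time Z from the measured RTT, and applying the Interactive Response Time Law to the measured throughput. This is only an illustration using the case-study figures; it assumes that RTT and think time are given in seconds (0.0030 corresponds to the stated 3 ms), and the function names are made up for the example.

```python
Z = 0.003  # think time, ASSUMED to be in seconds (3 ms)

# (user load N, throughput X in TPS, measured round trip time RTT)
RUNS = [
    (1, 132.61, 0.0070), (2, 239.31, 0.0079), (3, 358.18, 0.0080),
    (6, 602.09, 0.0099), (9, 732.31, 0.0120), (12, 779.26, 0.0154),
    (15, 807.16, 0.0180), (24, 803.63, 0.0290), (36, 815.00, 0.0440),
]


def measured_response_time(rtt, z=Z):
    """R in the DB from the measured round trip time: R = RTT - Z."""
    return rtt - z


def response_time_from_throughput(n, x, z=Z):
    """Interactive Response Time Law: R(N) = N / X(N) - Z."""
    return n / x - z


if __name__ == "__main__":
    print("  N  RTT - Z  N/X - Z")
    for n, x, rtt in RUNS:
        r_measured = measured_response_time(rtt)
        r_calculated = response_time_from_throughput(n, x)
        print(f"{n:3d}  {r_measured:.4f}   {r_calculated:.4f}")
```

If the two columns track each other, the throughput and RTT measurements are mutually consistent.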
Response Time Measurements – User Load 9

Users (N)   Measured RTT   Measured RTT - Z
 9          0.0120         0.0090

We doubled the R in the DB (0.0090 versus 0.0040 at user load 1)

Response Time Measurements – User Load 36

Users (N)   Measured RTT   Measured RTT - Z
36          0.0440         0.0410

Apply Interactive Response Time Law
$R(N) = \frac{N}{X(N)} - Z$

Users (N)   Predicted C(N)   Predicted Capacity   Measured Capacity   Error %   Measured R   Predicted R (USL)   Calculated R from t/p   Error %
 1          1.00             132.61               133                   0.00    0.0040       0.0045              =L50/O50-0.003
 2          1.84             243.47               239                   1.74    0.0049       0.0052              0.0054                   -5.30
 3          2.54             336.64               358                  -6.01    0.0050       0.0059              0.0054                  -14.78
 6          4.06             538.39               602                 -10.58    0.0069       0.0081              0.0070                  -15.88
 9          4.99             662.30               732                  -9.56    0.0090       0.0106              0.0093                  -14.72
12          5.57             738.64               779                  -5.21    0.0124       0.0132              0.0124                   -6.52
15          5.92             784.72               807                  -2.78    0.0150       0.0161              0.0156                   -6.92
24          6.24             827.72               804                   3.00    0.0260       0.0260              0.0269                    0.02
36          6.01             797.55               815                  -2.14    0.0410       0.0421              0.0412                   -2.70

Measured Response Time

Calculated Response Time from Throughput

Predicted Response Time by USL

Big picture: Throughput and Response Time Scalability

Trendline Parameters
Quadratic Coefficients   Values
a                        0.0016
b                        0.0878
c                        0.0000

Super Serial Parameter   Values
α                        0.0862
β                        0.0181
Nmax                     25
Nopt                     12
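
The "Predicted Capacity" and "Predicted R (USL)" columns can be approximated by combining the scalability model with the Interactive Response Time Law. The sketch below is an illustration under stated assumptions, not the original spreadsheet: it assumes the super-serial form C(N) = N / (1 + α(N − 1) + αβ N(N − 1)), uses the rounded α and β from the "Big picture" slide, and takes Z as 0.003 s. Function names are hypothetical.

```python
ALPHA, BETA = 0.0862, 0.0181   # rounded parameter values from the slide
X1 = 132.61                    # measured throughput at user load 1 (TPS)
Z = 0.003                      # think time, ASSUMED to be in seconds


def usl_throughput(n, x1=X1, alpha=ALPHA, beta=BETA):
    """Predicted X(N) = X(1) * C(N), with C(N) in the ASSUMED form."""
    return x1 * n / (1 + alpha * (n - 1) + alpha * beta * n * (n - 1))


def usl_response_time(n, z=Z):
    """Predicted R(N) = N / X(N) - Z, using the modeled throughput."""
    return n / usl_throughput(n) - z


if __name__ == "__main__":
    print("  N  X predicted  R predicted")
    for n in (1, 2, 3, 6, 9, 12, 15, 24, 36):
        print(f"{n:3d}  {usl_throughput(n):10.2f}  {usl_response_time(n):.4f}")
```

Since the parameters are rounded to four decimals, the output agrees with the table only to within rounding, most visibly at the higher user loads.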
Recap Response Time Scalability
The Response Time can be calculated by using the Interactive Response Time Law
→ Derived from Little's Law
R is given by
$R(N) = \frac{N}{X(N)} - Z$
If the Response Time is measured, it can be used to validate the model
If the Response Time is not measured, we can still rely on the math

Conclusion (1)
Neil Gunther's model adds a new parameter to the more familiar Amdahl's law
The additional parameter β, representing coherence-related delays, enables Gunther's formula to model behavior where the performance of a parallel program can actually degrade at higher and higher levels of parallelization
→ If β = 0, the USL reduces to Amdahl

Conclusion (2)
Behind the data, the hidden and useful information becomes visible only through the USL model
→ Theoretical maximum throughput
→ Number of economically sensible Users or CPUs or Nodes (in case of RAC)
→ Lets us know whether the controlled measurements are consistent
The law indicates whether scalability would be limited by contention and / or coherency effects

Conclusion (3)
No complex queueing theory is needed
Response Time can easily be predicted from throughput
The USL quantification reduces qualitative discussion
We are forced to explain the shape of the curves
We are forced to explain the size of α and β
USL = BAAG

Resources
Literature
→ Book: Guerrilla Capacity Planning (2007), Neil J. Gunther
→ Book: Analyzing Computer System Performance with Perl::PDQ (2005), Neil J. Gunther
Online Google doc
→ http://spreadsheets.google.com/ccc?key=0AslFTeSsTP15dGVjLWxBVUI2WHFmWTU1UVhmSjVXcFE&hl=en
Downloads
→ EXCEL sscalc.xls spreadsheet
→ www.perfdynamics.com/Classes/Materials/sscalc-class.xls

Thank you!
Peter Stalder
peter.stalder@trivadis.com