How to Monitor Performance
© 2012 Adobe Systems Incorporated. All rights reserved. Created on 2014-09-15.

The following lists common performance issues, together with proposals on how to spot and counteract them.

Recognizing common performance problems

For each area, the symptoms are listed together with suggestions for increasing capacity or reducing volume.

Client
  • Symptom: High client CPU usage.
    To increase capacity: Install a client CPU with higher performance.
    To reduce volume: Simplify the (HTML) layout.
  • Symptom: Low server CPU usage.
    To increase capacity: Upgrade to a faster browser.
    To reduce volume: Improve the client-side cache.
  • Symptom: Some clients fast, some slow.

Server / Network
  • Symptom: CPU usage low on both servers and clients.
    To increase capacity: Remove any network bottlenecks.
    To reduce volume: Improve/optimize the configuration of the client cache.
  • Symptom: Browsing locally on the server is (comparatively) fast.
    To increase capacity: Increase network bandwidth.
    To reduce volume: Reduce the "weight" of your web pages (e.g. fewer images, optimized HTML).

Server / Web-server
  • Symptom: CPU usage on the webserver is high.
    To increase capacity: Cluster your webservers. Use a hardware load balancer.
    To reduce volume: Reduce the hits per page (visit).

Server / Application
  • Symptom: Server CPU usage is high.
    To increase capacity: Cluster your CQ5 instances.
    To reduce volume: Search for, and eliminate, CPU and memory hogs (use code review, timing output, etc).
  • Symptom: High memory consumption.
    To reduce volume: Improve caching on all levels.
  • Symptom: Slow response times.
    To reduce volume: Optimize templates and components (e.g. structure, logic).

Server / Repository
  (no entries listed)

Server / Cache
  (no entries listed)

Performance issues may stem from a number of causes that have nothing to do with your website, including temporary slowdowns in connection speed, CPU load, and many more. They may also affect either all your visitors or only a subset of them.

All this information needs to be obtained, sorted and analyzed before you can either optimize the general performance or solve specific issues.

• Before you experience a performance issue:
  • collect as much information as possible to build up a good working knowledge of the system under normal circumstances
• When you experience a performance issue:
  • try to replicate it with one (or preferably more) standard web browsers, on a different client that you know has good general performance and/or on the server itself (if possible)
  • check whether anything (related to the system) has changed within an appropriate time span, and whether any of these changes could have impacted performance
  • ask questions such as:
    • does the issue only occur at specific times?
    • does the issue only occur on specific pages?
    • are other requests impacted?
  • collect as much information as possible to compare with your knowledge of the system under normal circumstances

TOOLS FOR MONITORING AND ANALYZING PERFORMANCE

The following gives a short overview of some of the tools available for monitoring and analyzing performance. Some of these depend on your operating system.

Tool: request.log
  Used to analyze: Response times and concurrency.
  Usage / more information: Interpreting the request.log.

Tool: truss/strace
  Used to analyze: Page loads.
  Usage / more information: Unix/Linux commands to trace system calls and signals. Increase the log level to INFO. Analyze the number of page loads per request, which pages, etc.

Tool: Thread dumps
  Used to analyze: Observe JVM threads. Identify contentions, locks and long-runners.
  Usage / more information: Dependent on the operating system:
    - Unix/Linux: kill -QUIT <pid>
    - Windows (console mode): Ctrl-Break
    Analysis tools are also available, such as TDA.

Tool: Heap Dumps
  Used to analyze: Out of Memory issues that cause slow performance.
  Usage / more information: Add the -XX:+HeapDumpOnOutOfMemoryError option to the java call to CQ. See the Troubleshooting Guide for Java SE 6 with HotSpot VM.

Tool: System calls
  Used to analyze: Identify timing issues.
  Usage / more information: Calls to System.currentTimeMillis() or com.day.util.Timing are used to generate timestamps from your code, or via HTML comments.
    Note: These should be implemented so that they can be activated / deactivated as required; when a system is running smoothly the overhead of collecting statistics will not be needed.

Tool: Apache Bench
  Used to analyze: Identify memory leaks, selectively analyze response time.
  Usage / more information: Basic usage is:
        ab -k -n <requests> -c <concurrency> <url>
    See Apache Bench and the ab man page for full details.

Tool: Search Analysis
  Used to analyze: Execute search queries offline, identify response time of query, test and confirm result set.

Tool: JMeter
  Used to analyze: Load and functional tests.
  Usage / more information: http://jakarta.apache.org/jmeter/

Tool: JProfiler
  Used to analyze: In-depth CPU and memory profiling.
  Usage / more information: http://www.ej-technologies.com/

Tool: JConsole
  Used to analyze: Observe JVM metrics and threads.
  Usage / more information: Usage: jconsole
    See jconsole and Monitoring Performance using JConsole.
    Note: With JDK 1.6, JConsole is extensible with plug-ins; for example, Top or TDA (Thread Dump Analyzer).

Tool: Java VisualVM
  Used to analyze: Observe JVM metrics, threads, memory and profiling.
  Usage / more information: Usage: jvisualvm or visualvm
    See jvisualvm, visualvm and Monitoring Performance using (J)VisualVM.
    Note: With JDK 1.6, VisualVM is extensible with plug-ins.

Tool: truss/strace, lsof
  Used to analyze: In-depth kernel call and process analysis (Unix).
  Usage / more information: Unix/Linux commands.

Tool: Timing Statistics
  Used to analyze: See timing statistics for page rendering.
  Usage / more information: To see timing statistics for page rendering you can use Ctrl-Shift-U together with ?debugClientLibs=true set in the URL.

Tool: CPU and memory profiling tool
  Used to analyze: Used when analyzing slow requests during development.
  Usage / more information: For example, YourKit.

Tool: Information Collection
  Used to analyze: The ongoing state of your installation.
  Usage / more information: Knowing as much as possible about your installation can also help you track down what might have caused a change in performance, and whether these changes are justified. These metrics need to be collected at regular intervals so you can easily see significant changes.
INTERPRETING THE REQUEST.LOG

This file registers basic information about every request made to CQ. Valuable conclusions can be drawn from it.

The request.log offers a built-in way to see how long requests take. For development purposes it is useful to tail -f the request.log and watch for slow response times. To analyze a bigger request.log we recommend the use of rlog.jar, which allows you to sort and filter by response time.

We recommend isolating the "slow" pages from the request.log, then tuning them individually for better performance. This is usually done by including performance metrics per component or by using a performance profiling tool such as YourKit.

Monitoring traffic on your website

The request log registers each request made, together with the response made:

    09:43:41 [66] -> GET /author/y.html HTTP/1.1
    09:43:41 [66] <- 200 text/html 797ms

By totaling all the GET entries within specific periods (e.g. over various 24 hour periods) you can make statements about the average traffic on your website.

Monitoring response times with the CQ request.log

A good starting point for performance analysis is the request log:

    <cq-installation-dir>/crx-quickstart/logs/request.log

The log looks as follows (the lines are shortened for simplicity):

    31/Mar/2009:11:32:57 +0200 [379] -> GET /path/x HTTP/1.1
    31/Mar/2009:11:32:57 +0200 [379] <- 200 text/html 33ms
    31/Mar/2009:11:33:17 +0200 [380] -> GET /path/y HTTP/1.1
    31/Mar/2009:11:33:17 +0200 [380] <- 200 application/json 39ms

This log has one line per request or response:

• The date at which each request or response was made.
• The number of the request, in square brackets. This number matches for the request and the response.
• An arrow indicating whether this is a request (arrow pointing to the right) or a response (arrow pointing to the left).
• For requests, the line contains:
  • the method (typically GET, HEAD or POST)
  • the requested page
  • the protocol
• For responses, the line contains:
  • the status code (200 means “success”, 404 means “page not found”)
  • the MIME type
  • the response time

Using small scripts, you can extract the required information from the log file and assemble the statistics you want. From these, you can see which pages or types of pages are slow, and whether the overall performance is satisfactory.

Monitoring search response times with the CQ5 request.log

Search requests are also registered in the log file:

    31/Mar/2009:11:35:34 +0200 [338] -> GET /author/playground/en/tools/search.html?query=dilbert&size=5&dispenc=utf-8 HTTP/1.1
    31/Mar/2009:11:35:34 +0200 [338] <- 200 text/html 1562ms

So, as above, you can use scripts to extract the relevant information and build up statistics. However, once you have determined the response time, you may need to analyze why the request takes as long as it does, and what can be done to improve the response.

Further information about the underlying search functionality of CRX can be found at Searching in CRX.

Monitoring the number and impact of concurrent users

Again the request.log can be used to monitor concurrency and the system's reaction to it. Tests must be made to determine how many concurrent users the system can handle before a negative impact is seen. Again scripts can be used to extract results from the log file:

• monitor how many requests are made within a specific time span, e.g. one minute
• test the effects of a specific number of users all making the same requests at (as close as possible) the same time; e.g. 30 users clicking Save at the same time.
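The "small scripts" mentioned above could look something like the following. This is only a minimal sketch in Python, not a tool shipped with CQ: the function names are hypothetical, the line layout is assumed to match the full-timestamp excerpts shown in this document, and the sample lines are the ones from the concurrency excerpt in this section (wrapped paths joined). It counts requests per minute, finds the peak number of requests open at once, and pairs each response with its request to obtain per-path response times.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, as in this document's excerpts:
#   31/Mar/2009:11:45:29 +0200 [333] -> GET /path HTTP/1.1
#   31/Mar/2009:11:45:38 +0200 [333] <- 200 image/gif 31ms
LINE = re.compile(r"^(\S+ [+-]\d{4}) \[(\d+)\] (->|<-) (.*)$")


def parse(lines):
    """Yield (timestamp, request_id, direction, rest) for each matching line."""
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            # %b expects English month names (C/English locale assumed).
            stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            yield stamp, int(match.group(2)), match.group(3), match.group(4)


def requests_per_minute(lines):
    """Count incoming requests ('->' lines) per calendar minute."""
    counts = Counter()
    for stamp, _rid, direction, _rest in parse(lines):
        if direction == "->":
            counts[stamp.strftime("%d/%b/%Y:%H:%M")] += 1
    return counts


def peak_concurrency(lines):
    """Return the largest number of requests that were open at the same time."""
    open_requests = set()
    peak = 0
    for _stamp, rid, direction, _rest in parse(lines):
        if direction == "->":
            open_requests.add(rid)
            peak = max(peak, len(open_requests))
        else:
            open_requests.discard(rid)
    return peak


def response_times(lines):
    """Pair each response with its request; return (path, milliseconds) tuples."""
    paths = {}
    timings = []
    for _stamp, rid, direction, rest in parse(lines):
        fields = rest.split()
        if direction == "->" and len(fields) >= 2:
            paths[rid] = fields[1]  # fields are METHOD PATH PROTOCOL
        elif direction == "<-" and fields and rid in paths and fields[-1].endswith("ms"):
            timings.append((paths.pop(rid), int(fields[-1][:-2])))
    return timings


SAMPLE = [
    "31/Mar/2009:11:45:29 +0200 [333] -> GET /author/libs/Personalize/content/statics.close.gif HTTP/1.1",
    "31/Mar/2009:11:45:29 +0200 [334] -> GET /author/libs/Personalize/content/statics.detach.gif HTTP/1.1",
    "31/Mar/2009:11:45:30 +0200 [335] -> GET /author/libs/CFC/content/imgs/logo.rZMNURccynWcTpCxyuBNiTCoiBMmw000.default.gif HTTP/1.1",
    "31/Mar/2009:11:45:32 +0200 [335] <- 304 text/html 0ms",
    "31/Mar/2009:11:45:33 +0200 [334] <- 200 image/gif 31ms",
    "31/Mar/2009:11:45:38 +0200 [333] <- 200 image/gif 31ms",
    "31/Mar/2009:11:45:42 +0200 [336] -> GET /author/libs/CFC/content/imgs/logo.rZMNURccynWcTZRXunQbbQtvuuCMbRRBuWXz0000.default.gif HTTP/1.1",
    "31/Mar/2009:11:45:43 +0200 [337] -> GET /author/titlebar_bg.gif HTTP/1.1",
    "31/Mar/2009:11:45:43 +0200 [336] <- 304 text/html 0ms",
    "31/Mar/2009:11:45:44 +0200 [337] <- 304 text/html 0ms",
]

if __name__ == "__main__":
    print(dict(requests_per_minute(SAMPLE)))
    print(peak_concurrency(SAMPLE))
    # Slowest first, similar in spirit to what rlog.jar reports.
    for path, ms in sorted(response_times(SAMPLE), key=lambda item: item[1], reverse=True):
        print(ms, path)
```

On the sample lines, all five requests fall into the same minute and at most three requests are open at once. For a real log, the same functions can be fed an open file object, for example `peak_concurrency(open("request.log"))`. Request numbers are assumed to be unique within the file being analyzed, which may not hold across restarts or concatenated logs.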
In the following excerpt, requests 333, 334 and 335 are all received before any of them is answered:

    31/Mar/2009:11:45:29 +0200 [333] -> GET /author/libs/Personalize/content/statics.close.gif HTTP/1.1
    31/Mar/2009:11:45:29 +0200 [334] -> GET /author/libs/Personalize/content/statics.detach.gif HTTP/1.1
    31/Mar/2009:11:45:30 +0200 [335] -> GET /author/libs/CFC/content/imgs/logo.rZMNURccynWcTpCxyuBNiTCoiBMmw000.default.gif HTTP/1.1
    31/Mar/2009:11:45:32 +0200 [335] <- 304 text/html 0ms
    31/Mar/2009:11:45:33 +0200 [334] <- 200 image/gif 31ms
    31/Mar/2009:11:45:38 +0200 [333] <- 200 image/gif 31ms
    31/Mar/2009:11:45:42 +0200 [336] -> GET /author/libs/CFC/content/imgs/logo.rZMNURccynWcTZRXunQbbQtvuuCMbRRBuWXz0000.default.gif HTTP/1.1
    31/Mar/2009:11:45:43 +0200 [337] -> GET /author/titlebar_bg.gif HTTP/1.1
    31/Mar/2009:11:45:43 +0200 [336] <- 304 text/html 0ms
    31/Mar/2009:11:45:44 +0200 [337] <- 304 text/html 0ms

USING RLOG.JAR TO FIND REQUESTS WITH LONG DURATION TIMES

CQ includes various helper tools located in:

    <cq-installation-dir>/crx-quickstart/opt/helpers

One of these, rlog.jar, can be used to quickly sort request.log so that requests are displayed by duration, from longest to shortest time.

The following command shows the possible arguments:

    $ java -jar rlog.jar
    Request Log Analyzer Version 21584 Copyright 2005 Day Management AG
    Usage:
      java -jar rlog.jar [options] <filename>
    Options:
      -h                Prints this usage.
      -n <maxResults>   Limits output to <maxResults> lines.
      -m <maxRequests>  Limits input to <maxRequest> requests.
      -xdev             Exclude POST request to CRXDE.

For example, you can run it with the request.log file as a parameter and show the 10 requests with the longest duration:

    $ java -jar ../opt/helpers/rlog.jar -n 10 request.log
    *Info * Parsed 464 requests.
    *Info * Time for parsing: 22ms
    *Info * Time for sorting: 2ms
    *Info * Total Memory: 1mb
    *Info * Free Memory: 1mb
    *Info * Used Memory: 0mb
    ------------------------------------------------------
    18051ms 31/Mar/2009:11:15:34 +0200 200 GET /content/geometrixx/en/company.html text/html
    2198ms 31/Mar/2009:11:15:20 +0200 200 GET /libs/cq/widgets.js application/x-javascript
    1981ms 31/Mar/2009:11:15:11 +0200 200 GET /libs/wcm/content/welcome.html text/html
    1973ms 31/Mar/2009:11:15:52 +0200 200 GET /content/campaigns/geometrixx.teasers..html text/html
    1883ms 31/Mar/2009:11:15:20 +0200 200 GET /libs/security/cq-security.js application/x-javascript
    1876ms 31/Mar/2009:11:15:20 +0200 200 GET /libs/tagging/widgets.js application/x-javascript
    1869ms 31/Mar/2009:11:15:20 +0200 200 GET /libs/tagging/widgets/themes/default.js application/x-javascript
    1729ms 30/Mar/2009:16:45:56 +0200 200 GET /libs/wcm/content/welcome.html text/html; charset=utf-8
    1510ms 31/Mar/2009:11:15:34 +0200 200 GET /bin/wcm/contentfinder/asset/view.json/content/dam?_dc=1238490934657&query=&mimeType=image&_charset_=utf-8 application/json
    1462ms 30/Mar/2009:17:23:08 +0200 200 GET /libs/wcm/content/welcome.html text/html; charset=utf-8

You may need to concatenate the individual request.log files if you need to do this operation on a large data sample.

REQUEST COUNTERS

Information about request traffic (number of requests during a specific time period) gives you an indication of the load on your instance. This information can be extracted from request.log, though using counters will automate data collection to let you see:
• significant differences in activity (i.e. differentiate between "many requests" and "low activity")
• when an instance is not being used
• any restarts (counters are reset to 0)

To automate information collection you can also install a RequestFilter to increment a counter on every request. Multiple counters can be used for different time periods.

The information gathered can be used to indicate:

• significant changes in activity
• a redundant instance
• any restarts (counter reset to 0)

HTML COMMENTS

It is recommended that every project includes HTML comments for server performance. Many good public examples can be found; select a page, open the page source for viewing and scroll to the bottom, where code such as the following can be seen:

    </body>
    </html>
    <!-- Page took 58 milliseconds to be rendered by server -->

APACHE BENCH

To minimize the impact of special cases (such as garbage collection, etc), it is recommended to use a tool such as apachebench (see, for example, ab for further documentation) in the following way:

    $ ab -c 5 -k -n 1000 "http://localhost:4503/content/geometrixx/en/company.html"
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking localhost (be patient)
    Completed 100 requests
    Completed 200 requests
    Completed 300 requests
    Completed 400 requests
    Completed 500 requests
    Completed 600 requests
    Completed 700 requests
    Completed 800 requests
    Completed 900 requests
    Completed 1000 requests
    Finished 1000 requests

    Server Software:        Day-Servlet-Engine/4.1.8
    Server Hostname:        localhost
    Server Port:            4503

    Document Path:          /content/geometrixx/en/company.html
    Document Length:        14246 bytes

    Concurrency Level:      5
    Time taken for tests:   54.595 seconds
    Complete requests:      1000
    Failed requests:        943
       (Connect: 0, Receive: 0, Length: 943, Exceptions: 0)
    Write errors:           0
    Keep-Alive requests:    0
    Total transferred:      14391487 bytes
    HTML transferred:       14242487 bytes
    Requests per second:    18.32 [#/sec] (mean)
    Time per request:       272.974 [ms] (mean)
    Time per request:       54.595 [ms] (mean, across all concurrent requests)
    Transfer rate:          257.43 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    1   2.6      0      40
    Processing:   121  271  72.9    258     653
    Waiting:      114  256  69.3    244     628
    Total:        121  272  72.9    260     654

    Percentage of the requests served within a certain time (ms)
      50%    260
      66%    290
      75%    310
      80%    324
      90%    368
      95%    411
      98%    453
      99%    491
     100%    654 (longest request)

The numbers above are taken from a standard, single-CPU, dual-core Intel laptop accessing the geometrixx company page, as included in a default CQ installation. The page is very simple, but not optimized for performance.

All 943 failed requests are of type Length: ab counts a response as failed when its size differs from that of the first response, which is common for dynamically generated pages and does not in itself indicate an error.

apachebench also displays the time per request as the mean across all concurrent requests; see Time per request: 54.595 [ms] (mean, across all concurrent requests). This is the mean time per request (272.974 ms) divided by the concurrency level (5). You can change the value of the concurrency parameter -c (number of multiple requests to perform at a time) to see any effects.

MONITORING PERFORMANCE USING JCONSOLE

The tool command jconsole is available with the JDK.

1. Start your CQ5 instance.
2. Run jconsole.
3. Select your CQ instance and Connect.
4. From within the Local application, double-click com.day.crx.quickstart.Main; the Overview will be shown as default.
   After this you can select other options.

MONITORING PERFORMANCE USING (J)VISUALVM

Since JDK 1.6, the tool command jvisualvm is available. After you have installed JDK 1.6 you can:

1. Start your CQ5 instance.
   NOTE: If using Java 5 you can add the -Dcom.sun.management.jmxremote argument to the java command line that starts your JVM. JMX is enabled by default with Java 6.
2. Run either:
   • jvisualvm: in the JDK 1.6 bin folder (tested version)
   • visualvm: can be downloaded from VisualVM (bleeding edge version)
3. From within the Local application, double-click com.day.crx.quickstart.Main; the Overview will be shown as default.
   After this you can select other options, including Monitor.

You can use this tool to generate thread dumps and memory heap dumps. This information is often requested by the technical support team.

INFORMATION COLLECTION

Knowing as much as possible about your installation can help you track down what might have caused a change in performance, and whether these changes are justified. These metrics need to be collected at regular intervals so you can easily see significant changes.

The following information can be useful:

• How many authors are working with the system?
• What is the average number of page activations per day?
• How many pages do you currently maintain on this system?
• If you use MSM, what is the average number of rollouts per month?
• What is the average number of Live Copies per month?
• If you use CQ DAM, how many assets do you currently maintain in CQ DAM?
• What is the average size of the assets?
• How many templates are currently used?
• How many components are currently used?
• How many requests per hour do you have on the author system at peak time?
• How many requests per hour do you have on the publish system at peak time?

How many authors are working with the system?

To see the number of authors that have used the system since installation use the command line:

    cd <cq-installation-dir>/crx-quickstart/logs
    cut -d " " -f 3 access.log | sort -u | wc -l

To see the number of authors working on a given date:

    grep "<date>" access.log | cut -d " " -f 3 | sort -u | wc -l

What is the average number of page activations per day?
To see the total number of page activations since server installation use a repository query; via CRXDE - Tools - Query:

• Type: XPath
• Path: /
• Query: //element(*, cq:AuditEvent)[@cq:type='Activate']

Then calculate the number of days that have elapsed since installation to calculate the average.

How many pages do you currently maintain on this system?

To see the number of pages currently on the server use a repository query; via CRXDE - Tools - Query:

• Type: XPath
• Path: /
• Query: //element(*, cq:Page)

If you use MSM, what is the average number of rollouts per month?

To determine the total number of rollouts since installation use a repository query; via CRXDE - Tools - Query:

• Type: XPath
• Path: /
• Query: //element(*, cq:AuditEvent)[@cq:type='PageRolledOut']

Calculate the number of months that have elapsed since installation to calculate the average.

What is the average number of Live Copies per month?

To determine the total number of Live Copies made since installation use a repository query; via CRXDE - Tools - Query:

• Type: XPath
• Path: /
• Query: //element(*, cq:LiveSyncConfig)

Again use the number of months that have elapsed since installation to calculate the average.

If you use CQ DAM, how many assets do you currently maintain in CQ DAM?

To see how many DAM assets you currently maintain, use a repository query; via CRXDE - Tools - Query:

• Type: XPath
• Path: /
• Query: /jcr:root/content/dam//element(*, dam:Asset)

What is the average size of the assets?

To determine the total size of the /var/dam folder:

1. Use WebDAV to map the CQ repository to the local file system.
2. Use the command line:

    cd /Volumes/localhost/var
    du -sh dam/

To get the average size, divide the total size by the total number of assets in /var/dam (obtained above).

How many templates are currently used?
To see the number of templates currently on the server use a repository query; via CRXDE - Tools - Query:

• Type: XPath
• Path: /
• Query: //element(*, cq:Template)

How many components are currently used?

To see the number of components currently on the server use a repository query; via CRXDE - Tools - Query:

• Type: XPath
• Path: /
• Query: //element(*, cq:Component)

How many requests per hour do you have on the author system at peak time?

To determine the requests per hour you have on the author system at peak time:

1. To determine the total number of requests since installation use the command line:

    cd <cq-installation-dir>/crx-quickstart/logs
    grep -R "\->" request.log | wc -l

2. To determine the start and end dates:

    vim request.log

   Use G / 1G to jump to the last / first lines.

Use these values to calculate the number of hours that have elapsed since installation, then the average number of requests per hour.

How many requests per hour do you have on the publish system at peak time?

Repeat the above procedure on your publish instance.

Analyzing Specific Scenarios

The following is a list of suggestions on what to check if you start experiencing certain CQ performance problems. The list is not (unfortunately) fully comprehensive.

CPU AT 100%

If the CPU of your system is constantly running at 100% then see:

• The Knowledge Base:
  • Analyze Slow and Blocked Processes

OUT OF MEMORY

Although such errors should be detected during Development and Testing, certain scenarios can slip through.
If your system is running out of memory this can be seen in various ways, including performance degradation and error messages that include the subtext:

    java.lang.OutOfMemoryError

In these cases check:

• the JVM settings used to start CQ
• The Knowledge Base:
  • Analyze Memory Problems

DISK I/O

If your system is running out of disk space, or you notice disk thrashing, see:

• Optimizing Tar Files and Optimizing Tar Files in a Cluster
• Whether you have disabled collection of debug information; this can be configured in various locations, including:
  • Apache Sling JSP Script Handler
  • Apache Sling Java Script Handler
  • Apache Sling Logging Configuration
  • CQ HTML Library Manager
  • CQ WCM Debug Filter
  • Loggers
• Whether and how you have configured Version Purging
• The Knowledge Base:
  • Too Many Open Files
  • Journal consumes too much disk space

REGULAR PERFORMANCE DEGRADATION

If you see the performance of your instance deteriorating after each reboot (sometimes a week or more later), then the following can be checked:

• Out of Memory
• The Knowledge Base:
  • Unclosed Sessions

JVM TUNING

The Java Virtual Machine (JVM) has significantly improved with respect to tuning (especially since Java 7). Because of this, specifying a reasonable fixed JVM size and using the defaults will often be suitable.

If the default settings are not suitable, it is important to establish a method to monitor and assess GC performance before attempting to tune the JVM; this can involve monitoring factors including heap size, algorithm and other aspects.
Some common choices are:

• VerboseGC:

    -verbose:gc \
    -Xloggc:$LOGS/verbosegc.log \
    -XX:+PrintGCDetails \
    -XX:+PrintGCDateStamps

  The resulting log can be ingested by a GC visualizer such as:
  http://www.ibm.com/developerworks/library/j-ibmtools2/

Or JConsole:

• These settings are for a "wide open" JMX connection:

    -Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=8889 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false

• Then connect to the JVM with JConsole; see:
  http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html

This will help you see how much memory is being used, what GC algorithms are being used, how long they take to run, and what effect this has on your application performance. Without this, tuning is just "randomly twiddling knobs".

NOTE: For Oracle's VM there is also information at:
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/server-class.html
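Before turning to a visualizer, the verbose GC log enabled with the VerboseGC options above can be given a first rough summary with a short script. The following Python sketch is an illustration only and rests on assumptions: it expects each collection line to end its pause report with ", <seconds> secs]", which is how HotSpot JVMs of the Java 6/7 era commonly print it, but the layout varies with JVM version and collector. The function names are hypothetical and the sample lines are made up for the example, not taken from a real CQ instance.

```python
import re

# Assumption: a pause is reported as ", <seconds> secs]". The trailing
# "[Times: ... real=0.02 secs]" block does not match, because "real=" sits
# between the comma and the number.
PAUSE = re.compile(r", (\d+\.\d+) secs\]")


def gc_pauses(lines):
    """Return (is_full_gc, pause_seconds) for every line that reports a pause."""
    pauses = []
    for line in lines:
        found = PAUSE.findall(line)
        if found:
            # Nested collector sections may print their own time; the last
            # value on the line is taken as the total pause (an assumption).
            pauses.append(("Full GC" in line, float(found[-1])))
    return pauses


def summarize(lines):
    """Aggregate the pauses into a few headline numbers."""
    pauses = gc_pauses(lines)
    return {
        "collections": len(pauses),
        "full_collections": sum(1 for full, _secs in pauses if full),
        "total_pause_secs": sum(secs for _full, secs in pauses),
        "max_pause_secs": max((secs for _full, secs in pauses), default=0.0),
    }


# Made-up sample lines in the assumed layout, for illustration only.
SAMPLE = [
    "2014-09-15T10:00:01.100+0200: 1.100: [GC [PSYoungGen: 65536K->10720K(76288K)] "
    "65536K->10728K(251392K), 0.0171560 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]",
    "2014-09-15T10:00:05.300+0200: 5.300: [GC [PSYoungGen: 76256K->10736K(76288K)] "
    "76264K->21500K(251392K), 0.0250000 secs] [Times: user=0.05 sys=0.01, real=0.03 secs]",
    "2014-09-15T10:01:00.000+0200: 60.000: [Full GC [PSYoungGen: 10736K->0K(76288K)] "
    "[ParOldGen: 10764K->18000K(175104K)] 21500K->18000K(251392K) "
    "[PSPermGen: 30000K->29990K(60000K)], 0.2500000 secs] "
    "[Times: user=0.40 sys=0.01, real=0.25 secs]",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Comparing the total pause time with the period covered by the log gives a first impression of how much time the JVM spends in garbage collection; a dedicated visualizer remains the better choice for anything beyond that.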