Hidden Treasures in Mainframe Performance
Transcription
Hidden Treasures in Mainframe Performance
A real-life case study
20. April 2016
Günter Priller

About the Presentation
This presentation was developed by:
• Contigon informationstechnologie + consulting gmbh
• Vogelsberg Consulting GmbH
The presentation aims at:
• Showing an extended approach to performance analysis
• Describing how workload pattern recognition leads to optimizations
• Encouraging a deeper dive into performance data analytics
09.05.2016 2

About the Presentation
Developers
• Specialized in performance and tuning
• More than 20 years of experience
• Focused on mainframe performance

The starting point
• We had optimized an LPAR by approximately 35% with several changes
• On the system level, changes to MQ, LE, IMS, SMS and z/OS had been applied
• On the application level, we applied SQL and COBOL changes
• Best practice was implemented, the biggest CPU burners had been eliminated, and some recommendations were waiting to be implemented within the next weeks
• This was the starting point for a deeper dive, because the 'undergrowth' had been removed

The starting point
(chart)

The Challenge
Service Class BATCHNO
• The CPU consumption seemed unusually high compared to other LPAR workloads
• The LPAR itself was more an online LPAR than a batch LPAR
• Further investigation was required, as it seemed that there were hidden CPU savings which had been invisible before the first optimization phases
• The real challenge was: usually you stop after a 35% CPU reduction for one LPAR, but we thought – THE REAL FUN STARTS NOW!
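The deeper dive that follows is driven by SMF data. As a minimal sketch of the extraction step (dataset and DD names are illustrative, not taken from the case study), the record types analyzed later can be pulled from the SMF datasets with the standard IFASMFDP utility:

```jcl
//SMFDUMP  JOB (ACCT),'SMF EXTRACT',CLASS=A
//* Dump SMF record types 30 (job/step accounting, incl. initiator
//* CPU) and 42 (dataset statistics) to a dataset for offline analysis.
//* Input/output dataset names are illustrative.
//EXTRACT  EXEC PGM=IFASMFDP
//INDD1    DD  DISP=SHR,DSN=SYS1.MAN1
//OUTDD1   DD  DISP=(NEW,CATLG),DSN=PERF.SMF.T30T42,
//             UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE),
//             DCB=(RECFM=VBS,LRECL=32760)
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  INDD(INDD1,OPTIONS(DUMP))
  OUTDD(OUTDD1,TYPE(30,42))
/*
```

The TYPE operand restricts the output to the record types of interest, which keeps the extract small enough for repeated ad-hoc analysis.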
Address Space Level Breakdown
(chart)

CATALOG Analysis
When we saw the high CATALOG CPU, we decided to analyze the SMF type 42 records for anomalies:
• The first observation was a high usage of load libraries in terms of EXCPs
• A quick check of the JCL showed that every job was using a JOBLIB statement
• We also observed periodically high usage of datasets every full hour
• The root cause was the HSM interval migration and some 'miscellaneous' MGMTCLAS definitions
• The last indicator of an anomaly was a high execution frequency of a few batch jobs per hour

CATALOG Conclusions
The following anomalies could be identified as indicators of high CPU usage in the CATALOG and application address spaces:
• The JOBLIB statement, causing unnecessary CPU consumption in jobstep initiator times and in uncaptured CPU
• The hourly HSM interval migration, along with MGMTCLAS definitions (datasets with 0 days before migration)
• The frequency of a few jobsteps (running every 5 minutes) which were using the fast-migrated datasets
• 3 application programs responsible for most of the dataset usage

Initiator CPU times
Following the observation that the JOBLIB statement was used in every batch job, we needed to investigate initiator CPU time:
• SMF type 30 subtype 4 provides the new field ICU, which reports initiator CPU time per jobstep
• An expected ratio between initiator CPU time and total CPU time would be 1:25 or even 1:30
• Our workload showed a different behavior

Initiator CPU times
(chart)

Initiator CPU times, application program 1
(chart)

Initiator CPU times, application program 2
(chart)

Initiator CPU Analysis on application
• Most of the initiator CPU time was concentrated on 3 application programs
• An analysis of the purpose of these programs showed the following:
• Program 1 copied one dataset to another
• Program 2 concatenated different datasets according to the JCL
• Program 3 sent messages via WTO to the job output
• Our first conclusions were:
• Replace Program 1 by ICEGENER or SORT
• Replace Program 2 by IEFBR14
• Replace Program 3 by JCL abilities

Initiator CPU Analysis on application
• Tests of the desired replacements showed a 60% CPU reduction for the 3 programs
• The next step was to analyze frequency and occurrence
• As these programs were considered self-written utilities, they occurred in nearly every job, in different jobsteps
• At least 5 jobs were scheduled every 5 minutes
• These jobs contained at least 30 step-level occurrences of these self-written utilities
• Now it was time to question the frequency

Initiator CPU Analysis on application
• The high execution frequency of the jobs/programs could be explained by: we run it every 5 minutes, and if there is nothing to do it will not fail
• Changing this time-driven behavior to an event-driven schedule could save 80% or more of the executions
• The number of executions per program could be reduced by 80%
• Along with the replacement of the self-written utilities by standard utilities, this meant a 90% CPU reduction for the execution of these processes

Final Conclusions
During the complete analysis we aimed for easy-to-apply changes which would have a reasonable impact in terms of CPU savings
• The recommended changes could be categorized as non-intrusive and were concentrated on JCL and scheduling
• Replacement of application programs in JCL by common utilities
• Elimination of the JOBLIB statement
• Adjustment of scheduling plans (event-driven versus time-driven)
• Adjustment of HSM processing parameters
• Re-design of MGMTCLAS settings in SMS affecting HSM processing
The changes led to savings in CATALOG, application jobsteps, HSM CPU and uncaptured CPU, and to simpler scheduling control

Final Conclusions
• The workload picture at the starting point did not really promise substantial savings
• One KPI was leading to the further analysis – the CATALOG address space CPU consumption
• By putting a combination of SMF data, parameter settings and JCL design into context, we could get to the bottom of the issue
• The projected savings amounted to more than 10% of CPU consumption, including batch workload, CATALOG, HSM and uncaptured CPU
The approach of combining different data sources into a complete picture succeeded and revealed even further opportunities to pursue.

End of Presentation
Thank you for your attention and patience
Q/A
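For reference, the JCL-level recommendations above can be sketched as a before/after comparison. All job, program and dataset names here are invented for illustration; the actual interfaces of the self-written utilities are not known from the presentation:

```jcl
//* Before (sketch): JOBLIB carried in every job, home-grown
//* utility Program1 doing a plain dataset-to-dataset copy
//NIGHTJOB JOB (ACCT),'COPY',CLASS=A
//JOBLIB   DD  DISP=SHR,DSN=APPL.LOADLIB
//COPY     EXEC PGM=PROGRAM1
//INPUT    DD  DISP=SHR,DSN=APPL.MASTER.DATA
//OUTPUT   DD  DISP=(NEW,CATLG),DSN=APPL.MASTER.COPY,
//             UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//*
//* After (sketch): JOBLIB removed, standard ICEGENER does the
//* copy, and the former Program2-style concatenation becomes a
//* plain JCL concatenation on SYSUT1
//NIGHTJOB JOB (ACCT),'COPY',CLASS=A
//COPY     EXEC PGM=ICEGENER
//SYSUT1   DD  DISP=SHR,DSN=APPL.MASTER.DATA
//         DD  DISP=SHR,DSN=APPL.DELTA.DATA
//SYSUT2   DD  DISP=(NEW,CATLG),DSN=APPL.MASTER.COPY,
//             UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  DUMMY
```

With SYSIN as DD DUMMY, ICEGENER performs a straight copy; because it is a system-managed utility, no JOBLIB search is needed, which is exactly the combination of savings (initiator CPU, CATALOG activity, uncaptured CPU) the case study targeted.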