Latest Intel® VTune™ Amplifier XE 2016 Improvements

Intel® VTune™ Amplifier XE Performance Profiler
Latest 2016 Beta Improvements
David Anderson, May 2015
Agenda
• Intel® VTune™ Amplifier XE Performance Profiler
• Intel® VTune™ Amplifier XE 2016 Beta:
• Improvements in Intel® OpenMP Analysis
• Better Support of MPI+OpenMP Hybrid Applications
• Enhanced Intel® HD Graphics Profiling
• Latest Ease-of-use Improvements
• New IDE and OS Support
• Additional Material
• Demo
Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Tune Applications for Scalable Multicore Performance
Intel® VTune™ Amplifier XE Performance Profiler
Is your application slow?
Does its speed scale with more cores?
Tuning without data is just guessing
• Accurate CPU, GPU¹ & threading data
• Powerful analysis & filtering of results
• Easy set-up, no special compiles
“Last week, Intel® VTune™ Amplifier XE helped us find almost 3X performance improvement. This week it helped us improve the performance another 3X.”
Claire Cates, Principal Developer, SAS Institute Inc.
¹ Windows* only.
http://intel.ly/vtune-amplifier-xe
Two Great Ways to Collect Data
Intel® VTune™ Amplifier XE
Software Collector
• Uses OS interrupts
• Collects from a single process tree
• ~10ms default resolution
• Either an Intel® or a compatible processor
• Call stacks show calling sequence
• Works in virtual environments
• No driver required
Hardware Collector
• Uses the on chip Performance Monitoring Unit (PMU)
• Collect system wide or from a single process tree
• ~1ms default resolution (finer granularity - finds small functions)
• Requires a genuine Intel® processor for collection
• Optionally collect call stacks
• Works in a VM only when supported by the VM (e.g., vSphere* 5.1)
• Requires a driver
- Easy to install on Windows
- Linux requires root (or use default perf driver without stacks)
No special recompiles - C, C++, C#, Fortran, Java, Assembly
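To illustrate why the finer default resolution finds small functions, here is a minimal sketch. It assumes samples are taken at a fixed interval, so the expected number of samples landing in a function is roughly its CPU time divided by the interval. The function name and the 5 ms figure are illustrative, not taken from the product.

```python
def expected_samples(cpu_time_ms: float, interval_ms: float) -> float:
    """Rough expected number of samples that land in a function.

    Assumes fixed-interval sampling: cpu_time_ms / interval_ms.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return cpu_time_ms / interval_ms


# A function that uses 5 ms of CPU time in total:
small_function_ms = 5.0
at_10ms = expected_samples(small_function_ms, 10.0)  # ~10ms default resolution
at_1ms = expected_samples(small_function_ms, 1.0)    # ~1ms default resolution
print(at_10ms, at_1ms)
```

At the coarser interval the function is expected to receive less than one sample and may not show up at all. At the finer interval it is expected to receive several.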
A Rich Set of Performance Data
Intel® VTune™ Amplifier XE
Software Collector
• Basic Hotspots: Which functions use the most time?
• Concurrency: Tune parallelism. Colors show number of cores used.
• Locks and Waits: Tune the #1 cause of slow threaded performance: waiting with idle cores.
Any IA86 processor, any VM, no driver
Hardware Collector
• Advanced Hotspots: Which functions use the most time? Where to inline? – Statistical call counts
• General Exploration: Where is the biggest opportunity? Cache misses? Branch mispredictions?
• Advanced Analysis: Dig deep to tune access contention, etc.
Higher res., lower overhead, system wide
No special recompiles - C, C++, C#, Fortran, Java, Assembly
Improvements in Intel® OpenMP Analysis
Realizing Gain with Less Pain
Identify Where to Invest Effort in OpenMP Applications
• Intel OpenMP analysis enhancements
• Serial time vs. parallel time
• OpenMP parallel region potential improvements
Is the serial time of my application significant enough to prevent scaling?
How efficient is my parallelization compared with ideal parallel execution?
How much theoretical gain can I get if I invest in tuning?
Which regions have the highest potential ROI?
Links to grid view for more details on inefficiency
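The serial-versus-parallel question can be made concrete with Amdahl's law, which bounds the speedup when only the parallel part of the run scales with thread count. The sketch below is a generic illustration of that formula, not a description of how VTune Amplifier computes its metrics. The function names are hypothetical.

```python
def serial_fraction(elapsed: float, serial_time: float) -> float:
    """Share of wall-clock time spent outside parallel regions."""
    if elapsed <= 0 or not 0 <= serial_time <= elapsed:
        raise ValueError("need 0 <= serial_time <= elapsed and elapsed > 0")
    return serial_time / elapsed


def amdahl_speedup(serial_frac: float, threads: int) -> float:
    """Upper bound on speedup over one thread (Amdahl's law)."""
    if not 0.0 <= serial_frac <= 1.0:
        raise ValueError("serial_frac must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return 1.0 / (serial_frac + (1.0 - serial_frac) / threads)


# Example: 10% of a single-threaded run is serial; estimate the limit on 8 threads.
frac = serial_fraction(elapsed=100.0, serial_time=10.0)
print(round(amdahl_speedup(frac, 8), 2))
```

Even a modest serial share caps the achievable speedup well below the thread count, which is why the serial time is worth checking before tuning individual parallel regions.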
What is Hindering Parallel Performance?
VTune Amplifier Identifies Parallel Region Inefficiencies
• Details of efficiency in Bottom-up view
• Precise, trace-based imbalance calculation, especially useful for small region instances
Imbalance
Likely culprit: Dynamic scheduling overhead
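A small sketch of why a per-instance (trace-based) imbalance calculation can matter. It assumes imbalance for one region instance is defined as the busiest thread's time minus the mean thread time. That definition and all names are assumptions for illustration, not the tool's documented formula.

```python
from statistics import mean


def instance_imbalance(busy):
    """Imbalance of one region instance: max busy time minus mean busy time."""
    return max(busy) - mean(busy)


def imbalance_per_instance(instances):
    """Sum the imbalance of each instance separately (trace-style)."""
    return sum(instance_imbalance(busy) for busy in instances)


def imbalance_from_totals(instances):
    """Imbalance computed only from per-thread totals across all instances."""
    totals = [sum(per_thread) for per_thread in zip(*instances)]
    return instance_imbalance(totals)


# Two instances of the same region on two threads (busy seconds per thread).
# Thread 0 is slow in the first instance, thread 1 in the second.
instances = [[4.0, 2.0], [2.0, 4.0]]
print(imbalance_per_instance(instances))  # 2.0
print(imbalance_from_totals(instances))   # 0.0
```

The per-thread totals look perfectly balanced, while each instance individually loses time waiting for the slower thread.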
Jump to Parallel Region Source Code
• View data specific to the region at the source code level
• Tip: use the ‘-parallel-source-info=2’ compiler option to embed the source file name in the region name (enables drill-down to the source file)
Understanding Large Parallel Regions
New Barrier-to-Barrier Analysis
• Essential for regions with more than 1 barrier (implicit or explicit)
• Critical for a model with 1 parallel region and work-sharing constructs inside, with pragma single to do sequential work
• Intel OpenMP runtime library traces sub-region segments from region fork or previous barrier points
[Diagram: a Parallel Region divided into Sub-regions 1 to 4 between fork, barrier, and join points; labels: omp for, omp for barrier, omp single, Single barrier, User Barrier, Region Join]
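The idea of barrier-to-barrier segments can be sketched as splitting one region instance at each barrier point. The event list below is a made-up trace (timestamps, labels, and their order are illustrative assumptions), and `split_segments` is a hypothetical helper, not part of any Intel API.

```python
def split_segments(events):
    """Split a region instance into barrier-to-barrier segments.

    events: time-ordered (timestamp, label) pairs. The first entry is the
    region fork; later entries are barriers, ending with the region join.
    """
    segments = []
    for (t0, start), (t1, end) in zip(events, events[1:]):
        segments.append({"from": start, "to": end, "elapsed": t1 - t0})
    return segments


# Hypothetical trace of one parallel region instance (seconds).
events = [
    (0.0, "region fork"),
    (2.0, "omp for barrier"),
    (2.5, "single barrier"),
    (4.0, "user barrier"),
    (5.0, "region join"),
]
segments = split_segments(events)
for seg in segments:
    print(f'{seg["from"]} -> {seg["to"]}: {seg["elapsed"]} s')
```

Four barrier points after the fork yield four sub-region segments, each of which can then be examined for imbalance on its own.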
Details on Barrier-to-Barrier Instances
• Use the “/Barrier-to-Barrier Segment” grouping
• Lexical loop constructs with different schedule types or chunk sizes are displayed separately
Better Support of MPI+OpenMP Hybrid Applications
VTune Amplifier + Intel® MPI = Success!
MPI+OpenMP Hybrid Application Analysis
Optimize Where it Will be Most Effective
• MPI communication spinning metrics for Intel® MPI
• OpenMP metrics shown per process, sorted by processes lying on the critical path of MPI execution
• Don’t optimize OpenMP, if MPI communication is the bottleneck!
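One way to picture the per-process sorting is shown below. It assumes the process that spends the least time spinning in MPI communication is the one the others wait for, and so is the most likely to lie on the critical path. That heuristic, the `RankStats` record, and the numbers are all illustrative assumptions rather than the tool's actual logic.

```python
from typing import NamedTuple


class RankStats(NamedTuple):
    rank: int
    mpi_spin: float            # seconds spent spinning in MPI communication
    omp_potential_gain: float  # seconds the OpenMP regions could still save


def critical_path_order(stats):
    """Order processes so the least MPI-waiting ones come first."""
    return sorted(stats, key=lambda s: s.mpi_spin)


# Hypothetical per-process summary of a hybrid run.
stats = [
    RankStats(rank=0, mpi_spin=0.5, omp_potential_gain=3.0),
    RankStats(rank=1, mpi_spin=4.0, omp_potential_gain=0.2),
    RankStats(rank=2, mpi_spin=3.0, omp_potential_gain=0.4),
]
ordered = critical_path_order(stats)
print([s.rank for s in ordered])  # [0, 2, 1]
```

In this made-up data, rank 0 barely waits in MPI and has the largest OpenMP potential gain, so its OpenMP regions are the place to look first. Tuning OpenMP in ranks 1 and 2 would mostly lengthen their MPI wait.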
Deeper and Wider Analysis
Get the Details and the Big Picture
• Hyperlinks in Summary jump to Bottom-up view with ‘/Process /OpenMP Region/ …’ groupings to get details of the OpenMP metrics aggregated per process
• MPI communication spin time can be highlighted on the timeline
Insight Provided by Multiple Tools!
• Intel® MPI provides ‘-gtool’ option in version 5.0.2 and higher to simplify
selective rank profiling
• Intel® Trace Analyzer 9.0.2 and higher can identify CPU-bound processes and
generate VTune Amplifier command line
• Modifies ITA command for use with VTune Amplifier
Enhanced Intel® HD Graphics Profiling
Visual GPU Execution
Review of GPU Profiling Support
Hardware and OpenCL* Metrics
• Collect Intel® Integrated Graphics hardware metrics
• Details regarding OpenCL™ activity on the GPU
• Presented correlated with CPU processes and threads
• Available on Windows*
* See this Getting Started article
Intel® HD Graphics Profiling on Linux*
• Analyze OpenCL* kernels on Linux systems
• GPU hardware metrics not yet available
• Intel® Media Server Studio support
• Extended counter set for 5th generation Intel® Core™ processors (code name: Broadwell)
Enhanced Intel® HD Graphics Profiling
Visualize Hardware Interaction
• GPU Architecture Diagram annotated with metrics
• Available for OpenCL* Compute tasks
Latest Ease-of-Use Improvements
Analyze with Confidence!
Low Sample Counts Reduce Metrics Validity
• Summary, Bottom-up, and Source views gray out metrics when the amount of collected data puts their validity in question
• Currently in hardware-based sampling General Exploration data
• Hovering over cell provides explanation
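A minimal sketch of why low sample counts undermine a metric. It assumes sample counts behave like Poisson counts, so the relative statistical error of a count n is about 1/sqrt(n). The 10% cutoff is an arbitrary illustrative value, not the threshold the tool uses.

```python
import math


def relative_error(samples: int) -> float:
    """Approximate relative error of a sample count (Poisson assumption)."""
    if samples < 0:
        raise ValueError("samples must be non-negative")
    if samples == 0:
        return math.inf
    return 1.0 / math.sqrt(samples)


def is_reliable(samples: int, max_rel_error: float = 0.1) -> bool:
    """True when the estimated error is within the (illustrative) cutoff."""
    return relative_error(samples) <= max_rel_error


for n in (4, 25, 100, 10_000):
    print(n, round(relative_error(n), 3), is_reliable(n))
```

Under this assumption a metric built on a handful of samples carries an error of tens of percent, which is the kind of result a viewer would want flagged.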
Platform Tab Replaces Tasks and Frames
Can now Display Additional Information
• Will include, depending on collection options:
• GPU Usage/Queue
• Bandwidth
• CPU Freq ratio
More “Big Picture” Support
• “Super Tiny” bird’s-eye view to help recognize application phases and behavioral patterns
• Use context menu to select
Better Linux* Support
• Linux ‘perf’ supported for users without root access
• Linux* operating systems based on kernel 2.6.32 or higher
• That export CPU PMU programming details over /sys/bus/event_source/devices/cpu/format file system
• VTune Amplifier hardware-based sampling driver provides additional features, such as:
• Stacks†
• Uncore events
• Multiple, precise events
• New events for the latest processors, even on older operating systems
† Newer Linux releases include support for stacks-collection with PMU events
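The two stated prerequisites for driverless ‘perf’-based collection (kernel 2.6.32 or higher, and the PMU format directory in sysfs) can be checked with a short script. This is a sketch using only the Python standard library. The helper names are hypothetical, and passing both checks does not guarantee that collection will work on a given system.

```python
import os
import platform
import re

PMU_FORMAT_DIR = "/sys/bus/event_source/devices/cpu/format"


def kernel_at_least(release: str, minimum=(2, 6, 32)) -> bool:
    """Compare the leading numeric part of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        return False
    version = tuple(int(part) if part else 0 for part in match.groups())
    return version >= minimum


def pmu_format_exported(path: str = PMU_FORMAT_DIR) -> bool:
    """True if the kernel exports CPU PMU programming details at `path`."""
    return os.path.isdir(path)


if __name__ == "__main__":
    release = platform.release()
    print("kernel", release, "ok:", kernel_at_least(release))
    print("PMU format exported:", pmu_format_exported())
```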
New IDE and OS Support
• Microsoft* Visual Studio* 2015 integration support
• Microsoft Windows* 10 “Threshold” tech preview build #9926 tested with hardware-based sampling
• Software-based sampling types not supported (basic hotspots, concurrency, and locks and waits)
Summary
• Intel® VTune™ Amplifier XE 2016 Beta:
• Improvements in Intel® OpenMP Analysis
• Better Support of MPI+OpenMP Hybrid Applications
• Enhanced Intel® HD Graphics Profiling
• Latest Ease-of-use Improvements
• Let us know what you think, participate in the Beta Program today!
• Register at bit.ly/psxe2016beta
Additional Material
Intel® VTune™ Amplifier – Performance Profiler:
• Product page – overview, features, FAQs…
• Training materials – movies, tech briefs, documentation…
• Evaluation guides – step by step walk through
• Reviews
• Support – forums, secure support…
Additional Analysis Tools:
• Intel® Inspector XE - memory and thread checker / debugger
• Intel® Advisor XE – thread prototyping tool for architects
Additional Development Products:
• Intel® Software Development Products
Demo
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of
that product when combined with other products.
Copyright © 2014, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804