Getting Your Groove:
Step-by-Step Performance Profiling
Juan Guardado
GDC Europe Tutorial Day
Agenda
What to profile
How to profile your game
System setup
System tweaks
VTune
NVPerfHUD
Increasing performance
Guidelines
FX Studio
What is your target platform?
GeForce FX 5900 Ultra
Ferrari F1
GeForce FX 5200
Ferrari Modena 360
End-user survey
If a new game that you really wanted to play came out and it required that you upgrade all
or part of your PC, how likely would you be to upgrade your:
RAM (25757)
GPU (25901)
CPU (25769)
[Bar chart: share of Likely vs. Unlikely answers for each component, 0% to 100%]
Identify your target platform
5900, 5600, and 5200 products
Matched CPU and memory
Identify settings you care about
Common resolution?
Smooth multisampling?
Nice filtering?
Fancy shadows?
etc
Preliminary checks
Direct3D errors and performance warnings are a
bad sign
Must use debug runtime
Slide up validation a few notches
“Cavere profilare”
Switch Direct3D to retail runtime
Vertical sync sucks: turn it off via the graphics driver, PowerStrip,
or D3DPRESENT_INTERVAL_IMMEDIATE
Choose scene with nominal frame rate
Choose a control-group application
System tweaks: FSB / AGP clocks
Warning: CPU clock depends on FSB clock
Readjust clock multiplier
Lets you know when bus traffic is a bottleneck
BasicHLSL (AGP VB’s)
183 fps @ AGP8X
133 fps @ AGP~3X (800MB/s)
BasicHLSL (SysMem VB’s)
40 fps @ 166MHz memory
53 fps @ 333MHz memory
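Frame rates are easier to compare once they are converted to frame times, because the difference then reads directly as milliseconds of extra work per frame. A minimal sketch using the BasicHLSL (AGP VB's) numbers above; the helper name `frame_time_ms` is an illustrative assumption, not part of any tool named in this talk.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# BasicHLSL (AGP VBs) figures quoted on the slide.
agp8x_ms = frame_time_ms(183)   # AGP8X
agp3x_ms = frame_time_ms(133)   # AGP~3X (800MB/s)

# Extra time each frame takes once the bus is slowed down.
bus_cost_ms = agp3x_ms - agp8x_ms

print(f"AGP8X:  {agp8x_ms:.2f} ms/frame")
print(f"AGP~3X: {agp3x_ms:.2f} ms/frame")
print(f"Extra time per frame on the slower bus: {bus_cost_ms:.2f} ms")
```

If slowing the bus adds a noticeable share of the frame time, as it does here, bus traffic is worth investigating.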
System tweaks: CPU clocks
Works as multiplier of FSB clock
166MHz * 11 = 1.826GHz
166MHz * 5.5 = 918MHz
Freedom Fighters
114 / 96 / 53 fps @ 1.826GHz
80 / 76 / 47 fps @ 918MHz
Control app: BasicHLSL (SW VP)
45 fps @ 1.826GHz
23 fps @ 918MHz
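One way to read these numbers is to ask how much of the clock reduction shows up as a frame-rate reduction. The sketch below is an assumption about how to summarize the figures, not a method stated in the talk: `scaling_sensitivity` is a hypothetical helper, and the three Freedom Fighters values are treated simply as three paired measurements. Note that the nominal product 166 × 5.5 is 913 MHz; the code uses the 918 MHz figure as quoted.

```python
def clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """CPU clock as FSB clock times multiplier."""
    return fsb_mhz * multiplier


def scaling_sensitivity(fps_hi, fps_lo, clk_hi, clk_lo):
    """Fraction of the clock drop that appears as a frame-rate drop.

    Near 1.0: frame rate tracks the clock (limited by that component).
    Near 0.0: the clock barely matters for this workload.
    """
    fps_drop = 1.0 - fps_lo / fps_hi
    clk_drop = 1.0 - clk_lo / clk_hi
    return fps_drop / clk_drop


CLK_HI, CLK_LO = 1826.0, 918.0  # MHz, as quoted on the slide

# Control app: BasicHLSL with software vertex processing.
control = scaling_sensitivity(45, 23, CLK_HI, CLK_LO)

# Freedom Fighters: three paired measurements (full clock, reduced clock).
game = [scaling_sensitivity(hi, lo, CLK_HI, CLK_LO)
        for hi, lo in [(114, 80), (96, 76), (53, 47)]]

print(f"166 MHz * 11 = {clock_mhz(166, 11):.0f} MHz")
print(f"control app sensitivity: {control:.2f}")
for i, s in enumerate(game, start=1):
    print(f"Freedom Fighters measurement {i}: {s:.2f}")
```

The control app scales almost one-to-one with the CPU clock, which is what a CPU-bound workload should do; the game measurements scale noticeably less.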
System tweaks: GPU clocks
PowerStrip (www.entechtaiwan.com/ps.htm)
Warning: safer to underclock than overclock
Freedom Fighters
114 / 96 / 53 fps @ 450MHz
70 / 56 / 30 fps @ 225MHz
Control app: BasicHLSL
180 fps @ 450MHz
93 fps @ 225MHz
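Putting the CPU and GPU underclocking runs side by side suggests which component each measurement leans on. This is a hedged sketch: the dictionary layout, the `sensitivities` helper, and the "leans on" verdicts are illustrative assumptions, and only the fps and clock figures come from the slides.

```python
# (clock in MHz, [three Freedom Fighters fps measurements]) from the slides.
CPU_RUNS = {"full": (1826.0, [114, 96, 53]), "reduced": (918.0, [80, 76, 47])}
GPU_RUNS = {"full": (450.0, [114, 96, 53]), "reduced": (225.0, [70, 56, 30])}


def sensitivities(runs):
    """Per-measurement share of the clock drop that shows up as an fps drop."""
    clk_hi, fps_hi = runs["full"]
    clk_lo, fps_lo = runs["reduced"]
    clk_drop = 1.0 - clk_lo / clk_hi
    return [(1.0 - lo / hi) / clk_drop for hi, lo in zip(fps_hi, fps_lo)]


cpu = sensitivities(CPU_RUNS)
gpu = sensitivities(GPU_RUNS)

for i, (c, g) in enumerate(zip(cpu, gpu), start=1):
    verdict = "leans on the GPU" if g > c else "leans on the CPU"
    print(f"measurement {i}: CPU {c:.2f}, GPU {g:.2f} -> {verdict}")
```

In all three measurements the frame rate follows the GPU clock more closely than the CPU clock, and the gap widens as the frame rate drops.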
VTune 6.0
VTune likes symbol files next to binaries
Copy ../DX90SDK/Extras/Symbols map files to
../Windows/System32
Application
Good times if working in parallel with GPU
D3D9.DLL
Bad times, spending too much time wondering what
to do
NV4_DISP.DLL / NV4_MINI.SYS
Depends on performance characteristics
VTune results (application limited)
VTune results (driver limited)
NVPerfHUD
Overlay that shows various
vital statistics as the
application runs
Quick shader bottleneck
check
Quick texture bottleneck
check
Especially useful to
corroborate your bottleneck
theory
NVPerfHUD graph description
Top graph shows:
Number of draw calls – Draw*Primitive*()
Memory allocated – AGP and video
Bottom graph shows:
GPU idle – Graphics HW not processing anything
Driver time – Driver doing work (states and resource
management, shader compilation)
Driver idle – Driver waiting for GPU to finish
Frame time – Milliseconds for frame time
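The bottom-graph quantities can be turned into a rough first guess at the bottleneck. The sketch below is only a heuristic built from the descriptions above: NVPerfHUD is an on-screen overlay, so the `FrameSample` type, the idea of typing in values read off the graph, and the 30% threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FrameSample:
    """Values read by eye from the overlay for one frame (hypothetical type)."""
    frame_time_ms: float
    gpu_idle_ms: float
    driver_time_ms: float
    driver_idle_ms: float


def likely_bottleneck(s: FrameSample, threshold: float = 0.3) -> str:
    """Name the largest share of the frame, if it exceeds the threshold."""
    shares = {
        # GPU has nothing to process: the CPU side is not feeding it fast enough.
        "GPU starved (application or driver limited)":
            s.gpu_idle_ms / s.frame_time_ms,
        # Driver busy with states, resource management, shader compilation.
        "driver work": s.driver_time_ms / s.frame_time_ms,
        # Driver waiting for the GPU to finish.
        "GPU limited": s.driver_idle_ms / s.frame_time_ms,
    }
    label, share = max(shares.items(), key=lambda kv: kv[1])
    return label if share >= threshold else "no dominant signal"


sample = FrameSample(frame_time_ms=20.0, gpu_idle_ms=1.0,
                     driver_time_ms=3.0, driver_idle_ms=12.0)
print(likely_bottleneck(sample))
```

Treat the result as a hypothesis to corroborate with the clock tweaks and VTune, not as a verdict.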
NVPerfHUD demo
MultiAnimation: draw call graph
Freedom Fighters: 1x1 textures
Not texture bound
BasicHLSL: verify GPU bound
Texturing performance
1x1 toggle easily identifies overall bottleneck
Manually toggle individual stages for better
analysis
Pair equivalently filtered texture lookups
Bilinear + Bilinear
Trilinear + Trilinear
Aniso + Aniso
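The 1x1 toggle amounts to a simple before/after comparison. A minimal sketch, assuming the toggle swaps every texture for a 1x1 one so that texture fetch cost nearly disappears; the function name, the 10% cut-off, and the sample frame rates are hypothetical and not measurements from this talk.

```python
def texture_bound(fps_normal: float, fps_1x1: float,
                  min_speedup: float = 1.10) -> bool:
    """True if forcing 1x1 textures speeds the frame up by at least min_speedup."""
    return fps_1x1 / fps_normal >= min_speedup


# Hypothetical readings: barely any change, so texturing is not the limit.
print(texture_bound(fps_normal=96.0, fps_1x1=98.0))

# Hypothetical readings: a large jump, so texturing is worth a closer look.
print(texture_bound(fps_normal=60.0, fps_1x1=85.0))
```

When the overall check comes back positive, toggling individual stages by hand narrows down which lookups cost the most.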
FX Studio
Architecture and scheduling
GeForce FX architecture
[Diagram: one pixel pipeline. Blocks: Core (Tex + Tex, or ALU), then two ALUs, each split into RGB and A]
x4 pipelines!
Ops listed: ADD, MUL, MAD, DP3, DP4, DPH, (MOV)
8 tex/clk and 8 math/clk
or
12 math/clk
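The per-clock figures follow from simple counting. The sketch below assumes one reading of the diagram: in each of the four pipelines the core issues either two texture lookups or one math op, and two further ALUs each issue one math op. That reading reproduces the slide's totals but is an interpretation, and the constant names are made up for illustration.

```python
PIPELINES = 4            # "x4 pipelines!" on the slide
CORE_TEX_PER_CLK = 2     # assumption: core issues two texture lookups...
CORE_MATH_PER_CLK = 1    # ...or one math op instead
EXTRA_ALUS = 2           # assumption: two further ALUs per pipeline


def per_clock(core_mode: str) -> tuple[int, int]:
    """Return (texture lookups, math ops) per clock for the whole chip."""
    if core_mode == "tex":
        tex = PIPELINES * CORE_TEX_PER_CLK
        math_ops = PIPELINES * EXTRA_ALUS
    elif core_mode == "math":
        tex = 0
        math_ops = PIPELINES * (CORE_MATH_PER_CLK + EXTRA_ALUS)
    else:
        raise ValueError("core_mode must be 'tex' or 'math'")
    return tex, math_ops


print(per_clock("tex"))    # core spent on texturing
print(per_clock("math"))   # core spent on math

# Peak texture rate at the 450 MHz GPU clock quoted earlier, in Mtex/s.
print(per_clock("tex")[0] * 450)
```

The trade-off is the point for shader scheduling: every clock the core spends on texture lookups is a clock it cannot spend on math.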
Conclusion
Don’t be fooled by debug settings
Use the tools available
VTune
NVPerfHUD
FX Studio
(Graphics Performance Analyzer, D3DSpy)
Don’t be shy, talk to us
Questions, comments, feedback?
Juan Guardado, [email protected]
http://developer.nvidia.com