An Energy and Bandwidth Efficient Ray Tracing Architecture

Daniel Kopta, Konstantin Shkurko, Josef Spjut,
Erik Brunvand, Al Davis
The Goal
- Can we reduce ray tracing energy consumption without hurting frame rate?
TRaX – SPMD Ray Tracing GPU
- Cycle accurate simulator
  - Tiled “Thread Multiprocessors” (TMs)
[Figure: TM block diagram. Labels: Instruction Cache (x2), Threads, Local Store, Registers, Integer/Branch, L1 Data Cache, Floating Point Units]
Where Does Energy Go?
Energy/Frame (Joules)
- Energy estimates from Cacti and Synopsys
[Chart: energy per frame by component]
  L1: 0.22
  L2: 1.31
  DRAM: 2.11
  Instruction Cache: 0.57
  Register File: 0.94
  Compute (XUs): 1.2
~200 Watts at 30 FPS
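As a quick check on the "~200 Watts at 30 FPS" figure, the per-frame energies can be summed and multiplied by the frame rate. This is a minimal sketch: the variable and function names are illustrative, and the pairing of the last three values with their labels follows the order on the slide. The total does not depend on that pairing.

```python
# Per-frame energy in joules, as listed on the slide.
# Label/value pairing for the last three entries follows slide order (assumption).
energy_per_frame_j = {
    "L1": 0.22,
    "L2": 1.31,
    "DRAM": 2.11,
    "Instruction Cache": 0.57,
    "Register File": 0.94,
    "Compute (XUs)": 1.2,
}
FPS = 30  # target frame rate from the slide


def average_power_watts(energy_j, fps):
    """Average power = energy per frame (J) * frames per second (1/s)."""
    return sum(energy_j.values()) * fps


total_j = sum(energy_per_frame_j.values())
power_w = average_power_watts(energy_per_frame_j, FPS)
print(f"{total_j:.2f} J/frame -> {power_w:.1f} W at {FPS} FPS")
```

The sum comes to about 6.35 J per frame, or roughly 190 W at 30 FPS, which is consistent with the rounded figure on the slide.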
What can we do?
- Consider an ASIC
  - Great energy/delay, but not flexible
- Configurable persistent pipelines
  - Almost as good, but more flexible
- Coherent data access
  - Stay in pipeline mode longer
  - Reduce bandwidth requirements
Macro Instruction Pipelines
[Figure: General Purpose vs. Reconfigurable Pipeline. Labels: Instruction fetch, Register File, XU]
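One way to see why a fused pipeline might save energy is to count instruction fetches and register file accesses under each mode. The sketch below is a toy counting model, not the authors' simulator: the function names, the two-reads-one-write assumption per operation, and the operation and operand counts in the example are all hypothetical.

```python
def general_purpose_cost(num_ops, reads_per_op=2, writes_per_op=1):
    """Each operation is fetched separately and goes through the register file.

    reads_per_op and writes_per_op are assumed values for illustration.
    """
    return {
        "instruction_fetches": num_ops,
        "register_accesses": num_ops * (reads_per_op + writes_per_op),
    }


def pipeline_cost(num_inputs, num_outputs):
    """One macro instruction: XUs are chained, so intermediates skip the register file."""
    return {
        "instruction_fetches": 1,
        "register_accesses": num_inputs + num_outputs,
    }


# Hypothetical test made of 20 operations with 12 inputs and 1 output.
gp = general_purpose_cost(num_ops=20)
pipe = pipeline_cost(num_inputs=12, num_outputs=1)
print("general purpose:", gp)
print("fused pipeline: ", pipe)
```

Under these assumed numbers the fused pipeline needs 1 fetch and 13 register accesses instead of 20 fetches and 60 register accesses. Real savings depend on the actual instruction mix.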
Ray - Box/Triangle Dataflow
TRaX TM
[Figure: TM layout. Labels: TPs, Shared XUs, TPs]
Reconfigured Pipeline
[Figure: TM layout in pipeline mode. Labels: Special Regs, Box Unit, TPs, Shared FUs, TPs]
Pipeline Persistence
- Opportunities in RT
  - Box test (traversal)
  - Triangle test (intersection)
  - Others?
- Streaming, Ray sorting, etc…
  - StreamRay: Gribble, Ramani, 2008, 2009
  - Treelet decomposition: Aila & Karras, 2010
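For readers unfamiliar with the two tests named above, here is a minimal software sketch of each. It uses the standard slab method for the box test and the Möller-Trumbore method for the triangle test; the slides do not state which formulations the hardware pipelines implement, so treat both choices and all function names as assumptions.

```python
import math


def ray_box_hit(origin, direction, box_min, box_max, t_max=math.inf):
    """Slab test: intersect the ray's parametric interval with each axis slab."""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_hit(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: return hit distance t, or None on a miss."""
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:  # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv_det
    return t if t > eps else None


box_hit = ray_box_hit((0, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1))
box_miss = ray_box_hit((3, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1))
tri_t = ray_triangle_hit((0.25, 0.25, -1), (0, 0, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0))
print(box_hit, box_miss, tri_t)
```

Both tests are short, fixed sequences of arithmetic with no data-dependent loops over scene data, which is what makes them natural candidates for a persistent pipeline.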
Treelets
- Sized to fit in L1 cache
[Figure: treelets numbered 1 to 4; legend distinguishes box streams from triangle streams]
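To make "sized to fit in L1 cache" concrete, the sketch below groups the nodes of a small tree into connected treelets under a byte budget. It is a simple greedy breadth-first stand-in, not the treelet formation algorithm of Aila & Karras or of STRaTA. The tree, the 64-byte node size, and the 192-byte budget are hypothetical values chosen for illustration.

```python
from collections import deque


def build_treelets(children, node_bytes, root, budget_bytes):
    """Greedily partition a tree into connected treelets of at most budget_bytes.

    children:   dict mapping node id -> list of child node ids
    node_bytes: dict mapping node id -> size of that node in bytes
    A node larger than the budget still gets a treelet of its own.
    """
    treelets = []
    pending_roots = deque([root])
    while pending_roots:
        members, used = [], 0
        frontier = deque([pending_roots.popleft()])
        while frontier:
            node = frontier.popleft()
            size = node_bytes[node]
            if members and used + size > budget_bytes:
                # Does not fit: this node becomes the root of a later treelet.
                pending_roots.append(node)
                continue
            members.append(node)
            used += size
            frontier.extend(children.get(node, ()))
        treelets.append(members)
    return treelets


# Hypothetical 7-node binary tree, 64 bytes per node, 192-byte budget.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
node_bytes = {n: 64 for n in range(7)}
treelets = build_treelets(children, node_bytes, root=0, budget_bytes=192)
print(treelets)
```

Every node lands in exactly one treelet, and each treelet is a connected subtree, so a ray that enters a treelet can traverse it using only data that fits in the cache budget.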
STRaTA
- Streaming Treelet Ray Tracing Architecture
- HW support for treelet ray streams
  - Repurposes most of L2 cache
  - L1 hit rate increases, off-chip access decreases
- Enables pipelines to persist longer
  - Reduce instruction fetch and register access
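The idea of treelet ray streams can be sketched in software as one queue of rays per treelet, drained one treelet at a time so that each treelet's data is loaded once per batch rather than once per ray. This is a toy model only: the precomputed `ray_paths` input, the "largest stream first" scheduling policy, and all names are assumptions for illustration and are not taken from the STRaTA hardware.

```python
from collections import defaultdict, deque


def process_streams(ray_paths):
    """Drain per-treelet ray streams and return the order of treelet batches.

    ray_paths: dict mapping ray id -> non-empty list of treelet ids the ray
               visits in order (a hypothetical, precomputed stand-in for
               real traversal).
    """
    streams = defaultdict(deque)
    for ray_id, path in ray_paths.items():
        streams[path[0]].append((ray_id, 0))

    schedule = []
    while streams:
        # Assumed policy: work on the treelet with the most buffered rays.
        treelet = max(streams, key=lambda t: len(streams[t]))
        batch = streams.pop(treelet)
        schedule.append((treelet, [ray_id for ray_id, _ in batch]))
        for ray_id, step in batch:
            path = ray_paths[ray_id]
            if step + 1 < len(path):
                # Ray leaves this treelet and joins the next treelet's stream.
                streams[path[step + 1]].append((ray_id, step + 1))
    return schedule


ray_paths = {0: [0, 1], 1: [0, 1], 2: [0, 2], 3: [0, 1]}
schedule = process_streams(ray_paths)
treelet_loads = len(schedule)
per_ray_visits = sum(len(p) for p in ray_paths.values())
print(schedule, treelet_loads, per_ray_visits)
```

In this small example the four rays make eight treelet visits in total, but batching them into streams needs only three treelet loads.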
Test Scenes
Hairball
Sibenik
Vegetation
Where Does Energy Go?
Energy/Frame (Joules), annotated with the technique that targets each component
- Treelets: L1 (0.22), L2 (1.31), DRAM (2.11)
- Persistent pipelines: Instruction Cache (0.57), Register File (0.94)
- Compute (XUs): 1.2
Results (ICache + RF)
I$ + Register File energy (Joules)
[Bar chart: Baseline, Treelets Only, and STRaTA for Hairball, Sibenik, and Vegetation. Bar values in extraction order: 4.7, 4.4, 3.1, 2.95, 3.0, 2.6, 2.9, 2.5, 2.2]
Results (L2 + DRAM energy)
L2 Dcache + DRAM energy (Joules)
[Bar chart: Baseline and STRaTA for Hairball, Sibenik, and Vegetation. Bar values in extraction order: 2.18, 1.64, 1.54, 1.18, 0.72, 0.43]
Results
Energy/Frame (Joules)
[Stacked bar chart for Hairball, Sibenik, and Vegetation. Segments: L1, L2/Stream, DRAM, Instruction Cache, Register File, Compute (XUs), Saved. Segment values in extraction order: 0.14, 1.1, 1.5, 0.53, 1.2, 2.6, 0.33, 0.22, 0.85, 1.4, 0.8, 0.29, 1.6, 0.91, 0.13, 1.7, 0.48, 1.1, 0.3, 1.8, 1]
Conclusions
- Up to 38% combined energy reduction
  - (memory hierarchy, I$, RF)
- No significant change in performance
- Simple HW/SW modifications
Thanks!
Thanks to:
Samuli Laine for Hairball and Vegetation,
Marko Dabrovic for Sibenik
Anonymous reviewers
Utah Hardware Ray Tracing Group