Member of the Helmholtz Association
GPU Implementations of Online Track Finding Algorithms at PANDA
HK 57.2, DPG-Frühjahrstagung 2014, Frankfurt
21 March 2014, Andreas Herten (Institut für Kernphysik, Forschungszentrum Jülich) for the PANDA Collaboration
PANDA — The Experiment
[Figure: PANDA detector setup; labeled components: Magnet, STT, MVD; scale marker: 13 m]
Andreas Herten, DPG Frühjahrstagung 2014, HK 57.2
PANDA — Event Reconstruction
• Triggerless read out
– Many benchmark channels
– Background & signal similar
• Event Rate: $2 \cdot 10^{7}$/s
• Raw Data Rate: 200 GB/s
• Reduce by ~1/1000 (GPUs): reject background events, save interesting physics events
• Disk Storage Space for Offline Analysis: 3 PB/y
PANDA — Tracking, Online Tracking
[Figure: detector layers and trigger, shown for a usual HEP experiment and for PANDA]
• PANDA: No hardware-based trigger
• But computationally intensive software trigger
→ Online Tracking
GPUs @ PANDA — Online Tracking
• Port tracking algorithms to GPU
– Serial → parallel
– C++ → CUDA
• Investigate suitability for online performance
• But also: Find & invent tracking algorithms…
• Under investigation:
– Hough Transformation
– Riemann Track Finder
– Triplet Finder
Algorithm: Hough Transform
• Idea: Transform $(x,y)_i \to (\alpha,r)_{ij}$, find lines via $(\alpha,r)$ space
• Solve the line equation for $r_{ij}$:
$r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
– Lots of hits $(x,y,\rho)_i$: ~100 hits/event (STT)
– Many $\alpha_j \in [0°, 360°)$ each: one $\alpha_j$ every 0.2°
– $r_{ij}$: 180 000 values
• Fill histogram
• Extract track parameters
[Figure: Hough Transform principle, showing hits in the (x, y) plane and the corresponding (α, r) plane; the caption fragment "→ Bin giv" is truncated in the source]
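The histogram filling described above can be sketched serially in C++ as follows. This is a minimal illustration, not the Thrust or CUDA implementation from the talk: the names `Hit`, `HoughHistogram`, `fillHough` and `findPeaks`, the binning in r, and the histogram range are all assumptions made for the example, and ρ is simply treated as the per-hit offset that appears in the slide's equation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One hit: position (x, y) and the per-hit offset rho from the line equation.
struct Hit { double x, y, rho; };

// Histogram over (alpha, r); bins are stored row-major as [alphaIndex][rIndex].
struct HoughHistogram {
    std::size_t nAlpha;
    std::size_t nR;
    double rMin;
    double rMax;
    std::vector<unsigned> bins;
};

// For every hit i and every angle alpha_j in [0, 360 deg), compute
// r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i + rho_i and fill the bin.
HoughHistogram fillHough(const std::vector<Hit>& hits, std::size_t nAlpha,
                         std::size_t nR, double rMin, double rMax) {
    assert(nAlpha > 0 && nR > 0 && rMax > rMin);
    HoughHistogram h{nAlpha, nR, rMin, rMax,
                     std::vector<unsigned>(nAlpha * nR, 0u)};
    const double twoPi = 2.0 * std::acos(-1.0);
    for (const Hit& hit : hits) {
        for (std::size_t j = 0; j < nAlpha; ++j) {
            const double alpha =
                twoPi * static_cast<double>(j) / static_cast<double>(nAlpha);
            const double r =
                std::cos(alpha) * hit.x + std::sin(alpha) * hit.y + hit.rho;
            if (r < rMin || r >= rMax) continue;  // outside histogram range
            std::size_t bin = static_cast<std::size_t>(
                (r - rMin) / (rMax - rMin) * static_cast<double>(nR));
            if (bin >= nR) bin = nR - 1;  // guard against rounding at the edge
            ++h.bins[j * nR + bin];
        }
    }
    return h;
}

// Threshold peak finder: all (alphaIndex, rIndex) bins with >= threshold entries.
std::vector<std::pair<std::size_t, std::size_t>> findPeaks(
    const HoughHistogram& h, unsigned threshold) {
    std::vector<std::pair<std::size_t, std::size_t>> peaks;
    for (std::size_t j = 0; j < h.nAlpha; ++j)
        for (std::size_t b = 0; b < h.nR; ++b)
            if (h.bins[j * h.nR + b] >= threshold) peaks.emplace_back(j, b);
    return peaks;
}
```

With ~100 hits and 1800 angles (0.2° steps), the two nested loops evaluate the 180 000 values of r quoted on the slide; every (hit, angle) pair is independent, which is what makes the transform attractive for a GPU port.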
Algorithm: Hough Transform
[Figure: Hough-transformed histogram for 68 (x, y) points, PANDA STT+MVD, 1800 x 1800 grid; horizontal axis: angle α / ° (0 to 180); vertical axis: r (-0.4 to 0.6); color scale 0 to 25; statistics box: Entries 2.2356e+08, Mean x 90, Mean y 0.02905, RMS x 51.96, RMS y 0.1063]
Algorithm: Hough Transform
Two Implementations
Thrust
• Performance: 3 ms/event
– Independent of α granularity
– Reduced to set of standard routines
• Fast (uses Thrust's optimized algorithms)
• Inflexible (has its limits, hard to customize)
– No peakfinding included
• Even possible?
• Adds to time!
Plain CUDA
• Performance: 0.5 ms/event
– Built completely for this task
• Fits every problem
• Customizable
• A bit more complicated in parts
– Simple peakfinder implemented (threshold)
• Using: Dynamic Parallelism, Shared Memory
Algorithm: Riemann Track Finder
• Idea: Don't fit lines (in 2D), fit planes (in 3D)!
• Create seeds
– All possible three hit combinations
• Grow seeds to tracks
– Continuously test next hit if it fits
– Use mapping to Riemann paraboloid
• Summer student project (J. Timcheck)
[Figure: hits (x) in the x-y plane mapped onto a paraboloid along the z' axis]
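The "fit planes instead of circles" idea can be illustrated with a short C++ sketch. It assumes the common paraboloid mapping z' = x² + y², under which all points of a circle in the x-y plane lie on one plane in 3D; the slide names the paraboloid but does not spell out the mapping, so this choice, like the names `liftToParaboloid`, `planeThrough` and `hitFitsSeed` and the tolerance parameter, is an assumption for illustration only.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Lift a 2D hit onto the paraboloid z' = x^2 + y^2 (assumed mapping).
Vec3 liftToParaboloid(double x, double y) { return {x, y, x * x + y * y}; }

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Plane in Hesse normal form: dot(n, p) = d with |n| = 1.
struct Plane { Vec3 n; double d; };

// Seed: the plane through three lifted hits.
Plane planeThrough(Vec3 a, Vec3 b, Vec3 c) {
    Vec3 n = cross(sub(b, a), sub(c, a));
    const double len = std::sqrt(dot(n, n));
    assert(len > 0.0);  // the three seed hits must be distinct
    n = {n.x / len, n.y / len, n.z / len};
    return {n, dot(n, a)};
}

// Growing step: a further hit fits the seed if its lifted point
// is closer to the seed plane than the given tolerance.
bool hitFitsSeed(const Plane& seed, double x, double y, double tolerance) {
    const Vec3 p = liftToParaboloid(x, y);
    return std::fabs(dot(seed.n, p) - seed.d) < tolerance;
}
```

For example, a seed built from three hits on the circle with center (1, 2) and radius 3 accepts the fourth hit (1, -1), which lies on the same circle, and rejects (0, 0), which does not.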
Algorithm: Riemann Track Finder
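• GPU Optimization: Unfolding loops
– The nested loops for () {for () {for () {}}} are replaced by one linear thread index:
int ijk = threadIdx.x + blockIdx.x * blockDim.x;
– Closed-form expressions in the linear index x:
$n_{\mathrm{Layer}_x} = \frac{1}{2}\left(\sqrt{8x + 1} - 1\right)$
$\mathrm{pos}(n_{\mathrm{Layer}_x}) = \frac{\sqrt[3]{\sqrt{3}\,\sqrt{243x^2 - 1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\,\sqrt{243x^2 - 1} + 27x}} - 1$
→ 100 × faster than CPU version
• Time for one event (Tesla K20X): ~0.6 ms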
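One way to read the loop unfolding is as "unranking": a single linear index is converted back into the three loop indices of a hit triple. The sketch below does this in serial C++ with the inverse of the triangular numbers n(n+1)/2 and the inverse of the tetrahedral numbers n(n+1)(n+2)/6, which are the closed forms on the slide. How the talk's code actually orders its triples is not stated, so the ordering t = C(k,3) + C(j,2) + i, the function names, and the integer correction steps (added to absorb floating-point rounding) are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Triple { std::uint64_t i, j, k; };  // indices with i < j < k

std::uint64_t triangular(std::uint64_t n) { return n * (n + 1) / 2; }
std::uint64_t tetrahedral(std::uint64_t n) { return n * (n + 1) * (n + 2) / 6; }

// Largest n with triangular(n) <= x, from n = (sqrt(8x + 1) - 1) / 2.
std::uint64_t inverseTriangular(std::uint64_t x) {
    const double xd = static_cast<double>(x);
    std::uint64_t n =
        static_cast<std::uint64_t>((std::sqrt(8.0 * xd + 1.0) - 1.0) / 2.0);
    while (triangular(n + 1) <= x) ++n;           // fix rounding downwards
    while (n > 0 && triangular(n) > x) --n;       // fix rounding upwards
    return n;
}

// Largest n with tetrahedral(n) <= x, from the cube-root closed form.
std::uint64_t inverseTetrahedral(std::uint64_t x) {
    if (x == 0) return 0;  // closed form would take sqrt(-1) at x = 0
    const double xd = static_cast<double>(x);
    const double a = std::cbrt(
        std::sqrt(3.0) * std::sqrt(243.0 * xd * xd - 1.0) + 27.0 * xd);
    const double root = a / std::cbrt(9.0) + 1.0 / (std::cbrt(3.0) * a) - 1.0;
    std::uint64_t n = static_cast<std::uint64_t>(root);
    while (tetrahedral(n + 1) <= x) ++n;          // fix rounding downwards
    while (n > 0 && tetrahedral(n) > x) --n;      // fix rounding upwards
    return n;
}

// Map a linear index t (e.g. a global thread index) to a triple i < j < k.
Triple unrankTriple(std::uint64_t t) {
    const std::uint64_t layer = inverseTetrahedral(t);  // equals k - 2
    const std::uint64_t rest = t - tetrahedral(layer);
    const std::uint64_t row = inverseTriangular(rest);  // equals j - 1
    const std::uint64_t i = rest - triangular(row);
    assert(i <= row && row <= layer);
    return {i, row + 1, layer + 2};
}
```

Indices 0, 1, 2, 3 map to (0,1,2), (0,1,3), (0,2,3), (1,2,3), so each thread can find its own triple without any loop over the other threads' work.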
Algorithm: Triplet Finder
• Idea: Use only sub-set of detector as seed
– Combine 3 hits to Triplet
– Calculate circle from 3 Triplets (no fit)
• Features
– Tailored for PANDA
– Fast & robust algorithm, no t0
• Ported to GPU together with NVIDIA Application Lab
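The "circle from three triplets, no fit" step can be sketched as a direct circumcircle calculation in C++. The sketch assumes that each triplet is reduced to one representative 2D point, here the centroid of its three hits; that reduction, the names `Point`, `Circle`, `tripletCenter` and `circleThrough`, and the collinearity cut are assumptions for illustration and are not taken from the PANDA code.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Result of the circle calculation; valid is false for collinear input.
struct Circle { double cx, cy, r; bool valid; };

// Assumed representative point of a triplet: the centroid of its three hits.
Point tripletCenter(Point a, Point b, Point c) {
    return {(a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0};
}

// Circle through three points, computed in closed form (no fit).
Circle circleThrough(Point a, Point b, Point c) {
    const double d =
        2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return {0.0, 0.0, 0.0, false};  // collinear
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    const double cx =
        (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    const double cy =
        (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    const double r = std::sqrt((a.x - cx) * (a.x - cx) + (a.y - cy) * (a.y - cy));
    assert(r > 0.0);  // distinct, non-collinear points give a positive radius
    return {cx, cy, r, true};
}
```

Because the circle is obtained from a fixed number of arithmetic operations, the cost per candidate is constant, in contrast to an iterative fit.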
Triplet Finder — Time
Triplet Finder — Optimizations
• Bunching Wrapper
– Hits from one event have similar timestamp
– Combine hits to sets (bunches) which fill up GPU best
– 𝒪(N²) → 𝒪(N)
[Figure: hits on a time axis, grouped into events, and events grouped into bunches]
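A bunching step of this kind can be sketched in C++ as a sort by timestamp followed by a split into time windows. The fixed window length used here is an assumption made to keep the example short; the talk only says that bunches are chosen so that they fill the GPU best, and the names `TimedHit` and `makeBunches` are likewise hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct TimedHit {
    double t;  // timestamp (arbitrary units)
    int id;    // hit identifier
};

// Sort hits by time and start a new bunch whenever a hit lies at least
// `window` after the first hit of the current bunch.
std::vector<std::vector<TimedHit>> makeBunches(std::vector<TimedHit> hits,
                                               double window) {
    assert(window > 0.0);
    std::sort(hits.begin(), hits.end(),
              [](const TimedHit& a, const TimedHit& b) { return a.t < b.t; });
    std::vector<std::vector<TimedHit>> bunches;
    double bunchStart = 0.0;
    for (const TimedHit& hit : hits) {
        if (bunches.empty() || hit.t - bunchStart >= window) {
            bunches.emplace_back();  // open a new bunch
            bunchStart = hit.t;
        }
        bunches.back().push_back(hit);
    }
    return bunches;
}
```

One plausible reading of the 𝒪(N²) → 𝒪(N) note: if the combinatorial work grows quadratically with the number of hits processed together, keeping each bunch at a bounded size makes the total work grow roughly linearly with the number of bunches, and therefore with N.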
Triplet Finder — Bunching
Performance
Triplet Finder — Optimizations
• Compare kernel launch strategies
– Dynamic Parallelism: 1 thread/bunch; a calling kernel launches TF Stages #1 to #4
– Joined Kernel: 1 block/bunch; one joined kernel runs TF Stages #1 to #4
– Host Streams: 1 stream/bunch; a calling stream launches TF Stages #1 to #4
[Figure: the three strategies side by side, with GPU and CPU labels; the label "Combining stream" also appears in the Host Streams column]
Triplet Finder — Kernel Launches
Preliminary
(in publication)
Triplet Finder — Clock Speed / Chipset
Preliminary
(in publication)
GPU    Memory Clock   Core Clock   GPU Boost
K40    3004 MHz       745 MHz      875 MHz
K20X   2600 MHz       732 MHz      784 MHz
Summary
• Investigated different tracking algorithms
– Best performance: 20 µs/event
→ Online Tracking a feasible technique for PANDA
• Multi GPU system needed – 𝒪(100) GPUs
• Still much optimization necessary (efficiency)
• Collaboration with NVIDIA Application Lab
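The step from a per-event time to a GPU count can be made explicit with a small C++ calculation. It assumes perfect scaling across GPUs and ignores data transfer and any other overhead, so it is only a lower-bound estimate; the function name `requiredGpus` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of GPUs needed to keep up with the event rate, assuming perfect
// scaling: ceil(eventsPerSecond * microsecondsPerEvent / 1e6).
std::uint64_t requiredGpus(std::uint64_t eventsPerSecond,
                           std::uint64_t microsecondsPerEvent) {
    assert(eventsPerSecond > 0 && microsecondsPerEvent > 0);
    const std::uint64_t busyMicrosecondsPerSecond =
        eventsPerSecond * microsecondsPerEvent;
    return (busyMicrosecondsPerSecond + 999999) / 1000000;  // integer ceiling
}
```

With the event rate of 2 · 10⁷/s and the best quoted time of 20 µs/event, `requiredGpus(20000000, 20)` gives 400, i.e. a few hundred GPUs, which is the same order of magnitude as the 𝒪(100) GPUs stated in the summary.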
Thank you!
Andreas Herten, a.herten@fz-juelich.de