Examples - Physikalisches Institut Heidelberg

Examples
ASIC and FPGA digital designs in physics experiments
© V. Angelov
VHDL Vorlesung SS2009
1
Examples
• Prototypes in the old times
• Sweet-16 – a student RISC processor
• TRAP chip for ALICE TRD – mixed-mode ASIC
• Optical Readout Interface (ORI) – CPLD
• Detector Control System Board for ALICE (TRD+TPC) – FPGA + ARM CPU core
• Power Distribution Box (PDB) – antifuse FPGA
• Global Tracking Unit for ALICE TRD – large FPGA farm
Prototypes yesterday...
Interface board for PHA (pulse height analysis) with 74'xxx logic:
• 3 x SRAM 2k x 8
• battery backup for the SRAM
• full-size ISA card for IBM XT/AT
Sweet-16
• 16-bit RISC processor
• 1 clock/instruction
• easy to implement by students without experience
• compact and portable to different technologies, used in FPGAs and ASICs
[Block diagram: ROM → ProgCount → Control; register file (read ports Qb, Qc) → ALU (status, carry) → MUX → write-back; RAM with WAddr=Qb, WData=Qc, RAddr=Qb, WE]
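The one-instruction-per-clock datapath in the block diagram can be illustrated with a small behavioural sketch. This is not the actual Sweet-16 ISA or encoding (opcode names and the instruction tuple format are invented for this example); it only shows the fetch → read ports → ALU → write-back → ProgCount cycle:

```python
# Illustrative sketch (NOT the real Sweet-16 ISA): a single-cycle RISC in
# the spirit of the slide's diagram -- ROM -> decode -> register file ->
# ALU -> write-back, one instruction per clock.

def step(pc, regs, rom):
    """Execute one instruction; returns the next program counter."""
    op, rd, rb, rc = rom[pc]          # fetch + decode
    a, b = regs[rb], regs[rc]         # read ports Qb, Qc
    if op == "ADD":
        regs[rd] = (a + b) & 0xFFFF   # 16-bit ALU result, write-back
    elif op == "SUB":
        regs[rd] = (a - b) & 0xFFFF
    elif op == "MOV":
        regs[rd] = a
    return pc + 1                     # ProgCount advances every clock

regs = [0] * 16
regs[1], regs[2] = 10, 3
rom = [("ADD", 0, 1, 2), ("SUB", 3, 0, 2)]
pc = 0
while pc < len(rom):
    pc = step(pc, regs, rom)
print(regs[0], regs[3])   # 13 10
```

A real implementation would of course be written in VHDL; the point here is only that one fetch/decode/execute/write-back pass completes per clock.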
A Large Ion Collider Experiment
• Pb-Pb collisions at 1.1 PeV centre-of-mass energy
• creation of the Quark-Gluon Plasma
• TRD is used as a trigger detector due to its fast readout time (2 µs):
  – transverse momentum
  – electron/pion separation
[Figure: ALICE detector with the Inner Tracking System (ITS), Time Projection Chamber (TPC) and Transition Radiation Detector (TRD)]
Transition Radiation Detector
TRD - Transition Radiation Detector
• used as trigger and tracking detector
• > 24,000 particles/interaction in the acceptance of the detector
• up to 8,000 charged particles within the TRD
• trigger task is to find specific particle pairs within 6 µs
ITS - Inner Tracking System
• event trigger
• vertex detection
TPC - Time Projection Chamber
• high-resolution tracking detector
• but too slow for 8,000 collisions/second
TRD Structure
• 1.2 million channels
• 1.4 million ADCs
• peak data rate: 16 TB/s
• ~65,000 MCMs
• computing time: 6 µs
One MCM (PASA + TRAP) serves 18 channels; 8 MCMs ≡ 144 channels. 18 supermodules in azimuth, 6 planes per stack, max. 16 pad rows, module rings, B = 0.4 T. 1080 optical links (ORI) @ 2.5 Gbps.
The MCM performs amplification, digitization (10 bit, 10 MHz), straight-line fit and the readout network.
[Figure: TR-detector geometry in φ, r, z; stack and module structure, vertex at the centre]
Partitioning, Data Flow & Reduction
MCM - Multi Chip Module
Processing chain: detector (6 layers, 1.2 million analog channels) → PASA (charge-sensitive preamplifier, shaper) → ADC (10 bit, 10 MSPS, digital filter) → Tracklet Preprocessor (TPP: preprocess data, 21 channels, event buffer) → Tracklet Processor (TP: fit tracklets for trigger, process raw data; event buffer stores raw data until L1A) → Network Interface (NI: merge tracklets, builds readout tree, trigger & raw data) → GTU (merge tracklets into tracks for trigger, process raw data for HLT, monitoring functionality; L1 trigger to CTP, data to HLT & DAQ).
Data flow and reduction along the chain (detector → trigger decision):
• time: during first 2 µs (drift time) → after 3.5 µs → after 4.1 µs → after 6 µs
• data/event: 33 MB → max. 80 KB → some bytes
• peak rate: 16 TB/s → 600 GB/s; mean rate: 257 GB/s
• reduction: 1 → ~400
FEE development
How to design the FEE?
- fast and low latency
- low power - precise power control (1 mW/channel → 1 kW)
- low cost
  - avoid using connectors
  - use a simple chip package (MCM + ball grid array)
  - standard components? which process? IP cores? Layout (TPC vs. TRD)?
- flexible, as much as possible, as the exact processing is not known
  - make everything configurable
  - use a CPU core for the final processing
- reliable, no possibility to repair anything later
  - redundancy, error and failure protection
  - self-diagnostic features
FEE development (2)
Chip design flow:
1. Detector simulations to understand the signals and what kind of processing we need
2. Select PASA shaping time, ADC sampling rate and resolution
3. Behavioral model of the digital processing, including the bit precision of every arithmetic operation
4. Estimate the processing time and select the clock speed of the design (multiple of LHC and ADC sampling clock)
5. Code the digital design, simulate it, synthesize it, estimate the timing and area, optimize again…
6. Submit the chip, this is the point of no return!
7. Continue with the simulations, find some bugs and think about fixes
8. Prepare the test setup
And so on: TRAP1, 2, TRAPADC, TRAP3, TRAP3a (final)
Readout Boards
Half-chamber readout board; optical link at 2.5 Gbps, 850 nm.
MCMs as readout network: data source only, data source and data merge, or data merge only.
Detector Readout
Timing: drift until 2 µs, processing until 4 µs, transmission until 4.6 µs, GTU finished at 6 µs.
• 1,200,000 channels, 1,400,000 ADCs: 10 bit 10 MSPS, 20 samples/event, preprocessing (A + D)
• 65,000 MCMs, 18+3 channels/MCM, 4 CPUs at 120 MHz
• 4,100 readout boards, 16+1(+1) MCMs, 8 bit 120 MHz DDR readout tree (merger stages: 3+own:1, 4:1 board merger, 4:1 half-chamber merger → ORI)
• 1,080 Optical Readout Interface links, 2 links/chamber at 2.5 Gb/s
• 90 Track Matching Units (GTU), 1 TMU/module, Xilinx FPGA based, 12 optical links each, 5:1 into the GTU (TMU → SMU → TGU, ×18) → DDL
• pretrigger and Central Trigger Processor (CTP) close the trigger loop
Multi Chip Module
[Die photo, 4 cm: PASA; internal ADCs (Kaiserslautern); digital frontend and tracklet preprocessor; master state machine; CPU cores, memories & periphery; global I/O bus; serial interface (slave); external pretrigger; readout network interface]
TRAP block diagram
• 21 ADCs: 10 bit 10 MHz, 12.5 mW, low latency
• digital filters: nonlinearity correction, pedestal correction, gain correction, tail cancellation, crosstalk suppression
• event buffer: 64 samples
• hit detection and hit selection, four fitting units, fit register file (FRF)
• 4x RISC CPU @ 120 MHz, each with PC, decoder, ALU and two pipeline stages; register files FRF, PRF, GRF, CONST, CFG, flags
• memory: 4 x 4k for instructions (IMEM), 1k x 32 quad-ported for data (DMEM), Hamming protected
• global state machine (GSM): Standby → Armed → Acquire → Process → Send
• NI: 4 x 8 bit 120 MHz DDR inputs with FIFOs, 8 bit 120 MHz DDR output
• SCSN: 24 Mb/s serial slow-control network
TRAP is fabricated in a 0.18 µm UMC mixed-mode process.
Filter and Tracklet Preprocessor
18+1 channels, each: ADC → digital filter (DFIL: nonlinearity, offset, gain, tail cancellation, crosstalk) → event buffer (64 timebins deep) → condition check (Q, hit) → COG position calculation (LUT) → parameter calculation.
COG = (A_L − A_R) / A_C
The Hit Select Unit (max. 4 hits) feeds the FIT register file and the tracklet selection; for the CPUs (CPU0..CPU3) the FIT register file is a read-only register file.
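The centre-of-gravity formula above can be illustrated with a short sketch. Python is used purely for illustration (the chip computes this in fixed-point hardware with a LUT-based position correction), and the pad amplitudes below are made up:

```python
# Centre-of-gravity (COG) position of a hit across three adjacent pads,
# following the slide's formula COG = (A_L - A_R) / A_C.

def cog(a_left, a_center, a_right):
    """Relative hit position within the central pad."""
    return (a_left - a_right) / a_center

# A hit slightly left of the pad centre deposits more charge on the
# left neighbour than on the right one:
print(cog(30, 100, 10))   # 0.2 -> displaced towards the left pad
print(cog(20, 100, 20))   # 0.0 -> centred on the pad
```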
Filter & Preprocessor
• 21 digital channels from the ADCs (10 MHz)
• filter & event buffer per channel: history FIFO, nonlinearity correction, pedestal correction, gain adjustment, tail cancellation, crosstalk cancellation; 21x event buffer over the time bins
• preprocessor: channel selection, position calculation (COG from the amplitudes on pads y−1, y, y+1), position correction (LUT), selection of candidates (max. 4 positions), calculation of the sums for regression, max. 4 candidates to the processor
[Figure: data path widths through the filter chain; a tracklet is described by its deflection and origin]
Tracking Arithmetics
Hit selection: timer, pedestal subtraction, condition on the neighbouring charges (Q_{i+1}(t) − Q_{i−1}(t) compared against Q_i(t)), look-up table for the position correction.
For each accepted hit the sums Σ Q(t), Σ Y(t), Σ Y²(t), Σ X(t)·Y(t), Σ X²(t), Σ X(t) and the hit count (+1) are accumulated in the Fit Register File.
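The straight-line fit behind these accumulated sums can be sketched as follows. The hardware only keeps the running sums per fitting unit; the divisions for slope and offset happen later in the CPUs. The sample points below are made up for illustration:

```python
# Least-squares straight-line fit from running sums, as accumulated in
# the Fit Register File (hit count N and the sums of X, Y, X^2, X*Y).

def line_fit(n, sx, sy, sxx, sxy):
    """Slope and offset of y = slope*x + offset from the sums."""
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    offset = (sy - slope * sx) / n
    return slope, offset

# Accumulate the sums time bin by time bin, as the preprocessor does:
points = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]   # lies on y = 2x + 1
n = sx = sy = sxx = sxy = 0
for x, y in points:
    n += 1; sx += x; sy += y; sxx += x * x; sxy += x * y
print(line_fit(n, sx, sy, sxx, sxy))   # (2.0, 1.0)
```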
The MIMD Architecture
• four RISC CPUs (CPU 0-3)
• coupled by registers (GRF) and quad-ported data memory (D-MEM)
• register coupling to the preprocessor (FIT bus)
• global bus for periphery
• local busses for communication, event buffer read and direct ADC read
• I-MEM: 4 single-ported SRAMs
• serial interface for configuration
• IRQ controller for each CPU
• counter/timer/PsRG for each CPU and one on the global bus
• low-power design, CPU clocks gated individually
[Block diagram: CPUs 0-3 with IMEMs and local busses around GRF, FIT bus, event buffer, interrupt, Cnt/Timer, Const., Config., network interface and the quad-ported D-MEM]
MIMD Processor
Preprocessor delivers 4 sets of fit data. MIMD processor:
• 4 CPUs
• shared memory / register file
• global I/O bus arbiter
• separate instruction memory (IMEM)
• coupled data & control paths
Each CPU:
• Harvard-style architecture
• two-stage pipeline
• 32-bit data path
• register-to-register operations
• fast ALU: 32x32 multiplication, 64/32 radix-4 divider
• maskable interrupts
• synchronization mechanisms
[Figure: CPU0 with PC, decoder, pipeline register, operand select/write-back and ALU; register files CON, GRF, FIT, PRF; DMEM; local I/O busses and the I/O bus arbiter on the global I/O bus; clocks, reset, power control, external interrupts]
Local and Global IO
• CPU 0, 1, 2, 3 and the configuration unit issue load/store instructions with req/we, r/w address (16 bit), write data and read data (32 bit each) to the arbiter and the global bus devices.
• No tri-state: the output data are ORed, the non-selected devices respond with 0.
• Synchronous read/write on the global bus (arbiter); the access time can be programmed.
• Read has priority over write; the configuration unit has priority over CPU 0, 1, 2, 3.
• The local bus uses the same r/w address and w_data signals. The read data register of the global bus is a read-only device on the local bus.
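The tri-state-free read bus can be sketched in a few lines: every device always drives its read data, non-selected devices drive 0, and the bus value is the bitwise OR of all drivers. Device addresses and data below are invented for illustration:

```python
# Sketch of an ORed read bus: no tri-state drivers, non-selected
# devices output 0, and the bus collects the bitwise OR of everything.

def bus_read(addr, devices):
    """OR together the responses of all devices on the global bus."""
    value = 0
    for dev_addr, dev_data in devices:
        drive = dev_data if dev_addr == addr else 0   # non-selected -> 0
        value |= drive
    return value

devices = [(0x10, 0xCAFE), (0x11, 0xBEEF), (0x12, 0x1234)]
print(hex(bus_read(0x11, devices)))   # 0xbeef
print(hex(bus_read(0x99, devices)))   # 0x0 (no device selected)
```

In hardware this OR tree replaces a tri-state bus, which avoids bus contention and is friendlier to synthesis.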
SCSN - Slow Control Serial Network
Two serial rings (ring 0 and ring 1) connect a master (DCS) to the slaves; the rings can be bridged at a slave.
• up to 126 slaves per ring
• CRC protected
• 24 MBit/s transfer rate
• 16 address bits, 32 data bits/frame
NI Datapath
• local & global I/O interfaces (local I/O 0-3 and global I/O, 16 bit each)
• input ports (port0..port3, 10 bit) with data resynchronization and DDR decoding, each in its own clock domain
• input FIFOs 64x16 (zero latency)
• port mux to define the readout order
• output port (port4) with DDR encoding and programmable delay units
[Figure: processor (CPUs 1-4, IMEM, DMEM, GRF, global bus arbiter, config) connected to the Network Interface]
TRAP development – a long, long way
• Beginning of 2001: MCM for 8 channels with the first prototypes of the digital chip (FaRo-1) and preamplifier, commercial ADCs
• Summer of 2002: first tested TRAP chip, in "spider" mode
In total ≈60,000 lines (synthesis) and 18,000 lines (simulation) of VHDL code.
TRAP1 bonded on a MCM
[Die photo with numbered blocks: 1. ADC, 2. Filter/Preprocessor, 3. DMEM, 4. CPUs, 5. Network Interface, 6. IMEM; on-chip FuseID]
TRAP Layout
5x7 mm die:
• 21 ADC channels, 21 independent digital filter channels
• data buffer: 13-channel and 8-channel event buffers
• CPU 0-3 around the quad-port memory and GRF
• IMEM 0-3, network interface FIFO
TRAP internal tests
Each CPU exercises the memories and peripherals reachable from it: 4 x 4k x 24 IMEM, 1k x 32 DMEM, 256 x 32 data buffer, the ADCs, the 21x event buffer, port 3 and the global bus.
In total 434 configuration registers:
• on the global bus (arbiter/SCSN slave): StateM (~25), NI (~12), IRQ (64), counters (8), Const (20), together ~130
• in the filter/preprocessor: LUT nonlinearity (64), gain correction (42), LUT position (128), filter/preprocessor (~44), together ~280
Test flow of the MCM testing
• apply the voltages (+1.8Vd, +3.3Vd, +1.8Va, +3.3Va) with programmable supply voltages, control the currents
• JTAG connectivity test
• basic test using SCSN (serial configuration bus)
• test of all internal components using the CPUs
• test of the fast readout (LVDS, 8 bit 120 MHz DDR, through FPGA1/FPGA2 and the neighbouring MCM0-MCM3 around the DUT)
• test of the ADCs by applying a 200 kHz sine wave (programmable sine-wave generator)
• test of the PASA by applying voltage steps through serial capacitors (programmable step generator)
• store all data for each MCM in a separate directory, store the essential results in an XML file
• export the result for MCM marking and sorting
Control runs over SCSN from a PCI card in a PC.
MCM Tester and results
• test of 3x3 or 4x4 MCMs
• digital camera with pattern recognition software for precise positioning using an X-Y table
• vertical lift for contacting
• about 1 min/MCM for positioning and test
• store the result into a DB
• mark the tested MCMs later with serial number and test result code
Results: GOOD 86%, NM 3%, BM 6%, CM 1%, BAD 4%. (Tester by T. Blank, FZK IPE, Karlsruhe)
TRAP wafer test and results
Setup: programmable power supply with current measurement, programmable sine-wave generator, parallel readout, serial configuration interface. 576 TRAP chips/wafer; fully automatic partial test of the TRAP.
Up to now 201 wafers produced and tested with ~129,000 TRAPs, of them ~98,000 usable.
[Chart: per-run yields between 76% and 100% for the production runs 06/02 … 08/09, with 3 to 49 wafers per run]
Optical Readout Interface (ORI)
Data path: HCM (TRAP), 120 MHz 8-bit DDR → CPLD (DDR → SDR conversion to 16 bit; resynchronization, status, counters) → SERDES (2.5 GBit/s, 125 MHz reference) → laser driver → VCSEL laser diode, 850 nm. Latency contributions along the chain: +24 ns, +24 ns, +300 ns. Configuration memory attached via I2C; LVDS-TTL level conversion.
Magnetic field & radiation tolerant!
All 1200 produced and tested, 1199 of them fully functional.
DCS Board
• ARM based – technology
• 100k FPGA – flexibility
• 32 MB SDRAM
• LINUX system with EasyNet
Power Distribution Box
Actel antifuse FPGA; switches the power supply to 30 DCS boards on and off; control of 9 PDB/2 (Altera ARM+FPGA). 540 DCS boards in ALICE TRD.
Design of VLSI Circuits using VHDL
The ALICE TRD Global Tracking Unit
An Example of a large FPGA-based System in High-energy Physics
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
A Typical Example
• High-energy & heavy ion experiments:
– Huge amounts of data,
selection of interesting events → triggers
– Performance limited by data processing
power of front-end electronics
• Requirements for electronics:
– Complex trigger algorithms, very short decision times
→ high-performance & low latency processing
– Advanced trigger interlacing strategies to minimize
detector dead times (multi-event buffering)
→ high bandwidth data paths
– Demands change quickly as research advances
→ flexibility
The Large Hadron Collider
LHC
• p-p @ 14 TeV
• Pb-Pb @ 1150 TeV
[Figure: LHC ring with the ATLAS, CMS and LHCb experiments]
The Experiment ALICE
ALICE
• research on the Quark-Gluon Plasma
• many detectors covering a wide momentum range & PID
• designed for high-multiplicity events in Pb-Pb collisions
[Figure: LHC ring with the ALICE, ATLAS, CMS and LHCb interaction points]
ALICE & TRD
Transition Radiation Detector (TRD): 540 drift chambers in 6 layers, 90 stacks, 18 super-modules, |η| ≤ 0.9.
[Figure: two lead nuclei at ~ speed of light colliding in the detector]
Task of the TRD
• high multiplicities: up to 8,000 charged tracks in acceptance
• fast trigger detector: L1 trigger after 6.2 µs
• barrel tracking detector: raw data
[ALICE event display with a high-pt track in the TRD]
The TRD Data Chain
The pre-trigger from the fast detectors (T0, TOF, V0, ACC) starts the on-detector front-end electronics (inside the magnet, 65,564 ASICs, L0/L1 trigger). Tracklets and raw data travel over 1080 optical fibres (2.7 TBit/s) to the Global Tracking Unit (109 FPGAs): online track reconstruction and event buffering. Triggers go to the ALICE CTP (L1, L2), raw data to the ALICE DAQ.
On-Detector Data Processing
A particle track crossing the radiator material and the 3 cm drift volume produces hits; the stiff track segment is parameterized as a 32-bit tracklet word (deflection, position y, z, PID).
• 540 drift chambers, 6 stacked radially, 18 sectors in azimuth
• 1.4 million analog channels, 10 MHz sampling rate
• 65,564 multi-chip modules, 262,256 custom CPUs
• up to 20,000 tracklet words, 32 bit wide
• massively parallel calculations: hit detection, straight-line fit, PID information
• transmission out of the magnet via 1080 optical fibres operating at 2.5 GBit/s
• tracklets available 4.5 µs after the collision
• 2.1 TBit/s total bandwidth
Tight Timing Requirements!
Level-0 trigger < 1.8 µs and Level-1 trigger < 6.2 µs after the collision. Within this window: drift time, fit calculation, tracklet building, tracklet shipping, tracking & trigger, TRD Level-1 trigger contribution shipping; raw data shipping follows (0-8 µs after the collision).
[Timing diagram: collision → Level-0 trigger (1.2 µs) → Level-1 trigger (5.0 µs later) → Level-2 trigger window (73.8 µs to 493.8 µs); tracklets and raw data, accept/reject, data forward to DAQ]
Global Tracking Unit
• fast L1 trigger after 6.2 µs
• detection & reconstruction of high-momentum tracks
• calculation of momenta
• various trigger schemes: di-lepton decays (J/ψ, ϒ), jets, ultra-peripheral collisions, cosmics
• raw data buffering
• multi-event buffering & forwarding to the data acquisition system
• supports interlaced triggers, multi-event buffering, dynamic sizes
• 109 boards with large FPGAs in three 19" racks outside of the magnet
[Photos: GTU segment for one TRD supermodule; patch panel with 60 fibres for one TRD supermodule]
3-Tier Architecture
Tracklets from the front-end (1,080 optical fibres, 2.1 TBit/s) enter 90 processing nodes (one per stack, 2.9 GByte/s into each node), which perform the online track reconstruction on all track segments of one detector stack and output high-pt tracks. Per supermodule (···18x···) a concentrator node (trigger stage 1) collects high-pt tracks + trigger information and forwards the raw data to DAQ (<3.5 GByte/s). A top trigger node (trigger stage 2) combines the 18 concentrator nodes into the TRD L1/L2 trigger contribution (a few bits, ~200 Hz).
Processing Node
• inputs: tracklets & raw data via 12 optical data streams at 2.5 GBit/s each → 2.9 GByte/s per node, 261 GByte/s total
• data push architecture → capture at full bandwidth of 2.1 TBit/s
• tasks: online track reconstruction, multi-event buffering
[Board photo: Virtex-4 FX100 FPGA; 12 SFP 850 nm transceivers (data from one detector stack); custom LVDS I/O, 72 pairs; 3 parallel links (240 MHz DDR, 8 bit LVDS) to the SMU, from left TMU / to right TMU; JTAG; CompactPCI bus; DDR2 SRAM as high-bandwidth (28.8 GBit/s) data buffer]
Virtex-4 FPGA resources (excerpts from the Xilinx documentation)
• Configurable Logic Blocks (CLBs): the main logic resource for implementing sequential as well as combinatorial circuits. Each CLB element is connected to a switch matrix to access the general routing matrix and contains four interconnected slices, grouped in pairs per column: SLICEM (left pair; logic, distributed RAM or shift register) and SLICEL (right pair; logic only). Each pair in a column has an independent carry chain; only the SLICEM slices share a shift chain. One CLB = four slices = maximum of 64 bits.
• Virtex-4 FX family (selection): XC4VFX60 with 56,880 logic cells and 25,280 slices; XC4VFX100 with 94,896 logic cells and 42,176 slices, 376 18-Kb block RAMs (6,768 Kb), 2 PowerPC processor blocks, 4 Ethernet MACs and 20 RocketIO transceivers; XC4VFX140 with 142,128 logic cells and 63,168 slices.
• 500 MHz XtremeDSP slices: each contains one 18x18 multiplier, an adder and a 48-bit accumulator; cascadeable multiply or MACC operation; optional pipeline stages for enhanced performance; integrated adder for complex-multiply or multiply-add.
• Xesium clock technology: up to twenty Digital Clock Manager (DCM) modules for precision clock deskew, phase shift and flexible frequency synthesis, with improved phase-shift resolution and reduced output jitter, plus companion Phase-Matched Clock Dividers (PMCDs).
• 500 MHz integrated block memory: up to 10 Mb of integrated block RAM.
Multi-Event Buffering
• Allows for a significant reduction of the detector dead time due to:
  – interleaved 3-level trigger sequences
  – decoupling of the front-end electronics operation from the data transmission to data acquisition, 2-stage readout: single-event buffers in each chip of the detector FEE (released by L1 accept), multi-event buffers for each stack in the GTU (L2 accept or discard), then to data acquisition & HLT
  – dynamic buffer allocation for strongly varying event sizes
  – buffers: 4-MBit SRAMs, 64-bit 200 MHz DDR interface
  – 12 independent 128-bit wide data streams at 200 MHz
Multi-Event Buffering II
• 12 independent data streams via 2.5 GBit/s links, in fabric as 16-bit streams at 125 MHz (net 1.94 GBit/s)
• de-randomizing/gap elimination, merging to a single dense 128-bit 200 MHz data stream to SRAM (>94% of all clock cycles, 23.3 GBit/s)
• allocation of separate memory regions for each link/event (12 independent ring buffers, 2 write + 1 read pointers)
[Figure: raw data and tracklet data streams from the detector (12x 16-bit at 125 MHz with comma/END markers, L0 trigger/L1A) → stream merger → 128-bit lines at 200 MHz → SRAM]
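The gap-eliminating stream merger can be sketched behaviourally. This is a toy model, not the GTU implementation: 12 sparse 16-bit input streams (with `None` standing for a comma/gap cycle) are drained round-robin into dense 128-bit lines of eight 16-bit words; the stream contents are invented:

```python
# Toy de-randomizing stream merger: drop gap cycles, service the 12
# input FIFOs round-robin, pack 16-bit words into dense 128-bit lines.
from collections import deque

def merge(streams, words_per_line=8):
    fifos = [deque(w for w in s if w is not None) for s in streams]  # gap elimination
    words, lines = [], []
    while any(fifos):
        for f in fifos:                        # round-robin service
            if f:
                words.append(f.popleft())
                if len(words) == words_per_line:
                    line = 0
                    for i, w in enumerate(words):
                        line |= w << (16 * i)  # pack into one 128-bit line
                    lines.append(line)
                    words = []
    return lines, words                        # full lines + leftover words

streams = [[i, None, i + 100] for i in range(12)]   # 24 payload words
lines, rest = merge(streams)
print(len(lines), len(rest))   # 3 0
```

The real design additionally runs the packed output side in a faster (200 MHz) clock domain than the 125 MHz inputs, which is what makes the >94% output utilization possible.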
Event Buffering Pipeline I
[Figure 7.5, from section 7.3 "The Event Shaper": the first three pipeline stages of the data path in the event shaper; the highlighted structures exist separately for each channel]
Event Buffering Pipeline II
[Figure 7.7: the second part of the data path in the event shaper — buffering of the event data, offset/address arithmetic and the SRAM interface]
The free space in a ring buffer for the case wp < rp (write pointer behind the read pointer) is

N_free, wp<rp = rp − wp − 1    (7.2)
Event Buffering Pipeline III
The lower part of Figure 7.13 shows the implementation of this calculation together with the components of the interface to the read-out unit.
[Figure 7.13: interface to the read-out unit and the arithmetic for overflow protection]
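The pointer arithmetic of equation (7.2) generalizes to both pointer orderings once the buffer size is known. A minimal sketch, with an arbitrary buffer size chosen for the example (one entry stays unused so that "full" and "empty" remain distinguishable):

```python
# Free-space calculation for a ring buffer with write pointer wp and
# read pointer rp, following eq. (7.2) for the case wp < rp and the
# wrapped case otherwise.
SIZE = 16   # arbitrary example size, in entries

def free_entries(wp, rp):
    if wp < rp:
        return rp - wp - 1            # eq. (7.2): N_free = rp - wp - 1
    return SIZE - (wp - rp) - 1       # wrapped case

print(free_entries(3, 10))   # 6
print(free_entries(10, 3))   # 8
print(free_entries(5, 5))    # 15 (buffer empty)
```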
Global Track Matching
• 3D track matching: find tracklets belonging to one track
• processing time less than approx. 1.5 µs
• integer arithmetic, logic & look-up tables
[Figure: 20° stack, particle track with charge clusters, tracklets extrapolated to a projection plane; track bendings and tracklet misorientations exaggerated]
Global Track Matching II
• projection of tracklets to virtual transverse planes
• intelligent sliding window algorithm: ∆y, ∆αVertex, ∆z
• massively parallel hardware implementation
[Figure: 20° stack, extrapolated tracklets, window ∆y around αVertex in the projection plane y]
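The window matching above can be sketched as a toy model. Window widths, units and the tracklet values are invented for illustration; the real design evaluates many such windows in parallel on projected integer coordinates:

```python
# Toy sliding-window matching: tracklets from the six layers are
# projected to a common plane; those whose projected y, deflection
# angle alpha and z all fall within the window widths around a seed
# count towards one track candidate.
DY, DALPHA, DZ = 3.0, 2.0, 1.0   # window widths (assumed units)

def matches(seed, tracklets):
    """Tracklets falling into the window around the seed tracklet."""
    y0, a0, z0 = seed
    return [t for t in tracklets
            if abs(t[0] - y0) <= DY
            and abs(t[1] - a0) <= DALPHA
            and abs(t[2] - z0) <= DZ]

seed = (10.0, 1.0, 0.0)                       # (y, alpha, z)
others = [(11.0, 0.5, 0.2), (12.5, 1.8, -0.4), (30.0, 1.0, 0.0)]
print(len(matches(seed, others)))   # 2 tracklets match the window
```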
Momentum Reconstruction
• assumption: the particle origin is at the collision point
• estimation of pt from the line parameter a: pt = const / a
• fast cut condition for the trigger: const ≤ pt,min · a
[Figure: track in the x-y plane, straight line with parameters a and b]
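The point of the cut condition is that it compares pt against pt,min without a divider: for a > 0, const ≤ pt,min · a is the same comparison as const/a ≤ pt,min. A small sketch, with an arbitrary constant (its real value depends on the magnetic field and geometry):

```python
# pt estimation and the division-free cut from the slide:
# pt = const / a, and the multiply-only form const <= pt_min * a.
CONST = 600.0   # arbitrary example value

def pt_estimate(a):
    return CONST / a

def cut_condition(a, pt_min):
    # same comparison as pt_estimate(a) <= pt_min, but without a division
    return CONST <= pt_min * a

a = 200.0
print(pt_estimate(a))          # 3.0
print(cut_condition(a, 4.0))   # True  (pt = 3.0 <= 4.0)
print(cut_condition(a, 2.0))   # False
```

In hardware the multiply-and-compare fits easily into one pipeline stage, whereas a division would cost many cycles or a large look-up table.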
An Example...
Online Track Matching I
[Figure 7.5: the architecture of the TMU trigger design. 12 optical links from a detector stack feed 6 input units (layers 0-5), each with Z-channels 0-2; 18 track finders (reference layers 1-3 × Z-channels 0-2) run in parallel, followed by a merging unit, a track parameter memory and the pt reconstruction unit; results go to the SMU trigger logic. The TMU receives the track segment data from a module stack and combines it to tracks in several processing stages.]
• up to 240 track segments/event
• 18 matching units running in parallel
• fully pipelined, data push architecture
• fast integer arithmetic and pre-computed look-up tables used
• high-precision pt reconstruction: ∆pt/pt < 2%
• 60 MHz clock
Input & Track Finder Unit
(Combination of the track segments into tracks)

[Block diagram: two optical links from one module feed an input unit (input controllers 0/1, y calculation unit, y projection unit, angle calculation unit, Z memory 40×21 bit, output register stage). The Z-channel units 0-2 sort the segments; each track finder holds two memories (A/B) per layer with counter registers for the read addresses, a combination logic, and a buffer/merging unit feeding the pt reconstruction unit.]

Captions (translated from the German diploma thesis):

Figure 5.3: Block diagram of an input unit. The input unit receives the track segment data from one detector module and performs the calculations that can be carried out independently per segment.

Figure 5.7: The calculation unit for the deflection angle. By adding the y coordinate, scaled according to the detector layer, the deflection angle with respect to the vertex direction is obtained in good approximation from the deflection of the track segment.

Figure 5.18: The structure of a track finder unit. The track segment data from all six layers are accepted in parallel and stored in two memories per layer. Each memory block has a counter register for the read address. The combination logic compares the values read from the memories, checks the matching criteria, decides whether the track segments together form a track, and determines in every clock cycle how the counter registers are incremented. Since two consecutive data words of each row are inspected at the same time, memories with two independent read ports would be needed; shown here is the alternative structure with two independent memories written in parallel. The memories receive their data directly from the sorters of the corresponding Z-channel units; the write address for each layer comes from a common counter that is incremented by one for every track segment.

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
26 /48
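The matching idea behind the track finder (tracklets from all six layers compared against a reference within a position window, with the read counters advanced each clock) can be sketched in software. The window size, choice of reference layer, and minimum layer count below are illustrative placeholders, not the GTU's actual parameters:

```python
# Sketch of the track-finder matching idea (not the actual GTU VHDL):
# tracklets from six layers, each projected to a common reference plane,
# are combined into a track when enough layers agree within a window.

def find_tracks(layers, y_window=4, min_layers=4):
    """layers: six lists of projected y positions, one list per layer.
    Seeds on the reference layer and returns (mean_y, n_hits) per
    track candidate. Reference layer index is an assumption here."""
    tracks = []
    ref = layers[2]
    for y_ref in ref:
        hits = []
        for lay in layers:
            # closest tracklet in this layer, accepted if within window
            best = min(lay, key=lambda y: abs(y - y_ref), default=None)
            if best is not None and abs(best - y_ref) <= y_window:
                hits.append(best)
        if len(hits) >= min_layers:
            tracks.append((sum(hits) / len(hits), len(hits)))
    return tracks
```

The hardware does this fully pipelined over sorted memories rather than with a nested search, but the acceptance criterion is the same kind of windowed comparison.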
pt Reconstruction Unit I

Track segment word fields: PID signature, pad row, y position, deflection length.

• Fully pipelined, data push architecture
• Optimized for low latency
• High precision pt reconstruction, ∆pt/pt < 2.5%
• Uses addition, multiplication and pre-computed look-up tables
• 60 MHz clock

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
27 /48
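Restricting the pipeline to addition, multiplication and memory reads means any division is replaced by a pre-computed reciprocal table. A minimal software sketch of that trick follows; the table width, fixed-point format and scale constant are invented for illustration and are not the unit's real parameters:

```python
# Illustrative reciprocal look-up table, as used to avoid a hardware
# divider in fixed-point pt reconstruction (all constants made up).

N_BITS = 10
# entry 0 is a guard value; real inputs are clamped to 1..2^N_BITS-1
RECIP_LUT = [0] + [round((1 << 16) / x) for x in range(1, 1 << N_BITS)]

def pt_from_curvature(c_fixed, scale=0.003):
    """Approximate pt ~ scale / |curvature| using the pre-computed
    reciprocal table instead of a division."""
    idx = min(abs(c_fixed), (1 << N_BITS) - 1)
    return scale * RECIP_LUT[idx] / (1 << 16)
```

The table costs one block RAM; the per-track work reduces to an address clamp, a table read and one multiplication, which fits a 60 MHz pipeline easily.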
Embedded PowerPC System

[Block diagram: a PLB bus at 100 MHz connects two PowerPC cores (400 MHz) with the SDRAM controller (64 MByte memory), SD card controller (4 GByte mass storage), SRAM controller, UART, EMAC (to the DCS board network switch), BRAM with bootloader, and the MGTs; the L2 trigger data path connects via the system backplane and a configuration/status interface.]

• Two PowerPC cores:
  – Linux operating system: monitoring & control (PetaLinux/MontaVista)
  – HW/SW co-design (planned): Level-2 trigger calculations, real-time monitoring & control
• Bus components:
  – DDR2 SDRAM controller
  – UART, Gigabit Ethernet
  – SD card controller
  – SRAM controller
  – Configuration & status interface

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
28 /48
TMU Design Resource Usage

[Floorplan regions: PowerPC, Event Buffering, MGT, Trigger]

• 38,601 slices occupied (91%)
  – 45,716 logic LUTs (54%)
  – 53,500 LUTs total (63%)
  – 29,936 FFs (35%)
• 4 DCMs (33%), 1 PMCD (12%), 17 BUFGs (63%)
• 165 BRAMs (43%)
• 12 MGTs (60%), 9 DSPs (5%), 2 PowerPCs
• 345 IOBs (56%)
• Gate equivalent: 11,625,408

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
29 /48
TMU Design Resource Usage

[Floorplan regions: PowerPC, Event Buffering, MGT, Trigger]

Resource   Event Buffering   Tracking
FF         10,921            8,858
LUT        5,940             24,086
BRAM       14                78
DMEM       19 / 0            93 / 1,128

Embedded PowerPC System: 4,003 FFs, 4,068 LUTs, 69 BRAMs

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
30 /48
VHDL Code

Design Part          Number of non-blank lines
Total                204,445
TMU                  86,693
  Synthesis          35,458
  Event Buffering    15,127
  Tracking           41,144
  Simulation         10,023
SMU                  40,371
  Synthesis          35,458
  Simulation         49,130
TGU                  16,878
Common/Shared        60,055
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
31 /48
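The figures above are counts of non-blank source lines. A count like this can be reproduced with a few lines of script; the directory layout in the usage comment is hypothetical:

```python
# Count non-blank lines in a set of VHDL sources, as in the table above.

from pathlib import Path

def count_nonblank(paths):
    total = 0
    for p in paths:
        with open(p, encoding="utf-8", errors="replace") as f:
            # a line counts if it contains anything but whitespace
            total += sum(1 for line in f if line.strip())
    return total

# usage (example path): count_nonblank(Path("tmu/src").rglob("*.vhd"))
```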
GTU Tracking Timing

• Computation latency depends on the number and content of the tracklets: 550 ns offset, rising only slightly
• Total latency depends heavily on the number of tracklets
• Full hardware simulation with ModelSim and testbench

[Plots (from the PhD thesis by J. de Cuveland): (left) normalized distribution of the data transmission and TMU trigger computation time in ns, for 6 to 180 segments per stack; (right) average processing time in ns vs. segments per stack, split into track segment transmission, segment buffering and synchronization, and track reconstruction / TMU trigger calculation.]

The total trigger processing time in the TMU is not constant but depends heavily on the number of received track segments, and thus on average on the multiplicity.

Figure 9.12: The total trigger processing time in the TMU can be divided into three contributions with different dependencies on the number of segments per stack.

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
32 /48
TRD Beam Test at CERN

• 1 TRD supermodule, 1 GTU segment
• Accelerator: CERN Proton Synchrotron (PS)
• Particles: electrons, pions (transverse momenta: 0.5 - 6 GeV/c)
• Good statistics for detector calibration (more than 1 million events per momentum value)
• 8 days of continuous operation
• First run with tracklets, consistent with raw data

November 2007 beam test setup at the CERN Proton Synchrotron.

[Plot: single tracklet deflection precision, deflection error in layer 0; mean: -0.0916 cm, RMS: 0.119 cm]

Figure 9.2: Display of a single event in a TRD stack. The color represents the measured ADC count for a given position. The red lines depict the parametrized track segments as received by the GTU for the same event. The width of the drift chambers is exaggerated (not to scale).

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
33 /48
Simplified Event
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
34 /48
Simplified Event II
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
35 /48
Simplified Event III
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
36 /48
Realistic Pb-Pb Event
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
37 /48
Concentrator Node
• Inputs: reconstructed tracks from first tier
& raw data
• Tasks:
– Apply trigger schemes
– Interface to the data acquisition system:
process trigger sequences and read-out
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
38 /48
Concentrator Node
[Board photo labels:]
• SFP modules: 1000Base-SX to switches
• DDR2 SDRAM: 64 MByte
• Link to ALICE DAQ system
• From TMU 0-4: 5 parallel links (240 MHz DDR, 8 bit LVDS)
• Custom LVDS I/O: 72 pairs
• SD card slot: 4 GByte SDHC cards
• To TGU
• JTAG
• CompactPCI bus
• Interface to ALICE TTC system

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
39 /48
FSM Example: Trigger Handling

[State diagram: after reset the FSM is in Idle. An L0 trigger starts the L1 window timer; an erroneous L0 restarts it (Tr_flush). An L1 arriving while the window is valid (L1_window_ok) starts the L2 window timer; an erroneous L1 restarts it, L0 being assumed correct. Within the L2 window either an L2 accept (L2a_rcv) or an L2 reject (L2r_rcv) is received; SoD/EoD messages are decoded separately (SoD_EoD_dec). On end of event (eoe), on L2_timeout, or with no_error the FSM returns to Idle.]

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
40 /48
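The trigger-sequence FSM above can be modeled as a transition table. The state and event names below follow the diagram only loosely, with the window timers abstracted to explicit timeout events; this is a sketch, not the actual SMU logic:

```python
# Minimal model of the trigger-handling FSM: states and events are
# simplified placeholders, window timers appear as 'timeout' events.

TRANS = {
    ("Idle", "L0"): "L1_window",
    ("L1_window", "L1_ok"): "L2_window",
    ("L1_window", "timeout"): "Idle",   # no L1 inside window: flush
    ("L2_window", "L2a"): "Readout",    # L2 accept
    ("L2_window", "L2r"): "Idle",       # L2 reject
    ("L2_window", "timeout"): "Idle",
    ("Readout", "eoe"): "Idle",         # end of event
}

def step(state, event):
    # unknown (state, event) pairs leave the state unchanged
    return TRANS.get((state, event), state)
```

Encoding the FSM as one table keeps every legal transition visible at a glance, which mirrors how the VHDL case statement for such a controller is usually reviewed.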
Trigger Schemes
• Cosmic Trigger
• Jet Trigger
– Simple jet definition: more than certain number of
high-pt tracks through a given detector volume
– Additional conditions: jet location, coincidences, ...
– Ntracks=1: single high-pt particle trigger
• Di-Lepton Decay Trigger
– Coincidence of high-pt e± tracks
– Calculation of invariant mass for higher selectivity
• Various Other Schemes
– Ultra-peripheral collisions
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
41 /48
Cosmics Trigger
• Chamber: min ≤ sum of charge/hits ≤ max
• Stack: min ≤ chambers hit ≤ max
• Supermodule: min ≤ stacks hit ≤ max
• Detector: coincidence between
supermodules
[Diagram: one condition block (C) per chamber; per-stack and per-supermodule trigger outputs (trg) are combined into the detector-level coincidence]

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
42 /48
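Every level of this hierarchy applies the same kind of min/max window to a count produced by the level below. A sketch of the chamber-to-stack step, with placeholder thresholds:

```python
# Cosmic trigger condition sketch: windowed counts, level by level.
# Threshold values below are placeholders, not the real configuration.

def in_window(n, lo, hi):
    return lo <= n <= hi

def stack_triggers(chamber_hits, ch_lo=1, ch_hi=6, st_lo=2, st_hi=6):
    """chamber_hits: hit (or charge) counts per chamber of one stack.
    A chamber fires if its count lies in [ch_lo, ch_hi]; the stack
    fires if the number of firing chambers lies in [st_lo, st_hi]."""
    chambers_ok = sum(in_window(h, ch_lo, ch_hi) for h in chamber_hits)
    return in_window(chambers_ok, st_lo, st_hi)
```

The supermodule and detector levels repeat the same pattern on the stack and supermodule outputs, which is why the hardware reduces to identical comparator blocks.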
Cosmic Event Triggered
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
43 /48
Cosmic Event Triggered II

4 super-modules were commissioned successfully last year; ALICE global cosmic runs performed (Dec. 2007, Feb. 2008, and ...; the last one continued as LHC run).

K. Oyama, Uni Heidelberg (TIPP09, Mar. 13, 2009)

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
44 /48
Jet Trigger

• Identify tracks with pt ≥ pt,threshold within a certain region
• Threshold conditions:
  – Number of tracks
  – Sum of momenta of tracks
• Granularity: sub-stack-sized areas overlapping in z- and Φ-direction
• Realizable at first trigger stage
• Multi-jet coincidence at top level

[Simulation plot: jet trigger efficiency, p-p @ 10 TeV, pt-hard 86-180 GeV; B. Bathen, Uni Münster]

© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
45 /48
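Overlapping sub-stack-sized areas amount to a sliding-window count of high-pt tracks. A sketch of the simplest threshold condition (bin count, window width and thresholds are invented for illustration):

```python
# Sliding-window jet condition sketch: fire if any overlapping region
# contains at least n_min tracks above the pt threshold.

def jet_trigger(tracks, pt_thr=3.0, n_min=3, width=2, n_bins=10):
    """tracks: list of (z_bin, pt) pairs. Regions are `width` bins
    wide and advance by one bin, so neighbours overlap."""
    counts = [0] * n_bins
    for z, pt in tracks:
        if pt >= pt_thr and 0 <= z < n_bins:
            counts[z] += 1
    return any(sum(counts[i:i + width]) >= n_min
               for i in range(n_bins - width + 1))
```

The sum-of-momenta variant only changes what is accumulated per bin; the windowing stays identical, so both conditions share the same hardware structure.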
Di-Lepton Trigger
• Find e+e- pairs with invariant mass within
certain range (J/ψ, ϒ, ...)
• Huge combinatorics for Pb-Pb collisions
• Current work:
– Pre-selection of track candidates, application of
sliding window algorithms
– Massively parallelized invariant mass calculation
in FPGA hardware
– Fast trigger contribution for Level-1 (after 6 µs);
more elaborate decision for Level-2 (80 µs)
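For the invariant-mass selection, each e± candidate's energy can be approximated by its momentum (the electron mass is negligible at these momenta). A small sketch of the per-pair calculation that the FPGA performs massively in parallel:

```python
import math

# Invariant mass of an e+e- candidate pair from the two track momenta,
# neglecting the electron mass: E ~ |p| for each leg.

def inv_mass(p1, p2):
    """p1, p2: (px, py, pz) in GeV/c. Returns m in GeV/c^2."""
    e1 = math.sqrt(sum(c * c for c in p1))
    e2 = math.sqrt(sum(c * c for c in p2))
    px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = (e1 + e2) ** 2 - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # clamp rounding noise at zero
```

A back-to-back pair with 1.5485 GeV/c per leg lands at the J/ψ mass of about 3.097 GeV/c², which is the kind of window the trigger would cut on.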
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
46 /48
Waiting For LHC Start-Up
LHC restart
scheduled for end of
year...
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
47 /48
The GTU People
Venelin Angelov,
Jan de Cuveland,
Stefan Kirsch,
and Felix Rettig
Former members:
Thomas Gerlach,
Marcel Schuh
Prof. Volker Lindenstruth
Chair of Computer Science
Kirchhoff Institute of Physics
University of Heidelberg
Germany
http://www.ti.uni-hd.de
© V. Angelov, F. Rettig
VHDL Vorlesung SS2009
48 /48