Mota’s Ph.D. Thesis - Paulo Moreira - Home Page

UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Design and Characterization of CMOS
High-Resolution Time-to-Digital
Converters
Manuel José dos Reis Gaspar Seabra Mota
(Licenciado)
Dissertation submitted for the degree of Doctor in
Electrical and Computer Engineering
Supervisor: Doctor José de Albuquerque Epifânio da Franca
Chair: Rector of the Universidade Técnica de Lisboa
Committee members:
Doctor Dinis Gomes Magalhães dos Santos
Doctor Moisés Simões Piedade
Doctor José de Albuquerque Epifânio da Franca
Doctor Diamantino Rui da Silva Freitas
Doctor António Manuel da Cruz Serra
Doctor João Paulo Calado Cordeiro Vital
Doctor Alessandro Marchioro
October 2000
UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Design and Experimental Characterisation
of CMOS Integrated Circuits for
High-Resolution Time Interval Measurement
(Portuguese title: Projecto e Caracterização Experimental de Circuitos Integrados CMOS para Medição de Intervalos de Tempo com Alta Resolução)
Abstract
The subject of this thesis is the development and evaluation of high-resolution
Time-to-Digital Converter (TDC) architectures suitable for measuring very short time
intervals in the context of the Time-of-Flight detector of the ALICE experiment.
The selected architectures can measure time intervals with a Root Mean Square
(RMS) resolution better than 50ps over a large dynamic range. Besides their timing
characteristics, these architectures enable the design of highly integrated, multi-channel
converter ASICs operating with low power dissipation.
The circuits developed are based on Delay Locked Loop (DLL) architectures. The
feedback control loop of the DLL keeps the time measurements permanently calibrated
against a periodic reference signal. Schemes that achieve fine time interpolation without
a penalty in power dissipation or in sensitivity to environmental changes (supply voltage
or temperature) are investigated and implemented. Two approaches were selected and
analysed in detail: one uses several phase-shifted DLLs and the other a passive RC delay
line. The prototypes implementing these schemes were built in a standard 0.7µm CMOS
technology. With the first approach, an RMS resolution of 34.5ps was measured across a
dynamic range of 3.2µs; with the second, an RMS resolution of 21ps was obtained.
Keywords
Time-to-Digital Converter (TDC), Delay Locked Loop (DLL), self-calibration,
high-resolution, multi-channel, passive RC delay lines.
Resumo (Portuguese Abstract)
The objective of this thesis is the evaluation and development of high-resolution
Time-to-Digital Conversion architectures suited to the measurement of very short time
intervals, in the context of the Time-of-Flight detector of the ALICE experiment.
The selected architectures are able to measure time intervals with a resolution better
than 50ps (Root Mean Square deviation, RMS) over a wide dynamic range. Besides the
timing characteristics of these converters, their architectures allow the implementation of
multi-channel application-specific integrated circuits operating with low power
dissipation.
The circuits developed are based on closed Delay Locked Loops (DLL). The negative
feedback of the DLL ensures that the time measurements are permanently calibrated
against a periodic reference signal. Schemes allowing very fine time interpolation without
significantly increasing the power dissipation or the sensitivity to changes in
environmental conditions (supply voltage or operating temperature) were investigated and
implemented. Two of these schemes were selected and analysed in detail. One uses
several DLLs with a fixed phase offset and the other a passive RC delay line. The
prototypes implementing these schemes use a 0.7µm CMOS technology. With these
prototypes, resolutions of 34.5ps (RMS) over a dynamic range of 3.2µs and of 21ps (RMS)
were obtained, respectively.
Keywords
Time-to-Digital Converter (TDC), Delay Locked Loop (DLL), self-calibration, high resolution, multi-channel, passive RC delay lines.
Acknowledgements
It goes without saying that I am indebted to everyone whose contributions, small
and large, made my work and my life easier while I was working on this thesis; the list
of their names would be too long to write down. However, I wish to
acknowledge in particular the help of my colleagues Jorgen Christiansen and Paulo
Moreira who had the kindness and patience to answer all my questions and whose
guidance and experience helped me to advance this work in the best direction.
I also acknowledge the help of my supervisor, José Epifânio da Franca, who was
always attentive to my requirements, even the most pressing ones.
I thank Gaspar Barreira and Paulo Gomes who started it all and Alessandro
Marchioro and Mike Letheren who welcomed me into the microelectronics group at
CERN and provided me with the proper means and environment to proceed with my
work.
An acknowledgement is also due to JNICT, whose support made it all possible¹, and
to LIP, where the brave new world of microelectronics and High Energy
Physics was first shown to me.
Since life is not only work, even when that work is exciting, I cheerfully greet the
friends I met in Geneva, whose warmth and imagination made life abroad very interesting.
A final word is reserved for my family and friends back in Portugal, who always found the
right way to let me know they cared, even after I had been away for so long.
¹ The author is supported by a grant from the Junta Nacional de Investigação Científica e Tecnológica
(JNICT) under the “Sub-Programa Ciência e Tecnologia do 2º Quadro Comunitário de Apoio” (Science and Technology Sub-Programme of the 2nd Community Support Framework).
Contents.
PART I. Introduction.
1. Introduction and Structure of this Work.
2. Time Interval Measurements in HEP Experiments – An Introduction.
2.1. High Energy Physics experiments.
2.1.1. A HEP experiment at CERN: ALICE.
2.2. High resolution time interval measurements in ALICE.
3. Conversion Basics.
3.1. Performance metrics.
3.2. Error sources.
3.3. Converter calibration.
4. Review of TDC Architectures.
4.1. Overview of TDC architectures.
4.1.1. Current integration techniques.
4.1.2. Counter techniques.
4.1.3. Delay line-based techniques.
4.1.4. Phase Locked Loop (PLL) techniques.
4.1.5. Delay Locked Loop (DLL) techniques.
4.2. Beyond the limits of the technology: techniques to improve resolution.
4.2.1. Analogue time expansion.
4.2.2. Vernier differences.
4.2.3. Analogue time interpolation.
4.2.4. Array of coupled oscillators.
4.2.5. Array of Delay Locked Loops.
4.2.6. Time interpolation using passive RC delay lines.
4.3. Summary of characteristics of the TDC architectures.
References for Part I.
PART II. A TDC Architecture based on an Array of Delay Locked Loops.
5. Architecture Overview.
5.1. The Delay Locked Loop (DLL).
5.2. The Array of DLLs (ADLL).
5.3. Conversion dynamic range.
5.4. Time critical paths.
5.5. Measurement acquisition and storage.
5.6. Read-out architecture.
5.7. The prototype.
5.7.1. Performance analysis.
6. Analysis of the Limits to the TDC Resolution.
6.1. Non-linearity due to cell mismatch.
6.1.1. Origins of mismatch.
6.1.2. Effects of cell delay mismatch.
6.2. Jitter due to internal phase noise.
6.3. Non-linearity due to static phase error.
6.3.1. Effects of phase detector’s phase error.
6.3.2. Effects of phase detector input path’s mismatch.
6.3.3. Effects of unbalanced conditions of the cells in the extremes of the delay chain.
6.3.4. Effects of propagation delay on the sampling signal path.
6.3.5. Overall non-linearity due to static phase error.
7. Detailed Implementation.
7.1. DLL building blocks.
7.1.1. Phase detector.
7.1.2. Charge-pump and loop filter.
7.1.3. Delay cell.
7.1.4. Delay chain.
7.1.5. Closed control loop.
7.1.6. Initialisation procedure.
7.2. The ADLL.
7.3. Channel memory.
7.3.1. The store sampling signal distribution.
8. Experimental Results.
8.1. Delay cell range selection and charge-pump current level.
8.2. Converter linearity.
8.3. Linear time sweeps.
8.4. Inter-channel crosstalk.
8.5. Double hit resolution.
8.6. Power dissipation.
8.7. Summary of results.
8.8. Conclusion.
References for Part II.
PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.
9. Architecture Overview.
9.1. Time interpolation circuit.
9.2. Adjustable RC delay line.
9.2.1. Adjustable delay line by tap selection.
9.2.2. Adjustable delay line by lumped capacitor selection.
9.3. Auto calibration.
9.4. The prototype.
9.4.1. Choice of technology.
9.4.2. Prototype characteristics.
9.4.3. Performance analysis.
10. Adjustable RC Delay Line using a Tap Selection Scheme.
10.1. RC delay line.
10.1.1. RC delay line simulation model.
10.2. Tap selection delay line.
10.2.1. Tap selection circuitry.
10.3. Auto calibration circuitry.
10.3.1. Calibration algorithms.
10.3.2. Hardware implementation.
11. Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.
11.1. Lumped capacitor delay line.
11.1.1. Lumped capacitor selection circuitry.
11.2. Auto calibration circuitry.
11.2.1. Calibration algorithm.
11.2.2. Hardware implementation.
11.3. Comparing the two adjustment schemes.
12. Experimental Results.
12.1. Tap selection scheme.
12.1.1. The complete interpolator.
12.2. Lumped capacitor scheme.
12.3. Conversion time offset.
12.4. Power dissipation.
12.5. Summary of results.
12.6. Conclusions.
References for Part III.
PART IV. Conclusion.
13. Summary of Results.
13.1. The ADLL architecture.
13.2. The DLL & RC delay line architecture.
13.3. TDC characterisation.
14. Future Developments.
PART V. Appendixes.
A. TDC Characterisation Test Bench.
B. Analysis of the DLL Closed Loop Behaviour.
C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.
D. Number of Random Samples Required for TDC Characterisation.
E. TDC Characterisation Hit Frequency.
F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).
G. DNL-aware Algorithms for the RC Delay Line Calibration.
References for the Appendixes.
List of Figures.
PART I. Introduction.
Chapter 1. Introduction and Structure of this Work.
Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.
Figure 1: The CERN particle accelerator complex (simplified) [4].
Figure 2: Longitudinal and transverse view of the ALICE detector [3].
Figure 3: The hierarchical trigger data reduction block diagram of the ALICE experiment [3].
Figure 4: Schematic view of the TOF detector front-end.
Figure 5: The error propagation chain.
Chapter 3. Conversion Basics.
Figure 1: Ideal transfer characteristic of a 3-bit converter.
Figure 2: Example of a converter transfer function illustrating the static performance metrics.
Chapter 4. Review of TDC Architectures.
Figure 1: Block and timing diagram of a differential Current Integrating TAC (from [3]).
Figure 2: Delay line using double inverters as delay elements.
Figure 3: Asymmetric ring oscillator [24], able to generate 2N timing signals from an odd-numbered oscillator.
Figure 4: Delay Locked Loop and hit registers.
Figure 5: Timing diagram of the dynamic range extension using a clocked time stretcher [33].
Figure 6: Time expander circuit and corresponding timing diagram.
Figure 7: Time expansion using two delay lines with different cell delay.
Figure 8: Circular vernier scheme for dynamic range expansion.
Figure 9: A vernier caliper measuring a length of 0.43 mm. Note that the third tick mark in the vernier scale (lower) lines up with a tick mark in the reference scale (upper) [36].
Figure 10: Time interpolation using voltage sums.
Figure 11: Time to analogue converter using a time interpolation technique [38].
Figure 12: Coupled oscillators (time resolution of 2·td/3).
Figure 13: Array of DLLs with phase shifting DLL.
Figure 14: A TDC based on a DLL and an RC delay line.
PART II. A TDC Architecture based on an Array of Delay Locked Loops.
Chapter 5. Architecture Overview.
Figure 1: Delay Locked Loop block diagram.
Figure 2: Delay Locked Loop used in a time base application.
Figure 3: Array of DLLs with phase shifting DLL, showing bin definition.
Figure 4: Interpolation limits due to cell mismatch.
Figure 5: Dynamic range extension using two coarse time counters.
Figure 6: Example of the first level of a read-out buffering hierarchy.
Figure 7: The prototype block diagram.
Figure 8: Prototype circuit showing main functional blocks.
Chapter 6. Analysis of the Limits to the TDC Resolution.
Figure 1: INL standard deviation curve resulting from a cell delay mismatch of σ_cell = 1% (ADLL: N=35 and F=4; single DLL: N=140).
Figure 2: Standard deviation curve resulting from a closed loop jitter of σ_jitter = 0.1% of the reference period (ADLL: N=35 and F=4; single DLL: N=140).
Figure 3: Detail of a delay locked loop depicting the important delays within the loop.
Figure 4: Illustration of the effect of the phase detector’s phase error (N=5).
Figure 5: Illustration of the effect of the phase detector input paths’ delay mismatch (N=5).
Figure 6: Illustration of the effect of unbalanced conditions in the first cell of the delay chain (N=5).
Figure 7: Illustration of the effect of unbalanced conditions in the last cell of the delay chain (N=5).
Figure 8: Illustration of the effect of the propagation delay on the sampling signal path: case of the linear hit signal distribution network (N=5).
Figure 9: The T-shaped hit signal distribution network.
Figure 10: Illustration of the effect of the propagation delay on the sampling signal path: case of the T-shaped hit signal distribution network (N=5).
Figure 11: DNL and INL curves resulting from a phase detector’s phase error (or phase detector input path’s mismatch): D_PD(C/K + τ_diff) = 0.1% of the reference period (ADLL: N=35 and F=4; single DLL: N=140).
Figure 12: DNL and INL curves resulting from unbalanced conditions of the delay cells in the extremes of the delay chain: D_in(δ_in) = 1% and D_out(δ_out) = 1% of the average cell delay (ADLL: N=35 and F=4; single DLL: N=140).
Figure 13: DNL and INL curves resulting from the propagation delay on the sampling signal path (linear hit signal distribution network): D_hit(−τ_hit) = 0.1% of the reference period (ADLL: N=35 and F=4; single DLL: N=140).
Figure 14: DNL and INL curves resulting from the propagation delay on the sampling signal path (T-shaped hit signal distribution network): D_hit(−τ_hit) = 0.1% of the reference period (ADLL: N=35 and F=4; single DLL: N=140).
Figure 15: DNL and INL curves resulting from the combination of the previous curves (ADLL: N=35 and F=4; single DLL: N=140).
Chapter 7. Detailed Implementation.
Figure 1: D-flip-flop operating as a two-state phase detector.
Figure 2: General and D-FF based two-state phase detector transfer characteristic.
Figure 3: Balanced D-flip-flop topology.
Figure 4: Balanced D-flip-flop topology featuring fast SR#1 operation.
Figure 5: Charge-pump and filter capacitor block diagram.
Figure 6: Charge-pump topologies (simplified).
Figure 7: Rising edge propagation along the DLL delay line and corresponding current consumption.
Figure 8: The self-biased differential delay cell (from [18]).
Figure 9: The current-starved inverter delay cell (simplified version).
Figure 10: Cell delay variation due to a 100mV supply voltage step, for the differential and the current-starved inverter structures, respectively.
Figure 11: Simplified representation of the delay range partition.
Figure 12: The selectable-range current-starved inverter cell.
Figure 13: The selectable delay ranges (simulation).
Figure 14: Detail of the closed control loop illustrating the propagation delay mismatch of the phase signals.
Figure 15: Schematic representation of the delay range partition illustrating the viable locking regions.
Figure 16: The ADLL tap distribution arrangement.
Figure 17: Functional diagram of the channel memory controller [3].
Figure 18: The two-level hit register (1 bit).
Figure 19: Two-stage synchroniser using D flip-flops.
Figure 20: Alternative control signal distribution configurations within a channel memory row.
Figure 21: Integrated error histogram for the two proposed distribution configurations (simulation).
Chapter 8. Experimental Results.
Figure 1: DNL and INL graphs for the ADLL.
Figure 2: Analytical DNL and INL curves (D_in = 1% and D_out = −1% of the cell delay, D_PD = −0.1% and D_hit = 0.1% of the reference period).
Figure 3: DNL and INL graphs for the different Timing DLLs (LSB_DLL = 4·LSB).
Figure 4: DNL and INL graphs for the Phase Shifting DLL (LSB_DLL = 5·LSB).
Figure 5: The ADLL auto-correlation graph.
Figure 6: DNL and INL graphs for the converter over four reference clock periods.
Figure 7: Error graph and histogram resulting from a delay sweep of two reference periods (σ = 0.39 LSB).
Figure 8: DNL and INL graphs obtained from the linear delay sweep results.
Figure 9: Conversion error histogram for the first Timing DLL (σ = 0.30 LSB_DLL).
Figure 10: Delay sweep over the full dynamic range.
Figure 11: Measurement error due to crosstalk in the worst configuration.
PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.
Chapter 9. Architecture Overview.
Figure 1: Detail of DLL signal propagation illustrating time interpolation through multiple delay line samples (in this example the number of samples acquired is M=5).
Figure 2: Time interpolation circuit.
Figure 3: Continuous delay adjustment scheme based on control of the distributed parameters (simplified).
Figure 4: Adjustable delay line using a tap selection scheme.
Figure 5: Adjustable delay line using a variable lumped capacitor scheme.
Figure 6: Block diagram of the prototype.
Figure 7: Prototype circuit showing main functional blocks.
Chapter 10. The Adjustable RC Delay Line using a Tap Selection Scheme.
Figure 1: RC line divided in two segments at access point x. R and C are, respectively, the resistance and capacitance per unit length.
Figure 2: Delay line division into equally sized sections.
Figure 3: Electrical model of an infinitesimal segment of a transmission line (the T-network).
Figure 4: Detail of the physical microstrip line and its equivalent simulation model.
Figure 5: Delay line segments’ length adjustment.
Figure 6: Adjustment function values.
Figure 7: Signal’s rise time along the original and the adjusted delay line, in typical conditions (simulated).
Figure 8: Delay and cumulative delay of each line segment (from simulations).
Figure 9: The leading and trailing adaptation sections.
Figure 10: Segment delay sensitivity to operating conditions (from simulations). The first and second graphs correspond, respectively, to the same line with and without leading and trailing sections.
Figure 11: The access point selection circuitry.
Figure 12: Calibration procedure for the tap selection adjustment scheme.
Figure 13: Results of calibration for different conditions, using the iterative algorithm (from simulation).
Figure 14: Results of calibration using the optimum linearity limit (from simulation).
Figure 15: Results of calibration for different conditions (from simulation).
Chapter 11. The Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.
Figure 1: Adjustment function values (calculated and actually implemented).
Figure 2: Bin size (from simulation). The first graph compares different design corners. The second graph shows the effects of extreme environment variations for the typical process.
Figure 3: The unit capacitor bank.
Figure 4: The lumped capacitor selection circuitry.
Figure 5: The effects of lumped capacitor unit variation on the bin size (from simulation).
Figure 6: The coarse calibration procedure.
Figure 7: The fine calibration procedure.
Figure 8: Results of the coarse calibration step for different conditions using the proposed algorithm (from simulation).
Figure 9: Results of the fine calibration for different conditions using restrictive linearity limits (from simulation).
Chapter 12. Experimental Results.
Figure 1: Delay line calibration results: DNL and INL graphs.
Figure 2: Spread of the RC line tap delay over the DLL cells.
Figure 3: Temperature dependency of the RC delay line.
Figure 4: DNL and INL graphs of the converter (using the tap selection adjustable delay line).
Figure 5: INL of the DLL, showing spread of the tap delay along the hit register rows.
Figure 6: Comparison of the INL graphs of the DLL and of the complete converter.
Figure 7: Conversion error (σ = 0.51 LSB).
Figure 8: Temperature effects on the conversion error (σ = 0.50 LSB at 30 °C and σ = 0.52 LSB at 60 °C).
Figure 9: DLL linear time sweep.
Figure 10: Detail of the DLL time sweep showing code transitions in opposite extremes of the delay chain.
Figure 11: DLL conversion error (σ = 0.29 LSB_DLL).
Figure 12: RC delay line’s DNL and INL graphs (using the lumped capacitor adjustment scheme).
Figure 13: DNL and INL graphs of the converter (using the lumped capacitor adjustable delay line).
Figure 14: Comparison of the INL graphs of the DLL and of the complete converter.
Figure 15: Conversion error (σ = 0.44 LSB).
Figure 16: DLL conversion error (σ = 0.29 LSB_DLL).
PART IV. Conclusion.
Chapter 13. Summary of Results.
Chapter 14. Future Developments.
Figure 1: A four-channel TDC using a DLL-based scheme and a single-channel TDC with a four times smaller LSB, using the same building blocks and an RC delay line.
Figure 2: The general purpose TDC architecture.
Figure 3: Block diagram of the general purpose TDC.
PART V. Appendixes.
Appendix A. TDC Characterisation Test Bench.
Figure 1: The linear passive delay generator block diagram (computer controlled).
Figure 2: The linear passive delay generator block diagram (automated).
Appendix B. Analysis of the DLL Closed Loop Behaviour.
Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.
Figure 1: Voltage controlled delay line with fixed length.
Appendix D. Number of Random Samples Required for TDC Characterisation.
Figure 1: P(−z_{α/2} < Z < z_{α/2}) = 1 − α.
Appendix E. TDC Characterisation Hit Frequency.
Figure 1: The clock multiplying PLL.
Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).
Figure 1: Detail of a delay locked loop depicting the important delays within the loop (notice the alternative location of tap 0).
Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.
Figure 1: Calibration procedure for the tap selection adjustment scheme.
Figure 2: The coarse calibration procedure.
Figure 3: The fine calibration procedure (first loop).
Figure 4: The fine calibration procedure (second loop).
List of Tables.
PART I. Introduction.
Chapter 1. Introduction and Structure of this Work.
Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.
Chapter 3. Conversion Basics.
Chapter 4. Review of TDC Architectures.
Table 1: Comparison between the different architectures discussed in the chapter.
PART II. A TDC Architecture based on an Array of Delay Locked Loops.
Chapter 5. Architecture Overview.
Chapter 6. Analysis of the Limits to the TDC Resolution.
Chapter 7. Detailed Implementation.
Table 1: Summary of noise sensitivity and power consumption analysis.
Table 2: Summary of noise sensitivity and power consumption analysis for the proposed cell.
Chapter 8. Experimental Results.
Table 1: Locking status for each working range, after the initialisation procedure.
Table 2: Summary of the linearity obtained for each DLL in the array (LSB_DLL = 4·LSB and LSB_DLL-PS = 5·LSB).
Table 3: Characteristics of the TDC prototype.
PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.
Chapter 9. Architecture Overview.
Chapter 10. The Adjustable RC Delay Line using a Tap Selection Scheme.
Table 1: Comparison of the two proposed algorithms.
Table 2: Register (accumulator) requirements for the two proposed algorithms.
Table 3: Comparator requirements for the two proposed algorithms.
Chapter 11. The Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.
Table 1: Register (accumulator) requirements for the present algorithm.
Table 2: Comparator requirements for the present algorithm.
Chapter 12. Experimental Results.
Table 1: Characteristics of the TDC prototype.
PART IV. Conclusion.
Chapter 13. Summary of Results.
Chapter 14. Future Developments.
Table 1: Timing specification of the general purpose TDC.
PART V. Appendixes.
Appendix A. TDC Characterisation Test Bench.
Appendix B. Analysis of the DLL Closed Loop Behaviour.
Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.
Appendix D. Number of Random Samples Required for TDC Characterisation.
Appendix E. TDC Characterisation Hit Frequency.
Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).
Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.
Glossary of Acronyms.
ADC
Analogue-to-Digital Converter
ADLL
Array of Delay Locked Loops
ALICE
A Large Ion Collider Experiment
ASIC
Application Specific Integrated Circuit
CDT
Code Density Test
CERN
European Organisation for Nuclear Research
CMRR
Common Mode Rejection Ratio
CMOS
Complementary Metal-Oxide-Silicon Field Effect Transistor Logic
CUT
Channel Under Test
DAQ
Data Acquisition System
D-FF
D-type Flip-Flop
DLL
Delay Locked Loop
DNL
Differential Non-Linearity
DUT
Device Under Test
HEP
High Energy Physics
HMPID
High-Momentum Particle Identification
HRTDC
High Resolution Time-to-Digital Converter
IC
Integrated Circuit
INL
Integral Non-Linearity
ITS
Inner Tracking System
JLCC
J-Leaded Chip Carrier
LADAR
Laser Radar
LHC
Large Hadron Collider
LIDAR
Light Detection and Ranging
LIP
Laboratório de Instrumentação e Física Experimental de Partículas
LSB
Least Significant Bit
NMOS
N-Channel Metal-Oxide-Silicon Field Effect Transistor
PDF
Probability Density Function
PECL
Positive Emitter Coupled Logic
PHOS
Photon Spectrometer
PID
Particle Identification
PLCC
Plastic Leaded Chip Carrier
PLL
Phase Locked Loop
PMOS
P-Channel Metal-Oxide-Silicon Field Effect Transistor
RC
Resistive-Capacitive
RMS
Root Mean Square
TAC
Time-to-Amplitude Converter
TDC
Time-to-Digital Converter
T/D
Time-to-Digital
TOF
Time-of-Flight
TPC
Time Projection Chamber
VCDL
Voltage Controlled Delay Line
VCO
Voltage Controlled Oscillator
PART I.
INTRODUCTION.
Chapter 1.
Introduction and Structure of this Work.
In this thesis we describe the development and demonstration of architectures
adapted to the accurate measurement of short time intervals. In the past, high-resolution
time measurements have been performed with instruments based on analogue
measurement techniques, built either from discrete components or as a single Integrated
Circuit (IC) in a special high-performance “analogue” technology.
Our goal is to evaluate and demonstrate architectures that are suitable for monolithic
integration and can be built in a standard CMOS technology. Sharing the same time
interpolator between several measurement channels is also a major aim of the work.
Furthermore, these architectures are intended to be implemented together with all the
digital signal processing circuitry needed to build a fully functional converter.
Although this work emphasises architecture development, we also carry out a detailed
analysis of the critical circuitry that determines the timing performance of the
converter.
Domain of application of this work.
The work was carried out at the European Organisation for Nuclear Research
(CERN), in Geneva, as a collaboration between the Microelectronics group and the
“Laboratório de Instrumentação e Física Experimental de Partículas” (LIP), Lisbon.
Emphasis is therefore given to the specific requirements of the High-Energy Physics
experimental environment. Nevertheless, the conclusions of this work apply to any
domain where high-resolution time measurements are required, for example LIDAR
(LIght Detection And Ranging) and LADAR (Laser rADAR) applications. The work also
contributes to phase and delay synthesis, in applications such as time bases for digital
oscilloscopes, phase modulation and demodulation, and phase synchronisation.
Structure of the thesis.
The structure of this thesis follows the development of the work over time. It is
divided into four main parts, which reflect the major milestones of the work, followed by
a set of appendixes.
The first part of this thesis introduces the subject. It includes a brief description of
the goals of a High-Energy Physics experiment and of the systems needed to achieve
them. The need for high-resolution time measurements is emphasised, together with the
particular constraints of the experimental environment (Chapter 2.).
A general overview of the relevant characteristics of a Time-to-Digital Converter
(TDC) is given in the form of the set of characterisation metrics used throughout the
work to evaluate the timing performance of T/D converters. A short description of the
effects of the quantisation error and of the different noise sources that may be present is
also given (Chapter 3.).
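The static metrics mentioned above (DNL and INL) are commonly estimated with a code density test (CDT, listed in the glossary). The Python sketch below is not the author's test software; it assumes hits uniformly distributed in time, so that an ideal converter collects the same number of hits in every bin, and it uses one common INL convention (the cumulative DNL at the end of each bin). The helper names are hypothetical. It also checks by Monte Carlo that the quantisation error of an ideal converter alone has an RMS value of LSB/√12, about 0.29 LSB.

```python
import numpy as np

def dnl_inl_from_code_density(hist):
    """Estimate DNL and INL (in LSB) from a code density test histogram.

    Assumes hits uniformly distributed over the measured range, so that an
    ideal converter would collect the same number of hits in every bin.
    """
    hist = np.asarray(hist, dtype=float)
    ideal = hist.mean()          # expected hits per bin if all bins were equal
    dnl = hist / ideal - 1.0     # relative bin-width error, in LSB
    inl = np.cumsum(dnl)         # accumulated deviation at the end of each bin
    return dnl, inl

def quantisation_rms(lsb=1.0, n=200_000, seed=0):
    """Monte Carlo RMS error of an ideal converter (theory: lsb / sqrt(12))."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 100.0 * lsb, n)   # random input time intervals
    code = np.floor(t / lsb)               # ideal truncating conversion
    err = t - (code + 0.5) * lsb           # error relative to the bin centre
    return err.std()

# A bin that is half as wide as nominal, followed by one that is 1.5 times wider.
dnl, inl = dnl_inl_from_code_density([50, 150, 100, 100])
print(dnl, inl, quantisation_rms())
```

Since the DNL values are normalised to the mean bin content, they sum to zero, so the last INL value is zero by construction under this convention.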
We then briefly review the common types of time interval measurement systems used
in the past, highlighting their advantages and disadvantages. This review includes recent
proposals that pursue the same goals as this work (Chapter 4.).
In the second part of this thesis we analyse an architecture based on an Array of Delay
Locked Loops (ADLL). As an outcome of this evaluation, a TDC demonstrator based on
this architecture was built.
An overview of the time interpolation scheme obtained by phase shifting a number of
Delay Locked Loops (DLL) is presented. We review the main features of the scheme,
emphasising its inherent advantages and difficulties. A block diagram and a short
description of the TDC prototype are presented, together with its estimated timing
performance (Chapter 5.).
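The interpolation principle can be illustrated with a short Python sketch. It assumes, for illustration only, an array of F DLLs with N cells each, every DLL locked so that its cells span one reference period T, and DLL i phase-shifted by i·T/(N·F). Under that assumption the merged taps form a uniform grid with a bin of T/(N·F), the same as a single DLL with N·F cells, while each cell only has to provide a delay of T/N. The function name and the normalised period are hypothetical; N=35 and F=4 are the values quoted in the Chapter 6 figure captions.

```python
import numpy as np

def adll_tap_times(period, n_cells, n_dlls):
    """Tap sampling instants, within one reference period, of an array of DLLs.

    Hypothetical model: every DLL is locked so that its n_cells cells span one
    period, and DLL i is phase-shifted by i * period / (n_cells * n_dlls).
    """
    cell = period / n_cells                  # delay of one cell in each DLL
    k = np.arange(n_cells)                   # tap index inside a DLL
    i = np.arange(n_dlls)                    # DLL index in the array
    taps = k[None, :] * cell + i[:, None] * cell / n_dlls
    return np.sort(taps.ravel())             # merged, time-ordered taps

# ADLL values from the Chapter 6 captions, period normalised to 1.
taps = adll_tap_times(1.0, 35, 4)
bins = np.diff(taps)
print(len(taps), bins.min(), bins.max())
```

The ideal offsets used here hide exactly what Chapter 6 analyses: any error in the phase shifts or in the cell delays shows up directly as non-uniform bins.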
We analyse in detail the sources of non-linearity that degrade the performance of a
DLL-based converter and present an analytical model that predicts their effects on the
conversion characteristic. This analysis is then extended to the ADLL-based converter. A
similar analysis is carried out for the phase noise generated by the dynamics of the DLL
operation (Chapter 6.).
Having established a model for the causes and consequences of non-linearity and phase noise, we then describe the critical circuit blocks and propose ways to improve their performance and ensure that they meet the required characteristics (Chapter 7.).
We then proceed to present the experimental results obtained from the prototype
TDC that was built based on this architecture, and demonstrate that these results are in
accordance with the analysis carried out (Chapter 8.).
In the third part of this thesis, a new architecture suitable for low-power operation is proposed. Its basic building block is also a DLL, but finer time interpolation is obtained using passive RC delay lines. The principle of operation of this new architecture is described. Its main characteristics are detailed, with an emphasis on the interesting properties of RC delay lines. Two alternative adjustable delay line schemes are proposed. A block diagram and a short description of the TDC prototype built using this architecture are presented, together with an estimate of its timing performance (Chapter 9.).
We then carry out the detailed analysis of the adjustable RC delay line based on a
tap selection scheme. We develop a simulation model of the distributed delay line that
includes all the significant devices (lumped or distributed) that contribute to its delay
characteristics. We propose a method to derive the dimensions of each of the segments
into which the line is divided based on the delay requirements as well as on the dimension
of the surrounding circuitry. A few calibration algorithms are also proposed and their
performance is illustrated based on simulated delay line conditions (Chapter 10.).
The same kind of analysis is performed for the adjustable RC delay line based on a
variable lumped capacitor scheme. We present different calibration algorithms (Chapter
11.).
As a corollary of this part of the work we present the experimental results obtained
from a demonstrator TDC built using this architecture. Based on these results, we validate
our analysis and confirm that this architecture performs as expected (Chapter 12.).
The concluding part of this work is divided into two chapters. In the first, we highlight the contributions and developments made during this work (Chapter 13.). In the second, we propose what amounts to the logical conclusion of this work: a general-purpose TDC based on the DLL / RC delay line architecture that we developed. This TDC can perform either low-resolution measurements in a large number of integrated channels or high-resolution time measurements in a small number of integrated channels (Chapter 14.).
Finally, a few appendices, complementary to the main text, are included. They expand and complete the explanations given in the main text. Of particular relevance is the description of the test bench that we developed specifically for TDC characterisation. This test bench was used throughout the work to evaluate the TDC prototypes that were built (Appendix A.).
Main contributions of this work.
As the structure of the thesis makes clear, we will present two integrated circuits that demonstrate two different solutions to the requirements of multi-channel, high-resolution time measurement systems.
• A four-channel high-resolution TDC. This IC implements the Array of Delay Locked Loops (ADLL) architecture. Apart from the extended-dynamic-range time interpolation core, this circuit also integrates digital logic to perform important functions such as encoding, buffering and read-out management.
• A two-channel high-resolution TDC. This IC implements a novel time interpolation architecture, based on a DLL and a passive RC delay line. This architecture allows for higher resolution with lower power operation.
Some important results were obtained while designing these circuits. They are
presented in this work:
• A detailed study of the behaviour of a Delay Locked Loop (DLL) was carried
out. We show how different error mechanisms affect the accuracy of the time
interpolation and propose solutions to minimise these effects.
• These studies are extended to the more complex case of the Array of DLLs (ADLL). We show that, for a given device mismatch level, there is an optimal interpolation factor (number of DLLs in the array) that yields the best resolution for a converter built this way.
• An alternative architecture is proposed that avoids some of the limitations identified in the ADLL-based architecture, such as power dissipation and the maximum resolution that can be obtained.
• A procedure to compensate for technological tolerances in tapped passive RC delay lines is proposed. We present several methods to characterise and adjust these lines, and analyse the possibility of integrating the adjustment algorithms in the same IC.
Related publications.
The contributions made during the course of this research led to the following
publications:
Mota, M., Christiansen, J., A high-resolution time interpolator based on a Delay
Locked Loop and an RC delay line, IEEE Journal of Solid-State Circuits, vol. 34, no. 10,
pp. 1360-1366, Oct. 1999.
Mota, M., Christiansen, J., A four channel, self-calibrating, high-resolution Time-to-Digital Converter, Proceedings of the 5th IEEE International Conference on Electronics, Circuits and Systems (ICECS’98), Lisboa, Portugal, Sep. 1998.
Mota, M., Christiansen, J., A high-resolution Time-to-Digital Converter based on an Array of Delay Locked Loops, Proceedings of the 3rd Workshop on Electronics for LHC Experiments, London, UK, Sep. 1997.
Almasi, L. et al., New TDC electronics for a PesTOF tower – in NA49,
ALICE/2000-02 internal note/TOF, Mar. 2000.
Mota, M., A high-resolution Time-to-Digital Converter – users manual, CERN/EP
internal note, Geneva, Switzerland, 1997.
Contributions in the field of microelectronics applied to the High-Energy Physics
domain led to the following additional publications:
Mota, M., Gomes, P., Christiansen, J., MEC3 – A pipelined zero-suppression and
trigger matching chip, IEEE Transactions on Nuclear Science, vol. 42, no. 4, pt. 1, pp.
808-811, Aug. 1995.
Gomes, P., Mota, M., Christiansen, J., NANA – An integrated signal processor and record builder for level-2 read-out of asynchronous event-filtering digital pipelines, IEEE Transactions on Nuclear Science, vol. 42, no. 4, pt. 1, pp. 849-853, Aug. 1995.
Chapter 2.
Time Interval Measurements in HEP Experiments – An Introduction.
High-Energy Physics (HEP), or particle physics, is the discipline that explores and tries to understand the deep structure of matter [1]. As the discipline evolved, models were developed to explain this structure. As in any scientific endeavour, the particle physicist is not satisfied until the theoretical developments (the models) have been demonstrated by experimental means. These experiments may, however, bring to light finer, not yet fully understood phenomena. The cycle of scientific progress is then closed: new models have to be developed, which in turn require new and more powerful experiments to verify them.
2.1. High-Energy Physics experiments.
The quest for the structure of matter has been a progressive effort. In parallel with it, and enabling it, considerable work has been devoted to the design of new and more powerful machines that act as “microscopes”, exposing the ever smaller and hidden constituents of matter.
These “microscopes” take the form of particle accelerators, where bunches of particles (for example ions, protons, electrons, etc.) are accelerated to very high energies and made to collide. The interaction between these particles, due to the bunch collision, results in the conversion of the original particles into a diversity of new particles, in a process akin to the breaking up of a nucleus into its constituent protons and neutrons when it is bombarded by other energetic particles. It is these new particles that attract the attention of the physicist, since they reveal how the original particles are made and how they interact with their environment.
Surrounding the interaction point (where bunches of particles collide) is a complex set of detectors, sensitive to the different kinds of particles generated at the interaction moment. As these particles traverse the detectors, some of their energy is captured by the detector, which converts it into an electrical signal (charge, current or voltage). This signal is then amplified and processed by the front-end electronics, from where it is transferred to powerful computers.
Traditionally, only the pre-amplifier would be mounted close to the respective detector cell. Its function was to optimally shape the detector signal and drive it through 15 to 50 meters of cable to the electronics hut, where all the front-end processing would be performed. In modern experiments, where very high granularity is needed, with well over $10^6$ cells with independent sensors, this topology is no longer applicable. Fortunately, state-of-the-art technology can be used to integrate the required front-end electronics into a limited number of ASICs (Application Specific Integrated Circuits), or even a single one, that can be mounted directly on the detector. In this way, a vast quantity of cables is avoided, and a higher function density and a lower power dissipation are achieved [2].
All the phenomena studied in a HEP experiment obey statistical laws. The quantities to be measured with a detector sensor, whether the amount of energy deposited or the instant and position of the particle crossing, also carry some uncertainty in relation to their exact value. Therefore, multiple similar events must be analysed, the standard deviation of their statistical distribution being relevant to their identification.
2.1.1. A HEP experiment at CERN¹: ALICE.
One such detector system is being developed in the context of the ALICE collaboration (A Large Ion Collider Experiment) [3]. The main goal of this collaboration is to study experimentally the collision of heavy ions (for example, lead ions) at high energy densities.
Figure 1: The CERN particle accelerator complex (simplified) [4].
These ions are accelerated to very high energies by a group of accelerator machines connected in series, culminating in the Large Hadron Collider (LHC), a circular accelerator with a 27 km perimeter. The LHC will include the interaction point where the ALICE detector will be built to observe the particle collisions (see Figure 1).

¹ CERN: European Organisation for Nuclear Research, Geneva, Switzerland.
The LHC accelerator itself is made of two identical rings where bunches of ions (or, alternatively, protons) travel in opposite directions with high energy. At the interaction points, the two rings intersect and the particle bunches are allowed to collide.
The detector system itself is a group of detectors [3], each optimised to observe
different ranges of particles emerging from the interaction point. These detectors comprise
an Inner Tracking System (ITS) with six layers of high-resolution silicon tracking
detectors, a cylindrical Time Projection Chamber (TPC) and finally a large area Particle
IDentification (PID) array of Time-Of-Flight (TOF) counters.
The TPC is the main tracking system of the experiment. The ITS is mainly used for detailed reconstruction of the interaction vertex, very close to its origin. Both also aid the PID detector in the identification of particles.
In addition, a few specialised detectors are included: the electromagnetic calorimeter
(PHOS – PHOton Spectrometer), the High Momentum PID (HMPID), the muon
spectrometer and others. An outer magnet is necessary to bend the trajectory of charged
particles, thereby easing their identification (Figure 2).
Particles are identified by two different mechanisms. Low- and medium-momentum particles are identified, respectively, in the ITS and in the TPC by the dE/dx technique (the rate at which they lose energy as they traverse the detector). Higher-momentum particles are identified in the PID detector using the TOF technique (the time that the particle takes to travel from the interaction point to the detector surface).
Figure 2: Longitudinal and transverse view of the ALICE detector [3].
The amount of data generated after each bunch collision (or event) is very large. To reduce the bandwidth requirements on the data acquisition (DAQ) system, and also the amount of memory needed for data storage, on-line data reduction algorithms are applied to the data.
The data reduction algorithms take advantage of the spatial and temporal characteristics of the events: only a limited number of detector cells are actually crossed by an emerging particle. The output of the other, idle, cells can safely be discarded, since it contains no information. This operation is called “zero-suppression”. Furthermore, not all events are interesting to study. Algorithms implemented in hardware can sample the data of selected detectors to decide whether an event has characteristics that deserve further attention; otherwise, all data pertaining to that event may be discarded. This operation is called “trigger based data reduction”.
In general, several levels of trigger based data reduction are implemented. They form a hierarchy of data reduction algorithms that are progressively more selective, but also more complex and slower.
Figure 3: The hierarchical trigger data reduction block diagram of the ALICE experiment [3].
The principle of the trigger based data reduction hierarchy in ALICE is pictured in Figure 3 [3]. A first level of data reduction (L0) simply signals the existence of an interaction as soon as possible; it is not a very selective filter. The second level (L1) already uses information on the quality of the event to produce a large reduction in the accepted event rate. Both of these trigger processors produce a decision with a fixed latency. After the L1 trigger decision is taken, the read-out of the data from all detectors is started, pending the more selective decision of the third-level trigger (L2). Once this decision is taken, the read-out of the detectors' data into the DAQ system can be finalised. Overall, an event rate reduction of the order of $10^3$ is obtained, and the required DAQ bandwidth is reduced proportionally.
2.2. High-resolution time interval measurements in ALICE.
The efficiency of particle identification using the TOF technique is directly related to its time resolution. This is especially critical at the higher-momentum end of the identification range [5]. As a consequence, the TOF detector in the ALICE experiment is an array of sensors (counters) with a high time resolution ($\sigma_{det} \approx$ 40 ps to 100 ps, depending on the detector technology chosen).
The detector sensor is only a small part of the system. The front-end electronics also generate time uncertainties that add to the intrinsic detector resolution, limiting the overall time resolution of the system. A simplified view of the front-end electronics proposed for the TOF detector is shown in Figure 4. The time of flight of a particle resulting from the interaction is the difference between the instant when the interaction occurred, $t_0$, which is captured by a specialised detector (the $t_0$ detector), and the instant when the emerging particle traverses the TOF detector surface.
Traditionally, this time interval would be measured in a single device (a Time-to-Digital Converter, or TDC). However, the dimensions of the detector system (>150,000 cells distributed over ~100 m²) make the distribution of $t_0$ over the whole system impractical. A better solution is to use the reference clock ($clk_{ref}$), which has to be distributed anyway, as the time reference of the measurements. Each limit of the time interval can then be measured individually, and the two results subtracted digitally to obtain the original interval.
[Figure: interaction point and $t_0$ detector; TOF detector cells (distances of 3.5 m and 7 m indicated); pre-amplifier & discriminator; TDCs measuring the time of interaction (bunch ID) and the time of flight; $clk_{ref}$ distribution.]
Figure 4: Schematic view of the TOF detector front-end.
The actual interaction and crossing instants are reflected in the timing characteristics of the electrical signals that the respective detectors generate. These signals undergo some processing (amplification, discrimination, etc.) to make them usable by the TDC, which converts the timing information they carry into a binary word.
The timing uncertainties created by such processing, and by the digital conversion procedure, must be added to the intrinsic uncertainty of the TOF and $t_0$ detectors ($\sigma_{det}$ and $\sigma_{t_0}$, respectively) to obtain the overall time resolution of the system.
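[Figure: two parallel chains, each made of a detector cell ($t_0$ / TOF; $\sigma_{t_0}$, $\sigma_{det}$), front-end electronics (pre-amp & discriminator; $\sigma_{fe}$), a TDC ($\sigma_{TDC}$) and the $clk_{ref}$ distribution ($\sigma_{clk}$).]
Figure 5: The error propagation chain.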
In such a distributed system, it is reasonable to assume that all the time uncertainties generated in the different blocks are uncorrelated. Therefore, following the error propagation scheme of Figure 5, the time uncertainty of the TOF system is:

$$\sigma_{TOF}^2 = \sigma_{t_0}^2 + \sigma_{det}^2 + 2\,\sigma_{fe}^2 + 2\,\sigma_{TDC}^2 + 2\,\sigma_{clk}^2,$$

where, for simplicity, the time uncertainties of the front-end block ($\sigma_{fe}$), of the T/D converter ($\sigma_{TDC}$) and of the clock distribution network ($\sigma_{clk}$) are assumed to have the same statistical properties in the two independent chains.
If the intrinsic time resolution of the detector is to be preserved, the time uncertainty introduced by all the electronic components of the chain must be minimised. The overall contribution of the electronics should be only a small fraction of the time uncertainty of the complete TOF system. To obtain an overall time uncertainty better than $\sigma_{TOF} = 150$ ps, as required by the ALICE experiment, the resolution of the T/D converter must be $\sigma_{TDC} < 50$ ps. It is assumed, as in [6], that the time uncertainty of the TOF counters is $\sigma_{det} = 100$ ps, and that the values of $\sigma_{t_0}$, $\sigma_{fe}$ and $\sigma_{clk}$ are, respectively, 50 ps, 10 ps and 50 ps.
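To make the budget concrete, the quadrature sum can be evaluated with the values assumed above. The sketch below (the function names are hypothetical, introduced only for illustration) also solves for the largest $\sigma_{TDC}$ compatible with a 150 ps target, keeping every other contribution fixed.

```python
import math

def sigma_tof(s_t0, s_det, s_fe, s_tdc, s_clk):
    """Quadrature sum of uncorrelated contributions, all in ps.

    The front-end, TDC and clock terms appear twice: once in the t0 chain
    and once in the TOF chain.
    """
    return math.sqrt(s_t0**2 + s_det**2 + 2 * s_fe**2 + 2 * s_tdc**2 + 2 * s_clk**2)

def max_sigma_tdc(target, s_t0, s_det, s_fe, s_clk):
    """Largest sigma_TDC (ps) that keeps sigma_TOF <= target."""
    remaining = target**2 - (s_t0**2 + s_det**2 + 2 * s_fe**2 + 2 * s_clk**2)
    if remaining <= 0:
        raise ValueError("budget exhausted before the TDC contribution")
    return math.sqrt(remaining / 2)

print(round(sigma_tof(50, 100, 10, 50, 50), 1))        # 150.7
print(round(max_sigma_tdc(150, 50, 100, 10, 50), 1))   # 49.0
```

With these values, $\sigma_{TDC} = 50$ ps gives $\sigma_{TOF} \approx 150.7$ ps, slightly above the target. The break-even value is $\sqrt{2400} \approx 49.0$ ps, so the requirement $\sigma_{TDC} < 50$ ps is best read as a rounded figure.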
Apart from the timing performance of the TOF electronics, the particular physical constraints of these experiments (large number of detector cells, electronics mounted directly on the detector) place new demands on the electronics to be used. Commercial components and instruments such as low-noise fast amplifiers, low time-walk discriminators and high-resolution T/D converters exist, but their size and power dissipation are seldom adapted to the specific requirements of modern HEP experiments like the one described.
Chapter 3.
Conversion Basics.
The remarkable development of computers and other digital means of processing data during the last few decades has enabled the creation of new and more powerful instruments for observing and studying the world that surrounds us. This is, of course, essentially an analogue world, since observable quantities may vary continuously in time and amplitude. Their translation into electric signals also results in analogue quantities. The interface between the analogue and digital domains is provided by Analogue-to-Digital Converters. They capture the analogue quantities and convert them into digital representations, which should be the exact counterparts of the respective analogue quantities, independently of the properties of the converter used.
The capture of an analogue quantity in a discrete format by means of an electronic converter is unfortunately not error-free. Indeed, some loss of information is inherent to the amplitude quantising operation¹. Furthermore, given the technological limitations and the environment in which these converters operate, other sources of error will inevitably affect the conversion transfer function, making it differ from the idealised one.
Several converter architectures, and several implementations of these architectures, have been proposed over time. All of them have claimed their advantages by showing different, and sometimes conflicting, performance parameters. A quick scan of the literature [7],[8] and of commercial converter data-sheets shows that, even if some performance metrics are commonly used (INL, DNL, etc.), their definitions may differ. It is therefore important to clarify which metrics will be used throughout this text to characterise the converters, and what their significance is.
Furthermore, most of the performance metrics have been developed and used in the context of conventional A/D converters. Some of these are not directly applicable to T/D converter characterisation, either because they are meaningless (maximum input frequency, hold time, droop rate, etc.) or because their meaning is different (maximum sampling rate). In addition, some new performance parameters, adapted to this specific application, must be developed.
¹ For a signal band-limited to a bandwidth B, the Nyquist criterion ensures that the sampling operation preserves all the information in the signal if an appropriate sampling frequency is used ($f_{sample} \geq 2B$).
The performance metrics that will be used throughout this text are presented here.
Their meaning and significance will be explained, as well as the way they can be
measured, if relevant.
3.1. Performance metrics.
A T/D converter converts a time interval (a delay) into a binary word. This operation inevitably includes an amplitude discretisation (quantisation), which means that its transfer function is staircase-shaped, as shown in Figure 1.
[Figure: staircase transfer characteristic; axes: analogue input, digital output; annotations: $tap_i$, $tap_{i+1}$, $bin_i$, LSB, dynamic range.]
Figure 1: Ideal transfer characteristic of a 3-bit converter.
An ideal converter is characterised by its Least Significant Bit (LSB) and its conversion Dynamic Range. The LSB corresponds to the smallest delay that can be discriminated, and the Dynamic Range corresponds to the largest delay that can be measured. The conversion maps each delay onto one of a discrete number of Codes, each corresponding to a “stair” of the transfer curve. A delay is said to belong to $bin_i$ if its length is smaller than the one corresponding to $Code_{i+1}$ but not smaller than the one corresponding to $Code_i$. For the T/D converters based on the architectures developed in this work, the term Code is replaced by the more meaningful term tap.
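The bin definition above can be written as a one-line mapping for the ideal case. This is a minimal sketch, assuming taps at integer multiples of the LSB starting at zero and a dynamic range of $2^{n} \cdot LSB$ for an n-bit converter. `ideal_code` is a hypothetical helper, and the 25 ps LSB is an illustrative value only.

```python
import math

def ideal_code(delay_ps, lsb_ps, n_bits):
    """Ideal T/D transfer function: return i such that tap_i <= delay < tap_{i+1}.

    Assumes tap_i = i * lsb_ps (taps start at zero) and a dynamic range of
    2**n_bits * lsb_ps.
    """
    dynamic_range = (2 ** n_bits) * lsb_ps
    if not 0 <= delay_ps < dynamic_range:
        raise ValueError("delay outside the dynamic range")
    return math.floor(delay_ps / lsb_ps)

# 3-bit converter with a 25 ps LSB, as in the staircase of Figure 1
codes = [ideal_code(d, 25, 3) for d in (0, 24.9, 25, 130, 199.9)]   # [0, 0, 1, 5, 7]
```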
Departures from the ideal behaviour of the converters are usually characterised using a given set of metrics, such as Differential and Integral Non-Linearity, Gain error and Offset. Since some of these static performance metrics have different definitions depending on the application, a set of appropriate definitions is given and briefly discussed:
Differential Non-Linearity (DNL) is the deviation of the output bin size from its ideal value of one least significant bit (LSB). For a given $bin_i$, the differential non-linearity $DNL_i$ is given by the following equation, where $d_i$ is the measured cumulative delay from the origin to $tap_i$:

$$DNL_i = \frac{d_{i+1} - d_i - LSB}{LSB}, \quad i = 0, \ldots, N-1.$$

The result is usually presented as a graph representing all the N bins being characterised, together with the standard deviation of the DNL.
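As a sketch, the DNL definition can be applied directly to a list of measured cumulative delays. The tap values below are hypothetical, chosen only for illustration. N+1 delays $d_0 \ldots d_N$ are needed to close the last of the N bins. The population standard deviation is used, which is an assumption, since the text does not specify the estimator.

```python
import statistics

def dnl(taps, lsb):
    """DNL_i = (d_{i+1} - d_i - LSB) / LSB, for i = 0..N-1.

    taps holds the N+1 cumulative delays d_0..d_N (ps) measured from the origin.
    """
    return [(taps[i + 1] - taps[i] - lsb) / lsb for i in range(len(taps) - 1)]

taps = [0.0, 24.0, 51.0, 75.0, 100.0]   # hypothetical measured delays, ps
values = dnl(taps, 25.0)                 # [-0.04, 0.08, -0.04, 0.0] (in LSB)
spread = statistics.pstdev(values)       # standard deviation of the DNL
```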
Integral Non-Linearity (INL) is the deviation between the input/output characteristic and the straight line of ideal gain (slope) that best fits it, obtained by adding an offset to the ideal transfer characteristic. With this definition, the Gain error is zero, because its effect is included in the INL result. The INL graph is usually presented together with the standard deviation of the INL.
This definition of INL does not exactly match the usual definitions, as summarised in [7]. However, it satisfies the particular requirements of the T/D converters to which these metrics are applied. The principle of operation of most of the T/D converters presented here relies on the concatenation of repeated images of a transfer function with a small LSB along the full dynamic range of the converter, the concatenation being guided by an external reference signal that also serves as the overall reference of the converter.
In this context, it is standard practice to characterise in great detail only a limited
section of the dynamic range, corresponding to one or more images of the above
mentioned transfer function. The performance measured in this section is then
extrapolated to the full dynamic range (which is itself a simple repetition of this section).
The definition of INL used must allow for this extrapolation operation, therefore the gain
error must be included in the INL measure.
The concatenation of the transfer function must be verified to confirm that the
extrapolation of the INL measure is valid. Given the principle of the operation of these
T/D converters, it is only necessary to check that all of the images are present and are not
superimposed. A coarse INL characterisation of the full dynamic range identifies any
concatenation error that may be present.
For a given $bin_i$, the integral non-linearity $INL_i$ is given by the following equation, where $d_i$ is the measured cumulative delay from the origin to $tap_i$ and the Offset $O_{delay}$ is defined below:

$$INL_i = \frac{d_i - O_{delay} - i \cdot LSB}{LSB}, \quad i = 0, \ldots, N-1.$$
Gain error is the deviation of the slope of the line used in the INL calculation from
its ideal value. As stated before, the definition of INL used results in null gain error.
Offset is the vertical intercept of the line to which the transfer function is compared in the INL calculation. The Offset, $O_{delay}$, is chosen such that the sum of the squared residuals $\varepsilon_i$ is minimised, with

$$\varepsilon_i = d_i - O_{delay} - i \cdot LSB, \quad i = 0, \ldots, N-1.$$
In our case, this definition results in a relative offset of the transfer curve. An absolute offset would have to take into account the offset due to the different signal paths of the reference and hit signals within (and outside) the circuit. Since an absolute offset value depends on the system in which the TDC is incorporated, and must in any case be measured at system level, this metric is not discussed further.
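Because the slope is fixed at the ideal LSB, the least-squares offset has a closed form. Setting $\frac{d}{dO}\sum_i (d_i - O - i \cdot LSB)^2 = 0$ gives $O_{delay} = \text{mean}(d_i - i \cdot LSB)$, and the INL values then sum to zero. The sketch below applies this to hypothetical tap delays. `offset_and_inl` is an illustrative name, not a function from this work.

```python
def offset_and_inl(taps, lsb):
    """Least-squares O_delay and INL_i = (d_i - O_delay - i*LSB) / LSB.

    Minimising sum((d_i - O - i*LSB)**2) over O gives O = mean(d_i - i*LSB),
    so the resulting INL values always sum to zero.
    """
    residuals = [d - i * lsb for i, d in enumerate(taps)]
    offset = sum(residuals) / len(residuals)
    inl = [(r - offset) / lsb for r in residuals]
    return offset, inl

# hypothetical cumulative delays d_0..d_3 (ps) with LSB = 25 ps
offset, inl = offset_and_inl([2.0, 26.0, 53.0, 77.0], 25.0)
# offset == 2.0; inl is approximately [0.0, -0.04, 0.04, 0.0]
```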
These static metrics (illustrated in Figure 2) reflect how close the transfer function
of the converter is to the ideal curve. They can be obtained using statistical methods such
as the histogram method, also known as the Code Density Test (CDT). A more detailed
overview of this method and of the test set-up used can be found in [9] and Appendix A.
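The idea behind the histogram method can be sketched in a short simulation. It assumes that the hits arrive uniformly distributed over the characterised range and uncorrelated with the converter reference, so that the expected count in each bin is proportional to its width. The bin edges below are hypothetical. In a real measurement they are unknown and only the histogram is observed; the actual set-up is the one described in [9] and Appendix A.

```python
import bisect
import random

def code_density_widths(edges, n_hits, seed=0):
    """Estimate bin widths from a code density test (histogram method).

    edges: ascending cumulative tap delays d_0..d_N (ps) of the converter under
    test. Hits are drawn uniformly over [d_0, d_N], so the expected number of
    hits in bin_i is proportional to its width d_{i+1} - d_i.
    """
    rng = random.Random(seed)
    counts = [0] * (len(edges) - 1)
    for _ in range(n_hits):
        t = rng.uniform(edges[0], edges[-1])
        i = bisect.bisect_right(edges, t) - 1       # bin_i: d_i <= t < d_{i+1}
        counts[min(i, len(counts) - 1)] += 1        # guard the rare t == d_N case
    span = edges[-1] - edges[0]
    return [c * span / n_hits for c in counts]

true_edges = [0.0, 24.0, 51.0, 75.0, 100.0]        # hypothetical converter, ps
widths = code_density_widths(true_edges, 200_000)   # close to [24, 27, 24, 25]
```

Accumulating the estimated widths gives the delays $d_i$ used in the DNL and INL definitions above.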
[Figure: axes: analogue input, digital output; annotations: $INL_i$, $DNL_i$+LSB, Offset.]
Figure 2: Example of a converter transfer function illustrating the static performance metrics.
Another important characteristic of the converter, which reflects its behaviour in the presence of random error sources such as loop jitter, electrical noise or quantising noise, is the Conversion error:
Conversion Error is the deviation of the input/output characteristic from a straight line of ideal gain (slope) that best fits the curve. The result is presented as a histogram of the error, and its standard deviation is defined as the RMS Resolution of the converter.
This definition is quite similar to the INL definition given above, the difference lying in the method by which the transfer curve is obtained. In this case, it is obtained via a linear time sweep over the dynamic range (see Appendix A), while the INL graph is (in our case) obtained using randomly generated hits in code density tests.
This metric reflects a different way of characterising the circuit, very appropriate for
High-Energy Physics experiments, where the response of most of the detectors to a
particle crossing includes some time (and amplitude) uncertainty which is reflected in the
standard deviation of their transfer function.
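A linear time sweep can be imitated in simulation to show what the RMS resolution of an ideal converter looks like. The sketch below assumes an ideal staircase with taps at integer multiples of the LSB. The optional Gaussian `jitter` parameter is a hypothetical stand-in for random error sources and does not model any specific circuit in this work.

```python
import math
import random
import statistics

def rms_resolution(lsb, n_codes, steps_per_lsb=100, jitter=0.0, seed=0):
    """Conversion error of an ideal staircase converter under a linear time sweep.

    The input delay is swept in equal steps over the dynamic range. Each sample
    is quantised (optionally after adding Gaussian jitter), mapped back to a
    delay (code * lsb) and compared with the true input. pstdev removes the mean
    error, which plays the role of the best-fit offset of a line with ideal slope.
    """
    rng = random.Random(seed)
    errors = []
    for k in range(n_codes * steps_per_lsb):
        t = (k + 0.5) * lsb / steps_per_lsb
        t_in = t + rng.gauss(0.0, jitter) if jitter > 0 else t
        code = min(max(math.floor(t_in / lsb), 0), n_codes - 1)
        errors.append(code * lsb - t)
    return statistics.pstdev(errors)

print(rms_resolution(25.0, 8))   # close to 25 / sqrt(12), about 7.22 ps
```

For the ideal case, the result approaches the quantisation limit $LSB/\sqrt{12}$ discussed in Section 3.2. Adding jitter increases it.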
Other performance metrics of a converter are included here, for completeness.
Crosstalk between channels reflects the error introduced in the transfer function of a
given channel when electric activity occurs in any other channel integrated (or not) in the
same circuit. It is presented as a maximum deviation of the transfer function in any
coupling conditions.
Double hit resolution is the minimum time interval between two consecutive samples of the quantity being measured (in the TDC domain, this quantity is itself a time interval). This metric is similar to the maximum sampling frequency used in the context of ADC characterisation. However, it is better suited to the characterisation of T/D converters, given the random nature of their sampling activity.
The following characteristics do not reflect the timing performance of the converter,
but they are important to establish the applicability of one particular converter circuit to
the envisaged system.
• Number of integrated channels.
• Power dissipation per channel.
• Calibration requirements.
• System-level functionality integrated (memory, etc.).
3.2. Error sources.
The performance metrics already discussed describe the observable effects of all the error sources that influence the converter system. In this section, the causes of these errors are briefly outlined. Only the general error causes are discussed here. Particular conversion architectures are affected by different error mechanisms, which will be discussed together with the respective architecture.
Quantisation error.
The quantising operation is inherent to the operation of any converter. It consists of
the approximation of the amplitude of the quantity being converted to a level that is part
of a limited set of available levels. The resulting signal is a discrete amplitude
representation of the sampled signal. It can be directly represented in a binary format.
The effect of the quantising operation is an error in the conversion result. This error is proportional to the LSB of the conversion, varying between $-LSB/2$ and $LSB/2$. Quantising is usually seen as a source of additive noise. To formulate its impact on the performance of the converter, this additive noise is assumed to be a random variable with a uniform distribution between $-LSB/2$ and $LSB/2$, independent of the input amplitude [8]. While these assumptions are not strictly valid, they result in a reasonable approximation for converters above 4 bits. This random variable has a standard deviation of:
$$\sigma_q = \frac{LSB}{\sqrt{12}}.$$
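The value of $\sigma_q$ follows directly from the uniform-distribution assumption. Since the error has zero mean, its variance is the mean of its square. A short derivation, with $e$ denoting the quantisation error (a symbol introduced here), is:

```latex
\sigma_q^2 = \int_{-LSB/2}^{LSB/2} e^2 \,\frac{1}{LSB}\,de
           = \frac{1}{LSB}\left[\frac{e^3}{3}\right]_{-LSB/2}^{LSB/2}
           = \frac{1}{LSB}\cdot\frac{LSB^3}{12}
           = \frac{LSB^2}{12},
\qquad
\sigma_q = \frac{LSB}{\sqrt{12}} \approx 0.29\,LSB .
```

For example, an LSB of 25 ps gives $\sigma_q \approx 7.2$ ps. Taken on its own, quantisation therefore stays within the 50 ps TDC budget of Chapter 2 for any LSB up to about 173 ps, before the other error sources are added.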
Reference phase noise (Jitter).
The quality of the reference used by the converter is critical to its operation. Some converter architectures include means of averaging the important properties of the reference over time, thereby filtering out harmful variations of these properties and reducing the conversion errors. However, this filtering has a limited effect, and it is therefore safer to rely on a high-quality reference that can be used as delivered to the converter.
In the context of modern T/D converters, the reference is usually a periodic signal, and its phase noise (or jitter) is the most important quality factor. Jitter in the reference forces the converter to continually adapt to the changing period of the reference. Any jitter on the reference signal therefore adds a random noise component to the conversion result.
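If the reference jitter is assumed to be independent of the quantisation error (an assumption, not a result of this section), the variances of the two contributions add, so their RMS values combine as a root-sum-of-squares. A minimal sketch with hypothetical numbers; `total_rms` is an illustrative helper:

```python
import math

def total_rms(*contributions):
    """Combine independent, zero-mean random error sources given as RMS values.

    For uncorrelated sources the variances add, so the RMS values
    combine as the square root of the sum of their squares.
    """
    return math.sqrt(sum(c * c for c in contributions))

sigma_q = 25.0 / math.sqrt(12)   # quantisation RMS for a hypothetical 25 ps LSB
sigma_jitter = 20.0              # hypothetical reference jitter, RMS [ps]
print(round(total_rms(sigma_q, sigma_jitter), 2))   # ~21.26 ps
```

Note that the larger contribution dominates: here the jitter alone already accounts for most of the total.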
Other noise sources.
Several other sources of conversion errors may be present in a T/D converter, as in any other electronic circuit. A careful design minimises the sensitivity of the converter's transfer function to these noise sources.
A distinction can be made between intrinsic and extrinsic noise sources. Intrinsic
noise is due to random motion of charge carriers in the devices (active or passive) that
make up the circuit. It is always present in the signals flowing in the circuit.
The origins of several kinds of intrinsic noise are briefly described here [10]. However, given the large voltage levels of most of the signals used in the converters discussed in this dissertation, their influence on converter performance is small.
Thermal noise is temperature dependent. It originates from the thermally induced random motion of charge carriers within the device. It has a flat spectral density (white noise) and a Gaussian amplitude probability distribution function (PDF) with zero mean. The variance σ²(i) of the noise current is a function of the temperature T and the resistance R (k is the Boltzmann constant and ∆f is the bandwidth over which the noise is measured):
$$\sigma^2(i) = 4 \cdot k \cdot T \cdot \frac{1}{R} \cdot \Delta f \quad (\mathrm{A}^2)$$
Shot noise is due to the random passage of charge carriers across a potential barrier in a semiconductor junction; it therefore depends on the direct current flowing through the device. It has the same spectral and amplitude characteristics as thermal noise. The variance σ²(i) is a function of the direct current I_D and of the electronic charge q:
$$\sigma^2(i) = 2 \cdot q \cdot I_D \cdot \Delta f \quad (\mathrm{A}^2)$$
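The thermal and shot noise expressions can be evaluated as RMS currents (the square root of the variance). The resistor value, bias current and 1 GHz bandwidth below are hypothetical, chosen only to show orders of magnitude:

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant [J/K]
Q_ELECTRON = 1.602176634e-19  # electronic charge [C]

def thermal_noise_current(temperature, resistance, bandwidth):
    """RMS thermal noise current [A]: sqrt(4 k T (1/R) df)."""
    return math.sqrt(4 * K_BOLTZMANN * temperature * bandwidth / resistance)

def shot_noise_current(i_d, bandwidth):
    """RMS shot noise current [A]: sqrt(2 q I_D df)."""
    return math.sqrt(2 * Q_ELECTRON * i_d * bandwidth)

# Hypothetical example: 1 kOhm at 300 K, 100 uA bias, 1 GHz bandwidth
print(thermal_noise_current(300.0, 1e3, 1e9))  # ~1.29e-07 A
print(shot_noise_current(100e-6, 1e9))         # ~1.79e-07 A
```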
Flicker noise (or 1/f noise) reflects the quality of the conductive medium with respect to direct current flow. Several mechanisms may contribute to this noise. Its amplitude PDF is often non-Gaussian, but its spectral density is proportional to 1/f (hence the name). The expression for the variance σ²(i) of this kind of noise includes two parameters, K and a, that must be determined experimentally:
$$\sigma^2(i) = K \cdot \frac{I_D^{\,a}}{f} \cdot \Delta f \quad (\mathrm{A}^2)$$
Other kinds of intrinsic noise with a higher-order frequency dependence of the spectral density, such as popcorn or burst noise (1/f²), mostly reflect the quality of the material processing. Their amplitude PDF is not Gaussian.
Finally, avalanche (or breakdown) noise is caused by the avalanche process just before junction breakdown. Its spectral density is usually flat and its amplitude PDF is not Gaussian.
Extrinsic noise, on the other hand, results from the interference of external circuitry with the behaviour of the sensitive circuit [11]. Extrinsic noise requires a path through which the noise source can couple into the sensitive circuit; it is therefore strongly linked to the circuit layout and to the signal distribution topology. This interference may be random or deterministic.
Of the several possible coupling mechanisms, only the most relevant in the integrated circuit domain are discussed here: capacitive coupling, conductive coupling (via shared signal paths) and inductive coupling.
Capacitive coupling is due to the electric field that exists between any two conductors. The current flowing through the coupling capacitance is a function of the rate of change of the potential difference across its terminals, so any signal variation on one plate of the coupling capacitor induces a variation on the other plate. This effect is often known as crosstalk. It may be significant where the coupling capacitance is large (for example, two long parallel lines) or where high-frequency, large-amplitude signal variations occur close to a weak signal path.
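The crosstalk mechanism can be quantified with i = C·dV/dt for the injected current and, for a high-impedance victim node, with the capacitive divider formed by the coupling capacitance and the victim's own capacitance to ground. Both the divider model and the component values below are illustrative assumptions:

```python
def coupling_current(c_couple, delta_v, edge_time):
    """Average current [A] injected through the coupling capacitance
    during an aggressor edge: i = C * dV/dt."""
    return c_couple * delta_v / edge_time

def floating_victim_step(c_couple, c_victim, delta_v):
    """Voltage step [V] on a high-impedance victim node (capacitive divider)."""
    return delta_v * c_couple / (c_couple + c_victim)

# Hypothetical: 20 fF coupling, 3.3 V aggressor edge in 200 ps, 100 fF victim load
print(coupling_current(20e-15, 3.3, 200e-12))       # ~3.3e-4 A
print(floating_victim_step(20e-15, 100e-15, 3.3))   # ~0.55 V
```

A victim driven by a low-impedance source sees a much smaller disturbance, since the injected current is absorbed by the driver.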
Conductive coupling is due to a direct signal connection between the noise-generating circuit and the sensitive circuit. These connections may be the input signals, the common power supply or the ground node.
Power supply and ground distribution within ICs requires complex networks. Although these networks are made of low-resistivity lines, their overall resistance is not negligible. In the presence of switching activity, periodic current surges flow through them, leading to voltage drops or bounces that may affect the sensitive circuit. Noise coupling through the power supply distribution is also known as supply noise.
Inductive coupling is usually not considered within the integrated circuit itself, given its small dimensions. However, the package interconnects and the bond wires that connect the IC to the rest of the circuit can be sensitive to this coupling effect. It arises from the time-varying magnetic field around a conductor carrying a varying current: since the magnetic field extends around other conductors in the vicinity, its variation may induce a voltage change in them.
In the case of the bond wires dedicated to power supply and ground, where relatively large current variations may be present due to the switching activity of the circuit, the inductance of the wire may cause a voltage change across the supply network. This effect is also referred to as supply noise.
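A first-order estimate of supply noise adds the resistive drop across the distribution network to the inductive bounce L·di/dt across the supply bond wire. The sketch below simply adds the two magnitudes (a worst-case simplification), and all values and the helper name `supply_disturbance` are hypothetical:

```python
def supply_disturbance(resistance, inductance, current, di_dt):
    """Worst-case supply-node disturbance [V]: resistive drop R*I plus
    inductive bounce L*di/dt, added in magnitude."""
    return resistance * current + inductance * di_dt

# Hypothetical: 0.5 Ohm network resistance, 2 nH bond wire,
# 20 mA switching current rising in 1 ns
v = supply_disturbance(0.5, 2e-9, 20e-3, 20e-3 / 1e-9)
print(v)   # 0.01 V resistive + 0.04 V inductive = ~0.05 V
```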
As mentioned before, extrinsic noise may be random or deterministic in nature. Random noise must be studied using statistical methods, whereas deterministic noise can be analysed with circuit analysis methods. In synchronous circuits, supply noise disturbs the sensitive circuit in a systematic (and periodic) way, and knowledge of the characteristics of the noise-generating circuit can be used to minimise its effects on the functionality of the sensitive circuit.
Offset variation.
In a TDC system, the conversion offset is determined by the delay that the sampling signals experience as they propagate through the system to the converter. It is a system-wide characteristic, so it only makes sense to discuss it at system level. Typically, the offset is calibrated at start-up time by directly measuring the propagation delay of the sampling signal (using the converter itself).
At the TDC circuit level, the temperature sensitivity of the conversion offset can be minimised by giving the reference and sampling signals similar delays inside the circuit and ensuring that the delays of the two paths have the same temperature dependence.
Since the sampling signal typically traverses a front-end chain of electronic devices, such as buffers or signal conditioners, its delay will be sensitive to temperature changes. These changes are expected to be larger than the corresponding variations at the TDC level. Periodic system-wide calibrations are therefore required if environmental changes are expected.
3.3. Converter calibration.
Any converter system requires a known reference from which the conversion gain
(or constant of proportionality) can be derived. The procedure that leads to the adjustment
of the transfer function to the idealised characteristics is called Converter Calibration. In a
wider sense, the offline determination of the transfer function that leads to the relationship
binding the digital representation to the measured quantity can also be included in this
definition, although it does not influence the converter operation.
The calibration reference can be a set of predetermined quantities, converted together with the actual signal, from which the transfer function of the converter can be derived. A single start-up calibration is sufficient if the converter circuit is not sensitive to environmental variations. If, on the other hand, the constant of proportionality is sensitive to environmental changes, this procedure has to be executed periodically and the updated transfer function applied to the data. This calibration procedure places no requirements on the converter, since it is executed offline. However, some conversion dead time is incurred due to the conversion time of the reference quantities.
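The offline calibration can be sketched as a straight-line fit, assuming the transfer function is linear, t = gain·code + offset. The reference intervals and codes below are hypothetical, as are the helper names; a converter with significant non-linearity would need a richer model (for example a per-bin table):

```python
def fit_transfer_function(codes, ref_times):
    """Least-squares fit of t = gain * code + offset to reference conversions."""
    n = len(codes)
    mean_c = sum(codes) / n
    mean_t = sum(ref_times) / n
    sxx = sum((c - mean_c) ** 2 for c in codes)
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in zip(codes, ref_times))
    gain = sxy / sxx
    offset = mean_t - gain * mean_c
    return gain, offset

def apply_calibration(code, gain, offset):
    """Convert a raw output code to time with the fitted transfer function."""
    return gain * code + offset

# Hypothetical reference intervals [ps] and the codes the converter returned
ref_times = [0.0, 1000.0, 2000.0, 3000.0]
codes = [12, 52, 92, 132]                    # 25 ps/LSB with a 12-code offset
gain, offset = fit_transfer_function(codes, ref_times)
print(gain, offset)                          # 25.0 -300.0
print(apply_calibration(72, gain, offset))   # 1500.0
```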
The hardware necessary to calculate the transfer function from these reference quantities can be integrated in the converter, and the resulting transfer function can then be used to calibrate the converter internally. In this case the output data are always calibrated with respect to the given reference. Conversion dead time is, however, unavoidable.
In all these schemes the calibration procedure is performed periodically, so changes that occur between calibration runs are not accounted for and large conversion errors may develop. The best way to avoid this problem is to make the converter perform continuous, non-intrusive calibration, so that no dead time penalty is incurred. In such schemes the transfer function is derived directly from a reference signal and does not depend on environmental conditions. A consequence of this permanent self-calibration is that (in normal operation) the conversion error is continuously minimised.
Chapter 4.
Review of TDC Architectures.
Several methods have been proposed in the past to solve the problem of accurately measuring time. Traditional techniques fall into a few categories [12]: counter-based techniques, vernier techniques, pulse overlap techniques and current integration techniques. TDC circuits based on these techniques can be built from standard discrete components, avoiding the need to develop special-purpose monolithic circuits. Recently, however, demand has been pushing towards higher levels of system integration and lower power dissipation, areas where traditional methods find it difficult to compete.
The wide availability of sub-micron digital CMOS technologies has enabled the emergence of new TDC architectures. Time interpolation using delay-line-based architectures can achieve resolution comparable to that of the more traditional methods while profiting from the new technologies' capabilities in terms of integration and power dissipation.
A historical review of time interval measurement circuits can be found in [12]. Since then, several architectures have been described in the literature, but only partial review papers have been published (e.g. [13]). In this Chapter, a brief review of the most relevant architectures is presented, focusing on the topics that are fundamental to this work: time resolution, dynamic range, power dissipation, calibration, the possibility of sharing a common time interpolator block between several integrated channels, and cost. A table summarising the characteristics of each of the architectures described is presented at the end of the chapter.
4.1. Overview of TDC architectures.
4.1.1. Current integration techniques.
Current integration is probably the most common technique used for time interval measurements. In this architecture, a capacitor is charged linearly with a constant current I. The charging of the capacitor is gated on by a “start” pulse at time t1 and off by a “stop” pulse at time t2. The charge stored in the capacitor is thus proportional to the time interval between the “start” and “stop” pulses. Assuming a voltage-independent capacitor, the voltage across its terminals (Vcap) is also proportional to this time interval:
$$V_{cap} = \frac{I \cdot (t_2 - t_1)}{C}.$$
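The current-integration chain (TAC followed by an ADC) can be modelled in a few lines. This is an idealised sketch, assuming a perfectly constant current, a linear capacitor and an ideal ADC; the values of I, C, the ADC full scale and resolution are hypothetical. The effective time LSB is ADC_LSB·C/I, here about 19.5 ps:

```python
def tac_voltage(t_interval, current, capacitance):
    """Capacitor voltage after charging with a constant current for t_interval."""
    return current * t_interval / capacitance

def adc_code(v, v_full_scale, n_bits):
    """Ideal ADC with 2**n_bits codes over [0, v_full_scale)."""
    lsb = v_full_scale / 2 ** n_bits
    return min(int(v / lsb), 2 ** n_bits - 1)

def code_to_time(code, current, capacitance, v_full_scale, n_bits):
    """Invert the chain: time at the centre of the ADC bin."""
    lsb_v = v_full_scale / 2 ** n_bits
    return (code + 0.5) * lsb_v * capacitance / current

# Hypothetical parameters: I = 100 uA, C = 1 pF (0.1 V/ns), 2 V 10-bit ADC
I, C, VFS, BITS = 100e-6, 1e-12, 2.0, 10
t = 7.3e-9                                   # 7.3 ns interval
code = adc_code(tac_voltage(t, I, C), VFS, BITS)
print(code, code_to_time(code, I, C, VFS, BITS))   # 373, ~7.295e-09 s
```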
Any kind of ADC can be used to convert Vcap into a suitable digital code. The time resolution of these converters can be made very high. The stability of the current source, the linearity of the capacitor and the resolution of the ADC determine the resolution that can be achieved with this technique. Another important constraint is the high noise sensitivity of the current integrating node. Differential schemes have been developed to reduce noise sensitivity and enable higher resolution measurements ([14] and [15]). Figure 1 shows the basic scheme and timing diagram of one of these techniques. The times elapsing from the “start” and from the “stop” signals to the end of the “gate” signal are measured by two independent Time-to-Analogue Converters (TACs). The difference between these two measurements, given by an analogue voltage at the output of a differential amplifier, corresponds to the original time interval. Mismatches between the capacitors (C) and current levels (I) in the two TACs can be taken into account by appropriately adjusting the constant of proportionality of the measurement.
Figure 1: Block and timing diagram of a differential Current Integrating TAC (from [14]). [Diagram: TAC #1 (start) and TAC #2 (stop); signals Reset, Gate, Start, Stop, Vcap (start), Vcap (stop), Vcap (differential), Hit available; intervals tstart and tstop.]
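The time difference being measured is, in this case (with t_start and t_stop the intervals measured by the two TACs, from the “start” and “stop” signals respectively to the end of the “gate” signal):
$$T = t_{start} - t_{stop} = \frac{C \cdot \left(V_{cap(start)} - V_{cap(stop)}\right)}{I}.$$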
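Reading t_start and t_stop as the intervals from the “start” and “stop” signals to the end of the “gate” (as the two TACs measure them), the sketch below shows why the differential scheme helps: a disturbance that shifts both capacitor voltages equally cancels in the difference. The values and the helper name are hypothetical:

```python
def differential_interval(v_start, v_stop, current, capacitance):
    """T = t_start - t_stop = C * (Vcap(start) - Vcap(stop)) / I."""
    return capacitance * (v_start - v_stop) / current

I, C = 100e-6, 1e-12                  # hypothetical: 0.1 V/ns in both TACs
t_start, t_stop = 9.0e-9, 4.5e-9      # intervals to the end of "gate"
v_start = I * t_start / C             # 0.90 V
v_stop = I * t_stop / C               # 0.45 V
v_common = 0.05                       # disturbance seen equally by both TACs
print(differential_interval(v_start + v_common, v_stop + v_common, I, C))  # ~4.5e-09 s
```

A disturbance affecting only one TAC would not cancel, which is why matching the two branches matters.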
In current integration techniques, the converter is occupied for as long as the measurement is being acquired. This results in considerable dead time between measurements. Flash ADCs can be used to reduce the analogue-to-digital conversion time, but the cost penalty of using these devices can be prohibitive. Another approach is to rely on the statistical properties of the event arrival times: an analogue memory can store the measurements before conversion, thus de-randomising the event rate (see [16]). In this way a single flash ADC can be shared between several channels, or slower ADCs can be used without any throughput penalty.
Another drawback of these techniques is their limited dynamic range. Given a maximum voltage to which a capacitor can be charged (for example, the supply voltage), the only way to increase the dynamic range is to decrease the constant of proportionality of the measurement, either by decreasing the current level (I) or by increasing the capacitance (C). In some applications the dynamic range is divided into separate resolution ranges [17]. In this way it is possible to measure long time intervals with limited resolution and short time intervals with high resolution. The range to which a measurement belongs is identified by selecting the smallest non-overflowing range.
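The range selection rule (the finest range that does not overflow) can be sketched as follows. Each range is described by its C/I ratio and maximum voltage, so its full scale is C·V_max/I; the three ranges and the helper name `select_range` are hypothetical:

```python
def select_range(t_interval, ranges):
    """Pick the finest range that does not overflow.

    ranges: list of (time_per_volt, v_max), finest first, where
    time_per_volt = C / I for that range. Returns (range_index, voltage).
    """
    for index, (time_per_volt, v_max) in enumerate(ranges):
        v = t_interval / time_per_volt
        if v <= v_max:
            return index, v
    raise ValueError("interval exceeds the coarsest range")

# Hypothetical: full scales of 2 ns, 20 ns and 200 ns with a 2 V limit
RANGES = [(1e-9, 2.0), (10e-9, 2.0), (100e-9, 2.0)]
print(select_range(1.5e-9, RANGES))   # range 0, 1.5 V
print(select_range(35e-9, RANGES))    # range 2, ~0.35 V
```

With a fixed ADC, each coarser range multiplies the time LSB by the same factor as its full scale.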
Low-power operation is possible (disregarding the flash ADC dissipation). However, large-scale integration is difficult due to the requirement for good analogue process characteristics and the noise sensitivity inherent to the architecture. Current levels and actual capacitance values depend on process, temperature and supply voltage, which makes calibration of the converter necessary.
4.1.2. Counter techniques.
Counter-based time measurement techniques generally rely on a Gray code counter running at very high speed. A “start” and a “stop” pulse mark the moments at which the counter is sampled; the difference between these two samples corresponds to the measured time interval. The frequency and stability of the reference clock determine the resolution and accuracy of this scheme [12].
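A sketch of the counter method, assuming an 8-bit binary-reflected Gray counter and a 1 GHz clock (both hypothetical, as are the helper names). The samples are decoded back to binary before subtraction, and the modulo operation tolerates one counter wrap-around between “start” and “stop”:

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse of to_gray: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def counter_interval(g_start, g_stop, n_bits, t_clk):
    """Interval between two samples of a free-running Gray counter."""
    counts = (from_gray(g_stop) - from_gray(g_start)) % (1 << n_bits)
    return counts * t_clk

# Hypothetical 8-bit counter at 1 GHz (t_clk = 1000 ps); stop sampled after wrap-around
g_a, g_b = to_gray(250), to_gray(5)
print(counter_interval(g_a, g_b, 8, 1000))   # 11 periods -> 11000 ps
```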
This method offers a very large dynamic range in a highly integrated digital design. However, high resolution requires a reference clock frequency in the GHz range (∆tmin < 1 ns), so very fast processes must be used to implement it. It also results in a power-hungry system, due to the high toggling rates involved.
Alternatively, several counters synchronous to different phases of the same clock can be used to increase the resolution with a slower reference clock [18]. The time measurement can easily be interpolated from the results of all the counters. The accuracy of the synthesised clock phases sets the achievable resolution.
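One simple way to combine several phase-shifted counters (an illustrative choice, not necessarily the scheme of [18]) is to add their values: the sum equals the number of edges of a virtual clock whose period is the clock period divided by the number of phases. Integer picoseconds, a 1 ns clock and 4 phases are assumed:

```python
def edges_up_to(t, phase, period):
    """Rising edges at phase, phase + period, ... occurring at or before t."""
    return (t - phase) // period + 1 if t >= phase else 0

def multiphase_count(t, period, n_phases):
    """Sum of n_phases counters clocked by equally spaced clock phases.

    The sum counts the edges of a virtual clock of period period / n_phases,
    so the resolution improves by a factor n_phases.
    """
    step = period // n_phases
    return sum(edges_up_to(t, k * step, period) for k in range(n_phases))

# Hypothetical: 1000 ps clock, 4 phases -> 250 ps effective bin
print(multiphase_count(2300, 1000, 4))   # 10 edges of a 250 ps virtual clock
```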
These techniques are sensitive to metastability in the counter's registers. If the sampling “start”/“stop” signals arrive while the counter is toggling, the resulting output may be unpredictable [19]. Simple Gray code counters are less sensitive to this problem, since only one bit toggles on each clock transition. Interpolation between several Gray code counters can worsen the problem because, in that configuration, a single bit toggle in one counter corresponds to more than one Least Significant Bit (LSB) change.
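The advantage of Gray code under metastability can be illustrated by assuming that every bit caught toggling may resolve to either value independently, and computing the worst resulting count error. For a binary counter going from 127 to 128 all eight bits toggle; for a Gray counter only one does. This is a simplified logical model with hypothetical helper names, not a circuit simulation:

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse of to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def binary(n):
    """Plain binary counter: the code is the count itself."""
    return n

def worst_case_error(code_before, code_after, decode):
    """Largest count error if each toggling bit settles to 0 or 1 arbitrarily."""
    toggling = code_before ^ code_after
    bits = [1 << i for i in range(toggling.bit_length()) if (toggling >> i) & 1]
    true_count = decode(code_before)
    worst = 0
    for mask in range(1 << len(bits)):      # every combination of resolved bits
        sampled = code_before
        for j, bit in enumerate(bits):
            if (mask >> j) & 1:
                sampled ^= bit
        worst = max(worst, abs(decode(sampled) - true_count))
    return worst

print(worst_case_error(127, 128, binary))                        # 128 counts
print(worst_case_error(to_gray(127), to_gray(128), from_gray))   # 1 count
```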
4.1.3. Delay line-based techniques.
The clock rate requirements that limit the use of counter techniques for time interval measurements can be relaxed if the basic CMOS gate delay is used as the time unit. Modern CMOS technologies have gate delays of the order of 100 ps, so the conversion resolution can be quite good.
In this technique, several delay elements (usually inverters [20]; alternatively, segments of a transmission line [21][22]) make up a delay line through which a signal pulse is propagated. The progression of the pulse along the delay line reflects the time interval being measured.
Figure 2 shows an example of such a line. Delay elements made of two inverters make good building blocks for these lines, since they preserve the polarity of the input signal at every output tap. Differential cells can be used instead, but they result in higher static power dissipation.
Figure 2: Delay line using double inverters as delay elements. [Diagram: input Pulse; output taps Tap 0 to Tap N.]
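An idealised model of the delay-line converter: the hit latches the taps as a thermometer code, and the number of taps reached, multiplied by the cell delay, gives the time. The tap numbering (tap i reached after i + 1 cells), the 100 ps cell delay and the helper names are assumptions:

```python
def sample_taps(t_elapsed, cell_delay, n_taps):
    """Thermometer code latched by the hit: tap i is 1 once the pulse
    has travelled through i + 1 delay cells (assumed numbering)."""
    return [1 if t_elapsed >= (i + 1) * cell_delay else 0 for i in range(n_taps)]

def taps_to_time(taps, cell_delay):
    """Count the ones and convert to time. Counting ones, rather than
    searching for the first 0, is less sensitive to an isolated 'bubble'."""
    return sum(taps) * cell_delay

# Hypothetical line: 8 cells of 100 ps; the hit arrives 345 ps after the pulse
taps = sample_taps(345, 100, 8)
print(taps)                      # [1, 1, 1, 0, 0, 0, 0, 0]
print(taps_to_time(taps, 100))   # 300 ps: resolution is one cell delay
```

The dynamic range is simply the number of taps times the cell delay (800 ps here), which illustrates why long lines are needed for large ranges.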
Since standard CMOS technologies are used, an easy-to-design, highly integrated monolithic converter can be developed. Complex systems, including the converter and large logic units, can be integrated in a single IC with low power dissipation. However, the delay of a CMOS gate is highly dependent on process parameters, temperature and supply voltage, so frequent calibration is required. The linearity of the conversion transfer function is determined by the matching of the delay cells, and strict design rules must be followed to reduce device mismatch to acceptable levels.
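Delay cell mismatch is usually quantified as differential and integral non-linearity (DNL and INL). Given bin widths measured, for example, with a code-density test, the sketch below computes DNL and INL in LSB, using the cumulative-sum convention for INL (other conventions, such as end-point or best-fit INL, exist). The widths are hypothetical:

```python
def dnl_inl(bin_widths):
    """DNL and INL (in LSB) from measured bin widths.

    DNL_k = w_k / w_mean - 1; INL_k = sum of DNL_0 .. DNL_k.
    """
    mean = sum(bin_widths) / len(bin_widths)
    dnl = [w / mean - 1 for w in bin_widths]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl

# Hypothetical measured cell delays [ps] of an 8-cell line
widths = [98, 103, 100, 95, 104, 101, 99, 100]
dnl, inl = dnl_inl(widths)
print([round(d, 3) for d in dnl])   # largest |DNL| is 0.05 LSB (the 95 ps cell)
print([round(x, 3) for x in inl])   # largest |INL| is 0.04 LSB
```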
Large dynamic ranges can only be achieved if very long delay lines are used. Since
long lines are difficult to obtain, this technique is limited to short dynamic ranges.
4.1.4. Phase Locked Loop (PLL) techniques.
Some of the limitations of the delay lines discussed above can be overcome by continuously adjusting the delay of their elements using a clock signal as reference. If the delay line is closed in a voltage controlled ring oscillator (VCO) topology and the oscillation frequency is controlled via a feedback loop, a PLL is obtained. The delay of each element can be controlled by limiting the current available to it [23]. Analogue control loops are common [24], but digital loops have also been implemented [25]. As an alternative to current limitation, the load at the output of each delay element can be controlled [26].
This kind of system is able to generate precisely timed signals that can be used in time interval measurement instruments. Including the oscillator in a closed loop guarantees self-calibration and thus low sensitivity to environmental and process changes. It is interesting to note that the need for dynamic control of the line delay makes the line slower in typical operation, meaning that the technology is not pushed to its limits. As in any delay-line-based architecture, delay cell mismatch limits the linearity of the conversion.
Using asymmetric ring oscillators [24] or differential pairs [27] as the cells of the oscillator, it is possible to obtain the convenient number of 2^N time bins per clock cycle. Measurements performed using this technique are referred to the reference clock. If a time interval is to be measured, the measurements acquired at the beginning and at the end of the interval must be subtracted.
Figure 3: Asymmetric ring oscillator [24], able to generate 2^N timing signals from an odd-numbered oscillator. [Diagram: Clkref, Phase Frequency Detector, Charge Pump, VCO, Hit, Hit registers.]
PLL-based circuits have the convenient property of being able (depending on the closed-loop properties) to filter out phase noise (jitter) associated with the reference clock, thereby relaxing the requirements on the time reference path. Jitter internal to the loop can also be filtered. However, the increased PLL bandwidth required to do so reduces the loop's ability to filter jitter associated with the reference. Note that phase noise generated within the VCO accumulates from one oscillator period to the next, leading to increased output jitter compared with other delay-line-based schemes such as the Delay Locked Loop (DLL) [28].
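The accumulation of VCO phase noise can be illustrated with an open-loop model: if each oscillator period carries independent Gaussian jitter σ, the timing error after N periods is a random walk with RMS value σ√N. The model ignores the corrective action of the loop, and the per-period jitter is hypothetical:

```python
import random
import statistics

def accumulated_jitter(sigma_period, n_periods, n_trials=20_000, seed=3):
    """RMS timing error after n_periods of a free-running oscillator whose
    period has independent Gaussian jitter sigma_period (random walk)."""
    rng = random.Random(seed)
    errors = [sum(rng.gauss(0.0, sigma_period) for _ in range(n_periods))
              for _ in range(n_trials)]
    return statistics.pstdev(errors)

sigma = 1.0   # hypothetical per-period jitter [ps]
for n in (1, 4, 16):
    print(n, round(accumulated_jitter(sigma, n), 2))   # grows roughly as sqrt(n)
```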
Large dynamic ranges can be obtained by counting the number of oscillations of the ring oscillator. The least significant bits of the measurement are obtained from the PLL and the most significant bits from the counter. Since both parts of the measurement are generated from the same reference signal (the oscillation period), there is no ambiguity in the final result.
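Combining the counter (most significant bits) with the fine phase bin from the oscillator (least significant bits) can be sketched as follows, assuming the oscillation period is divided into 2^N equal bins; the 25 ns period, N = 5 and the helper name are hypothetical:

```python
def combine_coarse_fine(coarse_count, fine_bin, n_fine_bits, t_osc):
    """Time from a coarse oscillation counter and a fine phase bin.

    The oscillation period t_osc is divided into 2**n_fine_bits bins;
    the counter supplies the MSBs and the phase bin the LSBs.
    """
    assert 0 <= fine_bin < (1 << n_fine_bits)
    code = (coarse_count << n_fine_bits) | fine_bin
    return code * t_osc / (1 << n_fine_bits)

# Hypothetical: 25 ns (25000 ps) oscillation period split into 2**5 = 32 bins
print(combine_coarse_fine(3, 10, 5, 25000.0))   # 82812.5 ps
```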
PLLs have been extensively discussed in the literature (for example in [29] and [30]), demonstrating their flexibility, high integration level and low power dissipation. However, they require careful layout design to ensure that all the cell delays are identical and that the interconnection capacitance at the output of each cell is matched. A PLL is a second- (or higher-) order system, so loop stability must be carefully evaluated.
4.1.5. Delay Locked Loop (DLL) techniques.
If the delay line is not closed into a ring but is instead included in a feedback control loop, a DLL is obtained [13][31]. Various control loop topologies have been described, but they typically include a phase detector to measure the phase error and a filter that converts this information into a control signal for the delay line. In contrast to PLLs, the reference clock signal is injected directly into the voltage controlled delay line (VCDL), and its phase is compared with the corresponding phase at the output of the line (see Figure 4).
A DLL has some characteristics in common with a PLL such as the ability to
generate precisely timed signals with high resolution, the self-calibration of the system
and the large dynamic ranges achievable. In order to guarantee a good linearity between
consecutive delay elements, matching of devices is a critical parameter.
Figure 4: Delay Locked Loop and hit registers. [Diagram: Clock, VCDL, Phase Detector, Charge Pump, Hit, Hit registers.]
Self-calibration is based on phase information from the extremes of the delay chain.
To guarantee that the delay chain is permanently calibrated, the reference clock must be
constantly circulated through it. A constant level of power is thus dissipated, regardless of
the rate of the hits being acquired.
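The locking behaviour can be illustrated with a discrete-time, first-order behavioural model: on each reference cycle, the phase error between the total line delay and the clock period is integrated into a control voltage that speeds up or slows down every cell. The linear delay-versus-control relation, the loop gain, the values and the helper name are assumptions, not a description of a specific circuit:

```python
def simulate_dll(n_cells, t_ref, tau0, kv, gain, n_steps=200):
    """Return the cell delay reached after n_steps reference cycles.

    Cell delay model (assumed linear): tau0 - kv * v_ctrl, so a higher
    control voltage makes the cells faster.
    """
    v_ctrl = 0.0
    for _ in range(n_steps):
        cell_delay = tau0 - kv * v_ctrl
        phase_error = n_cells * cell_delay - t_ref   # > 0: line too slow
        v_ctrl += gain * phase_error                 # integrate: speed the line up
    return tau0 - kv * v_ctrl

# Hypothetical: 32 cells locked to a 25 ns (40 MHz) clock, all times in ps
cell = simulate_dll(n_cells=32, t_ref=25000.0, tau0=900.0, kv=200.0, gain=1e-4)
print(cell)   # converges to 25000 / 32 = 781.25 ps
```

In this model the error shrinks by a factor (1 − kv·gain·n_cells) per cycle, so it converges only if that product lies between 0 and 2.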
Dynamic ranges wider than the reference clock period can be achieved by introducing a counter synchronous to the reference clock. Since the DLL and the coarse counter measurements are obtained with the same reference, the extension of the measurement's dynamic range is unambiguous. Using this technique a time stamp converter is obtained, in which time measurements are referred to the clock signal. In many applications the reference clock can be used as the “start” or “stop” signal. If that is not the case, “start”/“stop” measurements can easily be obtained by subtracting the time stamps of these two signals.
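Subtracting two time stamps can be sketched as below. Each time stamp is assumed to combine a coarse clock count with a DLL bin index, and the modulo on the counter's full scale allows the “stop” to fall after one counter roll-over. The 25 ns clock, 32 bins, 12-bit counter and helper names are hypothetical:

```python
def timestamp(coarse, fine, n_fine_bins, t_clk):
    """Time stamp [ps] referred to the clock: coarse count plus DLL bin."""
    return coarse * t_clk + fine * t_clk / n_fine_bins

def start_stop_interval(ts_start, ts_stop, coarse_bits, t_clk):
    """Stop minus start, allowing one roll-over of the coarse counter."""
    full_scale = (1 << coarse_bits) * t_clk
    return (ts_stop - ts_start) % full_scale

# Hypothetical: 25 ns clock, 32 DLL bins (781.25 ps), 12-bit coarse counter
T, BINS = 25000.0, 32
ts_a = timestamp(4094, 7, BINS, T)     # start, just before roll-over
ts_b = timestamp(1, 20, BINS, T)       # stop, after roll-over
print(start_stop_interval(ts_a, ts_b, 12, T))   # 3 periods + 13 bins = 85156.25 ps
```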
Unfortunately, unlike a PLL, this kind of control loop lacks the ability to filter jitter coupled to the reference signal. The time-critical paths should therefore be designed to be insensitive to noise, and the reference clock must be stable. Careful design of the delay locked loop is also essential to guarantee that all the delay cells have the same delay characteristics.
DLLs can be built using standard digital CMOS technologies, which allows a high integration level and thus lowers system costs. Sensitivity to environmental conditions is factored out by the self-calibration mechanism, and noise sensitivity can be lowered to acceptable levels by careful layout and power distribution.
4.2. Beyond the limits of the technology: techniques to improve resolution.
The schemes presented above have their time resolution limited to the unit cell delay, usually that of two inverter gates. As the demand for higher resolution grows, faster technologies must be used. Unfortunately, access to these technologies is at present rather expensive. Another way to overcome the resolution limit is to devise techniques able to interpolate time within the basic cell delay. Several such architectures have been proposed in the literature, some of which are discussed in the next few sections.
4.2.1. Analogue time expansion.
The analogue time expansion technique extends the current integration technique into a scheme where the time interval to be measured is stretched by a factor k that depends on the circuit's parameters. The expanded time interval can then be measured by any TDC with a coarser resolution.
Several topologies can be used to obtain a time stretcher. The simplest is in fact similar to the Wilkinson ADC: the capacitor that was charged during the measurement is discharged with a much smaller current. The ratio between the charge and discharge currents is the stretch factor k. If this factor is large enough, a simple counter-based TDC can be used to measure the discharge (stretched) time interval and thus obtain the original time measurement with improved resolution. Even finer resolution can be obtained using DLL-based TDCs.
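An idealised model of the Wilkinson-style stretcher followed by a counter: the discharge lasts k times the original interval, and counting it with a clock of period t_clk gives an effective resolution of t_clk/k. The sketch assumes a counter that starts exactly with the discharge (no phase error); the numbers and helper name are hypothetical:

```python
import math

def stretched_measurement(t_interval, k, t_clk):
    """Estimate t_interval from the number of clock periods counted
    during the stretched (k times longer) discharge."""
    counts = math.floor(k * t_interval / t_clk)
    return counts * t_clk / k

# Hypothetical: 1 ns clock, stretch factor 100 -> 10 ps effective LSB
print(stretched_measurement(1234.0, 100, 1000.0))   # 1230.0 ps
```

The price is conversion time: the discharge of a 1.234 ns interval lasts 123.4 ns here.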
The dynamic range obtained with this technique can be extended if the start and stop times are measured separately with respect to a reference clock and the number of clock cycles elapsing between the two measurements is also recorded [32]. A refinement of this technique allows simultaneous calibration of the stretch mechanism [33].
Figure 5: Timing diagram of the dynamic range extension using a clocked time stretcher [33]. [Diagram signals: clkstretch, pulse, synchronised pulse, integrator voltage, output to TDC; instants T1, T2, T3, T4.]
For each pulse to be measured, the TDC captures two time intervals. The first (T2 − T1) reflects the unstretched time difference between the pulse arrival and an edge of the reference stretch clock; the second (T3 − T2) reflects the stretched image of this time difference. The stretch factor k is:
$$k = \frac{T_3 - T_2}{T_2 - T_1}.$$
To obtain a high-precision measurement of k, several random time difference measurements are averaged. Since the normal data are uncorrelated with the stretcher reference clock, this averaging reduces the error to acceptably small levels.
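The averaging of k can be sketched with a simulation: T2 − T1 is drawn uniformly over one stretch-clock period (modelling the lack of correlation with the stretch clock), both intervals are quantised by a coarse TDC, and k is estimated as the ratio of the summed intervals (a choice that avoids dividing by very short single intervals). All parameters and the helper name are hypothetical:

```python
import random

def estimate_stretch_factor(k_true, t_clk_stretch, lsb, n_meas=5000, seed=7):
    """Estimate k = (T3 - T2) / (T2 - T1) from many quantised measurements."""
    rng = random.Random(seed)
    sum_raw = sum_str = 0.0
    for _ in range(n_meas):
        raw = rng.uniform(0.0, t_clk_stretch)       # T2 - T1
        stretched = k_true * raw                     # T3 - T2
        sum_raw += round(raw / lsb) * lsb            # TDC-quantised values
        sum_str += round(stretched / lsb) * lsb
    return sum_str / sum_raw

# Hypothetical: k = 20, 1 ns stretch clock, 100 ps TDC resolution
print(estimate_stretch_factor(20.0, 1000.0, 100.0))   # close to 20
```

A single measurement can be off by a large fraction because of the 100 ps quantisation, but the averaged estimate is accurate to well under 1 %.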
The previous techniques are sensitive to noise at the integrating node and to non-linearity of the capacitor. This sensitivity can be reduced by using two identical capacitors that start being discharged, by different currents, when the “start” and “stop” signals arrive, respectively. A comparator identifies the moment when the voltages on the two capacitors are again equal, as shown in Figure 6.
If the “stop” discharge current is k times the “start” current, the resulting time expansion is given by the following expression, where t_same is the instant at which the two capacitor voltages become equal:
$$t_{same} - t_{start} = \frac{k}{k-1} \cdot \left(t_{stop} - t_{start}\right).$$
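The expression follows from equating the charge removed from the two capacitors, I·(t − t_start) = k·I·(t − t_stop). The sketch evaluates it; note that for k close to 1 the expansion factor k/(k − 1) becomes large (k = 1.1 gives 11). The helper name and values are hypothetical:

```python
def t_same(t_start, t_stop, k):
    """Instant at which the two capacitor voltages become equal again.

    From I * (t - t_start) = k * I * (t - t_stop):
    t - t_start = k / (k - 1) * (t_stop - t_start).
    """
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return t_start + k / (k - 1) * (t_stop - t_start)

# Example: with k = 1.1 a 50 ps interval is expanded 11 times
t = t_same(0.0, 50.0, 1.1)
print(t)   # ~550 ps
```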
In a differential architecture such as this, the expanded time is highly insensitive to supply noise and to any non-linearity of the capacitors or current sources, as long as these affect both branches in the same way. Any mismatch between capacitor values or current levels only changes the expansion factor k, which can easily be calibrated at set-up time.
Figure 6: Time expander circuit and corresponding timing diagram. [Diagram: two capacitors C discharged by currents I (start) and k·I (stop); comparator output Q (same); reset; timing of start, stop and same.]
The main disadvantage of this scheme is the demanding requirements it places on the comparator, whose offset and propagation delay must remain stable over a considerable common-mode range. Its rather short dynamic range and the considerable dead time between measurements can also limit its usefulness, especially in high hit rate applications.
Large dynamic ranges can be obtained if the measurement is in some way synchronised to a reference clock. If the times from the start to a clock edge and from the stop to a clock edge are combined with the time between these edges, the dynamic range becomes independent of the charge/discharge current levels and capacitor sizes.
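With clock synchronisation, the interval can be written as N·T_clk + t_a − t_b, where t_a and t_b are the (expanded, then measured) times from “start” and “stop” to the next clock edge and N counts the clock periods between those two edges. This sign convention and the helper name are assumptions for illustration; the numbers are hypothetical:

```python
def synchronised_interval(n_periods, t_clk, t_a, t_b):
    """Interval from "start" to "stop" using a reference clock.

    t_a: time from "start" to the next clock edge
    t_b: time from "stop" to the next clock edge
    n_periods: clock periods between those two edges (counted)
    """
    return n_periods * t_clk + t_a - t_b

# Hypothetical: start at 130 ps, stop at 5410 ps, clock period 1000 ps.
# Next edges: 1000 ps after start (t_a = 870), 6000 ps after stop (t_b = 590)
print(synchronised_interval(5, 1000, 870, 590))   # 5280 = 5410 - 130
```

Only t_a and t_b, each shorter than one clock period, need to pass through the time expander.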
4.2.2. Vernier differences.
This technique is an extension of the analogue vernier technique [12], in which the two reference signals with slightly different periods are replaced by more convenient delay lines with different delays per cell [34]. The “start” and “stop” pulses are propagated through the respective lines.
Figure 7: Time expansion using two delay lines with different cell delay. [Diagram: “start” line with cell delay T1 and “stop” line with cell delay T2 (T1 > T2); D flip-flops at Tap 0 to Tap 4; Reset.]
The rising edge of the “stop” pulse latches the state of the “start” delay line. If the cell delay T1 of the “start” delay line is slightly larger than the cell delay T2 of the “stop” delay line, the position N of the first flip-flop not set gives the time interval between the “start” and “stop” signals:
$$T_{in} = N \cdot (T_1 - T_2)$$
Very good time resolution can be obtained with this technique. Several improvements can be made to save silicon area: the “stop” delay line can be replaced by the propagation delay of each D flip-flop. Another technique is to use a single delay line with different rise (Tr) and fall (Tf) times in the “stop” path, and to connect the “start” line to logical one [13]. This results in a shrinking pulse, and the position of the first flip-flop that is not set gives the original pulse width in units of (Tr − Tf).
Converters using these schemes have a very limited dynamic range and require very long delay lines for the desired resolution. Also, no other hits may occur while pulses are propagating through the lines, which leads to some dead time between measurements. Another drawback of these schemes is the difficulty of controlling the bin size in each line. Process spread, temperature and supply voltage influence these delays, so frequent calibration of the circuit is required.
Vernier techniques are very sensitive to the matching of the delay of the cells across
the delay lines. The effects of mismatch are amplified by the nature of the time
interpolation, where the high resolution is obtained from the small difference between the
(comparatively) large delay of the cells in each of the delay lines.
Circular vernier method.
The need of very long delay lines to obtain a reasonable dynamic range can be
obviated if the two lines are closed in a ring oscillator-like structure, such as the one
shown in Figure 8. Theoretically this configuration corresponds to an infinite length line
and thus arbitrary dynamic ranges should be obtainable.
Figure 8: Circular vernier scheme for dynamic range expansion. [Diagram: “start” line with cell delay T1 and “stop” line with cell delay T2 (T1 > T2); D flip-flops at Tap 0 to Tap 4; Reset.]
Both the “start” and “stop” signals are fed into the respective delay line via a multiplexer. As soon as these signals are propagating within the delay lines, the multiplexers are switched, establishing a ring-oscillator-like structure. Counting the number of oscillations completed by each signal before they coincide enables the correct expansion of the dynamic range. Unfortunately, the inversion of the signal propagating in these ring oscillators makes it difficult to decode the moment when the two signals coincide. Solutions have been proposed in which different structures are used to detect the coincidence of the two signals, depending on the number of oscillations that have occurred in each oscillator [35]. However, using different structures in the time-critical circuitry makes it hard to equalise their dynamic response under all conditions, which may produce considerable non-linearity in the conversion transfer function.
Another undesirable side effect of this closed-loop topology is that all timing errors
occurring during the measurement time (due to noise or any other source) accumulate in
the final measurement: the scheme integrates every error present during the measurement.
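The accumulation can be pictured with a small Monte Carlo sketch. It assumes, purely for illustration, that every cell transition adds independent, zero-mean Gaussian jitter of standard deviation `sigma_cell` (a hypothetical parameter, not a value from this work). Under that assumption the RMS error after k transitions grows as `sigma_cell`·√k, so the error of a long measurement in a closed ring keeps growing with the number of oscillations.

```python
import random
import statistics


def accumulated_error_rms(n_transitions, sigma_cell, trials=1000, seed=1):
    """RMS timing error after a signal has crossed n_transitions delay cells,
    assuming each crossing adds independent Gaussian jitter (std sigma_cell)."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(0.0, sigma_cell) for _ in range(n_transitions))
        for _ in range(trials)
    ]
    return statistics.pstdev(totals)


sigma_cell = 1.0  # arbitrary units; hypothetical per-cell jitter
for n in (10, 100, 1000):
    # The simulated RMS error should track sigma_cell * sqrt(n).
    print(n, round(accumulated_error_rms(n, sigma_cell), 1), round(n ** 0.5, 1))
```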
Calibration, using a PLL-like control loop around the closed delay line, can only be done
off-line, when no measurements are being made. In a high hit-rate environment calibration
can therefore only be performed infrequently, which may result in a loss of accuracy.
Furthermore, coupling between the two closed delay lines may also be a problem. For layout
reasons they should be implemented close together and, to obtain good resolution, their
oscillation frequencies (cell delays) should be very similar. If coupling is present and
there is no active control of the lines during measurement, one of the lines may be pulled
to oscillate at the frequency of the other, which would ruin the measurement.
To avoid this problem, calibration can be performed using a dummy channel in a double-PLL-like
structure. The control information derived from it can be used to control the delay of
the lines even while measurements are being performed. In this way all the lines are
actively pulled to their correct oscillation frequency. The calibration circuitry can be
shared between all channels in a circuit, resulting in an efficient use of silicon.
Dual scale vernier method.
There is an alternative implementation of the vernier technique in which the dead time
between measurements is small and the converter is self-calibrating. Unlike the previous
techniques, it produces time stamp measurements. Its principle of operation is the same as
that of the vernier caliper (Figure 9) used to measure length [36].
Two scales are required: the reference scale, which has a time bin T, and the vernier
scale, whose time bin is slightly shorter but which spans N reference bins. The difference
between the bins of the two scales determines the bin size of the converter. For example,
to obtain a bin of 0.1·T the vernier scale must span 9 reference bins and be divided into
10 time bins. A measurement word is made of two components: the higher order bits are
obtained from the reference scale and the lower order bits from the vernier scale.
Figure 9: A vernier caliper measuring a length of 0.43 mm. Note that the third tick mark in the vernier scale
(lower) lines up with a tick mark in the reference scale (upper) [36].
The reference scale can be made with a counter counting cycles of a reference clock.
The vernier scale can be, for example, a DLL-calibrated delay line that spans 9 clock cycles
and is divided into 10 time bins¹. When the hit signal is asserted, the status of both
scales is captured. The low order bits of the measurement result from identifying the
next bin that will switch. If this bin number is n, the resulting time measurement is:

$$t = \mathrm{Mod}\left(\left(1-\frac{1}{F}\right)\cdot T\cdot n,\; T\right) + m\cdot T,$$

where Mod(a,b) is the modulus operation, F is the interpolation factor and m is the
reference scale measurement. The number of time bins into which the vernier line is
divided is equal to the interpolation factor F, and the number N of clock cycles that it
spans is F−1.
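The equation can be checked numerically. The sketch below (the function name is hypothetical; exact fractions avoid rounding) uses T = 1 and F = 10, the 0.1·T example above, and shows that the F possible values of the vernier bin index n map onto F evenly spaced fine positions from 0 to 0.9·T.

```python
from fractions import Fraction


def dual_scale_time(n, m, F, T=Fraction(1)):
    """Time stamp t = Mod((1 - 1/F) * T * n, T) + m * T, where n is the vernier
    bin index, m the reference-scale (counter) value and F the interpolation factor."""
    vernier_bin = (1 - Fraction(1, F)) * T  # F vernier bins span F - 1 reference bins
    return (vernier_bin * n) % T + m * T


F = 10
fine = sorted(dual_scale_time(n, 0, F) for n in range(F))
print([str(x) for x in fine])
# ['0', '1/10', '1/5', '3/10', '2/5', '1/2', '3/5', '7/10', '4/5', '9/10']
```

Consecutive values of n step through the fine positions in reverse order (n = 1 gives 0.9·T, n = 9 gives 0.1·T), which is why the Mod operation is needed.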
This technique is very sensitive to the accumulation of non-linearity along the
vernier delay line. The sensitivity is amplified for high interpolation factors, since the
line becomes longer while the LSB becomes shorter.
4.2.3. Analogue time interpolation.
In a locked DLL, the signals propagating through the delay chain have edges with
almost constant slopes, directly related to the delay of the delay elements. By performing
an analogue sum of the signals in consecutive time taps, it is possible to obtain a time
interpolation between these taps, thereby increasing the resolution to a level that is better
than the intrinsic delay of a delay cell (Figure 10).
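The principle can be illustrated with an idealised model. The sketch assumes, hypothetically, two consecutive taps with perfectly linear edges whose duration `t_rise` exceeds the cell delay `t_d`, so that the edges overlap. The weighted sum w·tap(k+1) + (1−w)·tap(k) then crosses the mid-supply threshold at a time that moves linearly from one tap to the next as w goes from 0 to 1.

```python
def ramp(t, t_start, t_rise, vdd=1.0):
    """Idealised rising edge: 0 before t_start, linear for t_rise, then vdd."""
    return vdd * min(max((t - t_start) / t_rise, 0.0), 1.0)


def crossing_time(signal, threshold, t_lo, t_hi, iters=60):
    """Bisection for the time at which a monotonically rising signal reaches threshold."""
    for _ in range(iters):
        mid = 0.5 * (t_lo + t_hi)
        if signal(mid) < threshold:
            t_lo = mid
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)


def interpolated_crossing(w, t_d=1.0, t_rise=3.0, threshold=0.5):
    """Threshold crossing of the weighted sum of two consecutive taps (hypothetical values)."""
    def summed(t):
        return w * ramp(t, t_d, t_rise) + (1 - w) * ramp(t, 0.0, t_rise)
    return crossing_time(summed, threshold, -1.0, 10.0)


for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, round(interpolated_crossing(w), 3))  # crossing moves by w * t_d
```

In this model the interpolation is exactly linear only while both edges are in their linear region; real edges are only approximately linear, which is one source of the interpolation non-linearity.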
The design of such a system is made difficult by the need to match the delay
through the summing circuitry with that of the direct signals from the taps themselves. An
alternative approach is to store the analogue voltages of all taps when a hit occurs and
perform the interpolation later, either by analogue summing or by using the stored
voltages as inputs to a weighted filter whose output is then converted by an
ADC.
¹ In fact, the delay line includes many more delay elements, to avoid interactions between the leading and
trailing edges of the signal propagating in it.
Figure 10: Time interpolation using voltage sums. (Diagram blocks: clock, phase detector, charge pump, summing nodes (+), hit, hit registers.)
Small ring oscillators, controlled by a PLL structure, can also be used as the basis of
the time interpolation [37]. First order equalisation of the delay between different time
taps is obtained by including a dummy analogue phase interpolator (weighted sum of the
voltage at its two inputs) in the non-interpolating taps. In this scheme the phase
interpolator circuit must be calibrated to improve the linearity of the interpolation.
Other interpolation techniques try to generate the voltage ramp typical of current
integration schemes in a “digital” form [38]. As the “start” signal progresses along the
delay line, a voltage staircase is generated on the summing node, each step representing
the crossing of a new delay cell by the “start” signal. A high order filter can be used to
smooth out the edges of the steps, thus obtaining the intended voltage ramp. The “stop”
signal forces each delay cell into high impedance and disconnects the hold capacitor at the
filter’s output, so that the resulting measurement is kept stable for the time necessary
to process it with an ADC.
Figure 11: Time to analogue converter using a time interpolation technique [38]. (Diagram: “start” and “stop” inputs, 16 digital gates with tri-statable outputs driving a resistive (R) summing network, reset, high order filter, unity-gain buffers (*1), hold capacitor, analogue output.)
When compared to current integration techniques, this scheme has the advantage of
being potentially less sensitive to noise coupling into the summing node. Since the
interpolation is done resistively, the node has a much lower impedance than a capacitive
node and noise is not integrated over the measurement period. To convert the
measurement into a binary word an ADC must be used, which increases power
dissipation and system cost.
4.2.4. Array of coupled oscillators.
Some techniques have been proposed to increase the resolution of PLL-based time
interpolation circuits to time intervals smaller than the intrinsic gate delay. One way of
achieving this is to use an array of coupled ring oscillators [39]. Each delay cell is a
dual-input voltage-controlled buffer. Both inputs have the same polarity and together
they define the output transition time. One of the inputs is used to form the ring oscillator,
the other to couple consecutive rings in the array, as shown in Figure 12.
If a fixed phase shift is established between two consecutive oscillators, then the
identical coupling between oscillators will create a uniform phase shift between all
oscillators. The oscillation frequency remains the same for all oscillators. The time
resolution achieved is the cell delay (td in Figure 12) divided by the number of rings in the
array.
The fixed phase shift is established by connecting the outputs of the boundary
oscillator at one end of the array to the inputs of a cell located at a different position in
the oscillator at the opposite end. In this architecture the time bin is defined by two closely
coupled delay cells that belong to separate ring oscillators. The coupling between
consecutive rings forces the size of each time bin to be set by the complete array. This
intimate coupling guarantees good linearity of the conversion function. However, device
matching is a critical parameter for this topology.
Figure 12: Coupled oscillators (time resolution of 2·td/3). (Diagram: coupled rings of delay cells labelled T1 to T5, cell delay td.)
At initialisation, several modes of oscillation that satisfy the array’s boundary
conditions are possible. Each corresponds to a phase shift between the boundary
oscillators that includes an additional multiple of the oscillation period. The locking
procedure has to be able to force the circuit into the correct mode, in which the phase
shift is smaller than one oscillation period. This task may not be trivial.
The resolution achievable with this architecture is

$$T_{bin} = \frac{x\cdot T + k\cdot \dfrac{T}{2\cdot N}}{M},$$

where T is the oscillation period set by a PLL control loop, N is the number of delay cells
per oscillator and M is the number of oscillators in the array. The variable k reflects the
coupling topology of the boundary oscillators (their offset in number of delay cells) and x
the array’s mode of oscillation. The correct mode of oscillation is x = 0, which results in
the smallest bin size.
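As a worked example, the sketch below evaluates the formula with exact fractions. It assumes that one delay cell corresponds to T/(2·N), which is what the k·T/(2·N) term implies; N = 5, M = 3 and k = 2 are hypothetical values chosen to reproduce the 2·td/3 resolution quoted in the caption of Figure 12, and the function name is illustrative.

```python
from fractions import Fraction


def coupled_array_bin(T, N, M, k, x=0):
    """T_bin = (x*T + k*T/(2*N)) / M for M coupled rings of N cells each,
    oscillation period T, boundary offset k cells and oscillation mode x."""
    return (x * T + k * T / (2 * N)) / M


T = Fraction(10)        # period expressed in units where t_d = T/(2N) = 1
N, M, k = 5, 3, 2       # hypothetical array dimensions
t_d = T / (2 * N)
print(coupled_array_bin(T, N, M, k) / t_d)       # 2/3: the correct mode, x = 0
print(coupled_array_bin(T, N, M, k, x=1) / t_d)  # 4: a spurious mode gives a far larger bin
```

The second line shows why locking into the x = 0 mode matters: any other mode adds a full period T/M to every bin.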
Layout is critical to the correct behaviour of these circuits. Every delay cell must
drive exactly the same load if good measurement linearity is to be maintained. A careful
layout of the consecutive rings is therefore essential to guarantee that the rings at the
ends of the array operate in the same conditions as those in the middle, and that no
systematic effect alters the size of some time bins. The same considerations apply to the
delay cells at the ends of each ring. Interleaving the oscillators, and the delay cells that
make them up, is therefore essential.
This architecture enables high time resolution and a large dynamic range in a
conveniently dead-time-free converter. It can be implemented in standard CMOS
technologies, thereby allowing high levels of integration and low system costs.
However, it suffers from the same drawbacks as other PLLs, such as sensitivity to VCO
internal noise and error feedback from the end to the beginning of each oscillator ring.
Sharing the array of coupled oscillators between several channels is an effective
way to compensate for the higher power dissipation caused by the use of several ring
oscillators.
4.2.5. Array of Delay Locked Loops.
An array of several uniformly offset DLLs can increase the resolution of a system
to a fraction of the intrinsic gate delay [23][40]. A different DLL (referred to here as the
Phase Shifting DLL), made with a smaller number of delay elements, is used to generate
the required offsets precisely.
To increase the resolution of the converter, the offset between DLLs should be only a
fraction of the basic cell delay. This fraction cannot be obtained directly, but a delay that
is a fraction larger than the basic cell delay is easily obtained using a phase shifting DLL
locked to the same reference. Owing to the symmetry of the array, an arrangement like the
one in Figure 13 makes the DLLs in the array appear to be offset by only a fraction of the
basic cell delay.
The time bin of such a circuit is

$$T_{bin} = T_m - T_n = \frac{T_{clk}}{M} - \frac{T_{clk}}{N}.$$
If the required time bin size is a fraction 1/F of the basic cell delay of the DLLs in the
array, the relation between M, N and F can be expressed as

$$M = N\cdot\frac{F}{F+1},$$

where M, N and F are integers.
One disadvantage of this scheme is its inability to divide the reference period into a
number of bins that is a power of two. This means that the measurement is not obtained
in a pure binary unit of 1/2^k of the clock period, but rather in a unit of 1/(N·F). A special
encoder that converts this code into a normal binary code must be used if the result is to
be combined with other binary measurements, such as the coarse time counter results used
for dynamic range extension.
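The integer constraint and its consequence can be checked with a short sketch. Because F and F+1 share no common factor, M is an integer only when N is a multiple of F+1, so N·F always contains the factor F·(F+1), which for F ≥ 2 has an odd factor greater than one. The `to_binary_fine` and `to_binary_time` functions are a hypothetical encoder (a rounding rescale onto 2^bits bins, which adds its own re-quantisation error); they show only one possible choice, not the encoder of the cited circuits.

```python
from fractions import Fraction


def adll_sizes(F, max_n=64):
    """(N, M) pairs with M = N*F/(F+1) an integer, i.e. a bin of 1/F of the
    timing-DLL cell delay; only N that are multiples of F+1 qualify."""
    return [(n, n * F // (F + 1)) for n in range(1, max_n + 1) if (n * F) % (F + 1) == 0]


def is_power_of_two(x):
    return x > 0 and (x & (x - 1)) == 0


def to_binary_fine(code, bins_per_period, bits):
    """Hypothetical encoder: rescale a fine code in units of T_clk/(N*F) to the
    nearest multiple of T_clk/2**bits (round half to even)."""
    return round(Fraction(code * 2 ** bits, bins_per_period))


def to_binary_time(coarse, code, bins_per_period, bits):
    """Combine a coarse clock count with the rescaled fine code; using addition
    lets a fine value that rounds up to 2**bits carry into the coarse part."""
    return coarse * 2 ** bits + to_binary_fine(code, bins_per_period, bits)


F = 4
for N, M in adll_sizes(F, max_n=20):
    print(N, M, N * F, is_power_of_two(N * F))
print(to_binary_time(coarse=3, code=13, bins_per_period=20, bits=5))
```

For F = 4 the loop lists N = 5, 10, 15, 20 with M = 4, 8, 12, 16, and none of the resulting bin counts (20, 40, 60, 80) is a power of two.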
Figure 13: Array of DLL’s with phase shifting DLL. (Diagram: N-cell timing DLLs with cell delay tn and an M-cell phase shifting DLL with cell delay tm, M < N, driven by Clk; signals φ1, φ2 and control voltage Vc.)
Extensions to this architecture have been proposed in which auxiliary (controlled) delay
lines allow the clock period to be divided into any number of subdivisions, including a
power of two [41]. Unfortunately, they increase the complexity of the array and make it
more difficult to design.
Power dissipation is also a concern in this architecture, owing to the large number of
DLLs that are continuously active. This drawback can be mitigated if several channels
share the same array.
Like all DLL based techniques, this technique can be implemented in a standard
“digital” CMOS technology. It is therefore easy to integrate it with digital processing
logic in order to build a complex TDC system in a single IC.
4.2.6. Time interpolation using passive RC delay lines.
Most of the techniques discussed so far take advantage of a closed control loop
to guarantee that the converter is permanently calibrated. Schemes that improve on the
limited time resolution these loops can provide directly are based on time interpolation.
They usually require additional closed loops in complex topologies, which invariably lead
to higher power dissipation and increased non-linearity.
Minimum delays for a given architecture can only be achieved if the parasitic RC
delay lines present in every metal or polysilicon line are used (see [21] for an example).
Delay lines built in this way suffer from a large parameter spread due to process
constraints, which makes their exact delay difficult to predict. On the other hand, they are
rather insensitive to supply voltage and temperature variations.
To obtain the desired delay from these lines, a calibration procedure must be
used. Calibration is mainly needed at start-up; during normal operation, slow
temperature variations and supply changes do not substantially affect the behaviour of
the lines.
Such a delay line can be used as a stand-alone delay generator, but a converter built
this way would have a very limited dynamic range. When the line is used together with a
DLL, however, this limitation is overcome: the converter combines high resolution with
the other benefits of a DLL-based scheme, such as large dynamic range and
self-calibration [22].
The block diagram in Figure 14 depicts the scheme. When the hit signal is asserted,
several (M) consecutive samples of the status of the DLL are acquired with a constant
time interval between them. If this time interval is made such that it is a fraction 1/M of
the cell delay, it is possible to perform time interpolation within the delay of a DLL delay
cell by identifying after which sample the reference clock has exited a given cell.
If the reference clock has a period T and the DLL is made of N delay cells, the bin
size of the resulting converter is

$$T_{bin} = \frac{T}{N\cdot M}.$$

In this scheme there is no restriction on the values of N and M, so the measurements
can be obtained directly in pure binary format.
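To illustrate the binary-format property, the idealised sketch below assumes perfectly uniform DLL cells (T/N each) and an RC sample spacing of exactly 1/M of a cell. The 25 ns period and N = 32, M = 4 are hypothetical values, not the parameters of [22]. With N and M both powers of two, the combined code tap·M + sample is simply the DLL tap bits followed by the RC sample bits.

```python
from fractions import Fraction


def dll_rc_code(t, T, N, M):
    """Fine code for a hit arriving t after the reference edge (0 <= t < T):
    the DLL tap gives T/N resolution, the M RC-delayed samples refine it to T/(N*M)."""
    cell = T / N                               # DLL cell delay
    step = cell / M                            # RC sample spacing (assumed exactly cell/M)
    tap = int(t // cell)                       # DLL cell in which the edge lies
    sample = int((t - tap * cell) // step)     # RC sample interval within that cell
    return tap * M + sample


T = Fraction(25)                 # hypothetical reference period, ns
N, M = 32, 4                     # hypothetical sizes, both powers of two
code = dll_rc_code(Fraction(13, 10), T, N, M)
tap, sample = divmod(code, M)
print(code, format(code, "07b"), format(tap, "05b"), format(sample, "02b"))
# 6 0000110 00001 10
```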
Figure 14: A T/D converter based on a DLL and an RC delay line. (Diagram: DLL of N delay cells locked to clkref by the phase detector PD; the hit signal propagates along an RC delay line whose M taps clock M rows of hit registers.)
To operate in a truly self-calibrating mode, the circuit implementing this scheme
should also include the start-up calibration hardware for the RC delay line. Fortunately, a
simple code density test is sufficient to characterise the RC delay line. The calibration
parameters are derived from this characterisation and then applied to the line. Any
standard CMOS technology can be used to implement this scheme.
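A code density test relies on hits that are uncorrelated with the reference clock, so that their arrival times are uniformly distributed over the measured range; the fraction of hits landing in each code is then proportional to that bin’s width. A minimal sketch, assuming a hypothetical four-bin interpolator with known, deliberately unequal bin edges (the calibration step that would then adjust the line is not modelled):

```python
import bisect
import random


def code_density_widths(codes, n_bins, full_scale):
    """Estimate bin widths from a code histogram: width_i = full_scale * count_i / total,
    valid when hit times are uniformly distributed over full_scale."""
    counts = [0] * n_bins
    for c in codes:
        counts[c] += 1
    return [full_scale * count / len(codes) for count in counts]


# Hypothetical non-uniform interpolator with 4 bins over a unit range.
TRUE_EDGES = [0.0, 0.20, 0.45, 0.75, 1.0]


def quantize(t):
    """Index of the bin containing t (0 <= t < 1)."""
    return min(bisect.bisect_right(TRUE_EDGES, t) - 1, len(TRUE_EDGES) - 2)


rng = random.Random(7)
hits = [rng.random() for _ in range(200_000)]
widths = code_density_widths([quantize(t) for t in hits], n_bins=4, full_scale=1.0)
print([round(w, 2) for w in widths])  # estimates of the true widths 0.20, 0.25, 0.30, 0.25
```

The statistical error of each estimate falls as the square root of the number of hits, so the length of the test sets the achievable calibration accuracy.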
4.3. Summary of characteristics of the TDC architectures.
The following table summarises the main characteristics of the architectures discussed
in this chapter.
Architecture
Resolution
Dynamic
Range
Dead
Time
Current Integration
Counter
Delay Line
PLL
DLL
Analogue Time Expansion
Vernier Differences
Circular Vernier
Dual Scale Vernier
Analogue Time Interpolation
Array of Coupled Oscillators
Array of Delay Locked Loops
DLL / RC delay line
+
+
+
+
++
++
++
++
++
++
++
++
inf.
inf.
inf.
- / inf.
inf.
inf.
inf. / inf.
inf.
inf.
no
no
no
---no
no / no
no
no
Time
Auto
Power
Interpolator Technology
Calibration Consumption
Sharing
yes
yes
yes
- / +yes
+- / yes
yes
+-
+
+
+
+
+
+
+
+++
no
yes
no
yes
yes
no
no
no
yes
yes / no
yes
yes
yes (DLL)
Ref.
[14]
analogue
[12]
digital
[20]
digital
[24]
digital
[13]
digital
analogue [32]/[33]
[34]
digital
[35]
digital
[36]
digital
analogue [37]/[38]
[39]
digital
[23]
digital
[22]
digital
Table 1: Comparison of the architectures discussed in this chapter².
² Inf. (infinite) means that there is no intrinsic limit to the dynamic range that can be implemented. “No” dead time means
that there is no dead time in the time interpolation circuitry; there may still be some dead time associated with the read-out
of the measurements. + and – mean that the characteristic under consideration is advantageous (disadvantageous, respectively).
+ – means that the condition is only partially met, or is only met under certain conditions.
References for Part I.
[1] Rubbia, C., The quest for the infinitesimally small, CERN/PPE 94-15, Feb. 94.
[2] Verweij, H., Electronics for experiments at CERN, CERN/ECP 91-4, Feb. 91.
[3] The ALICE collaboration, ALICE – A large ion collider experiment technical
proposal, CERN/LHCC 95-71, Dec. 95.
[4] Gomes, P., On-line algorithms for future HEP data acquisition systems, PhD. thesis,
Universidade Técnica de Lisboa, 1995.
[5] Batyunya, B. et al., Influence of the time resolution of the time-of-flight system in
ALICE on the measurement of observables, ALICE/SIM 98-08 Internal note, Feb. 98.
[6] Kluge, A., ALICE Time-of-Flight Readout – AFRO, ALICE Internal note, Jun. 99.
[7] Martins, R. C. et al., Taxonomic problems on ADC characterisation, Proceedings of
the 5th. IEEE International Conference on Electronics, Circuits and Systems, Vol. 3,
pp. 445-448, Sep. 98.
[8] Razavi, B., Principles of data conversion system design, IEEE press, Chapter 6,
1995.
[9] Doernberg, J. et al., Full-speed testing of A/D converters, IEEE Journal of Solid-State
Circuits, Vol. 19, No. 6, pp. 820-827, Dec. 84.
[10] Gray, P. R. et al., Analysis and design of analogue integrated circuits, John Wiley &
Sons, Inc, Chapter 11, 1993.
[11] Fish, P. J., Electronic noise and low noise design, McGraw Hill, Inc, 1994.
[12] Porat, D. I., Review of sub-nanosecond time-interval measurements, IEEE
Transactions on Nuclear Science, Vol. 20, pp. 36-51, 1973.
[13] Rahkonen, T. E. et al., The use of stabilized CMOS delay lines for the digitization
of short time intervals, IEEE Journal of Solid-State Circuits, Vol. 28, No. 8, pp. 887-894,
Aug. 93.
[14] Tanaka, M. et al., Development of Monolithic Time-to-Amplitude Converter for
High precision TOF Measurement, IEEE Trans. on Nuclear Science, Vol. 38, No. 2,
pp. 301-305, Apr. 91.
[15] Sasaki, O. et al., A high-resolution TDC in TKO BOX system, IEEE Trans. on
Nuclear Science, Vol. 35, No. 1, Feb. 1988.
[16] Stevens, A. E. et al., A Time-to-Voltage Converter and Analog Memory for
Colliding Beam Detectors, IEEE Journal of Solid State Circuits, Vol. 24, No. 6, Dec. 89.
[17] Yamrone, B. et al., LeCroy MQT300 charge-to-time converter, Conference Record
of the IEEE Nuclear Science Symposium 1996, Vol. 1, pp. 436-438, Nov. 96.
[18] Veneziano, S. et al., Performances of a Multichannel 1 GHz TDC ASIC for the
KLOE Tracking Chamber, Proceedings of the Elba conference on Advanced
Detectors, 1997.
[19] Kim, L.-S., Metastability of CMOS latch/flip-flop, IEEE Journal of Solid-State
Circuits, Vol. 25, No. 4, pp. 942-951, Aug. 90.
[20] Bailly, P. et al., A 16-channel digital TDC chip, Conference Record of the IEEE
Nuclear Science Symposium 1997.
[21] Gogaet, S. et al., A 10 ps resolution 1.6 ns tuning range CMOS delay line for clock
deskewing in data recovery systems, Proc. ESSCIRC'95, Lille - France, pp. 54-57,
Sep. 95.
[22] Mota, M. et al., A high-resolution time interpolator based on a Delay Locked Loop
and an RC delay line, IEEE Journal of Solid-State Circuits, Vol. 34, No. 10, pp.
1360-1366, Oct. 99.
[23] Mota, M. et al., A four channel, self-calibrating, high-resolution Time-to-Digital
Converter, Proceedings of the 5th. IEEE International Conference on Electronics,
Circuits and Systems (ICECS’98), Lisboa, Portugal, Sep. 98.
[24] Arai, Y. et al. A time digitizer CMOS gate-array with a 250 ps time resolution,
IEEE Journal of Solid-State Circuits, Vol. 31, No. 2, pp. 212-220, Feb. 96.
[25] Dunning, J. et al., An all-digital Phase-Locked Loop with 50-cycle lock time
suitable for high-performance microprocessors, IEEE Journal of Solid-State
Circuits, Vol. 30, No. 4, pp. 412-422, Apr. 95.
[26] Johnson, M. G. et al., A variable delay line PLL for CPU-coprocessor
synchronisation, IEEE Journal of Solid-State Circuits, Vol. 23, No. 5, pp. 1218-1223, Oct. 88.
[27] Loinaz, M. J. et al., A CMOS multichannel IC for pulse timing measurements with
1 mV sensitivity, IEEE Journal of Solid-State Circuits, Vol. 30, No. 12, pp. 1339-1349, Dec. 95.
[28] Weigandt, T. C. et al., Analysis of timing jitter in CMOS ring oscillators,
Proceedings of International Symposium on Circuits and Systems (ISCAS), Jun. 94.
[29] Razavi, B. et al., Monolithic phase-locked loops and clock recovery circuits – theory
and design, IEEE press, 1996.
[30] Gardner, F. M., Phaselock techniques, John Wiley & Sons, 1979.
[31] Christiansen, J. et al., An integrated 16-channel CMOS time-to-digital converter,
Conference Record of the IEEE Nuclear Science Symposium 1993, pp. 625-629,
Oct. 93.
[32] Raisanen-Ruotsalainen, E. et al., A time digitiser with interpolation based on Time-to-Voltage
Conversion, Proceedings of the 40th. Midwest Symposium on Circuits
and Systems (MSCAS), Vol. 1, pp. 197-200, Aug. 97.
[33] Blanar, G. et al., A self-calibrating high-resolution common stop time digitiser
circuit, IEEE Transactions on Nuclear Science, Vol. 45, No. 3, Pt. 1, pp. 801-804,
Jun. 98.
[34] Bailly, P. et al., A 100 picosecond resolution, 6 microsecond full scale multihit time
encoder, in CMOS technology. Proc. of Third International Conference on
Electronics for Future Colliders, pp. 57-68, May 93.
[35] Fota, C., Modélisation et étude de faisabilité d’un codeur de temps numérique à
haute résolution en technologie intégrée sur Silicium et Arséniure de Gallium. Thèse
de Doctorat de l’Université Pierre et Marie Curie (Paris VI), Dec 96.
[36] Gorbics, M. S. et al., A high-resolution multihit time to digital converter integrated
circuit, IEEE Transactions on Nuclear Science, Vol. 44, No. 3, Pt. 1, pp. 379-384,
Jun. 97.
[37] Knotts, T. A. et al., A 500MHz time digitiser IC with 15.625ps resolution, Digest of
Technical Papers of the IEEE International Solid-State Circuits Conference 1994,
Vol. 37, pp. 58-59, Feb. 94.
[38] Neyer, C. et al., Internal Note ALICE 94-07 (CERN).
[39] Maneatis, J. G. et al., Precise delay generation using coupled oscillators, IEEE
Journal of Solid-State Circuits, Vol. 28, No. 12, Dec. 93.
[40] Christiansen, J., An integrated high-resolution CMOS timing generator based on an
array of Delay Locked Loops, IEEE Journal of Solid-State Circuits, Vol.31, No.7,
pp. 952-957, Jul. 96.
[41] Chu, H.-C. et al., A General High-Resolution Multiphase Clock Generator,
submitted to the IEEE Journal of Solid-State Circuits in Oct. 97.
PART II.
A TDC ARCHITECTURE BASED ON AN ARRAY OF DELAY LOCKED LOOPS.
In this Part of the dissertation we will discuss the work performed in order to
develop and demonstrate an architecture suitable for high-resolution time interval
measurements in the context of the ALICE Time-of-Flight detector collaboration.
Particle identification in the ALICE experiment requires an accurate measurement
of the time that the particles take to cross a cylindrical surface located at a fixed distance
from the interaction point. For this purpose a dedicated Time-of-Flight detector will be
built. The detector itself is able to resolve time with a resolution between 40ps and 100ps
RMS, depending on the technology chosen [1]. All the front-end components must have a
better resolution, in order not to compromise the characteristics of the detector.
Given the time uncertainty associated with the response of the detector and with the
underlying physical process, the main performance metric used to characterise the front-end
electronics is the standard deviation σ of the error it introduces, also known as the
RMS (root mean square) resolution. In this application, the Time-to-Digital converter is
required to have an RMS resolution better than 50ps across the full dynamic range.
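A back-of-the-envelope budget helps to relate the 50ps RMS requirement to a bin size. The sketch assumes an ideal uniform quantiser, whose error is uniformly distributed over one bin and therefore has an RMS value of LSB/√12, and treats any other error source as independent, so that contributions add in quadrature. The 100ps bin and the 30ps additional error are hypothetical numbers for illustration only.

```python
import math


def quantization_rms(lsb):
    """RMS error of an ideal quantiser with bin width lsb (error uniform over one bin)."""
    return lsb / math.sqrt(12)


def combined_rms(*sigmas):
    """Quadrature sum of independent, zero-mean error contributions."""
    return math.sqrt(sum(s * s for s in sigmas))


target = 50.0                         # ps RMS, the requirement stated above
max_lsb = target * math.sqrt(12)      # largest bin if quantisation were the only error
print(round(max_lsb, 1))              # 173.2
lsb, other = 100.0, 30.0              # hypothetical bin size and other error, ps
print(round(combined_rms(quantization_rms(lsb), other), 1))  # 41.6
```

Because the other contributions (non-linearity, jitter) share the same budget, a practical bin has to be well below the quantisation-only limit.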
Depending on the measurement method used (time tagging or start-stop), the
dynamic range that is required varies. To avoid any ambiguity, especially when the time
tagging method is used, the TDC must allow for a large dynamic range.
Another important feature of the detector is its granularity. To differentiate
particles crossing the detector close to each other, it is subdivided into a large number of
independent detector cells, each with its own dedicated front-end. A large number of
electronic channels is therefore required (> 150,000), and their front-end electronics must
sit close to the detector. The number of channels involved and the area constraints imply a
high level of electronic integration.
Our goal is to demonstrate an architecture that meets all the previous requirements in
terms of resolution and potential for dynamic range expansion, allows both start-stop and
time tagging measurements, and has a low dead time between measurements. We will use a
standard “digital” CMOS technology with a proven digital library, so that digital
functionality can be implemented easily and at low cost. The architecture enables the
integration of several TDC channels into a single chip and allows common data processing
and buffering logic to be shared. To demonstrate this feature, four conversion channels
and a small number of simple system-related functions, such as data encoding and
buffering, are included. In this way the demonstrator contains the basic functionality
required to build a time acquisition system.
The first chapter of this part (Chapter 5) presents the architecture being used.
Chapter 6 describes the analytic tools developed to study the different errors that may
occur in the conversion circuitry. Chapter 7 gives a detailed description of the important
electronic blocks that define the converter performance, and Chapter 8 concludes this part
of the dissertation by presenting the experimental results obtained with the prototype TDC.
Chapter 5.
Architecture Overview.
5.1. The Delay Locked Loop (DLL).
A simple instrument to measure time intervals with fine resolution can be made with
a delay line tapped at regular (time) intervals. If a reference signal is progressing along
that line and its position is sensed at the limits of the time interval, the measured time is
proportional to the number of taps that the signal covered during this interval. The delay
between two consecutive taps is the constant of proportionality.
In standard CMOS technologies the most commonly available delay cell is the
logic gate, and the usual choice is the inverter because of its simplicity and speed.
Delays of the order of a few hundred picoseconds can currently be obtained under
worst-case operating conditions in a 0.7µm CMOS technology.
Unfortunately, the gate delay is very sensitive to process parameters, temperature
and supply voltage, so the circuit has to be characterised periodically to measure the
delay of each gate. A simpler way of operating the circuit is to build delay elements whose
delay can be controlled externally. The delay of the cells is then constantly sensed and
forced to the desired value, regardless of environmental changes. This is the operating
principle of a Delay Locked Loop (DLL).
In a DLL the signal progressing through the delay line is a reference clock. A
control loop encloses the delay line and constantly monitors the delay between the
reference clock at the beginning and at the end of the line. If this delay is different from
one clock period, the control loop adjusts the delay of the delay cells until the correct
value is obtained.
When the hit signal is asserted, the status of the line is stored in a set of hit registers.
The stored data reflects the time elapsed from an edge of the reference clock to the
moment the data was stored. An arbitrary time interval can be measured if two such time
differences are stored; the difference between them is the desired measurement.
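The read-out of a single DLL can be sketched as follows. For simplicity the sketch assumes that each hit-register snapshot is a clean thermometer code, with ones for the cells the reference edge has already crossed (real snapshots also contain the other clock edge and may contain bubbles), and that both hits fall in the same clock cycle. The 25 ns period, the 8 taps and the snapshots are hypothetical.

```python
def decode_hit_register(bits):
    """Number of taps the reference edge has crossed: count the leading ones of a
    thermometer code (bits[0] is the first tap of the line)."""
    count = 0
    for b in bits:
        if not b:
            break
        count += 1
    return count


def time_from_edge(bits, t_clk):
    """Time elapsed since the reference edge entered the line; each of the len(bits)
    cells of a locked DLL delays by t_clk / len(bits)."""
    return decode_hit_register(bits) * t_clk / len(bits)


t_clk = 25.0                       # ns, hypothetical reference period
start = [1, 1, 0, 0, 0, 0, 0, 0]   # snapshot taken at the "start" hit
stop = [1, 1, 1, 1, 1, 1, 0, 0]    # snapshot taken at the "stop" hit, same clock cycle
print(time_from_edge(stop, t_clk) - time_from_edge(start, t_clk))  # 12.5
```

Hits in different clock cycles would additionally need a coarse count of reference periods, added as whole multiples of `t_clk`.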
The control loop has three main functions: to sense the delay difference between the
signal at the beginning and at the end of the delay line, to convert this error information
into a meaningful quantity, and to integrate and hold the control information until a new
decision is taken.
These functions correspond to the building blocks in Figure 1. The phase detector
determines whether the delay line is too fast or too slow; a sequential phase detector is
usually chosen for this function. The resulting (binary) information is then converted by
the charge pump into a “packet” of charge that is added to (or removed from) a filter
capacitor. In this example, the capacitor acts as the loop integrator.
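Figure 1: Delay Locked Loop block diagram. (Diagram: delay line driven by the clock; phase detector built from a D flip-flop with outputs Q and Qb; charge pump; hit registers clocked by the hit signal.)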
In contrast with Phase Locked Loops (PLLs), which have an additional integrator in the
VCO (voltage controlled oscillator), the DLL is a first order system. The second
integration and the proportional term in PLLs are needed because a PLL must track both
phase and frequency. A DLL only tracks delay (or, equivalently, phase), which results in
a simpler loop that is inherently stable.
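The locking behaviour can be illustrated with a behavioural model. It assumes, hypothetically, a cell delay that falls linearly with the control voltage and a binary phase detector driving a charge pump that moves the control voltage by a fixed step in each reference cycle; all numbers are arbitrary. The only loop state is the integrated control voltage, so the model converges without any stability compensation and then dithers by one charge packet around the locked delay.

```python
def simulate_dll(n_cells, t_clk, d0, kv, dv, cycles):
    """Bang-bang DLL model. Each reference cycle the phase detector reports whether
    the line delay is longer or shorter than one clock period, and the charge pump
    moves the control voltage vc by +/- dv (the filter capacitor integrates the packets).
    Returns the line delay observed in each cycle."""
    vc = 0.0
    history = []
    for _ in range(cycles):
        cell_delay = d0 - kv * vc                  # hypothetical voltage-controlled cell
        line_delay = n_cells * cell_delay
        history.append(line_delay)
        vc += dv if line_delay > t_clk else -dv    # too slow -> speed the cells up
    return history


# Hypothetical values: 32 cells, 25 ns clock, 1 ns free-running cell delay,
# 0.1 ns/V delay sensitivity and 10 mV per charge packet.
hist = simulate_dll(n_cells=32, t_clk=25.0, d0=1.0, kv=0.1, dv=0.01, cycles=400)
locked = hist[-50:]
print(round(hist[0], 3), round(min(locked), 3), round(max(locked), 3))  # 32.0 24.992 25.024
```

A smaller charge packet reduces the residual dither at the cost of a longer acquisition time.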
The scheme described so far acquires only a limited number of features of the hit
signal, such as its arrival time and possibly also its pulse length (if the delay between the
rising and falling edges is also measured). Alternatively, the DLL can be used to generate a
time base for a set of registers that sample the hit signal with a short period (determined by
the cell delay) [2], as shown in Figure 2. In this way a full picture of the timing
characteristics of the hit signal can be sampled and stored, and digital signal processing
algorithms can then extract the features of interest from the data stream.
Figure 2: Delay Locked Loop used in a time base application. (Diagram: as in Figure 1, with one D flip-flop per tap sampling the hit signal.)
With this scheme it is easy to identify glitches (short pulses) or any other undesired
pulse characteristics. However, this sampling scheme results in continuous activity of the
hit registers, increasing the power dissipation and, possibly, the noise on the power supply.
Also, data is produced at a very high rate, so a large read-out bandwidth is necessary to
ensure that no data is lost. In these conditions, data reduction algorithms must be applied
at a very early stage.
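As an example of such early data reduction, the sketch below turns a stream of samples taken once per cell delay into a list of (start time, width) pulses and flags those narrower than a minimum width as glitches. The 0.5 ns sampling period, the sample stream and the 1 ns threshold are hypothetical.

```python
def find_pulses(samples, t_sample):
    """Return (start_time, width) for each run of ones in a sampled hit stream."""
    pulses = []
    start = None
    for i, s in enumerate(samples):
        if s and start is None:
            start = i                                   # rising edge found
        elif not s and start is not None:
            pulses.append((start * t_sample, (i - start) * t_sample))
            start = None                                # falling edge found
    if start is not None:                               # pulse still high at the end
        pulses.append((start * t_sample, (len(samples) - start) * t_sample))
    return pulses


def glitches(pulses, min_width):
    """Pulses narrower than min_width."""
    return [p for p in pulses if p[1] < min_width]


stream = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
pulses = find_pulses(stream, t_sample=0.5)   # hypothetical 0.5 ns cell delay
print(pulses)                                # [(1.0, 3.0), (5.0, 0.5)]
print(glitches(pulses, min_width=1.0))       # [(5.0, 0.5)]
```

Fourteen samples are reduced to two pulse descriptors, which is the kind of compression needed before the data leaves the chip.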
5.2. The Array of DLL’s (ADLL).
The time resolution of a DLL based converter is determined by the gate delay. To
obtain better resolution either a faster technology is selected, which results in shorter gate
delays, or an architecture that is able to interpolate time within the gate delay is used.
One way of achieving this interpolation is to use a group of F Timing DLL’s that
have a small time offset between them. This offset is precisely determined by a Phase
Shifting DLL, which is locked to the same reference clock (see Figure 3) [3].
Figure 3: Array of DLL’s with phase shifting DLL, showing bin definition. (Diagram: timing DLLs of N cells with delay Tn and taps tapn−1, tapn; phase shifting DLL of M cells, M < N, with delay Tm and taps tapm, tapm+1; reference clock clkref; signals φ1, φ2, Vc.)
A time offset smaller than the minimum gate delay cannot, of course, be obtained directly. However, it is possible to obtain an offset that is slightly larger than the minimum gate delay. Assume that the offset $T_m$ exceeds the delay of each cell in the Timing DLL’s, $T_n$, by a fraction $1/F$. Then, as shown in Figure 3, the time offset between corresponding taps in consecutive DLL’s is $T_{off} = T_m = T_n \cdot (1 + 1/F)$. If the previous tap of the second DLL is used to define the end of the bin, the resulting bin size is $T_{bin} = T_{off} - T_n = T_n \cdot (1 + 1/F) - T_n = T_n/F$, as intended.
Bins at the extremities of the Timing DLL’s are defined from taps at opposite ends of consecutive DLL’s, exploiting the periodicity of the clock.
The size of a bin is defined as the delay difference between the taps at its two ends:

$$\text{bin} = \text{tap}_{m+1,\,n-1} - \text{tap}_{m,\,n} = \left(\text{tap}_{m+1} + \text{tap}_{n-1}\right) - \left(\text{tap}_{m} + \text{tap}_{n}\right) \;\Rightarrow\; T_{bin} = (m+1) \cdot T_m + (n-1) \cdot T_n - \left(m \cdot T_m + n \cdot T_n\right) \;\Leftrightarrow\; T_{bin} = T_m - T_n ,$$

where m and n are the positions of the taps that define the bin, as shown in Figure 3. Variable m identifies the Timing DLL and n is the tap number within that DLL ($0 \le m < F \le M$ and $0 \le n < N$). Delays $T_n$ and $T_m$ are related to the period of the reference clock ($T_{clk}$) by the numbers of taps, N and M, of the respective DLL’s:

$$T_n = \frac{T_{clk}}{N}, \qquad T_m = \frac{T_{clk}}{M} .$$
From these equations, the relationship between M, N and F can be defined as:

$$T_{bin} = \frac{T_{clk}/N}{F} = \frac{T_{clk}}{M} - \frac{T_{clk}}{N} \;\Leftrightarrow\; M = N \cdot \frac{F}{1+F} .$$
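Unfortunately, this definition shows that, for any interpolation factor F > 1, the admissible values of M and N never yield a number of bins N·F that is a power of two ($N \cdot F \neq 2^k$ for any integer k): since F and F+1 are coprime, N must be a multiple of F+1, so N·F is a multiple of F·(F+1), which always contains an odd factor greater than one. To obtain such a convenient representation, a code conversion should be performed later in the data acquisition chain.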
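The relation $M = N \cdot F/(1+F)$ and the power-of-two argument can be checked numerically. This sketch enumerates the admissible (N, M) pairs for a given F (the helper names `phase_shifting_taps`, `valid_arrays` and `is_power_of_two` are hypothetical) and applies the constraint $F \le M$ stated above.

```python
def phase_shifting_taps(N, F):
    """M = N*F/(F+1) for an ADLL, or None if that is not an integer."""
    if (N * F) % (F + 1):
        return None
    return N * F // (F + 1)


def is_power_of_two(x):
    return x > 0 and x & (x - 1) == 0


def valid_arrays(F, max_N):
    """All (N, M, number of bins N*F) with integer M and F <= M, for N up to max_N."""
    out = []
    for N in range(1, max_N + 1):
        M = phase_shifting_taps(N, F)
        if M is not None and M >= F:
            out.append((N, M, N * F))
    return out


# Usage: the prototype combination described later (F=4, N=35) gives M=28 and 140 bins.
print(phase_shifting_taps(35, 4))
print(valid_arrays(4, 40))
```

Every admissible N is a multiple of F+1, and none of the resulting bin counts is a power of two, in line with the argument above.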
Unlike other vernier techniques that also use delay differences to obtain sub-gate resolution, in this architecture all the delay lines are locked to the same reference signal and only need to span a short interval (one reference clock period). Also, the ADLL can be shared between several channels, increasing the integration level and decreasing the overall power dissipation.
The relations established above show that, with this scheme, a bin size equal to any fraction of the original cell delay can theoretically be achieved. In practice this is not the case, since this interpolation procedure, in which a small time difference ($T_m - T_n$) is extracted from two large delays ($(m+1) \cdot T_m + (n-1) \cdot T_n$ and $m \cdot T_m + n \cdot T_n$), is very sensitive to any errors present in the array. A small error in the definition of $T_m$ or $T_n$ is amplified by the nature of the interpolation and becomes a significant part of $T_{bin}$, thus limiting the achievable resolution. Bins at the extremities of the DLL’s are also sensitive to error accumulation, since they are interpolated from taps at opposite ends of consecutive DLL’s.
This interpolation method sets, therefore, stringent requirements on the DLL’s that
make up the array. Minimisation of device mismatch and of phase error are very
important design criteria.
An interpolator based on the ADLL scheme can, in principle, be designed in such a
way as to minimise reference clock jitter and all static sources of non-linearity. The
degradation of the time resolution due to delay cell mismatch is, however, harder to deal
with since it is a characteristic inherent to the fabrication of the circuit that cannot be
completely eliminated by design. Therefore delay cell mismatch, and ultimately device
mismatch, sets the limit to the resolution achievable with these converters.
[Plot: RMS resolution versus interpolation factor F (1 to 10), for the ideal case and for delay cell mismatch (σ) of 1%, 2%, 3%, 4% and 5%.]
Figure 4: Interpolation limits due to cell mismatch.
The graph of Figure 4¹ shows the root mean square (RMS) resolution that can be achieved with an ADLL-based interpolator in the presence of delay cell mismatch² (assuming N=35 delay cells per Timing DLL). As expected, the effects of mismatch increase with the interpolation factor (F). Consequently, beyond a certain interpolation factor, which depends on the level of delay cell mismatch, increasing F no longer improves the resolution. The largest interpolation factor that still yields an improvement in resolution lies between F=4 and F=5, depending on the actual delay cell mismatch.
5.3. Conversion dynamic range.
The use of a periodic reference signal in the array of DLL’s makes it impossible to distinguish between two measurements resulting from hit signals separated by multiples of the reference clock period. The dynamic range of such a converter is therefore limited to one reference clock cycle.

¹ Given a number of cells per Timing DLL, N, some of the interpolation factors, F, displayed do not result in a realistic ADLL. However, they are included for completeness.
² As explained in Chapter 6, device parameter mismatch leads to identical delay cells having different propagation delays. The delay of a cell is seen as a random variable with a normal PDF, having a variance σ².
The simplest way to increase the dynamic range would be to increase the clock period. However, this solution requires a longer delay chain (or, alternatively, accepting a coarser resolution). A better solution is to include in the converter a counter synchronous with the reference clock.
[Figure 5 diagram: Reset, Clk and Hit drive two n-bit counters captured in Register #0 and Register #1, with Sel choosing which one provides the coarse word. Timing diagram: the two registers step through N, N+1, N+2 on opposite phases of Clk.]
Figure 5: Dynamic range extension using two coarse time counters.
The counter is itself a converter with a coarse resolution (one reference clock cycle)
but a large dynamic range (depending on the number of bits implemented). Its results can
be appended to the results of the array conversion, which have a fine resolution but a
small dynamic range. Since both coarse and fine time words are obtained using the same
reference clock, no ambiguity is generated from the dynamic range extension.
The critical moment for such a scheme is when a measurement is performed while the counter is switching and thus not yet stable. In this situation the captured coarse word is unpredictable: the counter outputs may be in an intermediate state and may thus induce metastability in the hit registers.
If two counters are used, synchronous to opposite phases of the reference clock,
there is at any time one counter with stable outputs (see Figure 5). All the converter has to
do is to select the correct counter results in order to obtain the correct coarse
measurement.
The stable counter is selected according to the phase of the reference clock at the moment the hit signal is asserted. Fortunately, the status of the DLL, which is acquired at the same moment, accurately reflects the phase of the reference clock, so it can be used to determine the correct coarse result.
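The following behavioural sketch illustrates the two-counter idea. The exact selection rule is not given here, so the sketch assumes one plausible rule: counter A counts rising edges, counter B counts falling edges, and the fine (DLL) word decides which counter is far from its switching edge. The prototype figures (12,500 ps period, 140 bins, 8-bit counters) come from Section 5.7; the settling window `SETTLE_PS` and all function names are hypothetical.

```python
import math

T_PS = 12_500            # reference period (80 MHz)
N_BINS = 140             # fine bins per period (F*N = 4*35)
T_BIN_PS = T_PS / N_BINS
BITS = 8                 # coarse counter width
SETTLE_PS = 500.0        # hypothetical window around an edge in which a counter is unstable
UNSTABLE = -1            # marker for a corrupted capture


def capture(t_ps):
    """Model of the hit registers: fine bin plus both counter values at time t_ps."""
    phase = t_ps % T_PS
    fine_bin = int(phase // T_BIN_PS)
    a = math.floor(t_ps / T_PS) % 2**BITS        # counter A: rising edges at k*T
    b = math.floor(t_ps / T_PS + 0.5) % 2**BITS  # counter B: falling edges at k*T + T/2
    if min(phase, T_PS - phase) < SETTLE_PS:     # close to a rising edge: A may be switching
        a = UNSTABLE
    if abs(phase - T_PS / 2) < SETTLE_PS:        # close to a falling edge: B may be switching
        b = UNSTABLE
    return fine_bin, a, b


def decode(fine_bin, a, b):
    """Pick the stable counter using the fine word, then build the time stamp."""
    if fine_bin < N_BINS // 4:                   # just after a rising edge: use B
        coarse = b
    elif fine_bin >= 3 * N_BINS // 4:            # late in the period: B already counted
        coarse = (b - 1) % 2**BITS               # this period's falling edge, so subtract 1
    else:                                        # mid-period: A is far from its edge
        coarse = a
    return coarse, coarse * T_PS + fine_bin * T_BIN_PS


# Usage: a hit 1 ps before the 256th rising edge still decodes to coarse count 255.
print(decode(*capture(256 * T_PS - 1)))
```

Because each counter is used only in the half of the period furthest from its own edge, the unstable value is never selected, provided the settling window is shorter than a quarter period.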
Time stamp measurements obtained from such a converter are referred to an initial
instant in the beginning of the clock period when the coarse time counters are at zero. One
can thus see the counter reset signal as a common Start signal that sets the time zero at the
beginning of the clock cycle. In these conditions, the initialisation of the coarse counters
is also an important parameter for the performance of the converter. Start-Stop
measurements don’t require any special initialisation of the coarse counter, since they are
not referred to a particular initial instant.
5.4. Time critical paths.
Timing information is delivered to the converter via two main signals: the reference clock and the hit signals. The reference clock is used as the basis for the measurements, and the hit signals set the exact instant at which a measurement is acquired. The high-frequency spectral components of these signals determine the accuracy of the timing information received.
These signal paths must be handled carefully, since any deterioration of the signals’ timing characteristics will not be regenerated inside the converter and will thus degrade its resolution. Jitter in the reference clock received by the array must be very small, since the DLL loop is unable to filter jitter in its input signal (see Appendix B).
Most of the noise that may couple into the time-critical paths can effectively be rejected if differential signalling is used. In fact, noise coupling into signal paths at board and bonding level affects close-by paths in the same way, so it is mainly common-mode noise. Several commercially available standards exist for differential signalling; the selection should be based on bandwidth, compatibility of supply levels, simplicity of receivers, etc.
These considerations are less critical inside the converter IC, because there the signal paths are short and the noise environment can be designed such that the signals are not very sensitive to noise generated in the circuit. Increased noise immunity could be achieved if differential logic were also used throughout the time-critical circuitry inside the circuit. However, this increased noise immunity would come at the expense of increased power dissipation.
5.5. Measurement acquisition and storage.
The measurement instant is defined by the assertion of the hit signal. At this moment the status of the array and of the coarse counters is captured in a group of hit registers. The data stored in these registers reflects the time that elapses from the beginning of the reference clock period to the instant the hit signal was asserted. The measurement consists only of this storing operation, so the time it takes is minimal and the converter has no dead time.
The hit register is the interface between the time measurement circuitry and the timing-insensitive digital processing performed afterwards. Its behaviour contributes significantly to the converter linearity, so it should be treated as time-critical circuitry.
To avoid degrading the linearity of the converter in the acquisition stage, the latching instant must be well defined and the same for all the tap registers. This requires a matching-oriented approach to the design and layout of the registers, since mismatch at this level results in different latching times for each register and thus in increased non-linearity of the converter. Furthermore, the latching signal should arrive at every register involved in the measurement at the same time. If the intended resolution is very high, small propagation delays along the lines that distribute this signal will degrade the measurement accuracy, as will be shown in Chapter 6. In some topologies these propagation delays may accumulate, resulting in non-negligible non-linearity.
Due to the large number of registers integrated in one circuit, the power dissipation may be significant. A side effect of the large instantaneous currents that may be required at the acquisition moment is the noise they induce in the power supply. Power supply noise at this stage may cause crosstalk between channels if they are performing measurements concurrently. Careful power distribution is therefore necessary to reduce this effect and also the possible deterioration of the closed-loop dynamic behaviour of the DLL’s.
5.6. Read-out architecture.
The time acquisition circuitry alone does not make a complete converter. Important functions such as buffering, data encoding, data reduction and handling of the read-out protocol have an impact on the converter performance and enhance its functionality, turning it into an integrated time measurement system.
Data buffering is probably the function that has the biggest effect on the converter’s performance (considering High-Energy Physics applications). Due to the random nature of the assertion time of the hit signal, measurements must be performed at unpredictable times. Usually the data acquisition system downstream of the converter is only able to handle a limited data rate from any given source, because the communication medium is shared between several data sources. Measurements separated by less than the read-out period would then be lost, even if the converter itself were fast enough to process them. This would result in an increased converter dead time.
This limitation can be circumvented in several ways. The read-out rate can be made much higher, thus decreasing the minimum interval between two accepted measurements. Alternatively, a derandomising buffer can be included after the converter; this buffer holds data arriving in quick succession until it can be read out. Also, a data reduction function (trigger-based data reduction) may discard measurements that do not meet the acceptance criteria. If applied, it can reduce the data rate significantly.
[Figure 6 diagram: hit inputs of Channel #0, Channel #1, …, Channel #N enter per-channel buffer(s) at the hit rate; these are transferred to a common group buffer at the internal clock rate, which is emptied at the read-out rate.]
Figure 6: Example of the first level of a read-out buffering hierarchy.
The first solution is usually not applicable. Increased read-out speed increases system costs and results in an inefficient use of this resource, since most of the time the high speed would not be needed. The two other solutions, if used together, are very effective in smoothing the read-out rate, so that a low-speed read-out channel can be used efficiently without increasing the dead time between accepted hits.
Using one large derandomising buffer per channel would, however, be expensive in terms of silicon area. A preferred solution is to build a buffering hierarchy by partitioning the conversion channels into small groups, using a common buffer for each group and a small individual buffer for each channel (as in Figure 6). Each group of channels can then be merged into a larger “super-group”, and so on, until the hierarchy best adapted to the application has been built.
The size of the channel group and of the individual buffers is defined by the expected acquisition rate and channel occupancy, as well as by the read-out rate and the allowed measurement loss. A good knowledge of the intended application is therefore required before defining these buffers.
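To see how buffer depth trades against measurement loss, the following sketch simulates a single derandomising FIFO (not the full hierarchy). It assumes, for illustration only, Poisson hit arrivals and a fixed read-out period; `fifo_loss_fraction` and its parameters are hypothetical, and a real sizing exercise would use the hit statistics of the actual application.

```python
import random


def fifo_loss_fraction(hit_rate, readout_rate, depth, n_hits=50_000, seed=1):
    """Fraction of hits lost by a FIFO of `depth` words drained at a fixed read-out rate.

    Assumptions: hits form a Poisson process with mean rate `hit_rate`; one word
    leaves the FIFO every 1/readout_rate (same time unit); a hit arriving when the
    FIFO is full is lost.
    """
    rng = random.Random(seed)
    readout_period = 1.0 / readout_rate
    t = 0.0
    occupancy = 0        # stored words, including the one being read out
    head_done = 0.0      # time at which the head word finishes being read out
    lost = 0
    for _ in range(n_hits):
        t += rng.expovariate(hit_rate)             # next hit arrival
        while occupancy > 0 and head_done <= t:    # words read out before this hit
            occupancy -= 1
            head_done += readout_period            # next word follows immediately
        if occupancy < depth:
            if occupancy == 0:
                head_done = t + readout_period     # idle FIFO: read-out starts now
            occupancy += 1
        else:
            lost += 1
    return lost / n_hits


# Usage: hit rate at 80% of the read-out rate, for increasing FIFO depth.
for depth in (1, 4, 32):
    print(depth, fifo_loss_fraction(0.8, 1.0, depth))
```

With a single word of storage the loss should come out close to the classical loss formula 0.8/1.8 ≈ 0.44, and it falls sharply as the depth grows, which is the effect the derandomising buffer is meant to exploit.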
5.7. The prototype.
A Time-to-Digital Converter (TDC) based on this architecture was built [4]. The circuit demonstrates the feasibility of the ADLL as a time interpolator. Furthermore, to emphasise that all the required functionality can be integrated in a single, inexpensive circuit, the prototype was implemented in a commercial 0.7µm CMOS technology. A block diagram of the prototype is shown in Figure 7.
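[Figure 7 diagram: clkref and rsttime feed the 8-bit coarse counters and the ADLL (phase detectors PD, Phase Shifting DLL with M=28 cells, Timing DLL’s with N=35 cells); 4 channels with inputs hit<3:0>, hit enable control and 2-word channel buffers; data encoder and 32-word FIFO clocked by clkro; read-out interface, serial interface and program interface.]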
Figure 7: The prototype block diagram.
The ADLL is made of four (F=4) Timing DLL’s each dividing the reference period
in 35 parts (N=35). A 28-tapped Phase Shifting DLL (M=28) is required to achieve the
correct adjustment of Timing DLL’s. An 8-bit coarse time counter is used to obtain a
dynamic range extension to 256 reference clock cycles.
Using an 80MHz reference clock (T=12,500ps), the bin size, over a full dynamic range of T·256=3.2µs, is

$$T_{bin} = \frac{T}{F \cdot N} = \frac{12{,}500\ \text{ps}}{140} = 89.3\ \text{ps} .$$
The cell delays of the individual DLL’s are Tm=446.4ps and Tn=357.1ps for the Phase Shifting and the Timing DLL’s, respectively. The receivers for the reference clock (and for the hit signals) are implemented differentially, to reject common-mode noise coupled into these time-critical paths.
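The prototype figures above follow directly from the relations of Section 5.2. This small sketch recomputes them from the three design choices (reference period, N and F) plus the counter width; the class `AdllConfig` and its property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdllConfig:
    t_clk_ps: float    # reference clock period
    N: int             # cells per Timing DLL
    F: int             # number of Timing DLL's (interpolation factor)
    coarse_bits: int   # width of the coarse time counter

    @property
    def M(self) -> int:
        """Cells in the Phase Shifting DLL, M = N*F/(F+1)."""
        m, rem = divmod(self.N * self.F, self.F + 1)
        if rem:
            raise ValueError("N*F/(F+1) must be an integer")
        return m

    @property
    def t_n_ps(self) -> float:
        return self.t_clk_ps / self.N              # Timing DLL cell delay

    @property
    def t_m_ps(self) -> float:
        return self.t_clk_ps / self.M              # Phase Shifting DLL cell delay

    @property
    def t_bin_ps(self) -> float:
        return self.t_clk_ps / (self.F * self.N)   # equals t_m - t_n

    @property
    def range_ps(self) -> float:
        return self.t_clk_ps * 2 ** self.coarse_bits


# Usage: the 80 MHz prototype configuration.
p = AdllConfig(12_500.0, N=35, F=4, coarse_bits=8)
print(p.M, round(p.t_n_ps, 1), round(p.t_m_ps, 1), round(p.t_bin_ps, 1), p.range_ps)
```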
The demonstrator includes a common data encoder that converts the ‘thermometer’ code, in which the fine time measurements are represented at the output of the ADLL, into a binary encoded word. It also merges the correct coarse time word into the final measurement word. The encoding reduces the data word from 156 bits to 16 bits.
Four TDC channels were integrated in the IC. Each channel includes a two-word-deep asynchronous pipeline buffer (channel buffer). A common 32-word-deep derandomising buffer (group buffer) is also included to ease the read-out rate requirements. This partition of the buffering hierarchy is well adapted to the low hit rate expected in the application and demonstrates the partitioning concept. The read-out interface logic, as well as the encoding and common buffering circuitry, works asynchronously to the reference clock (clkref), using a clock (clkro) of up to 40MHz.
A slow serial read-out interface is also implemented to facilitate the necessary test and debugging tasks. All programming is performed via an independent program port that is adapted for daisy-chaining several TDC’s on a single serial line. In fact, the prototype includes sufficient functionality to be used in the actual working environment, as part of the data acquisition chain of a High-Energy Physics experiment.
In the photograph of Figure 8, the main functional blocks of the prototype are
highlighted. This circuit is encapsulated in a 68-pin plastic PLCC package.
5.7.1. Performance analysis.
Timing characteristics.
The short analysis presented here takes into account only the errors intrinsic to converters built with this architecture. Other error sources also degrade the resolution of the measurements, but they can be avoided, or at least minimised, by careful circuit design.
The LSB (least significant bit) of the converter, in the configuration proposed, is Tbin=89.3ps. The theoretical RMS resolution σq is determined by the quantisation error:

$$\sigma_q = \frac{T_{bin}}{\sqrt{12}} = 25.8\ \text{ps} .$$
The resolution is, however, limited by the unavoidable delay cell mismatch. The analysis developed in Chapter 6 shows that the maximum effect of cell mismatch is seen in the middle of the last Timing DLL (m=F-1 and n=N/2). Assuming a mismatch (σmatch) of 1%, the additional RMS error due to the array is:

$$\sigma_{ADLL} = \sigma_{match} \cdot F \cdot \sqrt{\left(\frac{F+1}{F}\right)^{2} \cdot \frac{F-1}{M} \cdot (M - F + 1) + \frac{N}{4}} \cdot T_{bin} = 12.8\ \text{ps} .$$
[Figure 8 photograph, labelled blocks: coarse time counter, array of DLLs, hit registers (4 channels), read-out FIFO, read-out and encoding logic.]
Figure 8: Prototype circuit showing main functional blocks.
In addition, the unavoidable jitter present in the reference clock and intrinsic to the closed-loop operation is estimated to be of the order of σjitter=15ps. Adding these contributions in quadrature, the overall RMS resolution should be ~32.5ps (0.36LSB).
This value reflects the expected resolution averaged over a number of converters. Individual converters may have a better or worse resolution, depending on their actual matching parameters. Since other error sources will most likely degrade the converter resolution further, this value can be used as a benchmark to evaluate the characteristics of the actual prototypes.
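The resolution budget above can be reproduced in a few lines. This sketch simply evaluates the three expressions given in this section (quantisation, worst-case array mismatch, and a quadrature sum with the estimated 15 ps jitter); the function names are hypothetical.

```python
import math


def quantisation_rms(t_bin):
    return t_bin / math.sqrt(12)


def adll_mismatch_rms(sigma_match, F, N, M, t_bin):
    """Worst case (m = F-1, n = N/2) of the array mismatch expression above."""
    phase_shift_term = ((F + 1) / F) ** 2 * (F - 1) / M * (M - F + 1)
    timing_term = N / 4
    return sigma_match * F * math.sqrt(phase_shift_term + timing_term) * t_bin


def total_rms(*contributions):
    """Independent error contributions add in quadrature."""
    return math.sqrt(sum(c * c for c in contributions))


# Usage: the prototype values (F=4, N=35, M=28, 1% mismatch, 15 ps jitter).
t_bin = 12_500 / 140
sq = quantisation_rms(t_bin)
sa = adll_mismatch_rms(0.01, 4, 35, 28, t_bin)
tot = total_rms(sq, sa, 15.0)
print(round(sq, 1), round(sa, 1), round(tot, 1), round(tot / t_bin, 2))
```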
The results of the tests carried out with the prototype are detailed in Chapter 8. They show an overall RMS resolution of 34.5ps (0.38LSB), which is in good agreement with the expected value given above.
Chapter 6. Analysis of the Limits to the TDC Resolution.
In this chapter we develop mathematical tools to predict and analyse the effects of different error sources on the linearity and the time resolution of a DLL-based converter. The analysis is extended to the more complex case of the ADLL. These analysis tools translate important system-level performance parameters into design variables, which can then be used to judge a design against the expected performance.
All the most important internal error sources are accounted for, namely the delay
cell mismatch, the dynamic behaviour of the closed control loop and several causes of
phase error.
6.1. Non-linearity due to cell mismatch.
The delay cell defines the LSB of a DLL based converter. Delay differences
between cells produce variations of the LSB along the dynamic range. Therefore, the
conversion becomes non-linear and the resolution is degraded.
Although all cells have identical layout and are biased under the same conditions, their delays are not identical. If the delays of a large number of these cells are measured, their distribution is found to have mean µ and variance σ². The delay of a cell can therefore be seen as a random variable with a normal Probability Density Function (PDF) having mean µ and variance σ². The mean corresponds to the expected cell delay, and the variance gives a measure of the spread of the actual delays around that value.
6.1.1. Origins of mismatch.
Delay mismatch originates in the variation, due to the fabrication process, of the electrical parameters of the devices that make up the cell. Two kinds of parameter variations can be distinguished: local and global variations [5][6][7]. Local variations affect devices that are immediate neighbours; this kind of random variation is generally called parameter mismatch. Global variations affect devices that are located far apart on the same die, on different dies or even on different wafers. At circuit level, global variations can be seen as static errors that affect the absolute values of the respective parameters. These variations are mainly due to process and temperature gradients, non-uniformity of the photolithographic processing caused by proximity effects, and different orientations of devices.
Circuit topologies that rely on relative, rather than absolute, device parameters effectively counter global variations. The DLL structures rely only on relative cell delays, so the effects of global parameter variations are disregarded in this study.
The effects of local variations can be limited by proper layout of the cells, keeping a
constant orientation of the devices, avoiding temperature gradients and guaranteeing that
each cell has the same “physical” patterns in its vicinity. Local variations result from
unavoidable deviations from the intended values of key parameters during fabrication.
Thin oxide thickness, bulk doping levels, mobility, etc. suffer statistical variations that
affect important electrical parameters, such as the threshold voltage (Vt), the device
current factor (β) and the body factor (γ). These random variations are usually assumed to
be uncorrelated, having a normal distribution with a variance that is inversely proportional
to the gate area.
As devices approach their minimum feature size, especially in deep submicron
technologies, mismatch also becomes dependent on gate length, L and width, W,
separately. To guarantee a good matching behaviour, devices should be drawn with an
appropriate gate area and using conservative (larger than minimum) gate dimensions.
6.1.2. Effects of cell delay mismatch.
The integral linearity error results from the accumulation of the individual cell delay errors, subject to the constraint imposed by the closed control loop of the DLL (the overall delay of the line equals the period of the reference clock). The analysis in Appendix C shows that the standard deviation of the integral error (σDLL(i)) in an N-tapped DLL is given by the following expression, where σcell=σ/µ expresses the matching of the delay of the individual delay cells as a fraction of their mean value µ=T/N:

$$\sigma_{DLL}(i) = \sigma_{cell} \cdot \sqrt{\frac{n}{N} \cdot (N - n)} ,$$

where the timing variable n is defined according to the bin position i along the delay chain by $n = \mathrm{Mod}(i+1, N)$¹, $0 \le i < N$. This definition of the timing variable n will be used throughout this chapter, in the context of the analysis of isolated DLL’s.
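The expression above can be sanity-checked with a small Monte Carlo experiment. The sketch assumes each cell delay is µ·(1 + σcell·z) with z drawn from a standard normal distribution, and models the locked loop by rescaling all cells so that the whole line spans exactly one period; this rescaling is an illustrative simplification, not the loop model of Appendix C. Function names are hypothetical.

```python
import math
import random
import statistics


def sigma_dll_formula(n, N, sigma_cell):
    """Analytical INL standard deviation at tap n, in units of the mean cell delay."""
    return sigma_cell * math.sqrt(n * (N - n) / N)


def sigma_dll_monte_carlo(n, N, sigma_cell, trials=10_000, seed=0):
    """Monte Carlo estimate of the same quantity under the assumptions stated above."""
    rng = random.Random(seed)
    mu = 1.0                      # normalised mean cell delay
    T = N * mu                    # the locked line spans one reference period
    errors = []
    for _ in range(trials):
        delays = [mu * (1.0 + sigma_cell * rng.gauss(0.0, 1.0)) for _ in range(N)]
        scale = T / sum(delays)   # loop forces the total delay to equal T
        tap_time = scale * sum(delays[:n])
        errors.append((tap_time - n * mu) / mu)   # integral error at tap n
    return statistics.pstdev(errors)


# Usage: middle of a 35-cell line with 1% cell mismatch.
print(sigma_dll_formula(17, 35, 0.01), sigma_dll_monte_carlo(17, 35, 0.01))
```

The error vanishes at both ends of the line (the loop pins them) and peaks in the middle, which is the rounded single-DLL curve discussed below.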
From the previous equation it can be observed that, for the same matching between delay cells σcell, the standard deviation of the integral error is larger for longer delay lines (higher N). Therefore, for a given cell delay, better results can be obtained using a short delay line operating at a higher frequency than with a long delay line operating at a lower frequency.

¹ The notation Mod(a,b) denotes the modulo operation. It is required to capture the reference periodicity of the DLL timing interpolation: the last bin (N-1) has its limits defined by tap N-1 and tap 0.
In an ADLL, time interpolation is obtained using taps from several phase-shifted DLL’s. The standard deviation of the overall integral error can be obtained by taking into account the error accumulation along the DLL’s in the delay path. For any bin under consideration, the path from the origin includes delay cells in the Phase Shifting DLL and in the respective Timing DLL. Since delay variations due to mismatch are not correlated between DLL’s, the partial errors add in quadrature (the variance of the integral error is the sum of the partial variances):
$$\sigma_{array}(i) = F \cdot \sigma_{cell} \cdot \sqrt{\left(\frac{F+1}{F}\right)^{2} \cdot \frac{m}{M} \cdot (M - m) + \frac{n}{N} \cdot (N - n)} ,$$

where M, N and F are defined in accordance with the allowed combinations for the array. The phase shifting variable m and the timing variable n are, respectively, the Timing DLL number and the bin number in the corresponding DLL. They are calculated taking into account the staggering of the DLL’s across the clock period. If i ($0 \le i < N \cdot F$) is the array bin number, then:

$$m = \mathrm{Mod}(i+1, F), \qquad n = \mathrm{Mod}\left(\mathrm{Floor}\left(\frac{i+1}{F}\right) - m,\; N\right) .$$
The Mod(a,b)² and Floor(a) operations are, respectively, the modulo and the integer truncation operations. The definition of n is a generalisation of the one presented for the isolated DLL case, where the interpolation factor was F=1. These definitions of the phase shifting variable m and of the timing variable n will be used throughout this chapter, in the context of the analysis of ADLL structures.
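The bin-to-tap mapping and the array expression can be evaluated directly. In units of $T_{bin}$ the cell delays are $T_n = F$ and $T_m = F+1$, so the sketch can check with exact integer arithmetic that tap position $m \cdot T_m + n \cdot T_n$ falls on the end of bin i (modulo one period), and then locate the worst-case bin for the prototype configuration. Function names are hypothetical.

```python
import math


def bin_to_taps(i, F, N):
    """Phase shifting variable m and timing variable n for array bin i (0 <= i < N*F)."""
    m = (i + 1) % F
    n = ((i + 1) // F - m) % N   # Python's % returns a non-negative result here
    return m, n


def sigma_array(i, F, N, M, sigma_cell):
    """INL standard deviation at array bin i due to cell mismatch, in LSB (T_bin) units."""
    m, n = bin_to_taps(i, F, N)
    phase_shift = ((F + 1) / F) ** 2 * m / M * (M - m)   # Phase Shifting DLL part
    timing = n / N * (N - n)                             # Timing DLL part
    return F * sigma_cell * math.sqrt(phase_shift + timing)


def sigma_single_dll(i, N, sigma_cell):
    n = (i + 1) % N
    return sigma_cell * math.sqrt(n / N * (N - n))


# Usage: worst-case bins for the ADLL (N=35, F=4, M=28) and a 140-cell single DLL.
F, N, M = 4, 35, 28
worst = max(range(N * F), key=lambda i: sigma_array(i, F, N, M, 0.01))
print(worst, bin_to_taps(worst, F, N), sigma_array(worst, F, N, M, 0.01))
print(max(sigma_single_dll(i, 140, 0.01) for i in range(140)))
```

The worst case lands in the last Timing DLL (m = F-1) near its middle, at about 0.144 LSB, or roughly 12.8 ps with an 89.3 ps bin, consistent with the Chapter 5 estimate.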
In Figure 1 an example of the expected integral error due to cell delay mismatch is
shown. It corresponds to the case of an array with N=35 (number of cells per timing
DLL), F=4 (interpolation factor) and a cell delay with a standard deviation of 0.01 (1%)
of the cell delay.
When several DLL’s are assembled in an array structure, the single DLL’s rounded
curve shape (also shown) is distorted by the introduction of the Phase Shifting DLL.
There is a strong periodic component with a periodicity of F, corresponding to the folding
of the array from the last Timing DLL to the first one.
The larger non-linearity found in the first part of the curve is due to the fact that timing interpolation in this region is performed using cells at opposite ends of successive Timing DLL’s.
² The use of the modulo operation reflects the folding introduced by the ADLL scheme. This results in some bins being defined from the time interpolation of taps at opposite extremes of consecutive DLL’s.
[Plot: INL standard deviation (0 to 0.15) versus bin (0 to 140), curves for the ADLL and for a single DLL.]
Figure 1: INL standard deviation curve resulting from a cell delay mismatch of σcell=1%
(ADLL: N=35 and F=4, single DLL: N=140).
6.2. Jitter due to internal phase noise.
In the previous section the DLL was considered as an ideal closed control loop, able to keep the delay of the voltage-controlled delay chain (VCDL) exactly at one reference clock period. The deviations from the ideal behaviour found in real control loops can be classified into two categories, according to their origin:
• Deviations of external origin: The reference signal has some phase noise that is
propagated, without attenuation, along the VCDL. The control loop tries to track
these random reference period variations by constant changes of the delay of
each cell.
• Deviations of internal origin: The control loop tries to keep the delay of the
VCDL as close as possible to the reference period. In the absence of an ideal
feedback loop, the dynamics of the control loop will generate some variation of
the VCDL delay around its ideal value. These variations are seen as jitter.
Since we are mainly interested in the study of the DLL internal sources of errors, we
will focus on the deviations of internal origin, assuming an ideal reference clock.
The delay oscillation induced by the operation of the control loop translates into jitter in the signal seen at the end of the delay chain. This jitter can be approximated, without loss of generality, by a random delay error with a normal PDF. The mean value of this error is µjitter=0 and its standard deviation, normalised to the reference period, is σjitter.
The error due to jitter affects all delay cells in the same way and, since it is completely correlated, the standard deviation of the integral error increases linearly along the VCDL of a DLL. The resulting standard deviation of the delay σDLL, normalised to the delay of a single delay cell, is (following the same definition of n as before and with $\sigma_j = \sigma_{jitter} \cdot N$):

$$\sigma_{DLL}(i) = \sigma_j \cdot \frac{n}{N} .$$
In the case of the array of DLL’s, the same considerations as in the previous section apply and, using the same naming conventions, the resulting standard deviation is:

$$\sigma_{array}(i) = \sigma_j \cdot F \cdot \sqrt{\left(\frac{m}{M}\right)^{2} + \left(\frac{n}{N}\right)^{2}} .$$

Note that the DLL’s in the array have statistically independent jitter; therefore, the standard deviation components from different DLL’s are added in quadrature.
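The jitter expressions can be evaluated in the same way as the mismatch ones. This sketch reuses the bin-to-tap mapping defined earlier in this chapter and computes, in LSB units, the curves of the figure below for σjitter = 0.1% of the reference period; the function names are hypothetical.

```python
import math


def sigma_jitter_single(i, N, sigma_jitter):
    """Closed-loop jitter at bin i of a single N-tap DLL, in cell-delay (LSB) units."""
    n = (i + 1) % N
    sigma_j = sigma_jitter * N          # period jitter expressed in cell delays
    return sigma_j * n / N              # grows linearly along the line


def sigma_jitter_array(i, F, N, M, sigma_jitter):
    """Same quantity for an ADLL, in T_bin units; DLL contributions add in quadrature."""
    m = (i + 1) % F
    n = ((i + 1) // F - m) % N
    sigma_j = sigma_jitter * N
    return sigma_j * F * math.sqrt((m / M) ** 2 + (n / N) ** 2)


# Usage: peak values for an ADLL (N=35, F=4, M=28) and a 140-cell single DLL.
print(max(sigma_jitter_array(i, 4, 35, 28, 0.001) for i in range(140)))
print(max(sigma_jitter_single(i, 140, 0.001) for i in range(140)))
```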
[Plot: standard deviation (0 to 0.15) versus bin (0 to 140), curves for the ADLL and for a single DLL.]
Figure 2: Standard deviation curve resulting from a closed loop jitter of σ=0.1% of the reference period
(ADLL: N=35 and F=4, single DLL: N=140).
The curve in Figure 2 describes the effect of jitter with σ=0.1% of the reference clock period (σj=3.5% of the cell delay if N=35). The topology of the ADLL is reflected in the saw-tooth shape of the curve. The same periodic components described in the previous section are present. For comparison, the effect of the same amount of jitter on a single DLL is also shown.
6.3. Non-linearity due to static phase error.
Systematic offsets and unwanted delays present in the converter adversely affect the linearity of the system. They should be carefully identified and minimised. The main sources of non-linearity, identified in Figure 3, are:
• Phase detector’s phase error (F(D1,D2) ≠ 0 when D1 = D2 − T).
• Mismatch of the propagation delay of the lines carrying phase information from the delay chain to the phase detector (τ1 ≠ τ2).
• Unbalanced load and signal characteristics on the delay cells at the extremes of the delay chain (d0, dN-1 ≠ dn, 1 ≤ n ≤ N-2).
• Propagation delay along the sampling signal distribution for the hit registers (τhit ≠ 0).
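[Figure 3 diagram: Clock driving delay cells d0, d1, d2, …, dN-2, dN-1 with taps Tap 0 to Tap N-1, each latched by a D register clocked by the Hit signal through distribution delays τhit; the phase detector receives inputs D1 and D2 through paths with delays τ1 and τ2 and produces F(D1,D2).]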
Figure 3: Detail of a delay locked loop depicting the important delays within the loop.
6.3.1. Effects of phase detector’s phase error.
The phase detector responds to differences in the phase of its input signals by generating an electrical quantity A (voltage, charge, etc.) proportional to the measured phase difference:

$$A(t) = F\big(\varphi_1(t), \varphi_2(t)\big), \quad \text{where} \quad F\big(\varphi_1(t), \varphi_2(t)\big) = K \cdot \big(\varphi_2(t) - \varphi_1(t)\big) - C ,$$

and K and C are, respectively, the gain and the phase error of the phase detector. φ1(t) and φ2(t) are the phases of the two signals being compared by the phase detector.
In the context of DLL analysis it is more convenient to discuss the properties of the loop in terms of delay instead of phase. These concepts are equivalent, their relation being given by the transformation $2\pi \Rightarrow T$. The previous equation is therefore transformed into:

$$A(t) = F\big(D_1(t), D_2(t)\big), \quad \text{where} \quad F\big(D_1(t), D_2(t)\big) = K \cdot \big(D_2(t) - (D_1(t) + T)\big) - C ,$$

and the 2π phase difference between the two extremes of the delay line is stated explicitly (clock period, T). D1(t) and D2(t) are the two delays being compared.
The loop equilibrium is reached when A(t)=0, which should correspond to $D_2(t) = D_1(t) + T$. However, this is not the case if C ≠ 0: the phase detector error C is then reflected in an effective static delay (phase) error. The origin of C may be attributed to an unbalanced phase detector, resulting in an offset in its output signal.
The following discussion assumes an N-tapped DLL spanning a time interval Dtot
that corresponds to a reference clock of period T. It is further assumed that no errors, other
than the one under study, are present.
In equilibrium,
$$K\cdot(D_2(t) - D_1(t) - T) - C = 0 \iff D_2(t) - D_1(t) - T = D_{err}(t) = \frac{C}{K}.$$
Chapter 6: Analysis of the Limits to the TDC Resolution.
The total time interval spanned by the delay chain is $D_{chain} = T + C/K$. Therefore the length of each bin is
$$d_i = \frac{T}{N} + \frac{1}{N}\cdot\frac{C}{K}, \qquad 0 \le i < N-1.$$
Since the periodicity of the reference clock is T, the total time covered by the delay chain must be $D_{chain} = T$. The excess delay is subtracted from the last bin of the chain, $d_i$ with $i = N-1$, which is defined from the time difference (modulo T) between two taps on opposite extremes of the delay chain (tap N-1 and tap 0 in Figure 3):
$$d_i = \frac{T}{N} - \frac{N-1}{N}\cdot\frac{C}{K}, \qquad i = N-1.$$
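The following Python sketch checks the two bin-width expressions above numerically. It uses illustrative values: T = 12,500 ps, N = 5 and a hypothetical C/K = 50 ps, which is not a measured phase detector error. It confirms two things:

- the N bins still tile exactly one period T;
- the accumulated bin-width error after bin i, expressed as a fraction of T, equals n·(C/K)/(N·T) with n = Mod(i+1, N).

The helper names are illustrative only.

```python
def pd_error_bins(T, N, c_over_k):
    """Bin widths (same time unit as T) of an N-tap DLL whose phase detector
    has a static error C/K."""
    # Bins 0..N-2 each absorb 1/N of the excess chain delay C/K.
    widths = [T / N + c_over_k / N for _ in range(N - 1)]
    # The last bin is defined modulo T, so it shrinks by the accumulated excess.
    widths.append(T / N - (N - 1) / N * c_over_k)
    return widths


def running_error_fraction_of_T(widths, T):
    """Accumulated bin-width error after each bin, as a fraction of T."""
    N = len(widths)
    out, acc = [], 0.0
    for d in widths:
        acc += d - T / N
        out.append(acc / T)
    return out


# Illustrative numbers only (ps); C/K = 50 ps is hypothetical.
T, N, c_over_k = 12500.0, 5, 50.0
bins = pd_error_bins(T, N, c_over_k)
print(bins)        # [2510.0, 2510.0, 2510.0, 2510.0, 2460.0]
print(sum(bins))   # 12500.0
err = running_error_fraction_of_T(bins, T)
```

The same sketch covers the input-path mismatch discussed next: passing τdiff·T in place of C/K gives identical bin widths.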
Figure 4 illustrates the effect of this error mechanism. Each rectangle corresponds to a bin. For comparison, the ideal case is shown at the top of the figure. Notice that, due to the periodicity of the scheme (period T), the last bin corresponds to a fraction of the delay of the last cell.
Figure 4: Illustration of the effect of the phase detector’s phase error (N=5).
The error of the phase detector can be referenced to its input, and translated into an
added delay to one of the input signals. If we set delay τ ’diff = C/K, this delay can be
lumped into the propagation delay mismatch of the input paths (τdiff) and the phase
detector considered ideal.
The behaviour of digital, two-state phase detectors is quite different, because they
don’t extract information on the magnitude of the phase error. However the static phase
error of such a phase detector may also be referenced to its input and therefore can be
studied in the same way.
6.3.2.
Effects of phase detector input paths’ delay mismatch.
If the propagation delay of the signals carrying the phase information from the two
extremes of the delay line to the phase detector is different, then this difference will
induce conversion non-linearity:
$$D_{chain} = T + (\tau_2 - \tau_1)\cdot T = (1 + \tau_{diff})\cdot T,$$
where τ1 and τ2, the propagation delays shown in Figure 3, are normalised to the reference period T, and τdiff = τ2 − τ1. Therefore,
$$d_i = \left(\frac{1}{N} + \frac{1}{N}\cdot\tau_{diff}\right)\cdot T, \qquad 0 \le i < N-1,$$
$$d_i = \left(\frac{1}{N} - \frac{N-1}{N}\cdot\tau_{diff}\right)\cdot T, \qquad i = N-1.$$
This effect is illustrated in Figure 5:
Figure 5: Illustration of the effect of the phase detector input paths’ delay mismatch (N=5).
Assuming, in the interest of simplicity, that C/K and τdiff are represented as a fraction of the reference period T, the conversion integral non-linearity due to these errors is obtained from the expression:
$$INL_{DLL}(i) = \frac{1}{N}\cdot n\cdot\left(\frac{C}{K} + \tau_{diff}\right), \qquad 0 \le i \le N-1,$$
with n = Mod(i + 1, N).
6.3.3.
Effects of unbalanced conditions of the cells in the extremes of the delay chain.
Cells in the extremes of the delay chain are under the effect of different environment
conditions. For example, the last cell in the chain drives a smaller load than internal cells
and the signal arriving in the first cell has different rise time than the signals inside the
delay chain. For simplicity, we will consider that these conditions affect only the bins at the extremities of the delay chain. In this case, the bin delays resulting from increases δin and δout in the delays of the first and last bins are:
$$d_i = \left(1 + \frac{(N-1)\cdot\delta_{in} - \delta_{out}}{N}\right)\cdot\frac{T}{N}, \qquad i = 0,$$
$$d_i = \left(1 - \frac{\delta_{in} + \delta_{out}}{N}\right)\cdot\frac{T}{N}, \qquad 1 \le i \le N-2,$$
$$d_i = \left(1 + \frac{(N-1)\cdot\delta_{out} - \delta_{in}}{N}\right)\cdot\frac{T}{N}, \qquad i = N-1.$$
The effects of unbalanced conditions of the cells in the extremes of the delay chain
are separately illustrated in Figure 6 (for the first cell) and in Figure 7 (for the last cell). In
both cases the larger first (or last) cell leads to a larger first (or last) bin and to smaller
other bins, thus maintaining the clock periodicity of the scheme.
Figure 6: Illustration of the effect of unbalanced conditions in the first cell of the delay chain (N=5).
Figure 7: Illustration of the effect of unbalanced conditions in the last cell of the delay chain (N=5).
The expression for the conversion integral non-linearity due to these errors is, therefore:
$$INL_{DLL}(i) = \delta_{in}\cdot\frac{n'}{N} - \delta_{out}\cdot\frac{n}{N}, \qquad 0 \le i \le N-1,$$
where n′ = N − 1 − i and n = Mod(i + 1, N), as previously defined.
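This minimal sketch makes two assumptions, both taken from the bin-width expressions above: δin and δout are fractions of the nominal cell delay T/N, and INL is measured in LSB (units of T/N).

It builds the bin widths, accumulates their deviation from the ideal width, and compares the result with the closed-form INL. The values δin = 1 % and δout = 2 % are hypothetical.

```python
def end_cell_bins(N, d_in, d_out):
    """Bin widths in units of the nominal cell delay T/N (ideal width = 1.0).

    d_in and d_out are the extra delays of the first and last bins, as
    fractions of T/N, following the d_i expressions of this section.
    """
    first = 1 + ((N - 1) * d_in - d_out) / N
    inner = 1 - (d_in + d_out) / N
    last = 1 + ((N - 1) * d_out - d_in) / N
    return [first] + [inner] * (N - 2) + [last]


def running_inl(widths):
    """INL after each bin, in LSB: cumulative sum of (width - 1)."""
    out, acc = [], 0.0
    for w in widths:
        acc += w - 1.0
        out.append(acc)
    return out


def inl_closed_form(N, d_in, d_out):
    """delta_in * n'/N - delta_out * n/N, with n' = N-1-i and n = Mod(i+1, N)."""
    return [d_in * (N - 1 - i) / N - d_out * ((i + 1) % N) / N for i in range(N)]


N, d_in, d_out = 5, 0.01, 0.02   # hypothetical 1 % and 2 % excess cell delays
bins = end_cell_bins(N, d_in, d_out)
print([round(b, 4) for b in bins])   # [1.004, 0.994, 0.994, 0.994, 1.014]
mismatch = max(abs(a - b) for a, b in zip(running_inl(bins), inl_closed_form(N, d_in, d_out)))
print(f"max |running - closed form| = {mismatch:.1e}")
```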
6.3.4.
Effects of propagation delay on the sampling signal path.
All non-linearity sources within the DLL loop have been covered, but there is also
an external source that affects the linearity of a DLL based converter. In fact, due to
unavoidable propagation delays in the hit sampling signal distribution, the sampling of the
status of the DLL occurs at different times for different taps. The error generated by this
effect is a function of the hit register topology.
This effect corresponds to the vernier interpolator configuration described previously (see Chapter 4). Considering, for example, the linear hit sampling signal distribution configuration shown in Figure 3 and a constant3 τhit propagation delay per hit register, the resulting apparent cell delay is:
$$d_i = (1 - \tau_{hit})\cdot\frac{T}{N}, \qquad 0 \le i \le N-2,$$
$$d_i = (1 + (N-1)\cdot\tau_{hit})\cdot\frac{T}{N}, \qquad i = N-1.$$
This effect is illustrated in Figure 8. In this case the last bin is extended to the end of
the clock period so that the full period is covered.
Figure 8: Illustration of the effect of the propagation delay on the sampling signal path – case of the linear hit signal distribution network (N=5).
The integral non-linearity of the conversion is given by:
$$INL_{DLL}(i) = -\tau_{hit}\cdot n, \qquad 0 \le i \le N-1.$$
In order to reduce this effect, lines with smaller propagation delays can be used. Alternatively, more complex distribution configurations, such as the T-shaped distribution network, can be used. In this distribution network the hit sampling signal is distributed in two separate branches starting from the middle of the hit register row. In this way, the distance from the source to the farthest register is halved, and therefore the propagation delay τhit is reduced. A positive side effect of this network is that in one of the branches the vernier interpolation results in smaller bins and in the other in larger bins.
3
A signal propagating along a finite RC delay line does not progress at constant speed. Typically it
accelerates along the line, therefore τhit is not constant. However this convenient simplification enables a
faster understanding of this effect.
Figure 3 is redrawn as Figure 9 for this configuration, which reduces the integral non-linearity (see Chapter 7 for a detailed analysis of this distribution network). For this particular distribution, the resulting effective cell delay is (assuming, for simplicity, an even number N of delay cells):
$$d_i = (1 + \tau_{hit})\cdot\frac{T}{N}, \qquad 0 \le i < \frac{N}{2},$$
$$d_i = (1 - \tau_{hit})\cdot\frac{T}{N}, \qquad \frac{N}{2} \le i \le N-1.$$
Figure 9: The T-shaped hit signal distribution network.
The illustration of this effect, for the T-shaped sampling signal distribution network
is shown in Figure 10. Notice the larger initial bins and the smaller final bins.
Figure 10: Illustration of the effect of the propagation delay on the sampling signal path – case of the T-shaped hit signal distribution network (N=5).
The integral non-linearity of the conversion due to this error source is given by:
$$INL_{DLL}(i) = \tau_{hit}\cdot\left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right), \qquad 0 \le n < N.$$
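The next sketch compares the two hit distribution networks under the simplifying assumptions used above:

- τhit is constant and expressed as a fraction of T/N;
- N is even;
- τhit = 0.001, a hypothetical value.

For equal τhit, the T-shaped network limits |INL| to τhit·N/2, against τhit·(N−1) for the linear network. The additional reduction of τhit itself, due to the shorter branches, is not modelled here.

```python
def linear_hit_bins(N, tau_hit):
    """Apparent bin widths (units of T/N) for the linear hit distribution."""
    return [1 - tau_hit] * (N - 1) + [1 + (N - 1) * tau_hit]


def t_shaped_hit_bins(N, tau_hit):
    """Apparent bin widths (units of T/N) for the T-shaped network; N must be even."""
    if N % 2:
        raise ValueError("N must be even in this simplified model")
    return [1 + tau_hit] * (N // 2) + [1 - tau_hit] * (N // 2)


def running_inl(widths):
    """INL after each bin (LSB): cumulative sum of (width - 1)."""
    out, acc = [], 0.0
    for w in widths:
        acc += w - 1.0
        out.append(acc)
    return out


N, tau_hit = 140, 0.001   # N as for the single DLL of Figures 11-15; tau_hit hypothetical
lin_inl = running_inl(linear_hit_bins(N, tau_hit))
t_inl = running_inl(t_shaped_hit_bins(N, tau_hit))
# Closed forms: -tau_hit * n (linear) and tau_hit * (N/2 - |n - N/2|) (T-shaped).
lin_closed = [-tau_hit * ((i + 1) % N) for i in range(N)]
t_closed = [tau_hit * (N / 2 - abs((i + 1) % N - N / 2)) for i in range(N)]
print(f"peak |INL| linear:   {max(abs(x) for x in lin_inl):.3f} LSB")
print(f"peak |INL| T-shaped: {max(abs(x) for x in t_inl):.3f} LSB")
```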
6.3.5.
Overall non-linearity due to static phase error.
The effects of all static error sources can be included in a single integral non-linearity expression, where i is the bin position, 0 ≤ i < N. Making the variable substitutions
$$D_{in} = \delta_{in}, \quad D_{PD} = \frac{C}{K} + \tau_{diff}, \quad D_{out} = \delta_{out}, \quad D_{hit} = -\tau_{hit}$$
and
$$n = \mathrm{Mod}(i+1, N), \qquad n' = N - 1 - i,$$
the overall integral non-linearity expression is obtained:
$$INL_{DLL}(i) = D_{in}\cdot\frac{n'}{N} + (D_{PD} - D_{out} + D_{hit}\cdot N)\cdot\frac{n}{N},$$
if the linear hit signal distribution is being used, or
$$INL_{DLL}(i) = D_{in}\cdot\frac{n'}{N} + (D_{PD} - D_{out})\cdot\frac{n}{N} - D_{hit}\cdot\left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right),$$
if the alternative T-shaped hit signal distribution is being used.
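The overall single-DLL expression can be cross-checked against a direct superposition of the individual bin-width errors. The sketch assumes that every D term is a dimensionless fraction of the nominal cell delay T/N, and it reports INL in LSB. This normalisation is an assumption, made so that all four terms share one unit.

Under this assumption, the closed form equals the running sum of the per-bin deviations for both hit distribution networks. The parameter values are hypothetical.

```python
def inl_single_dll(i, N, D_in, D_pd, D_out, D_hit, t_shaped=False):
    """Closed-form INL (LSB) of bin i for a single N-tap DLL."""
    n = (i + 1) % N          # n = Mod(i + 1, N)
    n_prime = N - 1 - i      # n' = N - 1 - i
    inl = D_in * n_prime / N + (D_pd - D_out) * n / N
    if t_shaped:
        inl -= D_hit * (N / 2 - abs(n - N / 2))
    else:
        inl += D_hit * N * n / N
    return inl


def bin_deviations(N, D_in, D_pd, D_out, D_hit, t_shaped=False):
    """Per-bin width error (LSB), superposing the individual mechanisms."""
    dev = [0.0] * N
    # Phase detector error / input path mismatch: bins 0..N-2 stretched,
    # last bin shrunk so that the chain still spans one period.
    for i in range(N - 1):
        dev[i] += D_pd / N
    dev[N - 1] -= (N - 1) * D_pd / N
    # Unbalanced first and last delay cells.
    dev[0] += ((N - 1) * D_in - D_out) / N
    for i in range(1, N - 1):
        dev[i] -= (D_in + D_out) / N
    dev[N - 1] += ((N - 1) * D_out - D_in) / N
    # Sampling-path delay, with D_hit = -tau_hit.
    if t_shaped:  # assumes N even
        for i in range(N):
            dev[i] += -D_hit if i < N // 2 else D_hit
    else:
        for i in range(N - 1):
            dev[i] += D_hit
        dev[N - 1] -= (N - 1) * D_hit
    return dev


N = 8   # small even N so that the T-shaped case is defined
params = dict(D_in=0.01, D_pd=0.002, D_out=0.015, D_hit=-0.001)  # hypothetical
worst_by_case = {}
for t_shaped in (False, True):
    running, worst = 0.0, 0.0
    for i, d in enumerate(bin_deviations(N, t_shaped=t_shaped, **params)):
        running += d
        worst = max(worst, abs(running - inl_single_dll(i, N, t_shaped=t_shaped, **params)))
    worst_by_case[t_shaped] = worst
    print("T-shaped" if t_shaped else "linear", "max mismatch:", worst)
```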
In the case of the array of DLL’s, the integral non-linearity along the delay path is
added linearly. We assume that, regardless of the actual detailed hit signal distribution, to
each of the Timing DLL’s corresponds a set of hit registers that are driven through a
separate signal path. In this context, Figure 3 and Figure 9 correspond to one of the
Timing DLL’s that make up the array. The Phase Shifting DLL is not directly sampled; therefore, this effect is only visible in the Timing DLL’s.
Taking into account the staggering of the multiple Timing DLL’s we define the
following variables as a function of the position of the bin i ( 0 ≤ i < N ⋅ F ):
$$m = \mathrm{Mod}(i+1, F),$$
$$n = \mathrm{Mod}\left(\mathrm{Floor}\left(\frac{i+1}{F}\right) - m,\ N\right),$$
$$n' = \mathrm{Mod}\left(m - \mathrm{Floor}\left(\frac{i+1}{F}\right),\ N\right).$$
The overall integral non-linearity expression is:
$$INL_{array}(i) = D_{in}\cdot F\cdot\left(-\frac{m}{M}\cdot\frac{F+1}{F} + \frac{n'}{N}\right) + D_{PD}\cdot F\cdot\left(\frac{m}{M} + \frac{n}{N}\right) - D_{out}\cdot F\cdot\left(\frac{m}{M}\cdot\frac{F+1}{F} + \frac{n}{N}\right) + D_{hit}\cdot F\cdot n,$$
if the linear hit signal distribution is being used, or
$$INL_{array}(i) = D_{in}\cdot F\cdot\left(-\frac{m}{M}\cdot\frac{F+1}{F} + \frac{n'}{N}\right) + D_{PD}\cdot F\cdot\left(\frac{m}{M} + \frac{n}{N}\right) - D_{out}\cdot F\cdot\left(\frac{m}{M}\cdot\frac{F+1}{F} + \frac{n}{N}\right) - D_{hit}\cdot F\cdot\left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right),$$
if the alternative T-shaped distribution is chosen.
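The bin-to-tap mapping of the array is easy to get wrong. This Python sketch implements m, n and n′ together with the array INL expression. It uses:

- N = 35 and F = 4, the values quoted for Figures 11 to 15;
- M = 28, which is inferred from the Phase Shifting DLL cell delay quoted in Chapter 7 rather than stated here.

It checks that every (Timing DLL, tap) pair closes exactly one fine bin. It also checks that, for F = 1, the array expression collapses to the single-DLL one. The coefficients follow the extracted expression and should be treated as a reading of the original, not an independent derivation.

```python
def adll_indices(i, N, F):
    """(m, n, n') for fine bin i, 0 <= i < N*F, of an ADLL with F Timing DLLs."""
    q = (i + 1) // F        # Floor((i + 1) / F)
    m = (i + 1) % F         # Mod(i + 1, F): Timing DLL that closes the bin
    n = (q - m) % N         # Mod(Floor((i + 1)/F) - m, N): tap in that DLL
    n_prime = (m - q) % N   # Mod(m - Floor((i + 1)/F), N)
    return m, n, n_prime


def inl_array(i, N, F, M, D_in, D_pd, D_out, D_hit, t_shaped=False):
    """Array INL of fine bin i, following the expression as reconstructed."""
    m, n, n_prime = adll_indices(i, N, F)
    inl = (D_in * F * (-(m / M) * (F + 1) / F + n_prime / N)
           + D_pd * F * (m / M + n / N)
           - D_out * F * ((m / M) * (F + 1) / F + n / N))
    if t_shaped:
        inl -= D_hit * F * (N / 2 - abs(n - N / 2))
    else:
        inl += D_hit * F * n
    return inl


def inl_single_dll(i, N, D_in, D_pd, D_out, D_hit, t_shaped=False):
    """Single-DLL closed form, for the F = 1 consistency check."""
    n, n_prime = (i + 1) % N, N - 1 - i
    inl = D_in * n_prime / N + (D_pd - D_out) * n / N
    if t_shaped:
        return inl - D_hit * (N / 2 - abs(n - N / 2))
    return inl + D_hit * n


N, F, M = 35, 4, 28
pairs = {adll_indices(i, N, F)[:2] for i in range(N * F)}
print(len(pairs))   # 140
params = dict(D_in=0.01, D_pd=0.001, D_out=0.02, D_hit=-0.001)  # hypothetical
print(adll_indices(N * F - 1, N, F), inl_array(N * F - 1, N, F, M, **params))
```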
The curves in Figure 11, Figure 12, Figure 13 and Figure 14 are intended to illustrate the shape of the INL curve resulting from the indicated sources of linearity errors. No attempt is made to compare them, since the error magnitudes used do not reflect actual values. For completeness, the corresponding DNL graphs are also shown; they are obtained directly from the respective INL curves.
Figure 11: DNL and INL curves resulting from a phase detector’s phase error (or phase detector input paths’ delay mismatch): DPD (C/K + τdiff) = 0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140).
Figure 12: DNL and INL curves resulting from unbalanced conditions of the delay cells in the extremes of the delay chain: Din(δin)=1% and Dout(δout)=1% of the average cell delay (ADLL: N=35 and F=4, single DLL: N=140).
Figure 13: DNL and INL curves resulting from the propagation delay on the sampling signal path (linear hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140).
Figure 14: DNL and INL curves resulting from the propagation delay on the sampling signal path (T-shaped hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140).
Figure 15 shows the combined effect of all these sources of non-linearity when the T-shaped hit signal distribution network is used.
Figure 15: DNL and INL curves resulting from the combination of the previous curves (ADLL: N=35 and F=4, single DLL: N=140).
Chapter 7.
Detailed Implementation.
The circuitry included in the ADLL, as well as the channel buffers, are the critical circuit blocks responsible for the performance of the converter. Their implementation will be analysed in detail, highlighting the advantages expected from the design options taken.
7.1.
DLL building blocks.
7.1.1.
Phase detector.
The DLL closed loop operation is, in normal conditions, only required to track
variations of the delay between the two extremes of the VCDL, the frequency of the
reference signal being constant. In these conditions a simple two-state phase detector can
be effectively employed. This phase detector presents some advantageous characteristics,
such as implementation simplicity and ±T/2 operating range1.
Figure 1: D-flip-flop operating as a two-state phase detector.
A D-flip-flop (D-FF) connected as in Figure 1 behaves as a two-state phase detector. It samples the signal coming out of the DLL delay chain (VCDL_out) at the rising edge of the reference clock entering the chain (VCDL_in). Therefore, the phase detector output reflects the sign of the delay difference (referred to as the phase error), not its magnitude.
When a zero phase error situation is approached, the output of the phase detector will permanently toggle from one state to the other, resulting in what is called a “bang-bang” behaviour of the closed loop it controls. Therefore, the average phase error of the closed loop is zero, but its instantaneous value oscillates around this ideal value without ever settling into it. The oscillation amplitude is independent of the phase detector; it is set by other loop parameters.
1
The standard notation to describe phase detector operation refers to phase instead of period. Following that notation, the operating range would be termed ±π instead of ±T/2. However, these are equivalent notations, and in the context of DLL’s and TDC’s it seems more appropriate to deal with time and delay instead of frequency and phase. Some exceptions to this rule are made; for example, we use the usual terms Phase Error and Phase Detector instead of Delay Error and Delay Detector.
Figure 2: General and D-FF based two-state phase detector transfer characteristic (output Vpd versus phase error φe, from −T/2 to T/2).
The transfer curve of a general two-state phase detector is shown in Figure 2, together with the bi-stable characteristic of the D-FF based phase detector. The latter does not carry quantitative information about the phase error; however, when integrated over time, the general transfer curve is obtained.
Alternatively, a 3-state sequential phase-frequency detector (PFD) [8],[9] could have been used, and the “bang-bang” behaviour avoided. However, 3-state PFDs are more complex devices and must be carefully designed to avoid developing a dead-band around the zero-phase error. The main application of 3-state PFDs is in PLL control loops, where
their ability to capture frequency error information is required. Furthermore, since the
main function of a PLL is to track frequency, they can usually tolerate small phase errors.
This is not the case for a DLL, whose main function is to track delay (phase). Since the
amplitude of the “bang-bang” oscillation can be made arbitrarily small by setting the
corresponding loop parameters, it is preferable to use the simpler 2-state phase detector
configuration.
The information on the amplitude of the phase error carried by the PFD output also
enables it to perform faster corrections to the VCDL, in case of severe reference clock
period variations. However this feature is not necessary in a TDC, where the reference
clock is, by definition, stable. It is therefore more important to avoid the dead-band and obtain a better discrimination around zero phase error, which is easier to achieve if the 2-state phase detector is used.
D-flip-flop implementation.
In Chapter 6, the degradation of the converter linearity due to a phase error generated in the phase detector was discussed. It is therefore important to understand which phase detector characteristics generate a phase error, in order to be able to counteract them.
Chapter 7: Detailed Implementation.
Two conditions may generate a phase error in a D-FF based phase detector:
• Sampling moment shifted from the input signal’s arrival time (for example, due
to unbalanced loads in internal nodes).
• Metastability conditions.
To avoid these conditions, the sampling uncertainty of the D-FF must be limited to a
very narrow time window exactly centred on the arrival time of the input signal rising
edge (the sampling instant). These characteristics should not change in any operating
conditions and should be immune to process variations or device mismatch.
The configuration we will study is the balanced implementation of a D-FF described in [10] and shown in Figure 3. In this topology, all internal nodes have the same fanout and all gates have the same driving capability. A very balanced circuit is obtained, and therefore no shift should be seen in the sampling instant.
The critical nodes that define the speed of the data latching are included in the SR#1 block highlighted in Figure 3. This latch should reach its final state very quickly after a change in its inputs. In this way, the sampling time is well resolved under any operating conditions.
Figure 3: Balanced D-flip-flop topology.
Metastability will affect the phase detector operation by delaying the phase detector
decision. This, in turn, will limit the amplitude of the corrections the closed control loop
can perform in one reference clock period. If the delay is large enough, the decision may
not be taken at all, resulting in the absence of a correct control loop decision during that
period (corresponding to one clock cycle). If the metastability probability is large, a “dead
band” where the loop is unable to react to delay differences, will appear around the zero
phase error point. To avoid this situation, the D-FF must be able to get out of the metastable condition very quickly. Again, the critical SR#1 latch must be designed with this problem in mind [11].
This D-FF topology does not produce any hysteresis in its transfer function, since
the state of the critical SR#1 latch is independent of the output state of the flip-flop.
Therefore no “dead band” related to hysteresis can exist.
The D-FF implemented is a variation of the one shown in Figure 3, where maximum priority was given to the correct operation of the critical latch. For this purpose, the inherently slow 3-input SR latch was replaced by a faster 2-input latch, as shown in Figure 4. The two AND gates that had to be introduced in the decision path have identical layouts and are placed in close proximity, so that their delay matching is optimised and they are simultaneously affected by supply noise. In this way, these two gates only affect the latency of the phase detector and not its timing resolution or its static phase error.
Figure 4: Balanced D-FF topology featuring fast SR#1 operation.
Device matching also affects the performance of the circuit, by making the delays of nominally identical gates differ from each other. All devices therefore have a large gate area, and their layout follows matching-oriented rules [12],[13]. The gate width is also determined by the speed requirements. Simulations have shown that, for the technology used, a 3:1 ratio between the effective gate sizes of the PMOS and NMOS branches of the gates results in improved phase detector accuracy and a smaller dependency on environmental variations.
The accuracy of the phase detector, obtained from simulations, is better than 12ps under any environment or process conditions. In the presence of large mismatch (simulated by varying the gate length of selected devices), a maximum degradation of the accuracy to 22ps was observed.
7.1.2.
Charge-pump and loop filter.
The behaviour of closed control loops built with a sequential phase detector, a charge-pump and a filter has been analysed in detail [14],[15], and numerical simulation models have been built [16]. These loops present several advantages in comparison with the conventional loops built with a combinatorial phase detector and filter. The main advantage for our application is their ability to obtain zero static phase error using a passive loop filter.
The charge-pump, together with the loop filter, converts the logic state of the phase detector into an analogue quantity that can be used to control the delay chain. Since the control loop is only required to track delay variations between the two extremes of the VCDL, the loop filter can be made of a simple capacitor. The resulting closed control loop is a first-order system and is therefore inherently stable.
The charge-pump is made of a current source and a current sink that, depending on
the state of the phase detector will either deliver a “packet” of charge, or extract a
“packet” of charge from the loop filter capacitor. The capacitor behaves as an integrator of
the charge, converting it into the control voltage for the VCDL.
Figure 5: Charge-pump and filter capacitor block diagram.
This configuration of charge-pump and 2-state phase detector leads to the “bang-bang” behaviour of the closed control loop. After delay lock has been achieved, the actual delay of the delay chain will be permanently oscillating around the zero phase error delay. This oscillation translates into loop jitter. Assuming an otherwise ideal loop behaviour, the amplitude ΔVctrl of the oscillation corresponds to the charging (discharging) of the filter capacitor (Cfilter) by a constant current (Icp) during the reference period (T):
$$\Delta V_{ctrl} = \frac{I_{cp}\cdot T}{C_{filter}}.$$
Therefore, given a fixed reference period, the only way to decrease the amplitude of
the oscillation and the loop jitter is to reduce the charge-pump current and/or increase the
filter capacitance.
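The bang-bang behaviour can be illustrated with a toy discrete-time model. Once per reference period, the 2-state phase detector decides early or late, and the charge pump moves Vctrl by ±ΔVctrl = Icp·T/Cfilter.

The model parameters are:

- VCDL: a hypothetical linear delay-versus-Vctrl characteristic of 2000 ps/V, chosen so that a higher Vctrl gives a longer delay;
- Icp = 10 µA and Cfilter = 47.7 pF, the implementation values quoted later in this chapter.

After lock, the delay toggles between two values that straddle T and differ by one control step. This toggling is the loop jitter discussed above.

```python
def simulate_bang_bang(cycles, period_ps, delay0_ps, gain_ps_per_v, i_cp, c_filter):
    """Toy bang-bang DLL: returns (control step in volts, chain delays in ps)."""
    dv = i_cp * (period_ps * 1e-12) / c_filter   # Delta Vctrl = Icp * T / Cfilter
    v_ctrl = 0.0
    delays = []
    for _ in range(cycles):
        delay = delay0_ps + gain_ps_per_v * v_ctrl   # hypothetical linear VCDL
        delays.append(delay)
        early = delay < period_ps                    # 2-state phase detector decision
        v_ctrl += dv if early else -dv               # one charge packet per period
    return dv, delays


GAIN = 2000.0   # ps/V, hypothetical VCDL gain
dv, delays = simulate_bang_bang(cycles=400, period_ps=12500.0, delay0_ps=12000.0,
                                gain_ps_per_v=GAIN, i_cp=10e-6, c_filter=47.7e-12)
tail = delays[-50:]   # well after lock (about 96 periods are needed here)
print(f"control step {dv * 1e3:.2f} mV -> delay step {GAIN * dv:.2f} ps")
print(f"locked delay toggles between {min(tail):.2f} ps and {max(tail):.2f} ps")
```

In this model, halving Icp or doubling Cfilter halves the toggle amplitude, as stated above, at the cost of a proportionally slower initial lock.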
The currents in the two branches of the charge-pump are assumed to be matched. However, this is not a very critical parameter if only low-amplitude ΔVctrl oscillations are allowed, since the static phase error it may entail is very small (smaller than the amplitude of oscillation).
Charge-pump implementation.
The implementation of the charge-pump is driven by the need to accurately switch current sources into a capacitive node. In this context, the current switches are critical to the correct behaviour of the circuit. Gate signal feedthrough in these switches results in unwanted changes in the amount of charge stored in the filter capacitor. If these changes are comparable to the changes due to normal loop operation, the behaviour of the loop becomes unpredictable and a large static phase error may develop.
Figure 6: Charge-pump topologies (simplified).
The first schematic of Figure 6 illustrates the feedthrough mechanism. The gate-drain overlap capacitance of the switch transistors (Msw) and the filter capacitor form a capacitive voltage divider. Therefore, when the switch of a charge-pump branch opens, Vctrl will experience a variation of:
$$\Delta V_{ctrl} = \frac{C_{gd}\cdot\Delta V_g}{C_{gd} + C_{filter}}.$$
The gate voltage swing ΔVg is, in this case, the supply voltage.
To guarantee that the Vctrl variation due to the control loop is bigger than the parasitic variation due to feedthrough, the charge-pump current should be:
$$I_{cp} \gg \frac{\Delta V_g}{T}\cdot\frac{C_{gd}\cdot C_{filter}}{C_{gd} + C_{filter}} \approx \frac{\Delta V_g\cdot C_{gd}}{T}.$$
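The following sketch evaluates the feedthrough step and the corresponding lower bound on Icp, to give a feel for the magnitudes involved. The inputs are:

- switch overlap capacitance of 2 fF and gate swing of 2.5 V, both hypothetical placeholders, since neither value is given in this section;
- T = 12,500 ps and Cfilter = 47.7 pF, the values quoted elsewhere in this chapter.

At the bound, one period of charge-pump current produces exactly the feedthrough step.

```python
def feedthrough_step(c_gd, c_filter, dv_g):
    """Vctrl disturbance (V) from the Cgd / Cfilter capacitive divider."""
    return c_gd * dv_g / (c_gd + c_filter)


def icp_bound(c_gd, c_filter, dv_g, period):
    """Charge-pump current (A) whose charge per period equals the feedthrough step.

    Returns (exact, approximate); the design must use Icp well above this value.
    """
    exact = dv_g / period * (c_gd * c_filter) / (c_gd + c_filter)
    approx = dv_g * c_gd / period
    return exact, approx


C_GD = 2e-15         # hypothetical switch gate-drain overlap capacitance (F)
DV_G = 2.5           # hypothetical gate swing, taken equal to the supply (V)
PERIOD = 12.5e-9     # reference period (s)
C_FILTER = 47.7e-12  # filter capacitor (F)

step = feedthrough_step(C_GD, C_FILTER, DV_G)
exact, approx = icp_bound(C_GD, C_FILTER, DV_G, PERIOD)
print(f"feedthrough step: {step * 1e3:.3f} mV")
print(f"Icp must be >> {approx * 1e9:.0f} nA (exact bound {exact * 1e9:.1f} nA)")
```

The bound scales with ΔVg·Cgd. Reducing the gate swing and using narrow output transistors, as in the second schematic of Figure 6, therefore lowers it proportionally.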
The second schematic in Figure 6 shows the circuit used to reduce the feedthrough into the Vctrl node. In this circuit the switching activity is combined with the current mirroring: switching only moves the Vgs of the output transistors (Mo) to just below their threshold voltage, reducing ΔVg to a small swing. Cgd is also reduced, since these transistors are made narrow to obtain low charge-pump currents.
A diode-connected transistor (Md) defines the lower limit of the ΔVg swing. To make its Vgs lower than the threshold voltage of the output transistor, it is designed very wide and short, which results in a smaller threshold voltage. The output transistor, on the other hand, is conveniently narrow and long and therefore has a slightly higher threshold voltage, as intended.
Since the output transistors (Mo) are only lightly switched off, the sub-threshold
current is not completely eliminated. However, this current is substantially smaller than
the “on” current, therefore it does not affect the operation of the charge-pump.
When the charge-pump operates at low current levels, the mirror transistor operates with a Vgs only a few hundred millivolts above the threshold voltage, resulting in an order-of-magnitude reduction in the ΔVg swing. Overall, a 20 to 50 times reduction in the minimum usable charge-pump current can be obtained using this scheme.
However, the switching speed of this charge-pump scheme is low. When a branch is
released, the gate of the output transistor must be charged using the limited current
available from the current mirror source. In order to increase the switching speed a current
dividing mirror should be used. The switching speed limits the reduction of Icp that can be
achieved, since the effective time T’ in which the charge-pump current is available to act
on the Vctrl is smaller than the period T.
Using this configuration, current levels as low as 200nA can be used. Taking into account the limited switching speed at these low current levels and other design constraints, the charge-pump implemented was designed to deliver a (programmable) current between 10µA and 100µA.
Filter capacitor.
The filter capacitor was made as an n-well isolated PMOS transistor working in accumulation mode [17]. In this mode of operation, a majority carrier channel is always present under the gate. This results in a voltage-independent capacitance across the transistor gate2 and, due to the ready availability of carriers, it also has good high-frequency characteristics. A capacitor built this way has the back plate always tied to ground. Therefore, the control voltage Vctrl is defined having the ground node as a reference.
Using a large transistor gate area, a capacitance of ~47.7pF is obtained. If minimum charge-pump current levels are used, the resulting control voltage step is 2.6mV per reference clock period (T=12,500ps).
2
This statement holds true for most of the applicable gate voltage range with the exception of a narrow very
low gate voltage range, where a depletion region subsists underneath the gate oxide and the gate capacitance
is voltage dependent.
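The 2.6mV figure follows directly from ΔVctrl = Icp·T/Cfilter. The following short check covers the programmable charge-pump current range quoted above, using the ~47.7pF filter capacitance and T = 12,500ps.

```python
def control_step(i_cp, period, c_filter):
    """Change of Vctrl (V) produced by one reference period of charge-pump current."""
    return i_cp * period / c_filter


PERIOD = 12500e-12   # T = 12,500 ps
C_FILTER = 47.7e-12  # ~47.7 pF accumulation-mode PMOS capacitor
for i_cp in (10e-6, 100e-6):   # programmable charge-pump range
    print(f"Icp = {i_cp * 1e6:.0f} uA -> {control_step(i_cp, PERIOD, C_FILTER) * 1e3:.2f} mV per period")
# Icp = 10 uA -> 2.62 mV per period
# Icp = 100 uA -> 26.21 mV per period
```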
7.1.3.
Delay cell.
The VCDL is made of a number of identical delay cells. In these cells the control
voltage generated in the closed control loop is translated into a propagation delay. The
ADLL is made of two different types of DLL’s, the Timing DLL, that requires a cell delay
of T/N = 357.1ps and the Phase Shifting DLL, that requires a delay of T/M = 446.4ps per
cell. These DLL’s are built using the same building blocks (but a different number of
delay cells) therefore the delay cell operating range must cover the two distinct operating
points under all conditions. Using four Timing DLL’s in an ADLL architecture, a time interpolation F=4 times finer than that of a simple Timing DLL is obtained, leading to stringent matching requirements for the delay cells.
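The quoted cell delays can be checked against the reference period. For T = 12,500 ps they correspond to N = 35 Timing DLL cells and M = 28 Phase Shifting DLL cells. The difference between the two cell delays equals one ADLL bin, T/(N·F) ≈ 89.3 ps. The relation M·(F+1) = N·F checked in the test is inferred from these numbers, not stated in this section.

```python
T = 12500.0          # reference period (ps)
CELL_TIMING = 357.1  # Timing DLL cell delay quoted above (ps)
CELL_PHASE = 446.4   # Phase Shifting DLL cell delay quoted above (ps)
F = 4                # number of Timing DLLs in the ADLL

N = round(T / CELL_TIMING)   # cells per Timing DLL
M = round(T / CELL_PHASE)    # cells per Phase Shifting DLL
fine_bin = T / (N * F)       # ADLL bin width (ps)

print(N, M, round(fine_bin, 1))   # 35 28 89.3
print(round(T / M - T / N, 1))    # 89.3
```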
The ADLL architecture uses a large number of fast identical cells. Furthermore, the
delay matching required between these cells leads to the specification of large sized
devices, which results in high gate capacitance. To drive these high loads at the necessary
speed, large power dissipation is required. It is therefore important to choose a cell
structure that reduces the dissipation, for a given speed and matching requirements.
The delay of a cell is sensitive to temperature and supply voltage variations. It also depends on the process parameters. Correct operation of the DLL closed control loop, therefore, requires that a sufficient delay range be available to cover any operating conditions.
Choice of cell structure.
In summary, the choice of delay cell structure must conform to the following criteria:
• Power dissipation.
• Noise sensitivity.
• Device matching.
• Cell delay control range.
Two structures were compared, bearing in mind the particular operation of a DLL. These structures were the differential cell using symmetric loads, as developed in [18], and the single-ended cell, based on a current-starved inverter structure.
The sudden supply current variations due to the switching activity of single-ended delay cell structures generate noise in the power supply network. Supply noise translates into changes in the instantaneous decision threshold of each inverter and therefore in the timing characteristics of the other cells in the delay line. Differential delay cells enjoy an apparent advantage in this respect, since their large common mode rejection ratio (CMRR) ensures good supply noise immunity. Their constant power dissipation also generates less supply noise. On the other hand, the constant tail current used in the differential delay cell significantly increases the power dissipation of the ADLL structure.
One important characteristic of the operation of a locked DLL is that all switching activity in the delay line is evenly spread along the reference period, as illustrated in Figure 7. As a consequence, the instantaneous current requirements are averaged over time and, therefore, the inductive supply voltage variations are strongly reduced. Under these conditions, the delay cells that make up the DLL are not adversely affected by the switching activity, and a careful distribution of the power supply, separating the DLL from any noisy digital circuitry, will suffice to obtain good noise performance. Simple, lower-power single-ended delay cells are, therefore, a viable alternative to differential logic.
Figure 7: Rising edge propagation along the DLL delay line and corresponding current consumption.
In order to obtain a high CMRR [18], a differential amplifier must have a linear
resistive load in each branch. Furthermore, the impedance of the tail current mirror must
be high. In the delay cell shown in Figure 8, a variable linear load is obtained using the
symmetric load structure. If correctly biased, this structure guarantees a first order
linearity of a high impedance load around the half-swing output voltage. Automatic bias is
derived from the control voltage using the self-biasing circuitry (also shown).
Delay control is obtained by variation of the load impedance. Simultaneous
variation of the tail current ensures that the symmetrical load remains linear throughout
the range of operation.
Figure 8: The self-biased differential delay cell (from [18]).
Single-ended architectures traditionally rely on current starvation of two series CMOS inverters (Figure 9). Current starvation is usually applied to both branches (NMOS and PMOS) of the inverters to guarantee perfectly symmetric operation. However, this is not a strict requirement, since the two inverters in series already guarantee the correct operation of the delay cell. The cell delay is defined by the amount of current available to charge the load at the output of each inverter. The matching of the current-starving transistors is therefore critical for the matching of the cell delay, and these transistors must have large gate areas.
The matching of the switching transistors is not critical, since they are sized so that they do not limit the current available to charge the output load.
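As a rough illustration of why the current-starving devices dominate delay matching, the sketch below uses a first-order model that is an assumption, not the author's: each inverter output is charged by a constant starving current I through half the supply swing, so a two-inverter cell has a delay of about 2·C_L·(VDD/2)/I. The load capacitance, supply voltage and current are hypothetical values chosen only to land near the 390ps cell delay used later in this section.

```python
"""First-order delay estimate for a current-starved inverter cell (sketch).

Assumption: each inverter output is charged by a constant starving current
I through half the supply swing, so t_inv = C_L * (VDD / 2) / I.
"""


def cell_delay(i_starve, c_load, vdd, n_inverters=2):
    """Delay (s) of a cell made of n series current-starved inverters."""
    t_inv = c_load * (vdd / 2.0) / i_starve
    return n_inverters * t_inv


def delay_error_from_current_mismatch(i_starve, c_load, vdd, rel_current_error):
    """Relative delay error caused by a relative error in the starving current."""
    nominal = cell_delay(i_starve, c_load, vdd)
    perturbed = cell_delay(i_starve * (1.0 + rel_current_error), c_load, vdd)
    return (perturbed - nominal) / nominal


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    C_LOAD = 50e-15  # load per inverter output (F)
    VDD = 2.5        # supply voltage (V)
    I = 0.32e-3      # starving current (A)
    print(f"cell delay ~ {cell_delay(I, C_LOAD, VDD) * 1e12:.0f} ps")
    err = delay_error_from_current_mismatch(I, C_LOAD, VDD, 0.01)
    print(f"+1% current error -> {err:+.3%} delay error")
```

Because the delay is inversely proportional to the current in this model, a 1% current error gives roughly a 1% delay error, which is why the current-starving transistors, rather than the switching transistors, need large gate areas.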
[Figure: N current-starved inverter cells between in and out, controlled by Vctrl.]
Figure 9: The current-starved inverter delay cell (simplified version).
The delay cells are isolated from the hit registers by a tap buffer. For cells based on current-starved inverters, it is also recommended to add a dummy buffer at the output of the first inverter, so that the propagation delays of the rising and the falling edges are symmetric.
Chapter 7: Detailed Implementation.
These two delay cell structures were analysed in detail to verify their power dissipation and noise immunity. Simulations were used extensively to capture the delay variations due to noise accurately. Only power supply noise was considered in this study, since it was found to be the dominant effect; other noise sources, such as thermal noise, are completely masked by it.
The procedure followed in this study was to simulate the two VCDL’s (one for each structure) with a square signal of a given amplitude superimposed on the power supply voltage. The phase of the square noise signal was swept relative to the phase of the signal propagating in the delay line. In this way it is possible to identify the time window in which the delay cell is sensitive to supply noise, as well as the maximum delay shift.
The same procedure was also used to analyse the delay sensitivity to noise on the control node. Noise can couple into this node via two different paths: the substrate and capacitive coupling with the switching nodes. In a locked DLL there are always two opposite edges of the signal propagating inside the delay line, so their opposite effects should keep the control node balanced. However, since the sensitivity of this node is high, it is important to minimise any coupling into it.
The resulting supply noise delay sensitivity graphs are shown in Figure 10, where all delay cells are tuned for a 390ps delay. A window of increased sensitivity, corresponding to the cell switching instant (time=0ns), can be identified. The sensitivity of each structure within this window is summarised in Table 1, together with the average power dissipation obtained when the cells are biased for the required delay. The single-ended structure also shows (time<0ns) a noticeable delay variation due to slow (or DC) supply changes. However, the amplitude of these slow variations depends on the applied control voltage, and they are effectively countered by the closed control loop.
[Figure: two panels, "Differential delay cell" and "Current-starved delay cell", showing the cell delay variation versus time (ns), from -1.5 to 0.5 ns.]
Figure 10: Cell delay variation due to a 100mV supply voltage step, respectively for the differential and the current-starved inverter structure.
The differential structure needs 5.6 times more current than the single-ended CMOS inverter structure for the same propagation delay.
Step noise sensitivity | Amplitude | Symmetric load differential | Current-starved inverter
Supply | 100mV | 3ps | 5ps
Control | 20mV | 11ps | 15ps
Power dissip. (average/cell) | | 4.2mW | 0.74mW
Table 1: Summary of noise sensitivity and power dissipation analysis.
Offset and gain selection.
Apart from gate area, the matching of a device also depends on its operating point. As the gate voltage approaches the threshold voltage (Vth) and the device moves towards weak inversion, its matching is severely degraded [6],[19]. Therefore, the devices that make up the delay cell should be biased well away from Vth under all conditions.
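The trend described above can be illustrated with the widely used Pelgrom model, σ(∆Vth) = A_VT/√(W·L), combined with the strong-inversion square law, where gm/I = 2/(VGS - Vth), so that the relative current mismatch of a current source is about 2·σ(∆Vth)/(VGS - Vth). This is a textbook approximation, not the analysis of [6],[19]; the A_VT value and the device size are hypothetical, and the square law stops being valid close to weak inversion, so the sketch only indicates the direction of the effect.

```python
import math


def sigma_vth(a_vt, w, l):
    """Pelgrom threshold-voltage mismatch sigma (V); a_vt in V*m, w and l in m."""
    return a_vt / math.sqrt(w * l)


def sigma_current_rel(a_vt, w, l, v_ov):
    """Relative drain-current mismatch of a saturated square-law MOSFET.

    In strong inversion gm / I = 2 / v_ov, so sigma(dI/I) = 2 * sigma(Vth) / v_ov.
    Only meaningful for overdrive voltages well above zero.
    """
    return 2.0 * sigma_vth(a_vt, w, l) / v_ov


if __name__ == "__main__":
    A_VT = 6e-9         # hypothetical 6 mV*um, expressed in V*m
    W, L = 10e-6, 1e-6  # hypothetical 10 um x 1 um current-starving device
    for v_ov in (0.5, 0.3, 0.2, 0.1):
        s = sigma_current_rel(A_VT, W, L, v_ov)
        print(f"Vov = {v_ov:.1f} V -> sigma(dI/I) = {s:.2%}")
```

Both effects named in the text appear: a larger gate area reduces σ(∆Vth), and a smaller overdrive amplifies its effect on the current, and hence on the cell delay.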
However, depending on the process parameters and on the specific conditions under which the cell is used, the closed control loop may force the current-starving devices to operate under poor matching conditions. The current-starved inverter structure was therefore modified to force the cell to operate under optimal matching conditions in all circumstances.
[Figure: delay versus Vctrl for the original and the partitioned cell, with operating points Vo and Vp.]
Figure 11: Simplified representation of the delay range partition.
The principle of operation of this cell is to divide the delay range into smaller, partially overlapping ranges, as shown in Figure 11. These ranges are wide enough to let the DLL track the delay variations caused by changes in environmental conditions during operation. The operating range is selected at start-up, as a function of device matching, delay tracking coverage and the particular operating conditions found. To allow the range selection algorithm to be automated, the ranges are partitioned so that, under any conditions, lock can be achieved in at least three of them. By selecting the appropriate delay range, the cell can be made to operate at a point Vp further away from the threshold voltage of the current-starving transistors than would be the case in the original cell (point Vo).
Another advantage of partitioning the operating range is the reduced cell gain (the slope of the cell transfer curve, in s/V). The forward gain of the control loop is therefore smaller, and a finer adjustment of the delay is possible. In the “bang-bang” configuration used, this translates into a smaller amplitude of the periodic delay oscillation. Alternatively, the filter capacitor can be made smaller without degrading the closed-loop performance. The sensitivity to noise on the control node is also reduced.
The proposed cell topology is shown in Figure 12. The operating range is selected with the offset signal. The offset control is implemented in both the NMOS and PMOS branches of the inverter, and it produces a fixed delay offset in the transfer curve.
To improve the flexibility of the cell, the gain of the current-starving transistor connected to the loop control node can be changed using the slope signal. It is therefore possible to increase the tracking coverage (range length) of each range if a specific application requires it. The slope control is implemented only in the NMOS branch of the inverter.
The offset selection signal is obtained by digital-to-analogue conversion of a digital control signal, and the slope selection is performed digitally; both therefore correspond to discrete settings. Figure 13 shows the actual delay ranges. The cell gain differs depending on the offset and slope selection. A method for automatic range selection is described in Section 7.1.6.
[Figure: N cells of two inverters between in and out; current-starving branches controlled by Vctrl, by offsetP and offsetN (derived from the offset signal) and by slope<0:1>.]
Figure 12: The selectable-range current-starved inverter cell.
The simulation results in Figure 13 show that the maximum cell gain within a range varies from 50ps/V to 713ps/V, depending on the offset and slope selection.
[Figure: cell delay (200 to 900 ps) versus control voltage (0 to 5 V) for each selectable range.]
Figure 13: The selectable delay ranges (simulation).
The same noise sensitivity analysis was also performed for this cell. The results for the sensitivity window (time=0ns) are given in Table 2. Compared with the differential cell structure, substantial power savings (3.3 times) can be obtained with this cell, provided that a higher supply noise sensitivity (8ps instead of 3ps for a 100mV step) is accepted. Compared with the simple current-starved inverter, better matching and closed-loop characteristics are obtained at the expense of increased power dissipation.
Step noise sensitivity | Amplitude | Range partition
Supply | 100mV | 8ps
Control | 20mV | 3ps
Power dissip. (average/cell) | | 1.29mW
Table 2: Summary of noise sensitivity and power dissipation analysis for the proposed cell.
In summary, the advantages of such a delay cell are:
• Lower power dissipation.
• Smaller device matching sensitivity.
• Variable cell gain.
• Increased immunity to noise in the control node.
7.1.4.
Delay chain.
The delay cell is part of a chain of cells whose overall delay equals the clock period. To achieve maximum delay matching between cells, all cells should have the same physical and electrical environment. This is especially important for the cells at the ends of the delay chain. Physically, they have no neighbouring cell on one side, and their matching is therefore worse [6],[7]. Electrically, the last cell does not have to drive the input load of a following cell, and the first cell is driven by a signal that does not have the same timing characteristics (namely slew rate) as the signals driving the other cells.
In order to equalise the environment of all cells, additional dummy delay cells are implemented at both ends of the delay line. Their purpose is to make the environment of all cells the same and thereby improve their delay matching; they have no other timing function.
7.1.5.
Closed control loop.
The implementation of the delay chain, the phase detector, the charge pump and the filter capacitor has been discussed; together they make up the closed control loop. The layout of the complete DLL should follow conservative layout rules, with special care given to the power supply distribution network and to the routing of the signals that carry phase information to the phase detector.
If the propagation delays of the two feedback signals going to the phase detector are not equal, the delay difference ∆tpd translates into a closed-loop static phase error, with the same consequences as a phase error generated in the phase detector itself. The origin of this delay error is depicted in Figure 14.
Since the delay chain of the Timing DLL’s is physically long (~2mm in this prototype), the propagation delay of the feedback signals is considerable and a large ∆tpd may arise. This situation was analysed in detail [20] to derive the topology that minimises the delay difference without imposing a heavy area penalty on the design. The propagation delay of the two transmission lines was made as small as possible by careful sizing, and the load at the output of each driver was equalised to keep their slew rates similar. In this way the delay error was kept under 20ps. The Phase Shifting DLL has a shorter delay chain, so its delay difference is even smaller.
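[Figure: delay line with dummy cells at both ends, phase detector & charge pump with filter capacitor C, reference clock clkref, and the feedback-path delay mismatch ∆tpd.]
Figure 14: Detail of the closed control loop illustrating the propagation delay mismatch of the phase signals.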
The variation of the delay of the chain within a clock period can be estimated from
the following equation, where Kcell is the gain of each delay cell:
$$\Delta t_{DLL}(T) = \Delta V_{ctrl} \cdot K_{cell} \cdot N = \frac{I_{cp} \cdot T}{C_{filter}} \cdot K_{cell} \cdot N$$
Assuming the minimum charge-pump current level is being used and the cell delay
range with minimum gain is selected, the delay variation of the Timing DLL is:
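$$\Delta t_{DLL}(12\,500\,\mathrm{ps}) = \frac{10\,\mu\mathrm{A} \cdot 12\,500\,\mathrm{ps}}{47.7\,\mathrm{pF}} \cdot 50\,\mathrm{ps/V} \cdot 35 \approx 4.6\,\mathrm{ps}$$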
The “bang-bang” oscillation amplitude is half this variation (~2.3ps). If, on the other hand, the delay range with the maximum gain (713ps/V) is selected, ∆tDLL(T) may become as large as 65.4ps, resulting in an oscillation amplitude of ~33ps.
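The two numerical cases above can be reproduced directly from the loop equation. All values are the ones quoted in the text (10µA, 12.5ns, 47.7pF, 35 cells, 50ps/V and 713ps/V), and the bang-bang amplitude is taken as half the per-cycle delay change, as stated; the function name is only illustrative.

```python
def dll_delay_step(i_cp, period, c_filter, k_cell, n_cells):
    """Change of the total line delay (s) after one period of charge-pump current.

    Implements dt_DLL(T) = (I_cp * T / C_filter) * K_cell * N.
    """
    delta_vctrl = i_cp * period / c_filter  # control voltage step (V)
    return delta_vctrl * k_cell * n_cells


if __name__ == "__main__":
    I_CP = 10e-6         # minimum charge-pump current (A)
    T = 12.5e-9          # reference period (s), 80 MHz
    C_FILTER = 47.7e-12  # filter capacitor (F)
    N = 35               # cells per Timing DLL
    for label, k_cell in (("min gain", 50e-12), ("max gain", 713e-12)):  # K_cell in s/V
        dt = dll_delay_step(I_CP, T, C_FILTER, k_cell, N)
        print(f"{label}: dt_DLL = {dt * 1e12:.1f} ps, "
              f"oscillation ~ {dt / 2 * 1e12:.1f} ps")
```

Since the per-cycle step scales linearly with K_cell, the 14-fold spread of cell gains translates directly into a 14-fold spread of the bang-bang oscillation amplitude.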
7.1.6.
Initialisation procedure.
The loop initialisation is the procedure by which the loop acquires initial lock to the reference period. If the natural delay of the loop is close enough to the reference period, lock is acquired without any external help. Since this cannot be guaranteed in all circumstances, a means of pulling the loop into its locking range must be implemented.
In the case of the loop architecture using the delay range partitioning, the best
operating range must also be selected.
Achieving lock.
The transfer function of the phase detector has a periodicity of T, which means that it cannot distinguish delays that differ by a multiple of the period T. The loop may therefore try to lock the DLL into a state where the VCDL delay is a multiple of T (for example 2T). An initialisation procedure must be used to force the closed loop to lock to the correct delay.
One way to resolve this ambiguity is to initialise the VCDL with a delay that is known to be smaller than the reference period T. From this known starting point, the correctness of the error information generated by the phase detector can be qualified. Regardless of the phase information generated by the phase detector, the loop is then constrained to increase the VCDL delay slowly until the phase detector is within its locking range (±T/2). Entry into this range is identified when the phase detector first produces the correct error information, that is, when it recognises that the VCDL delay is too short. At this point the loop is released to proceed with lock acquisition.
Since the forward open-loop gain is small, lock acquisition is a slow procedure. One way to speed up initialisation is to increase the charge-pump current levels before lock is achieved. The lock acquisition time can therefore be decreased without compromising the dynamic behaviour of the locked loop.
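A behavioural sketch of this initialisation, under simplifying assumptions of our own: delays are integers in picoseconds, the phase detector only reports whether the line is too short or too long, with the period-T wrap described above, and a fixed delay step per reference cycle stands in for the charge pump (a larger `ramp_step` loosely mimics the boosted current before lock). All function names are hypothetical. Started below T/2, an unqualified bang-bang loop would be told the line is too long; the forced ramp ignores the detector until it first reports "too short", which happens once the delay exceeds T/2, and only then releases the loop.

```python
def phase_error(delay, period):
    """Phase error seen by a detector whose transfer function repeats every period.

    Returns a value in [-period/2, period/2): negative means the delayed edge
    arrives before the next reference edge (line too short).
    """
    return (delay + period // 2) % period - period // 2


def too_short(delay, period):
    return phase_error(delay, period) < 0


def bang_bang(delay, period, step, cycles):
    """Unqualified bang-bang loop: trusts the phase detector from the start."""
    for _ in range(cycles):
        delay += step if too_short(delay, period) else -step
    return delay


def initialise_and_lock(delay, period, step, cycles, ramp_step=None):
    """Forced ramp from a delay known to be below one period, then release."""
    if not 0 < delay < period:
        raise ValueError("initial delay must be known to lie in (0, period)")
    ramp_step = ramp_step or step
    # Forced ramp: ignore the detector until it first reports 'too short'.
    while not too_short(delay, period):
        delay += ramp_step
    return bang_bang(delay, period, step, cycles)


if __name__ == "__main__":
    T = 12_500  # reference period in ps (80 MHz)
    print(initialise_and_lock(2_000, T, step=10, cycles=2_000))  # ends within one step of T
    print(bang_bang(22_000, T, step=10, cycles=2_000))           # false lock near 2T
```

The second call illustrates the ambiguity the procedure avoids: starting from an unqualified delay above 1.5T, the same loop settles around 2T.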
Range selection.
The range selection is an iterative procedure. In a first step, the tracking range width required by the application is selected using the slope signal. Typically the smallest width is selected, because it results in the minimum forward open-loop gain; however, other range widths can be selected if a wider tracking range is desired. The second step corresponds to the actual range selection. This step uses the offset signal and can be automated and included in the loop locking procedure.
The range selection is performed by sequentially scanning the ranges for lock, starting with the fastest range (smallest offset). After the ranges in which lock can be achieved have been identified, the middle one³ is selected, because it corresponds to an operating point in the middle of its range, leaving a wide delay tracking margin. This property is depicted in Figure 15, where the viable initialisation region within each range is drawn with a heavier line. Note that these viable regions correspond to the initial locking range; they are a small part of the full range (thinner line) that is available for tracking environmental variations once initialisation has been completed.
[Figure: delay versus Vctrl for the partitioned ranges.]
Figure 15: Schematic representation of the delay range partition illustrating the viable locking regions.
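The scan-and-select rule, including the extreme cases described in footnote 3, can be sketched as follows. `try_lock` is a hypothetical callback that runs the locking procedure for one range and reports success; ranges are indexed from the fastest (index 0) to the slowest, and picking the lower-middle range when an even number of ranges lock is our own assumption, since the text only speaks of "the middle range".

```python
from typing import Callable, List, Optional


def select_range(n_ranges: int, try_lock: Callable[[int], bool]) -> Optional[int]:
    """Scan ranges fastest-first and return the index to use (None if no lock).

    Index 0 is assumed to be the fastest range (smallest offset).
    """
    locking: List[int] = [r for r in range(n_ranges) if try_lock(r)]
    if not locking:
        return None
    if len(locking) == 1:
        return locking[0]
    if len(locking) == 2:
        # Extreme conditions: keep the range at the edge of the partition,
        # which leaves the widest delay tracking margin (footnote 3).
        return min(locking, key=lambda r: min(r, n_ranges - 1 - r))
    # Normal case: the middle locking range (lower middle if the count is
    # even, an assumption not stated in the text).
    return locking[(len(locking) - 1) // 2]


if __name__ == "__main__":
    print(select_range(5, lambda r: r in (1, 2, 3)))  # -> 2
    print(select_range(5, lambda r: r in (3, 4)))     # -> 4
```

In hardware the callback would correspond to setting the offset signal and running the initialisation procedure of the previous subsection for each candidate range.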
7.2.
The ADLL.
Fine time interpolation is obtained by accurately phase shifting each of the Timing DLL’s by a fraction of their cell delay; the Phase Shifting DLL is used for this purpose. The ADLL taps result from distributing the phase-shifted Timing DLL taps according to the arrangement in Figure 16, where each rectangle represents the size of a DLL bin. The shaded bins are copies of actual bins, introduced to make the time interpolation at the extremes of the Timing DLL’s clearer. Because of the clock periodicity, a copy and its original bin occupy exactly the same time interval.
An ADLL bin is defined by the difference between two taps in consecutive Timing DLL’s. This distribution of bins highlights some of the potential sources of non-linearity inherent to the architecture:
³ In extreme conditions, corresponding to the extreme ranges, lock may only be obtained in one or two ranges (see Figure 15). If only one locking range is identified, the range selection is obvious. If two locking ranges are identified, the extreme range should be chosen, because it results in the widest delay tracking margin.
• Some bins are defined by taps in opposite extremes of consecutive Timing DLL’s
(see, for example, bin 5 in Figure 16). Potential phase errors in any DLL will
accumulate in these bins, resulting in large non-linearity.
• There is a potential F (=4) bin periodicity in the linearity error due to the folding
of the tap distribution. Non-linearity of any DLL will increase this error (see, for
example, bin 23 in Figure 16).
• There is another potential F+1 (=5) periodicity in the linearity error, which corresponds to the spacing between two taps driven directly by the Phase Shifting DLL. The non-linearity of this DLL determines this error.
The non-linearity generated by these errors is limited by reducing every source of phase error and cell mismatch in the DLL building blocks. Coupling between DLL’s can also be a source of conversion errors. It can be reduced by electrically isolating the individual DLL’s, with careful supply distribution and guard-rings against capacitive and substrate noise coupling.
[Figure: PS-DLL taps ps0 to ps4 driving T-DLL 0 to T-DLL 3 (and a copy of T-DLL 0); the resulting ADLL taps 0 to 139 over one period T, with bins 5 and 23 marked; scales T/140 = ∆T, T/35 = 4·∆T and T/28 = 5·∆T.]
Figure 16: The ADLL tap distribution arrangement.
7.3.
Channel memory.
The channel memory consists of a two-word-deep pipeline. To reduce the hit rejection rate to acceptable levels, the pipeline is controlled by an asynchronous state machine. This state machine generates the latching signals (store) for the two pipeline levels and controls the interface with the subsequent logic blocks [3]. The functional diagram is shown in Figure 17.
[Figure: controller with inputs write enable, write, read and clear; store outputs for reg. level #1 (D flip-flop with rst) and reg. level #2, with ∆t delays; data available output.]
Figure 17: Functional diagram of the channel memory controller [3].
When the store signal is asserted, the data is stored in the level #1 register. If the level #2 register is free, the data is moved into it, where it becomes available to be passed on to the digital processing unit. A data available flag is asserted to signal the presence of data in the channel memory. If both register levels are full, further hits are lost until memory space becomes available again. The channel memory was designed to store the data of two consecutive hits separated by at least 6ns.
The hit register itself must capture the data present at the DLL taps at the instant the store signal is asserted. Mismatch between the hit registers spreads their acquisition times, which translates into an increased differential non-linearity of the converter. The effects of tap register mismatch cannot be distinguished from those of delay cell mismatch.
The reduction of acquisition time mismatch can be done in two different ways:
• Increase of device matching by increasing the gate area of critical devices.
• Increase of acquisition speed by increasing the transconductance of critical
devices so that delay variations from register to register are smaller.
Since the time critical data sampling is performed only on the level #1 register, only
the performance of this register is critical. The gate level diagram of a single bit of the hit
register is shown in Figure 18.
[Figure: tap data input, enable and store signals for reg. level #1 and reg. level #2, inverted output.]
Figure 18: The two-level hit register (1 bit).
The load on the ADLL tap output node must be kept low to reduce the power needed to drive it. It is therefore important to keep the register’s input inverter small. The adverse effect of its larger device mismatch is limited by keeping the propagation delay of this gate low. On the other hand, the back-to-back inverters that make up the memory can be made larger, so that both their matching and their driving capability are good.
To achieve good accuracy of the acquisition time, the level #1 register is transparent until a hit is acquired. Since the tap outputs switch at the reference clock frequency, the activity of the register must be limited by blocking the level #2 register until data has been acquired in the previous level. For the same reason, tri-statable gates are used instead of pass-gates. This approach leads to slower signal propagation, but it halves the number of switching devices while the circuit is idle, with corresponding supply noise reduction and power savings.
In this application the data signal changes asynchronously with respect to the store signal, so there is a finite probability that metastability will occur. However, this condition only affects the one register in which the transitions of the data and store signals occur “simultaneously”. Whichever logic level that register finally resolves to, the resulting measurement error is at most the width of the metastability window. Since this window is very small, the measurement error is also small.
To synchronise the clkro-synchronous read-out and processing control logic with the asynchronous tap register control state machine, and to prevent metastability from disturbing the correct operation of the circuit, two-stage synchronisers [21] were implemented in the signal paths interfacing the two domains (see Figure 19). Using two-stage synchronisers greatly reduces the probability that a metastable state reaches the synchroniser output. In addition, the latency they introduce between the moment data is available in the tap register and the moment it can be passed on to the processing logic is sufficient to resolve any metastability that may have occurred in the tap registers.
[Figure: signal passing through two D flip-flops clocked by clkro to produce signal_sync.]
Figure 19: Two-stage synchroniser using D flip-flops.
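The benefit of the second stage is usually quantified with the standard synchroniser failure model, MTBF = exp(t_r/τ)/(T_w·f_clk·f_data), where t_r is the time allowed for resolution, τ the regeneration time constant of the flip-flop and T_w its metastability window. This model and all parameter values below are assumptions for illustration (they are not taken from [21] or from this design, and the clkro frequency in particular is hypothetical); the sketch only shows that an extra stage adds roughly one clock period of resolution time, multiplying the MTBF by about exp(T_clk/τ).

```python
import math


def mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between synchroniser failures (s), standard exponential model."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    F_CLK = 40e6        # read-out clock frequency (Hz)
    T_CLK = 1.0 / F_CLK
    TAU = 0.5e-9        # regeneration time constant (s)
    T_W = 100e-12       # metastability window (s)
    F_DATA = 1e6        # rate of asynchronous events (Hz)
    T_OVERHEAD = 2e-9   # clock-to-output plus set-up time per stage (s)

    one_stage = mtbf(T_CLK - T_OVERHEAD, TAU, T_W, F_CLK, F_DATA)
    two_stage = mtbf(2 * (T_CLK - T_OVERHEAD), TAU, T_W, F_CLK, F_DATA)
    print(f"one stage: {one_stage:.3g} s, two stages: {two_stage:.3g} s")
    print(f"improvement factor: {two_stage / one_stage:.3g}")
```

Because the resolution time enters the exponent, even a modest extra delay per stage improves the MTBF by many orders of magnitude, which is the property the text relies on.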
When a measurement is performed, the state of the 140 taps that make up the ADLL must be captured accurately. The effect of this activity on measurement accuracy is limited: since it affects all measurements performed in a given channel in the same way, it only contributes an offset to the measurement.
Noise generated by activity in a neighbouring channel, on the other hand, is random with respect to the measurement and may disturb the other channels, generating crosstalk. To limit channel-to-channel crosstalk and obtain acceptable performance from these registers, the supply and control distribution must be carefully designed.
7.3.1.
The store sampling signal distribution.
The organisation of the individual tap registers naturally follows that of the ADLL. Four rows of 35 two-level tap registers therefore make up the channel memory. Four similar registers are appended to each of these rows to store the coarse counter results (half of each counter word width per row).
These register rows are long (>2mm), so the store signals arrive at the individual registers with time differences proportional to the propagation delay of the line that distributes them. Two distribution configurations are shown in Figure 20. The linear distribution configuration corresponds to the vernier time interpolation scheme described in Chapter 4. Each resulting bin size is the delay between two consecutive taps minus the difference between the arrival times of the store signal at the corresponding tap registers. This error accumulates in the bin defined by the taps at both extremes of the row (which correspond to the registers at opposite ends of the register row), and it is equivalent to a static phase error in the Timing DLL’s.
Alternatively, the T-shaped distribution configuration can be used. In this case the error distribution is somewhat more complex: depending on the branch of the T network, the bins become larger or smaller than the corresponding delay cell (see Chapter 6 for a detailed analysis).
[Figure: store control state machine driving an N-bit register row (registers 0 to N-1) either from one end (linear distribution) or from the middle, between registers N/2-1 and N/2 (T distribution).]
Figure 20: Alternative control signal distribution configurations within a channel memory row.
However, two advantages are obtained from this configuration. First, since each branch of the T is half as long as the complete row (and is loaded by half the number of cells), the propagation delay along a branch is smaller, resulting in smaller differences between the store signal arrival times at the registers.
The second advantage is that the error accumulates only along one branch of the T, corresponding to half the number of registers in the row, so the accumulated error is smaller than in the linear configuration.
Figure 21 compares the integrated error obtained with the two configurations. The actual register row configurations were simulated, including the lumped loads connected to the lines by the registers. The simulation also includes the registers needed to store the coarse time measurement, which explains the imbalance between the two branches of the T configuration.
[Figure: integrated error (ps, -35 to 10) versus register (0 to 35) for the Linear and T-shape configurations.]
Figure 21: Integrated error for the two proposed distribution configurations (simulation).
Using the T-shaped configuration, a six-fold reduction of the integrated error is obtained, as shown in Figure 21. The non-linearity of the ADLL due to the propagation delay of the store signal is improved correspondingly.
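The origin of the improvement can be explored with an idealised model of our own: each register row is treated as a uniform RC ladder with one register load per segment, the store arrival time at each register is approximated by its Elmore delay, the driver resistance is ignored, and the coarse-counter registers are left out. In this model the integrated error at register i is minus its arrival time (offset removed), so the peak-to-peak integrated error can be compared directly for a linear row of 35 registers and a T with 17 and 18 registers per branch. The branch split is an assumption, and the toy model is not expected to reproduce the six-fold figure obtained from the full simulation.

```python
def elmore_arrivals(n_registers, r_seg=1.0, c_seg=1.0):
    """Elmore delay at each tap of a uniform RC ladder driven from one end.

    Segment k (1-based) has resistance r_seg and every register adds c_seg;
    with r_seg = c_seg = 1 the delays are in units of R*C.
    """
    arrivals, delay = [], 0.0
    for k in range(1, n_registers + 1):
        downstream_c = c_seg * (n_registers - k + 1)  # load beyond segment k
        delay += r_seg * downstream_c
        arrivals.append(delay)
    return arrivals


def linear_row(n):
    """Driver at register 0: store arrival time for each register."""
    return elmore_arrivals(n)


def t_row(n_left, n_right):
    """Driver in the middle; left branch listed from its far end to the centre."""
    return elmore_arrivals(n_left)[::-1] + elmore_arrivals(n_right)


def integrated_error(arrivals):
    """Integrated (INL-like) error per register: minus the arrival, offset removed."""
    return [arrivals[0] - a for a in arrivals]


def peak_to_peak(values):
    return max(values) - min(values)


if __name__ == "__main__":
    lin = peak_to_peak(integrated_error(linear_row(35)))
    tee = peak_to_peak(integrated_error(t_row(17, 18)))
    print(f"linear: {lin:.0f} RC, T: {tee:.0f} RC, ratio {lin / tee:.1f}")
```

In this idealised model the T configuration gives roughly a four-fold reduction (595 against 154 in units of RC); the quadratic growth of the Elmore delay with line length is what makes halving the line more than halve the error.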
Chapter 8.
Experimental Results.
This Chapter summarises the performance of the demonstrator of the ADLL architecture described in this part of the dissertation. Only the relevant timing characteristics are discussed here; a detailed test report is included in the HRTDC users manual [4]. The test bench used to characterise the converter is explained in Appendix A.
8.1.
Delay cell range selection and charge-pump current level.
Selection of the delay cell working range is an important feature of the architecture, because it allows the cells to be adapted to the specific operating environment. The initialisation procedure was tried for every working range, using an 80MHz reference clock. The ranges for which lock was obtained are shown in Table 1¹.
working
range
DLL
offset
0
1
2
3
4
slope
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
Phase Shifting ok OK ok ok ok ok
Timing
ok ok ok ok ok ok ok OK ok ok ok ok
Table 1: Locking status for each working range, after the initialisation procedure.
Following the range selection algorithm explained in Chapter 7, the ranges highlighted in Table 1 (marked OK) are chosen. This selection was used in all the tests performed.
The smallest possible current level was selected for the charge-pump, since it results
in the smallest closed loop jitter. The cycle to cycle jitter measured at the output of the last
delay cell in the fourth Timing DLL is σjitter=15.6ps, for the selected range. It does not
vary substantially with the current level of the charge-pump (σjitter=19.4ps at maximum
settings), confirming that the charge-pump operation does not adversely affect the
performance of the converter.
¹ Each offset and slope selection pair corresponds to a working range. Offset selection is divided into five options, from 0 (maximum range offset) to 4 (minimum range offset). Slope selection is divided into three options, from 0 (minimum range slope) to 2 (maximum range slope).
8.2.
Converter linearity.
The measurement of the converter’s linearity required the collection of 840,000 random hits generated by an external pulse generator. With a 98% confidence level (1-α=0.98, therefore α=0.02), the results obtained with this Code Density Test (CDT) lie within a tolerance of 3% (DNL) and 17.7% (INL) of the actual values (respectively β=0.03 and β=0.17). If the individual DLL’s are evaluated using the same data, tolerances of 1.5% and 4.4% are obtained for the DNL and INL respectively, with the same confidence level (see Appendix D for details on how the tolerance and confidence level of the test results are determined).
In an architecture such as the one used in this converter, the conversion transfer function consists of successive replications of the fine time interpolation transfer curve along the dynamic range, and the coarse time counter is responsible for the correct repetition of the fine interpolation. Therefore, the linearity of the fine time interpolator, formed by the array of DLL’s (ADLL), contributes most to the overall linearity. The ADLL is characterised in great detail, whereas a simpler verification is performed for the extended dynamic range mechanism.
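The code density test itself reduces to a histogram computation. With uniformly distributed random hits, every bin should collect the same number of hits (about 6,000 per ADLL bin for 840,000 hits over 140 bins), and the DNL of a bin is its relative deviation from that mean. The sketch below uses the running sum of the DNL as the INL, which is one common convention. It does not reproduce the tolerance and confidence analysis of Appendix D, and the histogram values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dnl_inl_from_histogram(counts: Sequence[int]) -> Tuple[List[float], List[float]]:
    """DNL and INL (in LSB) from a code density test histogram.

    Assumes uniformly distributed random hits over the converter range, so
    each bin's expected count is the mean count. INL is taken here as the
    running sum of DNL (one common convention).
    """
    if not counts or sum(counts) == 0:
        raise ValueError("histogram is empty")
    mean = sum(counts) / len(counts)
    dnl = [c / mean - 1.0 for c in counts]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl


if __name__ == "__main__":
    hist = [1000, 1200, 800, 1000]  # hypothetical bin counts
    dnl, inl = dnl_inl_from_histogram(hist)
    print("DNL:", [round(x, 2) for x in dnl])
    print("INL:", [round(x, 2) for x in inl])
```

Applied per Timing DLL rather than to the whole ADLL, the same computation yields the individual DLL figures discussed below, in units of LSBDLL.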
[Figure: DNL (LSB) and INL (LSB), from -1 to 1, versus bin (0 to 140).]
Figure 1: DNL and INL graphs for the ADLL.
The graphs in Figure 1 show the differential and integral non-linearity of the ADLL. A DNLmax of 0.71LSB (σDNL=0.17LSB) and an INLmax of 0.67LSB (σINL=0.19LSB) are obtained. The main feature of these graphs is the significant non-linearity found in the first few bins of the array. These errors occur in the bins whose limits are defined by taps at opposite extremes of consecutive Timing DLL’s. They result from phase errors and from delay cell mismatch in the Timing and Phase Shifting DLL’s. These phase errors² can originate from any of the mechanisms described in Chapter 6.
² The phase error must be understood in its wider sense. It may be caused by an actual phase error in the phase detector, by different propagation delays of the phase detector’s input signals, or by a significant propagation delay in the distribution of the sampling signal to the hit registers.
The DNL and INL graphs can be compared with the curves in Figure 2, which were obtained from the analytical study carried out in Appendix F³. Using the analytical model and reasonable assumptions about the direction of the static errors that affect the converter, it is possible to estimate the main characteristics of the actual non-linearity graphs (the amplitude of each error is normalised). Note that delay cell mismatch was not taken into account in the analytical results shown here.
[Figure: analytical DNL (LSB) and INL (LSB), from -0.25 to 0.25, versus bin (0 to 140).]
Figure 2: Analytical DNL and INL curves (Din=1% and Dout=-1% of the cell delay, DPD=-0.1% and Dhit=0.1% of the reference period).
From the same data set, the characteristics of the four Timing DLL’s can be extracted; their graphs are shown in Figure 3. The relevant feature is the presence of a phase error⁴, apparent in the large first bin of each DLL. It turns out to be significant in one of the Timing DLL’s (DLL0). A summary of the characteristics of the individual Timing DLL’s is presented in Table 2. From the data in the table, the delay cell mismatch of these DLL’s is estimated to be ~4%, larger than expected.
[Figure: DNL (LSBDLL) and INL (LSBDLL), from -0.25 to 0.25, versus bin DLL (0 to 35) for DLL0 to DLL3.]
Figure 3: DNL and INL graphs for the different Timing DLL’s (LSBDLL=4·LSB).
³ Due to implementation details, tap 0 of each Timing DLL was placed at the end of the respective delay chain. This position is delayed by one reference clock cycle from the original position, so its timing is the same. However, in a non-ideal converter the non-linearity graphs corresponding to the two cases are different. The analytical results shown here were obtained taking this into account.
⁴ See footnote 2 on page 102.
This larger value may have two origins. The first is the actual device mismatch, a technological property seldom disclosed with adequate accuracy by vendors. The second is electrical noise coupling into the channel buffers or the delay cells. Note that device mismatch may also affect the channel registers, and that this effect cannot be distinguished from the delay cell mismatch.
Timing DLL   DNL max       σDNL          INL max       σINL          unit
0            0.21 (0.84)   0.06 (0.23)   0.18 (0.71)   0.06 (0.24)   LSB_DLL (LSB)
1            0.13 (0.52)   0.05 (0.18)   0.10 (0.41)   0.05 (0.19)   LSB_DLL (LSB)
2            0.12 (0.49)   0.04 (0.17)   0.11 (0.44)   0.04 (0.16)   LSB_DLL (LSB)
3            0.11 (0.46)   0.04 (0.18)   0.11 (0.43)   0.04 (0.15)   LSB_DLL (LSB)
PS scheme    0.06 (0.28)   0.04 (0.21)   0.04 (0.22)   0.03 (0.15)   LSB_DLL-PS (LSB)
Table 2: Summary of linearity obtained for each DLL in the array (LSB_DLL = 4·LSB and LSB_DLL-PS = 5·LSB).
The phase shifting DLL can also be characterised using the same data set. The
graphs in Figure 4 show the non-linearity of the first few cells of the Phase Shifting DLL.
[Figure 4 panels: DNL (LSB_DLL-PS) and INL (LSB_DLL-PS) versus bin DLL-PS, bins 1 to 4.]
Figure 4: DNL and INL graphs for the Phase Shifting DLL (LSB_DLL-PS = 5·LSB).
The non-linearity of the Phase Shifting DLL and the phase error accumulated in the first bin of each Timing DLL add up to a large ADLL non-linearity, particularly in ADLL bins 4, 9, 14, ..., 139 and their neighbours, as expected.
The auto-correlation function was applied to the DNL graph of the ADLL (Figure 5). It reveals peaks in the auto-correlation coefficient at multiples of λ=4, which corresponds to the interpolation factor F used. Secondary peaks at λ=5 and 10 can also be identified. They correspond to the phase shifting performed by the Phase Shifting DLL, which introduces a delay of F+1=5 LSB between consecutive Timing DLL’s.
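The auto-correlation analysis can be reproduced in a few lines. This sketch is only illustrative. It assumes that the plotted quantity is the normalised auto-correlation of the mean-removed DNL sequence; the thesis may normalise differently. The 140-bin DNL it analyses is synthetic, built from a pattern repeating every F=4 bins plus a weaker one repeating every F+1=5 bins. The function name `autocorrelation` is hypothetical.

```python
def autocorrelation(x, max_lag):
    """Normalised auto-correlation r(lag) of a sequence for lag = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    energy = sum(v * v for v in d)
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / energy
            for lag in range(max_lag + 1)]


# Synthetic (hypothetical) DNL of a 140-bin ADLL, in LSB: a zero-mean pattern
# repeating every F = 4 bins plus a weaker zero-mean one repeating every 5 bins.
F = 4
dnl = [(0.15 if k % F == 0 else -0.05) + (0.04 if k % (F + 1) == F else -0.01)
       for k in range(140)]

r = autocorrelation(dnl, 36)
for lag in range(13):
    print(lag, round(r[lag], 2))
```

With this synthetic input, the strongest peaks fall at multiples of λ=4. The period-5 component only raises the curve slightly at its own multiples. This is qualitatively the kind of structure described for Figure 5.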
[Figure 5 plot: auto-correlation coefficient versus λ, 0 to 36.]
Figure 5: The ADLL auto-correlation graph.
The dynamic range is extended beyond the reference clock period by successive translations of the ADLL transfer curve, and it is important to verify that this operation behaves correctly. The graphs of Figure 6 analyse only four reference clock periods, a range judged sufficient for this test. A detailed characterisation of the full dynamic range would be impractical because of the large number of hits that would have to be collected. A specific test that verifies the dynamic range extension across the full range is described later in this chapter. The periodicity of the non-linearity graphs is evident over the extended range.
[Figure 6 panels: DNL (LSB) and INL (LSB) versus bin, bins 0 to 560.]
Figure 6: DNL and INL graphs for the converter along four reference clock periods.
For this test 1,680,000 hits were collected. The results therefore have a tolerance of 4.2% for the DNL curve and 50% for the INL curve (β=0.04 and β=0.5), with a confidence level of 98% (α=0.02). Collecting more hits is impractical because of the long time it would require, so the tolerance of the INL measurement remains wide. However, the values obtained for DNLmax and INLmax, respectively 0.73LSB (σDNL=0.18LSB) and 0.78LSB (σINL=0.21LSB), are similar to those obtained for the array itself. The differences are well within the tolerances accepted for such tests.
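The quoted tolerances can be reproduced with a simple binomial model of the code density test. This is our assumption, not necessarily the derivation used in the thesis. In this sketch, each of the K = 4·140 = 560 bins receives a hit with probability 1/K. The DNL tolerance is the two-sided normal quantile times the relative standard deviation of one bin count. The INL tolerance is dominated by the cumulative count at mid-range, where its variance is largest. `cdt_tolerances` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def cdt_tolerances(n_hits, n_bins, alpha):
    """Half-width (in LSB) of the (1 - alpha) confidence interval for the DNL
    of one bin and for the mid-range INL, assuming every hit lands in any of
    the n_bins bins with equal probability 1 / n_bins (binomial model)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)          # two-sided normal quantile
    dnl_tol = z * sqrt((n_bins - 1) / n_hits)        # relative std of one bin count
    inl_tol = z * n_bins / (2 * sqrt(n_hits))        # cumulative count at half range
    return dnl_tol, inl_tol


dnl_tol, inl_tol = cdt_tolerances(n_hits=1_680_000, n_bins=4 * 140, alpha=0.02)
print(f"DNL tolerance: {dnl_tol:.3f} LSB, INL tolerance: {inl_tol:.2f} LSB")
```

For this test it gives about 0.042 LSB and 0.50 LSB, in line with the 4.2% and 50% quoted above. The INL term scales as 1/√N, so halving the INL tolerance would require four times as many hits.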
8.3. Linear time sweeps.
Statistical tests such as the CDT, by their nature, average out random effects like phase noise and electrical noise. Phase noise (or jitter) can be present in the received reference clock or in the hit signal path, or may be due to the closed-loop behaviour of the DLL’s. Electrical noise may couple into the DLL’s or into the hit sampling registers through the power supply or the substrate. To evaluate the effect of such random noise on the conversion error, a linear delay sweep is performed using the test bench described in Appendix A. The following graphs result from a linear delay sweep in which 42,000 samples were collected. This corresponds to the accumulated effect of 5 samples collected for each 3ps delay interval (5 ‘trombone’ delay steps of ~0.6ps).
Figure 7 shows the error graph resulting from a linear delay sweep spanning two reference clock cycles. The RMS resolution of the converter is determined from the standard deviation of the error histogram: σ=0.39LSB (34.5ps). The maximum observed error is 1.62LSB (144.9ps).
[Figure 7 panels: error (LSB) versus delay step, 0 to 42,000; histogram of counts versus error (LSB), -2 to 2.]
Figure 7: Error graph and histogram resulting from a delay sweep of two reference periods (σ=0.39LSB).
From the linear delay sweep results, the linearity of the conversion can also be
characterised. As shown in Figure 8, the DNLmax and INLmax measured using this method
are, respectively, 0.73LSB (σ=0.18LSB) and 0.61LSB (σ=0.22LSB).
[Figure 8 panels: DNL (LSB) and INL (LSB) versus bin, bins 0 to 140.]
Figure 8: DNL and INL graphs obtained from the linear delay sweep results.
This alternative method confirms the results obtained with the statistical CDT. The differences found are within the expected tolerance limits for this test. They can be explained by the sensitivity of the linear time sweep to the accumulation of errors generated during the delay generator alignment step (see Appendix A).
The conversion error of a single Timing DLL (the first one) was also evaluated from the same data set. The error histogram is shown in Figure 9. The measured RMS resolution of this DLL is σ=0.30LSB_DLL (105.5ps), very close to the quantisation limit of 1/√12 ≈ 0.29LSB_DLL. The maximum observed error was 0.67LSB_DLL (239.3ps).
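The quantisation limit is the RMS error of an ideal uniform quantiser, 1/√12 of one LSB. The sketch below reproduces it with a simulated linear delay sweep. Its assumptions are an ideal converter with truncating quantisation, the bin centre as the reconstructed value so that the error has zero mean, and the same ~0.6ps step and 42,000 samples as above. `ideal_sweep_error` is a hypothetical helper.

```python
from math import floor, sqrt
from statistics import mean, pstdev


def ideal_sweep_error(lsb, step, n_steps):
    """Conversion error (in LSB) of an ideal quantiser over a linear delay sweep,
    reconstructing each code at the centre of its bin."""
    errors = []
    for k in range(n_steps):
        t = k * step
        code = floor(t / lsb)
        reconstructed = (code + 0.5) * lsb
        errors.append((reconstructed - t) / lsb)
    return errors


# LSB_DLL = 4 x 89.3 ps; 42,000 steps of 0.6 ps cover about two reference periods.
err = ideal_sweep_error(lsb=357.2, step=0.6, n_steps=42_000)
print(round(mean(err), 3), round(pstdev(err), 3), round(1 / sqrt(12), 3))
```

A measured σ only slightly above this value, as for the first Timing DLL, indicates that the DLL's own non-linearity and noise contribute little at that resolution.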
[Figure 9 plot: counts versus error (LSB_DLL), -1 to 1.]
Figure 9: Conversion error histogram for the first Timing DLL (σ=0.30LSB_DLL).
[Figure 10 plot: output bin, 0 to 40,000, versus delay, 0 to 3,500 ns.]
Figure 10: Delay sweep over the full dynamic range.
Figure 10 confirms that the dynamic range extension works correctly up to 3.2µs. It shows a coarse delay sweep over the whole conversion dynamic range, in this case with a delay step of 1ns. The limit of the dynamic range is clearly identified by the step in the transfer function after bin number 35,839.
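These figures can be cross-checked. The check assumes 140 ADLL bins per 12.5ns reference period, as in the bin axes of Figures 2 and 6. It also assumes that the range extends over 256 reference periods, which is inferred from 35,840/140 rather than stated in this section.

```python
T_REF_PS = 12_500        # 80 MHz reference clock period, in ps
BINS_PER_PERIOD = 140    # ADLL bins per reference period (from the figure axes)
N_PERIODS = 256          # assumption: reference periods covered by the coarse count

lsb_ps = T_REF_PS / BINS_PER_PERIOD
n_bins = BINS_PER_PERIOD * N_PERIODS
range_us = N_PERIODS * T_REF_PS / 1e6

print(f"LSB = {lsb_ps:.1f} ps, last bin = {n_bins - 1}, range = {range_us:.1f} us")
```

Under these assumptions the last valid bin is 35,839 and the range is 3.2µs, consistent with the step seen in Figure 10 and with the 89.3ps LSB of Table 3.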
8.4. Inter-channel crosstalk.
Crosstalk between channels is an important characteristic of a multi-channel converter. The (almost) simultaneous acquisition of hits in several channels should not affect the performance of the individual channels. The channel performance in the presence of activity in other channels was evaluated following the procedure described in Appendix A. All channels in the IC except the one being evaluated were excited simultaneously. The time difference between the hit arrival at the channel under evaluation and at these channels was varied over the whole reference clock cycle, yielding a pessimistic, worst-case crosstalk sensitivity value. The measurements showed that, even under the most unfavourable conditions, the crosstalk is smaller than ±2LSB. This situation is shown in Figure 11. Notice that the measurement error exceeds ±1LSB only when the skew between the reference channel and the three crosstalk channels is within a time window of ~0.5·T (6.25ns).
[Figure 11 plot: measurement error (LSB), -5 to 5, versus time skew (T), 0 to 3.]
Figure 11: Measurement error due to crosstalk in the worst configuration.
8.5. Double hit resolution.
A double hit resolution test was performed according to the procedure in Appendix A. Its purpose was to verify the correct functionality of the asynchronous channel buffers and their ability to capture hits arriving in quick succession. Bursts of two pulses (equal to the depth of the channel buffer) with separations down to 8.5ns (limited by the instrument used) were correctly acquired, as intended.
8.6. Power dissipation.
The power dissipation of the fully operational circuit was measured to be 800mW. It
includes the activity of the encoding, buffering and read-out logic integrated in the same
IC. The demonstrator was built using a technology that requires a 5V supply voltage.
8.7. Summary of results.
A summary of the relevant timing features observed in the prototype’s test is shown
in Table 3. A full description of the converter characteristics may be found in [4].
LSB                         89.3 ps
DNL     max                 0.71 LSB / 63.4 ps
        σ                   0.17 LSB / 15.2 ps
INL     max                 0.67 LSB / 59.8 ps
        σ                   0.19 LSB / 17.0 ps
RMS resolution (σ)          0.38 LSB / 34.5 ps
dynamic range               3.2 µs
crosstalk                   < 2 LSB
double hit resolution       < 8.5 ns
reference clock             80 MHz
number of channels          4
power dissipation           0.8 W
technology                  0.7 µm CMOS
area    timing circuitry    6.1 mm²
        IC                  23 mm²
package                     68 pin PLCC
Table 3: Characteristics of the TDC prototype.
8.8. Conclusion.
This implementation of the ADLL scheme demonstrates that a high-resolution time measurement system can be obtained using inexpensive commercial CMOS technologies. The timing characteristics measured on the Time-to-Digital Converter agree well with the predictions made during the analysis and development of the circuit.
Four TDC channels were integrated in the IC, together with the necessary encoding
and buffering logic. Therefore, sufficient functionality is included to allow it to be used in
real high-resolution time measurement systems. A batch of 1,000 TDC circuits was
produced in order to be used in the preliminary system tests necessary for the
development of the ALICE TOF detector [1] and also in the front-end of the PesTOF
detector [22][23] used in the NA49 experiment running at CERN.
The main drawback of timing interpolator architectures based on the ADLL principle is the large power needed to drive a significant number of active DLL delay elements. Since the time interpolator is shared by all the channels in the IC, the power dissipation per channel would be reduced if more channels were integrated in the same circuit. However, the overall IC power dissipation would then increase, which could rule out the use of standard plastic packages.
References for Part II.
[1] ALICE collaboration, A large ion collider experiment – technical proposal, CERN/LHCC 95-71, Dec. 95.
[2] Arai, Y. et al., A CMOS four-channel x 1K time memory LSI with 1ns/b resolution, IEEE Journal of Solid-State Circuits, Vol. 27, No. 3, pp. 359-364, Mar. 92.
[3] Christiansen, J., An integrated high-resolution CMOS timing generator based on an Array of Delay Locked Loops, IEEE Journal of Solid-State Circuits, Vol. 31, No. 7, pp. 952-957, Jul. 96.
[4] Mota, M., A high-resolution Time-to-Digital Converter – users manual, CERN/EP-MIC.
[5] Lakshmikumar, K. et al., Characterisation and modeling of mismatch in MOS transistors for precision analogue design, IEEE Journal of Solid-State Circuits, Vol. 21, No. 6, pp. 1057-1066, Dec. 86.
[6] Pelgrom, M. et al., Matching properties of MOS transistors, IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, pp. 1433-1440, Oct. 89.
[7] Nekili, M. et al., Spatial characterisation of process variations via MOS transistor time constants in VLSI and WSI, IEEE Journal of Solid-State Circuits, Vol. 34, No. 1, pp. 80-84, Jan. 99.
[8] Kaenel, V. et al., A 320MHz, 1.5mW @ 1.35V CMOS PLL for microprocessor clock generation, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1715-1722, Nov. 96.
[9] Maneatis, J., Low-jitter process-independent DLL and PLL based on self-biased techniques, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1723-1732, Nov. 96.
[10] Johnson, M. et al., A variable delay line for CPU co-processor synchronisation, IEEE Journal of Solid-State Circuits, Vol. 23, No. 5, pp. 1218-1223, Oct. 88.
[11] Kim, L. et al., Metastability of CMOS latch/flip-flop, IEEE Journal of Solid-State Circuits, Vol. 25, No. 4, pp. 942-951, Aug. 90.
[12] Vittoz, E., The design of high performance analogue circuits on digital CMOS chips, IEEE Journal of Solid-State Circuits, Vol. 20, No. 3, pp. 657-665, Jun. 85.
[13] Bastos, J. et al., Matching of MOS transistors with different layout styles, Proceedings of the IEEE International Conference on Microelectronic Test Structures, pp. 17-18, Mar. 96.
[14] Gardner, F., Charge-pump Phase-Lock Loops, IEEE Transactions on Communications, Vol. 28, No. 11, pp. 1846-1858, Nov. 80.
[15] Gardner, F., Phase accuracy of charge-pump PLL’s, IEEE Transactions on Communications, Vol. 30, No. 10, pp. 2362-2363, Oct. 82.
[16] Paemel, M., Analysis of a charge-pump PLL: a new model, IEEE Transactions on Communications, Vol. 42, No. 7, pp. 2490-2498, Jul. 94.
[17] Behr, A. T. et al., Harmonic distortion caused by capacitors implemented with MOSFET gates, IEEE Journal of Solid-State Circuits, Vol. 27, No. 10, pp. 1470-1475, Oct. 92.
[18] Maneatis, J. et al., Precise delay generation using coupled oscillators, IEEE Journal of Solid-State Circuits, Vol. 28, No. 12, pp. 1273-1282, Dec. 93.
[19] Forti, F. et al., Measurements of MOS current mismatch in the weak inversion region, IEEE Journal of Solid-State Circuits, Vol. 29, No. 2, pp. 138-142, Feb. 94.
[20] Mota, M. et al., A high-resolution Time-to-Digital Converter based on an Array of Delay Locked Loops, Proceedings of the 3rd Workshop on Electronics for LHC Experiments, pp. 338-342, Oct. 97.
[21] Horstmann, J. U. et al., Metastability behaviour of CMOS ASIC flip-flops in theory and test, IEEE Journal of Solid-State Circuits, Vol. 24, No. 1, pp. 146-157, Feb. 89.
[22] Pestov, Y., Timing below 100ps with spark counters: work principle and applications, invited talk at the 36th International Winter Meeting on Nuclear Physics, Bormio, 98.
[23] Almasi, L. et al., New TDC electronics for a PesTOF tower – in NA49, ALICE/2000-02 internal note/TOF, Mar. 00.
PART III.
A TDC ARCHITECTURE BASED ON A DLL AND A PASSIVE RC DELAY LINE.
Future High-Energy Physics experiments will require complex electronic systems to handle their millions of data channels. A significant part of these systems will be housed within the detectors’ structure.
Given the large number of electronic circuits close to the detector, overall power dissipation is an issue. An increase in detector temperature due to power dissipation is usually unacceptable. In addition, the weight and area occupied by the power network put a heavy burden on the detector infrastructure. It is therefore essential to reduce the power dissipation of the individual circuits to a minimum.
In the Array of Delay Locked Loops (ADLL) architecture previously discussed,
resolution improvements can be obtained if faster delay cells are used, or if the
interpolation factor is increased using extra timing DLLs. Both methods result in higher
power dissipation.
In this part of the dissertation, an alternative time interval measurement architecture is introduced. It uses a different time interpolation principle, which results in higher time resolution and lower power dissipation. The architecture offers the same integration potential as the ADLL and is able to perform automatic self-calibration, thus addressing all the requirements set forward by the ALICE TOF collaboration.
Chapter 9 introduces the proposed architecture and explains the method used to obtain increased time interpolation. Two time interpolation schemes are presented, together with the means necessary for their correct operation. Chapters 10 and 11 take a detailed look at the performance of these two schemes and at their calibration requirements, respectively. Finally, Chapter 12 reports the results of the tests performed on a prototype IC that implements both schemes.
Chapter 9.
Architecture Overview.
The advantageous characteristics of DLL’s have already been described in this dissertation, and their use in time interval measurements has been shown. An alternative architecture that takes advantage of these characteristics to build a high-resolution time interpolator is now introduced.
The basis of the time interpolator is a single DLL. Finer time interpolation can be achieved in one of two ways. One is to further divide the clock period using extra phase-shifted timing DLL’s, as was done in the ADLL. The other is to sample the status of the DLL several times with a small time interval between samples. In the latter case, the hit arrival time can be derived with a resolution equal to the sample interval, after determining in which sample the reference clock edge arrives at the output of a given cell. To obtain full time coverage over the clock period, the samples must be taken at uniform intervals over the full delay of a single DLL delay cell. This interpolation method is illustrated in Figure 1.
[Figure 1 diagram: DLL taps n-1 to n+2 controlled by Vcontrol; the cell delay Tcell is divided into Tcell/5 intervals by samples s0 (= thit) to s4 along the time axis t.]
Figure 1: Detail of DLL signal propagation illustrating time interpolation through multiple delay line
samples (in this example the number of samples acquired is M= 5).
If a single sample of the DLL status (cell delay Tcell) is acquired at the hit arrival time (s0), a 1-to-0 transition is found between the data corresponding to tap(n) and tap(n+1) of the status word. In this case, the hit time referenced to the clock is¹:
$$t_{hit} = T_{cell}\cdot n.$$
The resolution of the measurement is therefore the intrinsic resolution of the DLL, Tcell.
However, suppose several (M) uniformly spaced samples are acquired across the cell delay. The number of samples (m) that elapse before the reference edge appears at the output of the next cell, tap(n+1), then improves the time measurement accuracy. Until that sample, no new transition is found between tap(n) and tap(n+1):
$$t_{hit} = T_{cell}\cdot\left(n + \frac{M-m}{M}\right),\qquad 1 \le m \le M.$$
The resolution of this measurement is Tcell/M, where the interpolation factor M is the number of cell delay sub-divisions created by multiple sampling. For an N-tapped DLL, the overall resolution relative to the reference clock period Tclk is:
$$T_{bin} = \frac{T_{clk}}{N\cdot M}.$$
In this scheme, parameters N and M are independent. There is therefore no numerical restriction on the ratio by which the reference clock can be divided. In particular, the reference period can be divided into a power-of-two number of bins ($N\cdot M = 2^{n}$, with n an integer). This division was not possible in the ADLL scheme.
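The decoding rule above can be sketched in code. The model is idealised, and several things are assumptions of this sketch rather than details of the circuit. The DLL carries a single reference edge. Each status word lists taps 0 to N, and tap i reads 1 once the edge has reached it. The M samples are taken at t_hit + k·Tcell/M. All function names are hypothetical, and the numbers are those of the prototype described later in this chapter.

```python
def dll_status(t, n_cells, t_cell):
    """Idealised DLL status word at time t: tap i (0..n_cells) reads 1 once
    the reference edge has reached it; wrap-around to the next period is ignored."""
    return [1 if t >= i * t_cell else 0 for i in range(n_cells + 1)]


def sample_hit(t_hit, n_cells, t_cell, m_samples):
    """M status words taken at t_hit + k * Tcell / M, k = 0..M-1."""
    return [dll_status(t_hit + k * t_cell / m_samples, n_cells, t_cell)
            for k in range(m_samples)]


def interpolate_hit_time(samples, t_cell):
    """Decode t_hit = Tcell * (n + (M - m) / M) from the M status words."""
    M = len(samples)
    n = sum(samples[0]) - 1      # s0: 1 -> 0 transition between tap n and tap n+1
    m = M                        # default: the edge never reaches tap n+1
    for k, word in enumerate(samples):
        if word[n + 1] == 1:     # edge has reached tap n+1 by sample k
            m = k
            break
    return t_cell * (n + (M - m) / M)


T_CELL, N, M = 390.625, 16, 8    # ps, cells, samples per cell
print(interpolate_hit_time(sample_hit(1000.0, N, T_CELL, M), T_CELL))  # 976.5625
```

With a hit at 1000ps, s0 places the edge in bin n=2. The edge reaches tap 3 at sample m=4, so the decoded time is 390.625·(2 + 4/8) = 976.5625ps. This is the lower edge of the 48.8ps bin containing the hit.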
9.1. Time interpolation circuit.
A time interpolation circuit based on this principle is shown in Figure 2. It includes an N-tapped DLL and M rows of hit registers that store the M samples of the DLL status acquired for each measurement. The multiple sampling signals are defined at fixed time intervals from the moment the hit signal arrives. It is therefore natural to generate them from the taps of an open-ended delay line through which the hit signal propagates. However, guaranteeing short delays with high precision is not easy. Active devices, even if they were fast enough, have timing characteristics that vary significantly with operating temperature, supply voltage and process parameters. Continuous calibration schemes similar to the one used in the DLL are not applicable, since no reference signal exists. A different kind of delay line should therefore be used.
Passive RC delay lines have been used in the past for timing generation [1] because of their low sensitivity to supply and temperature changes. A sensitivity of around 500ppm per volt or per °C is typical of standard technologies. On the other hand, their delay depends strongly on the circuit processing, since the characteristics of parasitic devices, such as resistivity, capacitance and even physical dimensions, are only weakly controlled in digital CMOS technologies. Large circuit-to-circuit delay variations are thus expected, which makes start-up calibration of the lines essential to the performance of the proposed architecture. However, thanks to the low supply and temperature dependencies, frequent recalibration is not needed.
¹ By convention, the limits of bin n are tap n and tap n+1.
[Figure 2 diagram: reference clock, phase detector (PD) and N delay cells; hit signal driving a controllable delay line, set from calibration; M rows of hit registers.]
Figure 2: Time interpolation circuit.
9.2. Adjustable RC delay line.
To allow start-up calibration, the delay line must be adjustable. Both continuous and discrete adjustment schemes are possible; the choice between them must take into account the linearity requirements and the complexity of each scheme.
Continuous adjustment schemes can achieve maximal interpolation linearity, at the expense of circuit complexity and higher noise sensitivity. For example, changing the voltage drop across the parasitic junction of a diffused resistor varies the depth of the depletion region along its length (see Figure 3). This changes the cross-section of the resistor, and therefore its distributed resistance. The depletion region across the junction also acts as the dielectric of a distributed capacitor, so a change in its depth also changes the capacitance. These resistance and capacitance variations have opposite effects on the time constant of the delay line, but the overall result is a continuous control of the line’s propagation delay.
However, this method has drawbacks that make its implementation impractical. The depletion region extends mostly into the less doped n-well, leaving little control over the line resistivity. The control voltage range limits the amplitude of the propagating signal. In fact, the signal itself also influences the depletion region width, making the time constant of the line a complex function of the signal.
[Figure 3 diagram: cross-section of a p+ diffusion resistor (in, out) inside an n- well with an n+ contact, showing the depletion region and the p- substrate.]
Figure 3: Continuous delay adjustment scheme based on control of the distributed parameters (simplified).
Discrete adjustment methods provide a better solution for this application. Their implementation can be simple, and the noise sensitivity of the adjustment scheme can be quite low. Because of their discrete nature, a time interpolator using these methods has lower linearity. Fortunately, the resulting non-linearity can be kept very low by a careful choice of the adjustment range. Two of the several possible discrete adjustment schemes are described next.
9.2.1. Adjustable delay line by tap selection.
One implementation of the discrete adjustment scheme for an RC delay line is to divide the line into a large number of small segments. The extremities of these segments are made accessible through buffered outputs, as shown in Figure 4. Calibration of the line consists of selecting the outputs that best satisfy the interpolation linearity criteria. The delay line time constant depends strongly on parasitic technological parameters, which are only weakly controlled during IC production. Wide delay variations are therefore expected from one circuit to another. This leads to some overlap between the adjustment ranges of consecutive taps, so it must be possible to connect some of the segment outputs to more than one tap.
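The tap-selection calibration can be sketched as a nearest-match search. The sketch assumes that start-up characterisation has already measured the delay of every segment output, buffer included, relative to the hit input. Each tap k then takes the segment output whose delay is closest to k·Tcell/M. The measured delays below are hypothetical, and `select_taps` is not a name from the thesis.

```python
def select_taps(segment_delays, t_cell, m_samples):
    """For each tap k = 0..M-1, return the index of the segment output whose
    measured delay is closest to the ideal sampling instant k * Tcell / M."""
    step = t_cell / m_samples
    selection = []
    for k in range(m_samples):
        target = k * step
        best = min(range(len(segment_delays)),
                   key=lambda i: abs(segment_delays[i] - target))
        selection.append(best)
    return selection


# Hypothetical characterisation: 40 segment outputs, each segment 15 % slower
# than a nominal 10 ps (e.g. a slow process corner); delays in ps.
delays = [i * 10.0 * 1.15 for i in range(40)]
taps = select_taps(delays, t_cell=390.625, m_samples=8)
errors = [delays[s] - k * 390.625 / 8 for k, s in enumerate(taps)]
print(taps)
print([round(e, 1) for e in errors])
```

The residual error of each tap is at most half a segment delay, which is why fine segmentation bounds the achievable linearity. Choosing each tap independently reflects the property of this scheme that adjusting one tap does not disturb the others.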
[Figure 4 diagram: RC delay line made of ∆R, ∆C segments with buffered segment outputs routed, under calibration control, to taps n-1, n and n+1.]
Figure 4: Adjustable delay line using a tap selection scheme.
All the output buffers are identical and, because of the symmetry of their operation, their delays can, to a first approximation, be subtracted and factored out. However, device mismatch and temperature gradients affect them differently, degrading the interpolation linearity. These effects can be minimised by careful buffer design and are, in any case, taken into account when the line is calibrated.
9.2.2. Adjustable delay line by lumped capacitor selection.
Another implementation of the discrete adjustment scheme is to insert a variable lumped capacitor at selected positions along the delay line, as in Figure 5. These capacitors contribute to the line’s time constant, so changes in their capacitance affect the delay of the line.
The variable capacitors can be made of a bank of unit-sized capacitors that can be
selectively connected to an RC delay line node in order to obtain the best interpolation
linearity. As before, the effects of delay mismatch of the tap buffers are factored out
during calibration.
In the previous scheme, adjusting the position of one tap does not affect any other tap. In this scheme, by contrast, calibration is obtained by changing the delay properties of the line itself. Adjusting the delay of one tap therefore affects the delay of the whole delay line. An iterative adjustment procedure adequately takes this effect into account.
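A first-order Elmore-delay model shows why one capacitor setting moves every tap. The model is our simplification, not the thesis' own: a ladder of K identical sections with resistance R and capacitance C each, plus an adjustable lumped capacitor at every tap node. In such a ladder, the Elmore delay to node j is the sum over all nodes i of R·min(i, j)·C_i. A capacitor at any node therefore adds delay at every tap. All values and names below are hypothetical.

```python
def elmore_delays(extra_caps, r_seg, c_seg):
    """Elmore delay to each node j = 1..K of a uniform RC ladder whose node i
    carries c_seg plus an adjustable lumped capacitor extra_caps[i - 1]."""
    k_nodes = len(extra_caps)
    node_c = [c_seg + c for c in extra_caps]
    return [sum(r_seg * min(i, j) * node_c[i - 1] for i in range(1, k_nodes + 1))
            for j in range(1, k_nodes + 1)]


# Hypothetical line: 8 taps, R = 100 ohm and C = 50 fF per section (delays in s).
caps = [0.0] * 8
before = elmore_delays(caps, r_seg=100.0, c_seg=50e-15)
caps[2] += 10e-15                      # switch one 10 fF unit onto tap 3
after = elmore_delays(caps, r_seg=100.0, c_seg=50e-15)
shift_ps = [(a - b) * 1e12 for a, b in zip(after, before)]
print([round(s, 2) for s in shift_ps])  # [1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```

Adding one capacitor unit at tap 3 delays every tap, including the earlier ones. This is why the settings are refined iteratively rather than fixed tap by tap in a single pass.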
[Figure 5 diagram: RC delay line made of ∆R, ∆C segments with lumped capacitors at taps n-1, n and n+1, each set by independent calibration per tap.]
Figure 5: Adjustable delay line using a variable lumped capacitor scheme.
9.3. Auto calibration.
The automatic self-calibration of the time interpolator is a key part of the architecture. The DLL closed control loop performs constant self-calibration, tracking temperature and supply variations. The passive RC time interpolator, on the other hand, requires an initial calibration so that its delay matches the delay of a DLL delay cell. This calibration could be performed at production test time, either by laser trimming or by pre-programming calibration parameters in a ROM-like structure. However, this method would be expensive and would restrict correct interpolator operation to one specific reference frequency. The user would then have no flexibility to adapt the circuit to particular needs.
A more flexible calibration procedure is obtained if internal means are provided for in-situ start-up calibration. Collecting hits generated at random time intervals offers an accurate method of characterising the interpolator [2] (see Appendix D). If the hits are sorted into time bins corresponding to the output codes and histogrammed, the resulting count differences accurately represent the sizes of the bins. This simple procedure characterises the whole interpolator, and the characterisation obtained identifies the necessary calibration corrections.
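The histogram-based characterisation can be sketched as follows. The sketch assumes hits uniformly distributed over the characterised interval. DNL is defined as the normalised bin width minus one, and INL as the running sum of DNL at the upper edge of each bin, one of several conventions in use. The counts are hypothetical, and `code_density` is not a function from the thesis.

```python
def code_density(counts, t_span):
    """Bin widths (same unit as t_span), DNL and INL (in LSB) from a histogram
    of hits uniformly distributed over an interval of length t_span."""
    total = sum(counts)
    widths = [t_span * c / total for c in counts]
    ideal = total / len(counts)             # expected count for an ideal bin
    dnl = [c / ideal - 1 for c in counts]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)                     # INL at the upper edge of each bin
    return widths, dnl, inl


# Hypothetical histogram of the 8 interpolated bins of one 390.625 ps DLL cell.
counts = [1000, 1100, 900, 1050, 950, 1000, 1200, 800]
widths, dnl, inl = code_density(counts, t_span=390.625)
print([round(w, 1) for w in widths])
print([round(d, 2) for d in dnl])
print([round(i, 2) for i in inl])
```

The bin widths obtained this way are the input that the calibration step compares with the ideal Tcell/M to decide which corrections to apply.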
This procedure requires a random hit generator and a simple arithmetic unit. Hits generated by a simple, slow oscillator can be used for characterisation. The main requirement is that the oscillation frequency does not beat with the reference clock. A sufficient condition is that the ratio between the oscillator frequency and the reference clock frequency is a rational number given by the ratio of two prime numbers [3] (Appendix E). The arithmetic unit needs only a few accumulators and comparators. The calibration can be performed iteratively, which improves its accuracy.
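A small exact-arithmetic sketch shows the effect of beating. It assumes the hit period is (p/q)·Tclk, and it reads the "ratio of two prime numbers" condition as p and q being coprime; Appendix E may state the condition differently. With coprime p and q, q consecutive hits visit q distinct, evenly spaced phases of the reference period. With a simple reducible ratio, the hits keep landing on a few phases. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor, gcd


def hit_phases(p, q, n_hits):
    """Arrival phases (fraction of Tclk) of hits whose period is (p / q) * Tclk."""
    return [Fraction(k * p, q) % 1 for k in range(n_hits)]


def phase_histogram(phases, n_bins):
    """Number of hits falling into each of n_bins equal bins of the period."""
    counts = [0] * n_bins
    for ph in phases:
        counts[floor(ph * n_bins)] += 1
    return counts


# Coprime ratio: 1024 hits visit 1024 distinct, evenly spaced phases.
good = phase_histogram(hit_phases(1021, 1024, 1024), 128)
# Ratio 6/4 reduces to 3/2: the hits beat with the clock and only two
# phases of the reference period are ever visited.
bad = phase_histogram(hit_phases(6, 4, 1024), 128)
print(gcd(1021, 1024), set(good), sum(1 for c in bad if c))
```

In the coprime case every one of the 128 bins receives exactly 8 hits, so the histogram is flat, as the code density method requires. In the beating case only 2 bins are ever filled.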
9.4. The prototype.
9.4.1. Choice of technology.
A demonstrator circuit was implemented to explore the capabilities of the proposed architecture. A major goal of this work is to define architectures that are well suited to high-resolution time measurements, independently of the technology in which they are produced. Therefore, no special features should be required beyond those available in standard CMOS technologies.
Actual results are partially determined by technological properties such as transistor transconductance, gate capacitance, and parasitic resistance and capacitance. A fair comparison of the capabilities of the architecture is therefore best obtained if the technology used for the ADLL demonstrator is also used to build the demonstrator of this architecture. Furthermore, to emphasise the suitability of the architecture to standard technologies, the same 0.7µm CMOS technology was used for this prototype.
9.4.2. Prototype characteristics.
The prototype includes all the blocks necessary to demonstrate the proposed architecture: the complete time interpolator with the respective hit registers, a simplified read-out control unit, a serial programming interface and wide-bandwidth differential receivers.
The key feature to be verified with this demonstrator is the ability to perform internal calibration. The adjustment algorithm requires only registers and combinational logic, which are easily implemented with the standard cell libraries available for most commercial technologies. It was therefore decided to implement the algorithm in software, giving the demonstrator greater flexibility. The calibration hit generator, on the other hand, had to be implemented in the circuit. Only then could the assumption be verified that a hit frequency with the required characteristics can be generated on chip.
The prototype is schematically represented in the block diagram of Figure 6:
[Figure 6 diagram: reference clock (φ) driving the shared DLL; channels 0 and 1, each with an RC delay line (tap selection and lumped capacitor adjustment schemes) and hit registers; hit generator, read-out controller and calibration interface.]
Figure 6: Block diagram of the prototype.
In this prototype, the two interpolation schemes previously described were implemented using a single shared DLL. Together they form a two-channel Time-to-Digital Converter. Each interpolator channel is made of the differential receiver, the signal selection multiplexer, the adjustable RC delay line, the hit registers and the shared DLL. Chapter 6 showed that the integral error due to cell mismatch in a single DLL is a function of the cell mismatch σcell and of the number of cells N that make up the delay chain. The standard deviation of this error is largest at the centre of the delay chain, where it is:
$$\sigma_{DLL} = \frac{\sigma_{cell}}{\mu_{cell}}\cdot\frac{\sqrt{N}}{2}.$$
It is therefore important to build the delay chain with a small number of delay cells.
In this prototype, we chose N=16 cells as the best compromise between reducing the
integral error due to mismatch and keeping the reference clock frequency within the limits
imposed by the technology.
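The √N/2 dependence can be checked with a Monte Carlo sketch. The sketch assumes Gaussian, independent cell delays with relative spread σcell/µcell. It also assumes that the locked loop forces the total chain delay to exactly one reference period, which it models by rescaling the delays. The 4% mismatch is only an example value, of the order estimated for the ADLL in Chapter 8. The helper name `centre_inl` is hypothetical.

```python
import random
from math import sqrt
from statistics import pstdev


def centre_inl(n_cells, rel_mismatch, rng):
    """INL (in mean cell delays) at the centre tap of one DLL whose cell delays
    are 1 + rel_mismatch * gaussian, rescaled so that their total equals n_cells
    (the locked loop forces the chain delay to one reference period)."""
    delays = [1.0 + rel_mismatch * rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    scale = n_cells / sum(delays)
    return scale * sum(delays[: n_cells // 2]) - n_cells / 2


rng = random.Random(1)
N, s = 16, 0.04                      # 16 cells, 4 % relative cell mismatch
samples = [centre_inl(N, s, rng) for _ in range(20_000)]
print(round(pstdev(samples), 4), s * sqrt(N) / 2)
```

The simulated spread comes out close to σcell/µcell·√N/2 = 0.08 cell delays. Halving N would reduce it only by √2, which is why N is kept small but not minimal.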
The DLL delay cells used in this circuit have the same timing characteristics as those of the previous (ADLL) circuit. The same cells are therefore used, together with the same control loop building blocks. Reusing the cells is advantageous, since they have already proved to have the necessary control range, matching and noise sensitivity. Since some implementation details are common, the comparison between the architectures also becomes easier.
To obtain the intended ~390ps time interpolation at the outputs of the DLL, a reference period of 6.25ns, corresponding to a frequency of 160MHz, was used. As shown in the block diagram, the reference clock is used only in the DLL. The read-out and calibration interfaces are asynchronous to this clock and work at lower frequencies.
The high interpolation factor is obtained using either of the two adjustable RC delay
line schemes already described. In both schemes, the delay of the DLL delay cell is
divided into M=8 similar time intervals, resulting in an LSB of ~48.8ps. A total of
M·N=128 hit registers is required to achieve full coverage of the reference period for each
channel.
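The timing figures quoted above follow directly from the 160MHz reference and the two subdivision factors. The short Python sketch below reproduces that arithmetic. The variable names are illustrative and not taken from the design files.

```python
# Timing arithmetic of the prototype, using only values stated in the text.
T_REF_PS = 1e6 / 160.0   # 160 MHz reference clock -> 6250 ps period
N = 16                   # DLL delay cells
M = 8                    # RC-line subdivisions per DLL cell

cell_delay = T_REF_PS / N   # DLL tap spacing, ~390.6 ps
lsb = cell_delay / M        # converter LSB, ~48.8 ps
hit_registers = M * N       # registers per channel for full period coverage

print(f"cell delay {cell_delay:.1f} ps, LSB {lsb:.1f} ps, "
      f"{hit_registers} hit registers per channel")
```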
The integrity of the hit signal is of paramount importance, since the critical time
information is mostly contained in the high-frequency components of the signal.
Differential receivers are used in all external time-critical signal paths so as to reject the
common-mode noise coupled to these signals as they traverse the system outside the circuit.
A simple hit generator is also included. It is built as a slow, free-running, five-inverter ring oscillator whose output frequency is further divided by an 8-bit ripple
counter. The oscillation frequency is selectable via a program word. In a final circuit, the
oscillator frequency must have a fixed relation to the reference clock frequency, defined
by the relations established in Appendix E. Since the clock frequency may change
depending on the application, it is reasonable to generate the calibration hit signal from
the reference clock (see also Appendix E).
Chapter 9: Architecture Overview.
The externally generated calibration parameters are fed to the delay lines via a
calibration interface. These parameters are only changed at start-up, while calibration is
being performed. Therefore, a slow serial interface is used.
In the photograph of Figure 7 the main functional blocks of the prototype are
highlighted. The circuit uses 10.7mm² of silicon and was packaged in a 68-pin ceramic
JLCC package.
[Figure 7 image: labelled blocks Tap Selection, Hit Registers (two), DLL, Oscillator and Lumped Capacitor.]
Figure 7: Prototype circuit showing main functional blocks.
9.4.3. Performance analysis.
Timing characteristics.
The configuration proposed for this converter results in an LSB of Tm=48.8ps. The
theoretical RMS resolution σq is determined by the quantisation performed during
conversion:
$$\sigma_q = \frac{T_m}{\sqrt{12}} = 14.1\,\mathrm{ps}.$$
Matching limitations of the DLL degrade the conversion resolution. The maximum
cumulative effect of cell mismatch is seen in the middle of the DLL delay chain (see
Chapter 6). Assuming a mismatch σmatch of 1%, the additional RMS error due to the
DLL is:
$$\sigma_{DLL} = \sigma_{match} \cdot \frac{\sqrt{N}}{2} \cdot T_m \cdot M = 7.8\,\mathrm{ps}.$$
The calibration of the RC delay lines acts on their integral non-linearity so as to
limit it to acceptable values. A worst-case ±0.5LSB delay line non-linearity results in
an additional RMS error of:
$$\sigma_{dl} = \frac{T_m \cdot 0.5}{\sqrt{12}} = 7.1\,\mathrm{ps}.$$
Jitter intrinsic to the closed control loop of the DLL is estimated to be of the order
of σjitter=8ps. Adding all these contributions quadratically, the estimated intrinsic RMS
resolution should be ~19.3ps (0.40LSB). External error sources, such as reference
clock jitter, are not included in this estimate.
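The quadratic sum above can be checked with a few lines of Python. This is a minimal sketch that uses only the values given in the text: the 1% mismatch, the 8ps jitter estimate and the ±0.5LSB residual non-linearity. Evaluated exactly, σdl comes out at about 7.0ps rather than the quoted 7.1ps. The total is still ~19.3ps (0.40LSB).

```python
import math

T_REF_PS = 6250.0        # 160 MHz reference period
N = 16                   # DLL cells
M = 8                    # RC-line interpolation factor
SIGMA_MATCH = 0.01       # assumed 1 % cell mismatch (as in the text)
SIGMA_JITTER_PS = 8.0    # estimated DLL loop jitter (as in the text)

t_m = T_REF_PS / N / M   # LSB Tm, ~48.8 ps

sigma_q = t_m / math.sqrt(12)                          # quantisation
sigma_dll = SIGMA_MATCH * math.sqrt(N) / 2 * t_m * M   # mismatch, centre of chain
sigma_dl = 0.5 * t_m / math.sqrt(12)                   # residual RC-line INL
total = math.sqrt(sigma_q**2 + sigma_dll**2 + sigma_dl**2 + SIGMA_JITTER_PS**2)

for name, value in [("sigma_q", sigma_q), ("sigma_DLL", sigma_dll),
                    ("sigma_dl", sigma_dl), ("sigma_jitter", SIGMA_JITTER_PS)]:
    print(f"{name:13s} {value:5.1f} ps")
print(f"total         {total:5.1f} ps ({total / t_m:.2f} LSB)")
```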
The tests performed with this prototype, and the measurement results, will be
discussed in Chapter 12.
Power dissipation.
In DLL-based converters, power is mainly dissipated in the DLL itself. Reducing
the power needed to switch its delay cells is mainly hampered by device
mismatch. Since the matching requirements of these architectures are quite high, reduced
power dissipation per cell would come at the price of reduced resolution.
In this architecture, power dissipation is reduced by minimising the number of
DLLs. The fine time interpolation is obtained using a passive delay line. Since the DLL
is built with the same building blocks used in the ADLL circuit, the power dissipated by
the DLL in this circuit can be estimated, from the measurements on the previous
prototype, to be of the order of 180mW.
Chapter 10.
Adjustable RC Delay Line using a Tap Selection Scheme.
In this chapter, the implementation of the tap selection adjustment scheme is
described. We start with a general analysis of how to build and analyse high-accuracy RC
delay lines. These lines must abide by some layout constraints: the line dimensions must
match the dimensions of the circuits to which they interface, and the delays are generated by
parasitic devices. The particular characteristics of this scheme are then described, together
with the calibration algorithm required to obtain uniform time intervals.
10.1. RC delay line.
An integrated microstrip RC delay line can be built from any of the interconnection
layers available in the chosen technology. Diffused layers usually suffer from a strong
temperature and supply voltage dependency, due to carrier mobility degradation and to the
variation of the depth of the junction depletion region [4]. A polysilicon layer, on the
other hand, has a lower temperature dependency and a negligible supply dependency (if
built over the thick oxide layer). Metal (or silicided polysilicon) layers have an even smaller
environmental dependency; however, their low resistivity renders them impractical for
delay generation applications. The polysilicon layer will, therefore, be used to build the
delay line.
The interpolating microstrip line spans a fraction (M-1)/M of the delay of a DLL
delay cell, where M is the interpolation factor, regardless of the operating conditions. This
delay is generally short and, traditionally, the line would be analysed as a lumped
electrical element. However, such analysis would lack accuracy, since most of the critical
time information is contained in the rising edge of the propagating signal. Accurate delay
estimation must take into account the large bandwidth of the signal and thus the long
electrical length of the line at high frequencies. In these conditions, transmission line
analysis methods must be used.
Several analytical and numerical methods to perform the transient analysis of a
complex network of distributed RC lines have been proposed [5][6][7] resulting in equally
complex expressions for the propagation delay along the network. A voltage step injected
in an open-ended distributed RC line of length L propagates according to the following
equation [8], where x is an arbitrary position along the line:
$$\frac{v(x,t)}{v_{cc}} = 1 + \frac{2}{\pi}\sum_{k=1}^{\infty}\frac{(-1)^k}{k-\frac{1}{2}}\cdot\cos\left[\left(k-\frac{1}{2}\right)\pi\left(1-\frac{x}{L}\right)\right]\cdot\exp\left[-\left(k-\frac{1}{2}\right)^2\pi^2\frac{t}{RC}\right].$$
The total resistance R and capacitance C of the line are obtained from the sheet
resistance rsq and from the plate and fringing capacitances, cplate and cfringing, respectively:
$$RC = \left(r_{sq}\frac{L}{W}\right)\cdot\left(c_{plate}LW + 2Lc_{fringing}\right) = r_{sq}c_{plate}L^2 + 2r_{sq}c_{fringing}\frac{L^2}{W}.$$
An important characteristic of RC lines that determines the dimensions of the
interpolator is not evident from the propagation delay equation above: as the signal
propagates along the delay line, it experiences an apparent increase in propagation
velocity. The reason for this counter-intuitive effect lies in the slow slope of the
input pulse compared to the propagation delay of the RC delay line. In such a short
open-ended line, the pulse reflected back along the line catches up with the forward
pulse before its level has crossed the logic threshold. Looking at Figure 1, if the overall
pulse amplitude is observed at position x, the closer x is to the end of the line, the earlier
the reflected pulse superimposes on the original pulse and, thereby, the faster the
edge of the overall pulse crosses the threshold.
[Figure 1 image: a source driving a line segment of length x (R·x, C·x) followed by a segment of length L-x (R·(L-x), C·(L-x)).]
Figure 1: RC line divided into two segments at access point x. R and C are, respectively, the resistance and
capacitance per unit length.
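The apparent speed-up can be checked numerically from the propagation series given earlier. The Python sketch below is an illustration, not the method used in this work. It integrates the step-response series term by term to obtain the response to a linear input ramp. It then finds the 50% crossing at nine equally spaced positions along the line.

Time is normalised to RC. The ramp duration of 4RC is an assumption that stands in for the "slow input slope" described in the text. For such a slow edge, the delay at position ξ=x/L approaches the Elmore value RC·(ξ-ξ²/2), reaching RC/2 at the open end. The tap-to-tap delay therefore shrinks steadily towards the end of the line.

```python
import math

def step_integral(xi, s, terms=200):
    """Integral from 0 to s of the normalised step response v(x,t)/vcc at
    xi = x/L, evaluated term by term from the series (time in units of RC)."""
    if s <= 0.0:
        return 0.0
    total = s
    for k in range(1, terms + 1):
        h = k - 0.5
        a = (h * math.pi) ** 2                        # decay rate of term k
        c = (-1) ** k / h * math.cos(h * math.pi * (1.0 - xi))
        total += (2.0 / math.pi) * c * (1.0 - math.exp(-a * s)) / a
    return total

def ramp_response(xi, t, rise):
    """Response to a 0 -> vcc linear ramp of duration `rise` (superposition)."""
    return (step_integral(xi, t) - step_integral(xi, t - rise)) / rise

def delay_50(xi, rise):
    """50 % crossing time at xi, measured from the input's own 50 % point."""
    lo, hi = 0.0, rise + 10.0
    for _ in range(50):                               # bisection on time
        mid = 0.5 * (lo + hi)
        if ramp_response(xi, mid, rise) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - rise / 2

RISE = 4.0                                            # assumed slow edge: 4 RC
taps = [i / 8 for i in range(9)]                      # equally spaced positions
delays = [delay_50(xi, RISE) for xi in taps]
steps = [b - a for a, b in zip(delays, delays[1:])]
print(["%.3f" % d for d in steps])                    # tap-to-tap delays (RC units)
```

Equal physical spacing thus gives unequal delays. This is why the line is later accessed at non-uniform distances.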
The delay line interfaces with the rest of the interpolator through output buffers.
Efficient layout style requires that these buffers have the same physical design, so that the
resulting structure is regular and no layout-related mismatches occur. Since the signal edge
does not propagate along the line at a constant velocity, a uniform delay division of the
line is obtained only if the line is accessed at irregular distances. To accommodate these
contradictory demands, the line is divided into segments of equal delay that are positioned
with a pitch similar to the pitch of the output buffers, as shown in Figure 2. The gaps
opened in the line are filled with a spacer¹ made of a conductor whose parasitic resistance
and capacitance are small. These spacers can be built in the metal1 layer. Since they are
included in the signal path, their contribution to the total line delay must be correctly
evaluated.
¹ The distinction made between microstrip delay line and spacer reflects only a functional difference. In
reality, both are microstrip lines made of different materials, embedded in the same silicon oxide dielectric
and having the IC substrate as reference plane. Consequently, both are modelled as devices with
distributed parameters.
Other solutions based on non-uniform lines are difficult to implement because of the
weak dependency of the delay on the line width and the limited number of
interconnection layers available.
[Figure 2 image: input "in" driving a polysilicon microstrip layer cut into segments of equal delay; metal1 spacers fill the gaps so that each section has equal delay and equal length, with a tap at each section.]
Figure 2: Delay line division into equally sized sections.
10.1.1. RC delay line simulation model.
The complex propagation delay expression shown in the previous section does not
lend itself to easy analysis. Approximate delay estimation methods have been developed
for applications in the design automation domain [9][10]. Unfortunately, they tend to
reflect a particular network geometry and the accuracy of the delay estimations is
generally limited. In order to obtain an accurate estimation of the interpolator’s time
characteristics, a simulation model was developed that includes all the elements that
influence them. These include the polysilicon microstrip delay line segments, the metal1
spacers, the connection lines, the inter-layer contacts and the devices that make up the
driver, output buffers and capacitors.
The simplest and most accurate model of a uniform line (polysilicon or metal1) is
obtained by dividing it into small segments. The number of segments should be large enough
that each of them can be correctly modelled by a network of lumped elements. The
overall behaviour of a complex line can then be obtained by connecting the uniform line
segments through the equivalent circuits of the discontinuities present in the network.
HSPICE [11] has internal models for transmission lines (U-model) which internally
divide the line into multiple T-network sections as the ones in Figure 3. However, in our
work we chose to explicitly use T-network sections as the basis of the model. It is thus
possible to avoid any dependency on the particular implementation of the simulator. In a
microstrip line with the characteristics of the one under study, inductance Ll and dielectric
conductance Gl are very small and, therefore, are not considered. The reference plane is
modelled as a single node. In reality, this plane is the lightly doped IC substrate; however,
its resistance Rref can be minimised if some layout rules are followed. These rules are
explained later.
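How fine the segmentation must be can be illustrated with a small experiment. The sketch below is a plain Python stand-in, not the HSPICE model. It cascades n T-networks of the kind shown in Figure 3, each made of a series r/2, a shunt c and a series r/2. As in the text, Ll, Gl and Rref are neglected. The sketch integrates the step response with backward Euler and reports the 50% delay at the open end in units of RC.

For reference, evaluating the distributed-line series at x=L gives about 0.379RC. With a single section the line collapses to one RC pole with time constant RC/2, so its 50% delay is only about 0.347RC.

```python
import math

def ladder_t50(sections, dt=2e-4):
    """50 % step-response delay (units of RC) at the open end of a line
    modelled as `sections` cascaded T-networks (series r/2, shunt c, series r/2)."""
    n = sections
    r, c = 1.0 / n, 1.0 / n                   # per-section R and C (total R = C = 1)
    g_left = [2.0 / r] + [1.0 / r] * (n - 1)  # to source (r/2) or previous centre (r)
    g_right = [1.0 / r] * (n - 1) + [0.0]     # open end: nothing beyond the last node
    v = [0.0] * n                             # capacitor (T-centre) voltages
    t, prev = 0.0, 0.0
    k = c / dt
    while True:
        # Backward Euler: (k + gl + gr) v_i - gl v_(i-1) - gr v_(i+1) = k v_i(old)
        diag = [k + g_left[i] + g_right[i] for i in range(n)]
        rhs = [k * v[i] for i in range(n)]
        rhs[0] += g_left[0] * 1.0             # unit step applied at the input
        for i in range(1, n):                 # Thomas algorithm, forward sweep
            m = -g_left[i] / diag[i - 1]
            diag[i] -= m * -g_right[i - 1]
            rhs[i] -= m * rhs[i - 1]
        v[-1] = rhs[-1] / diag[-1]
        for i in range(n - 2, -1, -1):        # back substitution
            v[i] = (rhs[i] + g_right[i] * v[i + 1]) / diag[i]
        t += dt
        if v[-1] >= 0.5:                      # interpolate the threshold crossing
            return t - dt * (v[-1] - 0.5) / (v[-1] - prev)
        prev = v[-1]

for n in (1, 4, 40):
    print(n, round(ladder_t50(n), 4))
```

A coarse model underestimates the delay noticeably. With a few tens of sections the lumped ladder converges on the distributed result.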
Inter-layer contacts are modelled as single resistors whose values are extracted from
the technology parameters. In reality, their resistance depends on factors such as current
flow, and a small capacitance to the reference plane is also present. However, the total contact
resistance can be made small by increasing the contact area, which renders its variation negligible. All
other (lumped) circuit elements can be directly modelled using their equivalent circuits.
[Figure 3 image: T-network of length δx with series elements 0.5·Rl·δx and 0.5·Ll·δx on each side of the shunt Cl·δx and Gl·δx, and 0.5·Rref·δx in each half of the reference path.]
Figure 3: Electrical model of an infinitesimal segment of a transmission line (the T-network).
A detail of a section of polysilicon line, together with the metal1 spacers and contacts,
is shown in Figure 4. The distributed electrical parameters are highlighted for illustration
purposes. The inter-layer contact is modelled as a resistor to which the capacitances
of the ends of the connected layers are added, since they turn out to be
significant for the line width being considered.
[Figure 4 image: cross-section of metal1, contacts and polysilicon over the thick oxide and substrate, annotated with per-length parameters Rm, Cm (plate+fringe), Rp, Cp (plate+fringe), contact resistance Rc, end capacitance Ct and substrate resistance Rsub; below it, the equivalent model as chains of T-networks joined by contact models (Rc with Ct at each end).]
Figure 4: Detail of the physical microstrip line and its equivalent simulation model.
A sample of the SPICE model of a delay line with dimensions W (width) and L
(length), divided into N small lumped elements, is shown in the next lines. It includes
a single T-element plus the contact.
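* One of N lumped T-sections of a line of width W and length L.
* Parameter defaults are placeholders, overridden at instantiation.
.subckt T_element in out ref Rsq_layer=1 Cpl_layer=1 Cfr_layer=1 L=1 W=1 N=1
r1 in 1   'Rsq_layer*L/(W*N*2)'
r2 1  out 'Rsq_layer*L/(W*N*2)'
c1 1  ref 'Cpl_layer*L*W/N + Cfr_layer*2*L/N'
.ends T_element
* Inter-layer contact: resistor plus the end capacitances of both layers.
.subckt Contact in out ref Cfr_metal1=1 Cfr_polysilicon=1 Rcontact=1 W=1
c1 in  ref 'Cfr_metal1*W'
c2 out ref 'Cfr_polysilicon*W'
r1 in  out 'Rcontact/(W/2)'
.ends Contact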
Rsq, Cpl and Cfr are, respectively, the sheet resistance, the plate capacitance per unit area
and the fringing capacitance per unit length of the corresponding layer. Rcontact is the resistance of the contact.
The parameter spread inherent to fabrication processing is included in the model
through three different sets of technology parameters, each corresponding to a
representative corner of the process distribution: the centre and the two tails. The effects
of temperature and supply variation are correctly taken into account in the active device
models. This is not the case for parasitic devices such as the microstrip delay line;
however, since their dependency is small, it can safely be disregarded.
10.2. Tap selection delay line.
The design of an RC delay line conforming to the requirements of the converter
starts with the definition of the general dimensions of the line and of the number of access
points needed. At this stage only the overall properties, such as the total delay, the total
length and the width of the line, are important. For simplicity, each line segment is made
identical. Once these properties have been defined, individual line segments can be adjusted
so that their delays become identical while the overall line delay does not change, resulting
in the desired RC line characteristics.
The design guidelines that were followed are described in more detail below.
Definition of the line width:
The microstrip line should be made wide so that its distributed characteristics
dominate the interpolator’s behaviour. If this were not the case, the temperature and supply
sensitivity of the lumped loads connected to the line could undermine its behaviour. The
line should also be wide enough to minimise the dimensional uncertainties due to IC
processing.
In the tap selection adjustment scheme, the buffers are the only significant loads
connected to the line. These are simple two-stage buffers made of static inverters. The
gate area of the input inverter transistors defines the lumped loads attached to the RC line.
Mismatch considerations lead to the use of large gate areas for these transistors.
First-order calculations based on technological parameters result in a total gate capacitance
of ~33fF.
Since the gate capacitance has only a weak dependency on temperature and supply
voltage, it is enough that the distributed capacitance of each line segment has a larger
value than the lumped capacitor. A line width of 52µm is sufficient to obtain these
characteristics.
Definition of the number of access points:
The number of access points is determined by the adjustment scheme followed. For
the tap selection scheme, the criterion is the maximum allowed time interval
between access points that results in an acceptable linearity after line calibration.
Given an LSB of 48.8ps, a maximum non-linearity of ~15ps (less than 1/3 LSB) is
accepted. Therefore, the maximum delay between access points has been set to 30ps. Using
the simple definition of time constant, τ=RC, as a rough approximation of the microstrip
line delay, a time constant variation of ±30% is found as process parameters are changed
for the selected technology. Consequently, to obtain a worst-case access interval of 30ps, the
separation in typical conditions should not be larger than ~21ps. Dividing the line into 32
segments (defining 33 access points) more than covers this requirement.
Definition of the line length:
A total delay of 350ps (~LSB·(M-1)) must be achieved regardless of the process
corner. The same considerations as before show that in the fast corner the time constant is
~30% smaller than in typical conditions. Consequently, if the line covers 350/0.7=500ps in
typical conditions, the requirement is met under all operating conditions.
The line length is determined from parametric simulations of the complete
interpolator model, including all the devices connected to it. The output buffer pitch
defines the length of the line segments between access points. During simulations the
length of all the microstrip segments is simultaneously varied until the total required delay
is obtained.
Assuming a buffer pitch of 31.2µm, the required overall line delay is obtained when
each of the 32 segments includes a polysilicon microstrip line 7.4µm long and a metal1
spacer of 23.8µm. The resulting total distributed capacitance in each line segment is larger
than the buffer input capacitance, as desired.
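The sizing rules of the last three paragraphs reduce to a few lines of arithmetic, sketched below in Python. Two points go beyond the text and are our assumptions. First, the 30ps spacing is read as twice the accepted 15ps non-linearity, on the basis that the nearest access point is at most half a spacing away. Second, the 30% corner spread is applied with the same factor of 0.7 that the text uses.

```python
LSB_PS = 48.8                        # converter LSB
MAX_NL_PS = 15.0                     # accepted non-linearity after calibration
MAX_STEP_WORST_PS = 2 * MAX_NL_PS    # assumed: worst-case access spacing, 30 ps
CORNER = 0.30                        # +/-30 % spread of tau = RC over process corners
TOTAL_MIN_PS = 350.0                 # ~LSB*(M-1), required in every corner
SEGMENTS = 32                        # segments between the 33 access points

typ_step_max = MAX_STEP_WORST_PS * (1 - CORNER)  # ~21 ps, the text's factor 0.7
typ_total = TOTAL_MIN_PS / (1 - CORNER)          # 500 ps in typical conditions
typ_step = typ_total / SEGMENTS                  # ~15.6 ps per segment
slow_step = typ_step * (1 + CORNER)              # ~20.3 ps in the slow corner
pitch_um = 7.4 + 23.8                            # polysilicon + metal1 per segment

print(f"typical step {typ_step:.1f} ps (limit {typ_step_max:.1f} ps), "
      f"slow corner {slow_step:.1f} ps, pitch {pitch_um:.1f} um")
```

The 32-segment choice leaves margin in both corners. The polysilicon and spacer lengths add up exactly to the 31.2µm buffer pitch.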
Adjustment of the delay of the line segments:
With identical segments all along the line, the delay between access points is smaller
towards the end of the line. These delays could be adjusted by a trial-and-error
procedure, but a simpler approach was used instead:
The previous step resulted in a constant microstrip length vs. segment curve, and in
the corresponding non-linear delay versus segment curve. If an analytical function that
transforms the delay curve into a constant curve is found, the corresponding microstrip
length curve can be obtained using the same transformation (see Figure 5).
[Figure 5 image: the delay versus segment curve is flattened by f(segment); the inverse mapping f⁻¹(segment), applied to the constant length versus segment curve (microstrip line plus metal1 spacer per segment, segments 0 to m), yields the adjusted lengths and tap positions.]
Figure 5: Delay line segments’ length adjustment.
This transformation is valid if the microstrip line is uniform and the edge
propagating along the line has constant characteristics. This is not the case for the line
under study, since metal1 spacers interrupt the microstrip line and the output buffers load
the line at discrete points. However, the uniform line approximation is accurate enough,
since the metal1 spacers, owing to their low resistivity, have little effect on the delay
characteristics of the line. The first design criterion also guarantees that the characteristics
of the output buffers can be neglected in this analysis. The signal characteristics along the line
are, to a large extent, invariant.
The original delay vs. tap curve can be accurately described by a high-order
polynomial. In this case a fifth-order polynomial is accurate enough:
$$\mathrm{delay}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5,$$
where the polynomial coefficients are obtained from a least-squares fit to the delay curve.
Multiplying the curve by the reciprocal of this function, delay⁻¹(x)=1/delay(x), turns it into a constant value.
It is sufficient to multiply this result by the desired segment delay (delayave) to obtain the required transformation:
$$F(x) = \mathrm{delay}_{ave} \cdot \mathrm{delay}^{-1}(x) = \frac{\mathrm{delay}_{ave}}{a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5}.$$
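The fit-and-invert step can be sketched with NumPy, which is a third-party dependency. The per-segment delays below are hypothetical: a smooth decreasing curve standing in for the simulated delay of a line built with identical segments. The sketch relies on the same first-order assumption as the text, namely that a segment's delay scales with its microstrip length.

```python
import numpy as np

# Hypothetical per-segment delays (ps) of a line built with identical segments;
# the decreasing trend mimics the apparent speed-up towards the open end.
segment = np.arange(32)
delay = 18.0 - 0.3 * segment + 0.004 * segment**2

coeffs = np.polyfit(segment, delay, 5)   # least-squares fit, highest power first
fitted = np.polyval(coeffs, segment)     # delay(x) evaluated at each segment
delay_ave = delay.mean()                 # target (constant) segment delay
factor = delay_ave / fitted              # F(x) = delay_ave / delay(x)

# Each segment's microstrip length is multiplied by factor[i]; to first order
# its delay then becomes delay_ave.
print(np.round(factor, 3))
```

Segments near the end of the line need factors above 1. This matches the lengthening discussed below.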
The multiplying factors obtained for each segment of the actually implemented line
are shown in Figure 6. The factor that corresponds to the buffer pitch is also shown. The
transformation results in three segments being larger than the buffer pitch. This leads to a
longer delay line, which in turn affects the total line delay.
[Figure 6 image: adjustment function and buffer pitch (0 to 8) versus segment (0 to 32).]
Figure 6: Adjustment function values.
The lengthening of the line after application of the transformation stems from the
limited inaccuracy of the uniform line approximation used. In particular, the assumption
that the characteristics of the signal propagating along the line do not change is not true:
the rise time of this signal is longer towards the end of the line, as shown in Figure 7.
[Figure 7 image: rise time (ns, 1.75 to 2.15) versus segment (0 to 32) for the original and adjusted lines.]
Figure 7: Signal’s rise time along the original and adjusted delay line, in typical conditions (simulated).
However, the deviation caused by this assumption is small, and it is effectively
countered by designing the original line with a shorter delay range. Simulations of the
adjusted line result in the graphs shown in Figure 8. A maximum segment delay non-linearity of 4ps is found under typical conditions. Only minor linearity degradation is
observed as operating conditions are varied. These simulations confirm that the maximum
segment delay is 31.3ps and that the line spans a minimum of 378ps, thus meeting all
design criteria.
Other considerations:
An RC line behaves as a low-pass filter. The attenuation of the high-frequency
signal components as the signal progresses along the line contributes to the delay characteristics of
the line, because of the degradation of the edge slope it causes. This effect should be kept
small, so that the uniform line approximation we have been considering remains valid.
Therefore, the line should be designed so that the edge slope along the line segments used
for time interpolation is constant or only slightly degraded, regardless of variations
in the input signal due to temperature or supply variations.
[Figure 8 image: two panels, delay (ps, 0 to 35) versus segment and cumulative delay (ps, 0 to 1000) versus segment (0 to 32), each for typical, fast and slow conditions.]
Figure 8: Delay and cumulative delay of each line segment (from simulations).
The inclusion of a leading adaptation section in the beginning of the line is a simple
way of achieving this goal. This section is not directly used for time interpolation, but it
adapts the signal bandwidth to the delay line’s characteristics. The signal delay due to the
input adaptation section results in an added offset to the measurements, however it does
not influence the time interpolation function.
The signal’s velocity increase along the line is very marked in the last interpolation
segments. The adjustment function would thus generate very large multiplication factors
for these segments. The resulting long microstrip segments would make an inefficient
interpolator layout. The use of a trailing adaptation section to behave as a load to the last
segments of the line allows for a smaller spread of the segment delays and thus, shorter
final segments are possible. The length of the adaptation sections is limited by the driving
capability of the input driver. The use of these adaptation sections is illustrated in Figure
9.
[Figure 9 image: leading section, spacer, taps and trailing section along the line.]
Figure 9: The leading and trailing adaptation sections.
The graphs in Figure 10 clearly show the effects of including leading and
trailing sections of 79µm length. They result from simulations of the complete interpolator,
including the input driver and output buffers. The segment delay sensitivity to operating
conditions is minimal if these sections are included, whereas it increases if they are
excluded. The absence of a trailing section also generates very small segments towards the
end of the line.
[Figure 10 image: two panels of segment delay versus segment (0 to 32), for the line with and without leading and trailing sections, each at 5V/25C, 4.5V/100C and 5.5V/0C.]
Figure 10: Segment delay sensitivity to operating conditions (from simulations). The first and second graphs
correspond, respectively, to the same line with and without leading and trailing sections.
All the graphs presented so far are obtained from simulations of the complete model
of the RC delay line. This model assumes an ideal reference plane for the distributed line,
which is only roughly approximated by the lightly doped p-substrate used. To
reduce the reference plane resistance, a wide, ground-connected guard-ring structure is
implemented around the RC line. This shortens the path of the charges displaced in the
substrate as the signal progresses through the line, so that its effective resistance is
small. The guard-ring also collects charges coupled to the substrate by other
devices on the circuit, thus providing better isolation of the RC delay line.
10.2.1. Tap selection circuitry.
The selection of access points for the taps is performed after the output buffers, so
that it doesn’t influence the delay line. To achieve maximum design flexibility, it was
decided that all access points be accessible to all the taps. This results in a somewhat
complex connectivity and in a long serial selection chain, as shown in Figure 11.
The selection of the actual access point is performed by asserting the
respective programmable selection bit. This closes the adjoining transmission gate switch,
establishing the intended connection. There is no hard-wired restriction on the parallel
connection of a tap to more than one access point, which would result in a finer time
interpolation. However, this option will not be used, since it would require an unnecessarily
complex calibration algorithm, leading to increased silicon consumption.
The programmable serial chain is quite long, having 256 bits. It should be noted
that the program word can be loaded at a low rate and that, once the final calibration
parameters are established, all activity in this circuitry stops, reducing the power
dissipation and eliminating potential noise sources. The full selection circuitry shown in
Figure 11 uses 0.85mm² of silicon. The area occupied by this block can be reduced by
limiting the selectivity of the access points.
[Figure 11 image: RC delay line access points 0 to 32 connected to taps 0 to 7 through a serial selection chain with sel. in, sel. out and sel. strobe signals.]
Figure 11: The access point selection circuitry.
10.3. Auto calibration circuitry.
The adjustable line requires some means of automatic calibration in order to be
complete. The calibration procedure can be divided into two major steps. In a first step the
delay line is characterised (characterisation step). These characteristics can then be used
to compute the access points that tune the taps to the required position (tuning step).
Characterisation step.
The characterisation of each segment of the RC delay line could be done using the
delay of one DLL cell as a reference. However, the delay of these cells suffers some
variation due to mismatch and, therefore, they are not a good reference. Since the number
of bins into which the reference period is divided is fixed, this knowledge can be used to
derive the size of the ideal bin (LSB) and use it as a reference for calibration.
A statistical code density test (CDT) [2] offers a characterisation method with the
required properties and, furthermore, is easily implemented on chip. The code density test
applied to a time interpolator requires the collection of a large set of random hits. These
hits are registered, and the number of hits collected for each possible output code (or bin) is
histogrammed. The relative bin contents are a direct measure of the relative
size of each time bin.
The histogramming can be performed for all the individual time bins in the circuit
(M·N bins) to obtain a detailed characterisation of the combination of the DLL, the RC
delay line and the hit registers. However, only the RC delay line needs to be characterised.
Therefore, the values corresponding to the same RC line bin can be summed across the
DLL, effectively yielding an average measure of the line across the DLL. An added
advantage is that the effects of hit register mismatch are also averaged, so an
accurate characterisation of the size of the RC line bins is obtained.
The size difference between bins due to mismatch of the output buffers is
indistinguishable from the difference due to mismatch of the actual line segments. It is, in
fact, lumped together with it, so the line characteristics obtained reflect this additional
error. However, this is advantageous, since the buffer delays are calibrated in the same way.
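The code density test and the folding of the M·N histogram onto the M RC-line bins can be sketched in Python. The device is hypothetical. The RC-line bin weights and the ±1.5% DLL cell errors are invented for illustration, and the cell errors are chosen so that the locked DLL still spans exactly one reference period. The hits are assumed to be uniformly distributed and the hit registers ideal. Summing each RC-line bin across the 16 cells averages out the DLL cell errors.

```python
import bisect
import random

T_REF = 6250.0   # ps, reference period (160 MHz)
N, M = 16, 8     # DLL cells, RC-line bins per cell

# Hypothetical device: relative RC-line bin sizes (sum = M) and DLL cell
# delays off by up to +/-1.5 % that still sum to T_REF (locked loop).
RC_WEIGHTS = [1.05, 0.97, 1.00, 1.03, 0.95, 0.99, 1.02, 0.99]
cell = [T_REF / N * (1 + 0.01 * (c % 4 - 1.5)) for c in range(N)]

upper_edges = []     # upper time edge of each of the M*N bins
t = 0.0
for c in range(N):
    for m in range(M):
        t += cell[c] * RC_WEIGHTS[m] / M
        upper_edges.append(t)

def code_density_test(n_hits, seed=1):
    """Histogram of hits uniformly distributed over one reference period."""
    rng = random.Random(seed)
    hist = [0] * (M * N)
    for _ in range(n_hits):
        code = bisect.bisect_right(upper_edges, rng.random() * T_REF)
        hist[min(code, M * N - 1)] += 1   # guard against rounding at T_REF
    return hist

def rc_bin_sizes(hist):
    """Fold the M*N histogram onto the M RC-line bins and scale to ps."""
    total = sum(hist)
    return [T_REF / N * sum(hist[c * M + m] for c in range(N)) / total
            for m in range(M)]

sizes = rc_bin_sizes(code_density_test(200_000))
print([round(s, 1) for s in sizes])
```

The estimate converges on the average RC-line bin size, which is (T_REF/N)·w/M for each weight w. The statistical error falls as the square root of the number of hits.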
Tuning step.
In the tuning step the measured line characteristics are analysed. Non-linearity
surpassing a given limit is identified and correction measures computed. These measures
are then translated into a calibration word that is serially programmed into the adjustable
delay line. Computation complexity, and therefore the amount of hardware needed,
depends on the amount of information that can be extracted from the characterisation step.
A trade-off can be established between these two steps. A faster calibration requires
a larger hardware block and a slower calibration can be performed with little hardware.
Two calibration algorithms, representing the two extreme cases, will be presented. In the
first one, an iterative procedure is established where a global line characterisation is used
to make small adjustments to the line. The procedure is repeated until the line has been
tuned to the desired linearity range. This algorithm requires a small calibration hardware
block, but it may result in long calibration time for extreme parameter deviations. In the
second algorithm, a lengthy, but complete, characterisation of the line is performed. From
this the calibration parameters are obtained in one step at the expense of significant
hardware requirements.
10.3.1. Calibration algorithms.
The RC delay line adjustment allows only a discrete number of adjustment
options; therefore, the accuracy of the calibration results is limited by the adjustment
quantisation step. In the calibration algorithms that we developed for this purpose, the
concept of tolerance, or of non-linearity limit, is used to express the maximum calibration
tolerance allowed for a given application. In the case of the iterative algorithm, the
calibration tolerance can be traded off against calibration time.
The algorithms presented here must be simple to implement in hardware; therefore,
INL was chosen as the only accuracy criterion. Integral non-linearity error, due to its
cumulative action, is the limiting factor in the overall linearity of the converter.
Algorithms that also take DNL into account as an accuracy criterion are presented in
Appendix G. Their hardware implementation is more complex, and their convergence
slower.
Iterative algorithm.
The starting point of this algorithm is the bin size histogram, obtained after running
the characterisation step with the calibration parameters extracted from simulations
corresponding to typical process and environment conditions. Each iteration of the
algorithm consists of the sequential analysis of a bin to verify whether it conforms to the non-linearity limits. If it does not, new calibration parameters, corresponding to the
addition or subtraction of one delay segment to the respective tap, are calculated. The
same variation is applied to all the taps that follow it, so that the time differences between
these taps (the bin sizes) are unchanged. These steps (characterisation, analysis and tuning)
are repeated until the bin linearity conditions are met. The procedure is then repeated for
the next bin in the sequence.
The analysis of the linearity of a bin is based on the bin cumulative histogram
ch[bin], which is compared with the ideal cumulative histogram (derived from the ideal
converter bin size, one LSB). The following operations check whether the line conforms to the
integral linearity limit and take corrective measures for the offending bins:
for i= 0 to M-1
    tap[i]= segment_from_simulation_of_typical_conditions;
for bin= 0 to M-2
    repeat until no_changes
        Characterisation step;
        if ( ch[bin]< LSB·( bin+1-limINL))
            for i= 0 to M-bin-2
                tap[bin+i+1]= tap[bin+i+1]+1;
        else
            if ( ch[bin]> LSB·( bin+1+limINL))
                for i= 0 to M-bin-2
                    tap[bin+i+1]= tap[bin+i+1]-1;
            else
                no_changes
Figure 12 illustrates the algorithm. The acceptable limit of the integral non-linearity is limINL. This limit must be chosen in accordance with the calibration steps
available. Limits on the order of 0.5LSB guarantee sufficient linearity and require only a
limited number of iterations per tap. The access point selected for each tap is stored in
tap[i].
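As an illustration only, the iterative loop can be sketched in Python against an idealised line model. The segment delays, the starting taps and the noise-free characterisation step (exact cumulative bin sizes instead of a Code Density Test) are assumptions made for this example, and the function names are hypothetical. The guard that refuses to revisit a previous setting follows the anti-oscillation safeguard suggested in this section.

```python
from itertools import accumulate


def tap_times(seg_delays, taps):
    """Arrival time at each selected access point (access point k lies after k segments)."""
    t = [0.0, *accumulate(seg_delays)]
    return [t[k] for k in taps]


def calibrate_iterative(seg_delays, start_taps, lsb, lim_inl):
    """Iterative tap-selection calibration on an idealised (noise-free) line model.

    Returns the final access point of each tap and the number of
    characterisation steps used.
    """
    taps = list(start_taps)
    m = len(taps)
    last_point = len(seg_delays)          # highest valid access-point index
    steps = 0
    for b in range(m - 1):                # bins 0 .. M-2
        seen = {taps[b + 1]}              # settings already tried for this bin
        while True:
            times = tap_times(seg_delays, taps)
            ch = times[b + 1] - times[0]  # cumulative size up to and including bin b
            steps += 1
            if ch < lsb * (b + 1 - lim_inl):
                delta = 1                 # bin too short: move taps one segment later
            elif ch > lsb * (b + 1 + lim_inl):
                delta = -1                # bin too long: move taps one segment earlier
            else:
                break                     # bin is within the INL limit
            new_first = taps[b + 1] + delta
            if new_first in seen or new_first < 0 or taps[-1] + delta > last_point:
                break                     # would oscillate or leave the line
            seen.add(new_first)
            for i in range(b + 1, m):     # shift this tap and every later tap
                taps[i] += delta
    return taps, steps


# Hypothetical "fast" line: 40 segments of 10 ps, taps initially too early.
taps, steps = calibrate_iterative([10.0] * 40, [0, 4, 8, 12, 16, 20, 24, 28],
                                  lsb=48.8, lim_inl=0.3)
print(taps, steps)
```

Because each correction only moves taps beyond the bin under analysis, a bin that has already been accepted is never disturbed again. This is why the sequential, bin-by-bin order suffices.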
[Flowchart: taps initialised to the typical-condition settings; for each bin, a CDT yields the cumulative histogram, which is compared with (bin+1-limINL)·LSB and (bin+1+limINL)·LSB; the taps beyond the bin are incremented or decremented, and the loop repeats until no changes occur.]
Figure 12: Calibration procedure for the tap selection adjustment scheme.
Figure 13 shows the results of a simulated calibration run using the proposed algorithm
with limINL=0.3LSB. The interpolation non-linearity is kept within the
established limits (0.3LSB). By construction, the algorithm does not search for the optimal
calibration parameters; it stops as soon as the non-linearity limits have been
achieved. For the line conditions shown, calibration required only 10 and 8
characterisation steps for the “fast” and “slow” parameter conditions, respectively.
[Two panels: non-linearity (LSB, -0.5 to 0.5) versus binRC (0 to 6) for the “typical”, “fast” and “slow” conditions.]
Figure 13: Results of calibration for different conditions, using the iterative algorithm (from simulation).
Using the typical calibration parameters as the starting point makes it likely that the
iteration starts close to the final result. In fact,
any starting point could be used, since it would only affect the speed of convergence of the
algorithm.
If tighter linearity limits are enforced, better results can be obtained. The
graphs in Figure 14 were obtained with limINL=0.1LSB for the “fast” conditions.
However, in worst case conditions (“slow”) this limit cannot be enforced, since the delay
line segments are longer than that limit. If the linearity limit is set too tight, then the
convergence of the simple algorithm proposed here may not be guaranteed. A simple way
to solve this problem is to prevent the algorithm from oscillating between two calibration
settings for any bin.
The DNL graphs obtained after calibration are also shown. They
emphasise that, owing to the regularity of the structure, the maximum DNL is smaller
than its theoretical limit (2·limINL).
[Two panels: non-linearity (LSB, -0.5 to 0.5) versus binRC (0 to 6) for the “typical”, “fast” and “slow” conditions.]
Figure 14: Results of calibration using the optimum linearity limit (from simulation).
Single step algorithm.
The first step of this algorithm is a detailed characterisation of the RC delay line,
in which the sizes of all the line segments are histogrammed. It is then possible to select the
tap access points that lead to the best interpolation linearity.
To characterise the 32 line segments into which the delay line is divided using only
the 8 taps available, 5 characterisation steps are needed. A small overlap between the
ranges covered by successive steps is required to guarantee that the segments at the extremities of
each range are also covered. After these 5 characterisation steps, all the information required to
build a cumulative histogram of the segment size is available, and it suffices to
compare this histogram with the ideal cumulative bin size curve to derive the desired
access points.
In the next few lines, an algorithm that finds the best possible calibration parameters
for the line, regardless of the particular conditions, is schematically presented. The
algorithm finds the tap access points that result in the nearest approximation to the ideal
cumulative bin size curve.
tap[0]=0 ;
for i=1 to M-1
    for segment=0 to 31
        if (ch[segment]< LSB·i & ch[segment+1]> LSB·i)
            if (LSB·i-ch[segment]< ch[segment+1]-LSB·i)
                tap[i]=segment ;
            else
                tap[i]=segment+1 ;
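The selection rule can be illustrated with a small Python sketch. It assumes the segment delays are already known exactly (in the thesis they come from the 5 characterisation steps), indexes access points by the number of segments before them, and uses a hypothetical function name. Instead of testing the two access points that bracket LSB·i, the sketch searches all access points for the nearest one. This gives the same choice because the cumulative delay is monotonic.

```python
from itertools import accumulate


def calibrate_single_step(seg_delays, m, lsb):
    """For each tap i, pick the access point whose cumulative delay is closest to i*LSB.

    Access point k lies after k segments, so its delay relative to
    access point 0 is ch[k].
    """
    ch = [0.0, *accumulate(seg_delays)]
    taps = [0]
    for i in range(1, m):
        target = lsb * i
        taps.append(min(range(len(ch)), key=lambda k: abs(ch[k] - target)))
    return taps


# Hypothetical line of 40 identical 10 ps segments, M = 8, LSB = 48.8 ps.
print(calibrate_single_step([10.0] * 40, 8, 48.8))
```

With this rule the residual INL of every tap is at most half of the local segment delay, which is why the single step algorithm reaches the best linearity the tap grid allows.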
Figure 15 shows the results of a simulated calibration of the delay line using this
algorithm. The emphasis on minimising the integral non-linearity of the line is
clearly seen in the graphs. The differential non-linearity is nonetheless kept within the
accepted limits for all conditions. The same simulation conditions as before were used.
Comparison with the results obtained using the iterative algorithm shows that, if the
linearity limits enforced with that algorithm are tight enough, similar results
are obtained, as would be expected.
[Two panels: non-linearity (LSB, -0.5 to 0.5) versus binRC (0 to 6) for the “typical”, “fast” and “slow” conditions.]
Figure 15: Results of calibration for different conditions (from simulation).
10.3.2. Hardware implementation.
Two variables determine the silicon area required to implement these calibration
algorithms: the amount of memory needed and the complexity of the calculations.
Both may be traded off for calibration time.
To determine the amount of memory needed, the number of hits n that must be
collected is obtained from the formula developed in Appendix D:
$$ n \ge \left( \frac{z_{\alpha/2}}{\beta} \right)^{2} \cdot \left( \frac{1}{p} - 1 \right) $$
We will consider the same tolerance (β=5% of the final bin) and confidence level
(98%, corresponding to α=2%) for both cases, so that the number of required hits depends
only on the size of the bin to be characterised. In the iterative procedure the bin to
be characterised corresponds to the interpolator’s LSB, with a hit probability
p=1/M=0.125. In the single step procedure all the line segments must be characterised,
regardless of the particular working conditions. The smallest bin that must be accurately
characterised is then ~10ps wide, corresponding to a hit probability
p=10/(LSB·M)=0.0256. The following table summarises the relevant numbers obtained
when these calculations are carried out. The tolerance for the INL measurements is
obtained using the expressions that were also developed in Appendix D.
Algorithm     Confidence   Tolerance (DNL)   Number of bins   Number of hits n    Tolerance (INL)
iterative     98%          5% (LSB)          7                <16383 (2^14)       13% (LSB)
single step   98%          11% (seg.)        32               <16383 (2^14)       62% (seg.)
Table 1: Comparison of the two proposed algorithms.
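As a worked check of the Appendix D formula, the sketch below evaluates it in Python with the values quoted in the text (α=2%, β=5%, p=1/8 and p=10/(LSB·M)). It uses the standard-library `statistics.NormalDist` for the quantile z_{α/2}. The helper names are hypothetical, and only the DNL tolerance is reproduced, because the INL expressions of Appendix D are not given here.

```python
from math import sqrt
from statistics import NormalDist


def z_value(alpha):
    """Two-sided standard normal quantile z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def required_hits(alpha, beta, p):
    """Minimum n for a bin of hit probability p to be measured within relative tolerance beta."""
    return (z_value(alpha) / beta) ** 2 * (1 / p - 1)


def tolerance(alpha, n, p):
    """Relative tolerance reached with n hits (the same formula solved for beta)."""
    return z_value(alpha) * sqrt((1 / p - 1) / n)


LSB, M = 48.8, 8                 # ps, interpolation factor
p_iter = 1 / M                   # iterative algorithm: one LSB bin
p_single = 10 / (LSB * M)        # single step: ~10 ps minimum segment

print(round(required_hits(0.02, 0.05, p_iter)))      # about 1.5e4, below 16383
print(round(tolerance(0.02, 2**14, p_single), 3))    # about 0.11 (the 11% in Table 1)
```

With n=2^14 the iterative case stays within the 5% target, while the much smaller 10 ps segments of the single step case only reach about 11%. This matches the DNL column of the table.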
In this table the tolerance is expressed as a fraction of the quantity being measured: one
LSB (~48.8ps) for the iterative algorithm and one minimum segment delay (~10ps) for the single step algorithm. The
same reasoning used to determine the tolerance of the INL measurements leads
to the conclusion that adding a number of line segments to obtain the calibrated
bin results in similar final DNL and INL measurement tolerances, expressed in LSB, for
both algorithms.
The register requirements for bin storage and histogram build-up for the two
algorithms are shown in Table 2. Each of these registers has an accumulator of equal length.
              Histogram               Cumulative histogram
Algorithm     number   size (bits)    number   size (bits)    Total (bits)
iterative     7        12             1        14             98
single step   32       11             1        14             366
Table 2: Register (accumulator) requirements for the two proposed algorithms.
The other comparison item is the complexity of the computation needed for each
algorithm. The iterative algorithm, as shown in Figure 12, requires only a few comparators
(see Table 3), one accumulator per bin, and a small amount of decision logic. The single
step algorithm needs a larger arithmetic unit, capable of performing the more complex
decisions required. The silicon area that it uses is therefore much larger than that of
the simple iterative algorithm.
Algorithm     Number   Size (bits)
iterative     2        14
single step   4        14
Table 3: Comparator requirements for the two proposed algorithms.
The time used by each calibration algorithm is, to a large extent, determined by the
hit collection time. The iterative algorithm does not have a fixed number of
characterisation runs, so its calibration time varies with the actual conditions found.
However, if the number of iterations is f, the time is proportional to f·2^14, whereas the
single step algorithm takes a time proportional to 5·2^14, the constant of
proportionality being the collection time of a single hit. It is therefore clear that only if more
than 5 characterisation steps are required (f > 5) is the iterative algorithm slower than
the single step algorithm.
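The time comparison reduces to simple arithmetic. The sketch below makes it explicit, assuming 2^14 hits per characterisation step (Table 1) and a hypothetical single-hit collection time `t_hit`. The value of `t_hit` cancels in the comparison.

```python
HITS_PER_STEP = 2 ** 14  # hits collected per characterisation step (Table 1)


def calibration_time(steps, t_hit):
    """Calibration time dominated by hit collection: steps * 2**14 * t_hit."""
    return steps * HITS_PER_STEP * t_hit


def single_step_is_faster(f):
    """Compare f iterative characterisation runs with the fixed 5 of the single step algorithm."""
    return calibration_time(5, 1.0) < calibration_time(f, 1.0)


# Step counts quoted above for the simulated "fast" and "slow" lines (10 and 8).
for f in (10, 8, 5, 4):
    print(f, single_step_is_faster(f))
```

For the simulated cases quoted above (10 and 8 steps), the single step algorithm would therefore calibrate faster, at the cost of the larger hardware shown in Tables 2 and 3.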
Chapter 11.
Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.
In this chapter an RC delay line adjustment scheme using banks of selectable
capacitors is analysed in detail. We follow the same analysis method that was
used for the tap selection adjustment scheme, concentrating only on the features
that differ from the previous chapter and referring to it for the common topics.
11.1. Lumped capacitor delay line.
In the lumped capacitor adjustable delay line scheme, the RC line is adjusted
by varying a lumped load. This load is an important contributor to the overall
delay; therefore the uniform line approximation previously used is no longer valid. A
slightly different set of design rules applies to this line:
Definition of the line width:
The width of the microstrip line is mainly defined by layout considerations. It
should result in a good compromise between two conflicting requirements. The line
should be kept wide enough to make dimensional uncertainties due to IC processing
small¹ and to lower the contact resistivity. However, it should be narrow enough that its
overall capacitance is small and small selectable capacitors can be used to adjust
its delay. The capacitance of the unit capacitor used is ~37.5fF; a line width of
40µm therefore results in an acceptable calibration sensitivity.
Definition of the number of access points:
The number of access points is predefined by the intended interpolation factor M.
The number of required access points is M=8, corresponding to M-1=7 line segments. The
RC line dimensions must match the dimensions of the output buffer and associated delay
adjustment circuitry. Each of the 7 segments into which the line is divided consists of a polysilicon
microstrip line and a metal1 spacer.
¹ This condition is not strictly necessary, since any delay mismatch due to these uncertainties can be
corrected during calibration. However, to allow the calibration parameters derived for one
channel to be used in several channels, it is convenient to minimise the mismatch between delay lines.
As will be shown later, the last taps along the line have a smaller adjustment
sensitivity (see Figure 5), since their delay can only be adjusted by varying the capacitors
ahead of them. To extend the adjustment range of the last taps, an extra adjustment point is
introduced after the last access point. For reasons of symmetry of the timing
characteristics of the line, this adjustment point is treated as another access point.
The number of access points implemented is therefore M+1=9, the line being divided into 8
segments.
Definition of the line length:
The total line length is defined as for the previous scheme. A total delay of ~350ps,
corresponding to 7 segments of 48.8ps, must be covered regardless of the operating
conditions. A parametric simulation of the complete interpolator model was again used to
obtain the correct overall delay. However, since identical segments are used, the delay of
each line segment changes considerably along the line.
Given a 50µm pitch for the adjustment circuitry and typical working conditions,
the required overall line delay is obtained when each segment consists of a polysilicon
microstrip 35µm long and a metal1 spacer 15µm long. The middle calibration parameters
are used, resulting in a capacitance of ~150fF connected to each adjustment point.
Adjustment of the delay of the line segments:
The distributed line parameters are not dominant in this scheme; therefore the
procedure previously used is not accurate. It gives a rough first approximation that
should be refined by means of parametric simulations. These simulations include the
lumped capacitors that make up the calibration scheme. The calibration is performed by
adding or removing ~37.5fF unit capacitors from a bank with a middle capacitance of
~150fF.
Figure 1 shows the multiplication factors obtained by applying the previously developed
transformation function to this line, together with the factors actually implemented.
[Plot: adjustment function value (0 to 7) versus segment (0 to 7); series “Calculated”, “Actual” and “Buffer Pitch”.]
Figure 1: Adjustment function values (calculated and actually implemented).
The size of the RC line bins after the adjustment has been performed is shown in
Figure 2. The effects of the parameter spread due to IC processing are clearly visible in
the first graph. In the second graph only the environment conditions are changed; the
calibration parameters are the same for all conditions. This demonstrates that extreme
environment conditions cause only minor variation of the delay.
Other considerations:
The same considerations developed for the previous scheme lead to the inclusion of
leading and trailing sections 210µm long. A longer leading section would reduce the
effect of varying input signal characteristics (due to environment changes) on the delay
line. However, the driver capabilities would be unnecessarily stretched by the resulting
increase in output load.
[Two panels: bin size versus binRC (0 to 6). First panel: “typical”, “slow” and “fast” design corners. Second panel: 5V/25C, 4.5V/100C and 5.5V/0C.]
Figure 2: Bin size (from simulation). The first graph compares different design corners. The second graph shows the effects of extreme environment variation for the typical process.
11.1.1. Lumped capacitor selection circuitry.
The variable capacitor implemented at each of the 9 access points is a
bank of 7 unit-sized capacitors that can be selectively connected to the RC delay line. The
number of bank capacitors connected to the line is selected with a binary
code. It is therefore possible to select 8 discrete capacitance levels, resulting in a
±3 level selection range.
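The binary-weighted selection can be sketched as follows. This is an illustrative model only: it assumes the nominal ~37.5fF unit value quoted earlier, ignores the fixed capacitance of the output buffer and the pass-transistor diffusions, and uses a hypothetical function name. The bits cal<0>, cal<1> and cal<2> switch in 1, 2 and 4 unit capacitors.

```python
C_UNIT_FF = 37.5  # nominal unit capacitance quoted in the text (fF)


def bank_capacitance(cal0, cal1, cal2, c_unit=C_UNIT_FF):
    """Capacitance (fF) switched onto an access point by the binary-weighted bank.

    cal<0>, cal<1> and cal<2> select 1x, 2x and 4x unit capacitors; the fixed
    buffer and diffusion capacitance is not included.
    """
    units = cal0 * 1 + cal1 * 2 + cal2 * 4
    return units * c_unit


levels = sorted({bank_capacitance(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)})
print(levels)                       # 8 levels, in steps of one unit capacitor
print(bank_capacitance(0, 0, 1))    # middle setting, 4 units (~150 fF)
```

Using 7 identical unit capacitors rather than three scaled ones keeps the 1:2:4 weighting set by device count, which is what guarantees good matching of the levels.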
The capacitor bank is shown schematically in Figure 3. Each capacitor is a square
16µm² PMOS device working in accumulation mode. This mode of operation results in a
more linear and faster capacitor, since the charge accumulated under the gate is
immediately available. The temperature and supply voltage sensitivity of devices
operating in accumulation mode is very low. Furthermore, the n-well in which they are
built increases their isolation from substrate noise. In typical conditions, each of these
capacitors has ~37.2fF of capacitance. Unit-sized capacitors are used instead of scaled
single capacitors to guarantee good matching of their values.
[Schematic: at the line node (from the line, to the hit registers), unit capacitors 1x, 2x and 4x selected by cal<0>, cal<1> and cal<2>.]
Figure 3: The unit capacitor bank.
The capacitors are selected using an NMOS pass-transistor. This transistor is
sized to have a high source-drain conductance. The conductance of a device is sensitive to
temperature and supply variations: the quantity of thermally generated carriers and the
saturation velocity of the carriers in the channel are functions of the device temperature,
and the electric field across the channel is a function of the gate voltage, itself proportional to
the supply voltage. The conductance of the pass-transistor must be high enough to
minimise the effects of these variations.
The pass-transistor turns off during part of the signal excursion. In fact, as the input
signal rises and the voltage on the gate of the capacitor follows, the Vgs of the pass-transistor is reduced. When it becomes smaller than the threshold voltage Vth, the pass-transistor
turns off, isolating the line from the adjustment capacitor. This, however,
does not affect the timing characteristics of the line, since it occurs well after the threshold
voltage of the output buffer has been crossed. The signal edge progressing towards the
following taps is not affected by variations of the line characteristics occurring in the
section of the line already crossed. In addition to the bank capacitance, a fixed capacitance
due to the output buffer and to the diffusions of the pass-transistors is also connected to
the line.
[Schematic: R-C delay line access points (taps), each with a variable capacitor bank configured through a serial selection chain.]
Figure 4: The lumped capacitor selection circuitry.
Figure 4 shows the lumped capacitor selection circuitry. Each capacitor bank is
represented by a variable capacitor. The capacitor bank connected to tap0 is included only
for layout symmetry, since it does not affect the tap delay. This adjustment
scheme gives a compact layout, the selection circuitry of each tap requiring only 6620µm²
of silicon, resulting in a total area of 0.25mm².
11.2. Auto calibration circuitry.
The auto calibration procedure follows the same basic steps previously described. It
starts by characterising the line and proceeds to tune the calibration parameters in order to
make the integral and differential non-linearity of the line smaller than a pre-determined
limit.
In this scheme the sensitivity of the delay between two taps (bin size) to a unit
variation in a given capacitor bank is a complex function of the distance between the tap
and the capacitor bank being changed, and of the position of the capacitor bank within the
line. The graph in Figure 5 summarises the tap delay sensitivity to a unit change in each
capacitor bank. It is not practical to identify all combinations of bin size sensitivity in all
environment conditions. The calibration procedure must therefore be able to tune the size
of the bin without this knowledge. An iterative procedure that follows the two-step
characterisation/tuning scheme is proposed to obtain the correct calibration parameters.
[Plot: bin size variation versus binRC (0 to 6) for a unit change in each capacitor bank (cap 1 to cap 8) and in all banks together (“all”).]
Figure 5: The effects of lumped capacitor unit variation in the bin size (from simulation).
The adjustment capacitor banks (cap1-7) are located at tap1-7, respectively, and an
extra capacitor bank (cap 8) is included at the end of the line to enable a wider tuning
range of the last tap. The graph in Figure 5 shows that the sensitivity of the bin size
increases as the varying capacitor gets closer to it, and that the cumulative effect of a unit
variation in all capacitor banks is nearly independent of the bin under consideration.
The graph also shows that capacitor variations occurring before the bin under
consideration do not change its size. The reason is that the properties of a signal
propagating on an RC line are dominated by the characteristics of the section of the line
that lies ahead of it. There is a small contribution from the line section behind it through
signal attenuation and edge slope degradation. However, for the short line under
consideration, these effects are small.
11.2.1. Calibration algorithm.
The starting point of the algorithm is the delay histogram obtained after running the
characterisation step using the smallest capacitor selection in every bank. With these
calibration settings the overall delay of the line and of the individual bins is guaranteed to
be shorter than the required delay, regardless of the operating conditions.
The calibration sequence tunes the delay line to the linearity limits using
two procedures in sequence. In the coarse tuning procedure, the overall line delay is
increased until it is close to the desired delay. The subsequent fine tuning procedure
individually adjusts the delay of each tap to make them conform to the linearity
requirements. This sequence is preferred to using the fine tuning procedure alone
because it results in faster convergence and, therefore, better results.
Coarse tuning procedure.
In this procedure the capacitance of all the banks is simultaneously incremented by one
unit capacitor, resulting in a uniform increase of the size of all bins. The procedure is
repeated until the cumulative bin size falls short of the ideal delay by less than a
predetermined limit limcoarse, which is set to 1LSB. The procedure is
schematically described in the following lines:
for bank= 1 to M
    cap[bank]= 0;
repeat until no_changes
    Characterisation step;
    if ( ch[M-2]< LSB·( M-1-limcoarse ))
        for bank= 1 to M
            cap[bank]= cap[bank]+1;
    else
        no_changes
The calibration parameters for each capacitor bank are described by cap[bank], and
ch[ ] is the cumulative bin size histogram, ch[M-2] being the cumulative size of all M-1 bins.
A block diagram of the algorithm is shown in Figure 6, where the characterisation step is
represented by the Code Density Test (CDT) it performs.
When coarse tuning has been completed, the size of each bin is similar for all bins
in the line, to the extent allowed by their matching. The delay error is therefore evenly
divided among all the bins. The average differential non-linearity is then small, so the
fine tuning procedure can concentrate mainly on adjusting the integral non-linearity.
[Flowchart: from the initial calibration, a CDT yields cumulative histogram[M-2]; while it is below (M-1-limcoarse)·LSB, every bank is incremented (cap[bank]=cap[bank]+1, changes=1); repeat until changes=0.]
Figure 6: The coarse calibration procedure.
Fine tuning procedure.
After coarse delay tuning, the fine tuning procedure can be applied. It builds on the
results obtained with the coarse procedure: each bin is sequentially evaluated to determine
whether it adheres to the linearity limits, and a new set of calibration parameters is
iteratively determined to adjust the line delay. If a bin does not meet the limits, the
capacitance of the respective capacitor bank is increased by one unit. This unit increase is
repeated for the subsequent banks, one at a time, until a satisfactory result is obtained.
Changing the capacitance of a capacitor bank affects all the bins located
before it in the line. However, since the coarse adjustment step guarantees that the
line is shorter than the ideal line and that all the bins have similar size, this effect
helps to improve the linearity of the line.
The fine calibration algorithm is schematically presented in the next few lines.
limINL is the differential and integral linearity limit.
for bin= 0 to M-2
    bank= bin+1;
    repeat until ( no_changes | bank> M)
        Characterisation step;
        if( ch[bin] < LSB·( bin+1-limINL ))
            cap[bank]= cap[bank]+1;
            bank= bank+1;
        else
            no_changes
This algorithm approaches the final calibration solution by small increases in the
size of the bin; therefore only the lower linearity limits need to be checked. The
tap delay increase per fine characterisation step is not large enough to exceed the upper
linearity limits in any conditions. A diagram of the fine calibration algorithm
is shown in Figure 7.
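To show how the coarse and fine procedures fit together, the sketch below runs both on a toy linear line model. The model is entirely hypothetical: 40 ps base bins, and a bank that only affects the bins before it, with 3.0, 1.5 and 0.5 ps per unit capacitor for the adjacent, next and further bins. It replaces the simulated sensitivities of Figure 5, and the characterisation step is idealised as an exact measurement. The function names and the saturation guard at 7 units are also illustrative assumptions.

```python
from itertools import accumulate

CAP_MAX = 7  # 3-bit bank: 0..7 unit capacitors


def toy_bins(cap, m=8, base=40.0):
    """Hypothetical linear line model: bin sizes (ps) for bank settings cap[1..m].

    A bank only affects the bins before it; the assumed sensitivity is
    3.0 ps/unit for the adjacent bank, 1.5 for the next one and 0.5 beyond.
    """
    sens = {1: 3.0, 2: 1.5}
    return [base + sum(sens.get(j - b, 0.5) * cap[j] for j in range(b + 1, m + 1))
            for b in range(m - 1)]


def coarse_tune(characterise, m, lsb, lim_coarse):
    """Increment every bank until the total delay is within lim_coarse LSB of ideal."""
    cap = [0] * (m + 1)                  # cap[0] unused, banks 1..m
    steps = 0
    while True:
        ch = list(accumulate(characterise(cap)))
        steps += 1
        if ch[m - 2] >= lsb * (m - 1 - lim_coarse) or max(cap[1:]) == CAP_MAX:
            return cap, steps
        for bank in range(1, m + 1):     # one more unit capacitor in every bank
            cap[bank] += 1


def fine_tune(characterise, cap, m, lsb, lim_inl):
    """Raise banks bin+1, bin+2, ... one unit at a time until each bin meets the lower INL limit."""
    cap = list(cap)
    steps = 0
    for b in range(m - 1):               # bins 0..M-2
        bank = b + 1
        while bank <= m:
            ch = list(accumulate(characterise(cap)))
            steps += 1
            if ch[b] >= lsb * (b + 1 - lim_inl):
                break                    # only the lower limit is checked
            if cap[bank] < CAP_MAX:
                cap[bank] += 1
            bank += 1
    return cap, steps


cap, n_coarse = coarse_tune(toy_bins, 8, 48.8, lim_coarse=1.0)
cap, n_fine = fine_tune(toy_bins, cap, 8, 48.8, lim_inl=0.2)
print(cap[1:], n_coarse, n_fine)
```

In this toy run the coarse step leaves every bin slightly short, and the fine step only needs to add a unit capacitor to two banks. This mirrors the argument that the coarse step makes the remaining corrections small and one-sided.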
[Flowchart: starting from the coarse calibration, for each bin (bank=bin+1) a CDT yields cumulative histogram[bin]; while it is below (bin+1-limINL)·LSB, cap[bank] is incremented and bank advanced (changes=1); repeat until changes=0 or bank>M.]
Figure 7: The fine calibration procedure.
Figure 8 shows the results of the coarse calibration step for
different simulation conditions. Since calibration started from the shortest possible line
configuration, all the bins are smaller than intended, and the expected downward slope of
the INL curve is observed.
[Two panels: non-linearity (LSB) versus binRC (0 to 6) for the “typical”, “fast” and “slow” conditions.]
Figure 8: Results of the coarse calibration step for different conditions using the proposed algorithm (from simulation).
Using restrictive limits in the fine calibration steps, an optimised calibration can be
obtained. Figure 9 shows the results of the fine calibration step, with the linearity limit
limINL set to 0.1LSB. In extreme conditions this limit proves to be too strict for the
simple algorithm proposed. However, the linearity of the line after calibration is better
than 0.2LSB in all conditions.
[Two panels: non-linearity (LSB, -0.5 to 0.5) versus binRC (0 to 6) for the “typical”, “fast” and “slow” conditions.]
Figure 9: Results of the fine calibration for different conditions using restrictive linearity limits (from simulation).
The numbers of calibration steps required for these conditions were 12, 15
and 7, respectively, for the “typical”, “fast” and “slow” simulation conditions. Since
the calibration algorithm begins with the calibration settings resulting in the fastest
possible RC delay line, the simulation conditions that lead to a slower starting point
(“slow” conditions) require fewer calibration steps to converge to the final calibration
settings.
11.2.2. Hardware implementation.
This calibration algorithm is quite similar to the iterative algorithm proposed for the
tap selection implementation of the line. The hardware requirements are also similar, since
the number of taps to tune and the number of hits that must be collected for line
characterisation are the same. The following tables summarise the hardware requirements in
terms of registers (and respective accumulators) and comparators. Requirements in terms
of control logic are similar to those of the iterative calibration algorithm proposed for the tap
selection adjustment scheme.
              Histogram               Cumulative histogram
Algorithm     number   size (bits)    number   size (bits)    Total (bits)
iterative     7        12             1        14             98
Table 1: Register (accumulator) requirements for the present algorithm.
Algorithm     Number   Size (bits)
iterative     1        14
Table 2: Comparator requirements for the present algorithm.
11.3. Comparing the two adjustment schemes.
A simple comparison of the two adjustment schemes proposed in this part of the
thesis shows that it is possible to adjust the linearity of the RC delay line to the desired
values. Although the calibration aims at obtaining a small integral non-linearity, the
differential non-linearity achieved under all simulation conditions is also small.
Simulations show that the lumped capacitor scheme leads to better final results.
However, these results are obtained at the expense of a longer calibration time.
Because the calibration of each tap is independent, the calibration principle of
the tap selection scheme is simple. The linearity that can be achieved with the
RC delay line is limited by the number of access points that are implemented.
In the lumped capacitor scheme, the calibration of each tap is not independent: changing
one capacitor bank affects several taps differently. The calibration algorithm takes all
these effects into account; its working principle is therefore more complex and the calibration time is
longer. However, owing to the multiple combinations of effects that can be exploited, the final
RC delay line linearity is potentially better.
Chapter 12.
Experimental Results.
In this chapter the results of tests performed on the TDC prototype are reported.
The test procedure followed is very similar to the one detailed for the ADLL prototype in
the previous part of this work, so it will not be described again. The performance of the
two interpolation topologies will be shown separately. Their evaluation follows the same
criteria: linearity, temperature sensitivity, power dissipation and timing resolution.
The calibration algorithms for the RC delay line were implemented in software,
which has the advantage of allowing high flexibility. For example, the calibration
limits can easily be adjusted to the performance required. To generate the random hits,
both an external pulse generator and an internal oscillator were used, without any
noticeable difference. A set of 600,000 random hits is used to characterise the delay line.
According to the calculations in Appendix D, this results in a 98% confidence
level that the measured results are correct within a tolerance of 0.8% (DNL) and 2.2%
(INL). It should be noted that when characterising the complete converter, a tolerance of
3.4% (DNL) and 19.4% (INL) is obtained for the same confidence level.
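The DNL tolerances quoted for 600,000 hits can be reproduced from the Appendix D relation, solved for β. The Python sketch below assumes 8 equiprobable bins for the RC line (p=1/8) and 128 bins for the complete converter (p=1/128), as suggested by the converter plots. The INL tolerances depend on further Appendix D expressions that are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def tolerance(alpha, n, p):
    """Relative DNL tolerance for n hits on a bin of probability p (two-sided, level 1-alpha)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt((1 / p - 1) / n)


n_hits = 600_000
print(f"RC line, 8 bins:      {tolerance(0.02, n_hits, 1 / 8):.1%}")
print(f"converter, 128 bins:  {tolerance(0.02, n_hits, 1 / 128):.1%}")
```

The printed values should round to the 0.8% and 3.4% stated above, which supports the 128-bin assumption for the complete converter.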
12.1. Tap selection scheme.
The graphs in Figure 1 illustrate the results of the calibration of the delay line
implementing the tap selection adjustment scheme, obtained using the iterative calibration
algorithm. The graph labelled “before” represents the state of the line before calibration,
when the calibration parameters resulting from simulations of the typical
conditions are used. The other graph (labelled “after”) is the final result of the calibration.
Differential and integral non-linearity of the RC delay line better than ±0.2LSB is
achieved.
It is noteworthy that the linearity of the line prior to calibration is close to the
traditional 0.5LSB acceptance limit, which shows that the models used to describe the
delay line are quite accurate.
[Two panels: DNL and INL (LSB) versus binRC, each comparing the line “before” and “after” calibration.]
Figure 1: Delay line calibration results: DNL and INL graphs.
The DNL graph is repeated in Figure 2, together with the maximum and minimum
delay measured for each tap in every hit register column. The spread in the measured
delay results from timing mismatch of the hit registers corresponding to the same tap. It
shows a maximum timing error spread of ~0.55LSB (27ps). The delay of the last tap (tap
8) is defined by the difference between the propagation delay of the hit signal along the
RC delay line and the propagation delay of the clock signal along one DLL delay cell. The
variation of its delay includes, therefore, a contribution from the delay mismatch of the
DLL delay cells, which cannot be distinguished from the other contributions. Therefore it
is not shown in this graph.
[Plot: DNL (LSB) versus binRC (0 to 7), showing the minimum (“min”), maximum (“max”) and average (“ave”) over the hit register columns.]
Figure 2: Spread of the RC line tap delay over the DLL cells.
The RC delay line was measured at several temperature conditions to verify its
immunity to temperature changes. The results are shown in the graphs of Figure 3.
The circuit was heated to the specified temperatures using a heat source that could be moved closer to or further away from the circuit. The temperature was measured directly on the package using an electronic thermometer. Only after the selected temperature had stabilised was the characterisation performed. A different chip was used in this test; therefore, the linearity graphs have different shapes from the ones previously shown. However, it is clear that the calibration procedure also resulted in good RC delay line linearity.
Chapter 12: Experimental Results.
Figure 3: Temperature dependency of the RC delay line (DNL and INL at 30 °C, 40 °C, 50 °C and 60 °C; horizontal axis: binRC).
The delay variation of the complete line is measured from the variation of the delay of the last tap. This method is valid since the last tap is bounded at one end by the temperature-independent delay of the DLL delay cell: any variation of the delay of the line is reflected in an equal and opposite variation of the delay of the last tap. A total variation of 17.3% of an LSB is observed for a temperature increase of 30 °C, which means that the delay of each RC line tap increased on average by ~2.5%. This result can be extrapolated to the complete temperature range, resulting in a temperature sensitivity of only 0.83% per 10 °C.
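The arithmetic behind these figures can be reproduced with a few lines of Python. This is a minimal sketch under one assumption that the text does not spell out: the RC line segment preceding the last tap spans 7 taps of about 1 LSB each, so a relative change x in every tap shifts the last tap by 7x LSB in the opposite direction. The function name `rc_tap_sensitivity` is hypothetical.

```python
def rc_tap_sensitivity(last_tap_change_lsb, n_rc_taps, delta_t_celsius):
    """Relative delay change per RC tap, total and normalised to 10 degC.

    Assumption (hypothetical model): the last tap equals the temperature-
    independent DLL cell delay minus n_rc_taps RC taps of ~1 LSB each, so the
    whole change of the last tap mirrors the accumulated change of those taps.
    """
    per_tap = last_tap_change_lsb / n_rc_taps    # fraction of one nominal tap (1 LSB)
    per_10c = per_tap * 10.0 / delta_t_celsius   # linear extrapolation to 10 degC
    return per_tap, per_10c


if __name__ == "__main__":
    # Values quoted in the text: 17.3% of an LSB for a 30 degC increase.
    per_tap, per_10c = rc_tap_sensitivity(0.173, 7, 30.0)
    print(f"change per tap: {per_tap:.2%}, per 10 degC: {per_10c:.2%}")
```

With the quoted 17.3% this gives about 2.47% per tap and 0.82% per 10 °C, matching the ~2.5% and 0.83% of the text once the rounded per-tap value is carried through (2.5%/3 ≈ 0.83%).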
Voltage supply sensitivity was also investigated. The procedure used was to
characterise the delay line at different supply levels, within the allowed range for the
technology. No significant delay variation was observed.
12.1.1. The complete interpolator.
The RC delay line is an integral part of the time interpolator, and the correct integration of the two is demonstrated by the linearity graphs of the time-to-digital converter built from them. The graphs of Figure 4 correspond to the DNL and INL of the converter.
Figure 4: DNL and INL graphs of the converter (using the tap selection adjustable delay line; horizontal axis: bin).
A maximum integral non-linearity INLmax=1.12LSB and a maximum differential non-linearity DNLmax=0.72LSB were measured. The non-linearity results from the delay mismatch of the DLL cells; consequently, as shown in the graphs, it is concentrated in the taps corresponding to the transitions between successive DLL delay cells. The measured DLL delay cell mismatch is 3-4% (RMS), slightly larger than expected. It was shown in Chapter 6 that the contribution of the DLL cell mismatch σDNL,DLL to the converter non-linearity σINL,conv is given by the following expression, where M is the number of RC delay line segments and N the number of DLL delay cells:

$$\sigma_{INL,conv} = \sigma_{DNL,DLL} \cdot M \cdot \frac{\sqrt{N}}{2}$$

Therefore, disregarding the contribution of the RC delay line, a maximum converter non-linearity of 0.5LSB (50%) requires a DLL cell mismatch smaller than 3.1%. Since this matching level was not obtained, the integral non-linearity of the interpolator is larger than the goal of ±0.5LSB.
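The relation σINL,conv = σDNL,DLL · M · √N/2 can be checked numerically. The Python sketch below evaluates it for the M=8 segment RC line and assumes N=16 DLL cells (inferred from the 16 DLL bins plotted in Figure 5, not stated in this section); it recovers the 3.1% mismatch limit and adds a small Monte Carlo check of the √N/2 factor. The Monte Carlo model (Gaussian cell delays, rescaled so that the locked DLL always spans exactly N nominal cells) is an illustrative assumption, and all function names are hypothetical.

```python
import math
import random


def sigma_inl_converter(sigma_dnl_dll, m_segments, n_cells):
    """sigma_INL,conv = sigma_DNL,DLL * M * sqrt(N) / 2, in converter LSB."""
    return sigma_dnl_dll * m_segments * math.sqrt(n_cells) / 2.0


def max_cell_mismatch(target_inl_lsb, m_segments, n_cells):
    """Largest RMS DLL cell mismatch compatible with a target converter INL."""
    return target_inl_lsb / (m_segments * math.sqrt(n_cells) / 2.0)


def monte_carlo_mid_inl(sigma, n_cells, trials=20000, seed=1):
    """RMS INL (in DLL LSB) at the middle of a locked DLL, by simulation.

    Cell delays are 1 + Gaussian error; the DLL loop is modelled by rescaling
    them so that they always add up to exactly n_cells nominal delays. The
    accumulated error is then zero at both ends and largest mid-chain.
    """
    rng = random.Random(seed)
    mid = n_cells // 2
    acc = 0.0
    for _ in range(trials):
        delays = [1.0 + rng.gauss(0.0, sigma) for _ in range(n_cells)]
        scale = n_cells / sum(delays)
        inl_mid = scale * sum(delays[:mid]) - mid
        acc += inl_mid * inl_mid
    return math.sqrt(acc / trials)


if __name__ == "__main__":
    M, N = 8, 16  # M from the text; N = 16 is an assumption (Figure 5 bins)
    print(f"mismatch limit for 0.5 LSB: {max_cell_mismatch(0.5, M, N):.2%}")
    for s in (0.03, 0.04):  # measured mismatch range, 3-4% RMS
        print(f"sigma_DNL,DLL = {s:.0%} -> sigma_INL = {sigma_inl_converter(s, M, N):.2f} LSB")
    sim = monte_carlo_mid_inl(0.031, N)
    print(f"Monte Carlo: {sim:.4f} DLL LSB vs sqrt(N)/2 * sigma = {0.031 * math.sqrt(N) / 2:.4f}")
```

For the measured 3-4% mismatch the expression gives σINL ≈ 0.48-0.64 LSB under these assumptions.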
Figure 5: INL of the DLL, showing spread of the tap delay along the hit register rows (max, min and ave curves; horizontal axis: binDLL).
The integral non-linearity graph of the DLL is shown in Figure 5. The spread of the
DLL tap delay along the eight hit register rows is also shown. The delay difference
between these eight samples of the DLL is due to the mismatch of the hit registers, which
leads to different sampling times for each tap in different rows. The maximum spread that
was observed is 0.06LSBDLL (~25ps), which corresponds to 0.51LSB. This result agrees
with what was previously obtained from the different samples of the RC delay line (see
Figure 2).
In Figure 6 the integral non-linearity graph of the interpolator is superimposed on
the one of the DLL. The interpolator closely follows the DLL non-linearity, as would be
expected since the non-linearity of the RC delay line can only accumulate along its limited
length.
Figure 6: Comparison of the INL graphs of the DLL and of the complete converter (converter INL versus bin; DLL INL versus binDLL).
A statistical test such as the code density test just described is, by its nature, insensitive to random effects. This is an advantage when static characteristics are being measured. However, it is important to verify that none of the random noise mechanisms, such as electrical noise or phase noise (jitter), significantly degrades the dynamic characteristics of the converter. The effects of clock-correlated noise can be interpreted as a static degradation mechanism, since they interact with the measurement in the same way every reference period. They are, therefore, captured by the statistical tests.
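The insensitivity of the code density test to random effects can be illustrated with a short simulation. The Python sketch below assumes a converter whose bins (with arbitrary, illustrative widths) cover one reference period, hits uniformly distributed over that period, and Gaussian timing noise wrapped onto the period. Adding such noise leaves the hit distribution uniform, so the estimated bin widths do not change. The function `code_density_widths` and all numbers are hypothetical.

```python
import bisect
import random


def code_density_widths(widths, n_hits, jitter_rms=0.0, seed=0):
    """Estimate bin widths (fractions of the period) with a code density test.

    widths:     true bin widths of a hypothetical converter, summing to 1.
    jitter_rms: RMS random timing noise added to every hit, in periods.
    """
    rng = random.Random(seed)
    upper_edges = []
    edge = 0.0
    for w in widths:
        edge += w
        upper_edges.append(edge)
    counts = [0] * len(widths)
    for _ in range(n_hits):
        t = rng.random()                                # hit uncorrelated with the clock
        t = (t + rng.gauss(0.0, jitter_rms)) % 1.0      # random noise, wrapped onto the period
        # min() guards against rounding placing t at or just above the last edge.
        code = min(bisect.bisect_right(upper_edges, t), len(widths) - 1)
        counts[code] += 1
    return [c / n_hits for c in counts]


if __name__ == "__main__":
    widths = [0.14, 0.11, 0.125, 0.13, 0.12, 0.125, 0.115, 0.135]  # illustrative mismatch
    quiet = code_density_widths(widths, 100_000, jitter_rms=0.0, seed=1)
    noisy = code_density_widths(widths, 100_000, jitter_rms=0.05, seed=2)
    for w, q, n in zip(widths, quiet, noisy):
        print(f"true {w:.3f}  no jitter {q:.3f}  with jitter {n:.3f}")
```

Both estimates scatter around the true widths by the same counting noise only, which is why the dynamic effects of jitter have to be measured separately with a time sweep.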
Figure 7: Conversion error (σ=0.51LSB) (histogram; horizontal axis: error (LSB)).
A linear time sweep covering the complete clock period was performed. During this test 26,000 samples were collected, corresponding to 10 samples per step of ~2.4ps. The histogram of Figure 7 represents the conversion error along the full dynamic range of the interpolator. The distribution of the error has an RMS of σ=0.51LSB, with tails extending to ~1.5LSB. The same test was performed at different temperatures to verify that the conversion error is not affected by temperature variations. The resulting histograms, displayed in Figure 8, show only minimal temperature sensitivity: the RMS error changes by ~1.3% per 10 °C.
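As a reference for these numbers, an ideal quantiser swept linearly across its range has an RMS error of 1/√12 ≈ 0.29 LSB. The Python sketch below reproduces the sweep geometry described above (2,600 steps of ~2.4ps, i.e. ~0.049 LSB of 48.8ps, covering about 128 LSB) for an ideal converter only; jitter, non-linearity and the 10 repeated samples per step are deliberately left out, and the function name is hypothetical.

```python
import math


def ideal_sweep_rms_error(n_steps, step_lsb, start_lsb=0.0):
    """RMS conversion error of an ideal quantiser over a linear time sweep.

    The ideal converter outputs floor(t); the error is measured from the
    centre of the reported bin, so it is spread over +/-0.5 LSB.
    """
    acc = 0.0
    for k in range(n_steps):
        t = start_lsb + k * step_lsb
        err = (math.floor(t) + 0.5) - t
        acc += err * err
    return math.sqrt(acc / n_steps)


if __name__ == "__main__":
    step = 2.4 / 48.8                        # ~2.4 ps step expressed in LSB
    rms = ideal_sweep_rms_error(2600, step)  # 26,000 samples / 10 per step
    print(f"ideal quantiser: {rms:.3f} LSB (1/sqrt(12) = {1 / math.sqrt(12):.3f})")
    # If the remaining error sources are uncorrelated with quantisation, they
    # account for roughly sqrt(0.51^2 - 1/12) of the measured 0.51 LSB.
    print(f"other contributions: ~{math.sqrt(0.51 ** 2 - 1 / 12):.2f} LSB")
```

Under that uncorrelated-sources assumption, about 0.42 LSB of the measured error is attributable to non-linearity, jitter and other effects rather than to quantisation.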
Figure 8: Temperature effects on the conversion error (σ=0.50LSB at 30 °C and σ=0.52LSB at 60 °C; horizontal axis: error (LSB)).
It may be interesting to evaluate the dynamic performance of the DLL itself, to
understand its contribution to the overall conversion error. In Figure 9 the characteristic
step-wise transfer function that results from a DLL time sweep is shown. Phase noise
(jitter) present on the reference clock itself, or due to the dynamics of the DLL, forces the
output code transitions to jitter around their average value. If a number of DLL samples is
taken close to the expected transition time, the output will vary between the two codes due
to jitter. Variations due to the test set-up, such as small changes of the sampling time
itself, are also included in this result, since they are indistinguishable from variations due
to intrinsic jitter.
This test enables the measurement of the DLL’s internal jitter. The time interval over which the output code is uncertain corresponds to the peak-peak jitter of that transition. The maximum uncertainty is expected in the last transition. This can be verified in the graphs of Figure 10, which show the two code transitions occurring at opposite ends of the delay chain (see note 1).
Figure 9: DLL linear time sweep (output code versus step).
Note 1: Tap 0 was implemented at the end of the delay chain; therefore, it is the tap with the worst jitter. For convenience, bin 15 is renamed bin –1.
The second graph in that figure is a magnification of the transition from code –1 to
0, representing the jitter at tap 0. The “trend” line in that graph represents the relative
number of samples in the two consecutive codes. From this curve, the average transition
instant can be extracted and so the deviation of the transition occurrence (the jitter) is
readily obtained.
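One way to extract the mean transition instant and the RMS jitter from such a trend curve is to treat it as a cumulative distribution: the increase of the trend between consecutive sweep steps is the probability that the transition falls within that step. The Python sketch below does this, with Sheppard's correction for the finite step. It is a hypothetical illustration, not necessarily the procedure used for Figure 10, and the 20 ps transition instant and 3 ps Gaussian jitter are invented test values.

```python
import math


def transition_from_trend(times_ps, frac_high):
    """Mean instant and RMS jitter of a code transition from its trend curve.

    times_ps:  uniformly spaced sweep positions.
    frac_high: fraction of samples reading the higher code at each position;
               for random jitter it rises from 0 to 1 like a CDF.
    """
    step = times_ps[1] - times_ps[0]
    mids = [(a + b) / 2.0 for a, b in zip(times_ps, times_ps[1:])]
    weights = [b - a for a, b in zip(frac_high, frac_high[1:])]  # P(transition in step)
    total = sum(weights)
    mean = sum(m * w for m, w in zip(mids, weights)) / total
    var = sum((m - mean) ** 2 * w for m, w in zip(mids, weights)) / total
    var -= step * step / 12.0  # Sheppard's correction for grouping into steps
    return mean, math.sqrt(max(var, 0.0))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    times = [2.4 * k for k in range(17)]               # 2.4 ps sweep step
    trend = [normal_cdf(t, 20.0, 3.0) for t in times]  # ideal, noise-free trend
    mean, jitter = transition_from_trend(times, trend)
    print(f"transition at {mean:.2f} ps, RMS jitter {jitter:.2f} ps")
```

The peak-peak width of the uncertainty window depends on how many samples are taken per step, whereas this estimate targets the RMS value directly; with real data the per-step fractions carry binomial noise, so more samples per step give a steadier result.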
The peak-peak jitter for these two transitions was measured to be 14.4ps and 19.2ps, respectively. To perform these measurements, 100 samples were taken for each time step of 2.4ps (equivalent to 4 “trombone” steps). The maximum jitter is measured to be σjitter = … (… LSBDLL).
The jitter observed in the first cell (σref = …) is due to the jitter of the reference clock as it arrives at the delay chain. At the end of the chain, the dynamics of the DLL increase the uncertainty of the transition time. Assuming (optimistically) that these two sources of jitter are uncorrelated, the jitter generated by the activity of the DLL closed loop is

$$\sigma_{loop} = \sqrt{\sigma_{jitter}^{2} - \sigma_{ref}^{2}} = \ldots$$
Figure 10: Detail of the DLL time sweep showing code transitions in opposite extremes of the delay chain ("data" and "trend" curves; horizontal axis: step).
The DLL conversion error histogram in Figure 11 is obtained from the same set of
data as the one in Figure 7. It shows that the conversion error of the DLL considered
independently has an RMS of σDLL=0.29LSBDLL, with very small tails.
Figure 11: DLL conversion error (σ=0.29LSBDLL) (histogram; horizontal axis: error (LSBDLL)).
The conversion error stems from several contributions, which combine to form the total error. The main contributors to the conversion error are the quantising mechanism (σquant = 1/√12 LSBDLL), the integral non-linearity (measured to be σINL=0.05LSBDLL) and the reference clock jitter (σjitter=0.01LSBDLL). Treating these contributions as uncorrelated, they add in quadrature:

$$\sigma_{DLL} = \sqrt{\sigma_{quant}^{2} + \sigma_{INL}^{2} + \sigma_{jitter}^{2}} = \sqrt{12^{-1} + 0.05^{2} + 0.01^{2}} = 0.29\,LSB_{DLL}$$

The measured RMS error (σDLL=0.29LSBDLL) agrees with the expected value, demonstrating that no major error source was left unaccounted for.
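The error budget above is a root-sum-square of independent contributions. A minimal Python sketch reproducing it from the quoted values (the helper name `rss` is hypothetical):

```python
import math


def rss(*sigmas):
    """Root-sum-square of uncorrelated RMS error contributions."""
    return math.sqrt(sum(s * s for s in sigmas))


if __name__ == "__main__":
    sigma_quant = 1.0 / math.sqrt(12.0)  # ideal quantisation, LSB_DLL
    sigma_inl = 0.05                     # measured DLL INL contribution, LSB_DLL
    sigma_jitter = 0.01                  # reference clock jitter, LSB_DLL
    total = rss(sigma_quant, sigma_inl, sigma_jitter)
    print(f"expected sigma_DLL = {total:.3f} LSB_DLL")
    print(f"excess over quantisation alone = {total - sigma_quant:.4f} LSB_DLL")
```

This gives ≈0.293 LSBDLL: the INL and jitter terms add only about 0.004 LSBDLL to the 0.289 LSBDLL quantisation floor, which is why the measured DLL histogram is so close to that of an ideal quantiser.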
12.2. Lumped capacitor scheme.
The tests previously described were also applied to the channel whose RC delay line implements the lumped capacitor adjustment scheme. Only the results that highlight the differences between the two adjustment schemes will be discussed.
Figure 12: RC delay line’s DNL and INL graphs (using the lumped capacitor adjustment scheme; horizontal axis: binRC).
The graphs in Figure 12 show the results of a calibration run of the RC delay line that uses the lumped capacitor adjustment scheme. The linearity obtained is well within the limits set for calibration. However, the DNL graph shows that the predicted matching characteristics of the line were not obtained, which also leads to a worse INL than predicted in simulations.
The linearity graphs corresponding to measurements performed on the full converter are shown in Figure 13 and Figure 14. The integral non-linearity of the converter closely follows the appropriately scaled DLL non-linearity. This shows that the converter’s characteristics are limited by the DLL, as was also seen with the previous scheme.
Figure 13: DNL and INL graphs of the converter (using the lumped capacitor adjustable delay line; horizontal axis: bin).
The differential and integral non-linearity are measured to be 0.70LSB and 1.03LSB, respectively. The main non-linearity errors are, again, found in the taps corresponding to DLL delay cell transitions. Comparison of the INL graphs of the DLL in Figure 6 and Figure 14 reveals the limitation of the DLL topology used. The different behaviour of the DLL when used together with each channel is most likely due to clock-related noise coupling into the converter. Although small, the non-linearity of the DLL is an important fraction of the bin at the level of time interpolation implemented in this converter.
Figure 14: Comparison of the INL graphs of the DLL and of the complete converter (converter INL versus bin; DLL INL versus binDLL).
The conversion error was measured using the same set-up as before. These
measurements resulted in the histogram of Figure 15. An RMS error of 0.44LSB (~21.5ps)
is obtained, and the maximum observed error is smaller than 1.5LSB. The resolution
measured with this adjustment scheme is slightly better than with the previous scheme.
The improvement is a consequence of the better DLL linearity obtained.
Figure 15: Conversion error (σ=0.44LSB) (histogram; horizontal axis: error (LSB)).
The histogram of Figure 16 shows the conversion error of the DLL alone, confirming the correct dynamic behaviour of the DLL.
Figure 16: DLL conversion error (σ=0.29LSBDLL) (histogram; horizontal axis: error (LSBDLL)).
12.3. Conversion time offset.
In the measurements presented so far, the conversion time offset has not been investigated. Conversion time offset cannot be considered independently of the extrinsic offsets generated by the acquisition circuitry before the converter. Offset variations internal to the converter, due to temperature changes or any other origin, can be measured, and circuit techniques may be applied to reduce them. However, larger offset variations will be present elsewhere, namely in the sensor, discrimination, signal shaping and driving circuitry. It is, therefore, important to characterise the whole acquisition chain together when environmental conditions change and absolute time measurements are required. This test is outside the scope of these studies.
The maximum temperature dependency of the internal offset, measured using the tap selection scheme, is ~124ps/10 °C. This dependency is a consequence of the variation of the delay of the long internal hit signal path. In this circuit, no effort was made to compensate this variation with a similar variation in the reference clock path.
12.4. Power dissipation.
One important design goal of this circuit is to achieve reduced power dissipation per
channel. An overall power dissipation of 0.22W was measured, which compares
favourably with the results obtained with the ADLL architecture.
12.5. Summary of results.
The main characteristics measured during the tests are summarised in Table 1. Some important properties of a TDC, such as multi-hit capability, were not characterised, since no effort was made to optimise the prototype for them. Crosstalk between channels was not investigated either, because the two channels are very dissimilar and cannot be used simultaneously. However, extrapolating from the results obtained with the ADLL-based prototype, we are confident that multi-hit capability is easily obtained and that low crosstalk is achievable.
| adjustment scheme | converter max INL | converter σ INL | converter max DNL | converter σ DNL | converter RMS res. (σ) | R-C line INL | R-C line DNL | area |
|---|---|---|---|---|---|---|---|---|
| tap selection | 1.12 LSB / 55 ps | 0.44 LSB / 21 ps | 0.72 LSB / 35 ps | 0.18 LSB / 9 ps | 0.51 LSB / 25 ps | 0.15 LSB / 7 ps | 0.21 LSB / 10 ps | 0.85 mm² |
| lumped capacitor | 1.04 LSB / 51 ps | 0.39 LSB / 19 ps | 0.68 LSB / 33 ps | 0.22 LSB / 11 ps | 0.44 LSB / 21 ps | 0.21 LSB / 10 ps | 0.30 LSB / 15 ps | 0.25 mm² |

| common characteristics | |
|---|---|
| LSB | 48.8 ps |
| temperature sensitivity | 1.3% / 10 °C |
| power dissipation | 0.22 W |
| number of channels | 2 |
| technology | 0.7 µm CMOS |
| area | 10.7 mm² |
| package | 68 pin JLCC |

Table 1: Characteristics of the TDC prototype.
12.6. Conclusions.
The experimental results obtained with this prototype demonstrate that it is possible
to build a low cost high-resolution integrated TDC using the proposed architecture. The
converter has lower power dissipation than what was measured with the ADLL-based
prototype previously described.
It was shown that the performance of the converter is mainly limited by the DLL characteristics, essentially its non-linearity. The DLL used in this circuit was built using the same circuit blocks as in the ADLL TDC, which use single-ended signalling levels. A more linear DLL could most likely be obtained with a less noise-sensitive, differential topology.
The two adjustable RC delay line schemes proved to work according to the design
goals, both in terms of calibration and of temperature sensitivity. Furthermore, the model
developed to study the delay line proved to be accurate, even with the limited
technological information available.
References for Part III.
[1] Gogaert, S. et al., A 10ps resolution 1.6ns tuning range CMOS delay line for clock deskewing in data recovery systems, Proceedings of the ESSCIRC’95, pp. 54-56, Sep. 95.
[2] Doernberg, J. et al., Full speed testing of A/D Converters, IEEE Journal of Solid-State Circuits, Vol. 19, no. 6, pp. 820-827, Dec. 84.
[3] Bossche, M. V. et al., Dynamic testing and diagnostics of A/D converters, IEEE
Transactions on Circuits and Systems, Vol. 33, no. 8, pp. 775-785, Aug. 86.
[4] Tsividis, Y., Mixed Analog-Digital VLSI devices and technology – an introduction,
McGraw-Hill 1996, Chapter 5.
[5] Elmore, W., The transient response of damped linear networks with particular
regard to wideband amplifiers, Journal of Applied Physics, Vol. 19, pp. 55-63, Jan.
48.
[6] Dvorak, V., On the transient analysis of distributed RC networks, International
Journal of Electronics, Vol. 33, no. 4, pp. 385-391, 1972.
[7] Antinone, R. et al., The modeling of resistive interconnectors for integrated circuits,
IEEE Journal of Solid-State Circuits, Vol. 18, no. 2, pp. 200-203, Apr. 83.
[8] Sakurai, T., Approximation of wiring delay in MOSFET LSI, IEEE Journal of
Solid-State Circuits, Vol. 18, no. 4, pp. 418-426, Aug. 83.
[9] Rubinstein, J. et al., Signal delay in RC tree networks, IEEE Transactions on
Computer-Aided Design, Vol. 2, no. 3, Jul. 83.
[10] Lee, M., A multilevel parasitic interconnect capacitance modeling and extraction for
reliable VLSI on-chip clock delay evaluation, IEEE Journal of Solid-State Circuits,
Vol. 33, no. 4, pp. 657-661, Apr. 98.
[11] The HSPICE user’s manual, Meta-Software 1996.
PART IV.
CONCLUSION.
Chapter 13.
Summary of Results.
In this work we studied the problem of building integrated Time-to-Digital
Converters featuring very high resolutions. Our main goal was to demonstrate the ability
to perform these time measurements in a single, low cost, monolithic circuit produced in
standard commercial CMOS technologies. Since stand-alone operation was envisaged, the selected architectures are able to perform self-calibration. The possibility of including digital signal processing functionality in the same circuit was also pursued.
Several architectures were analysed, of which one was selected for a more detailed analysis that led to the construction of a demonstrator IC. Furthermore, a novel high-resolution time interpolation architecture was proposed, and the analysis carried out confirmed a good time resolution and low power operation.
13.1. The ADLL architecture.
The study of a time interpolation technique using an array of phase shifted DLL’s
was pursued. In this study, we analysed:
• The origins of non-linearity in a DLL-based converter. We have shown the effects of delay cell mismatch and how it accumulates along the delay chain. We have also highlighted the diverse causes of phase errors intrinsic to a DLL and the effect of these errors along the delay chain. An additional source of non-linearity was revealed, which affects the converter non-linearity in a similar way to a phase error. The source of this delay error was shown to be the different propagation delays of the sampling signal towards the individual registers.
• The origins of phase noise in a DLL. We have analysed the effects of phase noise
due to the “bang-bang” operation of the closed control loop.
The results of the analysis carried out for the single DLL case were extended to the case of an array of phase-shifted DLL’s (the ADLL).
• The non-linearity of an ADLL-based converter. We have developed an analytical model of the ADLL that allows the effects of independent delay error sources on the overall converter non-linearity to be established. The presence of the phase-shifting DLL is also accounted for in the model. We have highlighted the most important modes of delay error accumulation, in particular showing that there is an intrinsic periodicity in the non-linearity curves (periodicity F and also F+1, where F is the interpolation factor) due to the interpolation scheme.
• The optimal interpolation factor F. Based on the expected conversion integral non-linearity due to delay cell mismatch, we have established a relation between the mismatch level and the resolution of the converter. It shows that, depending on the actual mismatch characteristics of the delay cells, the maximum interpolation factor F that still yields a corresponding increase in resolution is limited to F=4 or 5.
Using the analysis tools developed, we were able to translate the performance goals into circuit requirements. We then proposed simple ways of constraining the individual blocks to meet these requirements by optimising the critical performance parameters.
• Minimisation of the phase error. We have proposed solutions to reduce all the
phase error sources, including an alternative topology for the distribution of the
sampling signal to the individual registers. The need to use distributed parameter
techniques when studying signal distribution in the time critical circuitry is
highlighted.
• Minimisation of the delay cell mismatch. A method for reducing the delay
mismatch of a current-starved delay cell regardless of the operating conditions
was proposed.
• Noise sensitivity minimisation. The noise sensitivity of the scheme was analysed
and minimised using simple circuit layout rules.
A multi-channel high-resolution TDC based on these studies was built in a standard
0.7µm CMOS technology. It demonstrated the correctness of the conclusions of the
analysis. In particular an RMS resolution of 34.5ps was obtained throughout the full 3.2µs
dynamic range. This performance, which has been confirmed in several applications, is
obtained in an IC that also includes processing and buffering logic.
13.2. The DLL & RC delay line architecture.
We have proposed a new interpolation technique for Time-to-Digital Converters. The possibility of designing adjustable RC delay lines in a “digital” technology was demonstrated, and we have also shown how a self-calibrating scheme can be implemented in the same circuit. In this study, we have analysed:
• Adjustment methods for RC delay lines. We have proposed two discrete
adjustable RC delay line schemes.
• The characteristics of RC delay lines. We have proposed a methodology to partition such a delay line so that it complies with the timing and layout requirements of a particular design. We give guidelines for the design of circuits that interface with the delay line without increasing its sensitivity to variations of the environmental conditions.
• Calibration procedure. A Code Density Test based calibration scheme was
proposed. This simple scheme can be hardware implemented and integrated in
the converter IC. It requires a pulse generator uncorrelated with the reference
clock and a calibration logic block.
• Calibration algorithms. We have proposed several calibration algorithms and discussed their advantages and disadvantages.
Based on these studies, and on the DLL building blocks developed for the ADLL
based TDC, we have built a TDC prototype. Two different channels, each implementing
one of the adjustable RC delay lines proposed, were included. Dividing these delay lines into M=8 segments, an interpolation factor F=8 is obtained. The technology used is also a 0.7µm CMOS technology. Using the calibration algorithms that we have proposed, we were able to calibrate the two delay lines, obtaining an INLmax better than 0.21LSB in each of them. The RMS resolution of the converter was measured to be as low as 21ps. We have also shown that the performance of the converter is largely insensitive to variations of the environmental conditions. Furthermore, the use of passive RC delay lines to perform time interpolation results in low power operation, as was demonstrated with the prototype.
13.3. TDC characterisation.
We have developed a consistent methodology to characterise the timing
performance of a T/D converter. With this methodology we were able to evaluate the
static and the dynamic characteristics of the converter.
• Define a consistent set of performance metrics. These metrics, adapted from the
ADC world, are well matched to the TDC environment.
• Build a comprehensive test set-up. We have developed an automated test set-up
that is able to perform very linear time sweeps across an extended dynamic
range. This set-up is critical for the evaluation of the dynamic characteristics of
the converters that we have developed.
Chapter 14.
Future Developments.
The major goal of the work described in this dissertation was to demonstrate the possibility of using standard digital CMOS technologies to build integrated, multi-channel, high-resolution time measurement systems. Having established this possibility by means of two different successful architectures, a wide range of fully integrated systems can be developed to match the specific requirements of the several interested users within the High-Energy Physics community. Alternatively, a single “universal” system could be designed to fulfil all these separate requirements.
During this work, although only cursory attention was given to the actual
implementation of the system level functionality, its presence was always accounted for
and the architectures proposed are adapted to operate in that environment.
Two logical development paths may now be followed:
• Profit from short gate delays available in the new, sub-micron, technologies to
demonstrate the “ultimate” performance that can be extracted following the
architectures here presented (or any other having the same capabilities).
• Develop a general purpose T/D converter. This IC would cover the entire
resolution spectrum envisaged for the near future, from the “low” 250ps range,
to the “high” 25ps range. It should also allow for different buffering strategies
and also for intelligent data filtering.
Although the first path is scientifically stimulating and poses some interesting design challenges, it is the second path that results in a better engineering compromise between single-minded performance and overall functional flexibility. It is also a more multi-disciplinary project, requiring the convergence of multiple design techniques (full custom / standard cell), and therefore presents important challenges of its own.
Such a converter has been envisaged and preliminary studies carried out. The enabling architecture, the interpolator based on a DLL and an RC delay line, was developed and proven during this work. Most of the system-level functionality has been demonstrated elsewhere in the context of lower resolution converters.
In a conventional, DLL based, converter, all the channels integrated in the same IC
perform their time interpolation by sampling the status of a common DLL, as is
schematically described in Figure 1. To obtain a higher resolution TDC, using the scheme
based on a DLL and an RC delay line that was proposed in this work, a number of equally
spaced samples of the status of the DLL must be stored. The scheme is also pictured in
Figure 1.
Figure 1: A four channel TDC using a DLL based scheme and a single channel TDC with four times smaller LSB, using the same building blocks and an RC delay line (signals: clkref, hit, hit<0> to hit<3>; blocks: PD, RC delay line).
A close look at this figure already hints at how to obtain high resolution from what is intrinsically a lower resolution converter (the DLL). By simply adding an adjustable RC delay line (and the calibration hardware), it is possible to obtain a higher resolution converter channel from a small number of lower resolution conversion channels. By proper selection of the hit signal origin, a single IC can be used either as a high channel density, low-resolution T/D converter or as a low channel density, high-resolution T/D converter, depending on the user needs (see Figure 2).
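The idea of building one high-resolution channel from several low-resolution ones can be illustrated with exact arithmetic. Suppose, as an assumption for illustration only, that the RC delay line delivers four copies of the hit spaced by a quarter of a DLL LSB (consistent with the 195.3ps DLL LSB and 48.8ps RC line LSB in Table 1 of this chapter) and that the four coarse codes are simply summed. Hermite's identity (the sum of floor(x + k/n) for k = 0..n-1 equals floor(nx)) then shows that the result is exactly the code of a converter with a four times smaller LSB. The sketch below is hypothetical and is not the encoding logic of the proposed IC.

```python
from fractions import Fraction

# 160 MHz reference (6.25 ns) divided over 32 DLL cells: 195.3125 ps per cell.
T_DLL_PS = Fraction(6250, 32)


def coarse_code(t_ps, t_dll_ps=T_DLL_PS):
    """Code of one low-resolution channel sampling the DLL (non-negative t)."""
    return t_ps // t_dll_ps  # floor division on exact fractions


def high_res_code(t_ps, n_copies=4, t_dll_ps=T_DLL_PS):
    """Sum of the coarse codes of n_copies hit copies delayed by T_DLL/n_copies.

    Hypothetical combination scheme: by Hermite's identity the sum equals
    floor(n_copies * t / T_DLL), i.e. a converter with a T_DLL/n_copies LSB.
    """
    offset = t_dll_ps / n_copies
    return sum(coarse_code(t_ps + k * offset, t_dll_ps) for k in range(n_copies))


if __name__ == "__main__":
    print("fine LSB:", float(T_DLL_PS / 4), "ps")  # 48.828125 ps
    for t in (Fraction(100), Fraction(1000), Fraction(2500)):
        print(f"t = {float(t):7.1f} ps -> code {high_res_code(t)}")
```

Exact fractions are used so that the identity can be checked without floating-point rounding at the bin edges; in hardware the same role would be played by the calibrated RC line taps.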
Figure 2: The general purpose TDC architecture (signals: clkref, hit, hit<0> to hit<3>; blocks: PD, RC delay line).
Timing information can be carried by one or by both edges of the hit signal. It would therefore be convenient for the converter to be able to measure both instants in the same channel. This feature will be implemented in this converter.
Modern CMOS technologies, for example with a 0.25µm minimum feature size,
result in very small gate delays. It is, therefore, possible to build a very compact time
conversion block and integrate it with a large processing logic block.
A more complete buffering hierarchy is envisaged. Each channel will have a dedicated, four-measurement-deep pipelined memory (to store two pairs of rising-falling edge measurements). The second level of the hierarchy will group 8 channels (2 in high-resolution mode) in a deeper FIFO memory. Each of these groups includes a separate pre-processing logic block that performs encoding, coarse time selection, etc. The groups are then multiplexed into a single data stream.
An optional, trigger-based data reduction processor will also be included. This processor receives commands from a central processor that identify time windows of interest. Measurements occurring outside these time windows are deemed uninteresting and are, therefore, filtered out of the data stream.
The function of the local data reduction processor is to compare the time measurements acquired in each channel with the time window of interest, which is identified by a “trigger” time-tag. Measurements accepted by this criterion are stored in a common read-out FIFO memory.
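The trigger-matching function can be sketched in a few lines. The Python code below assumes that hits are buffered in time order and that the window of interest is defined by two hypothetical parameters, an offset before the trigger time-tag and a width (the text does not specify how the window is programmed). Hits older than the window start are dropped, which corresponds to filtering them out of the data stream when triggers arrive in time order. The names `Hit` and `match_trigger` are illustrative.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Hit:
    channel: int
    time: int  # measured time in converter LSB (hypothetical format)


def match_trigger(buffer, trigger_time, window_offset, window_width):
    """Return the buffered hits that fall inside the trigger window.

    The window is [trigger_time - window_offset, that + window_width).
    buffer is a deque of Hit ordered by time; hits older than the window
    start are discarded, assuming triggers are processed in time order.
    Hits inside the window stay buffered, since overlapping windows of
    later triggers may also claim them.
    """
    start = trigger_time - window_offset
    end = start + window_width
    while buffer and buffer[0].time < start:
        buffer.popleft()  # outside any future window: filtered out
    return [h for h in buffer if start <= h.time < end]


if __name__ == "__main__":
    hits = deque([Hit(0, 10), Hit(1, 95), Hit(2, 100), Hit(0, 130), Hit(3, 200)])
    accepted = match_trigger(hits, trigger_time=150, window_offset=60, window_width=50)
    print([(h.channel, h.time) for h in accepted])
```

Keeping accepted hits in the buffer is a design choice for overlapping trigger windows; a scheme with non-overlapping windows could pop them as well.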
Figure 3: Block diagram of the general purpose TDC (blocks include: PLL, PD, coarse counter, RC delay line, 4-word channel buffers, channel arbitration, encoding & offset adjust, 256-word group and super-group buffers, RC delay calibration (& hit oscillator), trigger matching, JTAG interface (testing / programming), trigger interface & control, read-out interface).
A simplified block diagram of the general purpose TDC is shown in Figure 3. A clock-multiplying PLL is included to generate the required reference period for the high-resolution option.
The timing specification of this TDC is shown in the next table. Three resolution levels can be obtained with the specified 40MHz reference clock: 224.5ps, 56.4ps and 14.2ps. These values correspond to the standard deviation of the quantisation error (σq) of an ideal converter. In reality, other sources of time uncertainty will add to it, mostly affecting the higher resolution options. The experience gained during this work allows a preliminary estimate of the RMS resolution (σTDC) of ~226ps, ~61ps and ~25ps, respectively.
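| parameter | value | value | unit | notes |
|---|---|---|---|---|
| ref. frequency | 40 | 160 | MHz | |
| ref. period | 25 | 6.25 | ns | |
| DLL LSB | 781.3 | 195.3 | ps | 32 cells / DLL |
| RC line LSB | n/a | 48.8 | ps | using 4 channels |
| dynamic range | 102.4 | 102.4 | µs | |

Table 1: Timing specification of the general purpose TDC.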
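The entries of this table, and the ideal resolutions quoted above, follow from the reference period, the 32 DLL cells and the 4-channel RC interpolation, with σq = LSB/√12. A short Python check (function names are hypothetical) gives approximately 225.5ps, 56.4ps and 14.1ps, within about 1ps of the values quoted in the text, and shows that the 102.4µs dynamic range corresponds to 4096 periods of the 40MHz reference.

```python
import math


def timing_levels(ref_mhz, cells_per_dll=32, rc_channels=4):
    """Reference period, DLL LSB and RC line LSB (all in ps)."""
    period_ps = 1e6 / ref_mhz
    dll_lsb_ps = period_ps / cells_per_dll
    rc_lsb_ps = dll_lsb_ps / rc_channels
    return period_ps, dll_lsb_ps, rc_lsb_ps


def sigma_q(lsb_ps):
    """RMS quantisation error of an ideal converter with the given LSB."""
    return lsb_ps / math.sqrt(12.0)


if __name__ == "__main__":
    period_40, dll_40, _ = timing_levels(40)
    period_160, dll_160, rc_160 = timing_levels(160)
    for name, lsb in (("DLL, 40 MHz", dll_40), ("DLL, 160 MHz", dll_160),
                      ("RC line, 160 MHz", rc_160)):
        print(f"{name:17s} LSB {lsb:7.2f} ps  sigma_q {sigma_q(lsb):6.1f} ps")
    print("dynamic range in 40 MHz periods:", round(102.4e6 / period_40))
```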
PART V.
APPENDIXES.
Appendix A.
TDC Characterisation Test Bench.
The evaluation of the high-resolution TDC prototypes produced during this work
required the development of a specific test bench. This test bench allows for the
measurement of several important timing characteristics of the converter:
• Conversion linearity (differential and integral).
• Conversion error, from a linear time sweep.
• Crosstalk between channels.
• Double hit resolution.
In particular, the linear delay generator used for the characterisation of the
conversion error required the development of an adequate instrument.
Given the fine time characteristics that this test bench is intended to measure, special attention was given to the integrity of the time-critical signals. High-performance PECL logic is used wherever the reference or the hit signals are handled. Controlled-impedance (50Ω) micro-strips and cables are used to transport, or delay, these signals.
Conversion linearity.
The static characteristics of the converter (INL, DNL) are measured using a standard Code Density Test (CDT), which has been extensively described in the literature in the context of ADC testing [1],[2],[3]. Other methodologies have been used to characterise converters, for example using Walsh functions [4], but their complexity does not seem justified for testing TDCs, where typically only a limited number of bins needs to be characterised in great detail. The resulting characterisation has some statistical uncertainty, which can be bounded as discussed in Appendix D.
In a CDT, the device under test (DUT) collects a large number of hits generated at random times. Because the hit arrival times are random, they are uniformly distributed along the dynamic range of the DUT. Therefore, if the conversion result of each hit is read out and accumulated in a histogram whose bins correspond to one LSB of the converter, the number of hits collected in each histogram bin is proportional to the actual size of the corresponding converter bin. The DNL graph is obtained directly from this histogram. The INL graph is derived from the cumulative histogram of the bin sizes, obtained by adding up consecutive bin sizes. Unfortunately, the uncertainty of each bin size is also accumulated in this operation. Therefore, for the same number of collected hits, the differential characterisation is more accurate than the integral characterisation.
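A minimal Python sketch of the CDT post-processing just described: each histogram count is normalised by the expected count per bin, so that the DNL is the bin size in LSB minus one and the INL is the running sum of the DNL. The simulated converter (its bin edges and the number of hits) is purely hypothetical and only serves to exercise the computation.

```python
import random

def cdt_linearity(codes, n_bins):
    """Return (DNL, INL) in LSB from the output codes of uniformly distributed hits."""
    hist = [0] * n_bins
    for c in codes:
        hist[c] += 1
    mean = len(codes) / n_bins             # expected hits per bin for an ideal converter
    dnl = [h / mean - 1.0 for h in hist]   # measured bin size relative to one LSB, minus one
    inl, acc = [], 0.0
    for d in dnl:                          # cumulative sum: INL at each bin's upper edge
        acc += d
        inl.append(acc)
    return dnl, inl

def quantise(t, edges):
    """Hypothetical converter: code k if edges[k] <= t < edges[k+1]."""
    for k in range(len(edges) - 1):
        if edges[k] <= t < edges[k + 1]:
            return k
    return len(edges) - 2

random.seed(1)
edges = [0.0, 0.9, 2.1, 3.0, 4.0]          # 4 bins over a 4-LSB period, two of them mismatched
codes = [quantise(random.uniform(0.0, 4.0), edges) for _ in range(200_000)]
dnl, inl = cdt_linearity(codes, 4)
print([round(d, 3) for d in dnl], [round(i, 3) for i in inl])
```

Because the INL is a running sum, the statistical noise of each DNL value accumulates along the line, which is the effect quantified in Appendix D.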
The CDT requires a random pulse generator or, alternatively, a pulse generator whose frequency is selectable (the sampling frequency being chosen in accordance with Appendix E), and a computer to collect and histogram the measurements. In our set-up we used a Hewlett-Packard 8012B pulse generator. Data is collected by a computer that also controls the test bench.
Since this is a statistical test, no information is obtained on the dynamic
characteristics of the converter. Chiefly, random errors due to reference clock jitter or to
the dynamics of the DLL and random noise due to other activity within the circuit are
averaged out. In order to observe these effects, a linear time sweep is performed across a
significant segment of the dynamic range.
Conversion error.
The linear time sweep is performed with a very short delay step (more than an order
of magnitude shorter than the LSB of the converter under consideration), over a range of a
few reference clock cycles. This range is wide enough to characterise the fine time
interpolation scheme and also to verify that the dynamic range extension scheme does not
interfere with the interpolation performance.
Standard (active) delay generators do not have the linearity required to perform a linear time sweep suitable for this application. Therefore a computer-controlled passive delay generator, using a stepper-motor-driven coaxial phase shifter (also known as a “trombone”), was used. Although no direct measurement of the “trombone” linearity was performed, the measurements obtained and the mechanics of the instrument give a high degree of confidence in its linearity. To extend the small dynamic range of the “trombone”, a selectable delay box was used. When the “trombone” reaches the end of its range, it is rewound to its initial position and the delay box is incremented by a corresponding amount, by selecting the appropriate internal cable length.
The accuracy of this alignment procedure is a concern. Even a small difference between the delay of the apparatus before and after the adjustment step accumulates into a sizeable error after a few adjustment steps. To guarantee an adequate alignment, the delay of the generator is measured with an adjustment TDC before the adjustment and again after it. The two measurements are compared and, if required, a fine adjustment is performed by changing the “trombone” delay. The adjustment TDC does not have to be linear, since the two delays it measures are nominally identical. However, its resolution must be better than the “trombone” delay step. Averaging many hits is an easy way of achieving high resolution in commercial delay measurement instruments.
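The rewind-and-realign step can be sketched in Python as below. The model is hypothetical: the delay box is assumed to add a nominal 2 ns (one trombone range) per setting plus an unknown per-setting error, the adjustment TDC is idealised as reading the true delay, and the trombone is assumed to rewind to a start position with some margin so that it can be trimmed in either direction.

```python
TROMBONE_STEP_PS = 0.6     # approximate minimum trombone step in this bench
BOX_STEP_PS = 2000.0       # assumed: one box increment equals the trombone range (2 ns)
START_POS = 100            # assumed rewind position, in trombone steps (leaves margin)

class DelayGenerator:
    """Hypothetical model of the trombone + selectable cable delay box."""
    def __init__(self, box_error_ps):
        self.box = 0
        self.pos = START_POS
        self.box_error_ps = box_error_ps   # actual minus nominal delay of each box setting

    def true_delay_ps(self):
        return (self.box * BOX_STEP_PS + self.box_error_ps[self.box]
                + self.pos * TROMBONE_STEP_PS)

def rewind_and_align(gen, measure):
    """Advance the box, rewind the trombone, then trim it until the delay is unchanged."""
    before = measure(gen)
    gen.box += 1
    gen.pos -= round(BOX_STEP_PS / TROMBONE_STEP_PS)   # nominal rewind
    # Fine adjustment: move the trombone until the measured delay agrees with
    # the one measured before the step, to within half a trombone step.
    while measure(gen) < before - TROMBONE_STEP_PS / 2:
        gen.pos += 1
    while measure(gen) > before + TROMBONE_STEP_PS / 2:
        gen.pos -= 1

gen = DelayGenerator(box_error_ps=[0.0, 7.3, -4.1])          # hypothetical box errors
gen.pos = START_POS + round(BOX_STEP_PS / TROMBONE_STEP_PS)  # trombone at end of travel
before = gen.true_delay_ps()
rewind_and_align(gen, DelayGenerator.true_delay_ps)          # idealised adjustment TDC
print(round(gen.true_delay_ps() - before, 3))
```

With a real adjustment TDC the residual mismatch is limited by its resolution rather than by the trombone step, which is why the text requires a resolution better than that step.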
Figure 1 shows a block diagram of the computer-controlled linear delay generator and its connection to the device under test (DUT). Since the DUT is a time-stamp TDC, the hit signal is synchronised with the reference clock (clkref) before passing through the trombone and the selectable cable delay box. The adjustment TDC is connected in parallel with the DUT, so that it does not influence the test in normal operation.
In our test bench, we used the Sage model 6709 coaxial phase shifter, driven by a computer-controlled stepper motor, to obtain a minimum delay step of ~0.6 ps over a dynamic range of 2 ns. The CAEN programmable delay box N-146A, which has a minimum delay step of 0.5 ns and a dynamic range of ~80 ns, was used to extend the dynamic range of the apparatus. The adjustment TDC was the Stanford Research SR620 universal time interval counter, whose quoted resolution is ~2 ps when 1000 hits are averaged.
Figure 1: The linear passive delay generator block diagram (computer controlled): clkref and hit signal, trombone (fine adjustment), selectable cable delay (coarse adjustment), adjustment TDC, DUT, adjustment control (from computer).
This apparatus is rather cumbersome and requires an external adjustment TDC. Therefore a simpler and more reliable method was developed to perform the delay adjustment. If the two ends of a delay line are connected to each other through an inverting amplifier, the resulting oscillator runs at a frequency given by
$$ f = \frac{1}{2\,(D_{line} + A_{delay})}, $$
where $D_{line}$ is the delay of the line and $A_{delay}$ is the propagation delay of the amplifier that closes the loop. It is therefore possible to derive the delay of the line from the measured oscillation frequency, given the delay of the amplifier.
As explained before, the absolute value of $D_{line}$ is not needed, since it is used only to compare the delay of the line before and after alignment. Therefore, the only important property of $A_{delay}$ is its invariance, not its absolute value. A fast PECL inverter guarantees this invariance (within acceptable limits).
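A minimal Python sketch of why the amplifier delay drops out: the line delay follows from f = 1/(2·(Dline + Adelay)), so any constant Adelay cancels when the delays before and after an adjustment step are subtracted. The frequencies and delays used below are hypothetical.

```python
def line_delay_ps(f_hz, amp_delay_ps):
    """Delay of the line from the ring-oscillator frequency: D = 1/(2f) - A."""
    return 1e12 / (2.0 * f_hz) - amp_delay_ps

def delay_change_ps(f_before_hz, f_after_hz):
    """Change in line delay between two measurements; A_delay cancels out."""
    return 1e12 / (2.0 * f_after_hz) - 1e12 / (2.0 * f_before_hz)

# Hypothetical example: a 2 ns line closed by a 1 ns inverter, i.e. a 3 ns half-period.
f0 = 1.0 / (2 * 3.0e-9)
f1 = 1.0 / (2 * 3.0006e-9)      # 0.6 ps more delay after the adjustment step
print(line_delay_ps(f0, 1000.0), delay_change_ps(f0, f1))
```

Only `delay_change_ps` is needed for the alignment, and it never uses the amplifier delay.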
The block diagram of this scheme is shown in Figure 2. When the delay of the delay generator is to be measured, a set of relays is switched so that the oscillator loop is closed and the DUT is disconnected from the generator. The oscillation frequency is measured before the adjustment step and again after it. If these frequencies differ, the “trombone” delay is readjusted until the frequency agrees with the one measured before the adjustment step.
A simple way to measure frequency is to count the number of oscillation cycles completed in a given time interval. The longer the interval, the better the accuracy of the measurement. The oscillation period of a stable oscillator (or a multiple of it) can be used to set the counting interval. This simple delay generation scheme was implemented on a 9U VME board that also includes all the required alignment logic.
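The effect of the counting interval can be quantified with a short Python sketch. Counting cycles in a gate of length Tgate gives f with a ±1 count uncertainty, i.e. δf = 1/Tgate; through the oscillator relation this maps to a delay uncertainty δD = δf/(2f²). The oscillation frequency and the gate lengths are assumed values.

```python
def delay_resolution_ps(f_hz, gate_s):
    """Delay uncertainty caused by a +/-1 count error when counting cycles for gate_s."""
    df = 1.0 / gate_s                       # one count over the gate time
    return 1e12 * df / (2.0 * f_hz ** 2)    # |d(1/(2f))/df| * df, converted to ps

f_osc = 1.0 / (2 * 3.0e-9)                  # assumed ~166.7 MHz ring oscillator
for gate in (1e-3, 1e-2, 1e-1, 1.0):
    print(f"gate {gate:g} s -> {delay_resolution_ps(f_osc, gate):.6f} ps")
```

Under these assumptions even a 1 ms gate resolves delay changes far smaller than the ~0.6 ps trombone step.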
It is not practical to extend this test to the full dynamic range of the converter, because of its duration and of the possible accumulation of errors over the successive alignment steps. Fortunately, verifying that the dynamic range extension works correctly over the full range does not require small delay steps. For this purpose it is more convenient to perform a coarse time sweep with delay steps of ~1 ns. Since the jitter and linearity requirements on the hit signal are relaxed, an active instrument can be used as the delay generator, resulting in a faster characterisation. In our test bench, the Stanford Research model DG535 digital delay generator was used.
Figure 2: The linear passive delay generator block diagram (automated): clkref and hit signal, trombone (fine adjustment), selectable cable delay (coarse adjustment), DUT, oscillator cycle counter, VME interface, adjustment control.
Other characteristics of the converter, such as crosstalk, double hit resolution and sensitivity to activity in the digital circuitry, can also be evaluated with this test bench (these tests apply only to the converter based on an array of DLLs).
Crosstalk.
The crosstalk between channels was characterised with the following procedure:
A double delay sweep is generated using the Stanford Research model DG535 digital delay generator. One channel (the channel under test, CUT) is stimulated independently of all the other channels in the circuit (the offending channels, OC). For each delay step in the CUT, a delay sweep spanning three reference clock cycles is performed simultaneously on all the OC. In this way, the worst-case correlation between the simultaneous hits in the OC, the hit in the CUT and the phase of the reference clock can be found. Comparing the peak error obtained with this procedure with the error obtained for the same delay in the CUT, but with the OC inactive, gives a measure of the worst-case error due to crosstalk.
Double hit resolution.
Double hit resolution is measured using the Philips PM5786 pulse generator to generate bursts of pulses. This pulse generator can produce pulses with a minimum separation of ~8.5 ns, which is therefore the finest double hit resolution that can be measured. The bursts are generated asynchronously to the reference clock, so that any correlation between the reference clock and the activity in the channel buffer can be identified.
Appendix B.
Analysis of the DLL Closed Loop Behaviour.
The control operation of a DLL is based on the integration of the phase error
resulting from the comparison of the phase of the periodic reference signal and of the
VCDL output. The negative feedback control loop adjusts the delay of the VCDL in order
to minimise the phase error.
The DLL is a first-order loop; therefore, if the sampling operation inherent to the phase detector is ignored, a simple continuous-time approximation can be used to analyse its frequency response. This approximation is valid for loop bandwidths a decade or more below the operating frequency.
Following the naming conventions established in [5], we define the output delay $D_o(s)$ as the delay established by the VCDL and the input delay $D_i(s)$ as the delay to which the phase detector compares the output delay. These two quantities are related by
$$ D_o(s) = \big(D_i(s) - D_o(s)\big)\cdot\frac{I_{CP}\,K_{VCDL}}{s\,C_F\,T}, $$
where $I_{CP}$ is the charge-pump current, $K_{VCDL}$ is the gain of the VCDL, $C_F$ is the loop filter capacitance and $T$ is the period of the reference signal. The average charge-pump current is given by the fraction of the reference period during which the charge-pump is active, $(D_i(s) - D_o(s))/T$, times its peak current $I_{CP}$¹. It is, therefore, proportional to the phase (delay) error.
The closed-loop response is then
$$ \frac{D_o(s)}{D_i(s)} = \frac{1}{1 + \dfrac{s}{\omega_n}}, $$
where $\omega_n$ is the loop bandwidth:
$$ \omega_n = \frac{I_{CP}\,K_{VCDL}}{C_F\,T} . $$
¹ If the loop is built in a “bang-bang” configuration, using a two-state phase detector, the average charge-pump current can be evaluated over a large number of reference periods.
Since a first-order loop is inherently stable, the only stability criterion of interest is to avoid the influence of the higher-order poles introduced by the delay around the sampled feedback loop. In our application, the reference signal has a known and stable frequency, so a high tracking bandwidth is not required. It is therefore advantageous to reduce the loop bandwidth by increasing the filter capacitance and decreasing the charge-pump current and the gain of the VCDL. In this way the phase noise inherent to the “bang-bang” loop operation can be minimised.
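To make the bandwidth trade-off concrete, the Python sketch below evaluates ωn = ICP·KVCDL/(CF·T) and the step response ΔDo(t) = ΔDi·(1 − exp(−ωn·t)) of the first-order closed loop. The charge-pump current, VCDL gain and filter capacitance are hypothetical placeholders, not the prototype values; the final check mirrors the "decade below the operating frequency" validity condition of the continuous-time approximation.

```python
import math

def loop_bandwidth(i_cp, k_vcdl, c_f, t_ref):
    """omega_n = I_CP * K_VCDL / (C_F * T) in rad/s, with K_VCDL in s/V."""
    return i_cp * k_vcdl / (c_f * t_ref)

def step_response(delta_di, wn, t):
    """Output-delay change of the first-order loop after an input-delay step delta_di."""
    return delta_di * (1.0 - math.exp(-wn * t))

T = 25e-9                                   # 40 MHz reference period
wn = loop_bandwidth(i_cp=10e-6, k_vcdl=5e-9, c_f=100e-12, t_ref=T)   # hypothetical values
f_bw = wn / (2 * math.pi)
print(f"omega_n = {wn:.3g} rad/s, bandwidth = {f_bw / 1e3:.2f} kHz")
print("continuous-time approximation valid:", f_bw <= (1 / T) / 10)
print("settled fraction after 5/omega_n:", step_response(1.0, wn, 5 / wn))
```

Halving I_CP or doubling C_F halves ωn, trading tracking speed for lower loop-induced phase noise, as argued above.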
Because the reference signal itself propagates along the VCDL, variations of the input signal’s phase also propagate through the VCDL and reduce the measurement accuracy. Therefore, although the internal phase noise can be minimised and the delay of the VCDL stabilised at one reference period T, the phase noise carried by the reference signal must be eliminated at its origin if the loss of measurement accuracy is to be minimised.
Appendix C.
Analysis of the Effects of Cell Delay Mismatch
on the Integral Non-Linearity of a DLL.
A DLL is a closed feedback control loop with a somewhat complex dynamic behaviour. The object of this study is the static behaviour of the DLL, which results from averaging the dynamics of the control loop over a long period. Without loss of generality, we will assume an ideal control loop that keeps the delay along the DLL stable and equal to one clock period T. The following analysis broadly follows the method developed in [6] for resistor strings in flash ADCs.
For the purpose of this analysis, we will focus only on random mismatch effects. The delay of each cell in the DLL can be seen as an independent random variable with a normal probability density function (PDF) $G(\mu_m, \sigma_m^2)$ of mean $\mu_m = T/N$ and variance $\sigma_m^2$, where $N$ is the number of cells that make up the DLL. The mean corresponds to the expected cell delay, and the variance measures the spread of the actual delays around the mean.
Under these conditions the DLL can be seen as a delay chain whose delay is D = 0 at the origin and D = T at the other end.
Figure 1: Voltage controlled delay line with fixed length (taps 0, 1, …, j, …, N−1, N at nominal delays 0, T/N, …, j·T/N, …, (N−1)·T/N, T; 0 ≤ j < N).
The delay $D_i$ of each cell is defined as a random variable with a normal PDF $G(T/N, \sigma_m^2)$. The delay from the origin to the output of cell j can be expressed as a fraction of the total delay of the chain:
$$ u_j(X,Y) = \frac{X}{X+Y}, \qquad\text{where}\quad X = \sum_{i=1}^{j} D_i \quad\text{and}\quad Y = \sum_{i=j+1}^{N} D_i . $$
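Since the $D_i$ have normal PDFs, X and Y are also random variables with normal PDFs:
$$ X:\; G(\mu_1, \sigma_1^2),\quad\text{with}\quad \mu_1 = j\,\mu_m \quad\text{and}\quad \sigma_1 = \sqrt{j}\,\sigma_m, $$
$$ Y:\; G(\mu_2, \sigma_2^2),\quad\text{with}\quad \mu_2 = (N-j)\,\mu_m \quad\text{and}\quad \sigma_2 = \sqrt{N-j}\,\sigma_m . $$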
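Using the variable transformations
$$ u = \frac{X}{X+Y} \quad\text{and}\quad v = X, $$
we have
$$ g(u,v) = f\big(X(u,v), Y(u,v)\big)\cdot |J|, $$
where $|J|$ is the absolute value of the Jacobian of the transformation:
$$ J = \begin{vmatrix} \dfrac{\partial X(u,v)}{\partial u} & \dfrac{\partial Y(u,v)}{\partial u} \\[2mm] \dfrac{\partial X(u,v)}{\partial v} & \dfrac{\partial Y(u,v)}{\partial v} \end{vmatrix} = \frac{\partial X}{\partial u}\cdot\frac{\partial Y}{\partial v} - \frac{\partial X}{\partial v}\cdot\frac{\partial Y}{\partial u} . $$
From $Y = v\cdot\dfrac{1-u}{u}$ and $X = v$ we get
$$ J = 0\cdot\frac{1-u}{u} - 1\cdot\left(-\frac{v}{u^2}\right) = \frac{v}{u^2} = \frac{(X+Y)^2}{X}, $$
and thus
$$ g(u,v) = f(X,Y)\cdot\frac{(X+Y)^2}{X} . $$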
Considering X and Y independent variables, their joint PDF is
$$ f(X,Y) = f(X)\cdot f(Y), $$
where X and Y have normal PDFs:
$$ f(X) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\cdot\exp\!\left(-\frac{(X-\mu_1)^2}{2\sigma_1^2}\right), \qquad f(Y) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\cdot\exp\!\left(-\frac{(Y-\mu_2)^2}{2\sigma_2^2}\right), $$
thus
$$ g(u,v) = \frac{v}{u^2}\cdot\frac{1}{2\pi\,\sigma_1\sigma_2}\cdot\exp\!\left(-\frac{\sigma_2^2\,(v-\mu_1)^2 + \sigma_1^2\left(v\,\dfrac{1-u}{u} - \mu_2\right)^2}{2\,\sigma_1^2\,\sigma_2^2}\right). $$
The PDF of u is, by definition,
$$ g(u) = \int_{-\infty}^{\infty} g(u,v)\,dv , $$
thus
$$ g(u) = \frac{1}{2\pi\,\sigma_1\sigma_2\,u^2}\cdot\exp\!\left(-\frac{1}{2\sigma_2^2u^2}\left(C - \frac{B^2}{A}\right)\right)\cdot\int_{-\infty}^{\infty} v\cdot\exp\!\left(-\frac{A}{2\sigma_2^2u^2}\left(v - \frac{B}{A}\right)^2\right)dv , $$
with
$$ A = r\,u^2 + (1-u)^2, \qquad B = (r\,\mu_1 - \mu_2)\,u^2 + \mu_2\,u, \qquad C = (r\,\mu_1^2 + \mu_2^2)\,u^2 $$
and
$$ r = \sigma_2^2/\sigma_1^2 . $$
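If the substitution $u = u_0 + u_1$ (with $u_0 = j/N$) is made, the following equation is obtained:
$$ g_{u_0}(u_1) = \frac{\sqrt{N}}{\sqrt{2\pi}\,C_m}\cdot\frac{1}{\sqrt{u_0(1-u_0)}}\cdot\left(1 + \frac{u_1^2}{u_0(1-u_0)}\right)^{-3/2}\cdot\exp\!\left(-\frac{N}{2C_m^2}\cdot\frac{u_1^2}{u_0(1-u_0)}\cdot\frac{1}{1 + \dfrac{u_1^2}{u_0(1-u_0)}}\right), $$
where
$$ C_m = \frac{\sigma_m}{\mu_m} . $$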
Since $u_1^2/\big(u_0(1-u_0)\big) \ll 1$, the following equations are obtained:
$$ \sigma_{u_0} = C_m\cdot\sqrt{\frac{u_0(1-u_0)}{N}}, $$
$$ g_{u_0}(u_1) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}}\cdot\exp\!\left(-\frac{u_1^2}{2\sigma_{u_0}^2}\right) \;\Leftrightarrow\; g_{u_0}(u) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}}\cdot\exp\!\left(-\frac{(u-u_0)^2}{2\sigma_{u_0}^2}\right). $$
Thus, u (the delay division ratio) has a normal probability density with mean $\mu_{u_0} = u_0$ and standard deviation $\sigma_{u_0}$. The standard deviation of the integral error is obtained by normalising $\sigma_{u_0}$ to the (average) cell delay:
$$ \sigma_{DLL} = \frac{\sigma_{u_0}\cdot(N\,\mu_m)}{\mu_m} = N\,\sigma_{u_0} \;\Leftrightarrow\; \sigma_{DLL} = C_m\cdot\sqrt{\frac{j\,(N-j)}{N}} . $$
The maximum standard deviation of the integral error is found in the middle of the delay chain (j = N/2), with a value
$$ \sigma_{DLL}(\max) = C_m\cdot\frac{\sqrt{N}}{2}, $$
which compares favourably with the maximum standard deviation of the integral error in an open delay chain (not enclosed in a control loop), $\sigma_{DC}(\max)$, found at the end of the chain:
$$ \sigma_{DC}(\max) = C_m\cdot\sqrt{N} . $$
Therefore, enclosing a delay line in a closed control loop such as the DLL improves the worst-case standard deviation of the integral linearity error by a factor of two.
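The result can be checked numerically. The Python sketch below draws random cell delays with relative spread Cm, rescales the chain so that its total delay is exactly T (the idealised DLL constraint assumed in this analysis), and compares the spread of the tap error, in units of the mean cell delay, with Cm·√(j(N−j)/N); the open chain is included for comparison. N, Cm, the number of trials and the seed are arbitrary choices.

```python
import math
import random
import statistics

def tap_error_std(n_cells, cm, j, trials, closed, seed=0):
    """Monte-Carlo std (in units of mu_m) of the delay error at tap j."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        d = [rng.gauss(1.0, cm) for _ in range(n_cells)]   # mu_m = 1, sigma_m = cm
        x = sum(d[:j])
        if closed:
            x *= n_cells / sum(d)      # ideal loop forces the total delay to N * mu_m = T
        errors.append(x - j)           # deviation from the ideal tap position j * mu_m
    return statistics.pstdev(errors)

N, CM, TRIALS = 32, 0.05, 4000
for j in (8, 16, 24):
    mc_dll = tap_error_std(N, CM, j, TRIALS, closed=True)
    mc_open = tap_error_std(N, CM, j, TRIALS, closed=False)
    theory = CM * math.sqrt(j * (N - j) / N)
    print(f"j={j}: DLL {mc_dll:.4f} (theory {theory:.4f}), open chain {mc_open:.4f}")
```

At j = N/2 the closed-loop spread should be about half of the open-chain spread at its worst point, j = N.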
Appendix D.
Number of Random Samples Required for
TDC Characterisation.
A hit arriving at a time interpolator at a random time has equal probability p of being collected by each of the bins into which the reference period is divided (assuming identical bins). This probability depends only on the total number of subdivisions $N_{bins}$: $p = 1/N_{bins}$.
To estimate the size of a given bin, an experiment can be devised where random hits
are generated (trials). The possible outcomes of a trial are success, if a hit is collected in
the bin, or failure, if not. After a large number of trials have been executed, the ratio of the
number of successes over the number of trials is a direct measure of the bin size.
The accuracy of the estimate is, of course, related to the number of trials. It is therefore important to know the minimum number of trials required to obtain a given accuracy.
The experiment just described has the following properties:
• It consists of a number (n) of repeated trials.
• Each trial has an outcome that may be classified as a success or as a failure.
• The probability of success (p) remains constant from trial to trial.
• The repeated trials are independent.
It is therefore a set of n Bernoulli trials, and the number of successes has a binomial probability distribution with mean $\mu = n\,p$ and variance $\sigma^2 = n\,p\,(1-p)$. A binomial random variable is well approximated by a normal distribution with the same mean and variance when the number of trials is large. For a normal distribution, the probability that a random variable X deviates from its mean µ by less than $z_{\alpha/2}\cdot\sigma$ is $1-\alpha$:
$$ P(\mu - z_{\alpha/2}\,\sigma \le X \le \mu + z_{\alpha/2}\,\sigma) = 1 - \alpha . $$
The variable $z_{\alpha/2}$ is the z-value of the standard normal distribution that leaves an area α/2 under the upper tail of the standard normal curve (see Figure 1 for clarification of these definitions). It can be obtained from any table of areas under the normal distribution curve (for example [7]).
Figure 1: P(−zα/2 < Z < zα/2) = 1−α (standard normal density n(z; µ=0, σ=1), with central area 1−α between −zα/2 and zα/2 and an area α/2 in each tail).
The result of the experiment, x successes representing the measured size of the bin, is a sample of a normal random variable X with mean µ and variance σ². From the previous probability limit it can therefore be concluded that, with 100·(1−α)% confidence, the measured bin size lies within $z_{\alpha/2}$ standard deviations (σ) of its true value µ. If the accepted tolerance on the bin size is set to β·µ, and µ and σ are replaced by their expressions in terms of n and p, the required number of trials n follows:
$$ z_{\alpha/2}\,\sigma \le \beta\,\mu \;\Leftrightarrow\; z_{\alpha/2}\sqrt{n\,p\,(1-p)} \le \beta\,n\,p \;\Leftrightarrow\; n \ge \left(\frac{z_{\alpha/2}}{\beta}\right)^2\cdot\left(\frac{1}{p} - 1\right). $$
The probability p is defined as $1/N_{bins}$. Therefore the number of hits required to obtain the bin size with a tolerance of 100·β% and a confidence of 100·(1−α)% is
$$ n \ge \left(\frac{z_{\alpha/2}}{\beta}\right)^2\cdot(N_{bins} - 1) . $$
The same set of hits provides a similar estimate of the size of every bin, and therefore yields the DNL characteristics of the line.
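The bound is straightforward to evaluate. A Python sketch using the standard library's `statistics.NormalDist` to obtain zα/2; the bin count, tolerance and confidence level in the example are illustrative choices only.

```python
import math
from statistics import NormalDist

def hits_for_dnl(n_bins, beta, confidence):
    """Minimum hits so that every bin size is known within beta (relative) at this confidence."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # z_{alpha/2}
    return math.ceil((z / beta) ** 2 * (n_bins - 1))

# e.g. 32 bins, 5 % tolerance, 95 % confidence
print(hits_for_dnl(n_bins=32, beta=0.05, confidence=0.95))
```

The requirement grows linearly with the number of bins and with the inverse square of the tolerance.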
In principle, the INL characteristics of the line are obtained directly by accumulating the DNL histogram. It should be noted that, in this operation, the uncertainties of the individual results (described by their variances) also add up. For an open-ended line, the largest standard deviation occurs in the last bin:
$$ \sigma_c = \sqrt{N_{bins}}\cdot\sigma . $$
The number of samples needed to obtain the INL characteristics with the same tolerance and confidence level must then be increased to
$$ n_c = N_{bins}\cdot n = \left(\frac{z_{\alpha/2}}{\beta}\right)^2\cdot(N_{bins}-1)\cdot N_{bins} . $$
Conversely, for the same number of samples, the tolerance of the INL estimate is
$$ \beta_c = \sqrt{N_{bins}}\cdot\beta . $$
If an enclosed line is considered, for example one inside the DLL closed loop, the largest standard deviation occurs in the middle bin:
$$ \sigma_c = \frac{\sqrt{N_{bins}}}{2}\cdot\sigma . $$
The number of samples needed to obtain the INL characteristics with the same tolerance and confidence level must then be increased to
$$ n_c = \frac{N_{bins}}{4}\cdot n = \left(\frac{z_{\alpha/2}}{\beta}\right)^2\cdot(N_{bins}-1)\cdot\frac{N_{bins}}{4} . $$
Conversely, for the same number of samples, the tolerance of the INL estimate is
$$ \beta_c = \frac{\sqrt{N_{bins}}}{2}\cdot\beta . $$
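Extending the DNL bound to the integral characterisation only requires the scaling factors derived above: Nbins for an open-ended line and Nbins/4 for a line enclosed in a DLL loop (and, for a fixed number of hits, √Nbins and √Nbins/2 on the tolerance). A self-contained Python sketch with illustrative parameters:

```python
import math
from statistics import NormalDist

def hits_for_inl(n_bins, beta, confidence, closed_loop):
    """Hits needed for the INL to reach tolerance beta at the given confidence."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n_dnl = (z / beta) ** 2 * (n_bins - 1)
    factor = n_bins / 4.0 if closed_loop else float(n_bins)   # worst bin: middle vs. last
    return math.ceil(factor * n_dnl)

def inl_tolerance(n_bins, beta, closed_loop):
    """INL tolerance reached with the hits that give a DNL tolerance beta."""
    return beta * (math.sqrt(n_bins) / 2.0 if closed_loop else math.sqrt(n_bins))

print(hits_for_inl(32, 0.05, 0.95, closed_loop=False),
      hits_for_inl(32, 0.05, 0.95, closed_loop=True))
print(inl_tolerance(32, 0.05, closed_loop=False), inl_tolerance(32, 0.05, closed_loop=True))
```

For the same tolerance, the enclosed line needs about four times fewer hits than the open-ended one.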
Appendix E.
TDC Characterisation Hit Frequency.
Interpolator characterisation requires that the reference clock period be sampled at random times. However, truly random sampling, in the strict sense, is impossible to achieve. What must be guaranteed is that the reference clock is not repeatedly sampled at the same phase (beating effect). Choosing a sample frequency that is not harmonically related to the clock frequency ensures this [8]. Therefore, once a sufficient number of equidistant samples has been acquired, the samples are uniformly distributed along the clock period.
The sample frequency must, of course, be stable, to guarantee that it does not wander into a beating frequency during the characterisation procedure. Fortunately, very accurate and stable oscillators are common. They can be used directly or as the reference for a clock-multiplying PLL, which can generate almost any frequency ratio. It is, for example, possible to derive the sample frequency from the clock frequency, thus guaranteeing a correct characterisation regardless of the actual clock frequency.
Any jitter present in the sampling frequency only further randomises the sampling times, which benefits the characterisation. In this context, the requirements on the PLL can be quite relaxed.
The relation between the sample period $T_{sample}$ and the clock period $T_{clk}$ can generally be described by the following equation, where A and B are integers with no common divisor¹:
$$ T_{sample} = \frac{A}{B}\cdot T_{clk} . $$
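The role of the common divisor can be seen by listing the clock phases hit by successive samples: sample k lands at phase (k·A mod B)/B of the clock period, so the number of distinct phases visited is B/gcd(A, B). A Python sketch; the values of A and B are arbitrary examples.

```python
from math import gcd

def visited_phases(a, b, n_samples):
    """Distinct clock phases (in units of Tclk/B) hit by samples spaced A/B * Tclk apart."""
    return sorted({(k * a) % b for k in range(n_samples)})

def effective_intervals(a, b):
    """Effective number of intervals B' = B / gcd(A, B)."""
    return b // gcd(a, b)

for a, b in [(1001, 32), (1000, 32)]:
    phases = visited_phases(a, b, 10 * b)
    print(a, b, len(phases), effective_intervals(a, b))
```

With A = 1001 all 32 phases are visited, whereas A = 1000 shares the factor 8 with B and revisits only 4 phases, which is the beating effect the text warns about.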
This relation deserves a closer look, to identify aids to the choice of the sample frequency. If we expand A by letting
$$ A = \left(\left(C + \frac{1}{M}\right)\cdot B \pm D\right)\cdot S , $$
then the previous equation can be expanded to:
¹ It is commonly found in the literature that the integers A and B should be prime numbers [8]. However, this is only a sufficient condition to generate a non-beating frequency; it covers only a subset of the integer ratios that satisfy the requirement of no beating effect.
$$ T_{sample} = \left(C + \frac{1}{M} \pm \frac{D}{B}\right)\cdot S\cdot T_{clk} . $$
The constants in this equation are all related to identifiable characteristics of the sampling frequency:
B is the number of intervals into which the sampling divides the clock period. It should be large enough for the sampling coverage to be compatible with the expected characterisation accuracy.
S reflects the possible use of sub-sampling, where only every n-th generated sample is collected. The sub-sampling rate cannot be chosen arbitrarily, because it enters the definition of the constant A. If S and B have common divisors, the effective number of intervals B’ is reduced to B divided by their greatest common divisor.
C is the number of whole Tclk periods contained in Tsample (or one more if 1/M ± D/B < 0). This constant must also abide by the rules of the definition of A.
M gives a measure of the spread between consecutive samples (normalised to Tclk). The actual sample spread is (1/M ± D/B)·S·Tclk. The constant M should equal the number of sub-divisions (bins) of the interpolator being characterised. In this way, the samples are distributed more uniformly over the duration of the test. Since M is included in the definition of constant A, it must also obey the required restrictions.
D is a small perturbation that completes the definition of the constant A. There is no real restriction on this constant, apart from the rules defining A, but it should be smaller than B/M to keep these definitions coherent.
Typically, when determining the sampling frequency, B, S, C and M are set by system requirements, and D is then chosen so that the resulting A and B have no common divisors. Common divisors between these two constants reduce the effective number of intervals B’.
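The selection procedure (fix B, S, C and M, then pick the perturbation D) can be sketched in Python as follows. The sketch assumes B is a multiple of M so that A is an integer, searches D upwards from 1 while keeping it below B/M, and returns the first D for which A and B share no common divisor. The function names and example values are illustrative only.

```python
from math import gcd

def sample_ratio(b, s, c, m, d, sign=+1):
    """A = (C*B + B/M + sign*D) * S; assumes B is a multiple of M."""
    if b % m:
        raise ValueError("B must be a multiple of M for A to be an integer")
    return (c * b + b // m + sign * d) * s

def choose_d(b, s, c, m, sign=+1):
    """Smallest D in [1, B/M) for which A and B are coprime, as (D, A); None if none exists."""
    for d in range(1, b // m):
        a = sample_ratio(b, s, c, m, d, sign)
        if gcd(a, b) == 1:
            return d, a
    return None      # e.g. when S and B already share a divisor

print(choose_d(b=1024, s=1, c=3, m=32))   # Tsample = A/1024 * Tclk
print(choose_d(b=1024, s=2, c=3, m=32))   # S = 2 shares a factor with B = 1024
```

The second call illustrates the remark above that a sub-sampling factor sharing divisors with B cannot be compensated by any choice of D.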
The clock multiplying system required to perform these operations is graphically
described in Figure 1. The critical operation is the clock multiplication (by B) on the
return path of the PLL control loop. The delay introduced by this operation influences the
stability of the closed control loop and, therefore, should be carefully analysed and
minimised.
It is interesting to note that, in a noiseless system, after collecting B samples generated in this way the interpolator is completely described, with a measurement tolerance of ±100/(2B)% of the clock period. In the presence of inevitable noise, it is safer to assume a random uniform sample distribution and to collect the conservative number of samples determined in Appendix D.
Figure 1: The clock multiplying PLL (input Fclk, division by (C·B + B/M ± D)·S, phase detector PD, low-pass filter LPF, VCO producing Fsample, and the operation by B in the return path).
Appendix F.
Analysis of the Limits to the TDC Resolution
(Alternative Tap Definition).
This appendix complements Chapter 6. It covers the case where tap 0 of each of the Timing DLLs is located at the end of the delay chain (Figure 1). Since the delay chain spans exactly one clock period, this alternative definition does not change the performance of the converter. However, the shape of the non-linearity histograms is altered, so we present here the corresponding non-linearity expressions for a single DLL (F=1) and for an ADLL.
Figure 1: Detail of a delay locked loop depicting the important delays within the loop (notice the alternative location of tap 0): clock input, cell delays d0 … dN−1, taps 0 … N−1, hit distribution delays D and τhit, delays τ1 and τ2, and the phase detector F(D1, D2).
The alternative timing and phase-shifting variables m, n and n′ are defined, as a function of the bin position i ($0 \le i < F\cdot N$), as:
$$ m = \mathrm{Mod}(i+1,\, F), $$
$$ n = \mathrm{Mod}\!\left(m - \mathrm{Floor}\!\left(\frac{i+1}{F}\right),\, N\right), $$
$$ n' = \mathrm{Mod}\!\left(\mathrm{Floor}\!\left(\frac{i+1}{F}\right) - m,\, N\right). $$
The following expressions reflect, respectively, the standard deviation of the
integral non-linearity error due to cell delay mismatch and loop jitter:
2
n
 F + 1 m
σ array (i ) = F ⋅ σ cell ⋅ 
 ⋅ ⋅ (M − m ) + ⋅ ( N − n ) .
N
 F  M
2
2
m n
σ array (i ) = σ j ⋅ F ⋅   +   .
M  N
The integral non-linearity due to the combined effect of all static errors is given by the following expressions, for the cases where the hit sampling signal is distributed via a linear network and via the T-shaped network, respectively.
 m n
 m F +1 n 
INLarray (i ) = Din ⋅ F ⋅  ⋅
+ +
+  − DPD ⋅ F ⋅  −
F
N
 M N
M
.
 m F + 1 n′ 
− Dout ⋅ F ⋅  ⋅
+  − Dhit ⋅ F ⋅ n
F
N
M
 m F +1 n 
 m n
INLarray (i ) = Din ⋅ F ⋅  ⋅
+  − DPD ⋅ F ⋅  −
+ +
N
F
M
 M N
,
N
N 
 m F + 1 n′ 
− Dout ⋅ F ⋅  ⋅
+  − Dhit ⋅ F ⋅  − n −

F
N
2 
M
2
where, as before, the following variable transformations are used:
$$ D_{in} = \delta_{in}, \qquad D_{PD} = \frac{C}{K} + \tau_{diff}, \qquad D_{out} = \delta_{out} \quad\text{and}\quad D_{hit} = -\tau_{hit} . $$
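For reference, the index mapping m, n, n′ defined at the beginning of this appendix can be tabulated with a small Python helper. Python's `%` operator returns a non-negative result for a positive modulus, which is the Mod convention assumed here; F and N in the example are arbitrary.

```python
def tap_indices(i, f, n_cells):
    """(m, n, n') of the alternative tap definition for bin i, 0 <= i < F*N."""
    q = (i + 1) // f                  # Floor((i + 1) / F)
    m = (i + 1) % f                   # Mod(i + 1, F)
    n = (m - q) % n_cells             # Mod(m - Floor((i + 1) / F), N)
    n_prime = (q - m) % n_cells       # Mod(Floor((i + 1) / F) - m, N)
    return m, n, n_prime

F, N = 4, 8
print([tap_indices(i, F, N) for i in range(F * N)])
```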
Appendix G.
DNL-aware Algorithms for the RC Delay Line
Calibration.
The calibration algorithms presented so far use integral non-linearity as the only criterion for judging the correctness of the calibration results. If differential non-linearity is also to be taken into account, more complex calibration algorithms are needed. Since these algorithms try to optimise two variables simultaneously, their convergence may be compromised when the two goals pull in opposite directions. The logic controlling the execution of the algorithm must be able to decide which goal is more important and pursue the calibration taking that decision into account.
In the following lines the algorithms previously described are modified so that they
can also set limits to differential non-linearity.
Tap selection adjustment scheme.
Iterative algorithm.
The analysis of the linearity of a bin is based on the bin histogram h[bin]. A cumulative histogram ch[bin] is built from it, and both are compared to the ideal histograms (derived from the ideal converter’s bin size, LSB). The following operations check whether the line conforms to the differential and integral linearity limits and take corrective measures for the offending bins.
for i= 0 to M-1
    tap[i]= segment_from_simulation_of_typical_conditions;
for bin= 0 to M-2
    repeat until no_changes
        Characterisation step;
        if ( ch[bin]< LSB·( bin+1-limINL) & h[bin]< LSB·( 1+limDNL) ||
             h[bin]< LSB·( 1-limDNL) )
            for i= 0 to M-bin-2
                tap[bin+i+1]= tap[bin+i+1]+1;
        else
            if ( ch[bin]> LSB·( bin+1+limINL) & h[bin]> LSB·( 1-limDNL) ||
                 h[bin]> LSB·( 1+limDNL) )
                for i= 0 to M-bin-2
                    tap[bin+i+1]= tap[bin+i+1]-1;
            else
                no_changes
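A Python transcription of the iterative procedure, for illustration only. The characterisation step is replaced by an idealised, hypothetical model in which the line is a list of cumulative segment delays pos[k] and tap[i] is the segment boundary selected for tap i; in the real circuit h and ch come from a Code Density Test. A guard against running off either end of the line and an iteration cap are added, since the pseudocode leaves them implicit.

```python
def characterise(tap, pos):
    """Idealised characterisation step: bin sizes h[b] and cumulative sizes ch[b]."""
    h = [pos[tap[b + 1]] - pos[tap[b]] for b in range(len(tap) - 1)]
    ch, acc = [], 0.0
    for size in h:
        acc += size
        ch.append(acc)
    return h, ch


def calibrate_taps(tap, pos, lsb, lim_inl, lim_dnl, max_iter=1000):
    """Iterative tap-selection calibration with INL and DNL limits (in LSB units)."""
    m = len(tap)
    for b in range(m - 1):                        # bin = 0 .. M-2
        for _ in range(max_iter):                 # "repeat until no_changes"
            h, ch = characterise(tap, pos)
            too_small = ((ch[b] < lsb * (b + 1 - lim_inl) and h[b] < lsb * (1 + lim_dnl))
                         or h[b] < lsb * (1 - lim_dnl))
            too_large = ((ch[b] > lsb * (b + 1 + lim_inl) and h[b] > lsb * (1 - lim_dnl))
                         or h[b] > lsb * (1 + lim_dnl))
            if too_small:
                shift = 1
            elif too_large:
                shift = -1
            else:
                break                             # bin within limits: no_changes
            if tap[b + 1] + shift <= tap[b] or tap[-1] + shift >= len(pos):
                break                             # no room left to move the taps
            for k in range(b + 1, m):             # tap[bin+1] .. tap[M-1] move together
                tap[k] += shift
    return tap


pos = [float(k) for k in range(41)]   # hypothetical line: 40 equal segments of 1.0
taps = calibrate_taps([5 * i for i in range(8)], pos, lsb=4.0, lim_inl=0.3, lim_dnl=0.3)
print(taps)
```

Moving a tap only changes the bins from that tap onwards, so bins that were already accepted are never disturbed by later corrections.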
Figure 1 illustrates the algorithm. The acceptable limits of the integral and differential non-linearity are, respectively, limINL and limDNL. These linearity limits must be chosen in accordance with the size of the calibration steps. The access point selected for each tap is stored in tap[i].
Figure 1: Calibration procedure for the tap selection adjustment scheme (flowchart: taps initialised from typical conditions; for each bin 0..M-2, a CDT is repeated and histogram[bin] and cumulative histogram[bin] are compared with (1±limDNL)·LSB and (bin+1±limINL)·LSB, incrementing or decrementing tap[bin+1..M-1] until no changes occur).
The accepted limits on integral and differential non-linearity do not have to be the same. Setting different values of limINL and limDNL is a simple way to make the algorithm give priority to one of the two goals.
Single step algorithm.
This algorithm finds the tap access points that result in the nearest approximation to
the ideal cumulative bin size curve. It also checks that the specified limit to the differential
non-linearity, limDNL, is not surpassed.
tap[0]=0 ;
for i=1 to M-1
    for segment=0 to 31
        if (ch[segment]< LSB·i & ch[segment+1]> LSB·i)
            if (LSB·i-ch[segment]< ch[segment+1]-LSB·i &&
                LSB·(1-limDNL)< ch[segment]-ch[tap[i-1]]< LSB·(1+limDNL))
                tap[i]=segment ;
            else
                tap[i]=segment+1 ;
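A Python transcription of the single step algorithm, for illustration. Here ch[s] is the measured cumulative delay at segment boundary s, the DNL window is expressed in units of LSB, and the lower comparison is written as ch[s] <= target so that an ideal edge falling exactly on a boundary is still assigned (a strict comparison would leave that tap unset). The example line is hypothetical.

```python
def single_step_taps(ch, m, lsb, lim_dnl):
    """For each ideal edge LSB*i, pick the nearest segment boundary subject to a DNL check.

    ch[s] is the cumulative delay at segment boundary s, with ch[0] == 0.
    """
    tap = [0] * m
    for i in range(1, m):
        target = lsb * i
        for s in range(len(ch) - 1):
            if ch[s] <= target < ch[s + 1]:
                lower_is_closer = target - ch[s] < ch[s + 1] - target
                prev_bin = ch[s] - ch[tap[i - 1]]      # size of bin i-1 if tap i = s
                dnl_ok = lsb * (1 - lim_dnl) < prev_bin < lsb * (1 + lim_dnl)
                tap[i] = s if (lower_is_closer and dnl_ok) else s + 1
                break
    return tap


ch = [0.0, 1.0, 2.2, 3.1, 4.0, 5.3, 6.0, 7.1, 8.0]   # hypothetical 8-segment line
print(single_step_taps(ch, m=4, lsb=2.0, lim_dnl=0.5))
```

As in the pseudocode, the upper boundary is chosen without a DNL check, so the DNL limit only steers ties toward the later boundary.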
Lumped capacitor adjustment scheme.
Coarse tuning procedure.
In this procedure the capacitance of all the banks is incremented simultaneously by one unit capacitor, resulting in a uniform increase of the delay of all taps. The procedure is repeated until the cumulative bin size falls short of the ideal delay by less than a predetermined limit limcoarse. The procedure is described schematically below:
for bank= 1 to M
    cap[bank]= 0;
repeat until ( ch[M-2]≥ LSB·( M-1-limcoarse ) )
    Characterisation step;
    for bank= 1 to M
        cap[bank]= cap[bank]+1;
Here cap[bank] holds the calibration setting of each capacitor bank, and ch is the
cumulative bin size histogram, whose last entry, ch[M-2], is the quantity tested by the loop.
Figure 2 shows a block diagram of the procedure, in which the Characterisation step is
represented by the Code Density Test (CDT) it performs.
[Flowchart not reproduced: starting from the initial calibration, a CDT measures the cumulative histogram entry ch[M-2]; while it is below (M-1-limcoarse)·LSB, cap[bank] is incremented for every bank from 1 to M and the test is repeated.]
Figure 2: The coarse calibration procedure.
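The coarse procedure can be sketched in Python as below. The delay line model is a hypothetical toy, not the circuit of the thesis: `lumped_model` assumes each bank k adds `unit * cap[k]` to bin k-1 only, which reproduces the "uniform increase of the delay of all taps" described above, and the characterisation is ideal and noise-free. The `max_steps` bound is an added safeguard.

```python
from itertools import accumulate
from typing import Callable, List, Tuple


def lumped_model(base: List[float], unit: float) -> Callable:
    """Hypothetical line: bin b has delay base[b] + unit * cap[b + 1] (banks are 1-based)."""
    def characterise(cap: List[int]) -> Tuple[List[float], List[float]]:
        h = [base[b] + unit * cap[b + 1] for b in range(len(base))]
        return h, list(accumulate(h))          # bin sizes h and cumulative sizes ch
    return characterise


def coarse_tune(characterise, m: int, lsb: float, lim_coarse: float, max_steps: int = 1000):
    """Increment every bank by one unit until ch[M-2] >= LSB*(M-1-lim_coarse)."""
    cap = [0] * (m + 1)                        # cap[1] .. cap[M]; cap[0] is unused
    for _ in range(max_steps):                 # safety bound (an addition)
        _, ch = characterise(cap)
        if ch[m - 2] >= lsb * (m - 1 - lim_coarse):
            break
        for bank in range(1, m + 1):
            cap[bank] += 1
    return cap


# Hypothetical values: M = 9 taps (8 bins of 50 ps), 10 ps per unit, LSB = 100 ps.
print(coarse_tune(lumped_model([50] * 8, 10), 9, 100, 0.5))
# -> [0, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

Five units are needed because four would give a total of 8·90 = 720 ps, still below the 750 ps threshold, while five give 800 ps.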
Fine tuning procedure.
The fine tuning procedure builds on the results of the coarse procedure. Each bin is
evaluated in turn to determine whether it meets the linearity limits. If it does not, the
capacitance of capacitor bank bin+1 is increased by one unit; if the bin still fails, the next
bank is incremented, and so on until a satisfactory result is obtained or the last bank is reached.
The fine calibration algorithm is outlined below. The bin size histogram is h[bin], and
limDNL and limINL are the differential and integral linearity limits.
for bin= 0 to M-2
    bank= bin+1;
    repeat until ( no_changes | bank> M )
        Characterisation step;
        if ( ch[bin]< LSB·( bin+1-limINL ) & h[bin]< LSB·( 1+limDNL ) |
             h[bin]< LSB·( 1-limDNL ) )
            cap[bank]= cap[bank]+1;
            bank= bank+1;
        else
            no_changes
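The first fine tuning loop can be sketched in Python as follows. The loop itself follows the pseudocode, but `lumped_model` is a hypothetical toy model (bank k adds `unit * cap[k]` to bin k-1 only) with ideal, noise-free characterisation; it cannot reproduce how a real RC line distributes an added load, so it only exercises the loop logic, and the example values are chosen so that one unit corrects each deficient bin.

```python
from itertools import accumulate
from typing import Callable, List, Tuple


def lumped_model(base: List[float], unit: float) -> Callable:
    """Hypothetical line: bin b has delay base[b] + unit * cap[b + 1] (banks are 1-based)."""
    def characterise(cap: List[int]) -> Tuple[List[float], List[float]]:
        h = [base[b] + unit * cap[b + 1] for b in range(len(base))]
        return h, list(accumulate(h))
    return characterise


def fine_tune_up(cap, characterise, lsb, lim_inl, lim_dnl):
    """First fine loop: only the lower linearity limits are checked."""
    cap = list(cap)
    m = len(cap) - 1                           # banks 1..M, so M = len(cap) - 1
    for b in range(m - 1):                     # bins 0 .. M-2
        bank = b + 1
        while bank <= m:                       # 'repeat until no_changes | bank > M'
            h, ch = characterise(cap)
            too_small = ((ch[b] < lsb * (b + 1 - lim_inl) and h[b] < lsb * (1 + lim_dnl))
                         or h[b] < lsb * (1 - lim_dnl))
            if not too_small:
                break                          # no_changes
            cap[bank] += 1
            bank += 1
    return cap


# Hypothetical values in ps: M = 5, LSB = 100, 15 ps per unit, limits of 0.1 LSB.
model = lumped_model([100, 80, 92, 100], 15)
print(fine_tune_up([0] * 6, model, 100, 0.1, 0.1))   # -> [0, 0, 1, 1, 0, 0]
```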
Because the algorithm approaches the final calibration solution through small increases in
bin size, only the lower linearity limits need to be checked. In this version of the algorithm,
a second loop (shown below) can be used to perform a final adjustment of the calibration
settings. This loop may be required if the pursuit of one linearity goal forces the RC delay
line beyond the upper limit of the other linearity parameter. Since the bin size change per
fine characterisation step is very small, this situation only occurs if the linearity limits
are too narrow.
for bin= M-2 to 0
    bank= bin+1;
    repeat until ( no_changes | bank< 1 )
        Characterisation step;
        if ( ch[bin]> LSB·( bin+1+limINL ) & h[bin]> LSB·( 1-limDNL ) |
             h[bin]> LSB·( 1+limDNL ) )
            cap[bank]= cap[bank]-1;
            bank= bank-1;
        else
            no_changes
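The second fine tuning loop mirrors the first, walking the bins backwards and checking only the upper limits. As before, this Python sketch relies on the hypothetical `lumped_model` toy with ideal characterisation; the guard that stops a bank count from going below zero is an addition not present in the pseudocode.

```python
from itertools import accumulate
from typing import Callable, List, Tuple


def lumped_model(base: List[float], unit: float) -> Callable:
    """Hypothetical line: bin b has delay base[b] + unit * cap[b + 1] (banks are 1-based)."""
    def characterise(cap: List[int]) -> Tuple[List[float], List[float]]:
        h = [base[b] + unit * cap[b + 1] for b in range(len(base))]
        return h, list(accumulate(h))
    return characterise


def fine_tune_down(cap, characterise, lsb, lim_inl, lim_dnl):
    """Second fine loop: only the upper linearity limits are checked."""
    cap = list(cap)
    m = len(cap) - 1
    for b in range(m - 2, -1, -1):             # bins M-2 down to 0
        bank = b + 1
        while bank >= 1:                       # 'repeat until no_changes | bank < 1'
            h, ch = characterise(cap)
            too_large = ((ch[b] > lsb * (b + 1 + lim_inl) and h[b] > lsb * (1 - lim_dnl))
                         or h[b] > lsb * (1 + lim_dnl))
            if not too_large:
                break                          # no_changes
            if cap[bank] > 0:                  # added guard: never below zero units
                cap[bank] -= 1
            bank -= 1
    return cap


# Hypothetical values in ps: every bin starts 15 ps too long (one unit per bank).
model = lumped_model([100] * 4, 15)
print(fine_tune_down([0, 1, 1, 1, 1, 1], model, 100, 0.1, 0.1))   # -> [0, 0, 0, 0, 0, 1]
```

Here the last bin's accumulated integral error is large enough that the loop steps back through banks 4, 3, 2 and 1; cap[5] is left untouched because, in this toy model, it does not affect any bin.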
Figures 3 and 4 show the flow diagrams of the two loops of the fine calibration
algorithm.
[Flowchart not reproduced: starting from the coarse calibration, for each bin from 0 to M-2 (bank = bin+1), a CDT measures the bin and cumulative histograms; while the comparisons with (bin+1-limINL)·LSB, (1+limDNL)·LSB and (1-limDNL)·LSB indicate a bin that is too small, cap[bank] is incremented and bank is advanced, until no changes occur or bank > M.]
Figure 3: The fine calibration procedure (first loop).
[Flowchart not reproduced: starting from the first fine-calibration loop, for each bin from M-2 down to 0 (bank = bin+1), a CDT measures the bin and cumulative histograms; while the comparisons with (bin+1+limINL)·LSB, (1-limDNL)·LSB and (1+limDNL)·LSB indicate a bin that is too large, cap[bank] is decremented and bank is moved back, until no changes occur or bank < 1.]
Figure 4: The fine calibration procedure (second loop).