Mota`s Ph.D. Thesis - Paulo Moreira - Home Page
UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO

Design and Characterization of CMOS High-Resolution Time-to-Digital Converters
(Portuguese title: Projecto e Caracterização Experimental de Circuitos Integrados CMOS para Medição de Intervalos de Tempo com Alta Resolução)

Manuel José dos Reis Gaspar Seabra Mota (Licenciado)

Dissertation submitted for the degree of Doctor in Electrical and Computer Engineering

Supervisor: Doctor José de Albuquerque Epifânio da Franca

President of the jury: Rector of the Universidade Técnica de Lisboa
Members of the jury:
Doctor Dinis Gomes Magalhães dos Santos
Doctor Moisés Simões Piedade
Doctor José de Albuquerque Epifânio da Franca
Doctor Diamantino Rui da Silva Freitas
Doctor António Manuel da Cruz Serra
Doctor João Paulo Calado Cordeiro Vital
Doctor Alessandro Marchioro

October 2000

Abstract

The subject of this thesis is the development and evaluation of high-resolution Time-to-Digital Converter (TDC) architectures suitable for the measurement of very short time intervals in the context of the Time-of-Flight detector of the ALICE experiment. The selected architectures are able to measure time intervals with a Root Mean Square (RMS) resolution better than 50 ps over a large dynamic range. Beyond their timing characteristics, these architectures enable the design of highly integrated multi-channel converter ASICs that operate with low power dissipation.
The developed circuits are based on Delay Locked Loop (DLL) architectures. The feedback control loop of the DLL ensures that the time measurements are permanently calibrated against a periodic reference signal. Schemes to obtain fine time interpolation without a penalty in added power dissipation or increased sensitivity to environmental changes (supply voltage or temperature) are investigated and implemented. Two different approaches are selected and analysed in detail: one uses several phase-shifted DLLs and the other a passive RC delay line. The prototypes that implement these schemes were built in a standard 0.7 µm CMOS technology. For the first approach, an RMS resolution of 34.5 ps across a dynamic range of 3.2 µs was measured. For the second, an RMS resolution of 21 ps was obtained.

Keywords

Time-to-Digital Converter (TDC), Delay Locked Loop (DLL), self-calibration, high-resolution, multi-channel, passive RC delay lines.

Resumo (Portuguese abstract, in translation)

The objective of this thesis is the evaluation and development of Time-to-Digital Conversion architectures with high time resolution, suited to the measurement of very short time intervals, in the context of the Time-of-Flight detector of the ALICE experiment. The selected architectures are able to measure time intervals with a resolution better than 50 ps (Root Mean Square, RMS) over a large dynamic range. In addition to the timing characteristics of these converters, their architectures allow the implementation of multi-channel application-specific integrated circuits operating with low power dissipation. The developed circuits are based on closed Delay Locked Loops (DLL). The negative feedback of the DLL guarantees that the time measurements are permanently calibrated, taking a periodic signal as reference.
Schemes were investigated and implemented that allow very fine time interpolation without significantly increasing the power dissipation or the sensitivity of the scheme to variations in environmental conditions (supply voltage or operating temperature). Two of these schemes were selected and their detailed analysis carried out. One of the schemes uses several DLLs with a fixed phase delay and the other uses a passive RC delay line. The prototypes in which these schemes were implemented use a 0.7 µm CMOS technology. With these prototypes, resolutions of 34.5 ps (RMS) over a dynamic range of 3.2 µs and of 21 ps (RMS) were obtained, respectively.

Palavras Chave (Keywords)

Time-to-Digital Converter (TDC), Delay Locked Loop (DLL), self-calibration, high resolution, multi-channel, passive RC delay lines.

Acknowledgements

It goes without saying that I am indebted to all the people whose contributions, small and large, made my work and my life easier during the period I spent working on this thesis; the list of their names would be too long to write down. However, I wish to acknowledge in particular the help of my colleagues Jorgen Christiansen and Paulo Moreira, who had the kindness and patience to answer all my questions and whose guidance and experience helped me to advance this work in the best direction. I also acknowledge the help of my supervisor, José Epifânio da Franca, who was always attentive to my requirements, even the most pressing ones. I thank Gaspar Barreira and Paulo Gomes, who started it all, and Alessandro Marchioro and Mike Letheren, who welcomed me into the microelectronics group at CERN and provided me with the proper means and environment to proceed with my work. An acknowledgement is also due to JNICT, whose support made it all possible¹, and to LIP, where the brave new world of microelectronics and High Energy Physics was first shown to me.
Since life is not only work, even when that work is exciting, I cheerfully greet the friends I met in Geneva, whose warmth and imagination made life abroad very interesting. A final word is reserved for my family and friends back in Portugal, who always found the right way to let me know they cared, even after I had been away for so long.

¹ The author is supported by a grant from the Junta Nacional de Investigação Científica e Tecnológica (JNICT) under the “Sub-Programa Ciência e Tecnologia do 2º Quadro Comunitário de Apoio”.

Contents.

PART I. Introduction.
1. Introduction and Structure of this Work.
2. Time Interval Measurements in HEP Experiments – An Introduction.
  2.1. High Energy Physics experiments.
    2.1.1. A HEP experiment at CERN: ALICE.
  2.2. High resolution time interval measurements in ALICE.
3. Conversion Basics.
  3.1. Performance metrics.
  3.2. Error sources.
  3.3. Converter calibration.
4. Review of TDC Architectures.
  4.1. Overview of TDC architectures.
    4.1.1. Current integration techniques.
    4.1.2. Counter techniques.
    4.1.3. Delay line-based techniques.
    4.1.4. Phase Locked Loop (PLL) techniques.
    4.1.5. Delay Locked Loop (DLL) techniques.
  4.2. Beyond the limits of the technology: techniques to improve resolution.
    4.2.1. Analogue time expansion.
    4.2.2. Vernier differences.
    4.2.3. Analogue time interpolation.
    4.2.4. Array of coupled oscillators.
    4.2.5. Array of Delay Locked Loops.
    4.2.6. Time interpolation using passive RC delay lines.
  4.3. Summary of characteristics of the TDC architectures.
References for Part I.

PART II. A TDC Architecture based on an Array of Delay Locked Loops.
5. Architecture Overview.
  5.1. The Delay Locked Loop (DLL).
  5.2. The Array of DLL’s (ADLL).
  5.3. Conversion dynamic range.
  5.4. Time critical paths.
  5.5. Measurement acquisition and storage.
  5.6. Read-out architecture.
  5.7. The prototype.
    5.7.1. Performance analysis.
6. Analysis of the Limits to the TDC Resolution.
  6.1. Non-linearity due to cell mismatch.
    6.1.1. Origins of mismatch.
    6.1.2. Effects of cell delay mismatch.
  6.2. Jitter due to internal phase noise.
  6.3. Non-linearity due to static phase error.
    6.3.1. Effects of phase detector’s phase error.
    6.3.2. Effects of phase detector input path’s mismatch.
    6.3.3. Effects of unbalanced conditions of the cells in the extremes of the delay chain.
    6.3.4. Effects of propagation delay on the sampling signal path.
    6.3.5. Overall non-linearity due to static phase error.
7. Detailed Implementation.
  7.1. DLL building blocks.
    7.1.1. Phase detector.
    7.1.2. Charge-pump and loop filter.
    7.1.3. Delay cell.
    7.1.4. Delay chain.
    7.1.5. Closed control loop.
    7.1.6. Initialisation procedure.
  7.2. The ADLL.
  7.3. Channel memory.
    7.3.1. The store sampling signal distribution.
8. Experimental Results.
  8.1. Delay cell range selection and charge-pump current level.
  8.2. Converter linearity.
  8.3. Linear time sweeps.
  8.4. Inter-channel crosstalk.
  8.5. Double hit resolution.
  8.6. Power dissipation.
  8.7. Summary of results.
  8.8. Conclusion.
References for Part II.

PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.
9. Architecture Overview.
  9.1. Time interpolation circuit.
  9.2. Adjustable RC delay line.
    9.2.1. Adjustable delay line by tap selection.
    9.2.2. Adjustable delay line by lumped capacitor selection.
  9.3. Auto calibration.
  9.4. The prototype.
    9.4.1. Choice of technology.
    9.4.2. Prototype characteristics.
    9.4.3. Performance analysis.
10. Adjustable RC Delay Line using a Tap Selection Scheme.
  10.1. RC delay line.
    10.1.1. RC delay line simulation model.
  10.2. Tap selection delay line.
    10.2.1. Tap selection circuitry.
  10.3. Auto calibration circuitry.
    10.3.1. Calibration algorithms.
    10.3.2. Hardware implementation.
11. Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.
  11.1. Lumped capacitor delay line.
    11.1.1. Lumped capacitor selection circuitry.
  11.2. Auto calibration circuitry.
    11.2.1. Calibration algorithm.
    11.2.2. Hardware implementation.
  11.3. Comparing the two adjustment schemes.
12. Experimental Results.
  12.1. Tap selection scheme.
    12.1.1. The complete interpolator.
  12.2. Lumped capacitor scheme.
  12.3. Conversion time offset.
  12.4. Power dissipation.
  12.5. Summary of results.
  12.6. Conclusions.
References for Part III.

PART IV. Conclusion.
13. Summary of Results.
  13.1. The ADLL architecture.
  13.2. The DLL & RC delay line architecture.
  13.3. TDC characterisation.
14. Future Developments.

PART V. Appendixes.
A. TDC Characterisation Test Bench.
B. Analysis of the DLL Closed Loop Behaviour.
C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.
D. Number of Random Samples Required for TDC Characterisation.
E. TDC Characterisation Hit Frequency.
F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).
G. DNL-aware Algorithms for the RC Delay Line Calibration.
References for the Appendixes.

List of Figures.

PART I. Introduction.

Chapter 1. Introduction and Structure of this Work.

Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.
Figure 1: The CERN particle accelerator complex (simplified) [4].
Figure 2: Longitudinal and transverse view of ALICE detector [3].
Figure 3: The hierarchical trigger data reduction block diagram of ALICE experiment [3].
Figure 4: Schematic view of the TOF detector front-end.
Figure 5: The error propagation chain.

Chapter 3. Conversion Basics.
Figure 1: Ideal transfer characteristic of a 3-bit converter.
Figure 2: Example of a converter transfer function illustrating the static performance metrics.

Chapter 4. Review of TDC Architectures.
Figure 1: Block and timing diagram of a differential Current Integrating TAC (from [3]).
Figure 2: Delay line using double inverters as delay elements.
Figure 3: Asymmetric ring oscillator [24], able to generate a 2N number of timing signals from an odd-numbered oscillator.
Figure 4: Delay Locked Loop and hit registers.
Figure 5: Timing diagram of the dynamic range extension using a clocked time stretcher [33].
Figure 6: Time expander circuit and corresponding timing diagram.
Figure 7: Time expansion using two delay lines with different cell delay.
Figure 8: Circular vernier scheme for dynamic range expansion.
Figure 9: A vernier caliper measuring a length of 0.43 mm. Note that the third tick mark in the vernier scale (lower) lines up with a tick mark in the reference scale (upper) [36].
Figure 10: Time interpolation using voltage sums.
Figure 11: Time to analogue converter using a time interpolation technique [38].
Figure 12: Coupled oscillators (time resolution of td * 2 / 3).
Figure 13: Array of DLL’s with phase shifting DLL.
Figure 14: A TDC converter based on a DLL and an RC delay line.

PART II. A TDC Architecture based on an Array of Delay Locked Loops.

Chapter 5. Architecture Overview.
Figure 1: Delay Locked Loop block diagram.
Figure 2: Delay Locked Loop used in a time base application.
Figure 3: Array of DLL’s with phase shifting DLL, showing bin definition.
Figure 4: Interpolation limits due to cell mismatch.
Figure 5: Dynamic range extension using two coarse time counters.
Figure 6: Example of the first level of a read-out buffering hierarchy.
Figure 7: The prototype block diagram.
Figure 8: Prototype circuit showing main functional blocks.

Chapter 6. Analysis of the Limits to the TDC Resolution.
Figure 1: INL standard deviation curve resulting from a cell delay mismatch of σcell=1% (ADLL: N=35 and F=4, single DLL: N=140).
Figure 2: Standard deviation curve resulting from a closed loop jitter of σjitter=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140).
Figure 3: Detail of a delay locked loop depicting the important delays within the loop.
Figure 4: Illustration of the effect of the phase detector’s phase error (N=5).
Figure 5: Illustration of the effect of the phase detector input paths’ delay mismatch (N=5).
Figure 6: Illustration of the effect of unbalanced conditions in the first cell of the delay chain (N=5).
Figure 7: Illustration of the effect of unbalanced conditions in the last cell of the delay chain (N=5).
Figure 8: Illustration of the effect of the propagation delay on the sampling signal path - case of the linear hit signal distribution network (N=5).
Figure 9: The T-shaped hit signal distribution network.
Figure 10: Illustration of the effect of the propagation delay on the sampling signal path - case of the T-shaped hit signal distribution network (N=5).
Figure 11: DNL and INL curves resulting from a phase detector’s phase error (or phase detector input path’s mismatch): DPD(C / K + τdiff)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140).
Figure 12: DNL and INL curves resulting from unbalanced conditions of the delay cells in the extremes of the delay chain: Din(δin)=1% and Dout(δout)=1% of the average cell (ADLL: N=35 and F=4, single DLL: N=140).
Figure 13: DNL and INL curves resulting from the propagation delay on the sampling signal path (linear hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140).
Figure 14: DNL and INL curves resulting from the propagation delay on the sampling signal path (T-shaped hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140).
Figure 15: DNL and INL curves resulting from the combination of the previous curves (ADLL: N=35 and F=4, single DLL: N=140).

Chapter 7. Detailed Implementation.
Figure 1: D-flip-flop operating as a two-state phase detector.
Figure 2: General and D-FF based two-state phase detector transfer characteristic.
Figure 3: Balanced D-flip-flop topology.
Figure 4: Balanced D-flip-flop topology featuring fast SR#1 operation.
Figure 5: Charge-pump and filter capacitor block diagram.
Figure 6: Charge-pump topologies (simplified).
Figure 7: Rising edge propagation along the DLL delay line and corresponding current consumption.
Figure 8: The self-biased differential delay cell (from [18]).
Figure 9: The current-starved inverter delay cell (simplified version).
Figure 10: Cell delay variation due to a 100mV supply voltage step, respectively for the differential and current-starved inverter structure.
Figure 11: Simplified representation of the delay range partition.
Figure 12: The selectable-range current-starved inverter cell.
Figure 13: The selectable delay ranges (simulation).
Figure 14: Detail of the closed control loop illustrating the propagation delay mismatch of the phase signals.
Figure 15: Schematic representation of the delay range partition illustrating the viable locking regions.
Figure 16: The ADLL tap distribution arrangement.
Figure 17: Functional diagram of the channel memory controller [3].
Figure 18: The two-level hit register (1 bit).
Figure 19: Two-stage synchroniser using D flip-flops.
Figure 20: Alternative control signal distribution configurations within a channel memory row.
Figure 21: Integrated error histogram for the two proposed distribution configurations (simulation).

Chapter 8. Experimental Results.
Figure 1: DNL and INL graphs for the ADLL.
Figure 2: Analytical DNL and INL curves (Din=1% and Dout=-1% of the delay cell, DPD=-0.1% and Dhit=0.1% of the reference period).
Figure 3: DNL and INL graphs for the different Timing DLLs (LSBDLL=4·LSB).
Figure 4: DNL and INL graphs for the Phase Shifting DLL (LSBDLL=5·LSB).
Figure 5: The ADLL auto-correlation graph.
Figure 6: DNL and INL graphs for the converter along four reference clock periods.
Figure 7: Error graph and histogram resulting from a delay sweep of two reference periods (σ=0.39LSB).
Figure 8: DNL and INL graphs obtained from the linear delay sweep results.
Figure 9: Conversion error histogram for the first Timing DLL (σ=0.30LSBDLL).
Figure 10: Delay sweep over the full dynamic range.
Figure 11: Measurement error due to crosstalk in the worst configuration.

PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.

Chapter 9. Architecture Overview.
Figure 1: Detail of DLL signal propagation illustrating time interpolation through multiple delay line samples (in this example the number of samples acquired is M=5).
Figure 2: Time interpolation circuit.
Figure 3: Continuous delay adjustment scheme based on control of the distributed parameters (simplified).
Figure 4: Adjustable delay line using a tap selection scheme.
Figure 5: Adjustable delay line using a variable lumped capacitor scheme.
Figure 6: Block diagram of the prototype.
Figure 7: Prototype circuit showing main functional blocks.

Chapter 10. The Adjustable RC Delay Line using a Tap Selection Scheme.
Figure 1: RC line divided in two segments at access point x. R and C are, respectively, resistance and capacitance per unit length.
Figure 2: Delay line division into equally sized sections.
Figure 3: Electrical model of an infinitesimal segment of a transmission line (the T-network).
Figure 4: Detail of the physical microstrip line and its equivalent simulation model.
Figure 5: Delay line segments’ length adjustment.
Figure 6: Adjustment function values.
Figure 7: Signal’s rise time along the original and the adjusted delay line, in typical conditions (simulated).
Figure 8: Delay and cumulative delay of each line segment (from simulations).
Figure 9: The leading and trailing adaptation sections.
Figure 10: Segment delay sensitivity to operating conditions (from simulations). The first and second graphs correspond, respectively, to the same line with and without leading and trailing sections.
Figure 11: The access point selection circuitry.
Figure 12: Calibration procedure for the tap selection adjustment scheme.
Figure 13: Results of calibration for different conditions, using the iterative algorithm (from simulation).
Figure 14: Results of calibration using the optimum linearity limit (from simulation).
Figure 15: Results of calibration for different conditions (from simulation).

Chapter 11. The Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.
Figure 1: Adjustment function values (calculated and actually implemented).
Figure 2: Bin size (from simulation). The first graph compares different design corners. The second graph shows the effects of extreme environment variations for the typical process.
Figure 3: The unit capacitor bank.
Figure 4: The lumped capacitor selection circuitry.
Figure 5: The effects of lumped capacitor unit variation in the bin size (from simulation).
Figure 6: The coarse calibration procedure.
Figure 7: The fine calibration procedure.
Figure 8: Results of the coarse calibration step for different conditions using the proposed algorithm (from simulation).
Figure 9: Results of the fine calibration for different conditions using restrictive linearity limits (from simulation).

Chapter 12. Experimental Results.
Figure 1: Delay line calibration results: DNL and INL graphs.
Figure 2: Spread of the RC line tap delay over the DLL cells.
Figure 3: Temperature dependency of the RC delay line.
Figure 4: DNL and INL graphs of the converter (using the tap selection adjustable delay line).
Figure 5: INL of the DLL, showing spread of the tap delay along the hit register rows.
Figure 6: Comparison of the INL graphs of the DLL and of the complete converter.
Figure 7: Conversion error (σ=0.51LSB).
Figure 8: Temperature effects on the conversion error (σ=0.50LSB/30°C and σ=0.52LSB/60°C).
Figure 9: DLL linear time sweep.
Figure 10: Detail of the DLL time sweep showing code transitions in opposite extremes of the delay chain.
Figure 11: DLL conversion error (σ=0.29LSBDLL).
Figure 12: RC delay line’s DNL and INL graphs (using the lumped capacitor adjustment scheme).
Figure 13: DNL and INL graphs of the converter (using the lumped capacitor adjustable delay line).
Figure 14: Comparison of the INL graphs of the DLL and of the complete converter.
Figure 15: Conversion error (σ=0.44LSB).
Figure 16: DLL conversion error (σ=0.29LSBDLL).

PART IV. Conclusion.

Chapter 13. Summary of Results.

Chapter 14. Future Developments.
Figure 1: A four channel TDC using a DLL based scheme and a single channel TDC with four times smaller LSB, using the same building blocks and an RC delay line.
Figure 2: The general purpose TDC architecture.
Figure 3: Block diagram of the general purpose TDC.

PART V. Appendixes.

Appendix A. TDC Characterisation Test Bench.
Figure 1: The linear passive delay generator block diagram (computer controlled).
Figure 2: The linear passive delay generator block diagram (automated).

Appendix B. Analysis of the DLL Closed Loop Behaviour.

Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.
Figure 1: Voltage controlled delay line with fixed length.

Appendix D. Number of Random Samples Required for TDC Characterisation.
Figure 1: P(−z_{α/2} < Z < z_{α/2}) = 1 − α.

Appendix E. TDC Characterisation Hit Frequency.
Figure 1: The clock multiplying PLL.

Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).
Figure 1: Detail of a delay locked loop depicting the important delays within the loop (notice the alternative location of tap 0).

Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.
Figure 1: Calibration procedure for the tap selection adjustment scheme.
Figure 2: The coarse calibration procedure.
Figure 3: The fine calibration procedure (first loop).
Figure 4: The fine calibration procedure (second loop).

List of Tables.

PART I. Introduction.

Chapter 1. Introduction and Structure of this Work.

Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.

Chapter 3. Conversion Basics.

Chapter 4. Review of TDC Architectures.
Table 1: Comparison between the different architectures discussed in the chapter.

PART II. A TDC Architecture based on an Array of Delay Locked Loops.

Chapter 5. Architecture Overview.

Chapter 6. Analysis of the Limits to the TDC Resolution.

Chapter 7. Detailed Implementation.
Table 1: Summary of noise sensitivity and power consumption analysis.
Table 2: Summary of noise sensitivity and power consumption analysis for the proposed cell.

Chapter 8. Experimental Results.
Table 1: Locking status for each working range, after the initialisation procedure.
Table 2: Summary of the linearity obtained for each DLL in the array (LSBDLL=4·LSB and LSBDLL-PS=5·LSB).
Table 3: Characteristics of the TDC prototype.

PART III. A TDC Architecture based on a DLL and a Passive RC Delay Line.

Chapter 9. Architecture Overview.

Chapter 10. The Adjustable RC Delay Line using a Tap Selection Scheme.
Table 1: Comparison of the two proposed algorithms.
Table 2: Register (accumulator) requirements for the two proposed algorithms.
Table 3: Comparator requirements for the two proposed algorithms.

Chapter 11. The Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.
Table 1: Register (accumulator) requirements for the present algorithm.
Table 2: Comparator requirements for the present algorithm.

Chapter 12. Experimental Results.
Table 1: Characteristics of the TDC prototype.

PART IV. Conclusion.

Chapter 13. Summary of Results.

Chapter 14. Future Developments.
Table 1: Timing specification of the general purpose TDC.

PART V. Appendixes.

Appendix A. TDC Characterisation Test Bench.

Appendix B. Analysis of the DLL Closed Loop Behaviour.

Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-linearity of a DLL.

Appendix D. Number of Random Samples Required for TDC Characterisation.

Appendix E. TDC Characterisation Hit Frequency.

Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).

Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.

Glossary of Acronyms.

ADC     Analogue-to-Digital Converter
ADLL    Array of Delay Locked Loops
ALICE   A Large Ion Collider Experiment
ASIC    Application Specific Integrated Circuit
CDT     Code Density Test
CERN    European Organisation for Nuclear Research
CMRR    Common Mode Rejection Ratio
CMOS    Complementary Metal-Oxide-Silicon Field Effect Transistor Logic
CUT     Channel Under Test
DAQ     Data Acquisition System
D-FF    D-type Flip-Flop
DLL     Delay Locked Loop
DNL     Differential Non-Linearity
DUT     Device Under Test
HEP     High Energy Physics
HMPID   High-Momentum Particle Identification
HRTDC   High Resolution Time-to-Digital Converter
IC      Integrated Circuit
INL     Integral Non-Linearity
ITS     Inner Tracking System
JLCC    J-Leaded Chip Carrier
LADAR   Laser Radar
LHC     Large Hadron Collider
LIDAR   Light Detection and Ranging
LIP     Laboratório de Instrumentação e Física Experimental de Partículas
LSB     Least Significant Bit
NMOS    N-Channel Metal-Oxide-Silicon Field Effect Transistor
PDF     Probability Density Function
PECL    Positive Emitter Coupled Logic
PHOS    Photon Spectrometer
PID     Particle Identification
PLCC    Plastic Leaded Chip Carrier
PLL     Phase Locked Loop
PMOS    P-Channel Metal-Oxide-Silicon Field Effect Transistor
RC      Resistive-Capacitive
RMS     Root Mean Square
TAC     Time-to-Amplitude Converter
TDC     Time-to-Digital Converter
T/D     Time-to-Digital
TOF     Time-of-Flight
TPC     Time Projection Chamber
VCDL    Voltage Controlled Delay Line
VCO     Voltage Controlled Oscillator

PART I. INTRODUCTION.

Chapter 1. Introduction and Structure of this Work.

In this thesis we describe the development and demonstration of architectures adapted for the accurate measurement of short time intervals. High-resolution time measurements have been performed in the past using instruments based on analogue measurement techniques. These instruments were built using discrete components or using a single Integrated Circuit (IC) employing special high performance “analogue” technologies.
Our goal is to evaluate and demonstrate architectures that are suitable for monolithic integration and that can be built in a standard CMOS technology. The ability to share the same time interpolator between several measurement channels is also a major aim of the work. Furthermore, these architectures are intended to be implemented together with all the digital signal processing circuitry needed to build a fully functional converter. Although the emphasis of this work is on architecture development, we carried out a detailed analysis of the critical circuitry that determines the timing performance of the converter.

Domain of application of this work.

The work was carried out at the “European Organisation for Nuclear Research” (CERN), in Geneva, as a collaboration between the Microelectronics group and the “Laboratório de Instrumentação e Física Experimental de Partículas” (LIP), Lisbon. Therefore, emphasis is given to the specific requirements of the High-Energy Physics experimental environment. Nevertheless, the conclusions we draw from the work are applicable in any domain where high-resolution time measurements are required, for example in LIDAR (LIght Detection And Ranging) and LADAR (Laser rADAR) applications. Our work also contains contributions that can be useful in the domain of phase and delay synthesis, in applications such as time bases for digital oscilloscopes, phase modulation and demodulation, and phase synchronisation.

Structure of the thesis.

The structure of this thesis naturally follows the progress made over the duration of the work. It is divided into four parts, each describing a major milestone of the work.

In the first part of this thesis, we start with an introduction to the subject. It includes a brief description of the goals of a High-Energy Physics experiment and the systems needed to achieve them.
The need for high-resolution time measurements is emphasised, together with the particular constraints of the experimental environment (Chapter 2). A general overview of the relevant characteristics of a Time-to-Digital Converter (TDC) is given in the form of the set of characterisation metrics that we used throughout the work to evaluate the timing performance of T/D converters. A short description of the effects of the quantisation error and of the different noise sources that may be present is also given (Chapter 3). We then present a brief review of the common types of time interval measurement systems that have been used in the past, highlighting their advantages and disadvantages. This review includes recent proposals that aim at the same goals as the ones pursued in this work (Chapter 4).

In the second part of this thesis we develop the analysis carried out to evaluate an architecture based on an Array of Delay Locked Loops (ADLL). As a result of this evaluation, a TDC demonstrator was built based on this architecture. An overview of the time interpolation scheme resulting from the phase shifting of a number of Delay Locked Loops (DLL) is presented. We review the main features of the scheme, emphasising its inherent advantages and difficulties. A block diagram and a short description of the TDC prototype are presented, together with the estimated timing performance (Chapter 5). A detailed analysis of the causes of non-linearity that degrade the performance of a DLL-based converter is carried out, and an analytical model that predicts their effects on the conversion characteristic is presented. This analysis is extended to the ADLL-based converter. A similar analysis is carried out for the phase noise generated by the dynamics of the DLL operation (Chapter 6). Having established a model for the causes and consequences of non-linearity and phase noise, the critical circuit blocks are then described.
Ways to improve their performance and ensure that they match the required characteristics are proposed (Chapter 7.). We then proceed to present the experimental results obtained from the prototype TDC that was built based on this architecture, and demonstrate that these results are in accordance with the analysis carried out (Chapter 8.).

In the third part of this thesis, a new architecture suitable for low power operation is proposed. The basic building block of this architecture is also a DLL, but finer time interpolation is obtained using passive RC delay lines. The principle of operation of this new architecture is described. The main characteristics of the architecture are detailed, with an emphasis on the interesting properties of RC delay lines. Two alternative adjustable delay line schemes are proposed. A block diagram and a short description of the TDC prototype built using this architecture are presented and an estimate of the timing performance is given (Chapter 9.). We then carry out the detailed analysis of the adjustable RC delay line based on a tap selection scheme. We develop a simulation model of the distributed delay line that includes all the significant devices (lumped or distributed) that contribute to its delay characteristics. We propose a method to derive the dimensions of each of the segments into which the line is divided based on the delay requirements as well as on the dimension of the surrounding circuitry. A few calibration algorithms are also proposed and their performance is illustrated based on simulated delay line conditions (Chapter 10.). The same kind of analysis is performed for the adjustable RC delay line based on a variable lumped capacitor scheme. We present different calibration algorithms (Chapter 11.). As a corollary of this part of the work we present the experimental results obtained from a demonstrator TDC built using this architecture.
Based on these results, we validate our analysis and confirm that this architecture performs as expected (Chapter 12.).

The concluding part of this work is divided into two chapters. In the first, we highlight the contributions and developments carried out during this work (Chapter 13.). In the second, we propose what amounts to the logical conclusion of this work: a general purpose TDC architecture using the DLL / RC delay line based architecture that we developed. This TDC is able to perform either low resolution measurements in a large number of integrated channels or high-resolution time measurements in a small number of integrated channels (Chapter 14.).

Finally a few appendices, complementary to the main text, are included. They expand and complete the explanations given in the main text. Of relevance is the description of the test bench that we developed specifically for TDC characterisation. This test bench was used throughout the work to evaluate the TDC prototypes that were built (Appendix A.).

Main contributions of this work. As the structure of the thesis makes clear, we will present two integrated circuits that demonstrate two different solutions for the multi-channel, high-resolution time measurement system requirements.
• A four channel high-resolution TDC. This IC implements the Array of Delay Locked Loops (ADLL) architecture. Apart from the extended dynamic range time interpolation core, this circuit also integrates digital logic to perform important functions such as encoding, buffering and read-out management.
• A two channel high-resolution TDC. This IC implements a novel time interpolation architecture, based on a DLL and a passive RC delay line. This architecture allows for higher resolution with lower power operation.
Some important results were obtained while designing these circuits. They are presented in this work:
• A detailed study of the behaviour of a Delay Locked Loop (DLL) was carried out.
We show how different error mechanisms affect the accuracy of the time interpolation and propose solutions to minimise these effects.
• These studies are extended to the more complex case of the Array of DLL’s (ADLL). We show that for a given device mismatch level, there is an optimal interpolation factor (number of DLL’s in the array) that yields the greatest improvement in the resolution of a converter built this way.
• An alternative architecture is proposed that avoids some of the limitations identified in the ADLL-based architecture, such as power dissipation and the maximum resolution that can be obtained.
• A procedure to compensate for technological tolerances in tapped passive RC delay lines is proposed. We proceed to present several methods to characterise and adjust these lines. We then analyse the possibility of integrating the adjustment algorithms in the same IC.

Related publications. The contributions made during the course of this research led to the following publications:
Mota, M., Christiansen, J., A high-resolution time interpolator based on a Delay Locked Loop and an RC delay line, IEEE Journal of Solid-State Circuits, vol. 34, no. 10, pp. 1360-1366, Oct. 1999.
Mota, M., Christiansen, J., A four channel, self-calibrating, high-resolution Time-to-Digital Converter, Proceedings of the 5th IEEE International Conference on Electronics, Circuits and Systems (ICECS’98), Lisboa, Portugal, Sep. 1998.
Mota, M., Christiansen, J., A high-resolution Time-to-Digital Converter based on an Array of Delay Locked Loops, Proceedings of the 3rd Workshop on Electronics for LHC Experiments, London, UK, Sep. 1997.
Almasi, L. et al., New TDC electronics for a PesTOF tower – in NA49, ALICE/2000-02 internal note/TOF, Mar. 2000.
Mota, M., A high-resolution Time-to-Digital Converter – users manual, CERN/EP internal note, Geneva, Switzerland, 1997.
Contributions in the field of microelectronics applied to the High-Energy Physics domain led to the following additional publications:
Mota, M., Gomes, P., Christiansen, J., MEC3 – A pipelined zero-suppression and trigger matching chip, IEEE Transactions on Nuclear Science, vol. 42, no. 4, pt. 1, pp. 808-811, Aug. 1995.
Gomes, P., Mota, M., Christiansen, J., NANA – An integrated signal processor and record builder for level-2 read-out of asynchronous event-filtering digital pipelines, IEEE Transactions on Nuclear Science, vol. 42, no. 4, pt. 1, pp. 849-853, Aug. 1995.

Chapter 2. Time Interval Measurements in HEP Experiments – An Introduction.

High-Energy Physics (HEP), or particle physics, is the discipline that explores and tries to understand the deep structure of matter [1]. As the discipline evolved, some models were developed to explain this structure. As in any scientific endeavour, the particle physicist is not satisfied until his theoretical developments (the models) have been demonstrated by experimental means. His experiments may, however, bring to light finer, and not completely understood, phenomena. The cycle of scientific progress is then closed: new models have to be developed, which require the elaboration of new and higher-performance experiments to verify them.

2.1. High-Energy Physics experiments.
The quest for the structure of matter has been a progressive effort. In parallel with this effort, and enabling it, a large development effort has been dedicated to the design of new and more powerful machines that act as “microscopes”, exposing the ever smaller and hidden constituents of matter. These “microscopes” take the form of particle accelerators, where bunches of particles (for example ions, protons, electrons, etc.) accelerated to very high energies are made to collide.
The interaction between these particles, due to the bunch collision, results in the conversion of the original particles into a diversity of new particles, in a process akin to the breaking up of a nucleus into its constituent protons and neutrons when bombarded by other energetic particles. It is these new particles that are the object of the attention of the physicist, since they explain how the original particle is made and how it interacts with its environment. Surrounding the interaction point (where bunches of particles collide) is a complex set of detectors, sensitive to the different kinds of particles generated at the moment of interaction. As these resulting particles traverse the detectors, some of their energy is captured by the detector, which converts it into an electrical signal (charge, current or voltage). This signal is then amplified and processed by the front-end electronics, from where it is transferred to powerful computers.

Traditionally, only the pre-amplifier would be mounted close to the respective detector cell. Its function was to optimally shape the detector signal and drive it through 15 to 50 meters of cable up to the electronics hut, where all the front-end processing would be performed. In modern experiments, where very high granularity is needed, with well over 10^6 cells with independent sensors, this topology is no longer applicable. Fortunately, state-of-the-art technology can be used to integrate the required front-end electronics into a limited number of ASICs (Application Specific Integrated Circuits), or even a single one, that can be directly mounted on the detector. In this way, a vast quantity of cables is avoided and a higher function density and lower power dissipation are achieved [2]. All the phenomena that are studied in a HEP experiment obey statistical laws.
The quantities that are to be measured with a detector sensor, either the amount of energy deposited or the moment and position of the particle crossing, also include some uncertainty in relation to their exact value. Therefore, multiple similar events must be analysed, the standard deviation of their statistical distribution being of relevance to their identification.

2.1.1. A HEP experiment at CERN^1: ALICE.
One such detector system is being developed in the context of the ALICE collaboration (A Large Ion Collider Experiment) [3]. The main goal of this collaboration is to study experimentally the collision of heavy ions (for example, lead ions) at high energy densities.

Figure 1: The CERN particle accelerator complex (simplified) [4].

These ions are accelerated to very high energies by a group of accelerator machines connected in series that culminate in the Large Hadron Collider (LHC), a circular accelerator with a 27 km perimeter. The LHC will include the interaction point where the ALICE detector will be built to observe the particle collision (see Figure 1). The LHC accelerator itself is made of two identical rings where bunches of ions (or, alternatively, protons) travel in opposite directions with high energy. At the interaction points, the two rings intersect and the particle bunches are allowed to collide.

^1 CERN: European Organisation for Nuclear Research, Geneva, Switzerland.

The detector system itself is a group of detectors [3], each optimised to observe different ranges of particles emerging from the interaction point. These detectors comprise an Inner Tracking System (ITS) with six layers of high-resolution silicon tracking detectors, a cylindrical Time Projection Chamber (TPC) and finally a large area Particle IDentification (PID) array of Time-Of-Flight (TOF) counters. The TPC is the main tracking system of the experiment.
The ITS is mainly used for detailed reconstruction of the vertex of the interaction very close to its origin. Both of them also aid the PID detector in the identification of particles. In addition, a few specialised detectors are included: the electromagnetic calorimeter (PHOS – PHOton Spectrometer), the High Momentum PID (HMPID), the muon spectrometer and others. An outer magnet is necessary to bend the trajectory of charged particles, thereby easing their identification (Figure 2). Particles are identified by two different mechanisms. Low and medium momentum particles are identified, respectively, in the ITS and in the TPC by the dE/dx technique (the rate at which they lose energy as they traverse the detector). Higher momentum particles are identified in the PID detector using the TOF technique (the time that the particle takes to travel from the interaction point to the detector surface).

Figure 2: Longitudinal and transverse view of the ALICE detector [3].

The amount of data generated after each bunch collision (or event) is very large. To reduce the bandwidth requirements on the data acquisition (DAQ) system, and also the amount of memory needed for data storage, on-line data reduction algorithms are applied to the data. The data reduction algorithms take advantage of the spatial and temporal characteristics of the events: only a limited number of detector cells are actually crossed by an emerging particle. The output of the other, idle, cells can safely be discarded since it contains no information. This operation is called “zero-suppression”. Furthermore, not all the events are interesting to study. It is possible to implement in hardware algorithms that sample the data of selected detectors to decide if an event includes some interesting characteristics that deserve further attention. Otherwise, all data pertaining to that event may be discarded. This operation is called “trigger based data reduction”.
In general, several levels of trigger based data reduction are implemented. They correspond to a hierarchy of data reduction algorithms that are progressively more selective. However, they are also progressively more complex and slower.

Figure 3: The hierarchical trigger data reduction block diagram of the ALICE experiment [3].

The principle of the trigger based data reduction hierarchy in ALICE is pictured in Figure 3 [3]. A first level of data reduction (L0) is used simply to signal the existence of an interaction as soon as possible. It is not a very selective filter. The second level of data reduction (L1) already uses information on the quality of the event to produce a large reduction in accepted event rate. Both of these trigger processors produce a decision with a fixed latency. After the L1 trigger decision is taken, the read-out of the data from all detectors is started, pending the more selective decision of the third level trigger (L2). At that moment, the read-out of the detector’s data into the DAQ system can be finalised. Overall, an event rate reduction of the order of 10^3 is obtained. Consequently, the required bandwidth of the DAQ system is proportionally reduced.

2.2. High-resolution time interval measurements in ALICE.
The efficiency of the particle identification using the TOF technique is directly related to its time resolution. This is especially critical at the higher momentum side of the identification range [5]. As a consequence, the TOF detector in the ALICE experiment is an array of sensors (counters) having a high time resolution (from σdet ~ 40ps to 100ps, depending on the detector technology chosen). The detector sensor is only a small part of the system. The front-end electronics also generate some time uncertainties that add to the intrinsic detector resolution, limiting the overall time resolution of the system.
A simplified view of the front-end electronics proposed for the TOF detector is shown in Figure 4. The time of flight of the particle resulting from the interaction is the difference between the instant when the interaction occurred, t0, which is captured by a specialised detector (the t0 detector), and the instant when the emerging particle traverses the TOF detector surface. Traditionally, this time interval would be measured in a single device (a Time-to-Digital Converter – TDC). However, the dimensions of the detector system (>150,000 cells distributed over ~100 m²) render impractical the distribution of t0 over the whole system. A better solution is to rely on the reference clock (clkref), which has to be distributed anyway, as the time reference of the measurements. Each limit of the time interval can then be measured individually and later subtracted digitally to obtain the original interval.

Figure 4: Schematic view of the TOF detector front-end (TOF detector cells and t0 detector, each followed by a pre-amplifier & discriminator and a TDC, measuring respectively the time of flight and the time of interaction (bunch ID), with a common clkref distribution).

The actual interaction and crossing instants are reflected in the timing characteristics of the electrical signal that the respective detector generates. These signals undergo some processing (amplification, discrimination, etc.) in order to render them usable by the TDC that converts the timing information they carry into a binary word. The timing uncertainties created by such processing, and by the digital conversion procedure, must be added to the intrinsic uncertainty of the TOF and t0 detectors (σdet and σt0, respectively) in order to obtain the overall time resolution of the system.

Figure 5: The error propagation chain (detector cell (t0 / TOF): σt0, σdet; front-end electronics (pre-amp & discriminator): σfe; TDC: σTDC; clkref distribution: σclk).
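The error propagation chain of Figure 5 can be sketched numerically. The snippet below is only an illustration under stated assumptions: the contributions are taken as uncorrelated, so they add in quadrature, and the front-end, TDC and clock terms are counted twice (once for the t0 chain and once for the TOF chain). The function names are hypothetical, and the default values are the ones quoted for ALICE in this section.

```python
import math

# RMS uncertainties in picoseconds, as quoted in this section for ALICE.
SIGMA_T0 = 50.0    # t0 detector
SIGMA_DET = 100.0  # TOF counter
SIGMA_FE = 10.0    # front-end electronics, per chain
SIGMA_CLK = 50.0   # clkref distribution, per chain


def tof_sigma(sigma_tdc, sigma_t0=SIGMA_T0, sigma_det=SIGMA_DET,
              sigma_fe=SIGMA_FE, sigma_clk=SIGMA_CLK):
    """Overall TOF uncertainty: quadrature sum of uncorrelated terms.

    The fe, TDC and clk terms enter twice (two independent chains).
    """
    variance = (sigma_t0 ** 2 + sigma_det ** 2
                + 2 * sigma_fe ** 2 + 2 * sigma_tdc ** 2 + 2 * sigma_clk ** 2)
    return math.sqrt(variance)


def max_tdc_sigma(sigma_tof_target, sigma_t0=SIGMA_T0, sigma_det=SIGMA_DET,
                  sigma_fe=SIGMA_FE, sigma_clk=SIGMA_CLK):
    """Largest TDC uncertainty that still meets the target TOF uncertainty."""
    fixed = (sigma_t0 ** 2 + sigma_det ** 2
             + 2 * sigma_fe ** 2 + 2 * sigma_clk ** 2)
    budget = sigma_tof_target ** 2 - fixed
    if budget <= 0:
        raise ValueError("target cannot be met even with an ideal TDC")
    return math.sqrt(budget / 2)


if __name__ == "__main__":
    print(f"TDC budget for a 150 ps target: {max_tdc_sigma(150.0):.1f} ps")
```

With these inputs the budget comes out at about 49 ps per TDC, which is in line with the requirement of a T/D converter resolution better than 50ps.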
In such a distributed system, it is reasonable to assume that all the time uncertainties generated in the different blocks are uncorrelated. Therefore, following the error propagation scheme of Figure 5, the time uncertainty of the TOF system is:

$$\sigma_{TOF}^2 = \sigma_{t0}^2 + \sigma_{det}^2 + 2 \cdot \sigma_{fe}^2 + 2 \cdot \sigma_{TDC}^2 + 2 \cdot \sigma_{clk}^2 ,$$

where, for simplicity, the time uncertainties of the front-end block (σfe), of the T/D converter (σTDC) and of the clock distribution network (σclk) were considered to have the same statistical properties in the two independent chains. If the intrinsic time resolution of the detector is to be respected, it is important to minimise the time uncertainty created by all the electronic components of the chain. The overall contribution of the electronics should only be a small fraction of the time uncertainty of the complete TOF system. To obtain an overall time uncertainty better than σTOF=150ps, as required by the ALICE experiment, the resolution of the T/D converter must be σTDC<50ps. It is assumed, as in [6], that the time uncertainty of the TOF counters is σdet=100ps, and that the values for σt0, σfe and σclk are, respectively, 50ps, 10ps and 50ps.

Apart from the timing performance of the TOF electronics, the particular physical constraints of these experiments (large number of detector cells, electronics mounted directly on the detector) generate new demands on the electronics to be used. Commercial components and instruments like low noise and fast amplifiers, low time-walk discriminators, and high-resolution T/D converters exist, but their size and power dissipation are seldom adapted to the specific requirements of modern HEP experiments like the one described.

Chapter 3. Conversion Basics.
The remarkable development of computers and other digital means of processing data during the last few decades has enabled the creation of new and more powerful instruments for observing and studying the world that surrounds us. Of course, this is essentially an analogue world since observable quantities may undergo continuous time and amplitude variations. Their translation into electric signals also results in analogue quantities. The interface between the analogue domain and the digital domain is provided by Analogue-to-Digital Converters. They capture the analogue quantities and convert them into their digital representations, which should be the exact counterpart of the respective analogue quantity, independently of the properties of the converter used.

The capture of an analogue quantity in a discrete format by means of an electronic converter is unfortunately not error-free. Indeed, some loss of information is inherent to the amplitude quantising operation^1. Furthermore, given the technological limitations and the environment in which these converters operate, other sources of errors will inevitably affect the conversion transfer function, making it different from the idealised one.

Several converter architectures and several implementations of these architectures have been proposed over time. All of them have claimed their advantages by showing different, and sometimes conflicting, performance parameters. A quick scan of the literature [7],[8] and of commercial converter data-sheets shows that even if some performance metrics are commonly used (INL, DNL, etc.), their definition may differ. It is therefore important to clarify which metrics will be used throughout this text to characterise the converters, and what their significance is. Furthermore, most of the performance metrics have been developed and used in the context of conventional A/D converters.
Some of these are not directly applicable to the T/D converter characterisation, either because they are meaningless (maximum input frequency, hold time, droop rate, etc.), or because their meaning is different (maximum sampling rate). Also some new performance parameters, adapted to the specific application, must be developed.

^1 Given some restrictions on the signal bandwidth B, the Nyquist criterion assures that the sampling operation preserves all the characteristics of the signal if an appropriate sampling frequency is used (fsample ≥ 2·B).

The performance metrics that will be used throughout this text are presented here. Their meaning and significance will be explained, as well as the way they can be measured, if relevant.

3.1. Performance metrics.
A T/D converter performs the conversion of a time interval (a delay) into a binary word. This operation inevitably includes an amplitude discretisation (quantisation), which means that its transfer function is staircase shaped, as shown in Figure 1.

Figure 1: Ideal transfer characteristic of a 3-bit converter (digital output versus analogue input, showing tap_i, tap_i+1, bin_i, the LSB and the dynamic range).

An ideal converter is characterised by its Least Significant Bit (LSB) and the conversion Dynamic Range. The LSB corresponds to the smallest delay that can be discriminated and the Dynamic Range corresponds to the largest delay that can be measured. After conversion, the delay is converted into a discrete number of Codes, each corresponding to a “stair” of the transfer curve. A delay is said to belong to bin_i if its length is smaller than the one corresponding to Code_i+1 but not smaller than the one corresponding to Code_i. For applications such as the T/D converters based on the architectures developed in this work, the definition of Code is interchanged with the more meaningful definition of tap.
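The bin membership rule above can be sketched as a lookup over the tap delays. This is a minimal illustration, not the circuit described in this work: the function name `delay_to_bin` and the list `taps` (the cumulative delays of tap_0 to tap_N, in ascending order) are assumptions introduced here.

```python
from bisect import bisect_right


def delay_to_bin(delay, taps):
    """Return i such that taps[i] <= delay < taps[i + 1].

    `taps` holds the cumulative delay of each tap in ascending order;
    taps[0] is the origin and taps[-1] the end of the dynamic range.
    """
    if not taps[0] <= delay < taps[-1]:
        raise ValueError("delay outside the dynamic range")
    # bisect_right counts the taps that are <= delay; the bin index is one less.
    return bisect_right(taps, delay) - 1


if __name__ == "__main__":
    lsb = 1.0
    ideal_taps = [i * lsb for i in range(9)]  # ideal 3-bit converter: 8 bins
    print(delay_to_bin(2.3, ideal_taps))
```

For an ideal converter the taps are equally spaced by one LSB, so the lookup reduces to an integer division of the delay by the LSB. Keeping the explicit tap list makes the same function usable with measured, unequal tap delays.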
Departures from the ideal behaviour of the converters are usually characterised using a given set of metrics, such as Differential and Integral non-linearity, Gain error and Offset. Since some of these static performance metrics have different definitions depending on the application, a set of appropriate definitions is given and briefly discussed:

Differential Non-Linearity (DNL) is the deviation of the output bin size from its ideal value of one least significant bit (LSB). For a given bin_i, the differential non-linearity DNL_i is given by the following equation, where d_i is the measured cumulative delay from the origin to tap_i.

$$DNL_i = \frac{d_{i+1} - d_i - LSB}{LSB}, \quad i = 0..N-1.$$

The result is usually presented as a graph representing all the N bins being characterised, together with the standard deviation of the DNL.

Integral Non-Linearity (INL) is the deviation of the input/output characteristic from a straight line of ideal gain (slope) that best fits the curve, obtained by adding an offset to the ideal transfer characteristic. Using this definition, Gain error is zero, because its effect is included in the INL result. The INL graph is usually presented, together with the standard deviation of the INL. This definition of INL does not exactly match the usual definitions, as summarised in [7]. However, it satisfies the particular requirements of the T/D converters to which these metrics are applied. The principle of operation of most of the T/D converters that will be presented here relies on the concatenation of repeated images of a transfer function with small LSB along the full dynamic range of the converter. The concatenation is guided by an external reference signal that also serves as the overall reference to the converter. In this context, it is standard practice to characterise in great detail only a limited section of the dynamic range, corresponding to one or more images of the above mentioned transfer function.
The performance measured in this section is then extrapolated to the full dynamic range (which is itself a simple repetition of this section). The definition of INL used must allow for this extrapolation operation, therefore the gain error must be included in the INL measure. The concatenation of the transfer function must be verified to confirm that the extrapolation of the INL measure is valid. Given the principle of operation of these T/D converters, it is only necessary to check that all of the images are present and are not superimposed. A coarse INL characterisation of the full dynamic range identifies any concatenation error that may be present. For a given bin_i, the integral non-linearity INL_i is given by the following equation, where d_i is the measured cumulative delay from the origin to tap_i and the Offset O_delay is defined below.

$$INL_i = \frac{d_i - O_{delay} - i \cdot LSB}{LSB}, \quad i = 0..N-1.$$

Gain error is the deviation of the slope of the line used in the INL calculation from its ideal value. As stated before, the definition of INL used results in null gain error.

Offset is the vertical intercept of the line to which the transfer function is compared in the INL calculation. The Offset, O_delay, is such that the sum of the squared residuals ε_i is minimised,

$$\varepsilon_i = d_i - O_{delay} - i \cdot LSB, \quad i = 0..N-1.$$

In our case, this definition results in a relative offset of the transfer curve. An absolute offset would have to take into account the offset due to different signal paths of the reference and hit signals within (and outside) the circuit. Since an absolute offset value depends on the system where the TDC is incorporated and must anyway be measured at system level, no further mention is made of this metric.

These static metrics (illustrated in Figure 2) reflect how close the transfer function of the converter is to the ideal curve. They can be obtained using statistical methods such as the histogram method, also known as the Code Density Test (CDT).
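As an illustration of how the definitions above map onto a code density histogram, the following sketch estimates DNL, INL and the relative offset from per-bin hit counts. It is not the procedure of Appendix A; it assumes that the hits are uniformly distributed over the characterised section, that this section spans exactly N·LSB (as imposed by the reference), and that each bin width is therefore proportional to its share of the hits. The function name `static_metrics` is hypothetical.

```python
def static_metrics(counts, lsb):
    """Estimate DNL_i, INL_i and O_delay from code density counts.

    counts[i] is the number of hits recorded in bin i (i = 0..N-1).
    Returns (dnl, inl, offset); dnl and inl are in LSB, offset in time units.
    """
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin and one hit")
    span = n * lsb  # assumed length of the characterised section
    # Bin width estimate: fraction of uniformly distributed hits times the span.
    widths = [c / total * span for c in counts]
    # Cumulative delay d_i from the origin to tap_i (d_0 = 0).
    d = [0.0]
    for w in widths:
        d.append(d[-1] + w)
    dnl = [(d[i + 1] - d[i] - lsb) / lsb for i in range(n)]
    # The least-squares offset of a fixed-slope line is the mean residual.
    residual = [d[i] - i * lsb for i in range(n)]
    offset = sum(residual) / n
    inl = [(r - offset) / lsb for r in residual]
    return dnl, inl, offset


if __name__ == "__main__":
    dnl, inl, offset = static_metrics([150, 50, 100, 100], lsb=1.0)
    print(dnl, inl, offset)
```

Because the line slope is fixed at one LSB per bin, only the offset is fitted, which matches the convention that the gain error is absorbed into the INL.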
A more detailed overview of this method and of the test set-up used can be found in [9] and Appendix A.

Figure 2: Example of a converter transfer function illustrating the static performance metrics (digital output versus analogue input, showing INL_i, DNL_i+LSB and the Offset).

Another important characteristic of the converter, which reflects its behaviour in the presence of random error sources such as loop jitter, electrical noise or quantising noise, is the Conversion error:

Conversion Error is the deviation of the input/output characteristic from a straight line of ideal gain (slope) that best fits the curve. The result is presented as a histogram of the error, and its standard deviation is defined as the RMS Resolution of the converter. This definition is quite similar to the INL definition given above, the difference being in the method by which the transfer curve is obtained. In this case it is obtained via a linear time sweep over the dynamic range (see Appendix A), while the INL graph is (in our case) obtained using randomly generated hits in code density tests. This metric reflects a different way of characterising the circuit, very appropriate for High-Energy Physics experiments, where the response of most of the detectors to a particle crossing includes some time (and amplitude) uncertainty which is reflected in the standard deviation of their transfer function.

Other performance metrics of a converter are included here, for completeness.

Crosstalk between channels reflects the error introduced in the transfer function of a given channel when electric activity occurs in any other channel integrated (or not) in the same circuit. It is presented as a maximum deviation of the transfer function in any coupling conditions.

Double hit resolution is a measure of the minimum time interval between two consecutive samples of the quantity being measured. In the TDC domain this quantity is a time interval.
This metric is similar to the maximum sampling frequency used in the context of ADC characterisation. However it is more adapted to the characterisation of T/D converters due to the random nature of their sampling activity.

The following characteristics do not reflect the timing performance of the converter, but they are important to establish the applicability of one particular converter circuit to the envisaged system.
• Number of integrated channels.
• Power dissipation per channel.
• Calibration requirements.
• System-level functionality integrated (memory, etc.).

3.2. Error sources.
The performance metrics already discussed describe the observable effects of all the error sources that influence the converter system. In this section the causes of these errors will be briefly exposed. Only the general error causes will be discussed. Particular conversion architectures are affected by different error mechanisms. These will be discussed together with the respective architecture.

Quantisation error. The quantising operation is inherent to the operation of any converter. It consists of the approximation of the amplitude of the quantity being converted to a level that is part of a limited set of available levels. The resulting signal is a discrete amplitude representation of the sampled signal. It can be directly represented in a binary format. The effect of the quantising operation is an error in the conversion result. This error is proportional to the LSB of the conversion, varying between –LSB/2 and LSB/2. Quantising is usually seen as a source of additive noise. To formulate its impact on the performance of the converter, this additive noise is assumed to be a random variable with a uniform distribution between –LSB/2 and LSB/2 and that it is independent of the input amplitude [8]. While these assumptions are not strictly valid, they do result in a reasonable approximation for converters above 4 bits. This random variable has a standard deviation of:

$$\sigma_q = \frac{LSB}{\sqrt{12}} .$$
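The LSB/√12 figure can be checked numerically with a short sketch. It assumes an ideal quantiser that rounds to the nearest level and an input swept evenly over a whole number of LSBs, which stands in for the uniform-distribution assumption above; the function name and its parameters are illustrative only.

```python
import math


def quantisation_rms(lsb, n_points=100_000, span_lsbs=8):
    """RMS of (quantised - input) for an input swept evenly over span_lsbs LSBs."""
    acc = 0.0
    for k in range(n_points):
        # Mid-point sweep: inputs are evenly spread over the whole span.
        x = (k + 0.5) / n_points * span_lsbs * lsb
        q = round(x / lsb) * lsb  # ideal quantiser: nearest available level
        acc += (q - x) ** 2
    return math.sqrt(acc / n_points)


if __name__ == "__main__":
    lsb = 25.0  # arbitrary example value, in ps
    print(quantisation_rms(lsb), lsb / math.sqrt(12))
```

The two printed values should agree closely. The same standard deviation is obtained for a quantiser that truncates instead of rounding, since truncation only shifts the mean of the error by LSB/2.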
Reference phase noise (Jitter). The quality of the reference that the converter uses is decisive for the operation of the converter. Some converter architectures include means of averaging the important properties of the reference over time, thereby filtering out harmful variations of these properties and reducing the conversion errors. However, this filtering function has limited effects and therefore it is safer to rely on a high quality reference that can be used as it is delivered to the converter. In the context of modern T/D converters, the reference is usually a periodic signal with its phase noise (or jitter) being the important quality factor. Jitter present in the reference will force the converter to permanently try to adapt to the changing period of the reference. Therefore any jitter on the reference signal will add a random noise component to the conversion function.

Other noise sources. Several other sources of conversion errors may be present in a T/D converter, just like in any other electronic circuit. A careful design minimises the sensitivity of the transfer function of the converter to these noise sources. A distinction can be made between intrinsic and extrinsic noise sources.

Intrinsic noise is due to random motion of charge carriers in the devices (active or passive) that make up the circuit. It is always present in the signals flowing in the circuit. The origin of several kinds of intrinsic noise will be briefly described here [10]. However, given the large voltage levels of most of the signals used in the converters discussed in this dissertation, their influence on the performance of the converters is small.

Thermal noise is a temperature dependent noise. It originates from the thermally induced random motion of charge carriers within the device. It has a flat spectral density (white noise) and a gaussian amplitude probability distribution function (PDF) with zero mean.
The variance σ²(i) is a function of the temperature T and the resistance value R (k is the Boltzmann constant, f is the frequency and Δf is the bandwidth considered):

$$\sigma^2(i) = 4 \cdot k \cdot T \cdot \frac{1}{R} \cdot \Delta f \quad (\mathrm{A}^2)$$

Shot noise is due to the random passage of charge carriers across a potential barrier in a semiconductor junction. Therefore it depends on the direct current flowing in the device. It has the same spectral and amplitude characteristics as thermal noise. The variance σ²(i) is a function of the direct current I_D and of the electronic charge q:

$$\sigma^2(i) = 2 \cdot q \cdot I_D \cdot \Delta f \quad (\mathrm{A}^2)$$

Flicker noise (or 1/f noise) describes the quality of the conductive medium with respect to the direct current flow. Several origins may contribute to this noise. Its amplitude PDF is often non-Gaussian, but its spectral density is proportional to 1/f (hence the name). The expression of the variance σ²(i) of the amplitude of this kind of noise includes two terms that have to be determined experimentally, K and a:

$$\sigma^2(i) = K \cdot \frac{I_D^{a}}{f} \cdot \Delta f \quad (\mathrm{A}^2)$$

Other kinds of intrinsic noise with a spectral density that has a higher-order dependency on the frequency, such as popcorn or burst noise (1/f²), reflect mostly the quality of the processing of the material. Their amplitude PDF is not Gaussian. Finally, avalanche or breakdown noise is caused by the avalanche process just before junction breakdown. Its spectral density is usually flat and its amplitude PDF is not Gaussian.

Extrinsic noise, on the other hand, is a product of the interference of the external circuitry with the behaviour of the sensitive circuit [11]. Extrinsic noise requires a path through which the noise source can couple into the sensitive circuit. Therefore it is strongly linked to the circuit layout and to the signal distribution topology. This interference may be random or deterministic.
Of the several possible coupling mechanisms, only those most relevant in the integrated circuit domain are discussed here: capacitive coupling, conductive coupling (via shared signal paths) and inductive coupling.

Capacitive coupling is due to the existence of electric fields between any two conductors. The current flowing through the coupling capacitor is a function of the rate of change of the potential difference across its terminals. Therefore any signal variation on one of the plates of the coupling capacitor induces a variation on the other plate. This effect is often known as crosstalk. It may be significant where the coupling capacitor is large (for example, two long parallel lines) or where high-frequency, large-amplitude signal variations occur close to a weak signal path.

Conductive coupling is due to the existence of a direct signal connection between the noise-generating circuit and the sensitive circuit. These connections may be the input signals, the common power supply or the ground node. Power supply and ground distribution within an IC requires complex networks. Although these networks are made of low-resistivity lines, their overall resistance is not negligible. In the presence of switching activity, periodic current surges flow through them, leading to voltage drops or bounces. These voltage variations may affect the sensitive circuit. Noise coupling through the power supply distribution is also known as supply noise.

Inductive coupling is usually not considered in the context of the integrated circuit itself, given its small dimensions. However, the package interconnects and the bond wires that establish the connection between the IC and the rest of the circuit can be sensitive to this coupling effect. It is due to the varying magnetic field around a conductor in which the current varies. Since the magnetic field extends around other conductors in the vicinity, its variation may induce a voltage change in them.
In the case of the bond wires dedicated to power supply and ground, where relatively large current variations may be present due to the switching activity of the circuit, the inductance of the wire may cause a voltage change across the supply network. This effect is also called supply noise.

As mentioned before, extrinsic noise may be of a random or of a deterministic nature. If it is random, it must be studied using statistical methods. If it is deterministic, circuit analysis methods can be used. In synchronous circuits supply noise disturbs the sensitive circuit in a systematic (and periodic) way. Knowledge of the characteristics of the noise-generating circuit can be used to minimise its effects on the functionality of the sensitive circuit.

Offset variation. In a TDC system, the conversion offset is determined by the delay that the sampling signals experience as they propagate through the system to the converter. It is a system-wide characteristic, so it only makes sense to discuss it at system level. Typically, the offset is calibrated at start-up time by performing a direct measurement of the propagation delay of the sampling signal (using the converter itself).

At the TDC circuit level, it is possible to minimise the temperature sensitivity of the conversion offset by forcing the reference and the sampling signals to have similar delays inside the circuit and by giving the delays of the two paths the same temperature dependency. Since the sampling signal will typically traverse a front-end chain consisting of electronic devices such as buffers or signal conditioners, its delay will be sensitive to temperature changes. These changes are expected to be larger than the corresponding variations at the TDC level. Periodic system-wide calibrations are therefore required if environment changes are expected.

3.3. Converter calibration.
Any converter system requires a known reference from which the conversion gain (or constant of proportionality) can be derived. The procedure that adjusts the transfer function to the idealised characteristics is called Converter Calibration. In a wider sense, the offline determination of the transfer function that relates the digital representation to the measured quantity can also be included in this definition, although it does not influence the converter operation.

The calibration reference can be a set of pre-determined quantities, converted together with the actual signal, which can be used to derive the transfer function of the converter. A single start-up calibration is sufficient if the converter circuit is not sensitive to environment variations. On the other hand, if the constant of proportionality is sensitive to environment changes, this procedure has to be executed periodically and the updated transfer function applied to the data. This calibration procedure does not place any requirements on the converter, since it is executed offline. However, some conversion dead time is incurred due to the conversion time of the reference quantities.

The hardware necessary to calculate the transfer function from these reference quantities can be integrated in the converter. The resulting knowledge of the transfer function can then be used to perform internal calibration of the converter. In this case the output data will always be calibrated in relation to the given reference. Conversion dead time is, however, unavoidable.

In all these schemes the calibration procedure is performed periodically, so changes that occur between calibration runs are not accounted for and large conversion errors may develop. To avoid this problem, the best solution is to make the converter perform continuous calibration in a non-intrusive way, so that no dead time penalty is incurred.
In these schemes the transfer function is derived directly from a reference signal and does not depend on environment conditions. A consequence of this permanent auto-calibration is that (in normal operation) the conversion error is continuously minimised.

Chapter 4. Review of TDC Architectures.

Several methods have been proposed in the past to solve the problem of accurately measuring time. Traditional techniques fall into a few categories [12]: counter based techniques, vernier techniques, pulse overlap techniques and current integration techniques. TDC circuits of these kinds can be built using discrete, standard components and therefore avoid the need to develop special purpose monolithic circuits. Recently, demand has been pushing for a higher level of system integration and lower power dissipation, domains where traditional methods find it difficult to compete.

The advent of widely available sub-micron digital CMOS technologies has enabled the emergence of new TDC architectures. Time interpolation using delay line based architectures can achieve a resolution comparable to that of the more traditional methods and profit from the new technology’s capabilities in terms of integration and power dissipation.

A historical review of time interval measurement circuits can be found in [12]. In the meantime several architectures have been described in the literature, but only partial review papers have been published (for example [13]). In this Chapter, a short review of the most relevant architectures is presented, focussing on the topics that are fundamental for this work: time resolution, dynamic range, power dissipation, calibration, possibility of sharing a common time interpolator block between several integrated channels, and cost. A table summarising the characteristics of each of the architectures described is presented at the end of the chapter.

4.1. Overview of TDC architectures.

4.1.1. Current integration techniques.
Current integration is probably the most common technique used for time interval measurements. In this architecture, a capacitor is charged linearly with a constant current I. The charging of the capacitor is gated on by a “start” pulse at time t1 and off by a “stop” pulse at time t2. The charge stored in the capacitor is thus proportional to the time interval between the “start” and “stop” pulses. Assuming a voltage-independent capacitor, the voltage across its terminals (Vcap) is also proportional to this time interval:

$$V_{cap} = \frac{I}{C} \cdot (t_2 - t_1).$$

Any kind of ADC can be used to convert Vcap into a suitable digital code. The time resolution of these converters can be made very high. The stability of the current source, the linearity of the capacitor and the resolution of the ADC determine the resolution that can be achieved using this technique.

Another important constraint is the high noise sensitivity of the current integrating node. Differential schemes have been developed to reduce noise sensitivity and enable higher resolution measurements ([14] and [15]). Figure 1 shows the basic scheme and timing diagram of one of these techniques. The time elapsing between the “start” signal and the end of the “gate” signal, and the time elapsing between the “stop” signal and the end of the “gate” signal, are measured by two independent Time-to-Analogue Converters (TAC). The difference between these two measurements, given by an analogue voltage at the output of a differential amplifier, corresponds to the original time interval. Mismatches between the capacitors (C) and current levels (I) in the two TACs can be taken into account via appropriate changes to the constant of proportionality of the measurement.

[Figure 1 diagram: TAC #1 (Start) and TAC #2 (Stop) with Reset and Gate inputs, outputs Vcap(start) and Vcap(stop) feeding a differential amplifier that delivers Vcap(differential), and a “Hit available” flag; the timing diagram shows Reset, Gate, Start, Stop, Vcap(start), Vcap(stop) and Hit available, with the intervals tstart and tstop marked.]

Figure 1: Block and timing diagram of a differential Current Integrating TAC (from [14]).
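The single-ended relation between Vcap and the time interval given above can be illustrated with a short numerical sketch. Everything specific in it is an assumption made for illustration, not a value from this work: a 1 mA charging current, a 10 pF capacitor, a 2.5 V maximum capacitor voltage and an ideal 10-bit ADC. The function names are likewise hypothetical.

```python
def vcap(current, capacitance, t1, t2):
    """Capacitor voltage after charging with a constant current from t1 to t2."""
    return current * (t2 - t1) / capacitance


def full_scale_time(current, capacitance, v_max):
    """Longest interval that can be measured before Vcap reaches v_max."""
    return capacitance * v_max / current


def adc_code(voltage, v_max, n_bits):
    """Ideal ADC: truncate the voltage to one of 2**n_bits codes."""
    levels = 1 << n_bits
    code = int(voltage / v_max * levels)
    return max(0, min(code, levels - 1))


def time_from_code(code, current, capacitance, v_max, n_bits):
    """Convert an ADC code back into a time interval."""
    lsb_time = full_scale_time(current, capacitance, v_max) / (1 << n_bits)
    return code * lsb_time


# Hypothetical component values, chosen only for illustration.
I = 1e-3       # charging current (A)
C = 10e-12     # integrating capacitor (F)
V_MAX = 2.5    # maximum capacitor voltage (V)
N_BITS = 10    # ADC resolution

v = vcap(I, C, 0.0, 10e-9)                 # a 10 ns interval
code = adc_code(v, V_MAX, N_BITS)
t_est = time_from_code(code, I, C, V_MAX, N_BITS)
print(f"Vcap = {v:.3f} V, code = {code}, estimated interval = {t_est * 1e9:.3f} ns")
```

Under these assumptions the measurable range is C·V_MAX/I = 25 ns and one ADC code corresponds to roughly 24 ps, which shows how the range and the bin size are both set by the same ratio C/I.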
The time difference being measured in the differential scheme of Figure 1 is:

$$T = t_{start} - t_{stop} = \frac{C}{I} \cdot \left( V_{cap(start)} - V_{cap(stop)} \right),$$

where t_start and t_stop are the intervals, measured by the two TACs, from the “start” and “stop” signals to the end of the “gate” signal.

In current integration techniques, the converter is occupied for as long as the measurement is being acquired. This results in a considerable dead time between measurements. Flash-ADC’s can be used to reduce the analogue to digital conversion time. Unfortunately, the cost penalty of using these devices can be prohibitive. Another approach is to rely on the statistical properties of the event arrival time. An analogue memory could then store the measurements before conversion, thus de-randomising the event rate (see [16]). In this way, a single Flash-ADC could be shared between several channels, or slower ADC’s could be used without any throughput penalty.

Another limitation of these techniques is their limited dynamic range. Given a maximum voltage to which a capacitor can be charged (for example, the supply voltage), the only way to increase the dynamic range is to decrease the constant of proportionality of the measurement, either by decreasing the current level (I) or by increasing the capacitor (C). In some applications the dynamic range is divided into separate resolution ranges [17]. In this way it is possible to measure long time intervals with a limited resolution and short time intervals with high resolution. The range to which a measurement belongs is identified by selecting the smallest non-overflowing range.

Low-power operation is possible (disregarding the Flash-ADC dissipation). However, large-scale integration is difficult due to the requirement for good analogue process characteristics and the noise sensitivity inherent to the architecture. Current levels and actual capacitance values depend on process, temperature and supply voltage, making calibration of the converter necessary.

4.1.2. Counter techniques.
Counter based time measurement techniques generally rely on a Gray code counter running at very high speed. A “start” and a “stop” pulse mark the moments when the counter is sampled; the difference between these two samples corresponds to the time interval measured. The frequency and stability of the reference clock determine the resolution and accuracy of this scheme [12].

This method offers a very large dynamic range in a highly integrated digital design. However, to obtain high resolution a reference clock frequency in the GHz range (∆tmin < 1 ns) is required, and thus very fast processes must be used to implement it. It also results in a power-hungry system, due to the high toggling rates present.

Alternatively, several counters, synchronous to different phases of the same clock, can be used to increase the resolution using a slower reference clock [18]. The time measurement can be easily interpolated from the results of all the counters. The accuracy of the synthesised clock phases sets the achievable resolution.

These techniques are sensitive to metastability in the counter’s registers. If the sampling “start”/“stop” signals arrive when the counter is toggling, the resulting output may be unpredictable [19]. Simple Gray code counters are less sensitive to this problem, since only one bit toggles for each clock transition. Interpolation between several Gray code counters can worsen the problem, because in that configuration a one-bit toggle in one counter corresponds to a change of more than a single Least Significant Bit (LSB).

4.1.3. Delay line-based techniques.

The clock rate requirements that limit the use of counter techniques for time interval measurements can be relaxed if the basic CMOS gate delay is used as the time unit. Modern CMOS technologies have gate delays of the order of 100 ps, so the resolution of the conversion can be quite good.
In this technique, several delay elements (usually inverters [20]; alternatively, segments of a transmission line can be used [21][22]) make up a delay line through which a signal pulse is propagated. The progression of the pulse along the delay line reflects the time interval being measured. An example of such a line is shown in Figure 2. Delay elements made of two inverters make good building blocks for these lines, since they preserve the polarity of the input signal at every output tap. Alternatively, differential cells can be used, but they result in higher static power dissipation.

[Figure 2 diagram: a pulse propagating through a chain of delay elements with output taps Tap 0 to Tap N.]

Figure 2: Delay line using double inverters as delay elements.

Since standard CMOS technologies are used, an easy to design and highly integrated monolithic converter can be developed. Complex systems, including the converter and large logic units, can be integrated in a single IC with low power dissipation. However, the delay of a CMOS gate is highly dependent on the process parameters, temperature and supply voltage, and therefore requires frequent calibration.

The linearity of the conversion transfer function is determined by the matching of the delay cells. Strict design rules must be followed to reduce device mismatch to acceptable levels.

Large dynamic ranges can only be achieved if very long delay lines are used. Since long lines are difficult to obtain, this technique is limited to short dynamic ranges.

4.1.4. Phase Locked Loop (PLL) techniques.

Some of the limitations of the delay lines previously discussed can be overcome by continuously adjusting the delay of their elements, using a clock signal as the reference. If the delay line is closed in a voltage controlled ring oscillator (VCO) topology and the oscillation frequency is controlled via a feedback loop, a PLL is obtained. The delay of each element can be controlled by limiting the current available to it [23].
Analogue control loops are common [24], but digital loops have also been implemented [25]. As an alternative to current limitation, the load at the output of each delay element can be controlled [26].

This kind of system is able to generate precisely timed signals that can be used in time interval measurement instruments. The inclusion of the oscillator in a closed loop guarantees self-calibration and, thus, low sensitivity to environmental and process changes. It is interesting to note that the need for dynamic control of the delay of the delay line leads to a slowing of the line in typical operation, meaning that the technology is not pushed to its limits. As in any delay line based architecture, delay cell mismatch limits the linearity of the conversion.

Using asymmetric ring oscillators [24] or differential pairs [27] as the cells of the oscillator, it is possible to obtain a convenient number, 2^N, of time bins per clock cycle. Measurements performed using this technique are related to the reference clock. If a time interval is to be measured, the measurement acquired at the beginning of the time interval must be subtracted from the one acquired at its end.

[Figure 3 diagram: VCO locked to Clkref through a Phase Frequency Detector and a Charge Pump; the Hit signal samples the VCO state into the hit registers.]

Figure 3: Asymmetric ring oscillator [24], able to generate 2^N timing signals from an odd-numbered oscillator.

PLL based circuits have the convenient property of being able (depending on the closed loop properties) to filter out phase noise (jitter) associated with the reference clock, thereby loosening the requirements on the time reference path. Jitter internal to the loop can also be filtered. However, the increased PLL bandwidth required to perform that filtering reduces the loop’s ability to filter the jitter associated with the reference.
Note that phase noise generated within the VCO is accumulated between oscillator periods, thus leading to increased output jitter when compared to other delay line based schemes, such as the Delay Locked Loop (DLL) [28].

Large dynamic ranges can be obtained by counting the number of oscillations of the ring oscillator. The less significant bits of the measurement are thus obtained from the PLL and the most significant bits from the counter. Since both parts of the measurement are generated using the same reference signal (the oscillation period), there is no ambiguity in the final result.

PLL’s have been extensively discussed in the literature (for example in [29] and [30]), demonstrating their flexibility, high integration level and low power dissipation. However, they require careful layout design to ensure that all the cell delays are identical and that the interconnection capacitance at the output of each cell is matched. A PLL is a second (or higher) order system, so the loop stability must be carefully evaluated.

4.1.5. Delay Locked Loop (DLL) techniques.

If the delay line is not closed on itself but is included inside a feedback control loop, a DLL is obtained [13][31]. Various topologies of the control loop have been described, but they typically include a Phase Detector to measure the phase error and a filter that converts this information into a meaningful control quantity. In contrast to PLL’s, the reference clock signal is injected directly into the voltage controlled delay line (VCDL) and its phase is compared with the corresponding phase at the output of the line (see Figure 4).

A DLL has some characteristics in common with a PLL, such as the ability to generate precisely timed signals with high resolution, the self-calibration of the system and the large dynamic ranges achievable. In order to guarantee good linearity between consecutive delay elements, matching of devices is a critical parameter.
[Figure 4 diagram: the Clock drives the VCDL; a Phase Detector and a Charge Pump close the control loop; the Hit signal samples the VCDL taps into the hit registers.]

Figure 4: Delay Locked Loop and hit registers.

Self-calibration is based on phase information from the extremes of the delay chain. To guarantee that the delay chain is permanently calibrated, the reference clock must be constantly circulated through it. A constant level of power is thus dissipated, regardless of the rate of the hits being acquired.

Dynamic ranges wider than the reference clock period can be achieved by introducing a counter synchronous to the reference clock. Since both the DLL and the coarse counter measurements are obtained with the same reference, the expansion of the measurement’s dynamic range is unambiguous. Using this technique a time stamp converter is obtained, where the time measurement is referred to the clock signal. In many applications the reference clock can be used as the “start” or “stop” signal. If that is not the case, “start”/“stop” measurements can easily be obtained by subtracting the time stamps of these two signals.

Unfortunately, this kind of control loop, unlike a PLL, lacks the capability of filtering jitter coupled to the reference signal. Therefore the time critical paths should be designed to be noise insensitive and the reference clock must be stable. Careful design of the delay locked loop is also essential, in order to guarantee that all the delay cells have the same delay characteristics.

DLL’s can be built using standard digital CMOS technologies, which allows for a high integration level and thus lowers system costs. Sensitivity to environmental conditions is factored out by the self-calibration mechanism, and noise sensitivity can be lowered to acceptable levels by careful layout and power distribution.

4.2. Beyond the limits of the technology: techniques to improve resolution.
The schemes previously presented have their time resolution limited to the unit cell delay, where the cell is usually made of two inverter gates. As the demand for higher resolution grows, faster technologies must be used. Unfortunately, access to these technologies is, at present, rather expensive. Another way to overcome the resolution limit is to devise techniques that are able to interpolate time within the basic cell delay. Several architectures have been proposed in the literature; some of them are discussed in the next few sections.

4.2.1. Analogue time expansion.

The analogue time expansion technique extends the current integration technique into a scheme where the time interval to be measured is stretched by a factor k that depends on the circuit’s parameters. The expanded time interval thus obtained can be measured by any TDC with a coarser resolution.

Several topologies can be used to obtain a time stretcher. The simplest one is in fact similar to the Wilkinson ADC. In this topology the capacitor that was charged during the measurement is discharged with a much smaller current. The ratio between the charge and discharge currents is the stretch factor k. If this factor is large enough, a simple counter based TDC can be used to measure the discharge (stretched) time interval and thus obtain the original time measurement with improved resolution. Even finer resolution can be obtained using DLL based TDC’s.

The dynamic range obtained using this technique can be extended if the start and the stop times are separately measured in relation to a reference clock and the number of clock cycles elapsing from one measurement to the other is also recorded [32]. A refinement of this technique allows for the simultaneous calibration of the stretch mechanism [33].

[Figure 5 diagram: timing of clkstretch, pulse, synchronised pulse, integrator voltage and output to TDC, with the instants T1, T2, T3 and T4 marked.]

Figure 5: Timing diagram of the dynamic range extension using a clocked time stretcher [33].
For each pulse to be measured, the TDC captures two time intervals. The first (T2-T1) reflects the unstretched time difference between the pulse arrival and an edge of the reference stretch clock; the second (T3-T2) reflects the stretched image of this time difference. The stretch factor k is:

$$k = \frac{T_3 - T_2}{T_2 - T_1}.$$

To obtain a high precision measurement of k, an average of several random time difference measurements is performed. Since the normal data is uncorrelated with the stretcher reference clock, this averaging operation reduces the error to acceptably small levels.

The previous techniques are sensitive to noise in the integrating node and to non-linearity of the capacitor. This sensitivity can be reduced by the use of two identical capacitors that are discharged by different currents, starting respectively when the Start and Stop signals arrive. A comparator is used to identify the moment when the voltages on the two capacitors are again the same, as shown in Figure 6. If the “stop” discharge current is k times the “start” current, then the resulting time expansion is given by the following expression, where t_same is the end of the expanded time interval:

$$t_{same} - t_{start} = \frac{k}{k - 1} \cdot \left( t_{stop} - t_{start} \right).$$

In a differential architecture like this the expanded time is very insensitive to supply noise and to any non-linearity of the capacitors or current sources used, as long as they affect both branches in the same way. Any mismatch between capacitor values or current levels will only produce a change in the expansion factor k, which can easily be calibrated at set-up time.

[Figure 6 diagram: two capacitors C, reset to the same voltage, discharged by currents I (from “start”) and k·I (from “stop”); a comparator generates the “same” signal; the timing diagram shows start, stop and same.]

Figure 6: Time expander circuit and corresponding timing diagram.

The main disadvantage of this scheme is the demanding requirements it sets on the comparator in terms of offset and propagation delay stability over a considerable common-mode range.
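The two expansion relations of this section can be written out as a small sketch. The function names and all numerical values are hypothetical and serve only as an illustration under ideal, noise-free assumptions; the sample triplets (T1, T2, T3) are expressed in picoseconds.

```python
def stretch_factor(t1, t2, t3):
    """k = (T3 - T2) / (T2 - T1) for one captured pulse."""
    return (t3 - t2) / (t2 - t1)


def average_stretch_factor(samples):
    """Average k over several (T1, T2, T3) captures to reduce random error."""
    factors = [stretch_factor(t1, t2, t3) for (t1, t2, t3) in samples]
    return sum(factors) / len(factors)


def expanded_interval(k, t_start, t_stop):
    """t_same - t_start for the differential expander ("stop" current = k * "start" current)."""
    return k / (k - 1.0) * (t_stop - t_start)


def original_interval(k, expanded):
    """Invert the differential expansion to recover t_stop - t_start."""
    return (k - 1.0) / k * expanded


# Hypothetical captures in picoseconds, generated with an ideal k of 50.
samples = [(0, 200, 10200), (0, 500, 25500), (0, 900, 45900)]
k_clocked = average_stretch_factor(samples)

# Differential expander with a current ratio close to 1 (assumed k = 1.01):
# a 1 ns interval is stretched by k / (k - 1), i.e. about 101 times.
stretched = expanded_interval(1.01, 0.0, 1.0)
print(k_clocked, stretched, original_interval(1.01, stretched))
```

Note that the two factors play different roles: in the clocked stretcher k is itself the expansion, whereas in the differential expander the expansion is k/(k-1), so a current ratio close to 1 gives a large stretch.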
The rather short dynamic range of this scheme and the considerable dead time between measurements can also limit its utility, especially in high hit rate applications.

Large dynamic ranges can be obtained if the measurement is in some way synchronised to a reference clock. If the time difference from the start to a clock edge and from the stop to a clock edge is added to the time between these edges, the dynamic range becomes independent of the charge/discharge current levels and of the capacitor sizes.

4.2.2. Vernier differences.

This technique is an extension of the analogue vernier technique [12], in which the two reference signals with slightly different periods are replaced by more convenient delay lines with a different delay per cell [34]. The “start” and “stop” pulses are each propagated through one of these lines.

[Figure 7 diagram: a “start” delay line with cell delay T1 feeding the D inputs of a row of flip-flops, and a “stop” delay line with cell delay T2 (T1 > T2) clocking them; flip-flop outputs Tap 0 to Tap 4; common Reset.]

Figure 7: Time expansion using two delay lines with different cell delay.

The rising edge of the “stop” pulse latches the state of the “start” delay line. If the cell delay T1 of the “start” delay line is slightly larger than the cell delay T2 of the “stop” delay line, the position of the first flip-flop not set (N) gives the time interval between the “start” and “stop” signals, that is:

$$T_{in} = N \cdot (T_1 - T_2).$$

Very good time resolution can be obtained with this technique. In order to save silicon area, several improvements can be made. The “stop” delay line can be replaced by the propagation delay of each D flip-flop. Another technique is to use a single delay line with different rise (Tr) and fall (Tf) times in the “stop” path, and to connect the “start” line to a logical one [13]. This results in a shrinking pulse, and the position of the first flip-flop that is not set gives the original pulse width in units of Tr-Tf.

Converters using these schemes have a very limited dynamic range and require very long delay lines for the desired resolution level.
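The behaviour of the two-line vernier can be sketched with an idealised model. This is an illustration under stated assumptions rather than a description of any of the cited circuits: delays are ideal and expressed in integer picoseconds, flip-flop n is assumed to sit after n cells of each line (n = 1, 2, ...), and a flip-flop is considered set when the “start” edge reaches its D input no later than the “stop” edge reaches its clock. The function names and the values T1 = 110 ps, T2 = 100 ps are hypothetical.

```python
def vernier_first_unset(t_in, t1, t2, n_cells):
    """Index N of the first flip-flop not set, or None if the line is too short.

    t_in is the delay between the "start" and "stop" edges; t1 > t2 are the
    cell delays of the "start" and "stop" lines (all in the same unit).
    """
    for n in range(1, n_cells + 1):
        start_at_tap = n * t1         # "start" edge entered the line at time 0
        stop_at_tap = t_in + n * t2   # "stop" edge entered the line at time t_in
        if stop_at_tap < start_at_tap:
            return n                  # "stop" overtook "start": flip-flop stays reset
    return None


def vernier_estimate(n, t1, t2):
    """Reconstructed interval T_in = N * (T1 - T2)."""
    return n * (t1 - t2)


T1, T2, CELLS = 110, 100, 64          # hypothetical values in picoseconds
n = vernier_first_unset(47, T1, T2, CELLS)
print(n, vernier_estimate(n, T1, T2))              # bin index and estimate in ps
print(vernier_first_unset(1000, T1, T2, CELLS))    # interval beyond the line length
```

With these numbers the bin size is T1 - T2 = 10 ps, while the longest measurable interval is only CELLS·(T1 - T2) = 640 ps, which illustrates why long lines are needed for even a modest dynamic range. Under the indexing assumption made here the estimate exceeds the true interval by at most one bin.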
Also, while pulses are propagating through the lines no other hits should occur, leading to some dead time between measurements. Another drawback of these schemes is the difficulty of controlling the bin sizes in each line used. Process spreads, temperature and supply voltage influence these delays, so frequent calibrations of the circuit are required.

Vernier techniques are very sensitive to the matching of the delay of the cells across the delay lines. The effects of mismatch are amplified by the nature of the time interpolation, where the high resolution is obtained from the small difference between the (comparatively) large delays of the cells in each of the delay lines.

Circular vernier method. The need for very long delay lines to obtain a reasonable dynamic range can be obviated if the two lines are closed in a ring oscillator-like structure, such as the one shown in Figure 8. Theoretically this configuration corresponds to a line of infinite length, and thus arbitrary dynamic ranges should be obtainable.

[Figure 8 diagram: “start” and “stop” delay lines with cell delays T1 and T2 (T1 > T2), each closed into a ring through an input multiplexer; a row of D flip-flops with outputs Tap 0 to Tap 4; common Reset.]

Figure 8: Circular vernier scheme for dynamic range expansion.

Both the “start” and “stop” signals are fed into the respective delay line via a multiplexer. As soon as these signals are progressing within the delay line, the multiplexers are switched, thereby establishing a ring oscillator-like structure. Counting the number of oscillations completed by each of the signals before they coincide enables the correct expansion of the dynamic range.

Unfortunately, the inversion of the signal propagating in these ring oscillators makes it difficult to decode the moment when the two signals coincide. Solutions have been proposed where different structures are used to detect the coincidence of the two signals in a different way, depending on the number of oscillations that have occurred in each oscillator [35].
However, the use of different structures in the time critical circuitry makes it hard to equalise their dynamic response in all conditions. This may produce considerable non-linearity in the conversion transfer function.

Another undesirable side effect of this closed loop topology is that all timing errors that occur during the measurement time (due to noise or any other source) accumulate in the final measurement; in other words, the scheme integrates all the errors present during the measurement time.

Calibration, using a PLL-like control around the closed delay line, can only be done off-line, when there are no measurements. In a high hit rate environment calibration can only be performed infrequently, which may result in a loss of accuracy.

Furthermore, coupling between the two closed delay lines may also be a problem. Due to layout considerations they should be implemented close together, and to obtain good resolution their oscillation frequencies (cell delays) should be very similar. If coupling is present and there is no active control of the lines during measurement, one of the lines may be pulled to oscillate at the frequency of the other line, which would ruin the measurement. To avoid this problem, calibration can be performed using a dummy channel in a double PLL-like structure. Control information derived from it can be used to control the delay of the lines even while measurements are being performed. In this way all the lines are actively pulled to their correct oscillation frequency. The calibration circuitry can be shared between all channels in a circuit, resulting in an efficient use of silicon.

Dual scale vernier method. There is an alternative implementation of the vernier technique in which the dead time between measurements is small and the converter is self-calibrating. Contrary to the previous techniques, this technique results in time stamp measurements.
The principle of operation is the same as that of the vernier caliper (Figure 9) used to measure length [36]. Two scales are required: the reference scale, which has a time bin T, and the vernier scale, which has a slightly shorter time bin but spans N reference bins. The difference between the two scales determines the bin size of the converter. For example, to obtain a bin of 0.1·T the vernier scale must span 9 reference bins, being divided into 10 time bins. A measurement word is made of two components: the higher order bits are obtained from the reference scale and the lower order bits from the vernier scale.

Figure 9: A vernier caliper measuring a length of 0.43 mm. Note that the third tick mark in the vernier scale (lower) lines up with a tick mark in the reference scale (upper) [36].

The reference scale can be made with a counter counting cycles of a reference clock. The vernier scale is, for example, a DLL calibrated delay line that spans 9 clock cycles and is divided into 10 time bins¹. When the hit signal is asserted the status of the two scales is captured. The low order bits of the measurement result from the identification of the next bin that will switch. If this bin number is n, then the resulting time measure is:

$$ t = \mathrm{Mod}\left[\left(1 - \frac{1}{F}\right) \cdot T \cdot n,\; T\right] + m \cdot T, $$

where Mod(a,b) is the modulus operation, F is the interpolation factor and m is the reference scale measurement. The number of time bins into which the vernier line is divided is equal to the interpolation factor F. The number N of clock cycles that it spans is F-1. This technique is very sensitive to the accumulation of non-linearity along the vernier delay line. This sensitivity is amplified if a high interpolation factor is implemented, since the length of the line is increased and the LSB is shortened.

4.2.3. Analogue time interpolation.

In a locked DLL, the signals propagating through the delay chain have edges with almost constant slopes, directly related to the delay of the delay elements.
By performing an analogue sum of the signals in consecutive time taps, it is possible to obtain a time interpolation between these taps, thereby increasing the resolution to a level that is better than the intrinsic delay of a delay cell (Figure 10). The design of such a system is made difficult by the need to match the delay through the summing circuitry with the direct signal from the taps themselves. An alternative approach is to store all the analogue voltages from each tap when a hit occurs, and later perform the interpolation, either by analogue summing, or by using the stored voltages as inputs to a weighted filter whose output would then be converted using an ADC.

¹ In fact the delay line includes many more delay elements to avoid interactions between leading and trailing edges of the signal that progresses in it.

Figure 10: Time interpolation using voltage sums.

Small ring oscillators, controlled by a PLL structure, can also be used as the basis of the time interpolation [37]. First order equalisation of the delay between different time taps is obtained by including a dummy analogue phase interpolator (weighted sum of the voltage at its two inputs) in the non-interpolating taps. In this scheme the phase interpolator circuit must be calibrated to improve the linearity of the interpolation.

Other interpolation techniques try to generate the voltage ramp typical of current integration schemes in a “digital” form [38]. As the “start” signal progresses along the delay line, a voltage ladder is generated on the summing node. Each step represents the crossing of a new delay cell by the “start” signal. A high order filter can be used to smooth out the edges of the steps, thus obtaining the intended voltage ramp.
The “stop” signal forces each delay cell into high impedance and disconnects the hold capacitor at the filter’s output, allowing the resulting measurement to be kept stable for the time necessary to process it via an ADC.

Figure 11: Time to analogue converter using a time interpolation technique [38] (16 digital gates with tri-statable outputs, resistive summing node, high order filter and hold capacitor).

When compared to current integration techniques, this scheme has the advantage of being potentially less sensitive to noise coupling into the summing node. Since the interpolation is done resistively, the node has a much lower impedance than a capacitive node and there is no integration of noise effects over the measurement period. To convert the measurement into a binary word, an ADC must be used, which will increase power dissipation and system costs.

4.2.4. Array of coupled oscillators.

Some techniques have been proposed to increase the resolution of PLL based time interpolation circuits to time intervals smaller than the intrinsic gate delay. One way of achieving this is to use an array of coupled oscillator rings [39]. Each delay cell is made of a dual input voltage controlled buffer. Both inputs have the same polarity and together they define the output transition time. One of the inputs is used to form the ring oscillator, the other to couple consecutive rings in the array, as shown in Figure 12. If a fixed phase shift is established between two consecutive oscillators, then the identical coupling between oscillators will create a uniform phase shift between all oscillators. The oscillation frequency remains the same for all oscillators. The time resolution achieved is the cell delay (td in Figure 12) divided by the number of rings in the array.
The fixed phase shift is established by connecting the outputs of the boundary oscillator to the inputs of a cell located in a different position on the oscillator in the opposite extreme of the array. In this architecture the time bin is defined by two closely coupled delay cells that belong to separate ring oscillators. The inter-coupling between consecutive rings forces the size of each time bin to be set by the complete array. This intimate coupling guarantees a good linearity of the conversion function. However, device matching is a critical parameter for this topology.

Figure 12: Coupled oscillators (time resolution of td * 2 / 3).

At initialisation time several modes of oscillation for which the array’s boundary conditions are met will be present. Each corresponds to the case of having a phase shift between the boundary oscillators that is a multiple of the oscillation period. The locking procedure has to be able to force the circuit into the correct mode, where the phase shift is smaller than one oscillation period. This task may not be trivial. The resolution achievable with this architecture is defined as:

$$ T_{bin} = \frac{x \cdot T + k \cdot \dfrac{T}{2 \cdot N}}{M}, $$

where T is the oscillation period set by a PLL control loop, N is the number of delay cells per oscillator and M is the number of oscillators in the array. Variable k reflects the coupling topology of the boundary oscillators (offset in number of delay cells) and x the array’s mode of oscillation. The correct mode of oscillation is x = 0, which results in the smallest bin size.

Layout of these circuits is critical to their correct behaviour. Every delay cell must drive exactly the same load, if a good linearity of the measurements is to be maintained.
Therefore, a good layout of the consecutive rings is essential in order to guarantee that the rings on the extremes of the array are in the same conditions as the rings in the middle and that there is no systematic effect that affects the size of some time bins. The same considerations apply to the delay cells on the extremes of each ring. Interleaving the oscillators and the delay cells that make them is, therefore, essential.

This architecture enables high time resolution and large dynamic range in a conveniently dead-timeless converter system. It can be implemented in standard CMOS technologies, thereby allowing for high levels of integration and low system costs. However, it suffers from the same drawbacks as other PLL’s, such as sensitivity to VCO internal noise and error feedback from the end to the beginning of each oscillator ring. Sharing the array of coupled oscillators between several channels is an effective way to compensate for the higher power dissipation required by the use of several ring oscillators.

4.2.5. Array of Delay Locked Loops.

The use of an array of several uniformly offset DLL’s can increase the resolution of a system to a fraction of the intrinsic gate delay [23][40]. A different DLL (herein referred to as the Phase Shifting DLL), made with a smaller number of delay elements, is used to precisely generate the required offsets. In order to increase the resolution of the converter, the offset between DLL’s should only be a fraction of the delay of the basic cell. This fraction cannot be obtained directly, but a delay that is a fraction bigger than the basic cell delay is easily obtained using a phase shifting DLL locked to the same reference. In an arrangement like the one in Figure 13, the symmetry of the array makes the DLL’s in the array appear to be offset by only a fraction of the basic cell delay. The time bin of such a circuit is

$$ T_{bin} = T_m - T_n = \frac{T_{clk}}{M} - \frac{T_{clk}}{N}. $$

If the required time bin size is a fraction F of the basic cell delay of the DLL’s of the array, then the relation between M, N and F can be expressed as

$$ M = N \cdot \frac{F}{F+1}, $$

where M, N and F are integers. One disadvantage of this scheme is its inability to divide the reference period in a number of bins that is a power of two. This means that the measurement obtained will not be in a pure binary unit of 1/2^N, but rather in a unit of 1/(N·F). A special encoder that converts this code into a normal binary code must be used if the result is to be combined with other binary measurements, such as the coarse time counter results used for dynamic range extension.

Figure 13: Array of DLL’s with phase shifting DLL (Timing DLL’s with N cells of delay tn, Phase Shifting DLL with M cells of delay tm, M < N).

Extensions to this architecture, where the use of auxiliary (controlled) delay lines allows for the realisation of any number of subdivisions of the clock period (including pure binary numbers), have been proposed [41]. Unfortunately they increase the complexity of the array and thus render it more difficult to design. Power dissipation is also a concern in this architecture due to the large number of DLL’s that are continuously active. This drawback can be limited if several channels share the same array.

Like all DLL based techniques, this technique can be implemented in a standard “digital” CMOS technology. It is therefore easy to integrate it with digital processing logic in order to build a complex TDC system in a single IC.

4.2.6. Time interpolation using passive RC delay lines.

Most of the techniques discussed so far use a closed control loop to guarantee that the converter is permanently calibrated. Schemes to increase the limited time resolution that can be directly obtained are based on time interpolation.
They usually require more closed loops in complex topologies, which invariably lead to higher power dissipation and increased non-linearity.

Minimum delays for a given architecture can only be achieved if the parasitic RC delay lines present in every metal or polysilicon line are used (see [21] for an example). Delay lines built in this way suffer from a large parameter spread due to process constraints, rendering their exact delay difficult to predict. On the other hand they are rather insensitive to supply voltage and temperature variations. In order to obtain the desired delay from these lines, a calibration procedure must be used. Calibration is mainly needed at start-up; during normal operation the slow temperature variations and supply changes will not substantially affect the behaviour of the lines.

Such a delay line can be used as a stand-alone delay generator, but a converter built this way would have very limited dynamic range. However, when used together with a DLL, this limitation is overcome. This converter adds high resolution to the other benefits of a DLL based scheme, such as large dynamic range and self-calibration [22]. The block diagram in Figure 14 depicts the scheme.

When the hit signal is asserted, several (M) consecutive samples of the status of the DLL are acquired with a constant time interval between them. If this time interval is a fraction 1/M of the cell delay, it is possible to perform time interpolation within the delay of a DLL delay cell by identifying after which sample the reference clock has exited a given cell. If the reference clock has a period T, and the DLL is made of N delay cells, the bin size of the resulting converter is:

$$ T_{bin} = \frac{T}{N \cdot M}. $$

In this scheme there is no restriction on the values of N and M, therefore it is possible to directly obtain the measurements in a pure binary format.
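As a small numerical illustration of the bin size expression above, the following Python sketch evaluates T/(N·M) and checks whether the number of bins per clock period is a power of two. The clock period (25 ns), N = 32 and M = 4 are assumed values chosen only for the example; they are not taken from the text, and the helper names are hypothetical.

```python
from fractions import Fraction


def dll_rc_bin_size(t_clk_ps, n_cells, m_samples):
    """Bin size T / (N * M) of a DLL combined with an M-sample RC delay line."""
    return Fraction(t_clk_ps) / (n_cells * m_samples)


def is_power_of_two(value):
    """True when value is 1, 2, 4, 8, ..."""
    return value > 0 and (value & (value - 1)) == 0


# Assumed example values (not from the text): 25 ns clock, 32 cells, 4 samples.
T_CLK_PS = 25000
N_CELLS = 32
M_SAMPLES = 4

t_bin = dll_rc_bin_size(T_CLK_PS, N_CELLS, M_SAMPLES)
binary_bins = is_power_of_two(N_CELLS * M_SAMPLES)
print(float(t_bin), "ps per bin; pure binary bin count:", binary_bins)
```

With these assumed values the clock period is divided into 128 bins, so the fine time word needs no code conversion before being combined with other binary quantities.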
Figure 14: A T/D converter based on a DLL and a RC delay line (DLL with N delay cells, RC delay line with M taps, M rows of hit registers).

To operate in a truly self-calibrating mode, the circuit that implements this scheme should also include the RC delay line’s start-up calibration hardware. Fortunately a simple code density test is sufficient to characterise the RC delay line. From this characterisation the calibration parameters are obtained and then applied to the line. Any standard CMOS technology can be used to implement this scheme.

4.3. Summary of characteristics of the TDC architectures.

The following table summarises the relevant characteristics of the architectures discussed in this chapter.

Architecture Resolution Dynamic Range Dead Time Current Integration Counter Delay Line PLL DLL Analogue Time Expansion Vernier Differences Circular Vernier Dual Scale Vernier Analogue Time Interpolation Array of Coupled Oscillators Array of Delay Locked Loops DLL / RC delay line + + + + ++ ++ ++ ++ ++ ++ ++ ++ inf. inf. inf. - / inf. inf. inf. inf. / inf. inf. inf. no no no ---no no / no no no Time Auto Power Interpolator Technology Calibration Consumption Sharing yes yes yes - / +yes +- / yes yes +- + + + + + + + +++ no yes no yes yes no no no yes yes / no yes yes yes (DLL) Ref. [14] analogue [12] digital [20] digital [24] digital [13] digital analogue [32]/[33] [34] digital [35] digital [36] digital analogue [37]/[38] [39] digital [23] digital [22] digital

Table 1: Comparison between the different architectures discussed in the chapter².

² Inf. (infinite) means that there is no intrinsic limit to the dynamic range that can be implemented. No dead time means that there is no dead time in the time interpolation circuitry. There may be some dead time associated with the read-out of the measurements. + and – mean that the characteristic under consideration is advantageous (disadvantageous).
+ – means that the condition is only partially met or that it is only met under certain conditions.

References for Part I.

[1] Rubbia, C., The quest for the infinitesimally small, CERN/PPE 94-15, Feb. 94.
[2] Verweij, H., Electronics for experiments at CERN, CERN/ECP 91-4, Feb. 91.
[3] The ALICE collaboration, ALICE – A large ion collider experiment technical proposal, CERN/LHCC 95-71, Dec. 95.
[4] Gomes, P., On-line algorithms for future HEP data acquisition systems, PhD. thesis, Universidade Técnica de Lisboa, 1995.
[5] Batyunya, B. et al., Influence of the time resolution of the time-of-flight system in ALICE on the measurement of observables, ALICE/SIM 98-08 Internal note, Feb. 98.
[6] Kluge, A., ALICE Time-of-Flight Readout – AFRO, ALICE Internal note, Jun. 99.
[7] Martins, R. C. et al., Taxonomic problems on ADC characterisation, Proceedings of the 5th IEEE International Conference on Electronics, Circuits and Systems, Vol. 3, pp. 445-448, Sep. 98.
[8] Razavi, B., Principles of data conversion system design, IEEE press, Chapter 6, 1995.
[9] Doernberg, J. et al., Full-speed testing of A/D converters, IEEE Journal of Solid-State Circuits, Vol. 19, No. 6, pp. 820-827, Dec. 84.
[10] Gray, P. R. et al., Analysis and design of analogue integrated circuits, John Wiley & Sons, Inc, Chapter 11, 1993.
[11] Fish, P. J., Electronic noise and low noise design, McGraw Hill, Inc, 1994.
[12] Porat, D. I., Review of sub-nanosecond time-interval measurements, IEEE Transactions on Nuclear Science, Vol. 20, pp. 36-51, 1973.
[13] Rahkonen, T. E. et al., The use of stabilized CMOS delay lines for the digitization of short time intervals, IEEE Journal of Solid-State Circuits, Vol. 28, No. 8, pp. 887-894, Aug. 93.
[14] Tanaka, M. et al., Development of Monolithic Time-to-Amplitude Converter for High precision TOF Measurement, IEEE Trans. on Nuclear Science, Vol. 38, No. 2, pp. 301-305, Apr. 91.
[15] Sasaki, O. et al., A high-resolution TDC in TKO BOX system, IEEE Trans. on Nuclear Science, Vol. 35, No. 1, Feb. 1988.
[16] Stevens, A. E. et al., A Time-to-Voltage Converter and Analog Memory for Colliding Beam Detectors, IEEE Journal of Solid-State Circuits, Vol. 24, No. 6, Dec. 89.
[17] Yamrone, B. et al., LeCroy MQT300 charge-to-time converter, Conference Record of the IEEE Nuclear Science Symposium 1996, Vol. 1, pp. 436-438, Nov. 96.
[18] Veneziano, S. et al., Performances of a Multichannel 1 GHz TDC ASIC for the KLOE Tracking Chamber, Proceedings of the Elba conference on Advanced Detectors, 1997.
[19] Kim, L.-S., Metastability of CMOS latch/flip-flop, IEEE Journal of Solid-State Circuits, Vol. 25, No. 4, pp. 942-951, Aug. 90.
[20] Bailly, P. et al., A 16-channel digital TDC chip, Conference Record of the IEEE Nuclear Science Symposium 1997.
[21] Gogaet, S. et al., A 10 ps resolution 1.6 ns tuning range CMOS delay line for clock deskewing in data recovery systems, Proc. ESSIRC'95, Lille - France, pp. 54-57, Sep. 95.
[22] Mota, M. et al., A high-resolution time interpolator based on a Delay Locked Loop and an RC delay line, IEEE Journal of Solid-State Circuits, Vol. 34, No. 10, pp. 1360-1366, Oct. 99.
[23] Mota, M. et al., A four channel, self-calibrating, high-resolution Time-to-Digital Converter, Proceedings of the 5th IEEE International Conference on Electronics, Circuits and Systems (ICECS’98), Lisboa, Portugal, Sep. 98.
[24] Arai, Y. et al., A time digitizer CMOS gate-array with a 250 ps time resolution, IEEE Journal of Solid-State Circuits, Vol. 31, No. 2, pp. 212-220, Feb. 96.
[25] Dunning, J. et al., An all-digital Phase-Locked Loop with 50-cycle lock time suitable for high-performance microprocessors, IEEE Journal of Solid-State Circuits, Vol. 30, No. 4, pp. 412-422, Apr. 95.
[26] Johnson, M. G. et al., A variable delay line PLL for CPU-coprocessor synchronisation, IEEE Journal of Solid-State Circuits, Vol. 23, No. 5, pp. 1218-1223, Oct. 88.
[27] Loinaz, M. J. et al., A CMOS multichannel IC for pulse timing measurements with 1 mV sensitivity, IEEE Journal of Solid-State Circuits, Vol. 30, No. 12, pp. 1339-1349, Dec. 95.
[28] Weigland, T. C. et al., Analysis of timing jitter in CMOS ring oscillators, Proceedings of International Symposium on Circuits and Systems (ISCAS), Jun. 94.
[29] Razavi, B. et al., Monolithic phase-locked loops and clock recovery circuits – theory and design, IEEE press, 1996.
[30] Gardner, F. M., Phaselock techniques, John Wiley & Sons, 1979.
[31] Christiansen, J. et al., An integrated 16-channel CMOS time-to-digital converter, Conference Record of the IEEE Nuclear Science Symposium 1993, pp. 625-629, Oct. 93.
[32] Raisanen-Ruotsalainen, E. et al., A time digitiser with interpolation based on Time-to-Voltage Conversion, Proceedings of the 40th Midwest Symposium on Circuits and Systems (MSCAS), Vol. 1, pp. 197-200, Aug. 97.
[33] Blanar, G. et al., A self-calibrating high-resolution common stop time digitiser circuit, IEEE Transactions on Nuclear Science, Vol. 45, No. 3, Pt. 1, pp. 801-804, Jun. 98.
[34] Bailly, P. et al., A 100 picosecond resolution, 6 microsecond full scale multihit time encoder, in CMOS technology, Proc. of Third International Conference on Electronics for Future Colliders, pp. 57-68, May 93.
[35] Fota, C., Modélisation et étude de faisabilité d’un codeur de temps numérique à haute résolution en technologie intégrée sur Silicium et Arséniure de Gallium, Thèse de Doctorat de l’Université Pierre et Marie Curie (Paris VI), Dec. 96.
[36] Gorbics, M. S. et al., A high-resolution multihit time to digital converter integrated circuit, IEEE Transactions on Nuclear Science, Vol. 44, No. 3, Pt. 1, pp. 379-384, Jun. 97.
[37] Knotts, T. A. et al., A 500MHz time digitiser IC with 15.625ps resolution, Digest of Technical Papers of the IEEE International Solid-State Circuits Conference 1994, Vol. 37, pp. 58-59, Feb. 94.
[38] Neyer, C. et al., Internal Note ALICE 94-07 (CERN).
[39] Maneatis, J. G. et al., Precise delay generation using coupled oscillators, IEEE Journal of Solid-State Circuits, Vol. 28, No. 12, Dec. 93.
[40] Christiansen, J., An integrated high-resolution CMOS timing generator based on an array of Delay Locked Loops, IEEE Journal of Solid-State Circuits, Vol. 31, No. 7, pp. 952-957, Jul. 96.
[41] Chu, H.-C. et al., A General High-Resolution Multiphase Clock Generator, submitted to the IEEE Journal of Solid-State Circuits in Oct. 97.

PART II. A TDC ARCHITECTURE BASED ON AN ARRAY OF DELAY LOCKED LOOPS.

In this Part of the dissertation we will discuss the work performed in order to develop and demonstrate an architecture suitable for high-resolution time interval measurements in the context of the ALICE Time-of-Flight detector collaboration.

Particle identification in the ALICE experiment requires an accurate measurement of the time that the particles take to cross a cylindrical surface located at a fixed distance from the interaction point. For this purpose a dedicated Time-of-Flight detector will be built. The detector itself is able to resolve time with a resolution between 40ps and 100ps RMS, depending on the technology chosen [1]. All the front-end components must have a better resolution, in order not to compromise the characteristics of the detector.

Given the time uncertainty associated with the response of the detector and with the underlying physical process, the main performance metric used to characterise the front-end electronics is the standard deviation of the error it generates, σ, also known as the RMS (root mean square) resolution. In this application, it is required that the Time-to-Digital converter has an RMS resolution better than 50ps across the full dynamic range. Depending on the measurement method used (time tagging or start-stop), the dynamic range that is required varies.
To avoid any ambiguity, especially when the time tagging method is used, the TDC must allow for a large dynamic range.

Another important feature of the detector is its granularity. In order to differentiate particles crossing the detector close to each other, it is subdivided into a large number of independent detector cells, each having its dedicated front-end. Therefore a large number of electronic channels is required (> 150,000), of which the front-end must sit close to the detector. The number of channels involved and the area constraints imply a high level of electronics integration.

It is our goal to demonstrate an architecture that adheres to all the previous requirements in terms of resolution and potential for dynamic range expansion. It allows for start-stop and time tagging measurements and has a low dead time between measurements. We will use a standard “digital” CMOS technology that has a proven digital library available, so that digital functionality can be easily implemented at low cost. The architecture enables the integration of several TDC channels into a single chip and allows the sharing of common data processing and buffering logic. To demonstrate this feature, four conversion channels and a small number of simple system-related functions such as data encoding and buffering are included. In this way the basic functionality required to build a time acquisition system is included in the demonstrator.

The first chapter of this part (Chapter 5) is dedicated to the presentation of the architecture being used. Analytic tools developed to study the different errors that may occur in the conversion circuitry are presented in Chapter 6. Chapter 7 includes a detailed description of the important electronic blocks that define the converter performance, and Chapter 8 concludes this part of the dissertation with the experimental results obtained using the prototype TDC.

Chapter 5. Architecture Overview.

5.1. The Delay Locked Loop (DLL).

A simple instrument to measure time intervals with fine resolution can be made with a delay line tapped at regular (time) intervals. If a reference signal is progressing along that line and its position is sensed at the limits of the time interval, the measured time is proportional to the number of taps that the signal covered during this interval. The delay between two consecutive taps is the constant of proportionality.

In standard CMOS technologies, the most commonly available delay cell is the logic gate. The usual choice for these cells is the inverter because of its simplicity and speed. Delays of the order of a few hundred picoseconds can currently be obtained under worst case operating conditions in a 0.7µm CMOS technology. Unfortunately, the gate delay is very sensitive to process parameters, temperature and supply voltage. This means that the circuit has to be characterised periodically in order to measure the delay of each gate.

A simpler way of operating this circuit is to build delay elements whose delay can be externally controlled. The delay of the cells is constantly sensed and forced to the desired value, regardless of environmental changes. This is the operating principle of a Delay Locked Loop (DLL). In a DLL the signal progressing through the delay line is a reference clock. A control loop encloses the delay line and constantly monitors the delay between the reference clock at the beginning and at the end of the line. If this delay is different from one clock period, the control loop adjusts the delay of the delay cells until the correct value is obtained.

When the hit signal is asserted, the status of the line is stored in a set of hit registers. The stored data reflects the time difference from one edge of the reference clock to the moment the data was stored. An arbitrary time interval can be measured if two such time differences are stored. The difference between them is the intended measurement.
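The measurement principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit register snapshot is assumed to be thermometer coded (a run of ones up to the position reached by the clock edge), both hits are assumed to fall within the same clock period, and the 32 taps and 12.8 ns clock period are hypothetical values, as are the function names.

```python
def edge_position(hit_register):
    """Number of taps already passed by the clock edge in a snapshot.

    Assumes a thermometer-coded snapshot: ones for the taps the edge has
    passed, zeros for the remaining taps.
    """
    count = 0
    for bit in hit_register:
        if bit != 1:
            break
        count += 1
    return count


def interval_ps(snapshot_start, snapshot_stop, t_clk_ps):
    """Time between two hits, as the difference of two tap counts.

    The tap delay of a locked DLL is t_clk / N, where N is the number of
    taps. Both hits are assumed to occur within one clock period.
    """
    n_taps = len(snapshot_start)
    tap_delay_ps = t_clk_ps / n_taps
    taps = (edge_position(snapshot_stop) - edge_position(snapshot_start)) % n_taps
    return taps * tap_delay_ps


# Hypothetical example: 32 taps, 12.8 ns clock, hence 400 ps per tap.
N_TAPS = 32
start = [1] * 5 + [0] * (N_TAPS - 5)
stop = [1] * 12 + [0] * (N_TAPS - 12)
measured = interval_ps(start, stop, 12800)
print(measured, "ps")
```

The sketch makes the proportionality explicit: the result is the number of taps crossed between the two hits multiplied by the tap delay.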
The control loop has three main functions: to sense the delay difference between the signal at the beginning and at the end of the delay line, to convert the error information into a meaningful quantity, and to integrate and hold the control information until a new decision is taken. These functions correspond to the building blocks in Figure 1. The phase detector is used to determine if the delay line is too fast or too slow. A sequential phase detector is usually chosen to perform this function. The resulting (binary) information is then converted by the charge-pump into a “packet” of charge that is stored in (or taken from) a filter capacitor. The capacitor in this example behaves as the loop integrator.

Figure 1: Delay Locked Loop block diagram.

In contrast with Phase Locked Loops (PLL), which have another integrator in the VCO (voltage controlled oscillator), the DLL loop is a first order system. The presence of the second integration and of the proportional term in PLL’s is due to the necessity of tracking both phase and frequency. A DLL only tracks delay (or, equivalently, phase), resulting in a simpler loop which is inherently stable.

The scheme described so far acquires only a limited number of features of the hit signal, like the arrival time and possibly also the pulse length (if the delay between rising and falling edges is also measured). Alternatively, the DLL can be used to generate a time base for a set of registers that sample the hit signal with a short periodicity (determined by the cell delay) [2], as shown in Figure 2. In this way, a full picture of the timing characteristics of the hit signal can be sampled and stored. Digital signal processing algorithms can then be used to extract the interesting features from the data stream.

Figure 2: Delay Locked Loop used in a time base application.
With this scheme it is easy to identify glitches (short pulses) or any other undesired pulse characteristics. However, this sampling scheme results in a continuous activity of the hit registers, increasing the power dissipation and, possibly, the noise in the power supply. Also, the data is produced at a quite high rate and therefore a large read-out bandwidth is necessary to ensure that no data is lost. In these conditions, data reduction algorithms must be applied at a very early stage.

5.2. The Array of DLL’s (ADLL).

The time resolution of a DLL based converter is determined by the gate delay. To obtain better resolution either a faster technology is selected, which results in shorter gate delays, or an architecture that is able to interpolate time within the gate delay is used. One way of achieving this interpolation is to use a group of F Timing DLL’s that have a small time offset between them. This offset is precisely determined by a Phase Shifting DLL, which is locked to the same reference clock (see Figure 3) [3].

Figure 3: Array of DLL’s with phase shifting DLL, showing bin definition (Timing DLL’s with N cells of delay Tn, Phase Shifting DLL with M cells of delay Tm, M < N).

A time offset smaller than the minimum gate delay is, of course, not possible to obtain directly. However, it is possible to obtain an offset that is slightly larger than the minimum gate delay. Assuming that the offset (Tm) is a fraction 1/F bigger than the delay of each delay cell in the Timing DLL’s (Tn) then, as shown in Figure 3, the time offset obtained from corresponding taps in consecutive DLL’s is Toff = Tm = Tn·(1+1/F). If the previous tap of the second DLL is used to define the end of the bin, the resulting bin size will be Tbin = Toff − Tn = Tn·(1+1/F) − Tn = Tn/F, as intended. Bins in the extremities of the Timing DLL’s are defined from taps in opposite ends of consecutive DLL’s, profiting from the periodicity of the clock.
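The claim that F offset Timing DLL’s divide the clock period into uniform bins of Tn/F can be checked numerically. The sketch below is only an illustration: it places every tap at m·Tm + n·Tn with Tm = Tn·(1+1/F), folds the instants into one clock period, and inspects the spacing between neighbours. The values N = 35 and F = 4 are illustrative choices, the function names are hypothetical, and all times are expressed in units of the clock period using exact fractions.

```python
from fractions import Fraction


def adll_tap_times(n_taps, f_dlls):
    """Sorted tap instants of F Timing DLL's, in units of Tclk, modulo Tclk."""
    t_n = Fraction(1, n_taps)                # Tn = Tclk / N
    t_m = t_n * (1 + Fraction(1, f_dlls))    # Tm = Tn * (1 + 1/F)
    times = [(m * t_m + n * t_n) % 1
             for m in range(f_dlls)
             for n in range(n_taps)]
    return sorted(times)


def bin_sizes(times):
    """Spacing between consecutive tap instants, wrapping around at Tclk."""
    wrapped = times + [times[0] + 1]
    return [later - earlier for earlier, later in zip(wrapped, wrapped[1:])]


# Illustrative values: N = 35 cells per Timing DLL, interpolation factor F = 4.
times = adll_tap_times(35, 4)
sizes = bin_sizes(times)
print(len(times), "taps, distinct bin sizes:", set(sizes))
```

In this idealised case every bin equals Tclk/(N·F), that is Tn/F, including the bins that wrap around the ends of the delay lines. No cell mismatch or phase error is modelled.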
The size of a bin is defined as the delay difference of taps in the two ends of the bin:

$$ \mathrm{bin} = \mathrm{tap}_{m+1,\,n-1} - \mathrm{tap}_{m,\,n} = \mathrm{tap}_{m+1} + \mathrm{tap}_{n-1} - \left(\mathrm{tap}_{m} + \mathrm{tap}_{n}\right) $$

$$ \Rightarrow T_{bin} = (m+1) \cdot T_m + (n-1) \cdot T_n - \left(m \cdot T_m + n \cdot T_n\right) \Leftrightarrow T_{bin} = T_m - T_n, $$

where m and n are the positions of the taps that define the bin, as shown in Figure 3. Variable m represents the timing DLL and n is the tap number within that DLL (0 ≤ m < F ≤ M and 0 ≤ n < N). Delays Tn and Tm are related to the period of the reference clock (Tclk) by the number of taps M and N of the respective DLL’s:

$$ T_n = \frac{T_{clk}}{N}, \qquad T_m = \frac{T_{clk}}{M}. $$

From these equations, the relationship between M, N and F can be defined as:

$$ T_{bin} = \frac{T_{clk}/N}{F} = \frac{T_{clk}}{M} - \frac{T_{clk}}{N} \Leftrightarrow M = N \cdot \frac{F}{1+F}. $$

This definition unfortunately shows that, for any given fraction F, the applicable values for M and N do not result in a number of bins N·F that is a pure binary number (N·F ≠ 2^n, for any n). To obtain such a convenient representation, a code conversion should be performed later in the data acquisition chain.

Contrary to other vernier techniques that also use delay differences to obtain sub-gate resolution, in this architecture all the delay lines are locked to the same reference signal and only have to span a short length (corresponding to one reference clock period). Also, the ADLL can be shared between several channels, therefore increasing the integration level and decreasing the overall power dissipation.

The relations that have been established show that, using this scheme, one can theoretically achieve a bin size that is any fraction of the original cell delay. In practice this is not the case, since this interpolating procedure, where a small time difference (Tm − Tn) is extracted from two large delays ((m+1)·Tm+(n-1)·Tn and m·Tm+n·Tn), is very sensitive to any errors present in the array. A small error in the definition of Tm or Tn is amplified by the nature of the interpolation and becomes a significant part of Tbin, therefore limiting the achievable resolution.
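The two points made above, the integer relation M = N·F/(1+F) and the amplification of a small delay error, can be illustrated with a short Python sketch. It is an illustration under assumptions and not a model of the actual circuit: the helper names are hypothetical, N = 35 and F = 4 are example values, and the 1% figure is an assumed relative error applied to Tm as a whole.

```python
from fractions import Fraction


def phase_shifter_cells(n_cells, f):
    """Return M = N * F / (F + 1), or None when it is not an integer."""
    m_cells, remainder = divmod(n_cells * f, f + 1)
    return m_cells if remainder == 0 else None


def is_power_of_two(value):
    """True when value is 1, 2, 4, 8, ..."""
    return value > 0 and (value & (value - 1)) == 0


def bin_error_fraction(n_cells, f, rel_err_tm):
    """Error on Tbin, in units of the ideal Tbin, for a relative error on Tm."""
    m_cells = phase_shifter_cells(n_cells, f)
    if m_cells is None:
        raise ValueError("N * F / (F + 1) is not an integer")
    t_n = Fraction(1, n_cells)   # Tn in units of Tclk
    t_m = Fraction(1, m_cells)   # Tm in units of Tclk
    t_bin = t_m - t_n            # ideal bin size, equal to Tn / F
    return t_m * Fraction(rel_err_tm) / t_bin


m_cells = phase_shifter_cells(35, 4)
error = bin_error_fraction(35, 4, Fraction(1, 100))

# Search a small range for valid (N, F) pairs whose bin count is a power of two.
binary_cases = [(n, f)
                for n in range(2, 129)
                for f in range(2, 11)
                if phase_shifter_cells(n, f) is not None
                and is_power_of_two(n * f)]
print(m_cells, error, binary_cases)
```

For N = 35 and F = 4 the sketch gives M = 28, and an assumed 1% error on Tm becomes 5% of the bin size, that is, (F+1) times larger in relative terms. The search over the small range of N and F used here finds no valid combination with a power-of-two number of bins, in line with the statement above.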
Bins in the extremities of the DLL’s are also sensitive to error accumulation, since they are interpolated from taps in opposite extremes of consecutive DLL’s. This interpolation method therefore sets stringent requirements on the DLL’s that make up the array. Minimisation of device mismatch and of phase error are very important design criteria.

An interpolator based on the ADLL scheme can, in principle, be designed in such a way as to minimise reference clock jitter and all static sources of non-linearity. The degradation of the time resolution due to delay cell mismatch is, however, harder to deal with, since it is a characteristic inherent to the fabrication of the circuit that cannot be completely eliminated by design. Therefore delay cell mismatch, and ultimately device mismatch, sets the limit to the resolution achievable with these converters.

[Figure 4: Interpolation limits due to cell mismatch. Horizontal axis: interpolation factor (F), from 1 to 10. Curves: ideal case and delay cell mismatch (σ) of 1%, 2%, 3%, 4% and 5%.]

The graph of Figure 4 (see footnote 1) shows the root mean square (RMS) resolution that can be achieved using an ADLL based interpolator in the presence of delay cell mismatch (see footnote 2), assuming N=35 delay cells per Timing DLL. As would be expected, the effects of mismatch increase as the interpolation factor (F) increases. Therefore, the gain in resolution obtained by increasing the interpolation factor vanishes beyond a certain level of delay cell mismatch. The maximum interpolation factor that is rewarded by a consequent improvement in resolution varies between F=4 and F=5, depending on the actual delay cell mismatch.

5.3. Conversion dynamic range.
The use of a periodic reference signal in the array of DLL’s makes it impossible to differentiate two measurements resulting from hit signals arriving separated by multiples of the reference clock period. The dynamic range of such a converter is therefore limited to one reference clock cycle. The simplest way to increase the dynamic range would be to increase the clock period. However this solution requires the use of a longer delay chain (or, alternatively, accepting a coarser resolution). A better solution is to include in the converter a counter synchronous to the reference clock.

Footnotes to the discussion of Figure 4:
1 Given a number of cells per Timing DLL, N, some of the interpolation factors, F, displayed do not result in a realistic ADLL. However they are included for completeness.
2 As explained in Chapter 6, mismatch of the device parameters leads to identical delay cells having different propagation delays. The delay of a cell is seen as a random variable with a normal PDF, having a variance σ².

[Figure 5: Dynamic range extension using two coarse time counters. Diagram labels: Reset, Clk, Hit, two n-bit counters, Register #0, Register #1, Sel, Coarse word.]

The counter is itself a converter with a coarse resolution (one reference clock cycle) but a large dynamic range (depending on the number of bits implemented). Its results can be appended to the results of the array conversion, which have a fine resolution but a small dynamic range. Since both coarse and fine time words are obtained using the same reference clock, no ambiguity is generated by the dynamic range extension.

The critical moment for such a scheme is when a measurement is performed while the counter is switching and thus not yet stable. In this situation, the captured coarse word is not predictable, or may be in an intermediate state and thus induce metastability in the hit registers.
If two counters are used, synchronous to opposite phases of the reference clock, there is at any time one counter with stable outputs (see Figure 5). All the converter has to do is to select the correct counter result in order to obtain the correct coarse measurement. The selection of the stable counter is done in accordance with the phase of the reference clock at the moment that the hit signal is asserted. Fortunately the status of the DLL, which is acquired at the same moment, accurately reflects the phase of the reference clock, so it can be used to determine the correct coarse result.

Time stamp measurements obtained from such a converter are referred to an initial instant at the beginning of the clock period when the coarse time counters are at zero. One can thus see the counter reset signal as a common Start signal that sets the time zero at the beginning of the clock cycle. In these conditions, the initialisation of the coarse counters is also an important parameter for the performance of the converter. Start-Stop measurements don’t require any special initialisation of the coarse counter, since they are not referred to a particular initial instant.

5.4. Time critical paths.

Timing information is delivered to the converter via two main signals: the reference clock and the hit signals. The reference clock is used as the basis for the measurements, and the hit signals set the exact time at which the measurement is to be acquired. The high frequency spectral components of these signals determine the accuracy of the timing information received. These signal paths must be handled carefully, since any deterioration of the time characteristics of the respective signals will not be regenerated inside the converter and will thus degrade its resolution. Jitter in the reference clock received by the array must be very small, since the DLL loop is unable to filter jitter in its input signal (see Appendix B).
Most of the noise that may couple into the time critical paths can effectively be factored out if differential signalling levels are used. In fact, noise coupling into signal paths at board and bonding level affects close-by paths in the same way, thus it is mainly common mode noise. Several standards are commercially available for differential signalling. Selection should be based on bandwidth, compatibility of supply levels, simplicity of receivers, etc.

These considerations are not so critical inside the converter IC, because there the signal paths are short and the noise environment can be designed such that the signals are not very sensitive to noise that is generated in the circuit. Increased noise immunity could be achieved if differential logic was also used throughout the time critical circuitry inside the circuit. However this increased noise immunity would be obtained at the expense of increased power dissipation.

5.5. Measurement acquisition and storage.

The measurement instant is defined by the assertion of the hit signal. At this moment the status of the array and of the coarse counters is captured in a group of hit registers. Data stored in these registers reflects the time that elapses from the beginning of the reference clock period to the instant the hit signal was asserted. The measurement consists only of the storing operation, therefore the time spent on this operation is minimal and the converter has no dead time.

The hit register is the interface between the time measurement circuitry and the timing insensitive digital processing performed afterwards. Its activity makes an important contribution to the converter linearity and it should therefore be treated as time critical circuitry. In order to avoid degradation of the linearity of the converter due to the acquisition stage, the latching instant must be well defined and the same for all the tap registers.
This requires a matching-minded approach to the design and layout of the registers, since mismatch at this level results in different latching times for each register and thus in increased non-linearity of the converter. Furthermore, the latching signal should arrive at the same time at every register involved in the measurement. If the intended resolution is very high, small propagation delays along the lines that distribute this signal will degrade the measurement accuracy, as will be shown in Chapter 6. In some topologies propagation delays may accumulate, resulting in non-negligible non-linearity.

Due to the large number of registers integrated in one circuit, the power dissipation may be significant. A side effect of the large instantaneous currents that may be required at the acquisition moment is the noise they induce in the power supply. Power supply noise at this stage may cause crosstalk between channels, if they are performing measurements concurrently. Careful power distribution is therefore necessary to reduce this effect and also the possible deterioration of the closed loop dynamic behaviour of the DLL’s.

5.6. Read-out architecture.

A converter circuit is not complete with the time acquisition circuitry alone. Important functions such as buffering, data encoding, data reduction and handling of the read-out protocol have an impact on the converter performance and enhance its functionality, turning it into an integrated time measurement system.

Data buffering is probably the function that has the biggest effect on the converter’s performance (considering High-Energy Physics applications). Due to the random nature of the assertion time of the hit signal, measurements must be performed at unpredictable times. Usually the data acquisition system down-stream of the converter is only able to handle a limited data rate from any given origin, because the communication medium is shared between several data sources.
Measurements acquired with a shorter time separation than the read-out period would then be lost, even if the converter itself was fast enough to process them. This would result in an increased converter dead time.

This limitation can be circumvented in several different ways. The read-out rate can be made much higher, thus decreasing the minimum interval between two accepted measurements. Alternatively, a derandomising buffer can be included after the converter. This buffer holds data arriving in quick succession until it can be read out. Also, a data reduction function (trigger based data reduction) may exist that discards measurements that do not meet the acceptance criteria. If applied, it can reduce the data rate significantly.

[Figure 6: Example of the first level of a read-out buffering hierarchy. Hits arrive at Channel #0, Channel #1, ..., Channel #N at the hit rate; the channel buffer(s) are emptied at the internal clock rate into a group buffer, which is read out at the read-out rate.]

The first solution is usually not applicable. Increased read-out speed increases system costs and results in an ineffective use of this resource, since most of the time the high speed would not be needed. The two other solutions, if used together, are very effective in smoothing the read-out rate, so that effective use of a low speed read-out channel can be made without increasing the dead time between accepted hits.

Using one large derandomising buffer per channel would however be expensive in terms of silicon usage. A preferred solution is to build a buffering hierarchy, by partitioning the conversion channels into small groups and using a common buffer for each group and a small individual buffer for each channel (as in Figure 6). Each group of channels can then be merged into a larger “super-group” and so on, until the hierarchy that is best adapted to the application has been built.
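The effect of a derandomising buffer on measurement loss can be illustrated with a very simple model. The sketch below is not taken from the prototype: it assumes a single FIFO of `depth` words that is drained at one word per `readout_period`, and a hit that finds the FIFO full is counted as lost. The function name `lost_hits` and the example numbers are hypothetical.

```python
from collections import deque


def lost_hits(arrival_times, readout_period, depth):
    """Count hits lost by a FIFO of `depth` words read out one word per period."""
    pending = deque()  # read-out completion times of the words still buffered
    lost = 0
    for t in sorted(arrival_times):
        # Free the slots of the words already read out at time t.
        while pending and pending[0] <= t:
            pending.popleft()
        if len(pending) >= depth:
            lost += 1  # buffer full: this measurement is discarded
            continue
        # Read-out of the new word starts once the previous word has left.
        start = pending[-1] if pending else t
        pending.append(start + readout_period)
    return lost


burst = [0, 1, 2, 3]  # four hits in quick succession (arbitrary time units)
for depth in (1, 2, 4):
    print(depth, lost_hits(burst, readout_period=10, depth=depth))
# prints: 1 3 / 2 2 / 4 0 (one pair per line)
```

With the same slow read-out, a deeper buffer absorbs the burst without loss, which is the smoothing effect described above. Hits that are well separated in time, for instance `[0, 20, 40]`, are not lost even with a depth of one.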
The size of the channel group and of the individual buffers is defined by the expected acquisition rate and channel occupancy, as well as by the read-out rate and allowed measurement loss. A good knowledge of the intended application is therefore required prior to defining these buffers.

5.7. The prototype.

A Time-to-Digital Converter (TDC) based on this architecture was built [4]. The circuit demonstrates the feasibility of the ADLL as a time interpolator. Furthermore, to emphasise the ability to integrate all the required functionality in a single, inexpensive circuit, the prototype was implemented in a commercial 0.7µm CMOS technology. A block diagram depicting the prototype is shown in Figure 7.

[Figure 7: The prototype block diagram. Blocks and signals: clkref, rsttime, phase detectors (PD), 8-bit counters, Phase Shifting DLL (M=28 cells), Timing DLL’s (N=35 cells), 4 channels with hit<3:0>, hit enable control, clkro, 2-word channel buffer, data encoder, 32-word FIFO, read-out interface, serial interface, program interface.]

The ADLL is made of four (F=4) Timing DLL’s, each dividing the reference period in 35 parts (N=35). A 28-tapped Phase Shifting DLL (M=28) is required to achieve the correct adjustment of the Timing DLL’s. An 8-bit coarse time counter is used to obtain a dynamic range extension to 256 reference clock cycles. Using an 80MHz reference clock (T=12,500ps), the bin size, over a full dynamic range of T·256=3.2µs, is

$$T_{bin} = \frac{T}{F \cdot N} = \frac{12{,}500}{140} = 89.3\,\text{ps}.$$

The bin size of the independent DLL’s is Tm=446.4ps and Tn=357.1ps, respectively for the Phase Shifting and for the Timing DLL’s.

The reference clock (and the hit signal) receivers are implemented differentially, to avoid common mode noise coupling into these time critical paths. The demonstrator includes a common data encoder that converts the ‘thermometer’ code in which the fine time measurements are encoded at the output of the ADLL into a binary encoded word.
It also merges the correct coarse time word into the final measurement word. The encoding results in a data word reduction from 156-bit to 16-bit.

Four TDC channels were integrated in the IC. Each channel includes a two-word deep asynchronous pipeline buffer (channel buffer). A common 32-word deep derandomising buffer (group buffer) is also included in order to ease the read-out rate requirements. This partition of the buffering hierarchy is well adapted to the low hit rate expected in the application and demonstrates the partition concept.

The read-out interface logic, as well as the encoding and common buffering circuitry, work asynchronously to the reference clock (clkref), using a clock (clkro) of up to 40MHz. A slow, serial read-out interface is also implemented to facilitate the necessary test and debugging tasks. All necessary programming is performed via an independent program port that is adapted for the daisy-chaining of several TDC’s in a single serial line. In fact, the prototype includes sufficient functionality to allow it to be used in the actual working environment, included in the data acquisition chain of a High-Energy Physics experiment.

In the photograph of Figure 8, the main functional blocks of the prototype are highlighted. This circuit is encapsulated in a 68-pin plastic PLCC package.

5.7.1. Performance analysis. Timing characteristics.

The short analysis that will be made here takes into account only the errors intrinsic to converters built using this architecture. Other sources of errors degrade the resolution of the measurements, but they can be avoided, or at least minimised, by careful circuit design.

The LSB (least significant bit) of the converter, in the configuration proposed, is Tbin=89.3ps. The theoretical RMS resolution σq is determined by the quantisation error:

$$\sigma_q = \frac{T_{bin}}{\sqrt{12}} = 25.8\,\text{ps}.$$

The resolution is, however, limited by the unavoidable delay cell mismatch.
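The quantisation figure can be reproduced with a few lines of code. This is only an illustrative sketch: it evaluates Tbin/√12 for the prototype values quoted above (T=12,500ps, F=4, N=35) and cross-checks it against the RMS of an error assumed to be uniformly distributed over one bin. The function names are hypothetical.

```python
import math


def quantisation_rms(t_bin_ps: float) -> float:
    """Closed-form RMS quantisation error of an ideal converter with bin Tbin."""
    return t_bin_ps / math.sqrt(12.0)


def quantisation_rms_numeric(t_bin_ps: float, samples: int = 10_000) -> float:
    """RMS of an error spread uniformly over one bin (midpoint sampling)."""
    errors = [((k + 0.5) / samples - 0.5) * t_bin_ps for k in range(samples)]
    return math.sqrt(sum(e * e for e in errors) / samples)


t_bin_ps = 12_500.0 / (4 * 35)          # T / (F * N)
sigma_q_ps = quantisation_rms(t_bin_ps)
print(round(t_bin_ps, 1), round(sigma_q_ps, 1))
# prints: 89.3 25.8
```

The numerical estimate `quantisation_rms_numeric(t_bin_ps)` agrees with the closed form to well below 1 fs, which confirms that the 1/√12 factor is just the standard deviation of a uniform distribution of unit width.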
The analysis developed in Chapter 6 shows that the maximum effect of cell mismatch is seen in the middle of the last Timing DLL (m=F-1 and n=N/2). Assuming a mismatch (σmatch) of 1%, the additional RMS error due to the array is:

$$\sigma_{ADLL} = \sigma_{match} \cdot F \cdot \sqrt{\left(\frac{F+1}{F}\right)^{2} \cdot \frac{F-1}{M} \cdot (M-F+1) + \frac{N}{4}} \cdot T_{bin} = 12.8\,\text{ps}.$$

[Figure 8: Prototype circuit showing main functional blocks. Highlighted blocks: coarse time counter, array of DLLs, hit registers (4 channels), read-out FIFO, read-out and encoding logic.]

In addition, unavoidable jitter present in the reference clock and intrinsic to the closed loop operation is estimated to be on the order of σjitter=15ps. Adding these contributions quadratically, the overall RMS resolution should be ~32.5ps (0.36LSB). This value reflects the expected resolution if a number of converters are measured. Individual converters may have a better or worse resolution, depending on their actual matching parameters. Other sources of errors will most likely degrade the converter resolution, therefore this value can be used as a benchmark to evaluate the characteristics of the actual prototypes.

The results of tests carried out with the prototype are detailed in Chapter 8. They show an overall RMS resolution of 34.5ps (0.38LSB), which is in accordance with the expected value previously shown.

Chapter 6. Analysis of the Limits to the TDC Resolution.

In this chapter we will develop mathematical tools to predict and analyse the effects of different error sources on the linearity and on the time resolution of a DLL based converter. The analysis is extended to the more complex case of the ADLL. These analysis tools allow for the translation of important system level performance parameters into design variables that can then be used to judge the design against the expected performance. All the most important internal error sources are accounted for, namely the delay cell mismatch, the dynamic behaviour of the closed control loop and several causes of phase error.
6.1. Non-linearity due to cell mismatch.

The delay cell defines the LSB of a DLL based converter. Delay differences between cells produce variations of the LSB along the dynamic range. Therefore, the conversion becomes non-linear and the resolution is degraded.

Although all cells have identical layout and are biased in the same conditions, their delay is not the same. If the delay of a large number of these cells is measured, their distribution is found to have mean µ and variance σ². The delay of a cell can, therefore, be seen as a random variable with a normal Probability Density Function (PDF) having a mean µ and variance σ². The mean corresponds to the expected cell delay, and the variance gives a measure of the spread of the actual delays around that value.

6.1.1. Origins of mismatch.

Delay mismatch has its origins in the variation, due to the fabrication process, of the electrical parameters of the devices that constitute the cell. Two kinds of parameter variations can be distinguished: local and global variations [5][6][7]. Local variations affect devices that are immediate neighbours. This kind of random variation is generally called parameter mismatch. Global variations affect devices that are located far away in the same die, in different dies or even in different wafers.

At a circuit level, global variations can be seen as static errors that affect the absolute values of the respective parameters. These variations are mainly due to process and temperature gradients, non-uniformity of the photo-lithographic processing caused by proximity effects and different orientation of devices. Circuit topologies that rely on relative, rather than absolute, device parameters effectively counter global mismatch variations. The DLL structures only rely on relative cell delay, therefore the effects of global parameter variations will be disregarded in this study.
The effects of local variations can be limited by proper layout of the cells, keeping a constant orientation of the devices, avoiding temperature gradients and guaranteeing that each cell has the same “physical” patterns in its vicinity. Local variations result from unavoidable deviations from the intended values of key parameters during fabrication. Thin oxide thickness, bulk doping levels, mobility, etc. suffer statistical variations that affect important electrical parameters, such as the threshold voltage (Vt), the device current factor (β) and the body factor (γ). These random variations are usually assumed to be uncorrelated, having a normal distribution with a variance that is inversely proportional to the gate area. As devices approach their minimum feature size, especially in deep submicron technologies, mismatch also becomes dependent on gate length, L, and width, W, separately. To guarantee a good matching behaviour, devices should be drawn with an appropriate gate area and using conservative (larger than minimum) gate dimensions.

6.1.2. Effects of cell delay mismatch.

The integral linearity error results from the accumulation of the individual cell delay errors, subject to the limits imposed by the closed control loop of the DLL (the overall delay of the line is the period of the reference clock). The analysis in Appendix C shows that the standard deviation of the integral error (σDLL(i)) in an N-tapped DLL is defined by the following expression, where σcell=σ/µ reflects the matching of the delay of the individual delay cells as a fraction of their mean value µ=T/N:

$$\sigma_{DLL}(i) = \sigma_{cell} \cdot \sqrt{\frac{n \cdot (N-n)}{N}},$$

where the timing variable n is defined in accordance with the bin position i along the delay chain by

$$n = \mathrm{Mod}(i+1, N), \qquad 0 \le i < N \quad \text{(see footnote 1)}.$$

This definition of the timing variable n will be used throughout this chapter, in the context of the analysis of isolated DLL’s.
From the previous equation it can be observed that, for the same matching between delay cells σcell, the standard deviation of the integral error is bigger for longer delay lines (higher N). Therefore, for a given cell delay, better results can be obtained using a short delay line operating at higher frequency than with a long delay line operating at lower frequency.

Footnote 1: The notation Mod(a,b) denotes the modulo operation. It is required to capture the reference periodicity of the DLL timing interpolation: the last bin (N-1) has its limits defined by tap N-1 and tap 0.

In an ADLL, time interpolation is obtained using taps from several phase shifted DLL’s. The standard deviation of the overall integral error can be obtained by taking into account the error accumulation along the DLL’s in the delay path. For any bin under consideration, the path from the origin includes delay cells in the Phase Shifting DLL and in the respective Timing DLL. Since delay variations due to mismatch are not correlated between DLL’s, the standard deviation of the integral error is the square root of the sum of the squares of the partial errors:

$$\sigma_{array}(i) = F \cdot \sigma_{cell} \cdot \sqrt{\left(\frac{F+1}{F}\right)^{2} \cdot \frac{m}{M} \cdot (M-m) + \frac{n}{N} \cdot (N-n)},$$

where M, N and F are defined in accordance with the allowed combinations for the array. The phase shifting variable m and the timing variable n are, respectively, the timing DLL number and the bin number in the corresponding DLL. They are calculated taking into account the staggering of the DLL’s across the clock period. If i (0 ≤ i < N·F) is the array bin number, then:

$$m = \mathrm{Mod}(i+1, F), \qquad n = \mathrm{Mod}\left(\mathrm{Floor}\left(\frac{i+1}{F}\right) - m,\; N\right).$$

The Mod(a,b) (see footnote 2) and Floor(a) operations are, respectively, the modulo and the integer truncation operations. The definition of n is a generalisation of the one presented for the isolated DLL case, where the interpolation factor was F=1.
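To make the two expressions concrete, the sketch below evaluates them bin by bin. It is an illustration under stated assumptions: the formulas are implemented as σDLL(i) = σcell·√(n·(N−n)/N) and σarray(i) = F·σcell·√(((F+1)/F)²·(m/M)·(M−m) + (n/N)·(N−n)), with M = N·F/(1+F), and the results are expressed in units of the respective LSB. The function names are hypothetical.

```python
import math


def sigma_dll(i: int, n_taps: int, sigma_cell: float) -> float:
    """INL standard deviation of bin i in an isolated N-tapped DLL (in LSB)."""
    n = (i + 1) % n_taps
    return sigma_cell * math.sqrt(n * (n_taps - n) / n_taps)


def sigma_array(i: int, n_taps: int, factor: int, sigma_cell: float) -> float:
    """INL standard deviation of bin i in an ADLL (in LSB), 0 <= i < N*F."""
    m_taps = n_taps * factor // (1 + factor)   # M = N*F/(1+F), assumed integer
    m = (i + 1) % factor                       # phase shifting variable
    n = ((i + 1) // factor - m) % n_taps       # timing variable (folded)
    phase_term = ((factor + 1) / factor) ** 2 * (m / m_taps) * (m_taps - m)
    timing_term = (n / n_taps) * (n_taps - n)
    return factor * sigma_cell * math.sqrt(phase_term + timing_term)


N, F, SIGMA_CELL = 35, 4, 0.01
peak_array = max(sigma_array(i, N, F, SIGMA_CELL) for i in range(N * F))
peak_single = max(sigma_dll(i, N * F, SIGMA_CELL) for i in range(N * F))
print(round(peak_array, 3), round(peak_single, 3))
# prints: 0.144 0.059
```

Python's `%` operator returns a non-negative result for a positive modulus, so it matches the Mod operation used here even when Floor((i+1)/F) − m is negative. For a 1% cell mismatch the peak of the array curve is about 0.144 LSB, which for an 89.3ps bin corresponds to roughly 12.8ps.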
These definitions of the phase shifting variable m and of the timing variable n will be used throughout this chapter, in the context of the analysis of ADLL structures.

Footnote 2: The use of the modulo operation reflects the folding operation introduced by the ADLL scheme. This results in some bins being defined from the time interpolation of taps in opposite extremes of consecutive DLL’s.

In Figure 1 an example of the expected integral error due to cell delay mismatch is shown. It corresponds to the case of an array with N=35 (number of cells per timing DLL), F=4 (interpolation factor) and a cell delay with a standard deviation of 0.01 (1%) of the cell delay. When several DLL’s are assembled in an array structure, the rounded curve shape of the single DLL (also shown) is distorted by the introduction of the Phase Shifting DLL. There is a strong periodic component with a periodicity of F, corresponding to the folding of the array from the last Timing DLL to the first one. The larger non-linearity found in the first part of the curve is due to the fact that timing interpolation in this region is performed using cells in different extremities of successive timing DLL’s.

[Figure 1: INL standard deviation curve resulting from a cell delay mismatch of σcell=1% (ADLL: N=35 and F=4, single DLL: N=140). Horizontal axis: bin, from 0 to 140; vertical scale from 0 to 0.15. Curves: ADLL and single DLL.]

6.2. Jitter due to internal phase noise.

In the previous section the DLL was considered as an ideal closed control loop, able to keep the delay of the voltage controlled delay chain (VCDL) exactly at one reference clock period. The deviations from the ideal behaviour found in real control loops can be classified into two categories, in accordance with their origin:

• Deviations of external origin: The reference signal has some phase noise that is propagated, without attenuation, along the VCDL.
The control loop tries to track these random reference period variations by constant changes of the delay of each cell.

• Deviations of internal origin: The control loop tries to keep the delay of the VCDL as close as possible to the reference period. In the absence of an ideal feedback loop, the dynamics of the control loop will generate some variation of the VCDL delay around its ideal value. These variations are seen as jitter.

Since we are mainly interested in the study of the DLL internal sources of errors, we will focus on the deviations of internal origin, assuming an ideal reference clock.

The delay oscillation induced by the operation of the control loop translates into jitter in the signal seen at the end of the delay chain. This jitter can be approximated, without loss of generality, by a random delay error with a normal PDF. The mean value of this error is µjitter=0 and the standard deviation, normalised to the reference period, is σjitter.

The error due to jitter affects all delay cells in the same way and, since it is completely correlated, the standard deviation of the integral error increases linearly along the VCDL of a DLL. The resulting standard deviation of the delay σDLL, normalised to the delay of a single delay cell, is (following the same definition of n as before and with σj = σjitter·N):

$$\sigma_{DLL}(i) = \sigma_j \cdot \frac{n}{N}.$$

In the case of the array of DLL’s, the same considerations of the previous section apply and, using the same naming conventions, the resulting standard deviation is:

$$\sigma_{array}(i) = \sigma_j \cdot F \cdot \sqrt{\left(\frac{m}{M}\right)^{2} + \left(\frac{n}{N}\right)^{2}}.$$

Note that the DLL’s in the array have statistically independent jitter, therefore standard deviation components from different DLL’s are added quadratically.

[Figure 2: Standard deviation curve resulting from a closed loop jitter of σ=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). Horizontal axis: bin, from 0 to 140; vertical scale from 0 to 0.15. Curves: ADLL and single DLL.]
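The jitter expressions can be evaluated in the same way as the mismatch ones. The sketch below assumes the forms σDLL(i) = σj·n/N and σarray(i) = σj·F·√((m/M)² + (n/N)²), with σj = σjitter·N and M = N·F/(1+F); results are in units of the respective LSB and the function names are hypothetical.

```python
import math


def jitter_sigma_dll(i: int, n_taps: int, sigma_jitter: float) -> float:
    """Jitter-induced standard deviation at bin i of an isolated DLL (in LSB)."""
    n = (i + 1) % n_taps
    sigma_j = sigma_jitter * n_taps   # jitter normalised to one cell delay
    return sigma_j * n / n_taps


def jitter_sigma_array(i: int, n_taps: int, factor: int,
                       sigma_jitter: float) -> float:
    """Jitter-induced standard deviation at bin i of an ADLL (in LSB)."""
    m_taps = n_taps * factor // (1 + factor)   # M = N*F/(1+F), assumed integer
    m = (i + 1) % factor
    n = ((i + 1) // factor - m) % n_taps
    sigma_j = sigma_jitter * n_taps
    # Independent jitter of the two DLL's in the path: quadratic addition.
    return sigma_j * factor * math.hypot(m / m_taps, n / n_taps)


N, F, SIGMA_JITTER = 35, 4, 0.001   # 0.1% of the reference period
peak_array = max(jitter_sigma_array(i, N, F, SIGMA_JITTER) for i in range(N * F))
peak_single = max(jitter_sigma_dll(i, N * F, SIGMA_JITTER) for i in range(N * F))
print(round(peak_array, 3), round(peak_single, 3))
# prints: 0.137 0.139
```

Both peaks stay just below 0.14 LSB, in line with the vertical scale of Figure 2. In the array the timing term n/N restarts for every Timing DLL, which is what produces a saw-tooth curve rather than a single ramp.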
The curve in Figure 2 describes the effect of jitter with σ=0.1% of the reference clock period (σj=3.5% of the cell delay if N=35). The topology of the ADLL is reflected in the saw-tooth shape of the curve. The same periodic components described in the previous section are present. For comparison, the effect of the same amount of jitter on a single DLL is also shown.

6.3. Non-linearity due to static phase error.

Systematic offsets and unwanted delays present in the converter adversely affect the linearity of the system. They should be carefully identified and minimised. The main sources of non-linearity, identified in Figure 3, are:

• Phase detector’s phase error (F(D1,D2) ≠ 0 when D1 = D2 − T).
• Mismatch of the propagation delay of the lines carrying phase information from the delay chain to the phase detector (τ1 ≠ τ2).
• Unbalanced load and signal characteristics on the delay cells at the extremes of the delay chain (d0, dN-1 ≠ dn, 1 ≤ n ≤ N-2).
• Propagation delay along the sampling signal distribution for the hit registers (τhit ≠ 0).

[Figure 3: Detail of a delay locked loop depicting the important delays within the loop. Diagram labels: Clock; delay cells d0, d1, d2, ..., dN-2, dN-1; Tap 0 to Tap N-1, each sampled by a register (D) clocked by Hit through a delay τhit; feedback path delays τ1 and τ2; phase detector inputs D1 and D2; phase detector F(D1,D2).]

6.3.1. Effects of phase detector’s phase error.

The phase detector responds to differences in the phase of its input signals by generating an electrical quantity A (voltage, charge, etc.) proportional to the measured phase difference:

$$A(t) = F(\phi_1(t), \phi_2(t)), \quad \text{where} \quad F(\phi_1(t), \phi_2(t)) = K \cdot (\phi_2(t) - \phi_1(t)) - C$$

and K and C are, respectively, the gain and the phase error of the phase detector. φ1(t) and φ2(t) are the phases of the two signals being compared by the phase detector.

In the context of DLL analysis it is more convenient to discuss the properties of the loop in terms of delay instead of phase. These concepts are equivalent, their relation being given by the transformation 2π ⇒ T.
The previous equation is therefore transformed into:

$$A(t) = F(D_1(t), D_2(t)), \quad \text{where} \quad F(D_1(t), D_2(t)) = K \cdot (D_2(t) - (D_1(t) + T)) - C$$

and the 2π phase difference between the two extremes of the delay line is explicitly stated (clock period, T). D1(t) and D2(t) are the two delays being compared.

The loop equilibrium is obtained when A(t)=0, which should correspond to D2(t) = D1(t) + T. However, this is not the case if C ≠ 0, in which case the phase detector error C will be reflected in the effective static delay (phase) error. The origin of C may be attributed to an unbalanced phase detector, resulting in an offset in the output signal.

The following discussion assumes an N-tapped DLL spanning a time interval Dtot that corresponds to a reference clock of period T. It is further assumed that no errors, other than the one under study, are present. In equilibrium,

$$K \cdot (D_2(t) - D_1(t) - T) - C = 0 \Leftrightarrow D_2(t) - D_1(t) - T = D_{err}(t) = \frac{C}{K}.$$

The total time interval spanned by the delay chain is Dchain = T + C/K. Therefore the length of each bin is

$$d_i = \frac{T}{N} + \frac{1}{N} \cdot \frac{C}{K}, \qquad 0 \le i < N-1.$$

Since the periodicity of the reference clock is T, the total time covered by the bins must be T. The remaining delay is subtracted from the last bin of the chain, d_i with i = N−1, which is defined from the time difference (modulo T) between two taps on opposite extremes of the delay chain (tap N-1 and tap 0 in Figure 3):

$$d_i = \frac{T}{N} - \frac{N-1}{N} \cdot \frac{C}{K}, \qquad i = N-1.$$

In Figure 4 the effect of this error mechanism is illustrated. Each rectangle corresponds to a bin. For comparison the ideal case is shown at the top of the figure. Notice that, due to the periodicity of the scheme (period T), the last bin corresponds to a fraction of the delay of the last cell.
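The bin widths derived above can be tabulated directly, and the integral non-linearity obtained by accumulating them. The sketch below is illustrative only: N=5 is chosen to match the illustration in Figure 4, while the period (12,500ps) and the offset C/K (100ps) are arbitrary example values, and the function names are hypothetical.

```python
def bins_with_phase_error(n_taps: int, period: float, offset: float):
    """Bin widths of an N-tapped DLL whose phase detector has offset C/K.

    `offset` is C/K expressed in the same time unit as `period`.
    """
    regular = period / n_taps + offset / n_taps               # bins 0 .. N-2
    last = period / n_taps - (n_taps - 1) * offset / n_taps   # bin N-1
    return [regular] * (n_taps - 1) + [last]


def integral_nonlinearity(bins, period: float):
    """Deviation of each bin's upper edge from its ideal position."""
    ideal = period / len(bins)
    inl, edge = [], 0.0
    for i, width in enumerate(bins):
        edge += width
        inl.append(edge - (i + 1) * ideal)
    return inl


bins = bins_with_phase_error(n_taps=5, period=12_500.0, offset=100.0)
inl = integral_nonlinearity(bins, period=12_500.0)
print([round(b, 1) for b in bins])
# prints: [2520.0, 2520.0, 2520.0, 2520.0, 2420.0]
print([round(abs(e), 1) for e in inl])
# prints: [20.0, 40.0, 60.0, 80.0, 0.0]
```

The bins still add up to exactly one clock period, and the error grows linearly along the chain before folding back to zero at the last bin, which is the behaviour captured by the timing variable n = Mod(i+1, N).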
[Figure 4: Illustration of the effect of the phase detector’s phase error (N=5). Top: ideal case, bins of width T/N over the period T. Bottom: bins 0 to N-2 of width T/N + (1/N)·C/K, last bin of width T/N − ((N−1)/N)·C/K, delay chain spanning T + C/K.]

The error of the phase detector can be referenced to its input, and translated into an added delay to one of the input signals. If we set the delay τ′diff = C/K, this delay can be lumped into the propagation delay mismatch of the input paths (τdiff) and the phase detector considered ideal.

The behaviour of digital, two-state phase detectors is quite different, because they don’t extract information on the magnitude of the phase error. However the static phase error of such a phase detector may also be referenced to its input and therefore can be studied in the same way.

6.3.2. Effects of phase detector input paths’ delay mismatch.

If the propagation delay of the signals carrying the phase information from the two extremes of the delay line to the phase detector is different, then this difference will induce conversion non-linearity:

$$D_{chain} = T + (\tau_2 - \tau_1) \cdot T = (1 + \tau_{diff}) \cdot T ,$$

where τ1 and τ2, the propagation delays shown in Figure 3, are normalised to the reference period T. Therefore,

$$d_i = \left(\frac{1}{N} + \frac{1}{N} \cdot \tau_{diff}\right) \cdot T , \qquad 0 \le i < N-1 ,$$

$$d_i = \left(\frac{1}{N} - \frac{N-1}{N} \cdot \tau_{diff}\right) \cdot T , \qquad i = N-1 .$$

This effect is illustrated in Figure 5.

[Figure 5: Illustration of the effect of the phase detector input paths’ delay mismatch (N=5). Top: ideal case, bins of width T/N over the period T. Bottom: bins 0 to N-2 of width T/N·(1+τdiff), last bin of width T/N·(1−(N−1)·τdiff), excess delay τdiff·T.]

Assuming, in the interest of simplicity, that C/K and τdiff are represented as a fraction of the reference period T, the conversion integral non-linearity due to these errors is obtained from the expression:

$$INL_{DLL}(i) = \frac{1}{N} \cdot n \cdot \left(\frac{C}{K} + \tau_{diff}\right) , \qquad 0 \le i \le N-1 .$$

6.3.3. Effects of unbalanced conditions of the cells in the extremes of the delay chain.

Cells in the extremes of the delay chain are under the effect of different environment conditions.
For example, the last cell in the chain drives a smaller load than the internal cells, and the signal arriving at the first cell has a different rise time than the signals inside the delay chain. For simplicity we will consider that these conditions affect only the bins on the extremities of the delay chain. In this case the resulting bin delays due to an increase of δin and of δout in the delay of the first and the last bin are:

$$d_i = \left(1 + \frac{(N-1) \cdot \delta_{in} - \delta_{out}}{N}\right) \cdot \frac{T}{N}, \quad i = 0,$$

$$d_i = \left(1 - \frac{\delta_{in} + \delta_{out}}{N}\right) \cdot \frac{T}{N}, \quad 1 \le i \le N-2,$$

$$d_i = \left(1 + \frac{(N-1) \cdot \delta_{out} - \delta_{in}}{N}\right) \cdot \frac{T}{N}, \quad i = N-1.$$

The effects of unbalanced conditions of the cells in the extremes of the delay chain are separately illustrated in Figure 6 (for the first cell) and in Figure 7 (for the last cell). In both cases the larger first (or last) cell leads to a larger first (or last) bin and to smaller other bins, thus maintaining the clock periodicity of the scheme.

[Figure 6: Illustration of the effect of unbalanced conditions in the first cell of the delay chain (N=5). Bin widths shown: T/N (ideal); T/N·(1+((N−1)/N)·δin) for bin 0; T/N·(1−(1/N)·δin) for the other bins.]

[Figure 7: Illustration of the effect of unbalanced conditions in the last cell of the delay chain (N=5). Bin widths shown: T/N (ideal); T/N·(1−(1/N)·δout) for bins 0 to N−2; T/N·(1+((N−1)/N)·δout) for bin N−1.]

The expression for the conversion integral non-linearity due to these errors is, therefore:

$$INL_{DLL}(i) = \delta_{in} \cdot \frac{n'}{N} - \delta_{out} \cdot \frac{n}{N}, \quad 0 \le i \le N-1,$$

where n′ = N − 1 − i and n was previously defined.

6.3.4. Effects of propagation delay on the sampling signal path.

All non-linearity sources within the DLL loop have been covered, but there is also an external source that affects the linearity of a DLL-based converter. In fact, due to unavoidable propagation delays in the hit sampling signal distribution, the sampling of the status of the DLL occurs at different times for different taps.
The error generated by this effect is a function of the hit register topology. This effect corresponds to the vernier interpolator configuration previously described (see Chapter 4). Considering, for example, the linear hit sampling signal distribution configuration shown in Figure 3 and a constant τhit propagation delay per hit register (see footnote 3), the resulting apparent cell delay is:

$$d_i = (1 - \tau_{hit}) \cdot \frac{T}{N}, \quad 0 \le i \le N-2,$$

$$d_i = \bigl(1 + (N-1) \cdot \tau_{hit}\bigr) \cdot \frac{T}{N}, \quad i = N-1.$$

This effect is illustrated in Figure 8. In this case the last bin is extended to the end of the clock period so that the full period is covered.

[Figure 8: Illustration of the effect of the propagation delay on the sampling signal path, case of the linear hit signal distribution network (N=5). Bin widths shown: T/N (ideal); T/N·(1−τhit) for bins 0 to N−2; T/N·(1+(N−1)·τhit) for bin N−1.]

The linearity of the conversion is given by:

$$INL_{DLL} = -\tau_{hit} \cdot n, \quad 0 \le i \le N-1.$$

In order to reduce this effect, lines with smaller propagation delays can be used. Alternatively, more complex distribution configurations, such as the T-shaped distribution network, can be used. In this distribution network the hit sampling signal is distributed in two separate branches starting from the middle of the hit register row. In this way, the distance from the source to the farthest register is halved, and therefore the propagation delay τhit is reduced. A positive side effect of this network is that in one of the branches the vernier interpolation results in smaller bins and in the other in larger bins.

Footnote 3: A signal propagating along a finite RC delay line does not progress at constant speed. Typically it accelerates along the line, therefore τhit is not constant. However, this convenient simplification enables a faster understanding of this effect.

Figure 3 is repeated in Figure 9 for this configuration.
This configuration reduces the integral non-linearity (see Chapter 7 for a detailed analysis of this distribution network). For this particular distribution, the resulting effective cell delay is (assuming, for simplicity, an even number of delay cells, N):

$$d_i = (1 + \tau_{hit}) \cdot \frac{T}{N}, \quad 0 \le i < \frac{N}{2},$$

$$d_i = (1 - \tau_{hit}) \cdot \frac{T}{N}, \quad \frac{N}{2} \le i \le N-1.$$

[Figure 9: The T-shaped hit signal distribution network. Elements shown: clock input, delay cells d0 to dN−1, taps 0 to N−1 with their hit registers (D), the hit path delay τhit per register, the hit input, and the phase detector F(D1,D2) with input path delays τ1 and τ2.]

The illustration of this effect for the T-shaped sampling signal distribution network is shown in Figure 10. Notice the larger initial bins and the smaller final bins.

[Figure 10: Illustration of the effect of the propagation delay on the sampling signal path, case of the T-shaped hit signal distribution network (N=5). Bin widths shown: T/N (ideal); T/N·(1+τhit) for the first half of the bins; T/N·(1−τhit) for the second half.]

The linearity of the conversion due to this error source is given by:

$$INL_{DLL} = \tau_{hit} \cdot \left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right), \quad 0 \le n < N.$$

6.3.5. Overall non-linearity due to static phase error.

The effects of all static error sources can be included in a single integral non-linearity expression, where i is the bin position, 0 ≤ i < N. Making the following variable substitutions,

$$D_{in} = \delta_{in}, \quad D_{PD} = \frac{C}{K} + \tau_{diff}, \quad D_{out} = \delta_{out}, \quad D_{hit} = -\tau_{hit},$$

$$n = \mathrm{Mod}(i + 1, N), \quad n' = N - 1 - i,$$

the overall integral non-linearity expression is obtained:

$$INL_{DLL}(i) = D_{in} \cdot \frac{n'}{N} + (D_{PD} - D_{out} + D_{hit} \cdot N) \cdot \frac{n}{N},$$

in case the linear hit signal distribution is being used, or

$$INL_{DLL}(i) = D_{in} \cdot \frac{n'}{N} + (D_{PD} - D_{out}) \cdot \frac{n}{N} - D_{hit} \cdot \left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right),$$

if the alternative T-shaped hit signal distribution is being used.

In the case of the array of DLL's, the integral non-linearity along the delay path is added linearly.
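The two single-DLL expressions of Section 6.3.5 can be evaluated numerically as in the minimal Python sketch below. The function name and argument names are hypothetical. The T-shaped term is read here as N/2 − |n − N/2|, which is an interpretation of the printed expression rather than a confirmed transcription, and the example value Dhit = −0.001 with N = 140 simply reuses the figures quoted in the captions of this chapter.

```python
def inl_dll(i, N, d_in=0.0, d_pd=0.0, d_out=0.0, d_hit=0.0, t_shaped=False):
    """Integral non-linearity of bin i (0 <= i < N) of a single DLL.

    d_in, d_pd, d_out and d_hit stand for Din, DPD, Dout and Dhit.
    """
    n = (i + 1) % N          # n = Mod(i + 1, N)
    n_prime = N - 1 - i      # n' = N - 1 - i
    inl = d_in * n_prime / N + (d_pd - d_out) * n / N
    if t_shaped:
        # Assumed reading of the T-shaped term: triangular profile in n.
        inl -= d_hit * (N / 2 - abs(n - N / 2))
    else:
        # Linear distribution: (Dhit * N) * n / N = Dhit * n.
        inl += d_hit * n
    return inl


N = 140
linear = [inl_dll(i, N, d_hit=-0.001) for i in range(N)]
t_shape = [inl_dll(i, N, d_hit=-0.001, t_shaped=True) for i in range(N)]
print("linear, extreme value:  ", min(linear))
print("T-shaped, extreme value:", max(t_shape))
```

Under these assumptions the linear distribution accumulates its error over the whole register row, whereas the T-shaped distribution peaks at mid-row with about half the magnitude.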
We assume that, regardless of the actual detailed hit signal distribution, to each of the Timing DLL's corresponds a set of hit registers that are driven through a separate signal path. In this context, Figure 3 and Figure 9 correspond to one of the Timing DLL's that make up the array. The Phase Shifting DLL is not directly sampled, therefore this effect is only visible in the Timing DLL's. Taking into account the staggering of the multiple Timing DLL's, we define the following variables as a function of the position of the bin i (0 ≤ i < N·F):

$$m = \mathrm{Mod}(i + 1, F),$$

$$n = \mathrm{Mod}\left(\mathrm{Floor}\left(\frac{i + 1}{F}\right) - m, \; N\right),$$

$$n' = \mathrm{Mod}\left(m - \mathrm{Floor}\left(\frac{i + 1}{F}\right), \; N\right).$$

The overall integral non-linearity expression is:

$$INL_{array}(i) = D_{in} \cdot F \cdot \left(-\frac{m}{F} \cdot \frac{F+1}{M} + \frac{n'}{N}\right) + D_{PD} \cdot F \cdot \left(\frac{m}{M} + \frac{n}{N}\right) - D_{out} \cdot F \cdot \left(\frac{m}{F} \cdot \frac{F+1}{M} + \frac{n}{N}\right) + D_{hit} \cdot F \cdot n,$$

if the linear hit signal distribution is being used, or

$$INL_{array}(i) = D_{in} \cdot F \cdot \left(-\frac{m}{F} \cdot \frac{F+1}{M} + \frac{n'}{N}\right) + D_{PD} \cdot F \cdot \left(\frac{m}{M} + \frac{n}{N}\right) - D_{out} \cdot F \cdot \left(\frac{m}{F} \cdot \frac{F+1}{M} + \frac{n}{N}\right) - D_{hit} \cdot F \cdot \left(\frac{N}{2} - \left|n - \frac{N}{2}\right|\right),$$

if the alternative T-shaped distribution is chosen.

The curves in Figure 11, Figure 12, Figure 13 and Figure 14 are intended to illustrate the shape of the INL curve resulting from the indicated sources of linearity errors. No attempt is made to compare them, since they do not reflect an actual value. For completeness, the corresponding DNL graphs are also shown. They are directly obtained from the respective INL curve.

[Figure 11: DNL and INL curves resulting from a phase detector's phase error (or phase detector input path's delay mismatch): DPD (C/K + τdiff) = 0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). Horizontal axis: bin (0 to 140). Curves: ADLL and single DLL.]
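The bin-to-tap index mapping defined above for the array, and the statement that each DNL curve is obtained directly from its INL curve, can be sketched in Python as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the DNL is taken as the cyclic first difference DNL(i) = INL(i) − INL(i−1), with bin −1 identified with the last bin because of the clock periodicity. That convention is an assumption, not a definition taken from the text.

```python
def array_indices(i, N, F):
    """Return (m, n, n_prime) for array bin i, with 0 <= i < N * F.

    Python's % already returns a non-negative result for a positive
    modulus, so it matches the mathematical Mod used in the text.
    """
    m = (i + 1) % F
    q = (i + 1) // F             # Floor((i + 1) / F)
    n = (q - m) % N
    n_prime = (m - q) % N
    return m, n, n_prime


def dnl_from_inl(inl):
    """Cyclic first difference of an INL curve (assumed DNL convention).

    inl[i - 1] with i == 0 wraps to the last bin, which implements the
    periodicity of the scheme.
    """
    return [inl[i] - inl[i - 1] for i in range(len(inl))]


# Index mapping for the array used in the figures (N=35, F=4).
print([array_indices(i, 35, 4) for i in range(6)])

# Single DLL with a phase detector error only: INL(i) = DPD * n / N.
N, d_pd = 5, 0.01
inl = [d_pd * ((i + 1) % N) / N for i in range(N)]
print(dnl_from_inl(inl))
```

For the single-DLL example the differences reproduce the bin pattern of Section 6.3.1: N−1 bins enlarged by DPD/N and a last bin reduced by (N−1)·DPD/N.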
[Figure 12: DNL and INL curves resulting from unbalanced conditions of the delay cells in the extremes of the delay chain: Din(δin)=1% and Dout(δout)=1% of the average cell (ADLL: N=35 and F=4, single DLL: N=140). Horizontal axis: bin (0 to 140). Curves: ADLL and single DLL.]

[Figure 13: DNL and INL curves resulting from the propagation delay on the sampling signal path (linear hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). Horizontal axis: bin (0 to 140). Curves: ADLL and single DLL.]

[Figure 14: DNL and INL curves resulting from the propagation delay on the sampling signal path (T-shaped hit signal distribution network): Dhit(−τhit)=0.1% of the reference period (ADLL: N=35 and F=4, single DLL: N=140). Horizontal axis: bin (0 to 140). Curves: ADLL and single DLL.]

In Figure 15, the combined effect of all these sources of non-linearity, when using the T-shaped hit signal distribution network, is shown.

[Figure 15: DNL and INL curves resulting from the combination of the previous curves (ADLL: N=35 and F=4, single DLL: N=140). Horizontal axis: bin (0 to 140). Curves: ADLL and single DLL.]

Chapter 7. Detailed Implementation.

The circuitry included in the ADLL, as well as the channel buffers, are the critical circuit blocks responsible for the performance of the converter. Their implementation will be analysed in detail, highlighting the advantages expected from the design options taken.

7.1. DLL building blocks.

7.1.1. Phase detector.
The DLL closed loop operation is, in normal conditions, only required to track variations of the delay between the two extremes of the VCDL, the frequency of the reference signal being constant. In these conditions a simple two-state phase detector can be effectively employed. This phase detector presents some advantageous characteristics, such as implementation simplicity and a ±T/2 operating range (see footnote 1).

[Figure 1: D-flip-flop operating as a two-state phase detector. Labels shown: VCDL_out (D input), VCDL_in (clock), outputs Q and Qb, VCDL_fast, VCDL_slow.]

A D-flip-flop (D-FF) connected as in Figure 1 behaves as a two-state phase detector. It samples the signal coming out of the DLL delay chain (VCDL_out) at the rising edge of the reference clock entering the chain (VCDL_in). Therefore, the phase detector output reflects only the sign of the delay difference (referred to as the phase error), not its magnitude.

When a zero phase error situation is approached, the output of the phase detector will continually toggle from one state to the other, resulting in what is called a "bang-bang" behaviour of the closed loop it controls. Therefore the average phase error of the closed loop is zero, but its instantaneous value oscillates around this ideal value without ever settling into it. The oscillation amplitude is independent of the phase detector. It is set by other loop parameters.

Footnote 1: The standard notation to describe phase detector operation refers to phase instead of period. Following that notation the operating range would be termed ±π, instead of ±T/2. However, these are equivalent notations, and in the context of DLL's and TDC's it seems more adequate to deal with time and delay instead of frequency and phase. Some exceptions to this rule are made; for example, we use the usual nouns Phase Error and Phase Detector, instead of Delay Error and Delay Detector.

[Figure 2: General and D-FF based two-state phase detector transfer characteristic. Axes: phase detector output Vpd versus phase error φe, over the range −T/2 to T/2.]
The transfer curve of a general two-state phase detector is shown in Figure 2. The bi-stable characteristic of the D-FF based phase detector is also shown. It does not carry quantitative information about the phase error; however, when integrated over time, the general transfer curve is obtained.

Optionally, a 3-state sequential phase-frequency detector (PFD) [8],[9] could have been used, and the "bang-bang" behaviour avoided. However, 3-state PFDs are more complex devices and must be carefully designed to avoid developing a dead-band around the zero phase error. The main application of 3-state PFDs is in PLL control loops, where their ability to capture frequency error information is required. Furthermore, since the main function of a PLL is to track frequency, they can usually tolerate small phase errors. This is not the case for a DLL, whose main function is to track delay (phase). Since the amplitude of the "bang-bang" oscillation can be made arbitrarily small by setting the corresponding loop parameters, it is preferable to use the simpler 2-state phase detector configuration.

The information on the amplitude of the phase error carried by the PFD output also enables it to perform faster corrections to the VCDL, in case of severe reference clock period variations. However, this feature is not necessary in a TDC, where the reference clock is, by definition, stable. It is therefore more important to avoid the dead-band and obtain a better discrimination around zero phase error, which is easier to achieve if the 2-state phase detector is used.

D-flip-flop implementation.

In Chapter 6, the degradation of the converter linearity due to a phase error generated in the phase detector was discussed. It is therefore important to understand which phase detector characteristics generate a phase error, in order to be able to counteract them.
Two conditions may generate a phase error in a D-FF based phase detector:
• Sampling moment shifted from the input signal's arrival time (for example, due to unbalanced loads in internal nodes).
• Metastability conditions.

To avoid these conditions, the sampling uncertainty of the D-FF must be limited to a very narrow time window exactly centred on the arrival time of the input signal rising edge (the sampling instant). These characteristics should not change under any operating conditions and should be immune to process variations or device mismatch.

The configuration we will study is the balanced implementation of a D-FF, as described in [10] and shown in Figure 3. In this topology, all internal nodes have the same fanout and all gates have the same driving capability. A very balanced circuit is obtained and therefore no shift should be seen in the sampling instant. The critical nodes that define the speed of the data latching are included in the SR#1 block highlighted in Figure 3. This latch should reach its final state very quickly after a change in the inputs. In these conditions, the sampling time is well resolved under any operating conditions.

[Figure 3: Balanced D-flip-flop topology. Labels shown: input D, critical latch SR#1, two dummy gates.]

Metastability will affect the phase detector operation by delaying the phase detector decision. This, in turn, will limit the amplitude of the corrections the closed control loop can perform in one reference clock period. If the delay is large enough, the decision may not be taken at all, resulting in the absence of a correct control loop decision during that period (corresponding to one clock cycle). If the metastability probability is large, a "dead band", where the loop is unable to react to delay differences, will appear around the zero phase error point. To avoid this situation the D-FF must be able to get out of the metastable condition very quickly. Again, the critical SR#1 latch must be designed with this problem in mind [11].
This D-FF topology does not produce any hysteresis in its transfer function, since the state of the critical SR#1 latch is independent of the output state of the flip-flop. Therefore no "dead band" related to hysteresis can exist.

The D-FF implemented is a variation of the one shown in Figure 3, where maximum priority was given to the correct operation of the critical latch. For this, the inherently slow 3-input SR latch was replaced by a faster 2-input latch, as shown in Figure 4. The two AND gates that had to be introduced in the decision path have identical layouts and are placed in close proximity, so that their delay matching is optimised and they are simultaneously affected by supply noise. In this way, these two gates only affect the latency of the phase detector and not its timing resolution or its static phase error.

[Figure 4: Balanced D-FF topology featuring fast SR#1 operation. Labels shown: input D, gates and#1 and and#2, critical latch SR#1, two dummy gates.]

Device matching also affects the performance of the circuit, by making the delays of identical gates different from each other. All devices therefore have a large gate area and their layout follows matching-minded rules [12],[13]. The width of the gate is also determined by the speed requirements. Simulations have shown that, for the technology used, a 3:1 ratio between the effective gate sizes of the PMOS and the NMOS branch of the gates results in an improved phase detector accuracy and a smaller dependency on environment variations.

The accuracy of the phase detector, obtained from simulations, is better than 12ps under any environment or process conditions. In the presence of large mismatch (simulated by varying the gate length of selected devices), a maximum degradation of the accuracy to 22ps was observed.

7.1.2. Charge-pump and loop filter.
The behaviour of closed control loops built with a sequential phase detector, a charge-pump and a filter has been analysed in detail [14],[15] and numerical simulation models have been built [16]. These loops present several advantages in comparison with the conventional loops built with a combinatorial phase detector and filter. The main advantage for our application is their ability to obtain zero static phase error using a passive loop filter.

The charge-pump, together with the loop filter, converts the logic state of the phase detector into an analogue quantity that can be used to control the delay chain. Since the control loop is only required to track delay variations between the two extremes of the VCDL, the loop filter can be made of a simple capacitor. The resulting closed control loop is a first order system, therefore it is inherently stable.

The charge-pump is made of a current source and a current sink that, depending on the state of the phase detector, will either deliver a "packet" of charge to, or extract a "packet" of charge from, the loop filter capacitor. The capacitor behaves as an integrator of the charge, converting it into the control voltage for the VCDL.

[Figure 5: Charge-pump and filter capacitor block diagram. Labels shown: current source and current sink Icp, control input from the phase detector, output Vctrl to the VCDL, filter capacitor Cfilter.]

This configuration of charge-pump and 2-state phase detector leads to the "bang-bang" behaviour of the closed control loop. After delay lock has been achieved, the actual delay of the delay chain will be permanently oscillating around the zero phase error delay. This oscillation translates into loop jitter. Assuming an otherwise ideal loop behaviour, the amplitude ∆Vctrl of the oscillation corresponds to the charging (discharging) of the filter capacitor (Cfilter) by a constant current (Icp) during the reference period (T):

$$\Delta V_{ctrl} = \frac{I_{cp} \cdot T}{C_{filter}}.$$
Therefore, given a fixed reference period, the only way to decrease the amplitude of the oscillation and the loop jitter is to reduce the charge-pump current and/or increase the filter capacitance.

The currents in the two branches of the charge-pump are assumed to be matched. However, this is not a very critical parameter if only low amplitude ∆Vctrl oscillations are allowed, since the static phase error it may entail is very small (smaller than the amplitude of oscillation).

Charge-pump implementation.

The implementation of the charge-pump is driven by the need to accurately switch current sources into a capacitive node. In this context, the current switches are critical to the correct behaviour of the circuit. Gate signal feedthrough in these switches results in unwanted changes in the amount of charge stored in the filter capacitor. If these changes are comparable to the changes due to normal loop function, the behaviour of the loop becomes unpredictable and a large static phase error may develop.

[Figure 6: Charge-pump topologies (simplified). Labels shown: switch transistors Mswp and Mswn with gate-drain capacitances Cgdp and Cgdn, output transistors Mop and Mon, diode-connected transistors Mdp and Mdn, M:1 current mirrors, currents Icp, control inputs VCDLfast and VCDLslow, output node Vctrl.]

In the first schematic of Figure 6 the feedthrough mechanism is illustrated. The gate-drain overlap capacitance of the switch transistors (Msw) and the filter capacitor work as a capacitive voltage divider. Therefore, when the switch of a charge-pump branch opens, Vctrl will experience a variation given by:

$$\Delta V_{ctrl} = \frac{C_{gd}}{C_{gd} + C_{filter}} \cdot \Delta V_g.$$

The gate voltage swing ∆Vg is, in this case, the supply voltage. To guarantee that the Vctrl variation due to the control loop is bigger than the parasitic variation due to feedthrough, the charge-pump current should be:

$$I_{cp} \gg \frac{\Delta V_g}{T} \cdot \frac{C_{gd} \cdot C_{filter}}{C_{gd} + C_{filter}} \approx \frac{\Delta V_g \cdot C_{gd}}{T}.$$

The second schematic in Figure 6 shows the circuit used to reduce the feedthrough into the Vctrl node. In this circuit the switching activity is mixed with the current mirroring.
Switching is limited to moving the Vgs of the output transistors (Mo) to just below their threshold voltage, reducing ∆Vg to a small swing. Cgd is also reduced, since these transistors are made narrow to obtain low charge-pump currents.

A diode-connected transistor (Md) defines the lower limit to the ∆Vg swing. To make its Vgs voltage lower than the threshold voltage of the output transistor, it is designed very wide and short, which results in a smaller threshold voltage. The output transistor, on the other hand, is conveniently narrow and long, therefore it has a slightly higher threshold voltage, as intended. Since the output transistors (Mo) are only lightly switched off, the sub-threshold current is not completely eliminated. However, this current is substantially smaller than the "on" current, therefore it does not affect the operation of the charge-pump.

When the charge-pump operates at low current levels, the mirror transistor operates with a Vgs only a few hundred millivolts higher than the threshold voltage, resulting in an order of magnitude reduction in the ∆Vg swing. Overall, a 20 to 50 times reduction in the minimum usable charge-pump current can be obtained using this scheme.

However, the switching speed of this charge-pump scheme is low. When a branch is released, the gate of the output transistor must be charged using the limited current available from the current mirror source. In order to increase the switching speed, a current dividing mirror should be used. The switching speed limits the reduction of Icp that can be achieved, since the effective time T' in which the charge-pump current is available to act on Vctrl is smaller than the period T. Using this configuration, current levels as low as 200nA can be used.
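The feedthrough condition on the charge-pump current can be checked numerically with the minimal Python sketch below. It evaluates the capacitive-divider step and the current at which the loop step per period would merely equal it. The function names are hypothetical, and the gate-drain capacitance (2fF) and gate swing (2.5V) are assumed values chosen only for illustration; they are not taken from the text. The filter capacitance and reference period are the values quoted in this chapter.

```python
def feedthrough_step(c_gd, c_filter, dv_gate):
    """Vctrl disturbance from the capacitive divider Cgd / (Cgd + Cfilter)."""
    return c_gd / (c_gd + c_filter) * dv_gate


def break_even_current(c_gd, c_filter, dv_gate, period):
    """Charge-pump current at which the loop step Icp*T/Cfilter equals the
    feedthrough step. The real Icp must be much larger than this value."""
    return dv_gate / period * (c_gd * c_filter) / (c_gd + c_filter)


C_FILTER = 47.7e-12   # F, filter capacitance quoted in this chapter
PERIOD = 12.5e-9      # s, reference clock period quoted in this chapter
C_GD = 2e-15          # F, hypothetical gate-drain overlap capacitance
DV_GATE = 2.5         # V, hypothetical full-swing gate drive

i_min = break_even_current(C_GD, C_FILTER, DV_GATE, PERIOD)
print(f"feedthrough step: {feedthrough_step(C_GD, C_FILTER, DV_GATE) * 1e6:.1f} uV")
print(f"break-even current: {i_min * 1e9:.0f} nA")
print(f"approximation dVg*Cgd/T: {DV_GATE * C_GD / PERIOD * 1e9:.0f} nA")
```

Because Cgd is far smaller than Cfilter, the approximation ∆Vg·Cgd/T is practically indistinguishable from the full expression, which is why reducing the gate swing and the overlap capacitance lowers the minimum usable current in direct proportion.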
Taking into account the limited speed of the switch at these low current levels and other design constraints, the charge-pump implemented was designed to deliver a (programmable) current between 10µA and 100µA.

Filter capacitor.

The filter capacitor was made as an n-well isolated PMOS transistor working in accumulation mode [17]. In this mode of operation, a majority carrier channel is always present under the gate. This results in a voltage independent capacitance across the transistor gate (see footnote 2) and, due to the ready availability of carriers, it also has good high frequency characteristics. A capacitor built this way has the back plate always tied to ground. Therefore the control voltage Vctrl is defined having the ground node as a reference.

Using a large transistor gate area, a capacitance of ~47.7pF is obtained. If minimum charge-pump current levels are used, the resulting control voltage step is 2.6mV per reference clock period (T=12,500ps).

Footnote 2: This statement holds true for most of the applicable gate voltage range, with the exception of a narrow, very low gate voltage range, where a depletion region subsists underneath the gate oxide and the gate capacitance is voltage dependent.

7.1.3. Delay cell.

The VCDL is made of a number of identical delay cells. In these cells the control voltage generated in the closed control loop is translated into a propagation delay. The ADLL is made of two different types of DLL's: the Timing DLL, which requires a cell delay of T/N = 357.1ps, and the Phase Shifting DLL, which requires a delay of T/M = 446.4ps per cell. These DLL's are built using the same building blocks (but a different number of delay cells), therefore the delay cell operating range must cover the two distinct operating points, in any conditions.

Using four Timing DLL's in an ADLL architecture, a time interpolation F=4 times better than the simple Timing DLL is obtained, leading to stringent matching requirements for the delay cells.
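The figures quoted in Sections 7.1.2 and 7.1.3 can be cross-checked with a few lines of Python. This is only a worked arithmetic sketch: N=35 and F=4 are taken from the figure captions of Chapter 6, while M=28 is inferred here from the quoted T/M = 446.4ps and is therefore an assumption, as are the variable names.

```python
T_PS = 12500.0        # reference period, ps
N = 35                # cells in a Timing DLL (from the Chapter 6 captions)
F = 4                 # number of Timing DLL's in the array
M = 28                # cells in the Phase Shifting DLL (inferred, assumption)

timing_cell = T_PS / N            # Timing DLL cell delay
shifting_cell = T_PS / M          # Phase Shifting DLL cell delay
array_bin = T_PS / (N * F)        # bin size of the array

print(f"T/N = {timing_cell:.1f} ps")
print(f"T/M = {shifting_cell:.1f} ps")
print(f"T/M - T/N = {shifting_cell - timing_cell:.2f} ps")
print(f"T/(N*F)   = {array_bin:.2f} ps")

# Bang-bang control voltage step at the minimum charge-pump current.
I_CP = 10e-6          # A, lower end of the programmable range
C_FILTER = 47.7e-12   # F
dv_ctrl = I_CP * (T_PS * 1e-12) / C_FILTER
print(f"dVctrl = {dv_ctrl * 1e3:.2f} mV per reference period")
```

With these values the difference between the two cell delays coincides numerically with T/(N·F), which is the bin size expected from the F-fold interpolation, and the control voltage step agrees with the quoted 2.6mV.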
The ADLL architecture uses a large number of fast identical cells. Furthermore, the delay matching required between these cells leads to the specification of large sized devices, which results in high gate capacitance. To drive these high loads at the necessary speed, large power dissipation is required. It is therefore important to choose a cell structure that reduces the dissipation for given speed and matching requirements.

The delay of a cell is sensitive to temperature and supply voltage variations. It also depends on the process parameters. The correct operation of the closed DLL loop therefore requires that a sufficient delay range is available to cover any operating conditions.

Choice of cell structure.

In summary, the choice of delay cell structure must conform to the following criteria:
• Power dissipation.
• Noise sensitivity.
• Device matching.
• Cell delay control range.

Two structures were compared having in mind the particular operation of a DLL. These structures were the differential cell using symmetric loads, as developed in [18], and the single-ended cell, based on a current-starved inverter structure.

The sudden supply current variations due to the switching activity of the single-ended delay cell structures entail noise in the power supply network. Supply noise translates into changes in the instantaneous decision threshold of each inverter and therefore in the time characteristics of the other cells in the delay line. Differential delay cells enjoy an apparent advantage in this respect, since their large common mode rejection ratio (CMRR) ensures good supply noise immunity. Also, their constant power dissipation generates less supply noise. On the other hand, the constant tail current used in the differential delay cell significantly increases the power dissipation of the ADLL structure.
One important characteristic of the operation of a locked DLL is that all switching activity in the delay line occurs evenly spread along the reference period, as illustrated in Figure 7. As a consequence, the instantaneous current requirements are averaged over time and, therefore, the inductive supply voltage variations are strongly reduced. In these conditions, the delay cells that make up the DLL are not adversely affected by the switching activity, and a careful distribution of the power supply, separating the DLL from any noisy digital circuitry, will suffice to obtain a good noise performance. Simple and more power conservative single-ended delay cells are, therefore, a viable alternative to differential logic.

[Figure 7: Rising edge propagation along the DLL delay line and corresponding current consumption. Elements shown: clock, phase detector and charge pump; voltage at tap i (0 to VDD) for taps 0 to N, spaced by T/N over the period T; current drawn from the supply versus time, with average value Iave.]

In order to obtain a high CMRR [18], a differential amplifier must have a linear resistive load in each branch. Furthermore, the impedance of the tail current mirror must be high. In the delay cell shown in Figure 8, a variable linear load is obtained using the symmetric load structure. If correctly biased, this structure guarantees a first order linearity of a high impedance load around the half-swing output voltage. Automatic bias is derived from the control voltage using the self-biasing circuitry (also shown). Delay control is obtained by variation of the load impedance. Simultaneous variation of the tail current ensures that the symmetric load remains linear throughout the range of operation.

[Figure 8: The self-biased differential delay cell (from [18]). Labels shown: Vctrl, self-biasing circuit, inputs in and inb, outputs out and outb, N cells.]

Single-ended architectures traditionally rely on the current starvation of two series CMOS inverters (Figure 9).
Current starvation is usually performed on both branches (NMOS and PMOS) of the inverters in order to guarantee a perfect symmetry of operation. However, this is not a limiting requirement, since the two inverters in series already guarantee the correct operation of the delay cell.

The cell delay is defined by the amount of current available to charge the load at the output of each inverter. The matching characteristics of the current-starving transistors are, therefore, critical to ensure the matching of the cell delay. These transistors must have large gate areas. The matching characteristics of the switching transistors are not critical, since they are sized in such a way that they do not limit the current available to charge the output load.

[Figure 9: The current-starved inverter delay cell (simplified version). Labels shown: in, out, Vctrl, bias circuit, N cells.]

The delay cells are isolated from the hit registers by a tap buffer. In the case of current-starved inverter based cells, it is recommended to also implement a dummy buffer at the output of the first inverter, in order to guarantee symmetry of the propagation delay of the rising and the falling edge.

These two delay cell structures were analysed in detail to verify their power dissipation and noise immunity. Simulations were used extensively, in order to accurately capture the delay variations due to noise. Only power supply noise was considered in this study, since it was found to be the dominant effect. Other noise sources, such as thermal noise, are completely hidden by supply noise.

The procedure that was followed in this study was to simulate the two VCDL's (one for each of the structures) with a square signal of a given amplitude modulated into the power supply voltage. The phase of the square noise signal was made to vary in relation to the phase of the signal propagating within the delay line.
In this way it is possible to identify a time window where the delay cell is sensitive to supply noise and also the maximum delay shift.

The same procedure was also used to analyse the delay sensitivity to noise in the control node. Noise can couple into this node via two different paths: the substrate and capacitive coupling with the switching nodes. In a locked DLL, there are always two opposite edges of the signal propagating inside the delay line, therefore their opposite effects should keep the control node balanced. However, since the sensitivity of this node is high, it is important to minimise any coupling into it.

The resulting supply noise delay sensitivity graphs are shown in Figure 10, where all delay cells are tuned for a 390ps delay. A window of increased sensitivity, corresponding to the cell switching moment (time=0ns), can be identified. A summary of the sensitivity of each structure, within the sensitive window, is given in Table 1. The average power dissipation obtained when the cells are biased to operate with the required delay is also shown.

The single-ended structure also shows (time<0ns) a noticeable delay variation due to slow (or DC) changes in supply. However, it should be noticed that the amplitude of these slow variations depends on the control voltage applied and they are effectively countered by the closed control loop.

[Figure 10: Cell delay variation due to a 100mV supply voltage step, respectively for the differential and the current-starved inverter structure. Panels: Differential delay cell; Current-starved delay cell. Horizontal axis: time (ns), from −1.5 to 0.5.]

The differential structure needs 5.6 times more current than the single-ended CMOS inverter structure, for the same propagation delay.

Table 1: Summary of noise sensitivity and power dissipation analysis.

Noise sensitivity               Step amplitude   Symmetric load differential   Current-starved inverter
Supply                          100mV            3ps                           5ps
Control                         20mV             11ps                          15ps
Power dissip. (average/cell)                     4.2mW                         0.74mW

Offset and gain selection.

Apart from gate area, the matching characteristics of a device also depend on its operating point. As the gate voltage approaches the threshold voltage (Vth) and the operation of the device moves closer to weak inversion, its matching characteristics are severely degraded [6],[19]. Therefore, the operating point of the devices that make up the delay cell should be reasonably far from Vth, in any conditions. However, depending on the process parameters and on the specific conditions under which the cell is being used, the closed control loop may force the current-starving devices to operate in disadvantageous matching conditions. The current-starved inverter structure was changed in order to force the cell to operate in optimal matching conditions under any circumstances.

[Figure 11: Simplified representation of the delay range partition. Axes: delay versus Vctrl. Curves: original and partitioned transfer curves, with operating points Vo and Vp.]

The principle of operation of this cell is to divide the delay range into small and partially overlapping ranges, as shown in Figure 11. These delay ranges are wide enough to enable the DLL to track delay variations due to changes in the environment conditions that may occur during operation. The selection of the operating range is performed at start-up. It is a function of the device matching, the delay tracking coverage and the particular operating conditions found. To enable the automation of the range selection algorithm, the range partition is made such that in any conditions lock can be achieved in at least three ranges. By selecting the appropriate delay range, the cell can be made to operate at a point Vp further away from the threshold voltage of the current-starving transistors than would be the case in the original cell (point Vo).
Another advantage gained from partitioning the operation range is the reduced cell gain (the slope of the cell transfer curve in s/V). Therefore the forward gain of the control loop is smaller and a finer adjustment of the delay is possible. In the “bang-bang” configuration used, this translates into a smaller amplitude of the periodic delay oscillation. Alternatively, the filter capacitor can be made smaller without degrading the closed loop performance. The sensitivity to noise in the control node is also reduced.

The proposed cell topology is shown in Figure 12. The selection of the operating range is done using the offset signal. The offset control is implemented in the NMOS and PMOS branches of the inverter. It generates a fixed delay offset in the transfer curve. To improve the cell flexibility, the gain of the current-starving transistor connected to the loop control node can be changed, using the slope signal. It is, therefore, possible to increase the tracking coverage (range length) of each range, if it is necessary for a specific application. The slope control is only implemented in the NMOS branch of the inverter. The offset selection signal is obtained from the digital-to-analogue conversion of a digital control signal and the slope selection is performed digitally, therefore they correspond to discrete settings. Figure 13 shows the actual delay ranges. Depending on the offset and slope selection, the cell gain will be different. A method for automatic selection of the range will be described in Section 7.1.6.

[Figure 12 labels: in, out, Vctrl, offsetP, offsetN, offset, slope<0:1>, N cells.]
Figure 12: The selectable-range current-starved inverter cell.

The simulation results presented in Figure 13 show that the maximum delay cell gain for a given range varies from 50ps/V to 713ps/V depending on the selection of offset and slope.
[Figure 13 axes: cell delay (ps) versus control voltage (V).]
Figure 13: The selectable delay ranges (simulation).

The same noise sensitivity analysis was also performed for this cell. The results, for the sensitivity window (time=0ns), are given in Table 2. When compared to the differential cell structure, substantial power savings (3.3 times) can be obtained using this cell, if a similar increase in supply noise sensitivity is accepted. In relation to the simple current-starved inverter, better matching and closed loop characteristics can be obtained at the expense of increased power dissipation.

Step noise sensitivity         amplitude   Range partition
Supply                         100mV       8ps
Control                        20mV        3ps
Power dissip. (average/cell)               1.29mW

Table 2: Summary of noise sensitivity and power dissipation analysis for the proposed cell.

In summary, the advantages of such a delay cell are:
• Lower power dissipation.
• Smaller device matching sensitivity.
• Variable cell gain.
• Increased immunity to noise in the control node.

7.1.4. Delay chain.

The delay cell is a part of a chain of cells whose overall delay is the clock period. To achieve maximum delay matching between cells, all cells should have the same physical and electrical environment. This consideration is especially true for the cells at the extremities of the delay chain. Physically they have no cell on one of their sides, therefore their matching is worse [6],[7]. Electrically the last cell does not have to drive the load due to the input of the next cell and the first cell is driven with a signal that does not have the same timing characteristics (namely slew rate) as the other cells. In order to equalise the environment of all the cells, additional dummy delay cells are implemented at both extremes of the delay line. The purpose of these cells is to force the environment of all cells to be the same, and therefore improve their delay matching.
They have no other timing functionality.

7.1.5. Closed control loop.

The implementation of the delay chain, the phase detector, the charge-pump and the filter capacitor has been discussed. Together they make up the closed control loop. The layout of the complete DLL should follow conservative layout rules, with special care being given to the power supply distribution network and to the transport of the signals carrying phase information to the phase detector. If the propagation delay of the two feedback signals going to the phase detector is not the same, then the delay difference ∆tpd translates into a closed loop static phase error, with consequences similar to those of a phase error generated in the phase detector. The origin of this delay error is depicted in Figure 14. Since the delay chain of the Timing DLL’s is physically long (~2mm, in this prototype), the propagation delay of the feedback signals is considerable, and a large ∆tpd may arise. This situation was analysed in detail [20] to derive the topology that minimises the delay difference while not imposing a heavy area penalty on the design. The propagation delay of the two transmission lines was made as small as possible by their careful sizing. The load at the output of each of the drivers was also equalised to keep their slew rates similar. In this way it was possible to keep the delay error under 20ps. The Phase Shifting DLL has a shorter delay chain, therefore the delay difference is even smaller.

[Figure 14 labels: clkref, phase detector & charge pump, C, dummy cells, ∆tpd.]
Figure 14: Detail of the closed control loop illustrating the propagation delay mismatch of the phase signals.

The variation of the delay of the chain within a clock period can be estimated from the following equation, where Kcell is the gain of each delay cell:

\[ \Delta t_{DLL}(T) = \Delta V_{ctrl} \cdot K_{cell} \cdot N = \frac{I_{cp} \cdot T}{C_{filter}} \cdot K_{cell} \cdot N \]
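As a numerical illustration of the relation above, the short Python sketch below evaluates the delay variation of the chain for a given charge-pump current, filter capacitance, cell gain and number of cells. The function and argument names are illustrative only. The parameter values are the ones quoted in this chapter for the Timing DLL (10µA, 12,500ps, 47.7pF, 35 cells, and cell gains of 50ps/V and 713ps/V).

```python
def dll_delay_variation(i_cp, period, c_filter, k_cell, n_cells):
    """Delay variation of the chain over one clock period, in seconds.

    i_cp     : charge-pump current (A)
    period   : reference clock period T (s)
    c_filter : loop filter capacitance (F)
    k_cell   : gain of one delay cell (s/V)
    n_cells  : number of delay cells N in the chain
    """
    # Control voltage step produced by the charge pump during one period.
    delta_v_ctrl = i_cp * period / c_filter
    # Every cell shifts by k_cell * delta_v_ctrl, and there are n_cells of them.
    return delta_v_ctrl * k_cell * n_cells


# Values quoted in the text for the Timing DLL.
I_CP = 10e-6         # minimum charge-pump current, 10 uA
T_REF = 12_500e-12   # reference period, 12,500 ps
C_FILTER = 47.7e-12  # filter capacitor, 47.7 pF
N_CELLS = 35

dt_min_gain = dll_delay_variation(I_CP, T_REF, C_FILTER, 50e-12, N_CELLS)
dt_max_gain = dll_delay_variation(I_CP, T_REF, C_FILTER, 713e-12, N_CELLS)

print(f"minimum gain range: {dt_min_gain * 1e12:.1f} ps")
print(f"maximum gain range: {dt_max_gain * 1e12:.1f} ps")
```

With these inputs the sketch gives about 4.6ps for the minimum-gain range and about 65.4ps for the maximum-gain range; the “bang-bang” oscillation amplitude is half of each value.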
Assuming the minimum charge-pump current level is being used and the cell delay range with minimum gain is selected, the delay variation of the Timing DLL is:

\[ \Delta t_{DLL}(12{,}500\,\mathrm{ps}) = \frac{10\,\mu\mathrm{A} \cdot 12{,}500\,\mathrm{ps}}{47.7\,\mathrm{pF}} \cdot 50\,\frac{\mathrm{ps}}{\mathrm{V}} \cdot 35 = 4.6\,\mathrm{ps} \]

The “bang-bang” oscillation amplitude is half this variation (~2.5ps). If, on the other hand, the delay range with the maximum gain is selected, ∆tDLL(T) may become as big as 65.4ps, resulting in an amplitude of oscillation of ~33ps.

7.1.6. Initialisation procedure.

The loop initialisation is the procedure by which the loop acquires initial lock to the reference period. If the loop natural delay is close enough to the reference period, lock is acquired without any external help. Since this cannot be guaranteed in all circumstances, ways to pull the loop to within its locking range must be implemented. In the case of the loop architecture using the delay range partitioning, the best operating range must also be selected.

Achieving lock.

The transfer function of the phase detector has a periodicity of T, which means that it is unable to distinguish signals whose delay is a multiple of the period T. Therefore, it may try to lock the DLL into a state where the VCDL delay is a multiple of T. An initialisation procedure must be used to force the closed loop to lock to the correct delay. One way to resolve this ambiguity is to initialise the VCDL with a delay that is known to be smaller than the reference period T. In this situation it is possible to qualify (the correctness of) the error information generated in the phase detector. Starting from this point, regardless of the phase information generated by the phase detector, the loop is constrained to slowly increase the delay of the VCDL until the phase detector is within its locking range (±T/2). This range can be identified by the generation of the correct error information by the phase detector, when it recognises that the VCDL delay is too short.
At this point the loop is released to proceed with the lock acquisition. Since the forward open loop gain is small, the lock acquisition is a slow procedure. One way to improve the loop initialisation speed is to increase the charge-pump current levels before lock is achieved. In this way the lock acquisition time can be decreased without compromising the dynamic behaviour of the loop.

Range selection.

The range selection is an iterative procedure. In a first step, the tracking range width necessary for the application is selected using the slope signal. Typically the smallest width is selected, because it results in the minimum forward open loop gain. However, other range widths can be selected if a wider tracking range is desired. The second step corresponds to the actual range selection. This step can be automated and included in the loop locking procedure. It uses the offset signal. The range selection is performed by sequentially scanning the ranges for lock, starting with the fastest range (smallest offset). After having identified the ranges where lock can be achieved, the middle range (see footnote 3) is selected, because it corresponds to an operating point in the middle of the respective range, leaving a wide delay tracking margin. This property is depicted in Figure 15, where the viable initialisation regions within each range are identified with a heavier line. Note that these viable regions correspond to the initial locking range. They are a small part of the full range (thinner line) that is available for tracking of environment variations, after the initialisation has been completed.

[Figure 15 axes: delay versus Vctrl.]
Figure 15: Schematic representation of the delay range partition illustrating the viable locking regions.

7.2. The ADLL.

Fine time interpolation is obtained by accurately phase shifting each of the Timing DLL’s by a fraction of their cell delay. The Phase Shifting DLL is used for this purpose.
The ADLL taps result from the distribution of the phase shifted Timing DLL taps in accordance with the arrangement in Figure 16, where each rectangle represents the size of a DLL bin. The shaded bins represent a copy of the actual bin, introduced to make the time interpolation at the extremes of the Timing DLL’s clearer. Due to the clock periodicity, the copy and the original bin occupy exactly the same time interval. An ADLL bin is defined from the difference between two taps in consecutive Timing DLL’s. This distribution of bins highlights some of the potential sources of non-linearity inherent to the architecture:
• Some bins are defined by taps in opposite extremes of consecutive Timing DLL’s (see, for example, bin 5 in Figure 16). Potential phase errors in any DLL will accumulate in these bins, resulting in large non-linearity.
• There is a potential F (=4) bin periodicity in the linearity error due to the folding of the tap distribution. Non-linearity of any DLL will increase this error (see, for example, bin 23 in Figure 16).
• There is another potential F+1 (=5) periodicity in linearity error which corresponds to the spacing between two taps driven directly by the Phase Shifting DLL. Non-linearity of this DLL determines this error.

The non-linearity generated by these errors is limited by reducing every source of phase error and cell mismatch in the DLL building blocks. Coupling between DLL’s can also be a source of conversion errors. It can be reduced by proper electrical isolation of individual DLL’s, using careful supply distribution and providing guard-rings to isolate them from capacitive and substrate noise coupling.

Footnote 3: In extreme conditions, corresponding to the extreme ranges, lock may only be obtained for one or two ranges (see Figure 15). If only one locking range is identified, range selection is evident. If two locking ranges are identified, the extreme range should be chosen, because it results in the widest delay tracking margin.
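The tap arrangement described above can be sketched numerically. The following Python fragment assumes, as an interpretation of Figure 16 rather than a description of the actual circuit, that tap k of Timing DLL j falls at position ((F+1)·j + F·k) modulo 140, expressed in units of ∆T = T/140, with F = 4 Timing DLL’s of 35 cells each. All names are hypothetical.

```python
F = 4                    # interpolation factor: number of Timing DLL's
N_CELLS = 35             # delay cells per Timing DLL (cell delay = F * dT)
N_BINS = F * N_CELLS     # 140 ADLL bins of width dT = T/140


def tap_position(dll, cell):
    """Position of a tap, in units of dT, folded into one clock period.

    Assumed model: each Timing DLL is shifted by (F + 1) * dT with respect
    to the previous one, and each cell adds F * dT.
    """
    return ((F + 1) * dll + F * cell) % N_BINS


# Map every ADLL tap position to the (DLL, cell) pair that generates it.
tap_owner = {
    tap_position(dll, cell): (dll, cell)
    for dll in range(F)
    for cell in range(N_CELLS)
}

# Under this model every one of the 140 positions is produced exactly once.
assert sorted(tap_owner) == list(range(N_BINS))

for position in range(8):
    dll, cell = tap_owner[position]
    print(f"tap {position}: Timing DLL {dll}, cell {cell}")
```

In this model consecutive ADLL taps always belong to consecutive Timing DLL’s, and positions just after the start of the period are produced by the last cells of the shifted DLL’s, which is where the folding discussed above occurs.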
[Figure 16 annotations: Phase Shifting DLL (PS-DLL) taps ps0 to ps4 spaced by T/28 = 5·∆T; Timing DLL’s (T-DLL 0 to T-DLL 3) with cells of T/35 = 4·∆T; ADLL bin width T/140 = ∆T; bins 5 and 23 marked.]
Figure 16: The ADLL tap distribution arrangement.

7.3. Channel memory.

The channel memory is made of a two-word deep pipeline. In order to reduce the hit rejection rate to acceptable levels, an asynchronous state machine controls the pipeline. This state machine generates the latching signals (store) for the two pipeline levels and controls the interface with the subsequent logic blocks [3]. The functional diagram is shown in Figure 17.

[Figure 17 labels: write enable, write, store reg. level #1, store reg. level #2, ∆t, rst, read, clear, data available.]
Figure 17: Functional diagram of the channel memory controller [3].

When the store signal is asserted, the data is stored in the level #1 register. If the level #2 register is free, the data is moved to this register, where it becomes available to be passed on to the digital processing unit. A data available flag is asserted to signal the existence of data in the channel memory. If the two register levels are full, further hits will be lost, until memory space becomes available again. The channel memory was designed to store data corresponding to two consecutive hits separated by at least 6ns.

The hit register itself is required to capture the data present at the DLL taps at the instant that the store signal is asserted. Mismatch between the hit registers can generate a spread in the register acquisition time, which translates into an increased differential non-linearity of the converter. The effects of tap register mismatch are not distinguishable from the effects of delay cell mismatch.
The reduction of acquisition time mismatch can be done in two different ways:
• Increase of device matching by increasing the gate area of critical devices.
• Increase of acquisition speed by increasing the transconductance of critical devices so that delay variations from register to register are smaller.

Since the time critical data sampling is performed only on the level #1 register, only the performance of this register is critical. The gate level diagram of a single bit of the hit register is shown in Figure 18.

[Figure 18 labels: tap data, enable, store reg. level #1, store reg. level #2, output (inverted).]
Figure 18: The two-level hit register (1 bit).

The load on the ADLL tap output node must be kept low, in order to reduce the power necessary to drive it. It is therefore important to make the register’s input inverter small. The adverse effects of the increased device mismatch are limited by keeping the propagation delay of this gate low. On the other hand, the back-to-back inverters that make up the memory can be made bigger, so that their matching properties and their driving characteristics are good. In order to achieve a good accuracy of the acquisition time, the level #1 register is transparent until the acquisition of a hit. Since the tap outputs are switching at the reference clock frequency, it is necessary to limit the activity of the register by blocking the level #2 register until data has been acquired in the previous level. For the same reason tri-statable gates are used, instead of pass-gates. This approach leads to slower signal propagation, but it halves the number of switching devices when the circuit is idle. A corresponding supply noise reduction and power savings are obtained. In this application, the data signal changes asynchronously to the store signal, therefore there is a finite probability that metastability conditions will occur.
However, it should be noted that this condition only affects one register, where the transitions on the data and store signals occur “simultaneously”. Whichever logic level the register ends up resolving to leads only to a measurement error that is at most the same as the metastability window width. Since this window is very small, the measurement error is also small.

Two-stage synchronisers [21] were implemented in the signal paths interfacing the read-out and processing control logic, which is synchronous to clkro, and the asynchronous tap register control state machine (see Figure 19). Their purpose is to synchronise the two domains and to prevent metastability from disturbing the correct circuit functionality. Using two-stage synchronisers greatly reduces the probability of triggering the output of the synchroniser into its metastable condition. In addition, the latency that it introduces between the moment data is available in the tap register and the moment it can be passed on to the processing logic is sufficient to resolve any metastability that may have occurred in the tap registers.

[Figure 19 labels: signal, two D flip-flops clocked by clkro, signal_sync.]
Figure 19: Two-stage synchroniser using D flip-flops.

When a measurement is performed, the status of the 140 taps that make up the ADLL must be accurately captured. The effect of this activity on the accuracy of the measurement is limited by the fact that it affects all measurements performed in a given channel in the same way: it only contributes an offset to the measurement. Noise generated from activity in a neighbouring channel, due to its random nature, may disturb the other channels, generating crosstalk. To limit channel to channel crosstalk and obtain an acceptable performance out of these registers, the supply and control distribution must be carefully designed.

7.3.1. The store sampling signal distribution.

The organisation of the individual tap registers follows naturally the organisation of the ADLL.
Therefore four rows of 35 two-bit deep tap registers make up the channel memory. Four similar registers are appended to each of these rows, to store the coarse counter results (half of each counter word width per row). These register rows are quite long (>2mm), therefore the store signals arrive at the individual registers with a time difference proportional to the propagation delay of the line that distributes them. Two distribution configurations are shown in Figure 20. The linear distribution configuration corresponds to the vernier time interpolation scheme described in Chapter 4. The resulting bin size is the difference between the delay of the bin defined by two consecutive taps and the difference between the arrival times of the store signal at the corresponding tap registers. This error accumulates in the bin that is defined by taps at both extremes of the row (which correspond to registers at the opposite extremities of the register row). This error is equivalent to a static phase error in the Timing DLL’s. Alternatively, the T shaped distribution configuration can be used. In this case the error distribution is somewhat more complex. Depending on the branch of the T distribution network, the bins become larger or smaller than the corresponding delay cell (see Chapter 6 for a detailed analysis).

[Figure 20 panels: (linear distribution) and (T distribution), each showing the store control state machine driving an N-bit register row with registers 0, 1, 2, ..., N/2-1, N/2, ..., N-2, N-1.]
Figure 20: Alternative control signal distribution configurations within a channel memory row.

However, two advantages are obtained from this configuration. The first is that since each branch of the T is half as long as the complete row (and is loaded by half the number of cells), the propagation delay along the branch is smaller, resulting in a smaller difference between the store signal arrival times at the registers.
The second advantage is that the accumulation of the error is only relative to one branch of the T, corresponding to half the number of registers in the row, therefore the accumulated error is smaller than in the linear configuration. In Figure 21, a comparison between the integrated error obtained when using the two configurations is shown. The actual register row configurations are simulated, including the lumped loads connected to the lines due to the registers. They also include the registers needed to store the coarse time measure, which explains the imbalance of the two branches of the T configuration.

[Figure 21 axes: integrated error (ps) versus register; curves: Linear and T-shape.]
Figure 21: Integrated error for the two proposed distribution configurations (simulation).

Using the T shaped configuration, it is possible to obtain a 6-times reduction of the integrated error, as shown in Figure 21. The non-linearity of the ADLL due to the propagation delay of the store signal is improved correspondingly.

Chapter 8. Experimental Results.

The performance of the demonstrator of the ADLL architecture described in this part of the dissertation is summarised in this Chapter. Only the relevant timing characteristics will be discussed here; a detailed test report is included in the HRTDC user’s manual [4]. The test bench used to characterise the converter is explained in Appendix A.

8.1. Delay cell range selection and charge-pump current level.

Selection of the delay cell working range is an important feature of the architecture, because it allows the cells to be adapted to the specific operating environment. The initialisation procedure was tried for every working range, using an 80MHz reference clock. The ranges for which lock was obtained are shown in Table 1 (see footnote 1).
working range    offset: 0 1 2 3 4, each with slope: 2 1 0
Phase Shifting DLL: ok OK ok ok ok ok
Timing DLL: ok ok ok ok ok ok ok OK ok ok ok ok

Table 1: Locking status for each working range, after the initialisation procedure.

Following the range selection algorithm explained in Chapter 7, the ranges highlighted in Table 1 are chosen. This selection was used throughout the tests performed. The smallest possible current level was selected for the charge-pump, since it results in the smallest closed loop jitter. The cycle to cycle jitter measured at the output of the last delay cell in the fourth Timing DLL is σjitter=15.6ps, for the selected range. It does not vary substantially with the current level of the charge-pump (σjitter=19.4ps at maximum settings), confirming that the charge-pump operation does not adversely affect the performance of the converter.

Footnote 1: Each offset and slope selection pair corresponds to a working range. Offset selection is divided into five options, ranging from 0 (maximum range offset) to 4 (minimum range offset). Slope selection is divided into three options, from 0 (minimum range slope) to 2 (maximum range slope).

8.2. Converter linearity.

The measurement of the converter’s linearity required the collection of 840,000 random hits generated from an external pulse generator. The results obtained with this Code Density Test (CDT) are, with a 98% confidence level (1-α=0.98, therefore α=0.02), comprised within a tolerance of 3% (DNL) and 17.7% (INL) of the actual values (respectively β=0.03 and β=0.17). If individual DLL’s are evaluated using the same data, tolerances of 1.5% and 4.4% are obtained, respectively for the DNL and INL, with the same confidence level (see Appendix D for details on how to measure the tolerance and confidence level of the test results).
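For reference, a minimal sketch of how DNL and INL are commonly derived from a code density histogram is given below. It uses the textbook definitions (bin count normalised to the mean count, INL as the running sum of the DNL) and is not necessarily the exact procedure followed in Appendix D. The function name and the simulated hit generator are assumptions made for illustration; the hits are synthetic, not measured data.

```python
import random


def dnl_inl_from_histogram(counts):
    """Return (dnl, inl) in LSB from a code density histogram.

    counts[i] is the number of random hits that fell into bin i.
    A bin wider than ideal collects more hits, so DNL_i = counts[i]/mean - 1.
    INL_i is the accumulated DNL up to and including bin i.
    """
    mean_count = sum(counts) / len(counts)
    dnl = [c / mean_count - 1.0 for c in counts]
    inl = []
    running = 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl


if __name__ == "__main__":
    # Synthetic example: uniformly distributed hits over 140 ideal bins.
    rng = random.Random(0)
    n_bins, n_hits = 140, 840_000
    histogram = [0] * n_bins
    for _ in range(n_hits):
        histogram[rng.randrange(n_bins)] += 1

    dnl, inl = dnl_inl_from_histogram(histogram)
    print("max |DNL| (LSB):", max(abs(d) for d in dnl))
    print("max |INL| (LSB):", max(abs(v) for v in inl))
```

For an ideal converter the residual DNL and INL printed by this sketch come only from the statistical fluctuation of the bin counts, which is the origin of the tolerance and confidence level quoted for the test.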
In an architecture such as the one used in this converter, the conversion transfer function is made of successive replications of the fine time interpolation transfer curve along the dynamic range. The coarse time counter is responsible for the correct fine interpolation repetition. Therefore, the linearity of the fine time interpolator, made by the array of DLL’s (ADLL), has the largest contribution to the overall linearity. The ADLL will be characterised in great detail, whereas a simpler verification will be performed for the extended dynamic range mechanism.

[Figure 1 axes: DNL (LSB) and INL (LSB) versus bin.]
Figure 1: DNL and INL graphs for the ADLL.

The graphs in Figure 1 show the differential and integral non-linearity of the ADLL. A DNLmax of 0.71LSB (σDNL=0.17LSB) and an INLmax of 0.67LSB (σINL=0.19LSB) are obtained. The main feature of these graphs is the significant non-linearity found in the first few bins in the array. These errors occur in the bins whose limits are defined by taps in opposite extremes of consecutive Timing DLL’s. They are the result of the presence of phase errors and of the delay cell mismatch in the Timing and Phase Shifting DLL’s. These phase errors (see footnote 2) originate from any of the mechanisms previously described in Chapter 6.

Footnote 2: The phase error must be understood in its wider sense. It may be caused by an actual phase error in the phase detector, by different propagation delays of the phase detector’s input signals or by a significant propagation delay in the distribution of the sampling signal to the hit registers.

The DNL and INL graphs can be compared to the curves in Figure 2. These curves were obtained from the analytical studies that were carried out (see footnote 3) in Appendix F.
It can be seen that, using the analytical model and reasonable assumptions about the direction of the static errors that affect the converter, it is possible to estimate the main characteristics of the actual non-linearity graphs (the amplitude of each error is normalised). Note that delay cell mismatch was not taken into account in the analytical results shown here.

[Figure 2 axes: DNL (LSB) and INL (LSB) versus bin.]
Figure 2: Analytical DNL and INL curves (Din=1% and Dout=-1% of the cell delay, DPD=-0.1% and Dhit=0.1% of the reference period).

From the same set of data, the characteristics of the four Timing DLL’s can be extracted. These graphs are shown in Figure 3. The relevant feature is the presence of a phase error (see footnote 4) apparent in the large first bin of each DLL. It turns out to be significant in one of the Timing DLL’s (DLL0). A summary of the characteristics of the individual Timing DLL’s is presented in Table 2. From the data presented in the table, the delay cell mismatch obtained for these DLL’s is estimated to be ~4%, a larger value than was expected.

[Figure 3 axes: DNL (LSBDLL) and INL (LSBDLL) versus bin DLL; curves: DLL0, DLL1, DLL2, DLL3.]
Figure 3: DNL and INL graphs for the different Timing DLL’s (LSBDLL=4·LSB).

Footnote 3: Due to implementation details, tap 0 of each of the Timing DLL’s was placed at the end of the respective delay chain. This position is delayed by one reference clock cycle from the original position, therefore their timing is the same. However, in a non-ideal converter the non-linearity graphs corresponding to the two cases are different. The analytical results shown here are obtained taking this into account.

Footnote 4: See footnote 2.

There may be two origins for this larger value.
It may be an effect of the actual device mismatch, a technological property seldom disclosed with adequate accuracy by the vendors, or it may be due to electrical noise coupling into the channel buffers or the delay cells. Note that device mismatch may also affect the channel registers and that this effect is not distinguishable from the delay cell mismatch.

          Timing DLL 0   Timing DLL 1   Timing DLL 2   Timing DLL 3   PS scheme
unit      LSBDLL (LSB)   LSBDLL (LSB)   LSBDLL (LSB)   LSBDLL (LSB)   LSBDLL-PS (LSB)
DNL       0.21 (0.84)    0.13 (0.52)    0.12 (0.49)    0.11 (0.46)    0.06 (0.28)
σDNL      0.06 (0.23)    0.05 (0.18)    0.04 (0.17)    0.04 (0.18)    0.04 (0.21)
INL       0.18 (0.71)    0.10 (0.41)    0.11 (0.44)    0.11 (0.43)    0.04 (0.22)
σINL      0.06 (0.24)    0.05 (0.19)    0.04 (0.16)    0.04 (0.15)    0.03 (0.15)

Table 2: Summary of linearity obtained for each DLL in the array (LSBDLL=4·LSB and LSBDLL-PS=5·LSB).

The Phase Shifting DLL can also be characterised using the same data set. The graphs in Figure 4 show the non-linearity of the first few cells of the Phase Shifting DLL.

[Figure 4 axes: DNL (LSBDLL-PS) and INL (LSBDLL-PS) versus bin DLL-PS.]
Figure 4: DNL and INL graphs for the Phase Shifting DLL (LSBDLL-PS=5·LSB).

The non-linearity of the Phase Shifting DLL and the phase error accumulated in the first bin of each Timing DLL add up to a large ADLL non-linearity, particularly in the ADLL bins number 4, 9, 14 and 139 and their neighbours, as would be expected. The auto-correlation function was applied to the DNL graph of the ADLL (Figure 5). It reveals peaks in the auto-correlation factor with a periodicity of 4 in λ, which corresponds to the interpolation factor F used. Secondary peaks at λ=5 and 10 can also be identified, corresponding to the phase shifting performed by the Phase Shifting DLL, which introduces a delay of F+1=5 (LSB) between consecutive Timing DLL’s.
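The way an auto-correlation exposes such periodicities can be illustrated with a short Python sketch. It computes a normalised circular auto-correlation, which is an assumption made here on the grounds that the ADLL bins wrap around with the clock period; the exact estimator used for Figure 5 is not stated. The input sequence is synthetic (an arbitrary pattern repeating every 4 bins), not the measured DNL.

```python
def circular_autocorrelation(values, max_lag):
    """Normalised circular auto-correlation for lags 0..max_lag.

    The mean is removed first, and the result is divided by the lag-0
    term so that the coefficient at lag 0 is exactly 1.
    """
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    energy = sum(c * c for c in centred)
    return [
        sum(centred[i] * centred[(i + lag) % n] for i in range(n)) / energy
        for lag in range(max_lag + 1)
    ]


# Synthetic DNL-like sequence over 140 bins with a period of 4 bins.
synthetic_dnl = [0.3, -0.1, -0.1, -0.1] * 35

coefficients = circular_autocorrelation(synthetic_dnl, 12)
for lag, value in enumerate(coefficients):
    print(f"lag {lag:2d}: {value:+.3f}")
```

For this synthetic input the coefficient returns to 1 at every lag that is a multiple of 4 and is negative in between, which is the signature of an F-bin periodicity. A measured DNL would show lower peaks because random mismatch is superimposed on the periodic error.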
[Figure 5 axes: auto-correlation coefficient versus λ.]
Figure 5: The ADLL auto-correlation graph.

Although the extension of the dynamic range beyond the reference clock period is achieved by successive translations of the ADLL transfer curve, it is important to verify the correct behaviour of this operation. In the graphs of Figure 6 only four reference clock periods are analysed. This dynamic range is judged sufficient for the test being carried out. A detailed characterisation of the full dynamic range would be impractical because of the large number of hits that would have to be collected. The result of a specific test, enabling the verification of the correctness of the dynamic range extension across the full dynamic range, is described later in this chapter. The periodicity of the non-linearity graphs is evident over the extended dynamic range.

[Figure 6 axes: DNL (LSB) and INL (LSB) versus bin, for bins 0 to 560.]
Figure 6: DNL and INL graphs for the converter along four reference clock periods.

For this test 1,680,000 hits were collected, therefore its results have a tolerance of 4.2% and 50% respectively for the DNL and INL curves (β=0.04 and β=0.5) with a confidence level of 98% (α=0.02). It is impractical to collect more hits, due to the long time it would require, therefore the tolerance in the INL measurements is wide. However, the values obtained for DNLmax and INLmax, respectively 0.73LSB (σDNL=0.18LSB) and 0.78LSB (σINL=0.21LSB), are similar to those obtained for the array itself, the differences being well within the tolerances accepted for such tests.

8.3. Linear time sweeps.

The nature of statistical tests such as the CDT results in the averaging of random effects like phase noise and electrical noise.
Phase noise (or jitter) can be present in the received reference clock, in the hit signal path, or may be due to the closed loop behaviour of the DLL’s. Electrical noise may couple into the DLL’s or into the hit sampling registers through the power supply or the substrate. To evaluate the effect of such random noise on the conversion error, a linear delay sweep is performed, using the test bench described in Appendix A.

The following graphs result from a linear delay sweep in which 42,000 samples were collected, corresponding to the accumulated effect of 5 samples collected for each delay interval of 3ps (5 ‘trombone’ delay steps of ~0.6ps). Figure 7 shows the error graph resulting from a linear delay sweep spanning two reference clock cycles. The RMS resolution of the converter is determined from the standard deviation of the error histogram. Its value is σ=0.39LSB (34.5ps). The maximum observed error is 1.62LSB (144.9ps).

Figure 7: Error graph and histogram resulting from a delay sweep of two reference periods (σ=0.39LSB).

From the linear delay sweep results, the linearity of the conversion can also be characterised. As shown in Figure 8, the DNLmax and INLmax measured using this method are, respectively, 0.73LSB (σ=0.18LSB) and 0.61LSB (σ=0.22LSB).

Figure 8: DNL and INL graphs obtained from the linear delay sweep results.

This alternative method enables the confirmation of the results obtained using the statistical CDT test. The difference found is within the expected tolerance limits for this test.
It can be explained by the sensitivity of the linear time sweep to the accumulation of errors generated during the delay generator alignment step (see Appendix A).

The conversion error of a single Timing DLL (the first one) was also evaluated using the same set of data. The error histogram is shown in Figure 9. The measured RMS resolution of this DLL is σ=0.30LSBDLL (105.5ps), very close to the quantising limit (0.29LSBDLL). The maximum error observed was 0.67LSBDLL (239.3ps).

Figure 9: Conversion error histogram for the first Timing DLL (σ=0.30LSBDLL).

Figure 10: Delay sweep over the full dynamic range (output bin versus delay in ns).

The correctness of the dynamic range extension up to 3.2µs is confirmed in the graph of Figure 10. It shows a coarse delay sweep over the conversion dynamic range. The delay step is, in this case, only 1ns. The limit of the dynamic range is clearly identified by the step visible in the transfer function after bin number 35,839.

8.4. Inter-channel crosstalk.

Crosstalk between channels is an important characteristic of a multi-channel converter. The (almost) simultaneous acquisition of hits in several channels should not affect the performance of an individual channel. The channel performance in the presence of activity in other channels was evaluated following the procedure described in Appendix A. All channels in the IC except the one being evaluated were excited simultaneously, and the time difference between the hit arriving at the channel being evaluated and the hits in the other channels was varied so as to cover the whole reference clock cycle. In this way a pessimistic, worst-case crosstalk sensitivity value is obtained. The measurements performed showed that, even under the most unfavourable conditions, the crosstalk is smaller than ±2LSB. This situation is shown in Figure 11.
Notice that the measurement error is larger than ±1LSB only when the skew between the reference channel and the three crosstalk channels is within a time window of ~0.5·T (6.25ns).

Figure 11: Measurement error due to crosstalk in the worst configuration (error versus time skew in units of T).

8.5. Double hit resolution.

To verify the correct functionality of the asynchronous channel buffers, and their ability to capture hits arriving in quick succession, a double hit resolution test was performed in accordance with the procedure in Appendix A. Bursts of two pulses (the same as the depth of the channel buffer) with a separation down to 8.5ns (limited by the instrument used) were correctly acquired, as intended.

8.6. Power dissipation.

The power dissipation of the fully operational circuit was measured to be 800mW. It includes the activity of the encoding, buffering and read-out logic integrated in the same IC. The demonstrator was built using a technology that requires a 5V supply voltage.

8.7. Summary of results.

A summary of the relevant timing features observed in the prototype’s test is shown in Table 3. A full description of the converter characteristics may be found in [4].

Table 3: Characteristics of the TDC prototype.

LSB                       89.3 ps
DNL    max                0.71 LSB / 63.4 ps
       σ                  0.17 LSB / 15.2 ps
INL    max                0.67 LSB / 59.8 ps
       σ                  0.19 LSB / 17.0 ps
RMS resolution (σ)        0.38 LSB / 34.5 ps
dynamic range             3.2 µs
crosstalk                 < 2 LSB
double hit resolution     < 8.5 ns
reference clock           80 MHz
number of channels        4
power dissipation         0.8 W
technology                0.7 µm CMOS
area   timing circuitry   6.1 mm²
       IC                 23 mm²
package                   68 pin PLCC

8.8. Conclusion.

This implementation of the ADLL scheme demonstrates that it is possible to obtain a high-resolution time measurement system using cheap commercial CMOS technologies.
The timing characteristics measured on the Time-to-Digital Converter match well with what had been predicted during the analysis and development of the circuit. Four TDC channels were integrated in the IC, together with the necessary encoding and buffering logic. Therefore, sufficient functionality is included to allow it to be used in real high-resolution time measurement systems. A batch of 1,000 TDC circuits was produced in order to be used in the preliminary system tests necessary for the development of the ALICE TOF detector [1] and also in the front-end of the PesTOF detector [22][23] used in the NA49 experiment running at CERN.

The drawback of timing interpolator architectures based on the ADLL principle is the large power necessary to drive a significant number of active DLL delay elements. Since the time interpolator is shared between all the channels in the IC, the power dissipation per channel would be reduced if more channels were integrated in the same circuit. However, the overall IC power dissipation would increase, which could make it impossible to use standard plastic packages.

References for Part II.

[1] ALICE collaboration, A large ion collider experiment – technical proposal, CERN/LHCC 95-71, Dec. 95.
[2] Arai, Y. et al., A CMOS four-channel x 1K time memory LSI with 1ns/b resolution, IEEE Journal of Solid-State Circuits, Vol. 27, No. 3, pp. 359-364, Mar. 92.
[3] Christiansen, J., An integrated high-resolution CMOS timing generator based on an Array of Delay Locked Loops, IEEE Journal of Solid-State Circuits, Vol. 31, No. 7, pp. 952-957, Jul. 96.
[4] Mota, M., A high-resolution Time-to-Digital Converter – users manual, CERN/EPMIC.
[5] Lakshmikumar, K. et al., Characterisation and modeling of mismatch in MOS transistors for precision analogue design, IEEE Journal of Solid-State Circuits, Vol. 21, No. 6, pp. 1057-1066, Dec. 86.
[6] Pelgrom, M. et al., Matching properties of MOS transistors, IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, pp. 1433-1440, Oct. 89.
[7] Nekili, M. et al., Spatial characterisation of process variations via MOS transistor time constants in VLSI and WSI, IEEE Journal of Solid-State Circuits, Vol. 34, No. 1, pp. 80-84, Jan. 99.
[8] Kaenel, V. et al., A 320MHz, 1.5mW @ 1.35V CMOS PLL for microprocessor clock generation, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1715-1722, Nov. 96.
[9] Maneatis, J., Low-jitter process-independent DLL and PLL based on self-biased techniques, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1723-1732, Nov. 96.
[10] Johnson, M. et al., A variable delay line for CPU co-processor synchronisation, IEEE Journal of Solid-State Circuits, Vol. 23, No. 5, pp. 1218-1223, Oct. 88.
[11] Kim, L. et al., Metastability of CMOS latch/flip-flop, IEEE Journal of Solid-State Circuits, Vol. 25, No. 4, pp. 942-951, Aug. 90.
[12] Vittoz, E., The design of high performance analogue circuits on digital CMOS chips, IEEE Journal of Solid-State Circuits, Vol. 20, No. 3, pp. 657-665, Jun. 85.
[13] Bastos, J. et al., Matching of MOS transistors with different layout styles, Proceedings of the IEEE International Conference on microelectronic test structures, pp. 17-18, Mar. 96.
[14] Gardner, F., Charge-pump Phase-Lock Loops, IEEE Transactions on Communications, Vol. 28, No. 11, pp. 1846-1858, Nov. 80.
[15] Gardner, F., Phase accuracy of charge-pump PLL’s, IEEE Transactions on Communications, Vol. 30, No. 10, pp. 2362-2363, Oct. 82.
[16] Paemel, M., Analysis of a charge-pump PLL: a new model, IEEE Transactions on Communication, Vol. 42, No. 7, pp. 2490-2498, Jul. 94.
[17] Behr, A. T. et al., Harmonic distortion caused by capacitors implemented with MOSFET gates, IEEE Journal of Solid-State Circuits, Vol. 27, No. 10, pp. 1470-1475, Oct. 92.
[18] Maneatis, J. et al., Precise delay generation using coupled oscillators, IEEE Journal of Solid-State Circuits, Vol. 28, No. 12, pp. 1273-1282, Dec. 93.
[19] Forti, F. et al., Measurements of MOS current mismatch in the weak inversion region, IEEE Journal of Solid-State Circuits, Vol. 29, No. 2, pp. 138-142, Feb. 94.
[20] Mota, M. et al., A high-resolution Time-to-Digital Converter based on an Array of Delay Locked Loops, Proceedings of the 3rd Workshop on Electronics for LHC Experiments, pp. 338-342, Oct. 97.
[21] Horstmann, J. U. et al., Metastability behaviour of CMOS ASIC flip-flops in theory and test, IEEE Journal of Solid-State Circuits, Vol. 24, No. 1, pp. 146-157, Feb. 89.
[22] Pestov, Y., Timing below 100ps with spark counters: work principle and applications, Invited talk at the 36th International Winter Meeting on Nuclear Physics, Bormio, 98.
[23] Almasi, L. et al., New TDC electronics for a PesTOF tower – in NA49, ALICE/2000-02 internal note/TOF, Mar. 00.

PART III. A TDC ARCHITECTURE BASED ON A DLL AND A PASSIVE RC DELAY LINE.

Future High-Energy Physics experiments will require complex electronic systems in order to handle the millions of data channels that constitute them. A significant part of these systems will be housed within the respective detectors’ structure. Given the large number of electronic circuits close to the detector, overall power dissipation is an issue. Increased detector temperature due to power dissipation is usually unacceptable, and the weight and area occupied by the power network place a heavy burden on the detector infrastructure. It is therefore essential to reduce the power dissipation of the individual circuits to minimal levels.

In the Array of Delay Locked Loops (ADLL) architecture previously discussed, resolution improvements can be obtained if faster delay cells are used, or if the interpolation factor is increased using extra timing DLLs. Both methods result in higher power dissipation. In this part of the dissertation, an alternative time interval measurement architecture is introduced.
This architecture uses a different time interpolation principle, which results in higher time resolution and lower power dissipation. It offers the same potential for integration as the ADLL and has the ability to perform automatic self-calibration, thus addressing all the requirements put forward by the ALICE TOF collaboration.

In Chapter 9 the proposed architecture is introduced and the method used to obtain increased time interpolation is explained. Two time interpolation schemes are presented, together with the means necessary to achieve correct operation. Chapters 10 and 11 take a detailed look at the performance of each of these two schemes and at their calibration requirements. Finally, in Chapter 12 the results of the tests performed on a prototype IC that implements these two schemes are reported.

Chapter 9. Architecture Overview.

The advantageous characteristics of the DLL’s have already been described in this dissertation and their use in the context of time interval measurements shown. An alternative architecture, which takes advantage of these characteristics to build a high-resolution time interpolator, is now introduced.

The basis of the time interpolator is a single DLL. Finer time interpolation can be achieved either by further dividing the clock period, using extra phase-shifted timing DLL’s as was done in the ADLL or, alternatively, by sampling the status of the DLL several times with a small time interval between samples. In the latter case, after determining in which sample of the DLL status the reference clock edge has arrived at the output of a given cell, it is possible to derive the hit arrival time with a resolution that is equal to the sample interval. To get full time coverage over the clock period, the samples must be obtained at uniform intervals over the full delay of a single DLL delay cell. This interpolation method is clarified in Figure 1.
Figure 1: Detail of DLL signal propagation illustrating time interpolation through multiple delay line samples (in this example the number of samples acquired is M=5; samples s0 (= thit) to s4 are spaced by Tcell/5).

If a single sample of the DLL status (cell delay is Tcell) is acquired at hit signal arrival time (s0), a transition 1 to 0 is found between the data corresponding to tap(n) and tap(n+1) of the status word. In this case, the hit time referenced to the clock is (see footnote 1):

$$ t_{hit} = T_{cell} \cdot n $$

Therefore, the resolution of the measurement is the intrinsic resolution of the DLL, Tcell. However, if several (M) uniformly spaced samples are acquired across the cell delay, the number of samples (m) elapsed before the reference edge appears in the output of the cell (no transition found) improves the time measurement accuracy:

$$ t_{hit} = T_{cell} \cdot \left( n + \frac{M - m}{M} \right), \quad 1 \le m \le M $$

The resolution of this measurement is Tcell/M, where the interpolation factor M corresponds to the number of cell delay sub-divisions created by multiple sampling. Considering an N-tapped DLL, the overall resolution, related to the reference clock period, Tclk, is:

$$ T_{bin} = \frac{T_{clk}}{N \cdot M} $$

Parameters N and M are, in this scheme, independent. This means that there is no numerical limit to the ratio by which the reference clock can be divided. In particular, it is possible to divide the reference period into a binary number of bins ($N \cdot M = 2^n$, with n being an integer). This division was not possible in the ADLL scheme.

9.1. Time interpolation circuit.

A time interpolation circuit based on this principle is shown in Figure 2. It includes an N-tapped DLL and M rows of hit registers in order to store the M samples of the DLL status that are acquired for each measurement. The multiple sampling signals are defined at fixed time intervals from the moment the hit signal arrives.
It is, therefore, natural to generate these signals using taps of an open-ended delay line through which the hit signal is propagated. However, guaranteeing short delays with high precision is not easily done. Active devices (even if they were fast enough) have timing characteristics that vary significantly with operating temperature, supply voltage and process parameters. Continuous calibration schemes similar to the DLL are not applicable, since no reference signal exists; a different type of delay line should therefore be used.

Footnote 1: By convention, the limits of bin n are tap n and tap n+1.

Passive RC delay lines have been used in the past for timing generation [1], because of their low sensitivity to supply and temperature changes. A sensitivity of around 500ppm per volt or per °C is typically found in standard technologies. On the other hand, their delay is strongly dependent on the circuit processing, since the characteristics of parasitic devices, such as resistivity, capacitance and even physical dimensions, are only weakly controlled in digital CMOS technologies. Large circuit to circuit delay variations are thus expected, which makes start-up calibration of the lines essential to the performance of the proposed architecture. However, frequent calibration is not needed because of the low supply and temperature dependencies.

Figure 2: Time interpolation circuit (block labels: reference clock, N delay cells, PD, hit signal, controllable delay line with input from calibration, hit registers in M rows).

9.2. Adjustable RC delay line.

In order to be able to perform start-up calibration, the delay line should be made adjustable. Continuous and discrete adjustment schemes are possible; the choice between them must take into account the linearity requirements and the scheme’s complexity. Continuous adjustment schemes can achieve maximal interpolation linearity, at the expense of circuit complexity and higher noise sensitivity.
For example, it is possible to vary the depth of the depletion region along the length of a diffused resistor (see Figure 3) by changing the voltage drop across the parasitic junction. This results in a change of the cross-section of the resistor, and therefore of its distributed resistance. The depletion region across the junction also acts as the dielectric of a distributed capacitor. Therefore a change in its depth affects its capacitance. These resistance and capacitance variations have opposite effects on the time constant of the delay line, but the overall result is a continuous control of the line’s propagation delay.

However, this method presents some drawbacks that render its implementation impractical. The depletion region extends mostly into the less doped n-well, leaving little control of the line resistivity. The control voltage range limits the amplitude of the propagating signal. The signal, in fact, also influences the depletion region width, making the time constant of the line a complex function of the signal itself.

Figure 3: Continuous delay adjustment scheme based on control of the distributed parameters (simplified).

Discrete adjustment methods provide a better solution for our application. Their implementation can be simple and the noise sensitivity of the adjustment scheme can be quite low. A time interpolator that uses these methods has, by their discrete nature, lower linearity. Fortunately their non-linearity can be limited to very good levels by a careful choice of adjustment range. Of the several possible schemes for discrete adjustment, two will be described shortly.

9.2.1. Adjustable delay line by tap selection.

One implementation of the discrete adjustment scheme for an RC delay line is to divide it into a large number of small segments. Their extremities are made accessible via buffered outputs, as shown in Figure 4.
Calibration of the line consists in selecting the outputs that best approximate the interpolation linearity criteria. Since the delay line time constant has a strong dependency on parasitic technological parameters, and these are only weakly controlled during IC production, wide delay variations are expected from one circuit to the other. This leads to some overlap between the adjustment ranges of consecutive taps. Therefore, it must be possible to connect some of the segment outputs to various taps.

Figure 4: Adjustable delay line using a tap selection scheme (a chain of ∆R, ∆C segments whose buffered outputs are selected by calibration to form tap n-1, tap n and tap n+1).

All the output buffers are identical and, due to the symmetry of their operation, their delays can, to a first approximation, be subtracted and factored out. However, device mismatch and temperature gradients will affect them differently, contributing to the degradation of the interpolation linearity. These effects can be minimised by careful buffer design and are in fact taken into account when the line is calibrated.

9.2.2. Adjustable delay line by lumped capacitor selection.

Another implementation of the discrete adjustment scheme is to insert a variable lumped capacitor in selected positions along the delay line, as in Figure 5. These capacitors participate in the definition of the line’s time constant, therefore changes in their capacitance affect the delay of the line. The variable capacitors can be made of a bank of unit-sized capacitors that can be selectively connected to an RC delay line node in order to obtain the best interpolation linearity. As before, the effects of delay mismatch of the tap buffers are factored out during calibration.

Contrary to the previous scheme, where the adjustment of the position of one tap does not affect any other tap, in this scheme calibration is obtained by changing the delay properties of the line.
Therefore the adjustment of the delay of one tap affects the delay of the whole delay line. An iterative adjustment procedure will adequately take this effect into account.

Figure 5: Adjustable delay line using a variable lumped capacitor scheme (a chain of ∆R, ∆C segments with independent calibration per tap for tap n-1, tap n and tap n+1).

9.3. Auto calibration.

The automatic self-calibration of the time interpolator is a key part of the architecture. The DLL closed control loop is able to perform constant self-calibration, tracking temperature and supply variations. The passive RC time interpolator, on the other hand, requires initial calibration so that its delay matches the delay of a DLL delay cell.

The calibration could be performed at production test time, either by laser trimming or by pre-programming calibration parameters in a ROM-like structure. However, this method would be expensive and would limit the correct interpolator operation to a very specific reference frequency, leaving the user with no flexibility to adapt the circuit to his particular needs. A more flexible calibration procedure can be obtained if internal means are provided for in-situ start-up calibration.

Collection of hits generated at random time intervals offers an accurate method of characterising the interpolator [2] (see Appendix D). If the hits are collected into time bins corresponding to the output codes and these are histogrammed, the resulting count differences accurately represent the size of the bins. Using this simple procedure, the whole interpolator can be characterised. The characterisation obtained can be used to identify the calibration corrections necessary.

This procedure requires a random hit generator and a simple arithmetic unit. Hits generated from a simple, slow oscillator can be used for characterisation. The main requirement is that the oscillation frequency is such that it doesn’t beat with the reference clock.
A sufficient condition to satisfy this requirement is that the ratio of its frequency to the frequency of the reference clock is a rational number given by the ratio of two prime numbers [3] (Appendix E). The arithmetic unit needs only a few accumulators and comparators. The calibration can be performed in an iterative fashion, thereby improving its accuracy.

9.4. The prototype.

9.4.1. Choice of technology.

A demonstrator circuit was implemented in order to explore the capabilities of the proposed architecture. A major goal of this work is to define architectures that are well suited for high-resolution time measurements, independently of the technology in which they are produced. Therefore no special features should be required apart from the ones available in standard CMOS technologies.

Since actual results are partially determined by technological properties such as transistor transconductance, gate capacitance, parasitic resistance and capacitance, a fair comparison of the capabilities of the architecture is best obtained if the technology used for the ADLL demonstrator is also used to build the demonstrator of this architecture. Furthermore, to emphasise the suitability of the architecture to standard technologies, the same 0.7µm CMOS technology was used for this prototype.

9.4.2. Prototype characteristics.

The prototype includes all the blocks necessary to demonstrate the proposed architecture: the complete time interpolator, together with the respective hit registers, a simplified read-out control unit, a serial programming interface and wide bandwidth differential receivers.

The key feature that should be verified with this demonstrator is the ability to perform internal calibration. The adjustment algorithm is made only of registers and combinatorial logic. These are easily implemented using standard cell libraries available for most commercial technologies.
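The characterisation step on which the adjustment algorithm relies (Section 9.3) needs little more than one accumulator per bin. The following Python sketch illustrates the idea under stated assumptions: hits uniformly distributed over one cell delay stand in for the on-chip oscillator, and the function name, the edge list and the 10ps tap displacement are hypothetical values chosen for the example, not taken from the prototype.

```python
import bisect
import random

def estimate_bin_widths(edges, period, n_hits, seed=0):
    """Estimate bin widths from a histogram of hits uncorrelated with the clock.

    edges  -- sorted lower edges of the bins, starting at 0.0 (same unit as period)
    period -- total time span covered by the bins
    """
    rng = random.Random(seed)
    counts = [0] * len(edges)                 # one accumulator per bin
    for _ in range(n_hits):
        t = rng.uniform(0.0, period)          # assumed: hit time uniform over the span
        counts[bisect.bisect_right(edges, t) - 1] += 1
    # Each bin collects hits in proportion to its width.
    return [c * period / n_hits for c in counts]

# Hypothetical example: M = 8 sub-bins of one cell delay, with one tap 10 ps late.
T_CELL = 390.4                                # ps, assumed cell delay
edges = [i * T_CELL / 8 for i in range(8)]
edges[3] += 10.0                              # bin 2 becomes wider, bin 3 narrower
widths = estimate_bin_widths(edges, T_CELL, 200_000)
print([round(w, 1) for w in widths])
```

Comparing each estimated width with the ideal value T_CELL/8 indicates which taps need correction; repeating the measurement after each adjustment corresponds to the iterative calibration mentioned in Section 9.3.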
Since a hardware implementation of the adjustment algorithm poses no particular difficulty, it was decided to implement it in software for this prototype, allowing for a higher flexibility of the demonstrator. The calibration hit generator, on the other hand, should be implemented in the circuit to evaluate the correctness of the assumption that a hit frequency with the required characteristics can be generated inside the circuit. The prototype is schematically represented in the block diagram of Figure 6.

Figure 6: Block diagram of the prototype (blocks: reference clock and DLL; channel 0 with an RC delay line using the tap selection adjustment scheme and its hit registers; channel 1 with an RC delay line using the lumped capacitor adjustment scheme and its hit registers; hit generator; read-out controller; calibration interface).

In this prototype, the two interpolation schemes previously described were implemented using a single shared DLL. Together they form a two-channel Time-to-Digital Converter. Each interpolator channel is made of the differential receiver, the signal selection multiplexer, the adjustable RC delay line, the hit registers and the shared DLL.

It was shown in Chapter 6 that the integral error due to cell mismatch in a single DLL is a function of the cell mismatch σcell and of the number of cells N that make up the delay chain. The maximum standard deviation of this error, found at the centre of the delay chain, has the following expression:

$$ \sigma_{DLL} = \frac{\sigma_{cell}}{\mu_{cell}} \cdot \frac{\sqrt{N}}{2} $$

It is therefore important to build the delay chain with a small number of delay cells. In this prototype, we chose N=16 cells as the best compromise between reducing the integral error due to mismatch and keeping the reference clock frequency within the limits imposed by the technology.

The DLL delay cells to be used in this circuit have the same timing characteristics as the ones in the previous (ADLL) circuit. The same cells are therefore used, together with the same control loop building blocks.
Reusing the cells is advantageous since they have already proved to have the necessary characteristics in terms of control range, matching and noise sensitivity. Since some implementation details are common, the comparison between architectures also becomes easier.

To obtain the intended ~390ps time interpolation in the outputs of the DLL, a reference period of 6.25ns, corresponding to a frequency of 160MHz, was used. As shown in the block diagram, the reference clock is only used in the DLL; the read-out and calibration interfaces are asynchronous to this clock and work at lower frequencies.

The high interpolation factor is obtained using either of the two adjustable RC delay line schemes already described. In both schemes, the delay of the DLL delay cell is divided into M=8 similar time intervals, resulting in an LSB of ~48.8ps. A total of M·N=128 hit registers are required to achieve full reference period coverage for each channel.

The hit signal integrity is of paramount importance, since the critical time information is mostly contained in the high frequency components of the signal. Differential receivers are used in all external time critical signal paths so as to reject the common-mode noise coupled to these signals as they traverse the system outside the circuit.

A simple hit generator is also included. It is built as a slow, free running, five-inverter ring oscillator whose output frequency is further divided by an 8-bit ripple counter. The oscillation frequency is selectable via a program word. In a final circuit the oscillator frequency must have a fixed relation to the reference clock frequency, defined by the relations established in Appendix E. Since the clock frequency may change depending on the application, it is reasonable to generate the calibration hit signal based on the reference clock (see also Appendix E).
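The figures quoted for the prototype follow from the relations given earlier in this chapter, Tbin = Tclk/(N·M) and thit = Tcell·(n + (M−m)/M) with 1 ≤ m ≤ M. The short Python sketch below reproduces them; the function names are hypothetical and the code is only a numerical check of these relations, not a model of the circuit.

```python
def bin_size(t_clk, n_taps, m_samples):
    """Overall bin size Tbin = Tclk / (N * M)."""
    return t_clk / (n_taps * m_samples)

def hit_time(t_cell, n, m, m_samples):
    """Hit time relative to the clock: Tcell * (n + (M - m) / M), 1 <= m <= M.

    n -- index of the DLL tap where the transition is found
    m -- number of samples elapsed before the reference edge appears
    """
    if not 1 <= m <= m_samples:
        raise ValueError("m must satisfy 1 <= m <= M")
    return t_cell * (n + (m_samples - m) / m_samples)

# Prototype parameters quoted in the text.
T_CLK_PS = 6250.0          # 160 MHz reference clock
N, M = 16, 8

t_cell = T_CLK_PS / N                      # DLL cell delay, ~390 ps
lsb = bin_size(T_CLK_PS, N, M)             # ~48.8 ps
registers = N * M                          # hit registers per channel

print(t_cell, lsb, registers)
print(hit_time(t_cell, n=3, m=M, m_samples=M))   # m = M reduces to Tcell * n
```

Because N and M are independent, N·M = 128 is a power of two, which is the binary division of the reference period mentioned at the start of the chapter.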
The externally generated calibration parameters are fed to the delay lines via a calibration interface. Changes to these parameters are only performed at start-up time, when calibration is being performed. Therefore, a slow, serial interface is used.

In the photograph of Figure 7 the main functional blocks of the prototype are highlighted. The circuit uses 10.7mm² of silicon and was packaged in a 68 pin ceramic JLCC package.

Figure 7: Prototype circuit showing main functional blocks (Tap Selection, Hit Registers, DLL, Hit Registers, Oscillator, Lumped Capacitor).

9.4.3. Performance analysis.

Timing characteristics.

The configuration proposed for this converter results in an LSB of Tm=48.8ps. The theoretical RMS resolution σq is determined by the quantising performed during conversion:

$$ \sigma_q = \frac{T_m}{\sqrt{12}} = 14.1\,\mathrm{ps} $$

Matching limitations of the DLL degrade the conversion resolution. The maximum cumulative effect of cell mismatch is seen in the middle of the DLL delay chain (see Chapter 6). Assuming a mismatch (σmatch) of 1%, the additional RMS error due to the DLL is:

$$ \sigma_{DLL} = \sigma_{match} \cdot \frac{\sqrt{N}}{2} \cdot T_m \cdot M = 7.8\,\mathrm{ps} $$

The calibration of the RC delay lines acts on their integral non-linearity in such a way as to limit it to acceptable values. A worst case ±0.5LSB delay line non-linearity results in an additional RMS error of:

$$ \sigma_{dl} = \frac{T_m \cdot 0.5}{\sqrt{12}} = 7.1\,\mathrm{ps} $$

Jitter intrinsic to the closed control loop of the DLL is estimated to be of the order of σjitter=8ps. Adding all these contributions quadratically, the estimated intrinsic RMS resolution should be ~19.3ps (0.40LSB). External sources of error, such as reference clock jitter, are not included in this estimation. The tests performed with this prototype, and the measurement results, will be discussed in Chapter 12.

Power dissipation.

In DLL based converters, power is mainly dissipated in the DLL itself. Reduction of the power needed for the switching of its delay cells is mainly hampered by the device mismatch.
Since the matching requirements are quite high for these architectures, reduced power dissipation per cell would come at the price of reduced resolution. In this architecture, power dissipation is reduced by minimising the number of DLL’s. The fine time interpolation is obtained using a passive delay line. Since the DLL is built with the same building blocks used in the ADLL circuit, the power dissipated by the DLL in this circuit can be estimated, from what was measured in the previous prototype, to be of the order of 180mW.

Chapter 10. Adjustable RC Delay Line using a Tap Selection Scheme.

In this chapter, the implementation of the tap selection adjustment scheme is described. We start with a general analysis of how to build and analyse high accuracy RC delay lines. These lines must abide by some layout constraints: the line dimensions must match the dimensions of the circuits with which they interface, and the delays are generated by parasitic devices. The particular characteristics of this scheme are then described, together with the calibration algorithm required to obtain the uniform time intervals.

10.1. RC delay line.

An integrated microstrip RC delay line can be built from any of the interconnection layers available in the chosen technology. Diffused layers usually suffer a high temperature and supply voltage dependency, due to carrier mobility degradation and to the variation of the depth of the junctions’ depleted region [4]. A polysilicon layer, on the other hand, has a lower temperature dependency and negligible supply dependency (if built over the thick oxide layer). Metal (or silicided polysilicon) layers have an even smaller environment dependency; however, their small resistivity renders them impractical for delay generation applications. The polysilicon layer will, therefore, be used to build the delay line.
The interpolating microstrip line spans a fraction (M-1)/M of the delay of a DLL delay cell, where M is the interpolation factor, regardless of the operating conditions. This delay is generally short and, traditionally, the line would be analysed as a lumped electrical element. However, such analysis would lack accuracy, since most of the critical time information is contained in the rising edge of the propagating signal. Accurate delay estimation must take into account the large bandwidth of the signal and thus the long electrical length of the line at high frequencies. In these conditions, transmission line analysis methods must be used.

Several analytical and numerical methods to perform the transient analysis of a complex network of distributed RC lines have been proposed [5][6][7], resulting in equally complex expressions for the propagation delay along the network. A voltage step injected in an open-ended distributed RC line of length L propagates according to the following equation [8], where x is an arbitrary position along the line:

$$\frac{v(x,t)}{v_{cc}} = 1 + \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{(-1)^k}{k-\frac{1}{2}} \cdot \cos\left[\left(k-\frac{1}{2}\right) \pi \left(1-\frac{x}{L}\right)\right] \cdot \exp\left[-\left(k-\frac{1}{2}\right)^2 \pi^2 \frac{t}{RC}\right].$$

The total resistance R and capacitance C of the line are obtained from the distributed resistivity $r_{sq}$ and from the plate and fringing capacitances, respectively $c_{plate}$ and $c_{fringing}$:

$$RC = r_{sq} \frac{L}{W} \left(c_{plate} L W + 2 L c_{fringing}\right) = r_{sq} c_{plate} L^2 + 2 r_{sq} c_{fringing} \frac{L^2}{W}.$$

An important characteristic of RC lines that determines the dimensions of the interpolator is not evident from the propagation delay equation above: as the signal propagates along the delay line it experiences an apparent increase of propagation velocity. The reasons for this counter-intuitive effect can be found in the slow slope of the input pulse when compared to the propagation delay of the RC delay line.
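The step-response series can be evaluated numerically to see how the threshold-crossing time varies along the line. The following Python sketch is only an illustration under stated assumptions: it uses an ideal input step (the slow input edge that the text identifies as the cause of the effect is not modelled), a 50% threshold, time expressed in units of RC, and four equally spaced observation points. The function names `step_response` and `crossing_time` are hypothetical.

```python
import math

def step_response(x_over_l, t_over_rc, terms=200):
    """Normalised voltage v/vcc at position x/L and time t/(RC), from the series."""
    total = 0.0
    for k in range(1, terms + 1):
        a = (k - 0.5) * math.pi
        total += ((-1) ** k / (k - 0.5)) \
            * math.cos(a * (1.0 - x_over_l)) * math.exp(-a * a * t_over_rc)
    return 1.0 + (2.0 / math.pi) * total

def crossing_time(x_over_l, threshold=0.5):
    """Bisection for the time (in units of RC) at which v/vcc reaches the threshold."""
    lo, hi = 1e-6, 5.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if step_response(x_over_l, mid) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed observation points at 1/4, 1/2, 3/4 and the end of the line.
positions = [i / 4 for i in range(1, 5)]
times = [crossing_time(x) for x in positions]
# Delay added by each quarter of the line (first entry: input to first point).
increments = [times[0]] + [b - a for a, b in zip(times, times[1:])]

for x, t, d in zip(positions, times, increments):
    print(f"x/L = {x:.2f}  t50 = {t:.3f} RC  increment = {d:.3f} RC")
```

Under these assumptions the last quarter of the line adds a smaller delay increment than the quarter before it, which is the qualitative behaviour described in the text for equal-length segments.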
In such a short open-ended line, the reflected pulse travelling back along the line catches up with the forward pulse before its level has crossed the logic threshold. Looking at Figure 1, if the overall pulse amplitude is observed at position x, the closer x is to the end of the line, the earlier the reflected pulse is superimposed on the original pulse and, thereby, the faster the edge of the overall pulse crosses the threshold.

Figure 1: RC line divided in two segments at access point x. R and C are, respectively, resistance and capacitance per unit length.

The delay line interfaces with the rest of the interpolator through output buffers. An efficient layout style requires that these buffers have the same physical design, so that the resulting structure is regular and no layout related mismatches occur. Since the signal edge does not propagate along the line at a constant velocity, a uniform delay division of the line is obtained only if the line is accessed at irregular distances. To accommodate these contradictory demands, the line is divided into equal delay segments that are positioned with a pitch similar to the pitch of the output buffers, as shown in Figure 2. The gaps opened in the line are filled with a spacer¹ made of a conductor whose parasitic resistance and capacitance are small. These spacers can be built in the metal1 layer. They are included in the signal path; therefore, their contribution to the total line delay must be correctly evaluated.

¹ The distinction made between microstrip delay line and spacer reflects only a functional difference. In reality they are microstrip lines made of different materials but embedded in the same silicon oxide dielectric and having the IC substrate as reference plane. In consequence they are both modelled as devices with distributed parameters.
Other solutions based on non-uniform lines are difficult to implement because of the small dependency of the delay on the line width, and the limited number of interconnection layers available.

Figure 2: Delay line division into equally sized sections (segments of equal delay in the polysilicon microstrip layer, completed to equal length by spacers in the metal1 layer, with one tap per section).

10.1.1. RC delay line simulation model.

The complex propagation delay expression shown in the previous section does not lend itself to easy analysis. Approximate delay estimation methods have been developed for applications in the design automation domain [9][10]. Unfortunately, they tend to reflect a particular network geometry and the accuracy of the delay estimations is generally limited.

In order to obtain an accurate estimation of the interpolator's time characteristics, a simulation model was developed that includes all the elements that influence them. These include the polysilicon microstrip delay line segments, the metal1 spacers, the connection lines, the inter-layer contacts and the devices that make up the driver, output buffers and capacitors.

The simplest and most accurate model of a uniform line (polysilicon or metal1) is obtained by dividing it into small segments. The number of segments should be large enough that each of them can be correctly modelled using a network of lumped elements. The overall behaviour of a complex line can be obtained by connecting the uniform line segments through the equivalent circuit of the discontinuities present in the network. HSPICE [11] has internal models for transmission lines (U-model) which internally divide the line into multiple T-network sections such as the ones in Figure 3. However, in our work we chose to explicitly use T-network sections as the basis of the model. It is thus possible to avoid any dependency on the particular implementation of the simulator.
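The idea of replacing a uniform line by a chain of lumped T-network sections can be illustrated with a small numerical experiment. The Python sketch below is not the thesis model: it assumes a purely RC line with an ideal reference node, an ideal unit step at the input, an open far end, normalised totals R = C = 1, and a simple explicit Euler integration. The function name `ladder_t50` and all parameter names are hypothetical.

```python
def ladder_t50(n_sections=20, r_total=1.0, c_total=1.0, threshold=0.5):
    """Time at which the open end of an N-section T-ladder crosses the threshold.

    Each section is R/(2N), then C/N to the reference node, then R/(2N).
    The ladder is driven by an ideal unit step and is open at the far end.
    """
    r_half = r_total / (2 * n_sections)      # one arm of a T-section
    c_node = c_total / n_sections            # capacitor of a T-section
    dt = 0.1 * r_total * c_total / n_sections ** 2   # small explicit-Euler step
    v = [0.0] * n_sections                   # voltages on the section capacitors
    t = 0.0
    # With no current in the last arm, the end voltage equals v[-1].
    while v[-1] < threshold:
        new_v = []
        for i in range(n_sections):
            # Current arriving from the left neighbour (or from the step source).
            if i == 0:
                i_in = (1.0 - v[0]) / r_half
            else:
                i_in = (v[i - 1] - v[i]) / (2 * r_half)
            # Current leaving towards the right neighbour (none at the open end).
            if i == n_sections - 1:
                i_out = 0.0
            else:
                i_out = (v[i] - v[i + 1]) / (2 * r_half)
            new_v.append(v[i] + dt * (i_in - i_out) / c_node)
        v = new_v
        t += dt
    return t

for n in (5, 20, 40):
    print(n, "sections: t50 =", round(ladder_t50(n), 4), "RC")
```

Running it for an increasing number of sections shows the 50% delay at the open end settling towards the value given by the distributed step-response series, which is one way of judging how many sections are "enough".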
In a microstrip line with the characteristics of the one under study, the inductance $L_l$ and the dielectric conductance $G_l$ are very small and, therefore, are not considered. The reference plane is modelled as a single node. In reality this plane is the lightly doped IC substrate; however, its resistivity $R_{ref}$ can be minimised if some layout rules are followed. These will be explained later.

Inter-layer contacts are modelled as single resistors whose values are extracted from the technology parameters. In reality their resistivity depends on factors such as current flow, and a small capacitance to the reference plane is present. However, the total contact resistance can be made small by increasing its area, rendering its variation negligible. All other (lumped) circuit elements can be directly modelled using their equivalent circuit.

Figure 3: Electrical model of an infinitesimal segment of a transmission line (the T-network). For a line element of length δx, the signal path contains two series arms of 0.5·Rl·δx and 0.5·Ll·δx, the shunt elements are Cl·δx and Gl·δx, and the reference path contains two arms of 0.5·Rref·δx.

A detail of a section of polysilicon line together with the metal1 spacers and contacts is shown in Figure 4. The distributed electrical parameters are highlighted, for illustration purposes. The inter-layer contact is modelled as a resistor to which the capacitors corresponding to the ends of the connected layers are added, since they turn out to be significant for the line width being considered.

Figure 4: Detail of the physical microstrip line and its equivalent simulation model (metal1 and polysilicon sections modelled as chains of T-networks, joined by contacts modelled by Rc and the end capacitors Ct).

A sample of the Spice model of a delay line with dimensions W (width) and L (length), divided in N infinitesimal lumped elements, is shown in the next lines.
It includes a single T-element plus the contact.

    .subckt T-element in out ref (layer parameters, N)
    r1 in 1 'Rsq_layer*L/(W*N*2)'
    r2 1 out 'Rsq_layer*L/(W*N*2)'
    c1 1 ref 'Cpl_layer*L*W/N+Cfr_layer*2*L/N'
    .ends T-element

    .subckt Contact in out ref (layer parameters)
    c1 in ref 'Cfr_metal1*W'
    c2 out ref 'Cfr_polysilicon*W'
    r1 in out 'Rcontact/(W/2)'
    .ends Contact

Rsq, Cpl and Cfr are, respectively, the resistivity, the plate capacitance and the fringing capacitance of the respective layer. Rcontact is the resistance of the contact.

The parameter spread inherent to the fabrication process is included in the model through three different sets of technology parameters. Each set corresponds to a representative corner of the process distribution: the centre and the two tails. The effects of temperature and supply variation are correctly taken into account in the active device models. This is not the case for parasitic devices, such as the microstrip delay line. However, since this dependency is small, it can safely be disregarded.

10.2. Tap selection delay line.

The design of an RC delay line conforming to the requirements of the built converter starts with the definition of the general dimensions of the line and of the number of access points needed. At this stage only the overall properties, such as the total delay, the total length and the width of the line, are important. Each line segment is made identical, for simplicity. After having defined these properties, individual line segments can be adjusted so that their delays become identical but the overall line delay does not change, resulting in the desired RC line characteristics. In the following lines, the general design guidelines that were followed are described in more detail.

Definition of the line width: The microstrip line should be made wide so that its distributed characteristics dominate the interpolator's behaviour.
If this were not the case, the temperature and supply sensitivity of the lumped loads connected to the line could undermine its behaviour. The line should also be wide enough to minimise the dimensional uncertainties due to IC processing.

In the tap selection adjustment scheme, the buffers are the only significant loads connected to the line. These are simple two-stage buffers made of static inverters. The gate area of the input inverter transistors defines the lumped loads attached to the RC line. Mismatch considerations lead to the use of large gate areas for these transistors. First order calculations based on technological parameters result in a total gate capacitance of ~33 fF. Since the gate capacitance has only a weak dependency on temperature and supply voltage, it is enough that the distributed capacitance of each line segment has a larger value than the lumped capacitor. A line width of 52 µm is sufficient to obtain these characteristics.

Definition of the number of access points: The number of access points is determined by the adjustment scheme followed. For the tap selection scheme, the criterion is to define the maximum allowed time interval between access points that results in an acceptable linearity after line calibration. Given an LSB of 48.8 ps, a maximum non-linearity of ~15 ps (less than 1/3 LSB) is accepted. Therefore the maximum delay between access points has been set to 30 ps. Using the simple definition of time constant, τ=RC, as a rough approximation of the microstrip line delay, a time constant variation of ±30% is found as process parameters are changed, for the selected technology. Consequently, to obtain a worst case access interval of 30 ps, the separation in typical conditions should not be bigger than ~21 ps (30 ps · 0.7). Dividing the line into 32 segments (defining 33 access points) more than covers this requirement.

Definition of the line length: A total delay of 350 ps (~LSB·(M-1)) must be achieved regardless of the process corner.
The same considerations as before show that in the fast corner, the time constant is ~30% smaller than in typical conditions. Consequently, if the line covers 350/0.7 = 500 ps in typical conditions, then the initial condition is met in any operating conditions.

The line length is determined from parametric simulations of the complete interpolator model, including all the devices connected to it. The output buffer pitch defines the length of the line segments between access points. During simulations the length of all the microstrip segments is simultaneously varied until the total required delay is obtained. Assuming a buffer pitch of 31.2 µm, the required overall line delay is obtained when each of the 32 segments includes a polysilicon microstrip line 7.4 µm long and a metal1 spacer of 23.8 µm. The resulting total distributed capacitance in each line segment is larger than the buffer input capacitance, as desired.

Adjustment of the delay of the line segments: With identical segments all over the line, the delay between access points is smaller towards the end of the line. Adjusting these delays could be done following a trial and error procedure, but instead a simpler approach was used. The previous step resulted in a constant microstrip length versus segment curve, and in the corresponding non-linear delay versus segment curve. If an analytical function that transforms the delay curve into a constant curve is found, the corresponding microstrip length curve can be obtained using the same transformation (see Figure 5).

Figure 5: Delay line segments' length adjustment (the delay versus segment curve f(segment) is flattened by applying f⁻¹(segment), and the same transformation applied to the constant length versus segment curve gives the lengths of the microstrip line and metal1 spacer in each segment).

This transformation is valid if the microstrip line is uniform and the edge propagating along the line has constant characteristics.
This is not the case for the line under study, since metal1 spacers interrupt the microstrip line and the output buffers load the line at discrete points. However, the uniform line approximation has enough accuracy since the metal1 spacers, due to their low resistivity, have little effect on the delay characteristics of the line. The first design criterion also guarantees that the characteristics of the output buffers can be neglected in this analysis. The signal characteristics along the line are, to a large extent, invariant.

The original delay versus tap curve can be accurately described by a high order polynomial. In this case a fifth order is accurate enough:

$$delay(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5,$$

where the polynomial constants are obtained from a least squares fit to the delay curve. The inverse (reciprocal) function converts the curve into a constant value. It is sufficient to multiply this result by the desired segment delay ($delay_{ave}$) to obtain the required transformation:

$$F(x) = delay_{ave} \cdot delay^{-1}(x) = \frac{delay_{ave}}{a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5}.$$

The multiplying factors obtained for each segment of the actually implemented line are shown in Figure 6. The factor that corresponds to the buffer pitch is also shown. The transformation results in three segments being larger than the buffer pitch. This leads to a longer delay line, which in turn affects the total line delay.

Figure 6: Adjustment function values (adjustment function and buffer pitch versus segment).

The lengthening of the line after application of the transformation stems from the limited accuracy of the uniform line approximation used. In particular, the assumption that the characteristics of the signal propagating along the line do not change is not true. The rise time of this signal is longer towards the end of the line, as shown in Figure 7.
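The fit-and-invert procedure can be sketched in a few lines of Python. This is an illustration only, under loudly stated assumptions: the per-segment delay curve below is made up (it is not the simulated curve of the thesis), the desired segment delay is taken to be the mean of that curve, and the polynomial is fitted in the normalised variable u = segment/31 rather than in the segment index, purely for numerical conditioning. The helper names `polyfit_ls` and `poly` are hypothetical.

```python
import math

def polyfit_ls(xs, ys, order):
    """Least-squares polynomial coefficients a0..a_order via the normal equations."""
    n = order + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (r + c) for x in xs) for c in range(n)]
         + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]

def poly(coeffs, x):
    """Evaluate a0 + a1*x + ... at x."""
    return sum(a * x ** k for k, a in enumerate(coeffs))

# Hypothetical delay-versus-segment curve (ps), decreasing towards the line end.
segments = list(range(32))
u = [s / 31 for s in segments]                      # normalised segment index
delay = [20.0 * math.exp(-1.2 * x * x) for x in u]  # made-up data

coeffs = polyfit_ls(u, delay, 5)                    # fifth-order fit
delay_ave = sum(delay) / len(delay)                 # assumed target segment delay
factors = [delay_ave / poly(coeffs, x) for x in u]  # F(x) = delay_ave / delay(x)

for s in (0, 16, 31):
    print(f"segment {s:2d}: length multiplier {factors[s]:.2f}")
```

With a delay curve that falls towards the end of the line, the multipliers are below one for the first segments and above one for the last ones, so the final microstrip segments are lengthened, as in the adjustment described above.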
Figure 7: Signal's rise time along the original and adjusted delay line, in typical conditions (simulated). Rise time (ns) versus segment, for the original and the adjusted line.

However, the deviation caused by this assumption is small and it is effectively countered by designing the original line with a shorter delay range.

Simulations of the adjusted line result in the graphs shown in Figure 8. A maximum segment delay non-linearity of 4 ps is found under typical conditions. Only minor linearity degradation is observed as operating conditions are varied. These simulations confirm that the maximum segment delay is 31.3 ps and that the line spans a minimum of 378 ps, thus abiding by all design criteria.

Other considerations: An RC line has a low-pass filter behaviour. The attenuation of the high frequency signal components as the signal progresses along the line contributes to the delay characteristics of the line, due to the degradation of the edge slope it provokes. This effect should be kept small, so that the uniform line approximation we have been considering is valid. Therefore, the line should be made such that the edge slope along the line segments used for time interpolation is constant or has only a small degradation, regardless of variations of the input signal due to temperature or supply variations.

Figure 8: Delay and cumulative delay of each line segment (from simulations). Delay (ps) and cumulative delay (ps) versus segment, for typical, fast and slow conditions.

The inclusion of a leading adaptation section at the beginning of the line is a simple way of achieving this goal. This section is not directly used for time interpolation, but it adapts the signal bandwidth to the delay line's characteristics.
The signal delay due to the input adaptation section results in an added offset to the measurements; however, it does not influence the time interpolation function.

The signal's velocity increase along the line is very marked in the last interpolation segments. The adjustment function would thus generate very large multiplication factors for these segments. The resulting long microstrip segments would make an inefficient interpolator layout. The use of a trailing adaptation section that behaves as a load to the last segments of the line allows for a smaller spread of the segment delays and thus shorter final segments are possible. The length of the adaptation sections is limited by the driving capability of the input driver. The use of these adaptation sections is illustrated in Figure 9.

Figure 9: The leading and trailing adaptation sections.

The graphs in Figure 10 clearly show the effects of the inclusion of a leading and a trailing section of 79 µm length. They result from simulations of the complete interpolator, including input driver and output buffers. The segment delay sensitivity to operating conditions is minimal if these sections are included, whereas it increases if they are excluded. The absence of a trailing section also generates very small segment delays towards the end of the line.

Figure 10: Segment delay sensitivity to operating conditions (from simulations). The first and second graphs correspond, respectively, to the same line with and without leading and trailing sections. Each graph plots segment delay versus segment for 5V/25C, 4.5V/100C and 5.5V/0C.

All the graphs presented so far are obtained from simulations of the complete model of the RC delay line.
This model assumes an ideal reference plane for the distributed line, which is only roughly approximated by the lightly doped p-substrate used. In order to reduce the reference plane resistance, a wide guard-ring structure connected to ground is implemented enclosing the RC line. In this way the path of the charges displaced in the substrate as the signal progresses through the line is shortened and its effective resistance is small. The guard-ring also collects charges that are coupled to the substrate by other devices in the circuit, therefore providing a better isolation of the RC delay line.

10.2.1. Tap selection circuitry.

The selection of access points for the taps is performed after the output buffers, so that it does not influence the delay line. To achieve maximum design flexibility, it was decided that all access points be accessible to all the taps. This results in a somewhat complex connectivity and in a long serial selection chain, as shown in Figure 11. The selection of the actual access point is performed by the assertion of the respective programmable selection bit. This closes the adjoining transmission gate switch, establishing the intended connection. No hard-wired restriction exists on the parallel connection of a tap to more than one access point, which would result in a finer time interpolation. However, this option will not be used since it would require an unnecessarily complex calibration algorithm, leading to increased silicon consumption.

The programmable serial chain is quite long, having 256 bits. It should be noticed that the program word can be loaded at a low rate and that once the final calibration parameters are established, all activity in this circuitry is stopped, reducing the power dissipation and eliminating potential noise sources.

The full selection circuitry shown in Figure 11 uses 0.85 mm² of silicon.
The area occupied by this block can be reduced by limiting the selectivity of the access points.

Figure 11: The access point selection circuitry (RC delay line access points 0 to 32, serial selection chain with sel. in, sel. out and sel. strobe signals, taps 0 to 7).

10.3. Auto calibration circuitry.

The adjustable line requires some means of automatic calibration in order to be complete. The calibration procedure can be divided into two major steps. In a first step the delay line is characterised (characterisation step). These characteristics can then be used to compute the access points that tune the taps to the required position (tuning step).

Characterisation step.

The characterisation of each segment of the RC delay line could be done using the delay of one DLL cell as a reference. However, the delay of these cells suffers some variation due to mismatch and, therefore, they are not a good reference. Since the number of bins into which the reference period is divided is fixed, this knowledge can be used to derive the size of the ideal bin (LSB) and use it as a reference for calibration. A statistical code density test (CDT) [2] offers a characterisation method with the required properties and, furthermore, is easily implemented on chip.

The code density test applied to a time interpolator requires the collection of a large set of random hits. These hits are registered and the number of hits collected for each possible output code (or bin) is histogrammed. The difference between the bin contents is a direct measure of the relative size of each time bin. The histogramming can be performed for all the individual time bins in the circuit (M·N bins) to obtain a detailed characterisation of the combination of the DLL, the RC delay line and the hit registers. However, only the RC delay line must be characterised. Therefore, the values corresponding to the same RC line bin can be summed across the DLL, effectively obtaining an average measure of the line across the DLL.
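The code density test with summation across the DLL can be illustrated with a small Monte Carlo sketch in Python. Everything numerical here is an assumption made for the illustration: the eight RC line bin widths, the number of DLL cells (32), the 1% random bin mismatch and the number of hits are invented values, and the bins are simply laid end to end over one reference period. The function name `code_density` is hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def code_density(line_widths, n_cells, n_hits, mismatch=0.0, seed=1):
    """Estimate the RC-line bin sizes (in LSB) from uniformly distributed hits."""
    rng = random.Random(seed)
    m = len(line_widths)
    # True width of every (DLL cell, line bin) time bin, with random mismatch.
    widths = [w * (1.0 + rng.gauss(0.0, mismatch))
              for _ in range(n_cells) for w in line_widths]
    edges = list(accumulate(widths))            # upper edge of each time bin
    period = edges[-1]
    hist = [0] * (m * n_cells)
    for _ in range(n_hits):
        t = rng.random() * period               # hit uniformly spread over the period
        idx = min(bisect_right(edges, t), len(hist) - 1)
        hist[idx] += 1
    # Sum the bins that share the same RC-line position across all DLL cells.
    line_hist = [sum(hist[c * m + j] for c in range(n_cells)) for j in range(m)]
    return [m * h / n_hits for h in line_hist]  # sizes relative to the ideal bin

# Hypothetical line bin widths in ps (they sum to 8 * 48.8 ps).
true_widths = [40.0, 45.0, 50.0, 55.0, 48.0, 52.0, 47.0, 53.4]
estimate = code_density(true_widths, n_cells=32, n_hits=200_000, mismatch=0.01)
for w, e in zip(true_widths, estimate):
    print(f"true {w / 48.8:.3f} LSB   estimated {e:.3f} LSB")
```

Because each line bin collects hits from every DLL cell, the random cell-to-cell mismatch largely averages out in the summed histogram, which is the point made in the text.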
An added advantage is that the effects of hit register mismatch are also averaged; therefore, an accurate characterisation of the size of the RC line bins is obtained.

The size difference between bins due to mismatch of the output buffers is indistinguishable from the difference due to mismatch of the actual line segments. It is, in fact, lumped together with it and so the line characteristics obtained reflect this increased error. However, this is advantageous, since in this way the buffer delays are also calibrated.

Tuning step.

In the tuning step the measured line characteristics are analysed. Non-linearity surpassing a given limit is identified and correction measures are computed. These measures are then translated into a calibration word that is serially programmed into the adjustable delay line.

The computational complexity, and therefore the amount of hardware needed, depends on the amount of information that can be extracted from the characterisation step. A trade-off can be established between these two steps. A faster calibration requires a larger hardware block and a slower calibration can be performed with little hardware. Two calibration algorithms, representing the two extreme cases, will be presented.

In the first one, an iterative procedure is established where a global line characterisation is used to make small adjustments to the line. The procedure is repeated until the line has been tuned to the desired linearity range. This algorithm requires a small calibration hardware block, but it may result in a long calibration time for extreme parameter deviations.

In the second algorithm, a lengthy, but complete, characterisation of the line is performed. From this the calibration parameters are obtained in one step, at the expense of significant hardware requirements.

10.3.1. Calibration algorithms.
The RC delay line adjustment allows only for a discrete number of adjustment options; therefore, the accuracy of the calibration results is limited by the adjustment quantisation step. In the calibration algorithms that we developed for this purpose, the concept of tolerance, or of non-linearity limit, is used to express the maximum calibration tolerance allowed for a given application. In the case of the iterative algorithm, the calibration tolerance can be traded off for calibration time.

The algorithms presented here must be simple to implement in hardware; therefore, INL was chosen as the only accuracy criterion. Integral non-linearity error, due to its cumulative action, is the limiting factor in the overall linearity of the converter. Algorithms that also take DNL into account as an accuracy criterion are presented in Appendix G. Their hardware implementation is more complex, and their convergence slower.

Iterative algorithm.

The starting point of this algorithm is the bin size histogram, obtained after running the characterisation step with the calibration parameters extracted from simulations corresponding to the typical process and environment conditions. Each iteration of the algorithm consists of the sequential analysis of a bin to verify whether it conforms to the non-linearity limits. If this is not the case, new calibration parameters, corresponding to the addition or subtraction of one delay segment to the respective tap, are calculated. The same variation is applied to all the taps in front of it so that the time difference between these taps (the bin size) is unchanged. These steps (characterisation, analysis and tuning) are repeated until the bin linearity conditions are met. The procedure is then repeated for the next bin in the sequence.

The analysis of the linearity of a bin is based on the bin cumulative histogram ch[bin].
It is compared to the ideal histogram (developed from the knowledge of the ideal converter's bin size LSB). The following operations check whether the line conforms to the integral linearity limit and take corrective measures for the offending bins.

    for i = 0 to M-1
        tap[i] = segment_from_simulation_of_typical_conditions;
    for bin = 0 to M-2
        repeat until no_changes
            Characterisation step;
            if (ch[bin] < LSB·(bin+1-limINL))
                for i = 0 to M-bin-2
                    tap[bin+i+1] = tap[bin+i+1] + 1;
            else if (ch[bin] > LSB·(bin+1+limINL))
                for i = 0 to M-bin-2
                    tap[bin+i+1] = tap[bin+i+1] - 1;
            else
                no_changes

In Figure 12 the algorithm is illustrated. The acceptable limit of the integral non-linearity is limINL. This limit must be chosen in accordance with the calibration steps available. Limits of the order of 0.5 LSB guarantee sufficient linearity and only require a limited number of iterations per tap. The access point selection for each tap is captured in tap[i].

Figure 12: Calibration procedure for the tap selection adjustment scheme (flowchart of the operations listed above).

In Figure 13 the results of a simulated calibration run using the proposed algorithm with limINL=0.3 LSB are shown. The interpolation non-linearity is kept within the established limits (0.3 LSB). By construction, the algorithm does not search for the optimal calibration parameters; it stops immediately after the non-linearity limits have been achieved. The calibration of the particular line conditions shown required only 10 and 8 characterisation steps, respectively for the “fast” and for the “slow” parameter conditions.
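A minimal executable version of the iterative procedure is sketched below in Python. It is not the hardware implementation: the characterisation step is replaced by an ideal, noise-free read-out of cumulative access point delays, the line is assumed to have 33 access points with uniform segment delays of 12, 16 and 22 ps standing in for hypothetical "fast", "typical" and "slow" conditions, and an iteration cap is added as a safeguard. The function name `iterative_calibration` and its parameters are hypothetical.

```python
def iterative_calibration(cum_delay, start_taps, lsb, lim_inl, max_iter=100):
    """Tune the tap access points bin by bin; return (taps, characterisation steps).

    cum_delay[k] is the delay from access point 0 to access point k.
    """
    taps = list(start_taps)
    m = len(taps)
    steps = 0
    for b in range(m - 1):
        for _ in range(max_iter):
            steps += 1   # one characterisation step (idealised: no statistics)
            ch = cum_delay[taps[b + 1]] - cum_delay[taps[0]]
            if ch < lsb * (b + 1 - lim_inl):
                shift = 1        # cumulative bin too small: move taps forward
            elif ch > lsb * (b + 1 + lim_inl):
                shift = -1       # cumulative bin too large: move taps back
            else:
                break            # this bin is within the INL limit
            # Apply the same shift to this tap and to all taps in front of it.
            for i in range(b + 1, m):
                taps[i] += shift
                if not 0 <= taps[i] < len(cum_delay):
                    raise ValueError("tap moved outside the available access points")
    return taps, steps

LSB = 48.8                                            # ps
start = [round(i * LSB / 16.0) for i in range(8)]     # taps for the assumed typical line
for name, seg in (("fast", 12.0), ("typical", 16.0), ("slow", 22.0)):
    cum = [seg * k for k in range(33)]                # hypothetical uniform line
    taps, steps = iterative_calibration(cum, start, LSB, lim_inl=0.3)
    print(name, taps, "characterisation steps:", steps)
```

As in the text, the loop stops for each bin as soon as the cumulative size falls inside the limit, so it does not look for the best possible setting, and a limit tighter than the segment delay allows could make it oscillate, which is why the iteration cap is included here.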
Figure 13: Results of calibration for different conditions, using the iterative algorithm (from simulation). Two graphs with a vertical scale from -0.5 to 0.5 versus RC bin (0 to 6), for typical, fast and slow conditions.

The definition of the calibration starting point as being the typical calibration parameters reflects the probability of starting the iteration close to the final result. In fact, any starting point could be used since it would only affect the speed of convergence of the algorithm.

If tighter linearity limits are enforced, it is possible to obtain better results. The graphs in Figure 14 were obtained with limINL=0.1 LSB for the “fast” conditions. However, in worst case conditions (“slow”) this limit cannot be enforced, since the delay line segments are longer than that limit. If the linearity limit is set too tight, then the convergence of the simple algorithm proposed here may not be guaranteed. A simple way to solve this problem is not to allow the algorithm to oscillate between two calibration settings for any bin.

The DNL graphs obtained after calibration was performed are also shown. They emphasise the fact that, due to the regularity of the structure, the maximum DNL is smaller than what would theoretically be its limit (2·limINL).

Figure 14: Results of calibration using the optimum linearity limit (from simulation). Two graphs with a vertical scale from -0.5 to 0.5 versus RC bin (0 to 6), for typical, fast and slow conditions.

Single step algorithm.

The first step of this algorithm is a detailed characterisation of the RC delay line, where the sizes of all the line segments are histogrammed. It is then possible to select the tap access points that lead to the best interpolation linearity.
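Once the cumulative delay to every access point is known, choosing the access points reduces to picking, for each tap, the access point closest to the ideal position i·LSB. The Python sketch below illustrates this nearest-point selection under assumptions: the cumulative delays are taken as exact numbers rather than histogram counts, and the example line is a made-up uniform one with 22 ps segments. The function name `single_step_taps` is hypothetical.

```python
from bisect import bisect_left

def single_step_taps(cum_delay, lsb, m):
    """For each tap i, pick the access point whose cumulative delay is nearest to i*lsb.

    cum_delay[k] is the (increasing) delay from access point 0 to access point k.
    """
    taps = [0]
    for i in range(1, m):
        target = lsb * i
        hi = bisect_left(cum_delay, target)   # first access point at or beyond target
        if hi >= len(cum_delay):
            raise ValueError("line too short to place tap %d" % i)
        lo = hi - 1                           # last access point before the target
        if target - cum_delay[lo] < cum_delay[hi] - target:
            taps.append(lo)
        else:
            taps.append(hi)
    return taps

# Hypothetical line: 33 access points separated by 22 ps.
cum = [22.0 * k for k in range(33)]
taps = single_step_taps(cum, lsb=48.8, m=8)
print(taps)
print([round(cum[t] - 48.8 * i, 1) for i, t in enumerate(taps)])  # residual INL in ps
```

With this rule the residual error of each tap is at most half of the local segment delay, which is why the result depends on the line conditions only through the segment size.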
To characterise the 32 line segments into which the delay line is divided using only the 8 taps available, 5 characterisation steps are needed. The small overlap between the ranges that each step covers is required to guarantee that the segments at the extremities of each range are also covered. After these 5 characterisation steps, all the information required to build a cumulative histogram of the segment size is available and it is sufficient to compare this histogram with the ideal cumulative bin size curve to derive the desired access points.

In the next few lines, an algorithm that finds the best possible calibration parameters for the line, regardless of the particular conditions, is schematically presented. The algorithm finds the tap access points that result in the nearest approximation to the ideal cumulative bin size curve.

    tap[0] = 0;
    for i = 1 to M-1
        for segment = 0 to 31
            if (ch[segment] < LSB·i & ch[segment+1] > LSB·i)
                if (LSB·i - ch[segment] < ch[segment+1] - LSB·i)
                    tap[i] = segment;
                else
                    tap[i] = segment+1;

In Figure 15 the results of a simulated calibration of the delay line using this algorithm are shown. The emphasis on minimising the integral non-linearity of the line is clearly seen in the graphs. The differential non-linearity is, in any case, kept within the accepted limits for any conditions. The same simulation conditions as before were used. Comparison with the results obtained using the iterative algorithm shows that, if the linearity limits enforced when using that algorithm are tight enough, then similar results are obtained, as would be expected.

Figure 15: Results of calibration for different conditions (from simulation). Two graphs with a vertical scale from -0.5 to 0.5 versus RC bin (0 to 6), for typical, fast and slow conditions.

10.3.2. Hardware implementation.
Two variables determine the silicon area required to implement these calibration algorithms: the amount of memory needed and the complexity of the calculations needed. These may be traded off against calibration time.

To determine the amount of memory needed, the number of hits n that must be collected is determined from the formula developed in Appendix D:

$n \geq \left( \frac{z_{\alpha/2}}{\beta} \right)^{2} \cdot \left( \frac{1}{p} - 1 \right)$.

We will consider the same tolerance (β=5% of the final bin) and confidence level (98%, corresponding to α=2%) for both cases, so that the number of required hits depends only on the bin size that is to be characterised. In the iterative procedure the bin to be characterised corresponds to the interpolator’s LSB, with a hit probability p=1/M=0.125. In the single step procedure all the line segments must be characterised, regardless of the particular working conditions. The minimum bin that must be accurately characterised is then ~10ps wide, corresponding to a hit probability p=10/(LSB·M)=0.0256.

The following table summarises the relevant numbers obtained when these calculations are carried out. The tolerance for the INL measurements is obtained using the expressions that were also developed in Appendix D.

    algorithm     confidence level   tolerance (DNL)   number of bins   number of hits   n (hits = 2^n)   tolerance (INL)
    iterative     98%                5% (LSB)          7                <16383           14               13% (LSB)
    single step   98%                11% (seg.)        32               <16383           14               62% (seg.)

Table 1: Comparison of the two proposed algorithms.

In this table the tolerance is measured as a fraction of the quantity being measured: one LSB (~48.8ps) for the iterative algorithm and one minimum segment delay (~10ps) for the single step algorithm. The same reasoning used for the determination of the tolerance of the INL measurements leads to the conclusion that the addition of a number of line segments to obtain the calibrated bin results in similar final DNL and INL measurement tolerances, expressed in LSB, for both algorithms.
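The hit count estimate can be checked numerically. The sketch below assumes that the Appendix D bound has the form n ≥ (z_{α/2}/β)²·(1/p − 1), with z_{α/2} the standard normal quantile; the function names `required_hits` and `tolerance` are hypothetical helpers introduced only for this illustration.

```python
from statistics import NormalDist


def required_hits(p, beta, alpha):
    """Hits needed to measure a bin of hit probability p within relative
    tolerance beta at confidence level 1 - alpha (assumed form of the bound)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (z / beta) ** 2 * (1.0 / p - 1.0)


def tolerance(p, n, alpha):
    """Relative tolerance reached with n hits (same bound, solved for beta)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * ((1.0 / p - 1.0) / n) ** 0.5


# Iterative algorithm: one LSB out of M = 8 bins, 5% tolerance, 98% confidence.
n_iterative = required_hits(p=1 / 8, beta=0.05, alpha=0.02)

# Single step algorithm: ~10 ps segment in a 48.8 ps * 8 range, 2^14 hits.
beta_single = tolerance(p=10 / (48.8 * 8), n=2 ** 14, alpha=0.02)
```

Under these assumptions the iterative case needs slightly more than 15,000 hits, which fits in a 14-bit counter, and 2^14 hits give a tolerance of about 11% of a segment for the single step case, in line with Table 1.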
The register requirements for bin storage and histogram build-up for the two architectures are shown in Table 2. To each of these registers corresponds an equal length accumulator.

    algorithm     histogram                cumulative histogram     total (bits)
                  number   size (bits)     number   size (bits)
    iterative     7        12              1        14              98
    single step   32       11              1        14              366

Table 2: Register (accumulator) requirements for the two proposed algorithms.

The other comparison item is the complexity of the computing needed for each algorithm. The iterative algorithm, as shown in Figure 12, requires only a few comparators (see Table 3), one accumulator per bin, and a small amount of decision logic. The single step algorithm needs a larger arithmetic unit, capable of performing the more complex decisions required. The silicon area that it uses is therefore much bigger than in the case of the simple iterative algorithm.

    algorithm     number   size (bits)
    iterative     2        14
    single step   4        14

Table 3: Comparator requirements for the two proposed algorithms.

The time used by each calibration algorithm is, to a large extent, determined by the hit collection time. The iterative algorithm does not have a fixed number of characterisation runs, so the calibration time will vary with the actual conditions found. However, if the number of iterations is f, then the time is proportional to f·2^14, whereas the single step algorithm takes a time proportional to 5·2^14, where the constant of proportionality is the collection time of a single hit. It is therefore clear that only if more than 5 characterisation steps are required (f>5) is the iterative algorithm slower than the single step algorithm.

Chapter 11. Adjustable RC Delay Line using a Variable Lumped Capacitor Scheme.

In this chapter an RC delay line adjustment scheme using banks of selectable capacitors will be analysed in detail. We follow the same analysis method that was pursued for the tap selection adjustment scheme.
We will concentrate only on the features that differ from the previous chapter, referring to it for the relevant common topics.

11.1. Lumped capacitor delay line.

In the lumped capacitor adjustable delay line scheme, the adjustment of the RC line is performed by lumped load variation. This load is an important contributor to the overall delay, therefore the uniform line approximation previously used is no longer valid. A slightly different set of design rules applies to this line:

Definition of the line width: The width of the microstrip line is mainly defined by layout considerations. It should result in a good compromise between two conflicting requirements. The line should be kept wide enough to render dimensional uncertainties due to IC processing small¹ and to lower the contact resistivity. However, it should be made narrow so that its overall capacitance is small and small selectable capacitors can be used to adjust its delay. The capacity of the unit capacitor used is ~37.5fF, therefore a line width of 40µm results in an acceptable calibration sensitivity.

Definition of the number of access points: The number of access points is predefined by the intended interpolation factor M. The number of required access points is M=8, corresponding to M-1=7 line segments. The RC line dimensions must match the dimensions of the output buffer and associated delay adjustment circuitry. The 7 segments into which the line is divided include a polysilicon microstrip line and a metal1 spacer.

¹ This condition is not strictly necessary since any delay mismatch due to these uncertainties can be corrected during calibration. However, to enable the utilisation of the calibration parameters derived for one channel in several channels, it is convenient to minimise the mismatch between delay lines.
As will be shown later, the last taps along the line have smaller adjustment sensitivity (see Figure 5), since their delay can only be adjusted by varying the capacitors in front of them. To extend the adjustment range of the last taps, an extra adjustment point is introduced after the last access point. For reasons of symmetry of the timing characteristics of the line, this adjustment point is treated as another access point. Therefore the number of access points implemented is M+1=9, the line being divided into 8 segments.

Definition of the line length: The total line length is defined as for the previous scheme. A total delay of ~350ps, corresponding to 7 segments of 48.8ps, must be covered, regardless of operating conditions. A parametric simulation of the complete interpolator model was again used to obtain the correct overall delay. However, since similar segments are used, the delay of each of the line segments changes considerably along the line. Given a pitch of the adjustment circuitry of 50µm and typical working conditions, the required overall line delay is obtained when each segment is made of a polysilicon microstrip 35µm long and a metal1 spacer of 15µm. The middle calibration parameters are used, resulting in a capacitance of ~150fF connected to each adjustment point.

Adjustment of the delay of the line segments: The distributed line parameters are not dominant in this scheme, therefore the procedure previously used is not accurate. It results in a rough first approximation that should be improved by means of parametric simulations. These simulations include the lumped capacitors that make up the calibration scheme. The calibration is performed by addition, or subtraction, of ~37.5fF unit capacitors from a bank middle capacity value of ~150fF. The multiplication factors obtained by applying the previously developed transformation function to this line, and the ones actually implemented, are shown in Figure 1.
Figure 1: Adjustment function values (calculated and actually implemented). [Plotted against segment (0 to 7); series: Calculated, Actual, Buffer Pitch.]

The size of the RC line bins, after the adjustment has been performed, is shown in Figure 2. The effects of the parameter spread due to IC processing are clearly visible in the first graph. In the second graph only the environment conditions are changed; the calibration parameters are the same for all conditions. It demonstrates that extreme environment conditions cause only a minor variation of the delay.

Other considerations: The same considerations developed for the previous scheme lead to the inclusion of leading and trailing sections 210µm long. A longer leading section would reduce the effect of varying input signal characteristics (due to environment changes) on the delay line. However, the driver capabilities would be unnecessarily stretched by this increase in output load.

Figure 2: Bin size (from simulation). The first graph compares different design corners. The second graph shows the effects of extreme environment variation for the typical process. [Two panels plotted against binRC; first panel curves: typical, slow, fast; second panel curves: 5V/25C, 4.5V/100C, 5.5V/0C.]

11.1.1. Lumped capacitor selection circuitry.

The variable capacitors implemented in each of the 9 access points are made of a bank of 7 unit sized capacitors that can be selectively connected to the RC delay line. The selection of the number of bank capacitors that are connected to the line is binary encoded. It is therefore possible to select 8 discrete capacitance levels, resulting in a ±3 level selection range. The capacitor bank is schematised in Figure 3. Each capacitor is made of a square 16µm PMOS device working in accumulation mode.
This mode of operation results in a more linear and fast capacitor, since the accumulation of charges under the gate guarantees their immediate availability. The temperature and supply voltage sensitivity of devices operating in accumulation mode is very low. Furthermore, the n-well in which they are built increases their isolation from substrate noise. In typical conditions, each of these capacitors has ~37.2fF of capacitance. Unit-sized capacitors are used instead of scaled single capacitors to guarantee good matching of their values.

Figure 3: The unit capacitor bank. [Schematic labels: from line, to hit registers; capacitor groups 1x, 2x and 4x selected by cal<0>, cal<1> and cal<2>.]

The selection of capacitors is made using an NMOS pass-transistor. This transistor is sized to have a high source-drain conductance. The conductance of a device is sensitive to temperature and supply variations: the quantity of thermally generated carriers and the saturation velocity of the carriers in the channel are a function of the device temperature. The electric field across the channel is a function of the gate voltage, itself proportional to the supply voltage. The conductance of the pass-transistor must be high enough to minimise the effects of these variations.

The pass-transistor is cut off during a part of the signal excursion. In fact, as the input signal rises and the voltage on the gate of the capacitor follows, the Vgs of the pass-transistor is reduced. When it is smaller than the threshold voltage Vth, the pass-transistor cuts its channel, therefore isolating the line from the adjustment capacitor. This, however, does not affect the timing characteristics of the line, since it occurs well after the threshold voltage of the output buffer has been crossed. The signal edge progressing towards the following taps is not affected by the variations on the line characteristics occurring in the section of the line already crossed.
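The binary-encoded bank selection can be illustrated with a short Python sketch. It assumes the 1x, 2x and 4x grouping suggested by the cal<0>, cal<1> and cal<2> labels of Figure 3 and uses the approximate ~37.5fF unit value quoted for the unit capacitor; the function name `bank_capacitance_ff` is hypothetical, and the fixed buffer and diffusion capacitance is ignored.

```python
UNIT_CAP_FF = 37.5  # approximate unit capacitor value (assumption: typical conditions)


def bank_capacitance_ff(cal0, cal1, cal2):
    """Capacitance connected to the line for the 3-bit code cal<2:0>.

    cal0, cal1 and cal2 switch groups of 1, 2 and 4 unit capacitors.
    """
    units = 1 * cal0 + 2 * cal1 + 4 * cal2
    return units * UNIT_CAP_FF


# The 8 selectable levels, from no capacitor connected to all 7 units connected.
levels = [bank_capacitance_ff(c & 1, (c >> 1) & 1, (c >> 2) & 1) for c in range(8)]
```

In this sketch, code 4 gives 150fF, which matches the bank middle capacity value of ~150fF used as the nominal setting.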
In addition to the bank capacitance, a fixed capacitance due to the output buffer and to the diffusions of the pass transistors is also connected to the line.

Figure 4: The lumped capacitor selection circuitry. [Schematic labels: R-C delay line, access points, serial selection chain, taps.]

In Figure 4 the lumped capacitor selection circuitry is shown. Each capacitor bank is represented by a variable capacitor. The capacitor bank connected to tap0 is included only for layout symmetry purposes, since it does not affect the tap delay. This adjustment scheme gives a compact layout, the selection circuitry of each tap requiring only 6620µm² of silicon, resulting in a total area of 0.25mm².

11.2. Auto calibration circuitry.

The auto calibration procedure follows the same basic steps previously described. It starts by characterising the line and proceeds to tune the calibration parameters in order to make the integral and differential non-linearity of the line smaller than a pre-determined limit.

In this scheme the sensitivity of the delay between two taps (bin size) to a unit variation in a given capacitor bank is a complex function of the distance between the tap and the capacitor bank being changed and of the position of the capacitor bank within the line. The graph in Figure 5 summarises the tap delay sensitivity to a unit change in each capacitor bank. It is not practical to identify all combinations of bin size sensitivity in all environment conditions. The calibration procedure must therefore be able to tune the size of the bin without this knowledge. An iterative procedure that follows the two step characterisation/tuning scheme is proposed to obtain the correct calibration parameters.

Figure 5: The effects of lumped capacitor unit variation in the bin size (from simulation). [Plotted against binRC (0 to 6); curves: cap 1 to cap 8, and all.]
The adjustment capacitor banks (cap1-7) are located, respectively, in tap1-7, and an extra capacitor bank (cap 8) is included at the end of the line to enable a wider tuning range of the last tap. The graph in Figure 5 shows that the sensitivity of the bin size increases as the varying capacitor is closer to it and that the cumulative effect of a unit variation in all capacitor banks is quite independent of the bin under consideration. The graph also shows that capacitor variations occurring before the bin under consideration do not change its size. The reason for this is that the properties of a signal propagating on an RC line are dominated by the characteristics of the section of the line that lies ahead of it. There is a small contribution from the line section behind it through signal attenuation and edge slope degradation. However, for the short line under consideration, these effects are small.

11.2.1. Calibration algorithm.

The starting point of the algorithm is the delay histogram obtained after running the characterisation step using the smallest capacitor selection in every bank. With these calibration settings the overall delay of the line and of the individual bins is guaranteed to be shorter than the required delay, regardless of the operating conditions.

The calibration sequence tries to tune the delay line to the linearity limits following two procedures sequentially. In the coarse tuning procedure, the overall line delay is increased until it is close to the desired delay. The following fine tuning procedure individually adjusts the delay of each tap to make them conform to the linearity requirements. Delay tuning using this sequence is preferred to the use of the single fine tuning procedure because it results in faster convergence and, therefore, in better results.

Coarse tuning procedure.
In this procedure the capacity of all the banks is simultaneously incremented by one unit capacitor, resulting in a uniform increase of the size of all bins. The procedure is repeated until the cumulative bin size is smaller than the ideal delay by less than a determined limit limcoarse, which is set to 1LSB. In the following lines the procedure is schematically described:

    for bank= 1 to M
        cap[bank]= 0;
    repeat until ( ch[M-2] > LSB·( M-1-limcoarse ) )
        Characterisation step;
        for bank= 1 to M
            cap[bank]= cap[bank]+1;

The calibration parameters for each capacitor bank are described by cap[bank] and ch[M-1] is the cumulative bin size histogram. A block diagram of the algorithm is shown in Figure 6, where the characterisation step is represented by the Code Density Test it performs.

When coarse tuning has been completed, the size of each bin is similar for all bins in the line, to the extent of its matching characteristics. The delay error is therefore evenly divided among all the bins. The average differential non-linearity is then small and so the fine tuning procedure can mainly concentrate on adjusting the integral non-linearity.

Figure 6: The coarse calibration procedure. [Flowchart: initial calibration; repeat until changes=0: CDT; if cumulative histogram[M-2] < (M-1-limcoarse)·LSB then, for bank=1..M, cap[bank]= cap[bank]+1 and changes= 1.]

Fine tuning procedure.

After coarse delay tuning, the fine tuning procedure can be used. Each bin is evaluated one by one and a new set of calibration parameters is iteratively determined to adjust the line delay. The fine tuning procedure builds on the results obtained with the coarse procedure. Each bin is sequentially evaluated to determine if it adheres to the linearity limits. If that is not the case, the capacity of the respective capacitor bank is increased by one unit. This unit increase is repeated for all subsequent banks until a satisfactory result is obtained.
Changing the capacitance of a capacitor bank affects all the bins that precede it in the line. However, since the coarse adjustment step guarantees that the line is shorter than the ideal line and that all the bins have similar size, this effect contributes to improving the linearity of the line. The fine calibration algorithm is schematically presented in the next few lines. limINL is the differential and integral linearity limit.

    for bin= 0 to M-2
        bank= bin+1;
        repeat until ( no_changes | bank> M)
            Characterisation step;
            if( ch[bin] < LSB·( bin+1-limINL ))
                cap[bank]= cap[bank]+1;
                bank= bank+1;
            else
                no_changes

This algorithm approaches the final calibration solution by small increases in the size of the bin, therefore only the lower limits to the linearity need to be checked. The tap delay increase per fine characterisation step is not enough to surpass the upper linearity limits, in any conditions. In Figure 7 a diagram of the fine calibration algorithm is shown.

Figure 7: The fine calibration procedure. [Flowchart: from coarse calibration; for bin=0..M-2: bank= bin+1; repeat until changes=0 | bank>M: CDT; if cumulative histogram[bin] < (bin+1-limINL)·LSB then cap[bank]= cap[bank]+1, bank= bank+1 and changes= 1.]

In the graphs of Figure 8, the results of the coarse calibration step are shown for different simulation conditions. Since calibration started from the shortest possible line configuration, all the bins are smaller than intended and the expected downward slope of the INL curve is found.

Figure 8: Results of the coarse calibration step for different conditions using the proposed algorithm (from simulation). [Two panels plotted against binRC; curves: typical, fast, slow.]

Using restrictive limits in the fine calibration steps, an optimised calibration can be obtained. In Figure 9 the results of the fine calibration step are shown.
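The coarse and fine tuning procedures described above can be tried out on a toy delay model. The Python sketch below is not the thesis implementation: the base bin sizes `BASE`, the sensitivity `SENS`, the simplification that a bank only affects the two bins immediately before it, the limit lim_inl=0.2LSB and the guard `cap_max` are all assumptions introduced for illustration, and `characterise` merely stands in for the code density test.

```python
M = 8        # interpolation factor: bins 0..M-2, banks 1..M (bank M is the extra one)
LSB = 48.8   # ideal bin size in ps
BASE = [30.0, 31.0, 29.0, 30.0, 32.0, 30.0, 29.0]  # hypothetical bin sizes with all caps at 0
SENS = 4.0   # hypothetical ps per unit capacitor for the nearest preceding bin


def characterise(cap):
    """Toy stand-in for the code density test: cumulative bin sizes ch[0..M-2]."""
    ch, total = [], 0.0
    for b in range(M - 1):
        # Assumption: bin b is stretched by bank b+1 and, half as much, by bank b+2.
        total += BASE[b] + SENS * cap[b + 1] + 0.5 * SENS * cap[b + 2]
        ch.append(total)
    return ch


def coarse_tune(lim_coarse=1.0, cap_max=7):
    """Increment all banks together until the line is within lim_coarse LSB."""
    cap = [0] * (M + 1)  # cap[0] is unused so that banks are numbered 1..M
    while characterise(cap)[M - 2] < LSB * (M - 1 - lim_coarse) and cap[1] < cap_max:
        for bank in range(1, M + 1):
            cap[bank] += 1
    return cap


def fine_tune(cap, lim_inl=0.2, cap_max=7):
    """For each bin, add one unit to successive banks while the bin is too short."""
    cap = list(cap)
    for b in range(M - 1):
        bank = b + 1
        while bank <= M and characterise(cap)[b] < LSB * (b + 1 - lim_inl):
            if cap[bank] < cap_max:
                cap[bank] += 1
            bank += 1
    return cap


coarse = coarse_tune()
fine = fine_tune(coarse)
inl = [(c - (b + 1) * LSB) / LSB for b, c in enumerate(characterise(fine))]
```

In this toy model the coarse step stops with two units in every bank, and the fine step adds one more unit to banks 2 to 8, leaving every tap short of its ideal position by less than 0.2LSB. Only the lower limit is checked, as in the procedure described in the text.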
The linearity limit limINL was set to 0.1LSB. In extreme conditions this limit proves to be too strict for the simple algorithm proposed. However, the linearity of the line after calibration is better than 0.2LSB, in any conditions.

Figure 9: Results of the fine calibration for different conditions using restrictive linearity limits (from simulation). [Two panels plotted against binRC; curves: typical, fast, slow.]

The number of calibration steps required for each of these conditions was 12, 15 and 7 steps, respectively for the “typical”, “fast” and “slow” simulation conditions. Since the calibration algorithm begins with the calibration settings resulting in the fastest possible RC delay line, the simulation conditions that lead to a slower starting point (“slow” conditions) require fewer calibration steps to converge to the final calibration settings.

11.2.2. Hardware implementation.

This calibration algorithm is quite similar to the iterative algorithm proposed for the tap selection implementation of the line. The hardware requirements are also similar, since the number of taps to tune and the number of hits that should be collected for line characterisation are the same. The following tables summarise the hardware requirements in terms of registers (and respective accumulators) and comparators. Requirements in terms of control logic are similar to those of the iterative calibration algorithm proposed for the tap selection adjustment scheme.

    algorithm     histogram                cumulative histogram     total (bits)
                  number   size (bits)     number   size (bits)
    iterative     7        12              1        14              98

Table 1: Register (accumulator) requirements for the present algorithm.

    algorithm     number   size (bits)
    iterative     1        14

Table 2: Comparator requirements for the present algorithm.

11.3. Comparing the two adjustment schemes.
A simple comparison of the two adjustment schemes proposed in this part of the thesis shows that it is possible to adjust the linearity of the RC delay line to the desired values. Although the calibration aims at obtaining a small integral non-linearity, the differential non-linearity that is achieved under any simulation conditions is also small. Simulations show that using the lumped capacitor scheme leads to better final results. However, these results are obtained at the expense of a longer calibration time.

Due to the independence of the calibration of each tap, the calibration principle of the tap selection scheme is simple. The limit for the linearity that can be achieved with the RC delay line is determined by the number of access points that are implemented.

In the lumped capacitor scheme, the calibration of each tap is not independent, its change affecting several taps differently. The calibration algorithm takes into account all these effects, therefore its working principle is more complex and the calibration time is longer. However, due to the multiple combinations of effects that can be used, the final RC delay line linearity is potentially better.

Chapter 12. Experimental Results.

In this chapter the results of tests performed on the TDC prototype are reported. The test procedure followed is very similar to the one detailed for the ADLL prototype in the previous part of this work, so it will not be described again. The performance of the two interpolation topologies will be shown separately. Their evaluation follows the same criteria: linearity, temperature sensitivity, power dissipation and timing resolution.

The calibration algorithms for the RC delay line were implemented in software, which has the advantage of allowing for high flexibility. For example, the calibration limits can be easily adjusted to the performance required.
To generate the random hits, both an external pulse generator and an internal oscillator were used, without any noticeable difference. A set of 600,000 random hits is used to characterise the delay line. According to the calculations of Appendix D, this results in a 98% confidence level that the measured results are correct within a tolerance of 0.8% (DNL) and 2.2% (INL). It should be noted that when characterising the complete converter, a tolerance of 3.4% (DNL) and 19.4% (INL) is obtained for the same confidence level.

12.1. Tap selection scheme.

The graphs in Figure 1 illustrate the results of the calibration of the delay line implementing the tap selection adjustment scheme, obtained using the iterative calibration algorithm. The graph labelled “before” represents the state of the line before calibration. In this situation the calibration parameters resulting from simulations of the typical conditions are used. The other graph (labelled “after”) is the final result of the calibration. Differential and integral non-linearity of the RC delay line better than ±0.2LSB is achieved. It is noteworthy that the linearity of the line prior to calibration is close to the traditional 0.5LSB acceptance limit, which shows that the models used to describe the delay line are quite accurate.

Figure 1: Delay line calibration results: DNL and INL graphs. [Two panels plotted against binRC; curves: before, after.]

The DNL graph is repeated in Figure 2, together with the maximum and minimum delay measured for each tap in every hit register column. The spread in the measured delay results from timing mismatch of the hit registers corresponding to the same tap. It shows a maximum timing error spread of ~0.55LSB (27ps).
The delay of the last tap (tap 8) is defined by the difference between the propagation delay of the hit signal along the RC delay line and the propagation delay of the clock signal along one DLL delay cell. The variation of its delay includes, therefore, a contribution from the delay mismatch of the DLL delay cells, which cannot be distinguished from the other contributions. Therefore it is not shown in this graph.

Figure 2: Spread of the RC line tap delay over the DLL cells. [Plotted against binRC (0 to 7); curves: min, max, ave.]

The RC delay line was measured at several temperature conditions to verify its immunity to temperature changes. The results are shown in the graphs of Figure 3. The circuit was heated up to the specified temperatures using a heat source that could be moved closer to or further away from the circuit. The temperature was measured directly on the package using an electronic thermometer. Only after the selected temperature stabilised was the characterisation performed. A different chip was used in this test, therefore the linearity graphs have different shapes from the ones previously shown. However, it is clear that the calibration procedure used also resulted in good RC delay line linearity.

Figure 3: Temperature dependency of the RC delay line. [Two panels plotted against binRC; curves: 30C, 40C, 50C, 60C.]

The delay variation of the complete line is measured from the variation of the delay of the last tap. This method is valid since the last tap is defined in one extreme by the temperature independent delay of the DLL delay cell. Any variation of the delay of the line will be reflected in a symmetric variation of the delay of the last tap.
A total variation of 17.3% of an LSB is observed for a temperature increase of 30°C, which means that the delay of each RC line tap increased on average by ~2.5%. This result can be extrapolated to the complete temperature range, resulting in a temperature sensitivity of only 0.83% per 10°C.

Voltage supply sensitivity was also investigated. The procedure used was to characterise the delay line at different supply levels, within the allowed range for the technology. No significant delay variation was observed.

12.1.1. The complete interpolator.

The RC delay line is an integral part of the time interpolator. Their correct integration is proven by the linearity graphs of the time-to-digital converter built from it. The graphs of Figure 4 correspond to the DNL and INL of the converter.

Figure 4: DNL and INL graphs of the converter (using the tap selection adjustable delay line). [Two panels plotted against bin (0 to 128).]

A maximum integral non-linearity INLmax=1.12LSB and differential non-linearity DNLmax=0.72LSB were measured. The non-linearity is a result of the delay mismatch of the DLL cells. Consequently, as shown in the graphs, it is found in taps corresponding to the transitions between successive DLL delay cells. The measured DLL delay cell mismatch is 3-4% (RMS), slightly larger than expected. It was shown in Chapter 6 that the contribution of the DLL cell mismatch σDNLDLL to the converter non-linearity σINLconvert. is determined by the following expression:

$\sigma_{INL_{convert.}} = \sigma_{DNL_{DLL}} \cdot M \cdot \frac{\sqrt{N}}{2}$.

Therefore, disregarding the contribution of the RC delay line, a maximum converter non-linearity of 0.5LSB (50%) requires a DLL cell mismatch smaller than 3.1%. Since this matching level has not been obtained, the integral non-linearity of the interpolator is larger than the goal of ±0.5LSB.
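The 3.1% figure can be reproduced with a few lines of Python. This is a sketch under assumptions: the expression is taken to be σINLconvert. = σDNLDLL·M·√N/2 (the form that yields the quoted 3.1%), and N=16 DLL delay cells is inferred from the 16 DLL bins and the 128 converter bins, not stated in this passage. The function names are hypothetical.

```python
import math


def converter_inl_sigma(sigma_dnl_dll, m, n):
    """Converter INL (in LSB) caused by a relative DLL cell mismatch (assumed form)."""
    return sigma_dnl_dll * m * math.sqrt(n) / 2.0


def max_cell_mismatch(inl_target_lsb, m, n):
    """Largest relative DLL cell mismatch compatible with an INL target in LSB."""
    return 2.0 * inl_target_lsb / (m * math.sqrt(n))


M_INTERP = 8   # interpolation factor of the RC delay line
N_CELLS = 16   # assumed number of DLL delay cells

limit = max_cell_mismatch(0.5, M_INTERP, N_CELLS)         # 0.03125, i.e. ~3.1%
inl_at_3p5 = converter_inl_sigma(0.035, M_INTERP, N_CELLS)  # mismatch in the measured 3-4% range
```

With a mismatch of 3.5%, in the middle of the measured 3-4% range, the same expression gives about 0.56LSB, which is consistent with the statement that the ±0.5LSB goal was not reached.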
Figure 5: INL of the DLL, showing spread of the tap delay along the hit register rows. [Plotted against binDLL (0 to 15); curves: max, min, ave.]

The integral non-linearity graph of the DLL is shown in Figure 5. The spread of the DLL tap delay along the eight hit register rows is also shown. The delay difference between these eight samples of the DLL is due to the mismatch of the hit registers, which leads to different sampling times for each tap in different rows. The maximum spread that was observed is 0.06LSBDLL (~25ps), which corresponds to 0.51LSB. This result agrees with what was previously obtained from the different samples of the RC delay line (see Figure 2).

In Figure 6 the integral non-linearity graph of the interpolator is superimposed on the one of the DLL. The interpolator closely follows the DLL non-linearity, as would be expected since the non-linearity of the RC delay line can only accumulate along its limited length.

Figure 6: Comparison of the INL graphs of the DLL and of the complete converter. [Plotted against bin (0 to 128) and binDLL (0 to 16); curves: DLL, converter.]

A statistical test such as the code density test just described is, by its nature, insensitive to random effects. This is an advantage when static characteristics are being measured. However, it is important to verify that none of the random noise mechanisms, such as electrical noise or phase noise (jitter), significantly degrades the dynamic characteristics of the converter. The effects of clock correlated noise can be interpreted as a static degradation mechanism, since they interact with the measurement the same way every reference period. They are, therefore, captured by the statistical tests.
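For reference, the way a code density test turns a hit histogram into DNL and INL values can be sketched in Python as follows. The sketch assumes hits that are uniformly distributed in time, so that the count in each bin is proportional to the bin width, and it defines the INL as the running sum of the DNL; the function name `dnl_inl` and the example counts are hypothetical.

```python
def dnl_inl(histogram):
    """Return (DNL, INL) in LSB from the per-bin hit counts of a code density test."""
    ideal = sum(histogram) / len(histogram)  # expected count of an ideal bin
    dnl = [count / ideal - 1.0 for count in histogram]
    inl, running = [], 0.0
    for d in dnl:
        running += d          # INL at the end of each bin is the accumulated DNL
        inl.append(running)
    return dnl, inl


# Example: four bins, the second 10% too wide and the third 10% too narrow.
dnl, inl = dnl_inl([100, 110, 90, 100])
```

Because every hit contributes to exactly one count, random noise averages out in the histogram, which is why this kind of test captures static errors but not jitter.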
Figure 7: Conversion error (σ=0.51LSB). [Histogram plotted against error (LSB), -2 to 2.]

A linear time sweep covering the complete clock period was performed. During this test 26,000 samples were collected, corresponding to 10 samples per step of ~2.4ps. The histogram of Figure 7 represents the conversion error along the full dynamic range of the interpolator. The distribution of the error has an RMS of σ=0.51LSB, with tails extending to ~1.5LSB.

The same test was performed at different temperatures, to prove that the conversion error is not affected by temperature variations. The resulting histograms, displayed in Figure 8, show that only minimal temperature sensitivity is found. Temperature sensitivity of the RMS error is ~1.3% per 10°C.

Figure 8: Temperature effects on the conversion error (σ=0.50LSB/30°C and σ=0.52LSB/60°C). [Histograms plotted against error (LSB), -2 to 2; curves: 30C, 60C.]

It may be interesting to evaluate the dynamic performance of the DLL itself, to understand its contribution to the overall conversion error. In Figure 9 the characteristic step-wise transfer function that results from a DLL time sweep is shown. Phase noise (jitter) present on the reference clock itself, or due to the dynamics of the DLL, forces the output code transitions to jitter around their average value. If a number of DLL samples is taken close to the expected transition time, the output will vary between the two codes due to jitter. Variations due to the test set-up, such as small changes of the sampling time itself, are also included in this result, since they are indistinguishable from variations due to intrinsic jitter.

This test enables the measurement of the DLL’s internal jitter. The time interval in which the output code uncertainty occurs corresponds to the peak-peak jitter seen on that transition. The maximum uncertainty is expected in the last transition.
This can be verified in the graphs of Figure 10, which show the two code transitions occurring at the opposite extremes of the delay chain. (Tap 0 was implemented at the end of the delay chain and is therefore the tap with the worst jitter. For convenience, bin 15 is renamed bin -1.)

Figure 9: DLL linear time sweep (output code plotted against the sweep step).

The second graph in that figure is a magnification of the transition from code -1 to 0, representing the jitter at tap 0. The “trend” line in that graph represents the relative number of samples in the two consecutive codes. From this curve, the average transition instant can be extracted and so the deviation of the transition occurrence (the jitter) is readily obtained. The peak-peak jitter for these two transitions was measured to be, respectively, 14.4ps and 19.2ps. To perform these measurements, 100 samples were taken for each time step of 2.4ps (equivalent to 4 “trombone” steps).

The maximum jitter is measured to be σ_jitter = [illegible] (in LSB_DLL). The jitter that is observed in the first cell (σ_ref = [illegible]) is attributed to the jitter of the reference clock as it arrives at the delay chain. At the end of the chain the dynamics of the DLL increase the uncertainty of the transition time. Assuming (optimistically) that these two sources of jitter are uncorrelated, the jitter generated by the activity of the DLL closed loop is σ_loop = [illegible].

Figure 10: Detail of the DLL time sweep showing code transitions in opposite extremes of the delay chain (data and trend traces plotted against the sweep step).

The DLL conversion error histogram in Figure 11 is obtained from the same set of data as the one in Figure 7. It shows that the conversion error of the DLL considered independently has an RMS of σ_DLL=0.29LSB_DLL, with very small tails.
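The extraction of the transition jitter from the “trend” curve, as described above, can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used for the measurements: the function name `transition_jitter`, the input format (fraction of samples returning the upper code at each sweep step) and the example data are hypothetical. The peak-peak jitter is taken as the extent of the steps in which both codes are observed, and the average transition instant and its RMS deviation are estimated by treating the increments of the trend curve as a probability distribution.

```python
import math

def transition_jitter(upper_fraction, step_ps):
    """Estimate the jitter of one code transition from a linear time sweep.

    upper_fraction[k] is the fraction of the samples taken at sweep step k
    that returned the upper of the two codes (the "trend" curve). It is
    assumed to rise monotonically from 0 to 1. step_ps is the step in ps.
    Returns (mean transition instant, RMS jitter, peak-peak jitter) in ps.
    """
    # Peak-peak jitter: extent of the steps where both codes are observed.
    mixed = [k for k, p in enumerate(upper_fraction) if 0.0 < p < 1.0]
    peak_peak = (mixed[-1] - mixed[0] + 1) * step_ps if mixed else 0.0

    # The increment of the trend curve between two consecutive steps is the
    # probability that the transition occurred between them.
    weights = [b - a for a, b in zip(upper_fraction, upper_fraction[1:])]
    times = [(k + 0.5) * step_ps for k in range(len(weights))]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, times)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, times)) / total
    return mean, math.sqrt(var), peak_peak

# Hypothetical trend curve with six uncertain steps of 2.4 ps each.
trend = [0.0, 0.0, 0.05, 0.2, 0.5, 0.8, 0.95, 0.99, 1.0, 1.0]
mean_ps, rms_ps, pp_ps = transition_jitter(trend, 2.4)
print(f"mean = {mean_ps:.1f} ps, rms = {rms_ps:.1f} ps, peak-peak = {pp_ps:.1f} ps")
```

With a 2.4ps step, six uncertain steps correspond to a peak-peak jitter of 14.4ps and eight steps to 19.2ps, which is consistent with the granularity of the values quoted above.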
Figure 11: DLL conversion error (σ=0.29LSB_DLL). Histogram of the number of samples against the error, in LSB_DLL.

The conversion error stems from several contributions, which add up to the total error. The main contributors to the conversion error are the quantising mechanism ($\sigma_{quant} = 1/\sqrt{12}$ LSB_DLL), the integral non-linearity (measured to be σ_INL=0.05LSB_DLL) and the reference clock jitter (σ_jitter=0.01LSB_DLL):

$$\sigma_{DLL} = \sqrt{\sigma_{quant}^2 + \sigma_{INL}^2 + \sigma_{jitter}^2} = \sqrt{12^{-1} + 0.05^2 + 0.01^2} = 0.29\,\mathrm{LSB_{DLL}}$$

The measured RMS error (σ_DLL=0.29LSB_DLL) is in accordance with the expected value, demonstrating that no major error source was left unaccounted for.

12.2. Lumped capacitor scheme.

The tests previously described were also applied to the channel using the RC delay line that implements the lumped capacitor adjustment scheme. Only the relevant results that highlight the differences between the two adjustment schemes will be discussed.

Figure 12: RC delay line’s DNL and INL graphs (using the lumped capacitor adjustment scheme), plotted against binRC.

The graphs in Figure 12 represent the results of a calibration run of the RC delay line that uses the lumped capacitor adjustment scheme. The linearity obtained is well within the limits set forth for calibration. The DNL graph shows that the predicted matching characteristics of the line were not obtained. This also leads to a worse INL than was predicted in simulations.

The linearity graphs corresponding to measurements performed on the full converter are shown in Figure 13 and Figure 14. The integral non-linearity of the converter closely follows the appropriately scaled DLL non-linearity. This shows that the converter’s characteristics are limited by the DLL, as was also seen in the previous scheme.
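The DLL error budget given before Section 12.2 can be checked numerically. The sketch below simply combines the three quoted contributions in quadrature, which assumes that they are independent. The helper name `rms_sum` is introduced here for illustration only.

```python
import math

def rms_sum(*sigmas):
    """Combine independent error contributions in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))

sigma_quant = 1 / math.sqrt(12)  # ideal quantisation error, in LSB_DLL
sigma_inl = 0.05                 # measured INL contribution, in LSB_DLL
sigma_jitter = 0.01              # reference clock jitter, in LSB_DLL

sigma_dll = rms_sum(sigma_quant, sigma_inl, sigma_jitter)
print(f"expected sigma_DLL = {sigma_dll:.2f} LSB_DLL")  # prints 0.29
```

The quantisation term dominates: the other two contributions together change the result only in the third decimal place.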
Figure 13: DNL and INL graphs of the converter (using the lumped capacitor adjustable delay line), plotted against bin.

The differential non-linearity and the integral non-linearity are measured to be 0.70LSB and 1.03LSB, respectively. The main non-linearity errors are, again, found in the taps corresponding to DLL delay cell transitions.

Comparison of the INL graphs of the DLL in Figure 6 and Figure 14 reveals the limitation of the DLL topology used. The different behaviour of the DLL when used together with each channel is most likely due to clock related noise coupling into the converter. The small non-linearity of the DLL is an important fraction of the bin, at the level of the time interpolation implemented in this converter.

Figure 14: Comparison of the INL graphs of the DLL and of the complete converter (DLL trace plotted against binDLL, converter trace plotted against bin).

The conversion error was measured using the same set-up as before. These measurements resulted in the histogram of Figure 15. An RMS error of 0.44LSB (~21.5ps) is obtained, and the maximum observed error is smaller than 1.5LSB. The resolution measured with this adjustment scheme is slightly better than with the previous scheme. The improvement is a consequence of the better DLL linearity obtained.

Figure 15: Conversion error (σ=0.44LSB). Histogram of the number of samples against the error, in LSB.

The DLL behaviour is expressed in the histogram of Figure 16. It confirms the correct dynamic behaviour of the DLL.

Figure 16: DLL conversion error (σ=0.29LSB_DLL). Histogram of the number of samples against the error, in LSB_DLL.

12.3. Conversion time offset.
In the measurements presented so far, the conversion time offset has not been investigated. Conversion time offset is a characteristic that cannot be considered independently from the extrinsic offsets generated by the acquisition circuitry before the converter. Offset variations internal to the converter, due to temperature changes or any other origin, can be measured and circuit techniques may be applied to reduce them. However, more important offset variations will be present, namely in the sensor, discrimination, signal shaping and driving circuitry. It is, therefore, important to characterise the whole acquisition chain together when changes in the environmental conditions are observed and absolute time measurements are to be acquired. This test is outside the scope of these studies.

The maximum temperature dependency of the internal offset was measured using the tap selection scheme. It is ~124ps/10°C. This dependency is a consequence of the variation of the delay of the long internal path of the hit signal. In this circuit, no effort was made to compensate this variation with a similar variation in the reference clock path.

12.4. Power dissipation.

One important design goal of this circuit is to achieve reduced power dissipation per channel. An overall power dissipation of 0.22W was measured, which compares favourably with the results obtained with the ADLL architecture.

12.5. Summary of results.

The main characteristics measured during the tests are summarised in Table 1. Some important properties of a TDC, such as multi-hit capability, were not characterised, since no effort was made to optimise the prototype for them. Crosstalk between channels was also not investigated, because the two channels are very dissimilar and cannot be used simultaneously.
However, it is possible to extrapolate the results obtained with the ADLL based prototype to be confident that multi-hit capability is easily obtained and that small crosstalk is possible.

adjustment scheme        | tap selection     | lumped capacitor
converter max INL        | 1.12 LSB / 55 ps  | 1.04 LSB / 51 ps
converter σ_INL          | 0.44 LSB / 21 ps  | 0.39 LSB / 19 ps
converter max DNL        | 0.72 LSB / 35 ps  | 0.68 LSB / 33 ps
converter σ_DNL          | 0.18 LSB / 9 ps   | 0.22 LSB / 11 ps
converter RMS res. (σ)   | 0.51 LSB / 25 ps  | 0.44 LSB / 21 ps
R-C line INL             | 0.15 LSB / 7 ps   | 0.21 LSB / 10 ps
R-C line DNL             | 0.21 LSB / 10 ps  | 0.30 LSB / 15 ps
area                     | 0.85 mm²          | 0.25 mm²

common characteristics
LSB                      | 48.8 ps
temperature sensitivity  | 1.3% / 10°C
power dissipation        | 0.22W
number of channels       | 2
technology               | 0.7µm CMOS
area                     | 10.7mm²
package                  | 68 pin JLCC

Table 1: Characteristics of the TDC prototype.

12.6. Conclusions.

The experimental results obtained with this prototype demonstrate that it is possible to build a low-cost, high-resolution integrated TDC using the proposed architecture. The converter has lower power dissipation than was measured with the ADLL-based prototype previously described.

It was shown that the performance of the converter is mainly limited by the DLL characteristics, essentially its non-linearity. The DLL used in this circuit was built using the same circuit blocks as in the ADLL TDC, which use single-ended signalling levels. A more linear DLL can, most likely, be obtained if a less noise-sensitive, differential topology is used.

The two adjustable RC delay line schemes proved to work according to the design goals, both in terms of calibration and of temperature sensitivity. Furthermore, the model developed to study the delay line proved to be accurate, even with the limited technological information available.

References for Part III.

[1] Gogaert, S. et al., A 10ps resolution 1.6ns tuning range CMOS delay line for clock deskewing in data recovery systems, Proceedings of the ESSCIRC’95, pp. 54-56, Sep. 95.
[2] Doernberg, J.
et al., Full speed testing of A/D Converters, IEEE Journal of Solid-State Circuits, Vol. 19, no. 6, pp. 820-827, Dec. 84.
[3] Bossche, M. V. et al., Dynamic testing and diagnostics of A/D converters, IEEE Transactions on Circuits and Systems, Vol. 33, no. 8, pp. 775-785, Aug. 86.
[4] Tsividis, Y., Mixed Analog-Digital VLSI devices and technology – an introduction, McGraw-Hill 1996, Chapter 5.
[5] Elmore, W., The transient response of damped linear networks with particular regard to wideband amplifiers, Journal of Applied Physics, Vol. 19, pp. 55-63, Jan. 48.
[6] Dvorak, V., On the transient analysis of distributed RC networks, International Journal of Electronics, Vol. 33, no. 4, pp. 385-391, 1972.
[7] Antinone, R. et al., The modeling of resistive interconnectors for integrated circuits, IEEE Journal of Solid-State Circuits, Vol. 18, no. 2, pp. 200-203, Apr. 83.
[8] Sakurai, T., Approximation of wiring delay in MOSFET LSI, IEEE Journal of Solid-State Circuits, Vol. 18, no. 4, pp. 418-426, Aug. 83.
[9] Rubinstein, J. et al., Signal delay in RC tree networks, IEEE Transactions on Computer-Aided Design, Vol. 2, no. 3, Jul. 83.
[10] Lee, M., A multilevel parasitic interconnect capacitance modeling and extraction for reliable VLSI on-chip clock delay evaluation, IEEE Journal of Solid-State Circuits, Vol. 33, no. 4, pp. 657-661, Apr. 98.
[11] The HSPICE user’s manual, Meta-Software 1996.

PART IV. CONCLUSION.

Chapter 13. Summary of Results.

In this work we studied the problem of building integrated Time-to-Digital Converters featuring very high resolutions. Our main goal was to demonstrate the ability to perform these time measurements in a single, low-cost, monolithic circuit produced in standard commercial CMOS technologies. Stand-alone operation was envisaged, therefore the selected architectures are able to perform self-calibration.
Also, the possibility of including digital signal processing functionality in the same circuit was pursued. Several architectures were analysed, of which one was selected for a more detailed analysis that led to the construction of a demonstrator IC. Furthermore, a novel high-resolution time interpolation architecture was proposed and the analysis carried out confirmed a good time resolution and low power operation.

13.1. The ADLL architecture.

The study of a time interpolation technique using an array of phase shifted DLL’s was pursued. In this study, we analysed:

• The origins of non-linearity in a DLL based converter. We have shown the effects of delay cell mismatch and how it accumulates along the delay chain. We have also highlighted the diverse causes of phase errors intrinsic to a DLL and the effect of these errors along the delay chain. An additional source of non-linearity was revealed, which has similar effects on the converter non-linearity as a phase error. The source of this delay error was shown to be the different propagation delays of the sampling signal towards the individual registers.

• The origins of phase noise in a DLL. We have analysed the effects of phase noise due to the “bang-bang” operation of the closed control loop. The results of the analysis carried out for the single DLL case were extended to the case of an array of phase shifted DLL’s (the ADLL).

• The non-linearity of an ADLL based converter. We have developed an analytical model of the ADLL that allows the effects of independent delay error sources on the overall converter non-linearity to be established. The presence of the phase shifting DLL is also accounted for in the model. We have highlighted the most important modes of delay error accumulation, in particular showing that there is an intrinsic periodicity in the non-linearity curves (periodicity F and also F+1, where F is the interpolation factor) due to the interpolation scheme.

• The optimal interpolation factor F.
Based on the expected conversion integral non-linearity due to delay cell mismatch, we have established a relation between the mismatch level and the resolution of the converter. It shows that, depending on the actual mismatch characteristics of the delay cells, the maximum interpolation factor F that still yields a corresponding increase in resolution is limited to F=4 or 5.

Using the analysis tools developed, we were able to translate the performance goals into circuit requirements. We then proposed simple ways of constraining the individual blocks to the requirements by optimisation of the critical performance parameters.

• Minimisation of the phase error. We have proposed solutions to reduce all the phase error sources, including an alternative topology for the distribution of the sampling signal to the individual registers. The need to use distributed parameter techniques when studying signal distribution in the time critical circuitry is highlighted.

• Minimisation of the delay cell mismatch. A method for reducing the delay mismatch of a current-starved delay cell regardless of the operating conditions was proposed.

• Noise sensitivity minimisation. The noise sensitivity of the scheme was analysed and minimised using simple circuit layout rules.

A multi-channel high-resolution TDC based on these studies was built in a standard 0.7µm CMOS technology. It demonstrated the correctness of the conclusions of the analysis. In particular, an RMS resolution of 34.5ps was obtained throughout the full 3.2µs dynamic range. This performance, which has been confirmed in several applications, is obtained in an IC that also includes processing and buffering logic.

13.2. The DLL & RC delay line architecture.

We have proposed a new interpolation technique for Time-to-Digital Converters. The possibility of designing adjustable RC delay lines in a “digital” technology was demonstrated and we have also shown how a self-calibrating scheme can be implemented in the same circuit.
In this study, we have analysed:

• Adjustment methods for RC delay lines. We have proposed two discrete adjustable RC delay line schemes.

• The characteristics of RC delay lines. We have proposed a methodology to partition such a delay line so that it complies with the timing and layout requirements of a particular design. We give guidelines for the design of circuits that interface with the delay line without increasing its sensitivity to variations of the environmental conditions.

• Calibration procedure. A Code Density Test based calibration scheme was proposed. This simple scheme can be implemented in hardware and integrated in the converter IC. It requires a pulse generator uncorrelated with the reference clock and a calibration logic block.

• Calibration algorithms. We have proposed several calibration algorithms. Their advantages and disadvantages were discussed.

Based on these studies, and on the DLL building blocks developed for the ADLL based TDC, we have built a TDC prototype. Two different channels, each implementing one of the adjustable RC delay lines proposed, were included. Dividing these delay lines into M=8 segments, an interpolation factor F=8 is obtained. The technology used is also a 0.7µm CMOS technology.

Using the calibration algorithms that we have proposed, we were able to calibrate the two delay lines, obtaining an INLmax better than 0.21LSB in each of them. The RMS resolution of the converter was measured to be as low as 21ps. We have also shown that the performance of the converter is very insensitive to variations of the environmental conditions. Furthermore, the use of passive RC delay lines to perform time interpolation results in low power operation, as was demonstrated with the prototype.

13.3. TDC characterisation.

We have developed a consistent methodology to characterise the timing performance of a T/D converter.
With this methodology we were able to evaluate the static and the dynamic characteristics of the converter.

• Define a consistent set of performance metrics. These metrics, adapted from the ADC world, are well matched to the TDC environment.

• Build a comprehensive test set-up. We have developed an automated test set-up that is able to perform very linear time sweeps across an extended dynamic range. This set-up is critical for the evaluation of the dynamic characteristics of the converters that we have developed.

Chapter 14. Future Developments.

The major goal of the work described in this dissertation was to demonstrate the possibility of using standard digital CMOS technologies to build integrated, multi-channel, time measurement systems with high resolution. Having established this possibility, by means of two different successful architectures, a wide range of fully integrated systems can be developed to match the specific requirements of the several interested users within the High-Energy Physics community. Alternatively, a single “universal” system could be designed to fulfil all these separate requirements. During this work, although only cursory attention was given to the actual implementation of the system level functionality, its presence was always accounted for and the architectures proposed are adapted to operate in that environment.

Two logical development paths may now be followed:

• Take advantage of the short gate delays available in the new, sub-micron, technologies to demonstrate the “ultimate” performance that can be extracted following the architectures presented here (or any other having the same capabilities).

• Develop a general purpose T/D converter. This IC would cover the entire resolution spectrum envisaged for the near future, from the “low” 250ps range to the “high” 25ps range. It should also allow for different buffering strategies and for intelligent data filtering.
Although the first path is scientifically stimulating and poses some interesting design challenges, it is the second path that results in a better engineering compromise between single-minded performance and overall functional flexibility. It is also a more “multi-discipline project”, requiring the convergence of multiple design techniques (full custom / standard cell) and therefore including important challenges.

Such a converter has been envisaged and preliminary studies have been carried out. The enabling architecture, the interpolator based on a DLL and on an RC delay line, was developed and proven during this work. Most of the system level functionality has been demonstrated elsewhere in the context of lower resolution converters.

In a conventional, DLL based, converter, all the channels integrated in the same IC perform their time interpolation by sampling the status of a common DLL, as is schematically described in Figure 1. To obtain a higher resolution TDC, using the scheme based on a DLL and an RC delay line that was proposed in this work, a number of equally spaced samples of the status of the DLL must be stored. The scheme is also pictured in Figure 1.

Figure 1: A four channel TDC using a DLL based scheme and a single channel TDC with four times smaller LSB, using the same building blocks and an RC delay line.

A close look at this figure already gives a hint on how to obtain high resolution from what is intrinsically a lower resolution converter (the DLL). By the simple addition of an adjustable RC delay line (and the calibration hardware), it is possible to obtain a higher resolution converter channel, using for that purpose a small number of lower resolution conversion channels.
By proper selection of the hit signal origin, a single IC can be used as a high channel density, low resolution, T/D converter or as a low channel density, high-resolution, T/D converter, depending on the user needs (see Figure 2).

Figure 2: The general purpose TDC architecture.

Timing information can be carried in one, or in both, edges of the hit signal. It would therefore be convenient for the converter to be able to measure these two instants in the same channel. This feature will be implemented in this converter.

Modern CMOS technologies, for example with a 0.25µm minimum feature size, result in very small gate delays. It is, therefore, possible to build a very compact time conversion block and integrate it with a large processing logic block. It is envisaged to include a more complete buffering hierarchy. Each channel will have a dedicated pipelined memory, four measurements deep (to store two pairs of rising-falling edge measurements). The second level of the hierarchy will group 8 channels (2 in high-resolution mode) in a deeper FIFO memory. Each of these groups includes a separate pre-processing logic block that performs encoding, coarse time selection, etc. The groups are then multiplexed into a single data stream.

An optional, trigger based, data reduction processor will also be included. This processor receives commands from a central processor used to identify time windows of interest. Measurements occurring outside these time windows are deemed uninteresting and, therefore, are filtered out of the data stream. The function of the local data reduction processor is to compare the time measurements acquired in each channel with the time window of interest, which is identified by a “trigger” time-tag. Measurements that are accepted by this criterion are stored in a common read-out FIFO memory.
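The trigger matching function described above can be illustrated with a short sketch. It is a hypothetical software model, not the planned hardware implementation: the function name `trigger_match`, the representation of a measurement as a (channel, time) pair, the window parameters `offset` and `width` and the counter `rollover` are all assumptions made for the example.

```python
def trigger_match(hits, trigger_tag, offset, width, rollover):
    """Return the hits that fall inside the time window of interest.

    hits        -- iterable of (channel, time) pairs, time in LSB counts
    trigger_tag -- time-tag of the trigger, in LSB counts
    offset      -- start of the window relative to the trigger tag
                   (negative when the hits precede the trigger)
    width       -- length of the window, in LSB counts
    rollover    -- number of counts after which the time counter wraps
    """
    start = (trigger_tag + offset) % rollover
    accepted = []
    for channel, time in hits:
        # The modular distance copes with windows that straddle the rollover.
        if (time - start) % rollover < width:
            accepted.append((channel, time))
    return accepted

# Hypothetical measurements with a counter that wraps after 1000 counts.
hits = [(0, 95), (1, 105), (2, 990), (3, 5)]
print(trigger_match(hits, trigger_tag=100, offset=-10, width=20, rollover=1000))
# prints [(0, 95), (1, 105)]
```

Using the modular distance keeps the comparison valid when the window of interest spans the wrap-around of the coarse time counter, which a plain range comparison would not handle.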
Figure 3: Block diagram of the general purpose TDC (PLL, DLL with phase detector and coarse counter, RC delay line, 32 low resolution or 8 high resolution channels with 4-word channel buffers, channel arbitration, encoding and offset adjust, 256-word group and super-group buffers, trigger matching, RC delay calibration and hit oscillator, JTAG, trigger and read-out interfaces).

A simplified block diagram of the general purpose TDC is shown in Figure 3. A clock multiplying PLL is included to generate the required reference period for the high-resolution option.

The timing specification of this TDC is shown in Table 1. Three resolution levels can be obtained with the specified 40MHz reference clock: 224.5ps, 56.4ps and 14.2ps. These values correspond to the standard deviation of the quantisation error (σ_q) of an ideal converter. In reality other sources of time uncertainty will add up. They will affect the higher resolution options more. The experience gained during this work allows a preliminary estimation of the RMS resolution (σ_TDC) of ~226ps, ~61ps and ~25ps, respectively.

ref. frequency  | 40     | 160    | MHz |
ref. period     | 25     | 6.25   | ns  |
DLL LSB         | 781.3  | 195.3  | ps  | 32 cells / DLL
RC line LSB     |        | 48.8   | ps  | using 4 channels
dynamic range   | 102.4  | 102.4  | µs  |

Table 1: Timing specification of the general purpose TDC.

PART V. APPENDIXES.

Appendix A. TDC Characterisation Test Bench.

The evaluation of the high-resolution TDC prototypes produced during this work required the development of a specific test bench.
This test bench allows for the measurement of several important timing characteristics of the converter:

• Conversion linearity (differential and integral).
• Conversion error, from a linear time sweep.
• Crosstalk between channels.
• Double hit resolution.

In particular, the linear delay generator used for the characterisation of the conversion error required the development of an adequate instrument.

Given the fine time characteristics that this test bench is intended to measure, special attention was given to the integrity of the time critical signals. High performance PECL logic is used wherever the reference or the hit signals are handled. Controlled impedance (50Ω) micro-strips and cables are used to transport, or delay, these signals.

Conversion linearity.

The static characteristics of the converter (INL, DNL) are measured using a standard Code Density Test (CDT) that has been extensively described in the literature (in the context of ADC testing) [1],[2],[3]. Other methodologies have been used to characterise converters, for example using Walsh Functions [4], but their complexity does not seem to be required for the test of TDC’s, which typically require that only a limited number of bins be characterised in great detail. The resulting characterisation includes some uncertainty, which can be limited as discussed in Appendix D.

In a CDT, the device under test (DUT) collects a large number of hits generated with a random time interval. Due to the randomness of the hit arrival time, the hits are uniformly distributed along the dynamic range of the DUT. Therefore, if the conversion result of each hit pulse is read out and accumulated in a histogram whose bins correspond to an LSB of the converter, the number of hits collected in each of the histogram bins is proportional to the size of the actual converter bin. The DNL graph is obtained directly from the test.
The INL graph is derived from the cumulative histogram of the bin sizes, which is obtained by adding up consecutive bin sizes. Unfortunately, the uncertainty of the size of each bin is also accumulated in this operation. Therefore, for the same number of collected hits, the accuracy of the differential characterisation is greater than the accuracy of the integral characterisation.

The CDT requires a random pulse generator or, instead, a pulse generator whose frequency is selectable (the choice of the sampling frequency is made in accordance with Appendix E) and a computer to collect and histogram the measurements obtained. In our set-up we used a Hewlett-Packard 8012B pulse generator. Data is collected in a computer that also controls the test bench.

Since this is a statistical test, no information is obtained on the dynamic characteristics of the converter. Chiefly, random errors due to reference clock jitter or to the dynamics of the DLL and random noise due to other activity within the circuit are averaged out. In order to observe these effects, a linear time sweep is performed across a significant segment of the dynamic range.

Conversion error.

The linear time sweep is performed with a very short delay step (more than an order of magnitude shorter than the LSB of the converter under consideration), over a range of a few reference clock cycles. This range is wide enough to characterise the fine time interpolation scheme and also to verify that the dynamic range extension scheme does not interfere with the interpolation performance.

Standard (active) delay generators do not have the linearity required to perform a linear time sweep suitable for this application. Therefore a computer controlled passive delay generator, using a step-motor driven coaxial phase shifter (also known as “trombone”), was used.
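The derivation of the DNL and INL graphs from a code density histogram, as described under “Conversion linearity” above, can be sketched as follows. This is a minimal illustration with assumed conventions, not the software used in the test bench: the function name `code_density` is hypothetical, the ideal bin size is taken as the average number of hits per bin, and the INL is reported at the upper edge of each bin.

```python
def code_density(histogram):
    """Return (dnl, inl), in LSB, from a code density histogram.

    histogram[k] is the number of hits collected in converter bin k.
    With uniformly distributed hits, the count is proportional to the
    actual size of the bin.
    """
    total = sum(histogram)
    ideal = total / len(histogram)            # hits expected in a 1 LSB bin
    sizes = [h / ideal for h in histogram]    # measured bin sizes, in LSB
    dnl = [s - 1.0 for s in sizes]            # deviation of each bin size

    # INL: deviation of the cumulative bin size from the ideal staircase,
    # evaluated at the upper edge of each bin.
    inl = []
    accumulated = 0.0
    for d in dnl:
        accumulated += d
        inl.append(accumulated)
    return dnl, inl

# Hypothetical four-bin histogram.
dnl, inl = code_density([100, 120, 80, 100])
print([round(x, 2) for x in dnl])
print([round(x, 2) for x in inl])
```

Because the INL is a running sum of the bin sizes, the statistical uncertainty of every bin accumulates in it, which is why the integral characterisation is less accurate than the differential one for the same number of hits.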
Although no direct measurement of the “trombone” linearity was performed, the measurements obtained and the mechanics of the instrument give a high degree of confidence in its linearity.

In order to expand the small dynamic range of the “trombone”, a selectable delay box was used. When the “trombone” reaches the end of its dynamic range, it is rewound to the initial position and a corresponding delay is added in the delay box, by proper selection of the internal cable length. The accuracy of this alignment procedure is a concern. Even a small difference between the delay of the apparatus before and after the adjustment step will accumulate into a sizeable error after a few adjustment steps. To guarantee an adequate alignment of the delay generator, its delay is measured prior to adjustment, using an adjustment TDC, and again after adjustment. The two measurements are compared and a fine adjustment is performed (changing the “trombone” delay), if required.

The adjustment TDC does not have to be linear, since the two measurements it has to perform are identical. However, it must have a resolution better than the “trombone” delay step. Averaging many hits is an easy way of achieving high resolution in commercial delay measurement instruments.

In Figure 1, a block diagram of the computer controlled linear delay generator is shown, illustrating its connection to the device under test (DUT). Since the DUT is a time stamp TDC, the hit signal is synchronised with the reference clock (clkref) before it progresses through the “trombone” and the selectable cable delay box. The adjustment TDC is mounted in parallel with the DUT, in such a way that in normal operation it does not influence the test.

In our test bench, we used the Sage model 6709 coaxial phase shifter driven by a computer controlled stepper motor, to obtain a minimum delay step of ~0.6ps in a dynamic range of 2ns.
The CAEN programmable delay box N-146A, which has a minimum delay step of 0.5ns and a dynamic range of ~80ns, was used to extend the dynamic range of the apparatus. The adjustment TDC used was the Stanford Research SR620 universal time interval counter, whose quoted resolution is ~2ps if 1000 hits are averaged.

Figure 1: The linear passive delay generator block diagram (computer controlled).

This apparatus is rather cumbersome and requires an external adjustment TDC. Therefore a simpler and more reliable method was developed to perform the delay adjustment. If the two extremes of a delay line are connected to each other by means of an inverting amplifier, the frequency of oscillation of the oscillator thus formed is given by the following expression:

$$f = \frac{1}{2 \cdot (D_{line} + A_{delay})}$$

where D_line is the delay of the line and A_delay is the propagation delay of the amplifier that closes the loop. Therefore it is possible to derive the delay of the line from the measurement of the oscillation frequency (given the delay of the amplifier).

As explained before, the absolute value of D_line is not necessary, since it is used only for the comparison between the delay of the line before and after alignment. Therefore, the only important property of A_delay is its invariance and not its absolute value. A fast PECL inverter guarantees this invariance (within acceptable limits).

The block diagram of this scheme is shown in Figure 2. When the delay of the delay generator is to be measured, a set of relays is switched in such a way that the oscillator loop is closed and the DUT is disconnected from the generator. The oscillation frequency is measured before the adjustment step and again after it.
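The relation between the oscillation frequency and the line delay given above can be turned into a short numerical sketch. The function names and the example values (a 20ns line, a 1ns amplifier delay and a 0.6ps change) are hypothetical and only serve to illustrate that the amplifier delay cancels when two measurements are compared.

```python
def line_delay(frequency_hz, amplifier_delay_s):
    """Delay of the line, from f = 1 / (2 * (D_line + A_delay))."""
    return 1.0 / (2.0 * frequency_hz) - amplifier_delay_s

def delay_change(freq_before_hz, freq_after_hz):
    """Change of the line delay between two frequency measurements.

    A_delay cancels in the difference, so only its invariance matters,
    not its absolute value.
    """
    return 1.0 / (2.0 * freq_after_hz) - 1.0 / (2.0 * freq_before_hz)

# Hypothetical values, for illustration only.
a_delay = 1e-9
f_before = 1.0 / (2.0 * (20e-9 + a_delay))
f_after = 1.0 / (2.0 * (20.0006e-9 + a_delay))

print(f"line delay before: {line_delay(f_before, a_delay) * 1e9:.4f} ns")
print(f"delay change: {delay_change(f_before, f_after) * 1e12:.2f} ps")
```

In the alignment procedure the “trombone” would be adjusted until this delay change is driven back to zero, that is, until the two frequencies agree.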
If these frequencies are different, the “trombone” delay is again adjusted until the frequency agrees with the one measured before the adjustment step.

A simple procedure to measure frequency is to count the number of oscillation cycles completed in a given time interval. The longer the time interval, the better the accuracy of the measurement. The oscillation period of a stable oscillator (or a multiple of it) can be used to set the counting time interval.

This simple delay generation scheme was implemented in a 9U VME board that also includes all the alignment logic required.

It is not practical to extend this test to the full dynamic range of the converter, due to its duration and to the possible accumulation of errors generated in the successive alignment steps. Fortunately, verifying the correctness of the dynamic range extension over the full dynamic range does not require the generation of small delay steps. For this application, it is more convenient to perform a coarse time sweep with delay steps of ~1ns. Since the requirements in terms of jitter and linearity of the hit signal are relaxed, an active instrument can be used as a delay generator, resulting in a faster characterisation. In our test bench, the Stanford Research model DG535 digital delay generator was used.

[Figure 2: The linear passive delay generator block diagram (automated). Labelled elements: VME interface, oscillator cycle counter, clkref, adjustment control, trombone (fine adjustment), hit signal, DUT, selectable cable delay (coarse adjustment).]

Other characteristics of the converter, such as crosstalk, double hit resolution and sensitivity to the activity of the digital circuitry, can be evaluated with this test bench (they are applicable only to the converter based on an array of DLL’s).

Crosstalk.
The characterisation of the crosstalk between channels was performed in accordance with the following procedure: A double delay sweep is generated using the Stanford Research model DG535 digital delay generator. One channel (the channel under test, CUT) is stimulated independently from all the other channels in the circuit (the offending channels, OC). For each delay step in the CUT, a delay sweep spanning three reference clock cycles is simultaneously performed on all the OC. In this way, the worst correlation between the simultaneous hits in the OC, a hit in the CUT and the phase of the reference clock can be found. The comparison between the peak error obtained using this procedure and the error obtained for the same delay in the CUT, but with the OC inactive, gives a measure of the worst case error due to crosstalk.

Double hit resolution.

Double hit resolution is measured using the Philips PM5786 pulse generator to generate bursts of pulses. This pulse generator is able to generate pulses with a minimum separation of ~8.5ns, corresponding to the best double hit resolution that can be measured. The bursts are generated asynchronously to the reference clock so that any correlation between the reference clock and the activity in the channel buffer can be identified.

Appendix B. Analysis of the DLL Closed Loop Behaviour.

The control operation of a DLL is based on the integration of the phase error resulting from the comparison of the phase of the periodic reference signal and of the VCDL output. The negative feedback control loop adjusts the delay of the VCDL in order to minimise the phase error.

The DLL configuration is a first order loop. Therefore, if the sampling operation inherent to the phase detector is ignored, a simple continuous time approximation can be used to analyse its frequency response. This approximation can be used for loop bandwidths a decade or more smaller than the operating frequency.
Following the naming conventions established in [5], we define the output delay $D_o(s)$ as the delay established by the VCDL and the input delay $D_i(s)$ as the delay to which the phase detector compares the output delay. These two quantities are related by the following expression:

$$D_o(s) = \left( D_i(s) - D_o(s) \right) \cdot \frac{I_{CP} \cdot K_{VCDL}}{s \cdot C_F \cdot T},$$

where $I_{CP}$ is the charge-pump current, $K_{VCDL}$ is the gain of the VCDL, $C_F$ is the loop filter capacitance and $T$ is the period of the reference signal. The average charge-pump current is given by the fraction of the reference period in which the charge-pump is activated, $(D_i(s) - D_o(s))/T$, times its peak current ($I_{CP}$)¹. It is, therefore, proportional to the phase (delay) error.

The closed loop response is then:

$$\frac{D_o(s)}{D_i(s)} = \frac{1}{1 + \dfrac{s}{w_n}},$$

where $w_n$ is the loop bandwidth:

$$w_n = \frac{I_{CP} \cdot K_{VCDL}}{C_F \cdot T}.$$

¹ If the loop is built in a “bang-bang” configuration, using a two-state phase detector, the average charge-pump current can be evaluated over a large number of reference periods.

Since a first order loop is inherently stable, the only stability criterion of interest is to avoid the influence of the higher order poles introduced by the delay around the sampled feedback loop.

In our application, the reference signal has a known and stable frequency, so a high tracking bandwidth is not required. It is, therefore, interesting to reduce the bandwidth of the loop by increasing the filter capacitor and decreasing the charge-pump current and the gain of the VCDL. In this way the phase noise inherent to the “bang-bang” loop operation can be minimised.

The nature of the loop, where a reference signal is propagating along a VCDL, means that variations of the input signal’s phase will also propagate through the VCDL and thus reduce the measurement accuracy.
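As a small numerical illustration of the loop bandwidth expression above, the following Python sketch evaluates $w_n$ and the magnitude of the first order closed loop response. The function names and all component values in the usage section are hypothetical and chosen only for the example; they are not the values of the converter described in this thesis.

```python
import math


def loop_bandwidth(i_cp, k_vcdl, c_f, t_ref):
    """w_n = I_CP * K_VCDL / (C_F * T), in rad/s.

    i_cp in A, k_vcdl in s/V, c_f in F, t_ref in s.
    """
    return i_cp * k_vcdl / (c_f * t_ref)


def closed_loop_gain(freq_hz, w_n):
    """Magnitude of Do/Di = 1 / (1 + s/w_n) evaluated at s = j*2*pi*f."""
    s = 1j * 2.0 * math.pi * freq_hz
    return abs(1.0 / (1.0 + s / w_n))


if __name__ == "__main__":
    # Hypothetical values (assumptions): 1 uA, 1 ns/V, 10 pF, 25 ns period.
    w_n = loop_bandwidth(1e-6, 1e-9, 10e-12, 25e-9)
    f_ref = 1.0 / 25e-9
    print("loop bandwidth [rad/s]:", w_n)
    print("bandwidth / reference frequency:", w_n / (2.0 * math.pi) / f_ref)
    print("gain at the corner frequency:", closed_loop_gain(w_n / (2.0 * math.pi), w_n))
```

The ratio printed in the second line is a quick way to check that the continuous time approximation holds, that is, that the loop bandwidth stays at least a decade below the operating frequency.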
Therefore, although internal phase noise can be minimised and the delay of the VCDL stabilised at one reference period T, the phase noise carried by the reference signal must be eliminated at its origin if the loss of measurement accuracy is to be minimised.

Appendix C. Analysis of the Effects of Cell Delay Mismatch on the Integral Non-Linearity of a DLL.

A DLL is a closed feedback control loop with a somewhat complex dynamic behaviour. The object of this study is the static behaviour of the DLL that results from averaging the dynamics of the control loop over a long period. Without loss of generality, we will assume an ideal control loop that is able to keep the delay along the DLL stable and equal to one clock period T.

The following analysis broadly follows the method developed in [6] for resistor strings in flash ADC’s. For the purpose of this analysis, we will focus only on random mismatch effects.

The delay of each cell in the DLL can be seen as an independent random variable with a normal probability density function (PDF) $G$ of mean $\mu_m = T/N$ and variance $\sigma_m^2$ ($N$ is the number of cells that make up the DLL). The mean corresponds to the expected cell delay, and the variance gives a measure of the spread of the actual delays around the mean.

In these conditions one can see the DLL as a delay chain whose delay at the origin is $D = 0$ and at the other extreme is $D = T$.

[Figure 1: Voltage controlled delay line with fixed length. Taps 0, 1, ..., j, ..., N-1, N are located at delays 0, T/N, ..., j·T/N, ..., (N-1)·T/N, T, with 0≤j<N.]

The delay $D_i$ of each cell is defined as a random variable with a normal PDF $G\left(T/N, \sigma_m^2\right)$. The delay from the origin to the output of cell $j$ can be expressed as a fraction of the total delay of the delay chain:

$$u_j(X, Y) = \frac{X}{X + Y},$$

where

$$X = \sum_{i=1}^{j} D_i \quad \text{and} \quad Y = \sum_{i=j+1}^{N} D_i.$$

Since the $D_i$ have normal PDF’s, $X$ and $Y$ are also random variables with normal PDF’s:

$$X: G\left(\mu_1, \sigma_1^2\right), \text{ with } \mu_1 = j \cdot \mu_m \text{ and } \sigma_1^2 = j \cdot \sigma_m^2,$$

$$Y: G\left(\mu_2, \sigma_2^2\right), \text{ with } \mu_2 = (N - j) \cdot \mu_m \text{ and } \sigma_2^2 = (N - j) \cdot \sigma_m^2.$$

Using the variable transformations

$$u = \frac{X}{X + Y} \quad \text{and} \quad v = X,$$

we have

$$g(u, v) = f\left(X(u, v), Y(u, v)\right) \cdot |J|,$$

where $|J|$, the Jacobian of the transformation, is defined as:

$$J = \begin{vmatrix} \dfrac{\partial X(u,v)}{\partial u} & \dfrac{\partial Y(u,v)}{\partial u} \\[2mm] \dfrac{\partial X(u,v)}{\partial v} & \dfrac{\partial Y(u,v)}{\partial v} \end{vmatrix} = \frac{\partial X(u,v)}{\partial u} \cdot \frac{\partial Y(u,v)}{\partial v} - \frac{\partial X(u,v)}{\partial v} \cdot \frac{\partial Y(u,v)}{\partial u}.$$

From $Y = v \cdot \dfrac{1 - u}{u}$ and $X = v$ we get

$$J = 0 \cdot \frac{1 - u}{u} - 1 \cdot \left(-\frac{v}{u^2}\right) = \frac{v}{u^2} = \frac{(X + Y)^2}{X}$$

and thus

$$g(u, v) = f(X, Y) \cdot \frac{(X + Y)^2}{X}.$$

Considering $X$ and $Y$ independent variables, their joint PDF is:

$$f(X, Y) = f(X) \cdot f(Y).$$

$X$ and $Y$ have normal PDF’s,

$$f(X) = \frac{1}{\sqrt{2 \pi} \cdot \sigma_1} \cdot \exp\left(-\frac{(X - \mu_1)^2}{2 \cdot \sigma_1^2}\right), \qquad f(Y) = \frac{1}{\sqrt{2 \pi} \cdot \sigma_2} \cdot \exp\left(-\frac{(Y - \mu_2)^2}{2 \cdot \sigma_2^2}\right),$$

thus

$$g(u, v) = \frac{v}{u^2} \cdot \frac{1}{2 \pi \cdot \sigma_1 \cdot \sigma_2} \cdot \exp\left(-\frac{\sigma_2^2 \cdot (v - \mu_1)^2 + \sigma_1^2 \cdot \left(v \cdot \dfrac{1 - u}{u} - \mu_2\right)^2}{2 \cdot \sigma_1^2 \cdot \sigma_2^2}\right).$$

The PDF for $u$ is, by definition,

$$g(u) = \int_{-\infty}^{\infty} g(u, v) \, dv,$$

thus

$$g(u) = \frac{1}{2 \pi \cdot \sigma_1 \cdot \sigma_2 \cdot u^2} \cdot \exp\left(-\frac{1}{2 \cdot \sigma_2^2 \cdot u^2} \cdot \left(C - \frac{B^2}{A}\right)\right) \cdot \int_{-\infty}^{\infty} v \cdot \exp\left(-\frac{A}{2 \cdot \sigma_2^2 \cdot u^2} \cdot \left(v - \frac{B}{A}\right)^2\right) dv,$$

with

$$A = r \cdot u^2 + (1 - u)^2, \quad B = (r \cdot \mu_1 - \mu_2) \cdot u^2 + \mu_2 \cdot u, \quad C = \left(r \cdot \mu_1^2 + \mu_2^2\right) \cdot u^2 \quad \text{and} \quad r = \frac{\sigma_2^2}{\sigma_1^2}.$$

If the substitution $u = u_0 + u_1$ ($u_0 = j/N$) is made, the following equation is obtained:

$$g_{u_0}(u_1) = \sqrt{\frac{N}{u_0 \cdot (1 - u_0)}} \cdot \frac{1}{\sqrt{2 \pi} \cdot C_m} \cdot \frac{1}{\left(1 + \dfrac{u_1^2}{u_0 \cdot (1 - u_0)}\right)^{3/2}} \cdot \exp\left(-\frac{N}{2 \cdot C_m^2} \cdot \frac{\dfrac{u_1^2}{u_0 \cdot (1 - u_0)}}{1 + \dfrac{u_1^2}{u_0 \cdot (1 - u_0)}}\right),$$

where $C_m = \sigma_m / \mu_m$.

Since $u_1^2 / \left(u_0 \cdot (1 - u_0)\right) \ll 1$, the following equations are obtained:

$$\sigma_{u_0} = C_m \cdot \sqrt{\frac{u_0 \cdot (1 - u_0)}{N}},$$

$$g_{u_0}(u_1) = \frac{1}{\sqrt{2 \pi} \cdot \sigma_{u_0}} \cdot \exp\left(-\frac{u_1^2}{2 \cdot \sigma_{u_0}^2}\right) \Leftrightarrow g_{u_0}(u) = \frac{1}{\sqrt{2 \pi} \cdot \sigma_{u_0}} \cdot \exp\left(-\frac{(u - u_0)^2}{2 \cdot \sigma_{u_0}^2}\right).$$

Thus, $u$ (the delay division ratio) has a normal probability density with average $\mu_{u_0} = u_0$ and standard deviation $\sigma_{u_0}$. The standard deviation of the integral error is obtained if $\sigma_{u_0}$ is normalised to the (average) cell delay:

$$\sigma_{DLL} = \sigma_{u_0} \cdot \frac{N \cdot \mu_m}{\mu_m} = N \cdot \sigma_{u_0} \Leftrightarrow \sigma_{DLL} = C_m \cdot \sqrt{\frac{j \cdot (N - j)}{N}}.$$

The maximum standard deviation of the integral error is found in the middle of the delay chain, with a value $\sigma_{DLL(max)}$ of:

$$\sigma_{DLL(max)} = C_m \cdot \frac{\sqrt{N}}{2},$$

which compares favourably with the maximum standard deviation of the integral error in an open (not enclosed in a control loop) delay chain, $\sigma_{DC(max)}$, found at the end of the delay chain:

$$\sigma_{DC(max)} = C_m \cdot \sqrt{N}.$$

Therefore, the inclusion of a delay line inside a closed control loop such as the DLL improves the standard deviation of the integral linearity error by a factor of two.

Appendix D. Number of Random Samples Required for TDC Characterisation.

A hit arriving at a time interpolator at a random time has equal probability $p$ of being collected by each of the bins into which the reference period is divided (assuming identical bins). This probability is a function of the total number of subdivisions ($N_{bins}$), given by $p = 1/N_{bins}$.

To estimate the size of a given bin, an experiment can be devised where random hits are generated (trials). The possible outcomes of a trial are success, if a hit is collected in the bin, or failure, if not. After a large number of trials have been executed, the ratio of the number of successes to the number of trials is a direct measure of the bin size. The accuracy of the estimation is, of course, related to the number of trials. It is therefore important to know the minimum number of trials that should be executed to obtain the required accuracy.
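The experiment just outlined can be sketched as a small simulation. The Python fragment below is only an illustration under stated assumptions: the reference period is normalised to 1, the hits are drawn from a software pseudo-random generator, and the function name `estimate_bin_sizes` and the example bin edges are hypothetical. Each hit is a trial, and the fraction of hits collected by a bin is taken as the estimate of its size.

```python
import bisect
import random


def estimate_bin_sizes(bin_edges, n_hits, seed=0):
    """Estimate bin sizes (fractions of the period) from random hits.

    bin_edges: increasing list starting at 0.0 and ending at 1.0,
               giving the bin boundaries with the period normalised to 1.
    n_hits:    number of random trials.
    """
    rng = random.Random(seed)
    counts = [0] * (len(bin_edges) - 1)
    for _ in range(n_hits):
        t = rng.random()                              # hit time, uniform in [0, 1)
        k = bisect.bisect_right(bin_edges, t) - 1     # index of the bin collecting the hit
        counts[k] += 1
    # Ratio of successes to trials for every bin.
    return [c / n_hits for c in counts]


if __name__ == "__main__":
    edges = [0.0, 0.2, 0.5, 1.0]   # hypothetical, deliberately unequal bins
    for n in (100, 10000, 200000):
        print(n, estimate_bin_sizes(edges, n, seed=1))
```

Running the sketch with an increasing number of hits shows the estimates settling towards the true bin sizes, which is the dependence on the number of trials that the rest of this Appendix quantifies.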
The experiment just described has the following properties:
• It consists of a number (n) of repeated trials.
• Each trial has an outcome that may be classified as a success or as a failure.
• The probability of success (p) remains constant from trial to trial.
• The repeated trials are independent.

It therefore qualifies as a set of n Bernoulli trials, and the number of successes has a Binomial probability distribution with mean $\mu = n \cdot p$ and variance $\sigma^2 = n \cdot p \cdot (1 - p)$.

It is known that the distribution of a Binomial random variable can be approximated by the normal distribution having the same mean and variance, if the number of trials is large.

In a normal distribution, the probability that a random variable $X$ will assume a value that deviates from its average $\mu$ by less than $z_{\alpha/2} \cdot \sigma$ is $1 - \alpha$:

$$P\left(\mu - z_{\alpha/2} \cdot \sigma \le X \le \mu + z_{\alpha/2} \cdot \sigma\right) = 1 - \alpha.$$

The variable $z_{\alpha/2}$ is the standard normal distribution z-value that is the limit of an area under the (standard) normal curve of $\alpha/2$ (see Figure 1 for clarification of these definitions). It can be obtained from any table of areas under the normal distribution curve (for example [7]).

[Figure 1: $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$. Standard normal curve $n(z; \mu = 0, \sigma = 1)$ with a central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and an area $\alpha/2$ in each tail.]

The result of the experiment, $x$ successes representing the measured size of the bin, is a sample of a normal random variable $X$ with mean $\mu$ and variance $\sigma^2$. From the previous probability limit it is, therefore, possible to conclude that the measured bin size lies within $z_{\alpha/2}$ standard deviations ($\sigma$) of its true value $\mu$, with a $100 \cdot (1 - \alpha)$ percent confidence.

If the accepted tolerance to which the bin size is to be determined is set to $\beta \cdot \mu$, and $\mu$ and $\sigma$ are replaced by their actual values, we get the following expression for the number of trials needed, $n$:

$$z_{\alpha/2} \cdot \sigma \le \beta \cdot \mu \Leftrightarrow z_{\alpha/2} \cdot \sqrt{n \cdot p \cdot (1 - p)} \le \beta \cdot n \cdot p \Leftrightarrow n \ge \left(\frac{z_{\alpha/2}}{\beta}\right)^2 \cdot \left(\frac{1}{p} - 1\right).$$

The probability $p$ is defined as $1/N_{bins}$.
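The bound on the number of trials can be evaluated numerically. The following Python sketch is an illustration only: the function name `trials_needed` and the example figures (64 bins, 10% tolerance, 95% confidence) are hypothetical, and the z-value is taken from the standard library instead of a printed table.

```python
import math
from statistics import NormalDist


def trials_needed(p, beta, alpha):
    """Smallest integer n with n >= (z_{alpha/2} / beta)**2 * (1/p - 1).

    p:     probability that a hit falls in the bin of interest
    beta:  accepted relative tolerance on the bin size (0.1 means 10%)
    alpha: 1 - confidence level (0.05 means 95% confidence)
    """
    # z-value leaving an area of alpha/2 in the upper tail of the standard normal curve.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z / beta) ** 2 * (1.0 / p - 1.0))


if __name__ == "__main__":
    n_bins = 64                      # hypothetical number of bins
    print(trials_needed(1.0 / n_bins, beta=0.1, alpha=0.05))
    print(trials_needed(1.0 / n_bins, beta=0.01, alpha=0.05))
```

Because the tolerance enters squared, tightening it by a factor of ten raises the required number of hits by roughly a factor of one hundred.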
Therefore the number of hits required to obtain the bin size with a tolerance $100 \cdot \beta\,\%$ and a confidence $100 \cdot (1 - \alpha)\,\%$ in the measurement is

$$n \ge \left(\frac{z_{\alpha/2}}{\beta}\right)^2 \cdot (N_{bins} - 1).$$

With the same set of hits, a similar estimation of the size of each bin can be obtained. Therefore the DNL characteristics of the line are obtained.

In principle, the INL characteristics of the line are directly obtained by cumulating the DNL histogram. It should be noticed that, while performing this operation, the uncertainties of the results (described by their variances) must also be accumulated. For an open ended line, the worst case is found in the last bin, with a standard deviation of:

$$\sigma_c = \sqrt{N_{bins}} \cdot \sigma.$$

The number of samples needed to obtain the INL characteristics with the same tolerance and confidence level must then be increased to

$$n_c = N_{bins} \cdot n = \left(\frac{z_{\alpha/2}}{\beta}\right)^2 \cdot (N_{bins} - 1) \cdot N_{bins}.$$

Conversely, for the same number of samples, the tolerance of the INL estimation is

$$\beta_c = \sqrt{N_{bins}} \cdot \beta.$$

If an enclosed line, for example within the DLL closed loop, is considered, then the worst case is found in the middle bin, with a standard deviation of:

$$\sigma_c = \frac{\sqrt{N_{bins}}}{2} \cdot \sigma.$$

The number of samples needed to obtain the INL characteristics with the same tolerance and confidence level must then be increased to

$$n_c = \frac{N_{bins}}{4} \cdot n = \left(\frac{z_{\alpha/2}}{\beta}\right)^2 \cdot (N_{bins} - 1) \cdot \frac{N_{bins}}{4}.$$

Conversely, for the same number of samples, the tolerance of the INL estimation is

$$\beta_c = \frac{\sqrt{N_{bins}}}{2} \cdot \beta.$$

Appendix E. TDC Characterisation Hit Frequency.

Interpolator characterisation requires that the reference clock period be sampled at random times. However, sampling at random, by its strict definition, would be impossible. What must be done is to guarantee that the reference clock is not sampled repeatedly at the same phase (beating effect). By choosing a sample frequency that is nonharmonically related to the clock frequency, we are assured of this [8].
Therefore, when a sufficient number of equidistant samples has been acquired, a uniform distribution of the samples along the clock period is obtained.

The sample frequency must, of course, be stable in order to guarantee that it doesn’t wander into a beating frequency during the characterisation procedure. Fortunately, very accurate and stable oscillators are common. They can be used directly or as a reference for a clock multiplying PLL, enabling the generation of practically any frequency ratio. It is, for example, possible to generate the sample frequency from the clock frequency, thus guaranteeing correct characterisation regardless of the actual clock present. Any jitter present in the sampling frequency will only contribute to further randomise the sampling time, which benefits the characterisation. In this context, the requirements for a PLL can be quite relaxed.

The relation between the sample period $T_{sample}$ and the clock period $T_{clk}$ may be generally described by the following equation, where $A$ and $B$ are integers that have no common divisor¹:

$$T_{sample} = \frac{A}{B} \cdot T_{clk}.$$

¹ It is commonly found in the literature that the integers A and B should be prime numbers [8]. However, this is only a sufficient condition to generate a non-beating frequency, corresponding to a sub-set of the possible integer ratios that satisfy the absence of beating effect requirement.

This relation merits a closer look to identify aids to the choice of the sample frequency. If we expand $A$, by letting

$$A = \left[\left(C + \frac{1}{M}\right) \cdot B \pm D\right] \cdot S,$$

then the previous equation can be expanded to:

$$T_{sample} = \left(C + \frac{1}{M} \pm \frac{D}{B}\right) \cdot S \cdot T_{clk}.$$

The constants in this equation are all related to identifiable characteristics of the sampling frequency:

B is the number of intervals into which the sampling divides the clock period. It should be large enough so that the sampling coverage is compatible with the expected characterisation accuracy.

S reflects the possible existence of sub-sampling, where only every nth sample out of the ones generated is collected. It is now clear that the sub-sampling rate cannot be chosen randomly, because the definition of the constant A restricts it. If S and B have common divisors, then the effective number of intervals B′ is reduced to B divided by them.

C is the number of integer $T_{clk}$ periods contained in $T_{sample}$ (or one more if $1/M \pm D/B < 0$). This constant must also abide by the rules of the definition of A.

M gives a measure of the spread between consecutive samples (normalised to $T_{clk}$). The actual sample spread is $(1/M \pm D/B) \cdot S \cdot T_{clk}$. The constant M should be the same as the number of sub-divisions (bins) of the interpolator being characterised. In this way, a more uniform sample distribution is obtained throughout the time that the test is being performed. Since it is included in the definition of constant A, it must also obey the required restrictions.

D is a small perturbation that actually defines the constant A. There is no real restriction on this constant, except for the rules defining A, but it should be made smaller than $B/M$, to keep these definitions coherent.

Typically, when determining the sampling frequency, B, S, C and M are defined by system requirements, and then D is determined so that the resulting A and B do not have any common divisors. The existence of common divisors between these two constants results in a decreased effective number of intervals B′.

The clock multiplying system required to perform these operations is graphically described in Figure 1. The critical operation is the clock multiplication (by B) on the return path of the PLL control loop. The delay introduced by this operation influences the stability of the closed control loop and, therefore, should be carefully analysed and minimised.

It is interesting to note that, in a noiseless system, after collecting B samples generated this way, the interpolator is completely described, with a measurement tolerance of $\pm \frac{1}{2 \cdot B} \cdot 100\%$ of the clock period.
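The choice of the constants can be illustrated with a short Python sketch. It assumes the relation $T_{sample} = (C + 1/M \pm D/B) \cdot S \cdot T_{clk}$, so that $A = ((C + 1/M) \cdot B \pm D) \cdot S$, and it further assumes that M divides B so that A is an integer. The function names and the example values are hypothetical. The sketch searches for the smallest D below B/M that leaves A and B without common divisors, and counts how many distinct sampling phases B consecutive samples cover.

```python
from math import gcd


def sampling_ratio(B, C, M, S, sign=+1):
    """Return (A, D) with A = ((C + 1/M) * B + sign * D) * S and gcd(A, B) == 1.

    Assumes M divides B (an assumption of this sketch) and keeps D < B/M.
    """
    if B % M != 0:
        raise ValueError("this sketch assumes that M divides B")
    for D in range(0, B // M):
        A = (C * B + B // M + sign * D) * S
        if gcd(A, B) == 1:
            return A, D
    raise ValueError("no D below B/M gives A and B without common divisors")


def distinct_phases(A, B):
    """Number of distinct sampling phases (in units of Tclk/B) hit by B samples."""
    return len({(k * A) % B for k in range(B)})


if __name__ == "__main__":
    B, C, M, S = 1024, 3, 32, 1      # hypothetical system requirements
    A, D = sampling_ratio(B, C, M, S)
    print("A =", A, "D =", D)
    print("phases covered with this A:", distinct_phases(A, B))
    print("phases covered with D = 0 :", distinct_phases((C * B + B // M) * S, B))
```

In this example the unperturbed ratio (D = 0) shares a common divisor with B and covers only a fraction of the B intervals, which corresponds to the reduced effective number of intervals B′ mentioned in the text.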
In the presence of inevitable noise, it is safer to assume a random uniform sample distribution and collect the conservative number of samples determined in Appendix D.

[Figure 1: The clock multiplying PLL. Labelled elements: Fclk, 1/[(C·B + B/M ± D)·S], PD, LPF, VCO, Fsample, B.]

Appendix F. Analysis of the Limits to the TDC Resolution (Alternative Tap Definition).

This Appendix complements Chapter 6. It considers the case where tap 0 of each of the Timing DLLs is located at the end of the delay chain (Figure 1). Since the delay chain spans exactly one clock period, this alternative definition doesn’t change the performance of the converter. However, the shape of the non-linearity histograms is altered, so we present here the corresponding non-linearity histogram expressions for a single DLL (F=1) and for an ADLL.

[Figure 1: Detail of a delay locked loop depicting the important delays within the loop (notice the alternative location of tap 0). Labelled elements: Clock, Hit, cell delays d0, d1, d2, ..., dN-2, dN-1, delays τ1, τ2 and τhit, taps 1, 2, ..., N-2, N-1 and tap 0, Phase Detector with inputs D1 and D2 and output F(D1,D2).]

The alternative timing and phase shifting variables $m$, $n$ and $n'$ as a function of the bin position $i$ ($0 \le i < F \cdot N$) are defined as:

$$m = \mathrm{Mod}(i + 1, F),$$

$$n = \mathrm{Mod}\left(m - \mathrm{Floor}\left(\frac{i + 1}{F}\right), N\right),$$

$$n' = \mathrm{Mod}\left(\mathrm{Floor}\left(\frac{i + 1}{F}\right) - m, N\right).$$

The following expressions reflect, respectively, the standard deviation of the integral non-linearity error due to cell delay mismatch and loop jitter (the layout of these expressions was lost in extraction and they are reproduced as extracted):

2 n F + 1 m σ array (i ) = F ⋅ σ cell ⋅ ⋅ ⋅ (M − m ) + ⋅ ( N − n ) . N F M

2 2 m n σ array (i ) = σ j ⋅ F ⋅ + . M N

The integral non-linearity due to the combined effect of all static errors is given by the following expressions, respectively for the case where the hit sampling signal is distributed via a linear network or via the T-shaped network (also reproduced as extracted):

m n m F +1 n INLarray (i ) = Din ⋅ F ⋅ ⋅ + + + − DPD ⋅ F ⋅ − F N M N M . m F + 1 n′ − Dout ⋅ F ⋅ ⋅ + − Dhit ⋅ F ⋅ n F N M

m F +1 n m n INLarray (i ) = Din ⋅ F ⋅ ⋅ + − DPD ⋅ F ⋅ − + + N F M M N , N N m F + 1 n′ − Dout ⋅ F ⋅ ⋅ + − Dhit ⋅ F ⋅ − n − F N 2 M 2

where, as before, the following variable transformations are used:

$$D_{in} = \delta_{in}, \quad D_{PD} = \frac{C}{K} + \tau_{diff}, \quad D_{out} = \delta_{out} \quad \text{and} \quad D_{hit} = -\tau_{hit}.$$

Appendix G. DNL-aware Algorithms for the RC Delay Line Calibration.

The calibration algorithms presented so far use integral non-linearity as the only criterion for judging the correctness of the calibration results. If differential non-linearity is also to be used, more complex calibration algorithms are needed. Since these algorithms try to optimise two variables simultaneously, their convergence may be compromised when the two goals require contradictory directions. The logic controlling the execution of the algorithm must be able to decide which goal is more important and pursue the calibration taking that decision into account.

In the following lines the algorithms previously described are modified so that they can also set limits to differential non-linearity.

Tap selection adjustment scheme.

Iterative algorithm.

The analysis of the linearity of a bin is based on the bin histogram h[bin]. A cumulative histogram ch[bin] is built from it and both are compared to the ideal histograms (developed from the knowledge of the ideal converter’s bin size LSB). The following operations check if the line conforms to the differential and integral linearity limits and take corrective measures for the offending bins.
    for i= 0 to M-1
        tap[i]= segment_from_simulation_of_typical_conditions;
    for bin= 0 to M-2
        repeat until no_changes
            Characterisation step;
            if ( ch[bin]< LSB·( bin+1-limINL ) & h[bin]< LSB·( 1+limDNL ) || h[bin]< LSB·( 1-limDNL ) )
                for i= 0 to M-bin-2
                    tap[bin+i+1]= tap[bin+i+1]+1;
            else if ( ch[bin]> LSB·( bin+1+limINL ) & h[bin]> LSB·( 1-limDNL ) || h[bin]> LSB·( 1+limDNL ) )
                for i= 0 to M-bin-2
                    tap[bin+i+1]= tap[bin+i+1]-1;
            else no_changes

In Figure 1 the algorithm is clarified. The acceptable limits of the integral and differential non-linearity are, respectively, limINL and limDNL. These linearity limits must be chosen in accordance with the size of the calibration steps. The access point selection for each tap is captured in tap[i].

[Figure 1: Calibration procedure for the tap selection adjustment scheme. Flowchart of the iterative algorithm: all taps start at typical conditions; for bin=0..M-2 a code density test (CDT) produces histogram[bin] and cumulative histogram[bin], which are compared with (bin+1-limINL)·LSB, (1+limDNL)·LSB, (1-limDNL)·LSB and (bin+1+limINL)·LSB; the taps tap[bin+i+1], i=0..M-bin-2, are incremented or decremented accordingly, repeating until changes=0.]

The accepted limits to integral and differential non-linearity do not have to be the same. Setting different limINL and limDNL is a simple way to force the algorithm to give priority to one of the goals pursued.

Single step algorithm.

This algorithm finds the tap access points that result in the nearest approximation to the ideal cumulative bin size curve. It also checks that the specified limit to the differential non-linearity, limDNL, is not surpassed.

    tap[0]=0 ;
    for i=1 to M-1
        for segment=0 to 31
            if (ch[segment]< LSB·i & ch[segment+1]> LSB·i)
                if (LSB·i-ch[segment]< ch[segment+1]-LSB·i && 1-limDNL< ch[segment]-ch[tap[i-1]]< 1+limDNL)
                    tap[i]=segment ;
                else
                    tap[i]=segment+1 ;

Lumped capacitor adjustment scheme.

Coarse tuning procedure.
In this procedure the capacitance of all the banks is simultaneously incremented by one unit capacitor, resulting in a uniform increase of the delay of all taps. The procedure is repeated until the cumulative bin size is smaller than the ideal delay by less than a determined limit limcoarse. In the following lines the procedure is schematically described:

    for bank= 1 to M
        cap[bank]= 0;
    repeat until ( ch[M-2]>= LSB·( M-1-limcoarse ) )
        Characterisation step;
        for bank= 1 to M
            cap[bank]= cap[bank]+1;

The calibration parameters for each capacitor bank are described by cap[bank] and ch[M-2] is the cumulative bin size histogram. A block diagram of the procedure is shown in Figure 2, where the Characterisation step is represented by the Code Density Test it performs.

[Figure 2: The coarse calibration procedure. Flowchart: starting from the initial calibration, a CDT produces cumulative histogram[M-2]; while it is smaller than (M-1-limcoarse)·LSB, cap[bank] is incremented for bank=1..M and changes is set to 1; the loop repeats until changes=0.]

Fine tuning procedure.

The fine tuning procedure builds on the results obtained with the coarse procedure. Each bin is sequentially evaluated to determine if it adheres to the linearity limits. If that is not the case, the capacitance of the respective capacitor bank is increased by one unit. This unit increase is repeated until a satisfactory result is obtained.

The fine calibration algorithm is schematically presented in the next few lines. The bin size histogram is h[bin], and limDNL and limINL are the differential and integral linearity limits.

    for bin= 0 to M-2
        bank= bin+1;
        repeat until ( no_changes | bank> M )
            Characterisation step;
            if( ch[bin]< LSB·( bin+1-limINL ) & h[bin]< LSB·( 1+limDNL ) || h[bin]< LSB·( 1-limDNL ) )
                cap[bank]= cap[bank]+1;
                bank= bank+1;
            else no_changes

The algorithm approaches the final calibration solution by small increases in the bin size, therefore only the lower limits to the linearity need to be checked.
In this version of the algorithm, a second loop (shown below) can be used to perform a final adjustment to the calibration settings. This loop may be required in case the pursuit of one linearity parameter goal forces the RC delay line to surpass the upper limit of the other linearity parameter. Since the bin size increase/decrease per fine characterisation step is very small, this situation only occurs if the linearity limits are too narrow.

    for bin= M-2 to 0
        bank= bin+1;
        repeat until ( no_changes | bank< 1 )
            Characterisation step;
            if( ch[bin]> LSB·( bin+1+limINL ) & h[bin]> LSB·( 1-limDNL ) || h[bin]> LSB·( 1+limDNL ) )
                cap[bank]= cap[bank]-1;
                bank= bank-1;
            else no_changes

In Figure 3 and Figure 4, the diagrams of the two fine calibration algorithm loops are shown.

[Figure 3: The fine calibration procedure (first loop). Flowchart: from the coarse calibration, for bin=0..M-2 with bank=bin+1, a CDT produces histogram[bin] and cumulative histogram[bin], which are compared with (bin+1-limINL)·LSB, (1+limDNL)·LSB and (1-limDNL)·LSB; when below the limits, cap[bank] and bank are incremented and changes is set to 1; the loop repeats until changes=0 | bank>M.]

[Figure 4: The fine calibration procedure (second loop). Flowchart: from the fine calibration (1st loop), for bin=M-2..0 with bank=bin+1, a CDT produces histogram[bin] and cumulative histogram[bin], which are compared with (bin+1+limINL)·LSB, (1-limDNL)·LSB and (1+limDNL)·LSB; when above the limits, cap[bank] and bank are decremented and changes is set to 1; the loop repeats until changes=0 | bank<1.]

References for the Appendixes.

[1] Doernberg, J. et al., Full-speed testing of A/D converters, IEEE Journal of Solid-State Circuits, Vol. 19, No. 6, pp. 820-827, Dec. 84.
[2] Ginetti, B. et al., Reliability of code density test for high-resolution ADCs, Electronics Letters, Vol. 27, No. 24, pp. 2231-2233, Nov. 91.
[3] Bossche, M. V., et al., Dynamic testing and diagnostics of A/D converters, IEEE Transactions on Circuits and Systems, Vol. 33, No. 8, pp. 775-785, Aug. 86.
[4] Brandolini, A. et al., Testing Methodologies for analogue-to-digital converters, IEEE Transactions on Instrumentation and Measurement, Vol. 41, No. 5, pp. 595-603, Oct. 92.
[5] Maneatis, J. G., Low-jitter process-independent DLL and PLL based on self-biased techniques, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1723-1732, Nov. 96.
[6] Kuboki, S. et al., Nonlinearity analysis of resistor string A/D converters, IEEE Transactions on Circuits and Systems, Vol. 29, No. 6, pp. 383-390, Jun. 82.
[7] Walpole, R. E. et al., Probability and statistics for engineers and scientists - fifth edition, MacMillan Publishing Company, 93.
[8] Doernberg, J. et al., Full-speed testing of A/D converters, IEEE Journal of Solid-State Circuits, Vol. 19, No. 6, pp. 820-827, Dec. 84.