

Order No.: xxxx

### École Doctorale de Physique et Chimie-Physique de l'Université de Strasbourg

**UDS** 

# **THESIS**

### submitted to obtain the degree of

### Doctor of the Université de Strasbourg

Discipline: Electronics, Electrical Engineering and Automatic Control

Specialty: Instrumentation and Microelectronics

by

## Liang ZHANG

# Development of a CMOS Pixel Sensor for the Outer Layers of the ILC Vertex Detector

publicly defended on 30 September 2013 before the jury:

Thesis supervisor: YANN HU, Professor, UDS, Strasbourg

External reviewer: MICHEL PAINDAVOINE, Professor, UDB, Bourgogne

External reviewer: JEAN-FRANÇOIS GENAT, Research engineer, HDR, LPNHE, Paris

Examiner: WILFRIED UHRING, Professor, UDS, Strasbourg

Invited member: MARC WINTER, Research director, IPHC, Strasbourg

### Acknowledgements

This dissertation was made possible by the help of many individuals during the last four years in France. Foremost, I would like to thank the China Scholarship Council (CSC) for its financial support during my Ph.D. study and research at the University of Strasbourg, France, and the Institut Pluridisciplinaire Hubert Curien (IPHC), CNRS.

I would like to express my sincere gratitude to my advisor, Prof. Yann Hu, for giving me the opportunity to work in the institute and for his motivation, enthusiasm, and immense knowledge. His guidance helped me throughout the research and the writing of this thesis.

Besides my advisor, my sincere thanks also go to Dr. Christine Hu-Guo, the director of the microelectronics group, for guiding my work on this exciting project and for giving me insightful comments. I would like to thank Dr. Frederic Morel for his fruitful discussions and suggestions during this work.

I would like to thank Prof. Jerome Baudot for his insightful comments on the physics part of this work. I would also like to thank Gilles Claus, Kimmo Jaaskelainen, Mathieu Goffe and Mathieu Specht for their help with the test bench. I would further like to thank all of the members of the PICSEL group: Abdelkader Himmi, Andrei Dorokhov, Auguste Besson, Christian Illinger, Claude Colledani, Gregory Bertolone, Guy Doziere, Hung Pham, Isabelle Vallin, Marc Winter, Rachid Sefri, Sylviane Molinet, Tianyang Wang, Wei Zhao, Xiaochao Fang and Yang Zhou for their assistance in my work.

Last but not least, I would like to thank my parents and sister for their unconditional support throughout my life. I would especially like to thank my wife, Dr. Wei Yan, for her patience, understanding and encouragement, which enabled me to complete the Ph.D. work. I am very grateful to her.

# Contents

| Section | Title |
|---------|-------|
|         | Contents |
|         | List of Figures |
|         | List of Tables |
|         | Abstract |
|         | Résumé en Français |
|         | Introduction |
| 1       | ILC Vertex Detector |
| 1.1     | The ILC Physics Programme |
| 1.1.1   | The ILC Machine |
| 1.2     | ILD at ILC |
| 1.2.1   | The ILD Layout |
| 1.3     | Vertex Detector |
| 1.3.1   | Requirements |
| 1.3.2   | Geometries |
| 1.3.3   | Sensor Technologies |
| 1.3.4   | CMOS Pixel Sensors for VTX |
| 1.4     | Summary |
|         | Bibliography |
| 2       | CMOS Pixel Sensors for Charged Particle Detection |
| 2.1     | Detection Principle |
| 2.2     | CPS Architecture |
| 2.2.1   | Pixel Circuit |
| 2.2.2   | Signal Processing Circuit |
| 2.3     | State-of-the-art CPS |
| 2.3.1   | MIMOSA 26 |
| 2.3.2   | ULTIMATE |
| 2.4     | Sensor Concept for the ILD Vertex Detector |
| 2.4.1   | Sensor Equipping the Innermost Layer |
| 2.4.2   | Sensor Equipping the Outer Layers |
| 2.5     | Summary |
|         | Bibliography |
| 3       | Column-Level Analog-to-Digital Converter for CPS |
| 3.1     | Specifications of A/D Converters |
| 3.1.1   | Quantization Error |
| 3.1.2   | Differential Nonlinearity |
| 3.1.3   | Integral Nonlinearity |
| 3.1.4   | Offset Error |
| 3.1.5   | Signal-to-Noise Ratio |
| 3.1.6   | Noise |
| 3.1.7   | Settling Time |
| 3.2     | Reduction Techniques |
| 3.2.1   | Offset Compensation |
| 3.2.2   | Sampling Switches |
| 3.3     | Converter Architectures |
| 3.3.1   | Flash ADC |
| 3.3.2   | Two-Step ADC |
| 3.3.3   | Subranging ADC |
| 3.3.4   | Pipeline ADC |
| 3.3.5   | Folding ADC |
| 3.3.6   | Successive Approximation Register ADC |
| 3.3.7   | Sigma-Delta ADC |
| 3.4     | Column-Level ADC Suitable to Vertex Detector |
| 3.5     | Summary |
|         | Bibliography |
| 4       | Design of a Sensor Prototype |
| 4.1     | Global Architecture |
| 4.2     | Pixel Circuit |
| 4.3     | Column-Level ADC |
| 4.3.1   | Design Requirements |
| 4.3.2   | Operation Principle |
| 4.3.3   | Optimal Power Saving |
| 4.3.4   | Sample-and-hold |
| 4.3.5   | Comparator |
| 4.3.6   | Digital Logic |
| 4.3.7   | DAC |
| 4.4     | Simulation Results and Layout |
| 4.5     | Summary |
|         | Bibliography |
| 5       | Experimental Results |
| 5.1     | Test Board and Setup |
| 5.2     | Test Results |
| 5.2.1   | Noise Performance |
| 5.2.2   | … Performance |
| 5.2.3   | ADC Performance |
| 5.3     | Summary |
|         | Bibliography |
| 6       | Improvements on MIMOSA 31 |
| 6.1     | Zero Suppression for MIMOSA 31 |
| 6.1.1   | Physical Characteristics |
| 6.1.2   | Hit Recognition |
| 6.1.3   | Data Sparsification Algorithm |
| 6.1.4   | Separated Sparse Banks in Pixel Matrix |
| 6.2     | Self-Timed ADC |
| 6.2.1   | Architecture Design |
| 6.2.2   | Enhanced S/H Circuit |
| 6.2.3   | Self-Timed Comparator |
| 6.2.4   | Simulation Results |
| 6.3     | Extended Self-Timed Discriminator |
| 6.4     | Summary |
|         | Bibliography |
|         | … and Perspectives |
| A       | … Configurations |
| A.1     | …_ID Register |
| A.2     | BIAS_DAC Register |
| A.3     | PATT_LINE Register |
| A.4     | DIS_ADC Register |
| A.5     | PIX_SEQ Register |
| A.6     | ADC_SEQ Register |

# List of Figures

| Figure | Caption |
|--------|---------|
| 1    | Vertex detector geometry. Left: 5 single layers (VTX-SL). Right: 3 double layers (VTX-DL). |
| 2    | Block diagram of the proposed CMOS pixel sensor. |
| 3    | Microphotograph of the sensor. |
| 1.1  | Cross sections of physics processes as a function of the collision energy. |
| 1.2  | Schematic layout of the International Linear Collider. |
| 1.3  | Illustration of the pinch effect in bunch collisions. |
| 1.4  | Bunch timing scheme of the ILC. |
| 1.5  | Schematic view of the ILD detector concept. |
| 1.6  | Quadrant of the ILD detector concept. |
| 1.7  | Vertex detector geometries. Left: 5 single-sided ladders (VTX-SL). Right: 3 double-sided ladders (VTX-DL). |
| 2.1  | CMOS sensor operation principle. |
| 2.2  | A simplified diagram of the CMOS pixel sensor. |
| 2.3  | Structure of the 3T pixel (left) and self-biased (SB) pixel (right). |
| 2.4  | Common source amplifier (left) and improved amplifier (right). |
| 2.5  | Improved common source amplifier with feedback. |
| 2.6  | In-pixel CDS and enhanced CS amplifier with feedback. |
| 2.7  | (a) Block diagram of the discriminator and (b) related timing. |
| 2.8  | Hit recognition and encoding of the pixels. |
| 2.9  | Block diagram of the zero suppression. |
| 2.10 | EUDET beam telescope. |
| 2.11 | Block diagram of MIMOSA-26. |
| 2.12 | MIMOSA-26 beam test results obtained at the CERN-SPS with $\sim 120$ GeV charged particles. The detection efficiency (black curve), the fake hit rate (blue curve) and the single point resolution (red curve) are evaluated with various discriminator thresholds. |
| 2.13 | Block diagram of ULTIMATE. |
| 2.14 | Beam test results of ULTIMATE with power supply of 3.3 V, before (left) and after (right) exposure to a dose of 150 kRads. |
| 2.15 | Beam test results of ULTIMATE with power supply of 3 V, before (left) and after (right) exposure to a dose of 150 kRads. |
| 2.16 | Schematic of the combination of AROM (elongated pixels for time resolution) and MIMOSA (square pixels for spatial resolution) sensors equipping the double-sided ladder. |
| 2.17 | Resolution vs pixel pitch. |
| 3.1  | The ADC transfer function. |
| 3.2  | Differential nonlinearity (DNL). |
| 3.3  | Integral nonlinearity (INL). |
| 3.4  | Offset error. |
| 3.5  | Simple sampling circuit (left) and thermal noise equivalent circuit (right). |
| 3.6  | Output offset storage architecture. |
| 3.7  | Input offset storage architecture. |
| 3.8  | CMOS switch (left) and on-resistance behavior (right). |
| 3.9  | Charge injection through the source and drain terminals. |
| 3.10 | Clock feedthrough through gate-source and gate-drain overlap capacitance. |
| 3.11 | Dummy switch to reduce the charge injection and clock feedthrough. |
| 3.12 | Block diagram of an N-bit flash ADC. |
| 3.13 | Two-step ADC architecture. |
| 3.14 | Subranging ADC architecture. |
| 3.15 | Pipeline ADC architecture. |
| 3.16 | Folding ADC architecture. |
| 3.17 | Successive approximation ADC architecture (top) and its operation principle (bottom). |
| 3.18 | Sigma-delta ADC architecture. |
| 3.19 | ADC architecture power efficiency comparison. |
| 4.1  | Global architecture of the CMOS pixel sensor. |
| 4.2  | SB pixel architecture. |
| 4.3  | ADC block diagram. |
| 4.4  | ADC approximation procedure diagram. |
| 4.5  | Input/output characteristic of an ideal multiple-bit/step ADC. |
| 4.6  | ADC operation waveforms showing (a) operation plan and (b) timing control. |
| 4.7  | Typical sample-and-hold circuit. |
| 4.8  | Small signal model of the switched-capacitor (SC) circuit. |
| 4.9  | (a) Sample and hold architecture and (b) related timing diagram. |
| 4.10 | OTA schematic. |
| 4.11 | Loop gain and phase margin of the OTA. |
| 4.12 | Input/output characteristic of an ideal comparator and a high-gain amplifier. |
| 4.13 | A simple latch comprising two back-to-back amplifiers. |
| 4.14 | A simplified small signal model of the latch. |
| 4.15 | Time response of the latch. |
| 4.16 | Offset compensated comparator with preamplifier. |
| 4.17 | Auto-zeroed comparator diagram. |
| 4.18 | Circuit diagram with buffer. |
| 4.19 | Switched preamplifier schematic. |
| 4.20 | Dynamic error versus the settling time of the ADC array. |
| 4.21 | Dynamic latch with current source. |
| 4.22 | (a) Clock manager and (b) related timing diagram. |
| 4.23 | DAC diagram. |
| 4.24 | Simulated DAC (a) output voltage and (b) dynamic error. |
| 4.25 | Simulated (a) DNL and (b) INL versus code. |
| 4.26 | Layout of the sensor prototype. |
| 5.1  | Proximity test board. |
| 5.2  | Microphotograph of wire-bonded chip. |
| 5.3  | Normalized response versus threshold voltage. |
| 5.4  | (a) Measured temporal noise and (b) fixed pattern noise. |
| 5.5  | Calibration results with 5.9 keV X-ray photons for a single pixel. The tested pixel is with a standard epitaxial layer. |
| 5.6  | Total charge collection with 5.9 keV X-ray photons for different clusters. The test is performed at room temperature. |
| 5.7  | Test bench of the ADC measurement. |
| 5.8  | Timing diagram of the ADC measurement. |
| 5.9  | Normalized number of counts versus threshold voltage. |
| 5.10 | (a) Measured temporal noise and (b) fixed pattern noise. |
| 5.11 | Basic histogram test setup. |
| 5.12 | Number of counts versus ADC output code. |
| 5.13 | (a) Measured DNL and (b) INL versus code. |
| 6.1  | Characteristics of the full size sensor in the outer layers. |
| 6.2  | Schematic of the pixels delivering signal above threshold. |
| 6.3  | Concept of a hit represented by a regular cluster. |
| 6.4  | Concept of sparse scan including searching states and identifying hits with a group of three pixels. |
| 6.5  | Block diagram of the state encoding circuit. |
| 6.6  | Improved ADC block diagram. |
| 6.7  | (a) Enhanced sample and hold architecture and (b) related timing diagram. |
| 6.8  | OTA schematic. |
| 6.9  | Loop gain and phase margin of the OTA. |
| 6.10 | Waveforms showing (a) standard bit-cycling and (b) self-timed bit-cycling. |
| 6.11 | Self-timed comparator diagram. |
| 6.12 | Switched preamplifier schematic. |
| 6.13 | Dynamic latch with current source. |
| 6.14 | Simulated (a) DNL and (b) INL versus code. |
| 6.15 | Schematic of the discriminator. |
| 6.16 | Enabled (a) preamplifier and (b) source follower. |
| 6.17 | (a) Enable signal generation and (b) related timing. |
| B.1  | First page of the schematic. |
| B.2  | Second page of the schematic. |
| B.3  | Third page of the schematic. |

# List of Tables

| Table | Caption |
|-------|---------|
| 1.1 | Anticipated impact parameter resolution for the ILD, compared to other collider experiments. |
| 1.2 | Geometrical parameters of the two vertex detector options. |
| 2.1 | Extrapolation from previous measurements. The spatial resolution depends on the pixel pitch, epitaxial layer and number of bits. |
| 4.1 | The simulation results of the pixel circuit. |
| 4.2 | The simulation results of the OTA. |
| 4.3 | The simulation results of the preamplifier. |
| 4.4 | The simulation results of the dynamic latch. |
| 4.5 | Performance summary. |
| 5.1 | Performance summary of the sensor prototype. |
| 6.1 | The simulation results of the OTA. |
| 6.2 | The simulation results of the preamplifier. |
| 6.3 | Performance summary. |
| A.1 | JTAG instruction registers. |
| A.2 | ID code of MIMOSA 31. |
| A.3 | Bias generation register. |
| A.4 | Pattern line register. |
| A.5 | Disable ADC register. |
| A.6 | Pixel sequencer configuration. |
| A.7 | ADC sequencer configuration. |

### Abstract

This work deals with the design of a CMOS pixel sensor prototype (called MIMOSA 31) for the outer layers of the International Linear Collider (ILC) vertex detector. CMOS pixel sensors (CPS), also called monolithic active pixel sensors (MAPS), have demonstrated attractive performance with respect to the requirements of the vertex detector of the future linear collider. MIMOSA 31, developed at IPHC-Strasbourg, is the first pixel sensor integrating 4-bit column-level ADCs for the outer layers. It is composed of a matrix of 64 rows and 48 columns. The pixel concept combines in-pixel amplification with a correlated double sampling (CDS) operation in order to reduce the temporal and fixed pattern noise (FPN). At the bottom of the pixel array, each column is terminated with an analog-to-digital converter (ADC). The self-triggered ADC, which accommodates the pixel readout in rolling shutter mode, completes the conversion by performing a multi-bit/step approximation. The ADC design was optimized for power saving at the sampling frequency. Given that the hit density in the outer layers of the ILC vertex detector is of the order of a few per thousand, the ADC works in two modes: active and inactive. This thesis presents the details of the prototype chip and its laboratory test results.

**Keywords**: CMOS pixel sensors (CPS), monolithic active pixel sensors (MAPS), International Linear Collider (ILC), vertex detector, correlated double sampling (CDS), analog to digital converter (ADC), column-level, self-triggered, multi-bit/step approximation.

# Résumé en Français

### R.1 Introduction

High-energy physics experiments, such as the future International Linear Collider (ILC), require high-precision vertex detectors built from thin, highly granular pixel sensors. Because the operating conditions of the ILC are much less demanding than those of the Large Hadron Collider (LHC), physics specifications such as spatial resolution can be favoured at the expense of readout speed or radiation tolerance. CMOS pixel sensors (CPS), also called monolithic active pixel sensors (MAPS), which are a major research line at IPHC-Strasbourg (Institut Pluridisciplinaire Hubert Curien), have shown attractive performance with respect to the vertex detector specifications. They can readily reach the required granularity and material budget, and they do not require a cooling system, which would increase the material budget within the fiducial volume of the vertex detector.

The subject of this thesis is the design of a CMOS pixel sensor prototype suited to the outer layers of the ILD vertex detector (ILD VTX). The ILD VTX imposes strict requirements on CMOS pixel sensors. Two different geometries exist for the VTX, as illustrated in Figure 1. One of them (VTX-SL) has 5 equidistant single layers, whereas the alternative option (VTX-DL) has 3 double layers. In both geometries, the sensors equipping the innermost layer must provide a spatial resolution better than 3 $\mu$m together with a very short integration time (less than 50 $\mu$s) because of beamstrahlung. Under ILC operating conditions, one millisecond of intense collisions alternates with 199 ms without beam. Because of the beam background, the sensors must be read out twenty times or more during each bunch train in order to keep the occupancy of the pixel matrix below 1%. This constraint calls for an R&D effort focused on a high-readout-speed design. Small-pitch pixels terminated by a discriminator are proposed.

Figure 1: Vertex detector geometry. Left: 5 single layers (VTX-SL). Right: 3 double layers (VTX-DL).

The sensors foreseen for the outer layers, which are the largest and cover nearly 90% of the total VTX surface, face less stringent constraints on spatial resolution and readout speed. A spatial resolution of 3-4 $\mu$m combined with an integration time below 100 $\mu$s should be an acceptable compromise. In this case, the design effort concentrates on reducing the power consumption. A pixel with a pitch of 35 $\mu$m combined with a 4-bit analog-to-digital converter (ADC) is proposed, which reduces the power consumption while preserving the required spatial resolution.
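The timing figures quoted in this section can be cross-checked with a short calculation. This is only an illustrative sketch: the symbols $T_\text{train}$, $T_\text{gap}$, $N_\text{read}$ and $t_\text{int}$ are introduced here for convenience, and the readouts are assumed to be evenly spaced over the bunch train.

$$
\frac{T_\text{train}}{T_\text{train} + T_\text{gap}} = \frac{1\ \text{ms}}{1\ \text{ms} + 199\ \text{ms}} = 0.5\%,
\qquad
t_\text{int} = \frac{T_\text{train}}{N_\text{read}} \le \frac{1\ \text{ms}}{20} = 50\ \mu\text{s}.
$$

Under the same assumption, the 100 $\mu$s integration time accepted for the outer layers corresponds to $1\ \text{ms} / 100\ \mu\text{s} = 10$ readouts per train.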

### R.2 Doctoral Work

#### R.2.1 Part 1

The architecture of the prototype, named MIMOSA 31, comprises a pixel matrix of 48 columns by 64 rows, ADCs at the bottom of the columns, and a peripheral digital readout circuit, as illustrated in Figure 2.

Figure 2: Block diagram of the proposed CMOS pixel sensor.

The pixels are read out row by row in rolling shutter mode. Each pixel combines in-pixel amplification with a correlated double sampling (CDS) operation, which was validated in previous sensors (MIMOSA 26, designed for the EUDET beam telescope project, and ULTIMATE, which equips the STAR-PXL subsystem). The column-level ADCs receive the pixel outputs in parallel and perform the conversion using a multi-bit/step approximation, defined below. The ADC architecture is similar to a successive approximation register (SAR) converter, with low power consumption and moderate speed (several megasamples per second). Previous prototypes verified that the pixel noise is about 1 mV. To improve the resolution on the reconstructed particle position, the least significant bit (LSB) is set at the level of the pixel noise. Earlier physics studies show that a coarse encoding of the amplitude of the high-signal pixels in a cluster does not degrade the spatial resolution. Consequently, a variable signal encoding is used, ranging from a maximum of 4 bits for small-amplitude signals to only 2 bits for large signals. After the analog-to-digital conversion, the digital outputs are stored and transmitted serially off-chip through an 8-to-1 multiplexer. The sensor settings are remotely programmable through the JTAG protocol. To allow the performance of the readout circuits to be compared, the chip also has eight analog outputs.

Figure 3: Microphotograph of the sensor.

Given that the density of hit pixels in the outer layers of the ILC VTX is of the order of a few per thousand, the ADC is designed to operate in two modes (active and inactive) in order to minimize the power consumption. The ADC uses a threshold voltage to trigger the conversion. If the pixel signal is above the threshold, the ADC operates in active mode and performs the conversion; otherwise, it operates in inactive mode and remains idle until the next conversion. This method saves a considerable amount of power.
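The threshold-triggered behaviour can be sketched in a few lines of Python. This is a behavioural illustration only, not the circuit described in the thesis: the names `convert` and `convert_row`, the 3 mV threshold and the ideal uniform quantisation are all assumptions made for the example, whereas the real converter uses a variable multi-bit/step encoding.

```python
from typing import List, Optional

LSB_MV = 1.0        # assumed LSB, set near the ~1 mV pixel noise quoted in the text
N_BITS = 4          # maximum code width quoted in the text
THRESHOLD_MV = 3.0  # hypothetical trigger threshold, not a value from the thesis


def convert(signal_mv: float, threshold_mv: float = THRESHOLD_MV) -> Optional[int]:
    """Return a code in active mode, or None when the ADC stays inactive."""
    if signal_mv <= threshold_mv:
        return None  # inactive mode: the conversion is skipped
    # Active mode: ideal uniform quantisation, clipped to the 4-bit range
    # (a simplification of the chip's variable multi-bit/step encoding).
    code = int(signal_mv / LSB_MV)
    return min(code, 2 ** N_BITS - 1)


def convert_row(signals_mv: List[float]) -> List[Optional[int]]:
    """Convert one row of column outputs; most entries are expected to be None."""
    return [convert(s) for s in signals_mv]
```

With a hit density of a few per thousand, almost every call takes the early-return branch, which is what makes a low-power inactive mode worthwhile.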

The prototype was designed and fabricated in a 0.35 $\mu$m CMOS technology with 2 polysilicon layers and 4 metal levels. The total chip area is 4 × 4.8 mm<sup>2</sup>, as shown in Figure 3. The pixel matrix consists of 48 × 64 pixels with a pitch of 35 $\mu$m. The column-level ADC array is located directly below the pixel matrix. The area of one ADC is 35 × 545 $\mu$m<sup>2</sup>.

Preliminary test results indicate that MIMOSA 31 meets the design specifications. The laboratory tests were carried out in three parts: the test of the pixel array together with the ADCs, the test of the pixels and finally the test of the ADCs. These tests determine the basic performance, including the temporal noise, the fixed pattern noise (FPN), the equivalent noise charge (ENC), the charge-to-voltage conversion factor (CVF), the charge collection efficiency (CCE) and the nonlinearity.

The transfer curves, which characterize the noise performance, were obtained by scanning the ADC threshold voltage. From these curves, which follow a cumulative distribution, the temporal noise and the FPN can be derived. A temporal noise of 1.36 mV and an FPN of 0.98 mV were measured for the combined pixel and ADC.
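As an illustration of how noise figures can be extracted from such threshold scans, the following is a minimal sketch and not the analysis code used for MIMOSA 31. It assumes Gaussian noise, so that each channel's transfer curve is a complementary error function. The width of each curve is taken as the temporal noise and the spread of the curve midpoints across channels as the FPN. The function names, the three synthetic channels and their offsets are hypothetical.

```python
import math


def s_curve(threshold_mv, mean_mv, sigma_mv):
    """Ideal fraction of samples above threshold for Gaussian noise."""
    arg = (threshold_mv - mean_mv) / (sigma_mv * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(arg))


def fit_s_curve(thresholds, fractions):
    """Estimate (mean, sigma) of one channel by differentiating its S-curve."""
    mids, weights = [], []
    for i in range(len(thresholds) - 1):
        mids.append(0.5 * (thresholds[i] + thresholds[i + 1]))
        # The drop of the curve across one step is the probability mass there.
        weights.append(max(fractions[i] - fractions[i + 1], 0.0))
    total = sum(weights)
    mean = sum(m * w for m, w in zip(mids, weights)) / total
    var = sum(w * (m - mean) ** 2 for m, w in zip(mids, weights)) / total
    return mean, math.sqrt(var)


def noise_summary(thresholds, curves):
    """Temporal noise = mean curve width; FPN = spread of the curve midpoints."""
    fits = [fit_s_curve(thresholds, c) for c in curves]
    means = [m for m, _ in fits]
    sigmas = [s for _, s in fits]
    temporal = sum(sigmas) / len(sigmas)
    centre = sum(means) / len(means)
    fpn = math.sqrt(sum((m - centre) ** 2 for m in means) / len(means))
    return temporal, fpn


# Synthetic scan: three channels with hypothetical offsets and 1.36 mV noise.
thresholds = [0.1 * k for k in range(-100, 101)]  # -10 mV to +10 mV
offsets_mv = [-1.0, 0.0, 1.0]
curves = [[s_curve(t, off, 1.36) for t in thresholds] for off in offsets_mv]
temporal_mv, fpn_mv = noise_summary(thresholds, curves)
print(f"temporal noise ~ {temporal_mv:.2f} mV, FPN ~ {fpn_mv:.2f} mV")
```

With real data, a least-squares fit of the error function would usually be more robust against statistical fluctuations than this simple differentiation.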

The test of the pixels alone yields an equivalent noise charge (ENC) of $18.6~{\rm e_{rms}^-}$. The measured charge-to-voltage conversion factor (CVF) of a single pixel is $60~\mu{\rm V/e^-}$. In order to study the charge sharing, the performance of clusters of hit pixels was analysed. The charge collection efficiency (CCE) measured on p1 (central pixel), p4 (2 × 2 pixels), p9 (3 × 3 pixels) and p25 (5 × 5 pixels) is 18%, 49%, 66% and 74%, respectively.
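As a consistency check, and assuming the usual definition of the ENC as the output noise voltage divided by the CVF, the two measured values can be combined into an equivalent pixel noise voltage (the symbol $V_n$ is introduced here only for illustration):

$$V_n \approx \mathrm{ENC} \times \mathrm{CVF} = 18.6~\mathrm{e^-} \times 60~\mu\mathrm{V/e^-} \approx 1.12~\mathrm{mV}$$

This agrees with the pixel noise of about 1 mV observed on previous prototypes, which is the level at which the LSB is set.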

The test of the ADCs alone was used to analyse the nonlinearity. The measured differential nonlinearity (DNL) is 0.49/-0.28 LSB and the integral nonlinearity (INL) is 0.29/-0.20 LSB. The ADC conversion time is 80 ns at a sampling frequency of 6.25 MHz. The ADC consumes 486 $\mu$W in inactive mode, which is by far the most frequent, and 714 $\mu$W in active mode.
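The benefit of the two-mode operation can be estimated with a simple weighted average. The sketch below is only an illustration: it uses the two measured power values quoted above and assumes that the fraction of conversions performed in active mode equals a hypothetical hit fraction of 0.5% (a few per mille). The function name and this fraction are assumptions, not values from the measurements.

```python
P_INACTIVE_UW = 486.0  # measured power in inactive mode (from the text)
P_ACTIVE_UW = 714.0    # measured power in active mode (from the text)


def average_adc_power_uw(active_fraction):
    """Mean power of one column ADC for a given fraction of active conversions."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie in [0, 1]")
    return active_fraction * P_ACTIVE_UW + (1.0 - active_fraction) * P_INACTIVE_UW


# Hypothetical fraction of conversions above threshold: 0.5 %.
avg_uw = average_adc_power_uw(0.005)
saving = 1.0 - avg_uw / P_ACTIVE_UW
print(f"average power: {avg_uw:.2f} uW, saving vs always active: {saving:.1%}")
```

Under this assumption the average power stays within about 1 $\mu$W of the inactive-mode value, which is why the inactive-mode consumption dominates the power budget.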

### R.2.2 Part 2

To further reduce the overall power consumption of the sensor, a new self-timed ADC architecture with very low power is proposed. The new solution consists of an improved sample-and-hold (S/H) stage and the addition of a self-timing technique. The S/H stage is enhanced with a correlated double sampling (CDS) architecture in order to reduce the fixed pattern noise (FPN) of the pixel and of the operational amplifier. This avoids the implementation of an additional auto-zero capacitor, thus minimizing the power consumption while still converting the signals at the required rates. The efficiency of the comparator is improved by using self-timed signals and by relaxing the settling time of the preamplifier. With these improvements, the total power consumption is reduced by up to 54% in inactive mode and 40% in active mode, compared with the ADC implemented in MIMOSA 31.

### R.3 Conclusion

MIMOSA 31 is the first CMOS sensor prototype that integrates 4-bit column-level ADCs with a pixel array dedicated to the outer layers of the ILD VTX. Preliminary results indicate that MIMOSA 31 meets the specifications for these layers. The characterization of MIMOSA 31 will be completed with beam tests in order to measure the spatial resolution. The prototype was designed with the characteristics of a full-size sensor, i.e. about $2 \times 2 \text{ cm}^2$, and can therefore be easily extended in future versions.

A new self-timed ADC with very low power has been developed. The total power consumption is considerably reduced while the ADC maintains a high conversion speed. Simulation results show that the power consumption is reduced by 53% while the original parameters are preserved.

# Introduction

High Energy Physics (HEP) experiments, such as those at the future International Linear Collider (ILC), have expressed an increasing demand for high-precision vertex detectors equipped with very granular and thin pixel sensors. Taking advantage of the ILC running conditions, which are much less demanding than those at the Large Hadron Collider (LHC), physics-driven specifications such as spatial resolution can be privileged at the expense of read-out speed or radiation tolerance. CMOS Pixel Sensors (CPS), also called Monolithic Active Pixel Sensors (MAPS), are well developed at IPHC-Strasbourg (Institut Pluridisciplinaire Hubert Curien) and have demonstrated attractive performance with respect to the specifications of the vertex detector. They can easily match the targeted granularity and material budget, and they do not require a cooling system that would add material in the fiducial volume of the vertex detector.

This thesis deals with the design of a CMOS pixel sensor prototype adapted to the outer layers of the ILC vertex detector (VTX). The International Large Detector (ILD) is one of the detector concepts proposed for the ILC, and the ILD VTX imposes stringent requirements on the CMOS pixel sensors. Two geometries are considered for the VTX: one (VTX-SL) features 5 equidistant single layers, while the alternative (VTX-DL) features 3 double layers. Sensors equipping the innermost layer in both geometries should exhibit a single point resolution better than 3 $\mu$m associated with a very short integration time (less than 10 $\mu$s) because of the beamstrahlung background. In the ILC running conditions, one millisecond of intense collisions alternates with 199 ms without beam. The beam background dictates that, during the collisions of a single bunch train, the sensors must be read out twenty or more times to keep the pixel occupancy below 1%. This requirement motivates an R&D effort concentrating on a high read-out speed design, for which a small pixel pitch terminated with a discriminator is proposed. The sensors envisioned for the outer layers, which are the largest ones and account for about 90% of the total VTX surface, have fewer constraints in terms of spatial resolution and read-out speed. A single point resolution of 3-4 $\mu$m combined with an integration time shorter than 100 $\mu$s is expected to constitute a valuable trade-off. In this case, the design effort focuses on minimizing the power consumption. A larger pixel pitch of 35 $\mu$m combined with a 4-bit ADC is proposed, which reduces the power consumption while keeping the necessary spatial resolution.

This thesis is organized as follows:

- In Chapter 1, the ILC physics programme and the ILD vertex detector are introduced. The physics motivation is outlined, highlighted by the precision measurements of the Higgs boson candidate that was recently discovered at the LHC. Then the high-precision vertex detector, which exhibits excellent performance in terms of flavour tagging and track reconstruction, is presented. The CMOS pixel sensors (CPS) proposed to equip the vertex detector are described at the end.
- In Chapter 2, the basic principle of the CMOS pixel sensors equipping the vertex detector is presented. The design specifications, architecture and characteristics of the CPS are described. State-of-the-art CPS are close to complying with all major requirements for the innermost and outer layers. A sizeable sensor prototype integrating 4-bit column-level ADCs, aimed at the outer layers, is then proposed.
- In Chapter 3, the column-level ADC suitable for our application is addressed. The performance parameters of a column-level ADC are presented, and different techniques for eliminating non-ideal errors are described. Different ADC architectures are briefly reviewed, and the column-level architecture suitable for our application is then presented, including its basic building blocks.
- In Chapter 4, the design method of the sensor prototype is presented. The first section describes the system-level design of the prototype chip, and the following sections present the circuit implementations, including the pixel and the ADC. The design requirements and considerations are described in more detail. Finally, the simulation results and the layout of the prototype chip are provided.
- In Chapter 5, the test results are presented. The test board and the measurement setup for the sensor prototype are described. Then the laboratory test results are presented; these tests were performed on the pixels and the column ADCs in order to determine the basic performance, including temporal noise, fixed pattern noise (FPN), equivalent noise charge (ENC), charge collection efficiency (CCE), charge-to-voltage conversion factor (CVF) and nonlinearity.
- In Chapter 6, improvements to CMOS pixel sensors are addressed. A zero-suppression method proposed for the digital outputs of MIMOSA 31 is described in more detail. The following sections present optimizations such as power saving techniques for both the ADC and the discriminator.
- The conclusions about the prototype chip are provided at the end. The chip is the first CMOS sensor prototype integrating 4-bit column-level ADCs for the ILC VTX outer layers. It was designed with the specifications of the full-size sensor (about 2 × 2 cm<sup>2</sup>) and can therefore be easily extended in the future. The perspectives of CMOS pixel sensors for vertex detectors are also presented.

# Chapter 1

# **ILC Vertex Detector**

This initial chapter introduces the background of the ILC physics programme and its detectors. First, the physics motivation is outlined, highlighted by the precision measurements of the Higgs boson candidate that was recently discovered at the LHC. Next, the chapter describes the International Large Detector (ILD), one of the detector concepts at the ILC. It is designed to achieve excellent vertexing and tracking in order to reconstruct secondary vertices and to measure track momenta precisely. The main topic of this thesis is the ILC vertex detector. Starting from the physics requirements, a high-precision vertex detector that exhibits excellent performance in terms of flavour tagging and track reconstruction is presented. The chapter ends with a description of the CMOS pixel sensors (CPS) proposed to equip the vertex detector.

## 1.1 The ILC Physics Programme

High Energy Physics (HEP) experiments use particle accelerators to collide particles and generate new ones, in order to explore the most elementary structure of the universe. During the next few years, experiments at CERN's Large Hadron Collider (LHC) will have the first direct look at Terascale physics. However, the extremely high data rate and the poorly defined initial state of an event make precision measurements very complicated. After the discoveries, more precise measurements are needed, and these can be provided by lepton colliders rather than hadron colliders. The advantages of a lepton machine are the well-defined initial state of the collision, due to the structureless leptons, and a less severe background level, which leads to clean signatures in the detector. The International Linear Collider (ILC), where electrons will collide with positrons, is a strong candidate for the precision study of the Higgs boson. It will operate in the centre-of-mass energy range of 250 GeV to 500 GeV, with the possibility of a later upgrade to 1 TeV. The main purpose of the ILC experiments is to measure very precisely the properties of the Higgs particle and of any new particles that may exist, following the initial outcome of the LHC experiments.

Figure 1.1: Cross sections of physics processes as a function of the collision energy.

The ILC programme will access all of the Higgs boson production reactions. Figure 1.1 shows the cross sections of physics processes as a function of the collision energy. The Higgs boson programme of the ILC begins at an energy of 250 GeV, near the peak of the cross section for $e^+e^- \to Zh$. The presence of a Z boson at this energy tags the Higgs boson events, which allows a direct measurement of the Higgs boson branching ratios. At higher energy, the WW fusion process of Higgs production turns on. Measurement of this process at the full ILC energy of 500 GeV gives a model-independent precision measurement of the Higgs boson. Experiments at 500 GeV also give a first measurement of the Higgs boson coupling to $t\bar{t}$. At a centre-of-mass energy of 1 TeV, all of the Higgs boson production reactions are fully accessible.

The ILC thus offers a rich experimental programme that addresses the most important open issues in elementary particle physics. In this respect, the ILC will be essential for advancing the understanding of the underlying mechanisms.

### 1.1.1 The ILC Machine

The International Linear Collider (ILC) is a linear particle accelerator proposed as the next-generation collider for high energy physics. Compared to a circular collider, a linear collider avoids the energy loss caused by synchrotron radiation, allowing higher energies to be reached. The site for the accelerator has not yet been decided. The total footprint is between 31 km and 50 km long, plus two damping rings, each with a circumference of 6.7 km. The interaction region of the ILC is designed to host two detectors, which can be moved into the beam position with a "push-pull" scheme. As already well documented in the ILC Reference Design Report (RDR) [1], the ILC programme will extend and complement the physics experiments of the Large Hadron Collider (LHC).



Figure 1.2: Schematic layout of the International Linear Collider.

Figure 1.2 shows a schematic view of the overall layout of the ILC. The electron source, the damping rings and the positron auxiliary source are centrally located around the interaction region (IR) [2]. The major subsystems include:

- a polarized electron source based on a photocathode DC gun.
- an undulator-based positron source driven by the 150 GeV main electron beam.
- 5 GeV electron and positron damping rings (DR) located in a common tunnel at the center of the ILC complex.
- beam transport from the damping rings to the main linacs, followed by a two-stage bunch compressor system prior to injection into the main linac.
- two 11 km long main linacs (to be extended for a future upgrade to 1 TeV), utilizing 1.3 GHz SCRF cavities, operating at an average gradient of 31.5 MV/m, with a pulse length of 1.6 ms.
- a 4.5 km long beam delivery system, which brings the two beams into collision with a 14 mrad crossing angle, at a single interaction point which can be shared by two detectors.

The ILC has an unprecedented potential for precision measurements, opening new windows of exploration for physics beyond the Standard Model. This implies new requirements on experimental accuracy, which in turn drive the need for more precise detectors.

#### 1.1.1.1 Beam Related Background



Figure 1.3: Illustration of the pinch effect in bunch collisions.

The ILC may provide a very clean experimental environment compared to hadron colliders, but it is certainly not background-free. The most important source of unwanted interactions is machine-induced background. The beam interactions can be described as follows. When two opposite bunches approach each other, they exert a significant electromagnetic force on one another. The individual particles are accelerated towards the centre of the oncoming bunch, as shown in figure 1.3. This mutual attraction is known as the pinch effect and has both advantages and disadvantages. The pinch effect reduces the size of the colliding bunches, increasing the luminosity by a factor of $\sim 2$. On the other hand, the deflection of particles by the charge of the opposite bunch causes the emission of beamstrahlung photons, which degrade the energy of the beam. The high rate of pairs produced by beamstrahlung creates high occupancies that demand fast read-out.

#### 1.1.1.2 Beam Time Structure



Figure 1.4: Bunch timing scheme of the ILC.

Events at the ILC are accompanied by a beamstrahlung background. Each crossing produces a large flux of electrons and positrons caused by incoherent pair production and bremsstrahlung in the intense fields at the interaction point. Figure 1.4 shows the beam structure anticipated for the ILC: each bunch train contains 2820 bunch crossings (BXs) separated by 369 ns and is followed by an inter-train gap of 199 ms [3]. The repetition frequency of the bunch trains is 5 Hz. Each bunch train lasts about 1 ms, which translates into a machine duty cycle of $\sim 0.5\%$.
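The numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the bunch count, bunch spacing and repetition rate come from the text, while the helper `frames_per_train` and the two integration times passed to it (10 $\mu$s and 100 $\mu$s, the values targeted for the innermost and outer layers) are used here as examples.

```python
import math

N_BUNCHES = 2820          # bunch crossings per train (from the text)
BUNCH_SPACING_NS = 369.0  # spacing between bunch crossings (from the text)
TRAIN_RATE_HZ = 5.0       # repetition rate of the bunch trains (from the text)

train_length_us = N_BUNCHES * BUNCH_SPACING_NS / 1000.0
period_ms = 1000.0 / TRAIN_RATE_HZ
gap_ms = period_ms - train_length_us / 1000.0
duty_cycle = train_length_us / (period_ms * 1000.0)


def frames_per_train(integration_time_us):
    """Number of read-out frames (the last one partial) covering one train."""
    return math.ceil(train_length_us / integration_time_us)


print(f"train length: {train_length_us:.2f} us, gap: {gap_ms:.2f} ms")
print(f"duty cycle: {duty_cycle:.2%}")
print(f"frames per train at 10 us: {frames_per_train(10.0)}")
print(f"frames per train at 100 us: {frames_per_train(100.0)}")
```

The train length of about 1.04 ms and the gap of about 199 ms reproduce the quoted duty cycle of roughly 0.5%.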

The low event rate and moderate background allow a variety of strategies to be considered to optimize the vertex detector. The long inter-train gap raises the possibility of reading out the detector during the gap, rather than during the train. The low duty factor means that the average power can be reduced by switching the apparatus off between bunch trains (power cycling), thereby reducing the mass needed for cooling.

Figure 1.5: Schematic view of the ILD detector concept.

## 1.2 ILD at ILC

Even though the effective collision energy at the LHC will be higher than that of the ILC, measurements can be made more accurately at the ILC. To take advantage of the ILC running conditions, the physics experiments place great demands on high-performance detectors. This requires an excellent vertexing and tracking system in order to reconstruct secondary vertices and to measure track momenta precisely. This objective translates into the need for a detector that is more precise than the existing state of the art employed at hadron colliders.

The International Large Detector (ILD) [4] is a concept for a detector at the ILC, as shown in figure 1.5. It is based on the Global Large Detector (GLD) [5] and the Large Detector Concept (LDC) [6, 7]. ILD combines excellent tracking with a finely grained calorimetry system. This gives ILD the ability to obtain the best possible overall event reconstruction, including the capability to reconstruct individual particles, known as the Particle Flow approach. The precision that can be achieved by ILD is ideal for studies in particle physics which call for accurate measurements of particles and their properties.

Figure 1.6: Quadrant of the ILD detector concept.

### 1.2.1 The ILD Layout

The ILD concept is designed as a multi-purpose detector. Figure 1.6 shows a schematic view of one quadrant of the ILD detector concept. The proposed ILD detector has the following components.

#### 1.2.1.1 Vertexing

A Si-pixel based vertex detector (VTX) is located outside the beam pipe. It is complemented by a silicon tracking system consisting of the Silicon Inner Tracker (SIT) in the barrel and the Forward Tracking Disks (FTD) at the endcaps. The VTX will consist of multiple layers of state-of-the-art silicon pixel sensors, with a purely barrel geometry. It is optimized for excellent performance in terms of flavour tagging, identification of long-lived particles such as b- and c-hadrons, and track reconstruction. The VTX is the main subject of this thesis and is described in more detail in section 1.3.

#### 1.2.1.2 Tracking

Track reconstruction relies on three subdetectors. The main tracker is a large-volume time projection chamber (TPC), which plays an important role in the tracking capabilities. It is crucial for the reconstruction of low momentum tracks, while the Forward Tracking Disks (FTD) provide shallow angle coverage. Additional detectors providing high-precision spatial measurements for track reconstruction are the Silicon Inner Tracker (SIT), the Silicon External Tracker (SET) and the Endcap Tracking Detector (ETD). The development of the tracking system was driven by the requirement to offer an overall momentum resolution of $\delta(1/p_T) \simeq 2 \times 10^{-5}~(\mathrm{GeV}/c)^{-1}$.

#### 1.2.1.3 Calorimetry

The physics programme of the ILD requires a jet energy resolution of $\Delta E_{Jet}/E_{Jet} \sim 3.5\%$. This target translates into the need for a high-precision calorimetric system. The calorimeters, composed of the Electromagnetic Calorimeter (ECAL) and the Hadronic Calorimeter (HCAL), are located inside the coil in order to optimize the jet energy resolution. The ECAL will be a sampling calorimeter using tungsten as an absorber, while the HCAL uses stainless steel.

#### 1.2.1.4 Magnetic Field and Yoke

The ILD is designed to operate in a nominal magnetic field of 3.5 Tesla. The main magnetic field is generated by a large volume superconducting coil surrounding the calorimeters. The coil is surrounded by an iron yoke, which returns the magnetic flux. At the same time, the iron is instrumented and serves as a muon detector.

## 1.3 Vertex Detector

To unravel the underlying physics mechanisms of newly observed processes, the identification of heavy flavours will play a critical role. The Vertex Detector (VTX) is the key tool to achieve very high performance flavour tagging by reconstructing displaced vertices [4]. It also plays an important role in the track reconstruction, especially for low momentum particles which do not reach the main tracker or barely penetrate its sensitive volume.

The vertex detector requirements are mainly driven by two competing sources of constraints: the physics goals and the running conditions. The ILC physics goals dictate an unprecedented three-dimensional spatial point resolution and a very low material budget. The running conditions at the ILC impose the read-out speed and the radiation tolerance. These requirements generally conflict with each other. High granularity and fast read-out compete with each other and tend to increase the power dissipation, and increased power dissipation in turn leads to an increased material budget. The challenges for the vertex detector are considerable, and significant R&D is being carried out on both the development of the sensors and the mechanical support.

### 1.3.1 Requirements

In order to identify the flavour (b or charm) of heavy-flavour jets, measure the associated vertex charge and recognize tau-lepton decays, the VTX needs to be optimized in terms of single point resolution. The high granularity necessary to achieve this single point resolution needs to be complemented with a particularly low material budget, allowing high-precision pointing with low momentum tracks.

The impact parameter resolution of the vertex detector is expressed by the usual Gaussian expression:

$$\sigma_{ip} = a \oplus \frac{b}{p \cdot \sin^{3/2}\theta} \tag{1.1}$$

where p and $\theta$ are the particle momentum and polar angle respectively, and $\oplus$ denotes a sum in quadrature. The parameters a and b are given by:

$$a = \sigma_{s.p.} \frac{R_{int} \oplus R_{ext}}{R_{ext} - R_{int}} \tag{1.2}$$

$$b = R_{int} \cdot 13.6~\mathrm{MeV}/c \cdot z \cdot \sqrt{\frac{x}{X_0 \sin\theta}} \left[ 1 + 0.038 \cdot \ln\left(\frac{x}{X_0 \sin\theta}\right) \right] \tag{1.3}$$

where $\sigma_{s.p.}$ is the spatial resolution of the sensors, z is the charge of the impinging particle, $\frac{x}{X_0 \sin\theta}$ is the material crossed by the particle expressed in radiation lengths, and $R_{int}$ and $R_{ext}$ are the radii of the inner and outer layers respectively.

From equation 1.2, the parameter a depends on the single point resolution $\sigma_{s.p.}$ and on the lever arm, which is equal to $R_{ext} - R_{int}$. From equation 1.3, the parameter b depends on the distance of the innermost layer to the IP and on the material budget $(x/X_0)$. The ILD collaboration has set the target values a $\leq 5 \ \mu m$ and b $\leq 10 \ \mu m \cdot GeV/c$. These targets are significantly more ambitious than the values achieved so far, as illustrated in table 1.1, which compares them with the vertex detectors operated at LEP, SLC and LHC as well as the one planned at RHIC.

| Accelerator | a ($\mu m$) | b ($\mu m \cdot GeV/c$) |
|-------------|-------------|-------------------------|
| LEP         | 25          | 70                      |
| SLC         | 8           | 33                      |
| LHC         | 12          | 70                      |
| RHIC-II     | 13          | 19                      |
| ILD         | < 5         | < 10                    |

Table 1.1: Anticipated impact parameter resolution for the ILD, compared to other collider experiments.

Studies show that these specifications are met with a single point resolution of $\lesssim$ 3 $\mu m$ for a first measured point of the tracks at $\sim$ 15 mm from the interaction point (IP). The material budget between the IP and the first measured point should not exceed a few per mille of a radiation length, which translates into an upper bound on the ladder material budget of the order of 0.2% $X_0$.
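Equations 1.1 to 1.3 can be evaluated numerically to see how the targets relate to the geometry. The sketch below is only an illustration under stated assumptions: equation 1.1 is read as $a \oplus b/(p \sin^{3/2}\theta)$ with $\oplus$ a sum in quadrature, the radii of 15 mm and 60 mm are those of the inner and outer layers of the 5-layer option, and a single layer of 0.11% $X_0$ at normal incidence is taken as the only material (the beam pipe is ignored). The function names are hypothetical.

```python
import math


def a_term_um(sigma_sp_um, r_int_mm, r_ext_mm):
    """Equation 1.2: geometric term in um (radii in mm)."""
    return sigma_sp_um * math.hypot(r_int_mm, r_ext_mm) / (r_ext_mm - r_int_mm)


def b_term_um_gev(r_int_mm, x_over_x0, theta_rad=math.pi / 2, z=1):
    """Equation 1.3: multiple scattering term in um.GeV/c."""
    t = x_over_x0 / math.sin(theta_rad)  # material crossed, in radiation lengths
    # 13.6 MeV/c = 0.0136 GeV/c; the result is an angle (rad) times p (GeV/c).
    angle_times_p = 0.0136 * z * math.sqrt(t) * (1.0 + 0.038 * math.log(t))
    return r_int_mm * 1000.0 * angle_times_p  # mm -> um


def sigma_ip_um(a_um, b_um_gev, p_gev, theta_rad):
    """Equation 1.1: quadratic sum of the two terms."""
    return math.hypot(a_um, b_um_gev / (p_gev * math.sin(theta_rad) ** 1.5))


a = a_term_um(sigma_sp_um=3.0, r_int_mm=15.0, r_ext_mm=60.0)
b = b_term_um_gev(r_int_mm=15.0, x_over_x0=0.0011)
target = sigma_ip_um(5.0, 10.0, p_gev=1.0, theta_rad=math.pi / 2)
print(f"a ~ {a:.2f} um, b ~ {b:.2f} um.GeV/c")
print(f"sigma_ip at the target limits, 1 GeV/c, 90 deg: {target:.2f} um")
```

Under these assumptions a comes out at about 4.1 $\mu$m, below the 5 $\mu$m target. The value of b is optimistic because material in front of the first layer is neglected.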

### 1.3.2 Geometries

The ILD vertex detector has been optimized for high spatial resolution and low material budget. The high granularity, combined with the very powerful track reconstruction capabilities of the other tracking detectors, allows integrating over several bunch crossings without deteriorating the flavour tagging performance. The detector is made of 5 or 6 cylindrical layers, all equipped with $\lesssim 50~\mu \mathrm{m}$ thin pixel sensors. The innermost layer has a radius of 15-16 mm, a value for which the beam-related background rate is still expected to be acceptable. The innermost layer therefore intercepts all particles produced with a polar angle $\theta$ such that $|\cos\theta| \lesssim 0.97$.

Figure 1.7 shows the vertex detector geometry, which comes in two alternative options. One of them (called VTX-SL) consists of 5 equidistant single-sided ladders (i.e. equipped with one layer of sensors), while the other (called VTX-DL) features 3 double-sided ladders (i.e. each ladder being equipped with a layer of sensors on each of its two faces). For both geometries, the sensitive area of the innermost layer is 12.5 cm long, while it is 25.0 cm long in the other layers (a priori based on two butted 12.5 cm long ladders). The two sides of the double-sided ladders are about 2 mm apart.

Figure 1.7: Vertex detector geometries. Left: 5 single-sided ladders (VTX-SL). Right: 3 double-sided ladders (VTX-DL).

| Layer   | Radius VTX-SL (mm) | Radius VTX-DL (mm) | Ladder length VTX-SL (mm) | Ladder length VTX-DL (mm) |
|---------|--------------------|--------------------|---------------------------|---------------------------|
| Layer 1 | 15.0               | 16.0/18.0          | 125.0                     | 125.0                     |
| Layer 2 | 26.0               | 37.0/39.0          | 250.0                     | 250.0                     |
| Layer 3 | 37.0               | 58.0/60.0          | 250.0                     | 250.0                     |
| Layer 4 | 48.0               |                    | 250.0                     |                           |
| Layer 5 | 60.0               |                    | 250.0                     |                           |

Table 1.2: Geometrical parameters of the two vertex detector options.

Both conceptual designs meet the ILC goals for impact parameter resolution, with the double-ladder option giving a better impact parameter resolution, particularly for high momentum tracks. Some of their main geometrical parameters are summarised in table 1.2. The complete VTX-SL ladder thickness is equivalent to 0.11% $X_0$, while the double ladders of VTX-DL represent 0.16% $X_0$. These values assume 50 $\mu$m thin silicon pixel sensors.
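The polar angle coverage quoted for the innermost layer follows directly from the layer radius and the ladder length. The sketch below is only an illustration: it assumes that each layer is a cylinder centred on the interaction point, so that the coverage limit is set by the end of the sensitive length. The function name is hypothetical, and the radii and lengths are taken from table 1.2.

```python
import math


def max_abs_cos_theta(radius_mm, ladder_length_mm):
    """Largest |cos(theta)| intercepted by a barrel layer centred on the IP."""
    half_length = 0.5 * ladder_length_mm
    return half_length / math.hypot(half_length, radius_mm)


# (radius, ladder length) in mm for three layers of table 1.2.
layers = {
    "VTX-SL layer 1": (15.0, 125.0),
    "VTX-DL layer 1 (inner side)": (16.0, 125.0),
    "VTX-SL layer 5": (60.0, 250.0),
}
coverage = {name: max_abs_cos_theta(r, l) for name, (r, l) in layers.items()}
for name, value in coverage.items():
    print(f"{name}: |cos(theta)| up to {value:.3f}")
```

For the innermost layer this gives a limit of about 0.97, in agreement with the acceptance stated above; the outermost layer covers a smaller range of about 0.90.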

### 1.3.3 Sensor Technologies

The physics goals and running conditions call for a sensor technology that offers high granularity, low material budget and high read-out speed. A wide variety of sensor technology options are being investigated. The technologies presently concentrating most of the R&D effort are charge coupled devices (CCD) [8, 9], hybrid pixel detectors (HPD), DEPFETs [10, 11] and CMOS Pixel Sensors (CPS) [12, 13].

Charge coupled devices are widely used in imaging devices, especially in consumer electronics. They were successfully used as vertex detectors in SLD with very high granularity, which makes CCDs a very attractive solution for tracking devices. However, the read-out time is limited by the slow charge transfer. Another limitation of CCDs used as tracking devices is their low radiation tolerance.

Hybrid pixel detectors are built from two separately processed silicon layers. The sensing element and the read-out electronics are connected together using flip-chip and bump-bonding techniques. Thanks to their independent electronics, they can offer a high read-out speed. However, they exhibit a series of disadvantages such as limited granularity, large material budget and high power consumption.

The DEPFET device provides low noise performance and low power dissipation. However, the read-out electronics use specialized external VLSI circuits that increase the complexity of the system design. A sizeable DEPFET detector with fast read-out and low noise performance still needs to be demonstrated.

CMOS pixel sensors (CPS) are fabricated in a standard CMOS<sup>1</sup> process, featuring several advantages such as low cost, low power, high speed and a high level of integration. CPS have recently made steady progress towards the specifications of the ILD vertex detector and have been shown to be close to complying with all major requirements [14], in particular the read-out speed needed to cope with the beam-related background. They are considered a strong candidate for the ILD vertex detector. The following section focuses on the CMOS pixel sensor technology.

<sup>1</sup>standing for Complementary Metal-Oxide-Semiconductor.

### 1.3.4 CMOS Pixel Sensors for VTX

CMOS Pixel Sensors (CPS), also called Monolithic Active Pixel Sensors (MAPS), have been developed for charged particle detection at IPHC-Strasbourg [15] since the late nineties. From the early 1990s, these devices progressively replaced Charge Coupled Devices (CCD) in commercial cameras. Initially they could not be used directly for particle tracking because of their low fill factor, which was in the range of 20-30% due to the in-pixel transistors and metal interconnections. Later the fill factor was increased to nearly 100% by using a barrier at the boundaries of the epitaxial layer. Taking advantage of this, CPS have been introduced for particle detection. These sensors depart from commercial visible light imagers, which were too slow, exhibited poor detection efficiency and were radiation soft.

Since the start of the development pioneered at IPHC, most of the R&D effort has been invested in the sensor design, including both the detection system and the integrated signal processing micro-circuits, and in the characterization of sensor prototypes. In the last 12 years, more than 30 different MIMOSA<sup>1</sup> prototypes have been designed and fabricated. The sensors are manufactured by the CMOS industry and can be thinned down to $\lesssim 50~\mu m$. The sensor R&D incorporates radiation tolerance and power dissipation constraints. CMOS pixel sensors have demonstrated good performance for the detection of Minimum Ionizing Particles (MIPs) [16, 17]. They offer an attractive balance between granularity, material budget, read-out speed, radiation tolerance and power dissipation. They make it possible to integrate both the sensing elements and the read-out electronics on a single substrate, thus creating a detector-on-a-chip.

The ILD VTX imposes stringent requirements on the CMOS pixel sensors. Sensors placed on the innermost layer in both geometries should exhibit a spatial resolution better than 3 $\mu$m associated with a very short integration time (less than 10 $\mu$s) because of the beamstrahlung background. This requirement motivates an effort concentrating on a high read-out speed design. The sensors envisioned for the outer layers, which are the largest ones and account for about 90% of the total VTX surface, have fewer constraints in terms of spatial resolution and read-out speed. A single point resolution of 3-4 $\mu$m combined with an integration time shorter than 100 $\mu$s is expected to constitute a valuable trade-off. In this case, the design effort focuses on minimizing the power consumption. This thesis focuses on the design of a CMOS pixel sensor prototype suited to the outer layers of the ILD vertex detector.

<sup>1</sup>standing for Minimum Ionizing particle MOS Active pixel sensor.

### 1.4 Summary

The ILC physics programme is reviewed in this chapter and the ILD vertex detector, which is the main subject of this thesis, is presented. Several sensor technologies are compared, and CMOS pixel sensors emerged as a candidate able to take up the challenge of high performance vertexing, which calls for highly granular and thin sensors. Moreover, they are rather swift and radiation tolerant. In the next chapter, CPS for charged particle detection are described in more detail.

### **Bibliography**

- [1] A. Djouadi, J. Lykken, K. Mönig, Y. Okada, M. Oreglia, and S. Yamashita, "International linear collider reference design report volume 2: physics at the ILC." Aug. 2007. [Online]. Available: http://www.linearcollider.org
- [2] N. Phinney, N. Toge, and N. Walker, "International linear collider reference design report volume 3: accelerator." Aug. 2007. [Online]. Available: http://www.linearcollider.org
- [3] M. A. Thomson, "Detectors at a future linear collider," Birmingham HEP Seminar, Feb. 2011.
- [4] "International large detector letter of intent," The ILD Concept Group, Feb. 2010. [Online]. Available: http://www.ilcild.org
- [5] "GLD detector outline document," GLD Concept Study Group, Jul. 2006. [Online]. Available: http://ilcphys.kek.jp/gld
- [6] "Detector outline document for the large detector concept," LDC Working Group, Jul. 2006.
- [7] T. Behnke, "The LDC detector concept," *Pramana*, vol. 69, no. 5, pp. 697–702, Nov. 2007.
- [8] A. Miyamoto, K. Nakayoshi, Y. Sugimoto, H. Ikeda, T. Nagamine, Y. Takubo, H. Yamamoto, and K. Abe, "FPCCD vertex detector R&D for ILC," Internal Note, Oct. 2007.
- [9] Y. Sugimoto, H. Ikeda, A. Miyamoto, T. Nagamine, Y. Takubo, and H. Yamamoto, "R&D status of FPCCD VTX," presented at the International Linear Collider Workshop 2008, Chicago, USA, Nov. 2008.
- [10] C. Damerell, "ILC vertex detector R&D report of review committee," ALCPG Workshop, Oct. 2007.
- [11] "DEPFET pixel vertex detector for the ILC," Internal Note, DEPFET Collaboration, Oct. 2007.

- [12] M. Winter, "Development of swift and slim CMOS sensors for a vertex detector at the international linear collider," ILC Vertex Detector Review, Oct. 2007. [Online]. Available: http://iphc.in2p3.fr/Others.116.html
- [13] M. Winter, J. Baudot, A. Besson, C. Colledani, Y. Degerli, R. De Masi, A. Dorokhov, G. Dozière, W. Dulinski, M. Gélin, F. Guillox, A. Himmi, C. Hu-Guo, F. Morel, F. Orsini, I. Valin, and G. Voutsinas, "MIP detection performances of a 100 $\mu$s readout CMOS pixel sensor with digitised outputs," presented at the International Linear Collider Workshop 2008, Chicago, USA, Nov. 2008.
- [14] M. Winter, "Development of swift, high resolution, pixel sensor systems for a high precision vertex detector suited to the ILC running conditions," DESY PRC report, Oct. 2009. [Online]. Available: http://iphc.in2p3.fr/Others.116.html
- [15] R. Turchetta, J. D. Berst, B. Casadei, G. Claus, C. Colledani, W. Dulinski, Y. Hu, D. Husson, J. P. Le Normand, J. L. Riester, G. Deptuch, U. Goerlach, S. Higueret, and M. Winter, "A monolithic active pixel sensor for charged particle tracking and imaging using standard VLSI CMOS technology," *Nucl. Instr. and Meth. Phys. Res. A*, vol. 458, pp. 677–689, Feb. 2001.
- [16] M. Winter, "Achievements and perspectives of CMOS pixel sensors for charged particle tracking," *Nucl. Instr. and Meth. Phys. Res. A*, vol. 623, pp. 192–194, Nov. 2010.
- [17] M. Winter, J. Baudot, A. Besson, G. Claus, A. Dorokhov, M. Goffe, Ch. Hu-Guo, F. Morel, I. Valin, G. Voutsinas, and L. Zhang, "Development of CMOS pixel sensors fully adapted to the ILD vertex detector requirements," presented at the International Workshop on Future Linear Colliders (LCWS'11), Granada, Spain, Sep. 2011.

## Chapter 2

# CMOS Pixel Sensors for Charged Particle Detection

CMOS Pixel Sensors (CPS) have been making steady progress towards the specifications of the ILD vertex detector. They offer an attractive balance between granularity, material budget, readout speed, radiation tolerance and power dissipation. In this chapter, the operation principle, the architecture and the characteristics of CPS are presented. The CPS design specifications for the vertex detector are also described.

### 2.1 Detection Principle

CMOS pixel sensors have been proposed as an alternative to CCD devices in image sensors since the early 1990s. They are designed and fabricated in a standard CMOS technology. There are usually two types of CMOS sensors: the Passive Pixel Sensor (PPS) and the Active Pixel Sensor (APS). In the former, each pixel contains only a photodiode and selection switches, and is connected directly to the readout circuit. In the latter, each pixel additionally contains an amplifier which buffers the signal to the output. In this way the CMOS sensor achieves better performance, such as a high signal-to-noise ratio (SNR), a high readout speed and good scalability.

The use of CMOS pixel sensors (CPS) for charged particle tracking in subatomic physics emerged to take up the challenge of high performance vertexing, which calls for highly granular and thin sensors providing $\sim 100\%$ detection efficiency. Moreover, they are rather swift and radiation tolerant. In a CMOS sensor, the detector is integrated on a substrate through the standard industrial process. An essential aspect for such sensors is the doping level, which determines the thickness of the depleted zone of a reverse-biased diode [1]. Figure 2.1 shows the operation principle of the CMOS pixel sensor. The in-pixel transistors are integrated in the p-well $(p^+)$, while the sensing element is a reverse-biased p-n diode formed by the junction of the n-well $(n^+)$ and the p-epitaxial layer. The p-well and the $p^{++}$ substrate generate a reflective barrier because of their different doping. The operation principle is as follows. When charged particles travel through the thin, almost undepleted epitaxial layer, they liberate charges. The carriers of the signal charge (electrons) diffuse thermally in the layer and are collected by the sensing diode. The signal is then processed by the in-pixel amplifier and transferred to the signal processing circuits by the source follower.

Figure 2.1: CMOS sensor operation principle.

Because the epitaxial layer doping is a few orders of magnitude smaller than that of p-well and  $p^{++}$  substrate, a potential barrier exists at their boundaries. The built-in voltage is given by the following equation [2]

$$V = \frac{kT}{q} \ln \frac{N_{sub}}{N_{epi}}. \tag{2.1}$$

where k is the Boltzmann constant, q is the electron charge, T is the absolute temperature,  $N_{sub}$  and  $N_{epi}$  are the doping of the substrate and the epitaxial layer respectively.
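As a rough numerical illustration of equation 2.1, the following sketch evaluates the barrier height at room temperature. The doping concentrations `N_SUB` and `N_EPI` are assumed, illustrative values (four orders of magnitude apart), not figures taken from a specific process.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def barrier_voltage(n_high, n_low, temperature=300.0):
    """Potential step (in volts) between two p-type regions, following Eq. (2.1)."""
    return K_B * temperature / Q_E * math.log(n_high / n_low)

# Assumed doping concentrations in cm^-3 (illustrative only).
N_SUB = 1e19
N_EPI = 1e15

v_barrier = barrier_voltage(N_SUB, N_EPI)
thermal_voltage = K_B * 300.0 / Q_E
print(f"barrier: {v_barrier * 1e3:.0f} mV ({v_barrier / thermal_voltage:.1f} kT/q)")
```

Under these assumptions the barrier is several times the thermal voltage kT/q, which is why thermally diffusing electrons are mostly reflected back into the epitaxial layer.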

The potential acts like a barrier to the charge carriers (electrons). The electrons liberated by the radiation in the epitaxial layer diffuse towards the n-well diode, where most of them are collected. Because of the reflective potential barriers, the whole pixel surface is sensitive. Since a minimum ionising particle typically generates $\sim 80$ electron-hole pairs per micrometre in the $\sim 5$ - 15 $\mu$m thick epitaxial layer, the signal charge ranges from a few hundred to $\sim 1000$ e<sup>-</sup>.

Figure 2.2: A simplified diagram of the CMOS pixel sensor.
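The order of magnitude of the signal charge can be checked with a short calculation. This is a minimal sketch using the $\sim 80$ pairs per micrometre quoted in the text and assuming normal incidence. The function name is illustrative.

```python
PAIRS_PER_UM = 80  # electron-hole pairs per micrometre for a MIP (value quoted in the text)

def mip_signal_electrons(epi_thickness_um, pairs_per_um=PAIRS_PER_UM):
    """Mean charge (in electrons) liberated by a MIP crossing the epitaxial layer."""
    return pairs_per_um * epi_thickness_um

for thickness_um in (5, 10, 15):
    print(f"{thickness_um:2d} um epitaxial layer: {mip_signal_electrons(thickness_um)} e-")
```

This is the total liberated charge. Because the electrons diffuse thermally, it is generally shared among several neighbouring pixels, so the charge seen by a single diode is smaller.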

It should be noted that the detector can be thinned down to a few tens of $\mu m$ ($\sim 50$ $\mu m$) with standard CMOS technology, i.e. to a thickness close to that of the epitaxial layer, without degrading its mechanical support. This greatly reduces the material budget. In addition, the fabrication of the sensors is cheap and design iterations are fast.

### 2.2 CPS Architecture

A CMOS pixel sensor is composed of a pixel array and signal processing micro-circuits. Figure 2.2 shows a simple architecture (MIMOSA-1) of the CMOS pixel sensor. In this simplest version, each pixel is equipped with three transistors: the transistor $M_1$ for resetting the sensing diode voltage, the transistor $M_2$ connected as a source follower which transfers the voltage to the outside, and the transistor $M_3$ for addressing the pixel for read-out and signal transfer. This architecture does not include any signal processing. In later versions, as described in section 2.2.1, each pixel includes an in-pixel amplification stage for mitigating the noise sources affecting the signal, and a correlated double sampling operation for subtracting the average pixel noise.

The readout of the pixel array is achieved with addressing logic and an amplifier. The pixels are addressed with two shift registers located on the left side and at the bottom of the matrix. The signal is transferred to the central amplifier through a source follower. More recently, an integrated micro-circuit architecture has been developed, in which the signals delivered by the pixels are discriminated before being filtered by an integrated zero-suppression logic (see section 2.2.2).

#### 2.2.1 Pixel Circuit

The simplest three-transistor (3T) pixel structure is shown in figure 2.3. The collected charge is converted into a voltage by the parasitic capacitance of the p-n diode. The reset operation is used to compensate the leakage current. In this architecture, the leakage current of the diode discharges the capacitor ($C_{parasitic}$), causing a voltage drop at the node K. It thus affects the common mode voltage of the source follower, introducing a signal offset which depends significantly on the integration time. In general, this offset is considered as fixed pattern noise (FPN).

An improved design is the self-biased (SB) pixel architecture (right of figure 2.3). It uses a forward-biased diode in addition to the reverse-biased sensing diode. In this structure, the node K can be treated as a floating node. When the reverse-biased diode collects charge, the voltage of the node K drops. Simultaneously the capacitor ($C_{parasitic}$) is recharged through the forward-biased diode, so that the voltage slowly recovers with a large time constant. This method compensates the leakage current while providing a constant bias via the highly resistive diode.

#### 2.2.1.1 Noise Analysis

In this part, the noise analysis is based on the simple 3T architecture. The noise is composed of fixed pattern noise (FPN) and temporal noise. The fixed pattern noise can be reduced by the correlated double sampling (CDS) technique, as described in a following section. The temporal noise thus becomes dominant in the pixel. It must be analyzed separately in the three operation phases, i.e. the reset, the integration and the readout.

Figure 2.3: Structure of the 3T pixel (left) and self-biased (SB) pixel (right).

#### • Noise during reset

The reset transistor $M_1$ (see left of figure 2.3) is used to remove the accumulated charge and compensate the leakage current. In practice, the reset operation must be performed repeatedly in order to avoid the saturation of the diode. When the reset switch $M_1$ is closed, the readout switch $M_3$ is open. The parasitic capacitor ($C_{parasitic}$) is then charged to almost VDD. In this case the mean square noise is given by

$$\overline{V_{n,rst}^2} = \frac{kT}{C_d}. \tag{2.2}$$

where $C_d$ is the total parasitic capacitance at the node K. However, in real systems the steady state is not reached because the reset phase is too short. The mean square noise is then reduced by a factor of 2, and equation 2.2 becomes [3]

$$\overline{V_{n,rst}^2} = \frac{1}{2} \frac{kT}{C_d}. \tag{2.3}$$

The reset noise is the dominant temporal noise; it can be mitigated by the CDS technique.

#### • Noise during integration

The main noise during the integration phase is the shot noise due to the diode leakage current $i_{leak}$. At room temperature the mean value of this current is of the order of several fA, and the related noise contribution can be neglected for integration times up to a few milliseconds. However, this noise should be taken into account when the integration time increases further. Neglecting second order effects, the mean square value of the noise accumulated during the integration time is given by

$$\overline{V_{n,int}^2} = \frac{qi_{leak}}{C_d^2} t_{int}. \tag{2.4}$$
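To give a feeling for the magnitudes involved, the reset and integration noise expressions can be converted into an equivalent number of electrons at the node K. The sketch below is illustrative: `C_D` uses the $\sim 10$ fF order of magnitude quoted in this chapter for the collecting diode, while `I_LEAK = 5 fA` and the function names are assumptions.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def reset_noise_electrons(c_d, temperature=300.0, incomplete_settling=True):
    """RMS reset noise in electrons: Eq. (2.3) if the reset does not settle, else Eq. (2.2)."""
    v_squared = K_B * temperature / c_d
    if incomplete_settling:
        v_squared *= 0.5
    return math.sqrt(v_squared) * c_d / Q_E

def shot_noise_electrons(i_leak, t_int):
    """RMS shot noise in electrons from Eq. (2.4), i.e. sqrt(q * i_leak * t_int) / q."""
    return math.sqrt(Q_E * i_leak * t_int) / Q_E

C_D = 10e-15     # F, assumed order of magnitude of the node capacitance
I_LEAK = 5e-15   # A, assumed value standing for "several fA"

print(f"reset noise: {reset_noise_electrons(C_D):.1f} e-")
for t_int in (100e-6, 1e-3, 100e-3):
    print(f"shot noise for t_int = {t_int * 1e3:g} ms: {shot_noise_electrons(I_LEAK, t_int):.1f} e-")
```

With these assumed values the shot noise stays well below the reset noise for integration times up to the millisecond range and only becomes comparable for much longer integration times.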

#### • Noise during readout

During the readout phase, the switch $M_1$ is open. The transistors $M_2$ and $M_3$, the column switch $M_{col}$, the current source $M_{cur}$ of the source follower and the line parasitic capacitor $C_1$ become the main noise sources. The noise contribution introduced by each transistor can be calculated with the following equations

$$\overline{V_{n,read,M_2}^2} = \frac{2}{3} \frac{kT}{C_1} \frac{1}{1 + \frac{g_{m,M_2}(g_{ds,M_{col}} + g_{ds,M_3})}{g_{ds,M_3}g_{ds,M_{col}}}}. \tag{2.5}$$

$$\overline{V_{n,read,M_3}^2} = \frac{kT}{C_1} \frac{1}{\frac{1}{g_{ds,M_3}} + \frac{1}{g_{ds,M_{col}}} + \frac{1}{g_{m,M_2}}}. \tag{2.6}$$

$$\overline{V_{n,read,M_{col}}^2} = \frac{kT}{C_1} \frac{1}{\frac{1}{g_{ds,M_{col}}} + \frac{1}{g_{ds,M_3}} + \frac{1}{g_{m,M_2}}}. \tag{2.7}$$

$$\overline{V_{n,read,M_{cur}}^2} = \frac{2}{3} \frac{kT}{C_1} g_{m,M_{cur}} \left( \frac{1}{g_{ds,M_3}} + \frac{1}{g_{ds,M_{col}}} + \frac{1}{g_{m,M_2}} \right). \tag{2.8}$$

where $g_{ds,M_x}$ and $g_{m,M_x}$ are the drain-source (output) conductance and the transconductance of the transistor $M_x$, respectively.

#### • Total noise

The total noise at the output of the pixel is the sum of the contributions listed above, which is given by

$$\overline{V_n^2} = \overline{V_{n,rst}^2} + \overline{V_{n,int}^2} + \overline{V_{n,read,M_2}^2} + \overline{V_{n,read,M_3}^2} + \overline{V_{n,read,M_{col}}^2} + \overline{V_{n,read,M_{cur}}^2}. \tag{2.9}$$



Figure 2.4: Common source amplifier (left) and improved amplifier (right).

#### 2.2.1.2 In-pixel Amplifier

In the 3T pixel structure, the signal generated on the charge collecting diode capacitance ($\sim 10$ fF) is typically of the order of a few mV. This signal is read out by the source follower, which reduces the voltage by a gain factor ($\sim 0.8$). At the same time, the small signal is affected by the temporal noise, which leads to a low signal-to-noise ratio (SNR). In-pixel amplification therefore becomes necessary to increase the signal-to-noise ratio. The design of such an amplifier is constrained by several factors: a compromise must be found between speed, noise, gain and power consumption. In addition, the pixel-to-pixel dispersion needs to be considered. It should be noted that in a twin-well CMOS technology, the difficulty of in-pixel amplifier design is that only NMOS transistors can be used, because any additional N-well hosting a PMOS transistor would compete with the sensing N-well diode for charge collection.

The noise added to the signal after amplification can be significant. In order to maximize the signal-to-noise ratio, a high amplifier gain is therefore necessary. A standard common source (CS) amplifier (left of figure 2.4) can be used in the pixel [4], but it cannot achieve a sufficient gain (less than 5). The gain of the amplifier is given by

$$Gain = \frac{V_{out}}{V_{in}} = \frac{g_{m1}}{g_{m2} + g_{mb2} + g_{ds1} + g_{ds2}}. \tag{2.10}$$

where $g_m$, $g_{mb}$ and $g_{ds}$ are the transconductance, the body-effect transconductance and the drain-source (output) conductance of the transistor, respectively.

Figure 2.5: Improved common source amplifier with feedback.

In order to improve the gain of the amplifier, a special biasing transistor is used on the common source amplifier (right of figure 2.4) [5]. The gain of the amplifier is improved by reducing  $g_{m2}$  when the frequency is larger than  $g_{m3}/C_{gs2}$ . The gain can be given by

$$Gain = \frac{V_{out}}{V_{in}} = \frac{g_{m1}}{g_{mb2} + g_{ds1} + g_{ds2}}. \tag{2.11}$$

The gain of the improved amplifier increases by about a factor of two. Since a higher gain is achieved with the same $g_{m1}$, the power consumption can be reduced by slightly decreasing $g_{m1}$. However, the operating point of the amplifier changes with CMOS process variations. A feedback loop can therefore be used to stabilize the operating point.
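The origin of this improvement can be made explicit by taking the ratio of the two gain expressions. The short derivation below is an illustrative reading of equations 2.10 and 2.11, not a result quoted from the references:

$$\frac{Gain_{improved}}{Gain_{CS}} = \frac{g_{m2} + g_{mb2} + g_{ds1} + g_{ds2}}{g_{mb2} + g_{ds1} + g_{ds2}} = 1 + \frac{g_{m2}}{g_{mb2} + g_{ds1} + g_{ds2}}.$$

Under this reading, a factor of two corresponds to a load transconductance $g_{m2}$ comparable to the sum $g_{mb2} + g_{ds1} + g_{ds2}$; the benefit would be larger whenever $g_{m2}$ dominates the denominator of equation 2.10.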

As shown in figure 2.5, the feedback in the amplifier is a low pass filter with a large time constant $(C_1/g_{m4})$, which also provides the bias for the reverse-biased diode $D_1$ via the highly resistive diode $D_2$. As in the self-biased pixel circuit, the leakage current of the reverse-biased diode $D_1$ is compensated by the forward-biased diode $D_2$.

The discharge times of the low pass filter and of the diode capacitance are very large, so that unwanted slowly varying information accumulates at the output of the amplifier. However, this can be reduced by the in-pixel correlated double sampling technique.



Figure 2.6: In-pixel CDS and enhanced CS amplifier with feedback.

#### 2.2.1.3 In-pixel CDS

Each pixel employs a correlated double sampling (CDS) circuit which is based on a clamping technique [6]. As shown in figure 2.6, the CDS consists of a MOS capacitor (MOSCAP) and a clamping switch (RST2). The MOSCAP should remain in inversion for better linearity. The first sampled output voltage of the amplifier is stored on the clamping capacitor. The stored voltage is then subtracted from the second sample. The enhanced common source (CS) amplifier with feedback is controlled by a PWR\_ON signal to reduce the power consumption. The pixel signal after CDS is buffered by the source follower. RD and CALIB are used to memorize the output signal level and the reference level of the pixel output stage, respectively. This second double sampling operation reduces the pixel-to-pixel dispersion.

The use of CDS reduces the overall noise in several respects:

- it suppresses the low frequency (1/f) noise.
- it removes the reset noise (kT/C).
- it mitigates the fixed pattern noise caused by pixel-to-pixel non-uniformity.
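The cancellation of the correlated contributions can be illustrated with a toy numerical model. This is a minimal sketch in which the offset spread, the reset fluctuation and the signal amplitude are all assumed values in mV; it models only the subtraction itself, not the clamping circuit.

```python
import random

def clamped_cds(first_sample, second_sample):
    """Correlated double sampling: subtract the stored first sample from the second one."""
    return second_sample - first_sample

random.seed(1)
SIGNAL_MV = 2.0  # assumed signal step between the two samples

cds_outputs = []
for _ in range(5):
    pixel_offset = random.gauss(0.0, 20.0)  # pixel-to-pixel offset (FPN), assumed spread
    reset_level = random.gauss(0.0, 1.0)    # reset (kT/C) fluctuation frozen after reset
    first = pixel_offset + reset_level                # sample before charge collection
    second = pixel_offset + reset_level + SIGNAL_MV   # sample after charge collection
    cds_outputs.append(clamped_cds(first, second))

print([round(v, 6) for v in cds_outputs])
```

Every term that is identical in both samples drops out of the difference. Noise that is uncorrelated between the two samples is not removed by the subtraction, which is why CDS is effective against offsets, reset noise and slowly varying (1/f) noise in particular.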

#### 2.2.2 Signal Processing Circuit

Since the CMOS pixel sensors were aimed at a fast, full scale sensor as required for the vertex detector, the readout speed has become a major driving parameter of the CPS development, with a frame readout time that needed to come close to $100~\mu s$. Fully digital output data and on-chip real time data processing are therefore required. An integrated micro-circuit architecture was developed, in which the signals transferred through the column lines are discriminated before being filtered by an integrated zero-suppression logic. The fast readout is also achieved by grouping the pixels in columns, which greatly reduces the parasitic column line capacitance. The first front-end readout stage processing the signals from the pixel array is the discriminator terminating each column. The second stage, placed after the discriminators, is a zero-suppression logic which filters the digital data.

#### 2.2.2.1 Discriminator

Column-parallel readout architectures have become increasingly popular to achieve a high frame rate, allowing up to 10 k frames/s to be read [7]. Column-level analog-to-digital converters (ADCs) are therefore implemented at the bottom of the pixel array. To accommodate the row by row rolling shutter readout of the pixels, the ADCs have to convert the signals into digital data at a high conversion speed. A high speed ADC with low power consumption and small layout area is thus required. The choice of the number of bits depends on the required spatial resolution and on the pixel size. Here, in order to achieve low power consumption and high speed without losing spatial resolution, a column-level discriminator is employed [8]. The discriminator (i.e. comparator) transforms the analog output signal of the pixel into 1-bit digital data by comparing it with a threshold value. As the signal from the pixel is very small ($\sim 1 \text{ mV}$), the discriminator needs to provide a highly accurate comparison (small offset) while maintaining a high speed. The design of this discriminator is therefore not a trivial task.

For cancelling the offset of the discriminator, a common approach is to use the auto-zeroing technique [9, 10]. The principle of this technique is as follows. First, the offset voltage is sampled and stored on a capacitor (called the auto-zeroing capacitor). This stored voltage can be referred to the input. In the next phase the stored offset voltage is subtracted and thereby cancelled. There are two different auto-zeroing techniques. One is called the input offset storage (IOS) technique and the other the output offset storage (OOS) technique. More details about these two techniques are given in Chapter 3.

Figure 2.7: (a) Block diagram of the discriminator and (b) related timing.

In order to achieve a more accurate comparison, the discriminator combines these two offset cancellation techniques. Figure 2.7(a) shows the architecture of the discriminator. It is composed of two preamplifiers, a latch and a pair of auto-zeroing capacitors. The preamplifiers provide sufficient gain to compensate for the input referred offset voltage of the dynamic latch and isolate the input from the latch kickback noise. The dynamic latch does not consume static current, which makes it suitable for a power efficient design.

Figure 2.7(b) shows the timing of the discriminator. The operation is as follows. During the offset cancellation mode ($\phi 1$ clock phase), switches S1, S1', S4 and S4' are turned on while the other switches are turned off. The auto-zeroing capacitors (C1 and C2) are charged with the offsets of the amplifiers ($G_0$ and $G_1$). During the tracking mode ($\phi 2$ clock phase), switches S1, S1', S4 and S4' are turned off, which opens the offset compensation loop. In order to avoid voltage sharing between Vref1 and Vref2, switch S1 is turned off earlier while switch S2 is turned on. The input signal is then amplified by the gain of the preamplifiers, resulting in an output voltage from which the stored offset is subtracted. During the comparing mode ($\phi 3$ clock phase), the latch compares the two outputs of the preamplifier. With this method, the total input referred residual offset of the discriminator is given by

$$V_{OSR} = \frac{\Delta V_{off,G_1}}{G_0(1+G_1)} + \frac{\Delta Q}{G_0C} + \frac{\Delta V_{off,Latch}}{G_0G_1}. \tag{2.12}$$

where $G_0$ and $G_1$ are the static gains of the preamplifiers, $\Delta V_{off,G_1}$ and $\Delta V_{off,Latch}$ are the input referred offsets of the preamplifier and of the latch, respectively, $\Delta Q$ is the difference between the charges injected by switches S4 and S4', and $C$ is the auto-zeroing capacitance.

All column-level discriminators compare their input with a common threshold value. This threshold is adjustable and is set to an optimal value (usually $\sim 5$-6 times the noise standard deviation) to ensure $\sim 100\%$ detection efficiency and a low fake hit rate ($\sim 10^{-4}$). The discriminator can reach a low power consumption of $\sim 250~\mu W$ (MIMOSA-22) [11]. The so-called "S" curves, obtained by scanning the external threshold voltage, are used to evaluate the offset, the temporal noise and the fixed pattern noise (FPN) [12, 13].
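The way an "S" curve yields the offset and the temporal noise can be sketched as follows. This is an illustrative analysis assuming Gaussian temporal noise and an ideal threshold scan; the function names and the numerical values (in mV) are assumptions, not the procedure used in [12, 13].

```python
import math

def s_curve(threshold, offset, noise):
    """Fraction of samples in which the discriminator fires, for Gaussian noise."""
    return 0.5 * (1.0 - math.erf((threshold - offset) / (math.sqrt(2.0) * noise)))

def analyse_s_curve(thresholds, fractions):
    """Return (midpoint, sigma) from the moments of the negated S-curve derivative."""
    n = len(thresholds) - 1
    weights = [fractions[i] - fractions[i + 1] for i in range(n)]
    centres = [0.5 * (thresholds[i] + thresholds[i + 1]) for i in range(n)]
    total = sum(weights)
    mean = sum(w * c for w, c in zip(weights, centres)) / total
    variance = sum(w * (c - mean) ** 2 for w, c in zip(weights, centres)) / total
    return mean, math.sqrt(variance)

# Assumed values for one discriminator: 0.4 mV offset, 0.6 mV temporal noise.
thresholds = [-5.0 + 0.1 * i for i in range(101)]
fractions = [s_curve(t, offset=0.4, noise=0.6) for t in thresholds]
midpoint, sigma = analyse_s_curve(thresholds, fractions)
print(f"midpoint = {midpoint:.2f} mV, sigma = {sigma:.2f} mV")
```

In this picture the midpoint of each curve estimates the offset of one discriminator, its width estimates the temporal noise, and the spread of the midpoints over all columns gives an estimate of the fixed pattern noise.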

#### 2.2.2.2 Zero Suppression

The raw data flow of MAPS for high energy physics experiments can reach several Gbits/s per chip. This calls for a fast zero-suppression technique in order to sustain the readout speed. The zero suppression micro-circuit, located downstream of the discriminators, is based on a row by row sparse data scan readout [14, 15]. This allows a data compression factor ranging from 10 to 1000, depending on the hit density per frame.

Figure 2.8 shows the hit recognition and encoding of the pixels in a matrix frame [16]. The pixels delivering a signal above the threshold voltage are encoded in terms of "states". Each state represents a group of successive hit pixels in one row. A state consists of the address of the first hit pixel followed by two extra bits encoding the number of contiguous hit pixels.
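The state format can be illustrated with a small software model of the encoding of one row. This is a sketch of one possible interpretation, in which the two extra bits count the hit pixels following the first one (so a state covers at most 4 contiguous pixels and a longer run starts a new state). The function name and the example row are hypothetical.

```python
def encode_row(hits, max_run=4):
    """Encode one discriminated row as a list of (first_column, extra_pixels) states.

    `extra_pixels` fits in two bits, so one state covers at most `max_run` = 4 hits.
    """
    states = []
    col = 0
    while col < len(hits):
        if hits[col]:
            run = 1
            # Extend the state while the next pixels are hit and the 2-bit field allows it.
            while run < max_run and col + run < len(hits) and hits[col + run]:
                run += 1
            states.append((col, run - 1))
            col += run
        else:
            col += 1
    return states

row = [0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1]
print(encode_row(row))  # [(1, 1), (5, 3), (9, 0), (11, 0)]
```

In the example, the run of five hit pixels starting at column 5 is split into a state covering four pixels and a second state for the remaining pixel at column 9.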

The principle of the zero-suppression circuit is shown in figure 2.9 [17]. Here, we take the example of a full scale sensor (called MIMOSA-26) incorporating the zero suppression logic. It includes a pixel array of 1152 columns of 576 pixels. The zero suppression is organized in 3 stages. In the $1^{st}$ stage, the 1152 columns are divided into 18 banks of 64 columns each. A parallel scan based on a priority look ahead (PLA) encoding is performed in each bank. It finds up to N states per bank, each state encoding up to 4 contiguous hit pixels (the first of which has an output equal to "1"). This stage handles the column address encoding and the continuity of the algorithm between adjacent banks over the entire row. The $2^{nd}$ stage combines the results of the 18 banks, again with a PLA. Its multiplexing logic accepts up to M states per row and adds the row and bank addresses. The values of N and M are derived from a statistical study based on the hit density. The results are stored in the $3^{rd}$ stage, i.e. a memory made of two SRAM blocks (foundry IP) with a capacity of up to 48 Kbits. The two SRAMs allow a continuous readout: while one buffer stores the compressed data of a frame, the content of the other one is transmitted to the outside via two LVDS transmitters at a frequency of up to 160 MHz. The memory capacity and the transfer frequency are adapted to each application.

Figure 2.8: Hit recognition and encoding of the pixels.
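The need for zero suppression can be illustrated by comparing the raw binary data rate of the matrix with the output bandwidth. The sketch below uses the MIMOSA-26 figures given in this chapter (1152 x 576 pixels, 18 banks, a frame readout time of 112 $\mu$s, two LVDS outputs at up to 160 MHz) and assumes, for illustration, that each link carries one bit per clock cycle.

```python
COLUMNS, ROWS = 1152, 576   # MIMOSA-26 pixel array
BANKS = 18                  # banks of the first zero-suppression stage
FRAME_TIME_S = 112e-6       # frame readout time quoted for MIMOSA-26
N_LINKS = 2                 # LVDS transmitters
LINK_RATE_HZ = 160e6        # assumed: one bit per clock cycle per link

columns_per_bank = COLUMNS // BANKS
raw_bits_per_frame = COLUMNS * ROWS            # one bit per pixel after discrimination
raw_rate = raw_bits_per_frame / FRAME_TIME_S   # bits/s without zero suppression
output_rate = N_LINKS * LINK_RATE_HZ           # bits/s available at the outputs
min_compression = raw_rate / output_rate

print(f"columns per bank:       {columns_per_bank}")
print(f"raw data rate:          {raw_rate / 1e9:.2f} Gbit/s")
print(f"output bandwidth:       {output_rate / 1e6:.0f} Mbit/s")
print(f"compression needed:     {min_compression:.1f}")
print(f"bits shipped per frame: {output_rate * FRAME_TIME_S:.0f}")
```

Under these assumptions the raw rate is several Gbit/s and a compression factor of roughly 20 is the minimum needed to fit the output links, which lies within the range of 10 to 1000 quoted above.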

The zero suppression circuit has been used successfully in MIMOSA-26, designed for the EUDET beam telescope, and in ULTIMATE, equipping the STAR-PXL sub-system. The results show that the zero suppression circuit correctly decodes the column and row addresses of a hit pixel with no information loss. The estimated digital power consumption of the full size logic is about 135 mW.

Figure 2.9: Block diagram of the zero suppression.

### 2.3 State-of-the-art CPS

More than 30 different CMOS pixel sensors have been designed and fabricated. They offer attractive features with respect to the vertex detector requirements: they easily match the targeted granularity and material budget, and do not require a cooling system which would add substantial material inside the fiducial volume of the detector.

The sensors MIMOSA-26 [18] and ULTIMATE (also called MIMOSA-28) [19, 20] are considered as the state of the art of the CPS technology for charged particle detection. They are designed to equip the EUDET beam telescope and the new STAR vertex detector, respectively. Both of them are fabricated in the AMS 0.35 $\mu$m CMOS technology. Their pixels are grouped in columns, which are read out in parallel and terminated with column-level discriminators. The matrix is read out row by row, with a typical readout time of $\leq$ 200 ns per row. This so-called rolling shutter mode has the great advantage of reducing the power consumption of the whole pixel array. MIMOSA-26 and ULTIMATE have a power consumption of about 250 and 150 mW/cm<sup>2</sup> respectively.

Figure 2.10: EUDET beam telescope.

The pixel pitch is around 20  $\mu$ m for both sensors. Each pixel incorporates a correlated double sampling based on a clamping technique to reduce the average pixel noise. A preamplifier in each pixel is used to improve the signal to noise ratio (SNR). The setting parameters of the sensor are remotely programmable through the JTAG circuits integrated in the sensor. A zero-suppression micro-circuit integrated in the chip compresses the signals above the threshold voltage. Then the compressed data including the hit pixel addresses are buffered in 2 SRAMs before being transmitted to the outside.

#### 2.3.1 MIMOSA-26

EUDET is a coordinated European effort on research and development for the next generation of large scale particle detectors for the ILC. The beam telescope has been developed within one of its joint research activities (JRA). The telescope is composed of two arms of three CPS measurement planes each (2 $\times$ 3 planes), as shown in figure 2.10, providing an extrapolated resolution better than 2 $\mu$m. The reference detectors have to cover a sensitive area of more than 2 cm<sup>2</sup>, to be read out at about 10 k frames per second and to cope with a rate of $10^6$ particles/cm<sup>2</sup>/s. MIMOSA-26 is the first full scale sensor aiming to equip the final version of the EUDET beam telescope [21]. It combines the architectures of two earlier prototypes: MIMOSA-22, addressing the upstream part of the signal detection and signal processing chain, and SUZE-01, dedicated to data sparsification and formatting. Figure 2.11 shows the overall architecture of MIMOSA-26, whose pixel array covers an area of 224 mm<sup>2</sup>. It is composed of 1152 columns of 576 pixels with a pixel pitch of 18.4 $\mu$m. In row by row rolling shutter mode with an 80 MHz main clock frequency, the whole pixel matrix is read out in 112 $\mu$s.

Figure 2.11: Block diagram of MIMOSA-26.

The sensor is organized as follows. The rolling shutter mode is steered by a row selector and pixel sequencer located on the left side. The collected charges are converted into a signal voltage on an Nwell/Pepi diode, and then amplified in each pixel by an in-pixel amplification stage. The useful information is obtained by subtracting two successive frames. Each column is terminated with an offset compensated discriminator. The discriminator outputs are connected to a zero suppression circuit, organised in pipeline mode. An optional PLL generates a high frequency clock from a low frequency reference input clock. The on-chip programmable bias circuits and the test block are configured via a JTAG controller.

The detection performance of several sensors was assessed with minimum ionising particles ($\sim 120$ GeV pions) at the CERN-SPS. Typical values of the detection efficiency, the pixel fake hit rate (due to noise fluctuations above threshold) and the single point resolution were obtained at a temperature of about 20°C for various threshold voltages. The detection efficiency is close to 100% for a threshold that is low enough, yet still corresponds to $\sim 6$ times the noise standard deviation, so that the fake hit rate stays below $10^{-4}$. The single point resolution is slightly higher than the ILD specification of 3 $\mu$m. It results from an impact position reconstruction based on the centre of gravity of the hit pixels in a cluster. This performance allows a straightforward extension of the design to the PIXEL vertex detector of the STAR experiment.

Figure 2.12: MIMOSA-26 beam test results obtained at the CERN-SPS with $\sim 120$ GeV charged particles. The detection efficiency (black curve), the fake hit rate (blue curve) and the single point resolution (red curve) are evaluated with various discriminator thresholds.
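The centre-of-gravity reconstruction mentioned above can be sketched for binary (1-bit) pixel data, where every hit pixel of a cluster carries the same weight. The function name and the example cluster are hypothetical; positions are expressed in micrometres relative to the centre of pixel (0, 0).

```python
def cluster_centre(hit_pixels, pitch_um):
    """Centre of gravity (x, y) in micrometres of a cluster of binary hit pixels.

    `hit_pixels` is a list of (column, row) indices with equal weights,
    because the column-level discriminators deliver 1-bit data.
    """
    n = len(hit_pixels)
    x = sum(col for col, _ in hit_pixels) / n * pitch_um
    y = sum(row for _, row in hit_pixels) / n * pitch_um
    return x, y

cluster = [(10, 20), (11, 20), (10, 21)]  # hypothetical 3-pixel cluster
x_um, y_um = cluster_centre(cluster, pitch_um=18.4)
print(f"x = {x_um:.2f} um, y = {y_um:.2f} um")
```

Because the charge is shared among neighbouring pixels, the centroid of a multi-pixel cluster can locate the impact with a precision finer than the pixel pitch, which is how a pitch of about 18 $\mu$m can yield a resolution of a few micrometres.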

#### 2.3.2 ULTIMATE

The STAR experiment at the Brookhaven National Laboratory has upgraded its inner detector (Heavy Flavor Tracker) with CMOS pixel sensors. The requirements for the STAR pixel detector are similar to those of EUDET. ULTIMATE, the final sensor for this application, is an extension of MIMOSA-26 [22]. Figure 2.13 shows the block diagram of ULTIMATE. It includes a pixel array of 960 $\times$ 928 pixels with a pixel pitch of 20.7 $\mu$m, covering an area of $\sim 3.8 \text{ cm}^2$. The whole pixel matrix has an integration time of 185.6 $\mu$s, with a rolling shutter readout of 200 ns per row.

Figure 2.13: Block diagram of ULTIMATE.
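The quoted areas and integration times follow directly from the array geometry and the row readout time. The sketch below checks them for both sensors; the function names are illustrative, and the 200 ns row time applied to MIMOSA-26 is the upper bound quoted for the row readout.

```python
def sensitive_area_mm2(columns, rows, pitch_um):
    """Area of the pixel array in mm^2."""
    return columns * rows * (pitch_um * 1e-3) ** 2

def frame_time_us(rows, row_time_ns):
    """Rolling shutter integration time in microseconds: one row after the other."""
    return rows * row_time_ns * 1e-3

# MIMOSA-26: 1152 columns x 576 rows, 18.4 um pitch
print(f"MIMOSA-26 area: {sensitive_area_mm2(1152, 576, 18.4):.1f} mm^2")
print(f"MIMOSA-26 frame time at 200 ns/row: {frame_time_us(576, 200):.1f} us")

# ULTIMATE: 960 columns x 928 rows, 20.7 um pitch
print(f"ULTIMATE area: {sensitive_area_mm2(960, 928, 20.7) / 100:.2f} cm^2")
print(f"ULTIMATE frame time at 200 ns/row: {frame_time_us(928, 200):.1f} us")
```

For ULTIMATE, 928 rows at 200 ns reproduce the 185.6 $\mu$s integration time. For MIMOSA-26, 200 ns per row gives a value slightly above the quoted 112 $\mu$s, which would be consistent with a row readout time a little below 200 ns.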

The discriminator outputs are processed by an integrated zero suppression logic and the results are stored in 2 SRAMs allowing a continuous readout. This architecture is capable of coping with a hit rate of 10<sup>6</sup> hits/cm<sup>2</sup>/s. The sensor also offers enhanced testability, with a large number of configurations for each part (pixels, discriminators, zero suppression and data transmission). The on-chip programmable bias DACs, the threshold voltage of the discriminators and the test mode selection are set via a JTAG controller. An on-chip voltage regulator provides the pixel clamping voltage in order to minimize interference on this critical node. The PLL and the voltage regulators for the analogue power supply were implemented as individual blocks in order to evaluate their performance.



Figure 2.14: Beam test results of ULTIMATE with power supply of 3.3 V, before (left) and after (right) exposure to a dose of 150 kRads.



Figure 2.15: Beam test results of ULTIMATE with power supply of 3 V, before (left) and after (right) exposure to a dose of 150 kRads.

The detection performance of ULTIMATE was evaluated with a $\sim 120$ GeV $\pi^-$ beam at the CERN-SPS. Six sensors with a 20 $\mu$m thick high-resistivity epitaxial layer were mounted in a beam telescope configuration. One of them was exposed to a dose of 150 kRads. The tests were performed at 30°C before and after irradiation, with analogue power supplies of 3.3 V and 3 V. Figure 2.14 and figure 2.15 show the detection efficiency, pixel fake hit rate and single point resolution as a function of the discriminator threshold. An efficiency above 99.5% was obtained with a fake hit rate below $10^{-4}$. The single point resolution is better than 4 $\mu$m. The performance of ULTIMATE does not even change with the lower power supply of 3 V. The total power consumption can be reduced by 6% with the lower supply voltage, which brings it below 150 mW/cm<sup>2</sup>. These results comply with the STAR-PXL specifications.

The influence of the epitaxial layer thickness on the sensor detection performance is under study, based on beam-tested sensors with either a 15 or a 20 $\mu$m thick layer. Present results show that both sensors exhibit very similar performance in terms of detection efficiency, pixel fake hit rate and single point resolution under all operating conditions.

### 2.4 Sensor Concept for the ILD Vertex Detector

As described in the previous section, the compliance of CPS with the single point resolution and material budget specifications of the ILD vertex detector is not in question. The measured radiation tolerance of MIMOSA-26 and ULTIMATE is also expected to be sufficient for the running conditions. The remaining challenge is whether the read-out speed can accommodate the high hit rate generated by the beam-related background. In addition, the power consumption should be compatible with a non-disturbing cooling system such as air flow.

The particle rate is dominated by beamstrahlung electrons, and decreases rapidly when moving away from the interaction region. For instance, the three double-sided layers, featuring average radii of 17, 38 and 59 mm, face a hit density varying by one order of magnitude from one layer to the next. On the other hand, the innermost layer, which is by far the most exposed to beamstrahlung background, is also the smallest one, accounting for only $\sim 10\%$ of the total detector surface. These features guide the sensor concept, together with the design specifications. The double-sided option offers several important advantages, which will be discussed in more detail below. Three different sensors are considered here, each optimised for a balance between single point resolution, read-out speed and power consumption.

#### 2.4.1 Sensor Equipping the Innermost Layer

CPS complying with vertex detector specifications are derived from MIMOSA-26, with modified spatial resolution and read-out time, and adapted to different requirements for distinct layers. The innermost layer should exhibit a single point resolution better than



Figure 2.16: Schematic of the combination of AROM (elongated pixels for time resolution) and MIMOSA (square pixels for spatial resolution) sensors equipping the double-sided ladder.

3 $\mu$m associated with a short integration time (less than 10 $\mu$s) because of the ILC beam structure and the massive pair background. The beam background dictates that, during the collisions of a single bunch train, sensors should be read out twenty or more times to maintain the pixel occupancy below 1% [23]. The conflict between high granularity and fast read-out is resolved by equipping the innermost ladders with two different types of sensors [24], one achieving the required spatial resolution and one providing a fast time stamp. The combination of the two different sensors is shown in figure 2.16.

The inside sensor, called MIMOSA-in, aims to provide the spatial resolution. By equipping the inside ladder with square pixels of 16 $\mu$m pitch and binary read-out, the required $\leq 3~\mu$m spatial resolution can be achieved. A crucial step here is to shrink the MIMOSA-26 pixel pitch from 18.4 $\mu$m to $\sim 16~\mu$m. In the rolling shutter mode, the read-out time is proportional to the number of pixels per column. Therefore the read-out would be twice as fast, i.e. as short as $\sim 50~\mu$s, in the case of a double-sided read-out architecture where each column is split into two halves [25]. This value is expected to be appropriate for the innermost layer, which is exposed to the highest hit rate.

The other sensor, called AROM<sup>1</sup>, features rectangular pixels with a 4-5 times longer pitch, resulting in about five times fewer pixels per column and therefore in a 10 $\mu$s time resolution.

<sup>&</sup>lt;sup>1</sup>standing for Accelerated Read Out MIMOSA sensor.

The spatial resolution of elongated pixels has been studied with the MIMOSA-22 AHR sensor, which features  $18.4 \times 73.2 \ \mu\text{m}^2$  pixels. Based on beam test results, its spatial resolution is expected to be  $\sim 6 \ \mu\text{m}$  in both directions with staggered pixels.

During the last quarter of 2011, a prototype sensor (called MIMOSA-30) was fabricated, aiming to demonstrate the feasibility of the VTX inner layer sensors. It is composed of two parts featuring the design concepts for each side of the double-sided ladder. The first part, optimised for high spatial resolution, consists of square pixels of 16 $\mu$m pitch; the second part, optimised for time stamping, consists of elongated pixels of 16 $\times$ 64 $\mu$m<sup>2</sup>. The pixel matrices are read out from both sides, with each column terminated by a discriminator. The expected performance of the first part is a spatial resolution of $\leq$ 3 $\mu$m and a read-out time of $\leq$ 50 $\mu$s, while for the second part the expected spatial resolution and time resolution are $\sim$ 6 $\mu$m and $\sim$ 10 $\mu$s, respectively. First test results of the sensor show that these integration times have been achieved.

The double-sided ladder, with its two sensor layers $\sim 2$ mm apart, combines a very precise sensor with a much faster one and provides a tight correlation between the two. It will thus achieve a spatial resolution of $\leq 3~\mu m$ combined with a $\sim 10~\mu s$ time stamp. The ladder is developed within the PLUME<sup>1</sup> collaboration and aims at a total material budget of 0.3% of a radiation length.

#### 2.4.2 Sensor Equipping the Outer Layers

Due to the reduced beamstrahlung-induced hit density, the outer layers have fewer constraints in terms of spatial resolution and read-out speed. A single point resolution of $\sim$ 4 $\mu$m combined with an integration time shorter than 100 $\mu$s is expected to constitute a valuable trade-off. Moreover, the outer layers, which are the largest ones, account for about 90% of the total VTX surface. The sensors equipping the outer layers, called MIMOSA-out, are expected to consume 3 times less power [26]. In this case, the design effort focuses on minimizing the power consumption. In the rolling shutter read-out architecture, the power consumption is proportional to the number of columns. Thus the sensor will be read out from one side only and implemented with pixels of 35 $\times$ 35 $\mu$m<sup>2</sup>, i.e. 4 times larger than the pixels of the innermost layer. The resolution versus pixel pitch is shown in figure 2.17 [27]. In this way, the number of columns, thus the power

<sup>&</sup>lt;sup>1</sup>standing for Pixelated Ladder with Ultra-low Material Embedding.



Figure 2.17: Resolution vs pixel pitch.

| Pixel pitch ($\mu$m)          | 20             | 20                | 30             | 35                      | 40           |
|-------------------------------|----------------|-------------------|----------------|-------------------------|--------------|
| Number of bits                | 12             | 4                 | 12             | 4                       | 12           |
| Epitaxial layer               | low R          | low R             | low R          | high R                  | low R        |
| Spatial resolution ($\mu$m)   | 1.5 (measured) | 1.7 (reprocessed) | 2.1 (measured) | $\leq 4$ (extrapolated) | 3 (measured) |

Table 2.1: Extrapolation from previous measurements. The spatial resolution depends on the pixel pitch, epitaxial layer and number of bits.

consumption, is reduced by a factor of 4 with respect to the innermost layer. The loss in spatial resolution due to the larger pitch can be compensated by replacing the end-of-column discriminators with $\leq$ 4-bit ADCs. The proposed architecture, with a pixel pitch of 35 $\mu$m and columns terminated by $\leq$ 4-bit ADCs, is estimated to provide a single point resolution of $\sim$ 3.5 $\mu$m. The extrapolated result is shown in table 2.1 [28]. Indeed, the impact position estimate improves when the charge information is used, for instance through the charge center of gravity. This approach therefore reduces the power consumption while preserving the necessary spatial resolution.
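The benefit of digitised charge for position reconstruction can be illustrated with a minimal center-of-gravity sketch in Python. This is not the reconstruction code used for the results above: the cluster format `(column, row, charge)`, the function name and the example values are assumptions made for illustration, and pixel indices are taken to label pixel centers.

```python
def center_of_gravity(cluster, pitch_um):
    """Charge-weighted mean position of a cluster, in micrometres.

    cluster: list of (column, row, charge) tuples, charge in ADC counts.
    With binary read-out every hit pixel simply carries charge 1.
    """
    total = sum(q for _, _, q in cluster)
    if total <= 0:
        raise ValueError("cluster carries no charge")
    x = sum(col * q for col, _, q in cluster) / total * pitch_um
    y = sum(row * q for _, row, q in cluster) / total * pitch_um
    return x, y


# Hypothetical two-pixel cluster on a 35 um pitch matrix.
with_adc = center_of_gravity([(10, 5, 3), (11, 5, 1)], pitch_um=35.0)
binary = center_of_gravity([(10, 5, 1), (11, 5, 1)], pitch_um=35.0)
print(with_adc, binary)
```

With the charge weights, the estimate moves towards the pixel that collected more charge, whereas the binary cluster can only return the midpoint between the two pixels.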

This thesis is dedicated to the design of a sizable sensor prototype with 4-bit column-level ADCs, aimed at equipping the outer layers. This is the first CMOS pixel sensor integrating column-level ADCs for the ILD vertex detector. The column-parallel ADC needed for the sensor requires a compromise among low noise, low power consumption, high conversion rate and minimal active area.

In this application, the design of such a column-parallel ADC is constrained by several factors. The ADC must convert the signal continuously, and therefore cannot have any dead time. In order to achieve an integration time of 100 $\mu$s in a full size sensor (about $2 \times 2$ cm<sup>2</sup>), the ADC, which processes the pixel read-out in parallel, is required to work at a conversion rate of 6.25 MHz (160 ns/row). Because the cooling system also contributes to the material budget, the power consumption of the column ADC must be minimized and should stay below 500 $\mu W$. To decrease the dead zone of the sensor, the ADC dimension should be minimized (less than 1 mm), leaving as much as possible of the space available in the vertex detector environment to the pixel array. The read-out chain must introduce very low noise in order to accommodate the modest pixel signal. Offset compensation between the different column ADCs should also be considered. In Chapter 3, ADC architectures are reviewed and the choice of the architecture suitable for our application is presented.
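The timing requirement can be checked with a few lines of Python. The sketch assumes the 35 $\mu$m pitch proposed for the outer layers and one ADC conversion per row; the function names are illustrative.

```python
def rows_per_frame(integration_time_us: float, row_time_ns: float) -> float:
    """Number of rows that can be scanned within one integration time."""
    return integration_time_us * 1000.0 / row_time_ns


def conversion_rate_mhz(row_time_ns: float) -> float:
    """One conversion per row: the ADC rate is the inverse of the row time."""
    return 1000.0 / row_time_ns


n_rows = rows_per_frame(integration_time_us=100.0, row_time_ns=160.0)
rate = conversion_rate_mhz(row_time_ns=160.0)
# Assumption: 35 um pitch, as proposed for the outer-layer sensor.
column_length_mm = n_rows * 35.0 / 1000.0
print(f"{n_rows:.0f} rows, {rate:.2f} MHz, column length {column_length_mm:.1f} mm")
```

Under these assumptions the column length comes out at roughly 2 cm, consistent with the full size sensor mentioned in the text.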

### 2.5 Summary

CMOS pixel sensors are making steady progress towards the specifications of the ILD vertex detector. The detection principle of CPS has been reviewed in this chapter, and the components, including the pixel and the signal processing circuits, have been described in more detail. Recent developments have been summarized, showing that these devices are close to complying with all major requirements. The sensor concept is guided by the double-sided ladder, which allows two different sensors to be combined, for instance one dedicated to spatial resolution and the other one to time resolution. In particular, the innermost layer motivates an effort concentrating on a high read-out speed design because of the beamstrahlung background. The outer layers have fewer constraints in terms of spatial resolution and read-out speed, which shifts the design effort towards minimizing the power consumption. This thesis is dedicated to the design of a sizable sensor prototype integrating 4-bit column-level ADCs in order to equip the outer layers. In the next chapter, a detailed description of the column-level ADC suitable for the sensor prototype is presented.

### **Bibliography**

- [1] G. Deptuch, M. Winter, W. Dulinski, D. Husson, R. Turchetta, and J. L. Riester, "Simulation and measurements of charge collection in monolithic active pixel sensors," *Nucl. Instr. and Meth. Phys. Res. A*, vol. 465, pp. 92–100, Jun. 2001.

- [2] B. Dierickx, G. Meynants, and D. Scheffer, "Near 100% fill factor CMOS active pixels," presented at the IEEE CCD & AIS Workshop, Brugge, Belgium, Jun. 1997.
- [3] H. Tian, B. Fowler, and A. El Gamal, "Analysis of temporal noise in CMOS APS," proceeding of SPIE Electronic Imaging '99 Conference, vol. 3649, Jan. 1999.
- [4] A. Dorokhov, on behalf of the CMOS & ILC group of IPHC, "Optimization of amplifiers for monolithic active pixel sensors," presented at the Topical Workshop on Electronics for Particle Physics, Prague, Czech Republic, Sep. 2007.
- [5] A. Dorokhov, "NMOS-based high gain amplifier for MAPS," presented at the VI<sup>th</sup> International meeting on front end electronics for high energy, nuclear and space applications, Perugia, Italy, May 2006.
- [6] G. Deptuch, G. Claus, C. Colledani, Y. Degerli, W. Dulinski, N. Fourches, G. Gaycken, D. Grandjean, A. Himmi, C. Hu-Guo, P. Lutz, M. Rouge, I. Valin, and M. Winter, "Monolithic active pixel sensor with in-pixel double sampling operation and column-level discrimination," *IEEE Trans. Nucl. Sci.*, vol. 51, no. 5, pp. 2313–2321, Oct. 2004.
- [7] C. Hu-Guo, "10000 frames per second readout MAPS for the EUDET beam telescope," presented at the Topical Workshop on Electronics for Particle Physics, Paris, France, Sep. 2009.
- [8] Y. Degerli, G. Deptuch, N. Fourches, A. Himmi, Y. Li, F. Orsini, and M. Szelezniak, "A fast monolithic active pixel sensor with pixel-level reset noise suppression and binary outputs for charged particle detection," *IEEE Trans. Nucl. Sci.*, vol. 52, no. 6, pp. 3186–3193, Dec. 2005.
- [9] Y. Degerli, N. Fourches, M. Rouge, and P. Lutz, "Low-power autozeroed high-speed comparator for the readout chain of a CMOS monolithic active pixel sensor based vertex detector," *IEEE Trans. Nucl. Sci.*, vol. 50, no. 5, pp. 1709–1717, Oct. 2003.

- [10] Y. Degerli, "Design of fundamental building blocks for fast binary readout CMOS sensors used in high-energy physics experiments," *Nucl. Instr. and Meth. Phys. Res. A*, vol. 602, pp. 461–466, 2009.

- [11] C. Hu-Guo, "Design and characterisation of a fast architecture providing zero suppressed digital output integrated in a high resolution CMOS pixel sensor for the STAR vertex detector and the EUDET beam telescope," presented at the Topical Workshop on Electronics for Particle Physics, Naxos, Greece, Sep. 2008.
- [12] Y. Degerli, M. Besancon, A. Besson, G. Claus, G. Deptuch, W. Dulinski, N. Fourches, M. Goffe, A. Himmi, Y. Li, F. Orsini, and M. Szelezniak, "Performance of a fast binary readout CMOS active pixel sensor chip designed for charged particle detection," *IEEE Trans. Nucl. Sci.*, vol. 53, no. 6, pp. 3949–3955, Dec. 2006.
- [13] Y. Degerli, A. Besson, G. Claus, M. Combet, A. Dorokhov, W. Dulinski, M. Goffe, A. Himmi, Y. Li, and F. Orsini, "Development of binary readout CMOS monolithic sensors for MIP tracking," *IEEE Trans. Nucl. Sci.*, vol. 56, no. 1, pp. 354–363, Feb. 2009.
- [14] J. J. Jaeger, C. Boutonnet, P. Delpierre, J. Waisbard, and F. Plisson, "A sparse data scan circuit for pixel detector readout," *IEEE Trans. Nucl. Sci.*, vol. 41, no. 3, pp. 632–636, Jun. 1994.
- [15] K. Einsweiler, A. Joshi, S. Kleinfelder, L. Luo, R. Marchesini, O. Milgrome, and F. Pengg, "Dead-time free pixel readout architecture for ATLAS front-end IC," *IEEE Trans. Nucl. Sci.*, vol. 46, no. 3, pp. 166–170, Jun. 1999.
- [16] A. Himmi, G. Doziere, O. Torheim, C. Hu-Guo, and M. Winter, "A zero suppression micro-circuit for binary readout CMOS monolithic sensors," presented at the Topical Workshop on Electronics for Particle Physics, Paris, France, Sep. 2009.
- [17] C. Hu-Guo, J. Baudot, G. Bertolone, A. Besson, A. S. Brogna, C. Colledani, G. Claus, R. De Masi, Y. Degerli, A. Dorokhov, G. Doziere, W. Dulinski, X. Fang, M. Gelin, M. Goffe, F. Guilloux, A. Himmi, K. Jaaskelainen, M. Koziel, F. Morel, F. Orsini, M. Specht, Q. Sun, I. Valin, and M. Winter, "CMOS pixel sensor development: a fast read-out architecture with integrated zero suppression," *Journal of Instrumentation JINST*, vol. 4, p. P04012, Apr. 2009.

- [18] —, "First reticule size MAPS with digital output and integrated zero suppression for the EUDET-JRA1 beam telescope," *Nucl. Instr. and Meth. Phys. Res. A*, vol. 623, pp. 480–482, 2010.

- [19] I. Valin, C. Hu-Guo, J. Baudot, G. Bertolone, A. Besson, C. Colledani, G. Claus, A. Dorokhov, G. Doziere, W. Dulinski, M. Gelin, M. Goffe, A. Himmi, K. Jaaskelainen, F. Morel, H. Pham, C. Santos, S. Senyukov, M. Specht, G. Voutsinas, J. Wang, and M. Winter, "A reticle size CMOS pixel sensor dedicated to the STAR HFT," *Journal of Instrumentation JINST*, vol. 7, p. C01102, Jan. 2012.
- [20] M. Winter, "Development of CMOS pixel sensors fully adapted to the ILD vertex detector requirements," presented at the International Workshop on Future Linear Colliders LCWS'11, Granada, Spain, Sep. 2011.
- [21] C. Hu-Guo, "Achievements & perspectives of MIMOSA sensors (MAPS) for vertexing applications," presented at the Yearly workshop on vertex detectors and related techniques Vertex 2009, Putten, Sep. 2009.
- [22] —, "ULTIMATE: a high resolution CMOS pixel sensor for the STAR vertex detector upgrade," presented at the Topical Workshop on Electronics for Particle Physics, Aachen, Germany, Sep. 2010.
- [23] A. Nomerotski et al., "PLUME collaboration: ultra-light ladders for linear collider vertex detector," *Nucl. Instr. and Meth. Phys. Res. A*, vol. 650, pp. 208–212, 2011.
- [24] M. Winter, "High resolution vertexing based on CMOS sensors with microsecond time stamping," presented at the ACFA ILC Workshop KILC12, Daegu, Korea, Apr. 2012.
- [25] —, "Status of the CMOS pixel sensor development for the VXD at 500 GeV and 1 TeV," presented at the ILD Workshop 2012, Fukuoka, Japan, May 2012.
- [26] —, "Power consumption of CMOS sensors for an ILD vertex detector," presented at the Linear Collider Power Distribution and Pulsing Workshop, LAL, Orsay, May 2011.

- [27] A. Besson, "CMOS sensors with high resistivity epitaxial layer," presented at the Europhysics Conference on High-Energy Physics 2011, Grenoble, France, Jul. 2011.

- [28] J. Baudot, "An ILD vertex detector with CMOS sensors," presented at the ILD Workshop, LAL Orsay, Paris, May 2011.

## Chapter 3

# Column-Level Analog-to-Digital Converter for CPS

The sensor concept for the ILD vertex detector was introduced in the previous chapter. This thesis is motivated by the design of a sensor prototype integrating column-level A/D converters (ADCs) for the outer layers. In this chapter, the column-level ADC is described in more detail. The performance of an ADC can be specified in various aspects, including DC specifications and dynamic specifications. The first section of this chapter describes the performance parameters of a column-level ADC. Different techniques used to eliminate non-ideal behaviors are then presented. In the next section, different ADC architectures are briefly reviewed, and the column-level architecture suitable for our application is presented, including its basic building blocks.

### 3.1 Specifications of A/D Converters

#### 3.1.1 Quantization Error

An ideal ADC represents all analog inputs within a certain range by a limited number of digital output codes. Figure 3.1 shows the transfer function of a 3-bit ADC. Each digital code represents a fraction of the total analog input range. Since the analog scale is continuous while the digital codes are discrete, a quantization error is introduced [1]. As the number of digital codes increases, the corresponding step width gets smaller and the



Figure 3.1: The ADC transfer function.

transfer function curve approaches an ideal straight line. The steps are designed to have transitions such that the midpoint of each step corresponds to the point on the ideal line.

The width of one step is defined as 1 LSB (least significant bit) and is often used as the reference unit for other specifications. It is also a measure of the resolution of the converter, since it determines the number of divisions or units of the full analog input range. Therefore, one half LSB represents one half of the analog resolution. The resolution of an ADC is usually expressed as the number of bits in its digital code. For example, an ADC with an n-bit resolution has $2^n$ possible digital codes, which define $2^n$ steps. The first step and the last step are only one half of a full width, therefore the full scale range (FSR) is divided into $2^n - 1$ steps. Hence, 1 LSB in an n-bit converter can be described as



Figure 3.2: Differential nonlinearity (DNL).

the following equation:

$$1\,\text{LSB} = \frac{V_{FSR}}{2^n - 1}. \tag{3.1}$$
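A minimal Python sketch of such an ideal converter is given below. It assumes a unipolar input between 0 and $V_{FSR}$ and transitions placed half an LSB before each ideal code value, so that the first and last steps are half as wide as the others; the function names are illustrative.

```python
import math


def lsb(v_fsr: float, n_bits: int) -> float:
    """Step width from equation 3.1: V_FSR / (2^n - 1)."""
    return v_fsr / (2 ** n_bits - 1)


def ideal_adc_code(v_in: float, v_fsr: float, n_bits: int) -> int:
    """Ideal mid-step quantizer: round to the nearest code, then clip
    to the valid code range [0, 2^n - 1]."""
    code = math.floor(v_in / lsb(v_fsr, n_bits) + 0.5)
    return max(0, min(2 ** n_bits - 1, code))


# 3-bit example with V_FSR = 7 V, so that 1 LSB is exactly 1 V.
for v in (0.4, 0.6, 3.2, 6.6):
    print(v, ideal_adc_code(v, v_fsr=7.0, n_bits=3))
```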

#### 3.1.2 Differential Nonlinearity

The differential nonlinearity (DNL) error describes the difference between an actual step width and the ideal value of 1 LSB. Thus it can be defined as follows:

$$DNL(k) = SW(k) - 1\,\text{LSB}. \tag{3.2}$$

where SW(k) represents the step width of the output code k. If the step width is equal to 1 LSB, the differential nonlinearity error is zero. If the DNL exceeds 1 LSB, the converter can become nonmonotonic and may miss digital codes. The differential nonlinearity error is shown in figure 3.2.



Figure 3.3: Integral nonlinearity (INL).

#### 3.1.3 Integral Nonlinearity

The integral nonlinearity (INL) error, as shown in figure 3.3, is the deviation of the actual transfer function from a straight line. This straight line can be drawn between the two end points of the transfer function, connecting the mid points of the ideal transfer function. The integral nonlinearity is a relative accuracy, which can be expressed as:

$$INL(k) = \sum_{i=1}^{k} DNL(i). \tag{3.3}$$

As defined in equation 3.3, the integral nonlinearity of code k is obtained by accumulating the differential nonlinearities of the codes up to code k.

The nonlinearity errors are usually used to characterize the static performance, and therefore they are measured by using a low frequency input signal. It should be noted that a converter is always monotonic when the integral nonlinearity specification is less than or equal to $\pm$ 1/2 LSB.
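Equations 3.2 and 3.3 can be applied to a list of measured code transition voltages, as in the Python sketch below. The input format, the normalisation to LSB units and the example values are assumptions made for illustration; the half-width end steps are not included.

```python
from itertools import accumulate


def dnl_inl(transitions, lsb):
    """Return (DNL, INL) in units of LSB.

    transitions: sorted input voltages at which the output code changes;
    the width of a step is the distance between two consecutive transitions.
    """
    widths = [b - a for a, b in zip(transitions, transitions[1:])]
    dnl = [w / lsb - 1.0 for w in widths]   # equation 3.2, normalised to 1 LSB
    inl = list(accumulate(dnl))             # equation 3.3, running sum of DNL
    return dnl, inl


# Hypothetical transition levels for an ADC with 1 LSB = 1 V.
dnl, inl = dnl_inl([0.5, 1.5, 2.7, 3.5, 4.5], lsb=1.0)
print([round(d, 3) for d in dnl], [round(i, 3) for i in inl])
```

In this example one step is too wide and the next one too narrow by the same amount, so the INL returns to zero after the two steps.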



Figure 3.4: Offset error.

#### 3.1.4 Offset Error

Building blocks such as amplifiers and comparators in practical circuits inherently have a built-in offset voltage. This offset is caused by transistor mismatch and results in a nonzero input while the digital output is zero. This is often called the "zero-scale" error; as shown in figure 3.4, it indicates how well the actual transfer function matches the ideal transfer function at a single point. Offset error affects all codes by the same amount and can usually be compensated by an auto-zero technique. Furthermore, care must be taken during the layout of the circuit in order to avoid mismatch and thermal coupling.

#### 3.1.5 Signal-to-Noise Ratio

Signal-to-noise ratio (SNR) is an important dynamic specification of a converter, which depends on the resolution of the converter and includes the specifications of linearity, distortion, noise and settling time. It is the ratio of the power of the desired signal to the

power of the noise signal. The SNR can be expressed as:

$$SNR = 10\log_{10}\left(\frac{\text{Signal power}}{\text{Noise power}}\right)\ \text{dB}. \tag{3.4}$$

Here, the noise includes the quantization noise, thermal noise, clock jitter, etc. Theoretically, the maximum SNR is the ratio of the full scale analog input to the quantization error. The maximum SNR can be described as:

$$SNR_{max} = 6.02n + 1.76 \text{ dB}. \tag{3.5}$$
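Equation 3.5 can be recovered numerically under the usual textbook assumptions, which are not stated explicitly here: a full-scale sine wave input of amplitude $2^n\,\text{LSB}/2$ and a quantization error uniformly distributed over one LSB, i.e. an rms value of $\text{LSB}/\sqrt{12}$. The power ratio is then $1.5 \cdot 2^{2n}$. The Python sketch below compares this expression with the linear approximation; the function names are illustrative.

```python
import math


def snr_max_exact_db(n_bits: int) -> float:
    """10*log10 of the power ratio (2^n LSB)^2/8 over LSB^2/12 = 1.5 * 4^n."""
    return 10.0 * math.log10(1.5 * 4 ** n_bits)


def snr_max_approx_db(n_bits: int) -> float:
    """Linear approximation of equation 3.5."""
    return 6.02 * n_bits + 1.76


for n in (4, 8, 12):
    print(n, round(snr_max_exact_db(n), 2), round(snr_max_approx_db(n), 2))
```

Note that this derivation takes the full scale as $2^n$ LSB, which differs slightly from the $2^n - 1$ steps of equation 3.1; the difference is negligible except at very low resolution.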

#### 3.1.6 Noise

Noise is an important feature of the column-level ADC because the input signal from the sensing element (pixel) is very small ($\sim 1 \text{ mV}$). In some cases, different analog input voltages are represented by the same digital code. This means that some information is lost and distortion is introduced into the signal. This is called quantization noise. In addition, thermal noise (also called white noise) from amplifiers, resistors and so on adds to the quantization noise. Therefore the total noise of the system can be described as [2]:

$$N_{system} = \sqrt{N_{quantization}^2 + N_{thermal}^2} \tag{3.6}$$

where $N_{quantization}$ is the rms quantization noise and $N_{thermal}$ is the rms thermal noise. The thermal noise is a dominant error source, generated by the random motion of electrons in transistors and resistors.
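As an illustration of equation 3.6, the Python sketch below combines the two contributions in quadrature. It assumes the standard result that an ideal quantizer with uniformly distributed error has an rms noise of $\text{LSB}/\sqrt{12}$; the numerical values are hypothetical and are not measurements from this work.

```python
import math


def quantization_noise_rms(lsb: float) -> float:
    """rms quantization noise of an ideal quantizer (uniform error model)."""
    return lsb / math.sqrt(12.0)


def system_noise_rms(n_quantization: float, n_thermal: float) -> float:
    """Equation 3.6: uncorrelated noise sources add in quadrature."""
    return math.hypot(n_quantization, n_thermal)


# Hypothetical values: 1 mV LSB and 0.3 mV rms thermal noise.
n_q = quantization_noise_rms(1.0e-3)
n_sys = system_noise_rms(n_q, 0.3e-3)
print(f"quantization: {n_q * 1e3:.3f} mV, total: {n_sys * 1e3:.3f} mV")
```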

#### 3.1.6.1 Resistor Thermal Noise

The thermal noise of a resistor is proportional to the absolute temperature and its power spectral density can be expressed as [3]:

$$\overline{V_n^2} = 4KTR \tag{3.7}$$

where K = 1.38 $\times$ 10<sup>-23</sup> J/K is the Boltzmann constant, T is the absolute temperature in kelvins and R is the resistance. Note that $\overline{V_n^2}$ is expressed in $V^2/\text{Hz}$.



Figure 3.5: Simple sampling circuit (left) and thermal noise equivalent circuit (right).

In the sampling circuit, the resistor thermal noise arises from the finite on-resistance of the MOS transistor switches and is stored on the sampling capacitor. Figure 3.5 shows a schematic of the sampling circuit. The on-resistance of the MOS switch introduces thermal noise at the output and, when the switch turns off, the noise is stored on the capacitor along with the input signal. Modeling the noise of the on-resistance by a voltage source $V_R$, the transfer function of the low-pass filter is:

$$\frac{V_{out}}{V_R}(s) = \frac{1}{1 + RCs} \tag{3.8}$$

The white noise of the resistor is shaped by the low-pass filter, and the total noise power at the output can be obtained by:

$$P_{n,out} = \int_0^\infty S_R(f) \left|\frac{V_{out}}{V_R}(j\omega)\right|^2 df \tag{3.9}$$

$$= \int_0^\infty \frac{4KTR}{4\pi^2 R^2 C^2 f^2 + 1} df \tag{3.10}$$

$$= \frac{2KT}{\pi C} \arctan(2\pi RCf)\Big|_0^{\infty} \tag{3.11}$$

$$= \frac{KT}{C} \tag{3.12}$$

Note the unit of KT/C is  $V^2$ . Equation 3.12 implies that the total noise at the output of the sampling circuit is independent of the resistance value. This is also called KT/C noise. The KT/C noise limits the high precision performance. In order to reduce the thermal noise, the sampling capacitor must be sufficiently large, but this large capacitor

load will increase the response time and consequently, reduce the speed of the device.
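The independence of the total noise from the resistance, and the trade-off with the capacitor size, can be made concrete with a short Python sketch. The temperature of 300 K and the component values are assumptions chosen for illustration only.

```python
import math

K_B = 1.38e-23  # Boltzmann constant in J/K


def ktc_noise_rms(c_farad: float, t_kelvin: float = 300.0) -> float:
    """rms voltage noise sampled on a capacitor: sqrt(KT/C), in volts."""
    return math.sqrt(K_B * t_kelvin / c_farad)


def noise_power_up_to(f_max, r_ohm, c_farad, t_kelvin=300.0):
    """Integrated noise power from 0 to f_max, using the antiderivative
    (2KT / (pi C)) * arctan(2 pi R C f) of the low-pass filtered spectrum."""
    return (2.0 * K_B * t_kelvin / (math.pi * c_farad)
            * math.atan(2.0 * math.pi * r_ohm * c_farad * f_max))


for c in (0.1e-12, 1.0e-12, 10.0e-12):
    print(f"C = {c * 1e12:5.1f} pF -> {ktc_noise_rms(c) * 1e6:6.1f} uV rms")

# Two very different switch resistances converge to the same total power.
print(noise_power_up_to(1e15, 1e3, 1e-12), noise_power_up_to(1e15, 1e5, 1e-12))
```

Under these assumptions a 1 pF capacitor already contributes several tens of microvolts rms, which is not negligible compared with a pixel signal of the order of 1 mV.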

#### 3.1.6.2 Transistor Thermal Noise

MOS transistors also exhibit thermal noise [4]. It can be modeled by a current source connected between the drain and source terminals for long-channel MOS devices operating in saturation. The spectral density is given by:

$$\overline{I_n^2} = 4KT\gamma g_m \tag{3.13}$$

where the coefficient $\gamma$ is derived to be equal to 2/3 for long-channel transistors and $g_m$ is the transconductance of the transistor. With the current source noise model, the thermal noise of the amplifier can be calculated.

#### 3.1.7 Settling Time

The settling time of a system is defined as the time from the start of a transition until the output reaches its new value within the specified accuracy. The settling time specification of the sample-and-hold (S/H) circuit, comparator and digital-to-analog converter is very important for applications of the ADC, especially in the successive approximation register (SAR) ADC configuration.

### 3.2 Error Reduction Techniques

In order to reduce the linearity errors of the converter, several techniques have been developed. In this section, offset cancellation and switch considerations are described in more detail.

#### 3.2.1 Offset Compensation

As mentioned above, an offset inherently exists in amplifiers and comparators, caused by the mismatch of the MOS transistors. In order to cancel the offset, simple methods such as increasing the input capacitance can be used. However, this severely limits the circuit speed and increases the power consumption, especially for a column-parallel



Figure 3.6: Output offset storage architecture.

A/D converter. For this reason, many high precision systems require electronic offset cancellation.

The auto-zeroing technique has been developed in order to cancel the offset [5, 6]. The principle is as follows. First, the offset voltage is measured and stored on an auto-zeroing capacitor. Then the input signal is added to the offset voltage. Only the change of the input signal is amplified at the output node. Thus the offset is canceled.

There are two kinds of auto-zeroing architectures [7, 8]. One is called the output offset storage (OOS) architecture and the other is called the input offset storage (IOS) architecture. As a first step towards an analysis of the offset cancellation, a differential amplifier with capacitive coupling at the output is considered here. The output offset storage architecture is shown in figure 3.6. The differential amplifier has an input-referred offset voltage $V_{OS}$ and is followed by two auto-zeroing capacitors.

The operation of the OOS technique is as follows. During the offset storage phase, switches $S_1$ and $S_2$ are turned off while the other switches $S_3$ to $S_6$ are turned on. The inputs of the amplifier are thus connected to a common mode voltage. The input-referred offset voltage is amplified, driving the output to $V_{out} = A_v V_{OS}$, and this voltage is stored across $C_1$ and $C_2$. At this point, a zero differential input results in a zero differential voltage at the output nodes. Thus the circuit consisting of the amplifier and the two series capacitors exhibits a zero offset voltage.



Figure 3.7: Input offset storage architecture.

During the amplification phase, switches $S_1$ and $S_2$ are turned on while the other switches $S_3$ to $S_6$ are turned off. The input signal is added to the offset voltage, and only the change of the input signal is amplified at the output node. The offset is therefore canceled by measuring the output with the differential input set to zero and storing the result on the capacitors. However, switches $S_5$ and $S_6$ inject charge onto the capacitors when they are turned off. Referred to the input, the offset due to this charge injection is divided by the gain of the amplifier. In order to reduce this error, $A_v$ should be large, but $A_v V_{OS}$ could then saturate the amplifier in the open loop configuration. For this reason, $A_v$ is typically chosen to be low.

Another architecture is input offset storage, as shown in figure 3.7, where a high gain amplifier is required. This approach incorporates two auto-zeroing capacitors at the input, and the amplifier is connected in a unity-gain negative-feedback loop during the offset cancellation. The principle is as follows. During the offset compensation phase, switches $S_1$ and $S_2$ are turned off while the other switches $S_3$ to $S_6$ are turned on. The amplifier is placed in a unity-gain negative-feedback loop and the capacitors $C_1$ and $C_2$ are charged with the offset voltage. The output of the amplifier is then given by:

$$V_{out} = \frac{A_v}{1 + A_v} V_{OS} \tag{3.14}$$

where the output is equal to  $V_{OS}$  if the gain of the amplifier is very large. Now, for a zero



Figure 3.8: CMOS switch (left) and on-resistance behavior (right).

differential input, the differential output is equal to  $V_{OS}$ .

During the amplification phase, switches $S_1$ and $S_2$ are turned on while the other switches $S_3$ to $S_6$ are turned off. Now, the input-referred offset of the total circuit is equal to $V_{OS}/A_v$. The input signal is added to the offset voltage, and only the change of the input signal is amplified at the output node. In addition to the offset voltage, further errors can be introduced, for instance by switch mismatch. The charge injection mismatch of switches $S_5$ and $S_6$ may saturate the amplifier if the gain is very large. It can be minimized by using small switches or large auto-zeroing capacitors, at the cost of limiting the signal speed. Since the charge injection error of the OOS architecture is divided by the amplifier gain when referred to the input, its effect is smaller than in the IOS architecture. Usually, a combination of the OOS and IOS architectures is used in order to achieve high precision with very low offset error.
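The trade-off between the two architectures can be sketched with a common first-order textbook model, which is an assumption of this example rather than a result derived in the text: the input-referred residual offset is taken as $\Delta Q/(A_v C)$ for OOS and as $V_{OS}/(1+A_v) + \Delta Q/C$ for IOS, where $\Delta Q$ is the charge injection mismatch of $S_5$ and $S_6$ and $C$ the auto-zeroing capacitance. All numerical values are hypothetical.

```python
def oos_residual_offset(delta_q: float, c_az: float, gain: float) -> float:
    """OOS: the injected charge error is divided by the amplifier gain."""
    return delta_q / (c_az * gain)


def oos_saturates(v_os: float, gain: float, v_swing: float) -> bool:
    """OOS stores A_v * V_OS at the output, which must fit the output swing."""
    return gain * abs(v_os) >= v_swing


def ios_residual_offset(v_os: float, delta_q: float, c_az: float, gain: float) -> float:
    """IOS: the offset is reduced by the loop gain, the injection error is not."""
    return v_os / (1.0 + gain) + delta_q / c_az


# Hypothetical values: 10 mV offset, 1 fC injection mismatch, 1 pF capacitors.
v_os, dq, c = 10e-3, 1e-15, 1e-12
print("OOS, A_v = 10  :", oos_residual_offset(dq, c, 10.0), oos_saturates(v_os, 10.0, 1.0))
print("IOS, A_v = 1000:", ios_residual_offset(v_os, dq, c, 1000.0))
```

In this model the OOS residual is set by the charge injection reduced by a modest gain, while the IOS residual is dominated by the unattenuated injection term once the gain is large.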

## 3.2.2 Sampling Switches

The sample-and-hold circuit is composed of MOS switches and capacitors. MOS switches operating in the deep triode region introduce non-idealities such as a nonlinear on-resistance, charge injection and clock feedthrough [9].

Figure 3.9: Charge injection through the source and drain terminals.

NMOS switches are also called "zero-offset" switches to denote that they introduce no dc shift between the input and output voltages of the sampling circuit. However, the output voltage falls short by $V_{TH}$ as the input approaches $V_{DD}$, which means that the output voltage cannot exceed $V_{DD} - V_{TH}$. This causes nonlinearity and introduces distortion. From another point of view, the on-resistance of the switch is input signal-dependent and increases considerably as the input signal gets close to $V_{DD}$. In the triode region, the on-resistance of the switch is given by:

$$R_{on} = \frac{1}{\mu_n C_{ox} \frac{W}{L} (V_{DD} - V_{in} - V_{TH})} \tag{3.15}$$

To compensate for this variation of the NMOS on-resistance, a PMOS switch is added in parallel, because its on-resistance decreases as the input signal approaches $V_{DD}$. CMOS switches are therefore used as complementary switches to achieve an on-resistance that is largely independent of the input signal. The on-resistance of the complementary switch can be described as:

$$R_{on,eq} = R_{on,N} || R_{on,P}$$

$$= \frac{1}{\mu_n C_{ox}(\frac{W}{L})_N (V_{DD} - V_{in} - V_{THN})} || \frac{1}{\mu_p C_{ox}(\frac{W}{L})_P (V_{in} - |V_{THP}|)}$$

$$= \frac{1}{\mu_n C_{ox}(\frac{W}{L})_N (V_{DD} - V_{THN}) - [\mu_n C_{ox}(\frac{W}{L})_N - \mu_p C_{ox}(\frac{W}{L})_P] V_{in} - \mu_p C_{ox}(\frac{W}{L})_P |V_{THP}|} \tag{3.18}$$
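The on-resistance expressions above can be evaluated numerically with the following minimal sketch. The function name, the lumped parameters `k_n` $= \mu_n C_{ox}(W/L)_N$ and `k_p` $= \mu_p C_{ox}(W/L)_P$, and all numerical values are assumptions introduced here; a device that is off is simply treated as an open circuit.

```python
def cmos_switch_r_on(v_in, v_dd, v_thn, v_thp_abs, k_n, k_p):
    """Equivalent on-resistance of a complementary switch in the triode region.

    k_n, k_p : mu*C_ox*(W/L) of the NMOS and PMOS devices [A/V^2]
    Setting k_p = 0 gives the NMOS-only switch of equation 3.15.
    """
    # On-conductance of each device; a device with no overdrive is off.
    g_n = k_n * max(v_dd - v_in - v_thn, 0.0)
    g_p = k_p * max(v_in - v_thp_abs, 0.0)
    g_total = g_n + g_p  # parallel combination
    return 1.0 / g_total if g_total > 0.0 else float("inf")


# Hypothetical device parameters, for illustration only.
V_DD, V_THN, V_THP = 3.3, 0.7, 0.7
K = 1e-3  # A/V^2, identical for both devices (matched case)

for v_in in (0.5, 1.0, 2.0, 2.5, 3.0):
    r_nmos = cmos_switch_r_on(v_in, V_DD, V_THN, V_THP, K, 0.0)
    r_cmos = cmos_switch_r_on(v_in, V_DD, V_THN, V_THP, K, K)
    print(f"V_in = {v_in:.1f} V: NMOS only {r_nmos:.0f} ohm, CMOS {r_cmos:.0f} ohm")
```

In the matched case the CMOS value stays constant wherever both devices conduct, while the NMOS-only value rises and finally becomes infinite near $V_{DD}$.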

From equation 3.18, if $\mu_n C_{ox}(\frac{W}{L})_N = \mu_p C_{ox}(\frac{W}{L})_P$, then $R_{on,eq}$ is independent of the input signal. Figure 3.8 shows the complementary switch and the on-resistance behavior.

Figure 3.10: Clock feedthrough through gate-source and gate-drain overlap capacitance.

Consider the sampling circuit shown in figure 3.9. While the switch is on, a channel exists between drain and source. Assuming $V_{in} = V_{out}$, the total charge in the channel is given by:

$$Q_{ch} = WLC_{ox}(V_{DD} - V_{in} - V_{TH}) \tag{3.19}$$

where L denotes the effective channel length. When the MOS switch turns off, the charge in the channel will exit through the drain and source terminals, which is called "charge injection" [10, 11]. The charge injection to the source terminal is absorbed by the input signal, creating no error. However, the charge injection to the drain terminal will be stored on the sampling capacitor, causing an error to the sampled signal voltage. The resulting error appears as a "pedestal noise" at the output, and its value depends on the input signal voltage. The fraction of charge injected in the drain terminal is a relatively complex function of various parameters such as the impedance of each terminal and the transition time of the clock. In reality, the charge distribution in terms of such parameters is unpredictable and most circuit simulation programs model charge injection quite inaccurately. Therefore it is assumed that the entire channel charge is injected on the sampling capacitor as a worst case estimation.
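The worst-case estimate described above can be sketched as follows, using equation 3.19 and assuming that the whole channel charge ends up on the sampling capacitor. The function name, the symbol `c_h` for the sampling capacitor and all numerical values are hypothetical.

```python
def charge_injection_error(w, l, c_ox, v_dd, v_in, v_th, c_h):
    """Worst-case magnitude of the sampled-voltage error due to charge injection.

    w, l : channel width and effective length [m]
    c_ox : gate-oxide capacitance per unit area [F/m^2]
    c_h  : sampling capacitor [F]
    """
    q_ch = w * l * c_ox * (v_dd - v_in - v_th)  # channel charge, equation 3.19
    return q_ch / c_h                           # all of it lands on c_h


# Hypothetical device and capacitor values, for illustration only.
params = dict(w=1e-6, l=0.35e-6, c_ox=5e-3, v_dd=3.3, v_th=0.7, c_h=100e-15)

for v_in in (0.5, 1.0, 2.0):
    err = charge_injection_error(v_in=v_in, **params)
    print(f"V_in = {v_in:.1f} V -> error of about {err * 1e3:.1f} mV")
```

The printed values decrease as $V_{in}$ increases, which illustrates the input dependence of the pedestal.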

Figure 3.11: Dummy switch to reduce the charge injection and clock feedthrough.

In addition to channel charge injection, another error, called clock feedthrough, arises from the parasitic capacitances of the MOS switch [12]. The clock signal is coupled to the sampling capacitor through the gate-drain or gate-source overlap capacitance. As shown in figure 3.10, this introduces an error into the sampled output voltage, given by:

$$V_{error} = V_{CLK} \frac{WC_{ov}}{WC_{ov} + C_H} \tag{3.20}$$

where $C_{ov}$ is the overlap capacitance per unit width of the switch and $C_H$ is the sampling capacitor. Note that the error is independent of the input signal and, for $WC_{ov} \ll C_H$, directly proportional to the width of the switch and to the clock amplitude. Charge injection and clock feedthrough both introduce errors into the sampled signal, and both can be reduced by the techniques described next.
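Equation 3.20 is a simple capacitive divider and can be checked with a few lines of code. This is a minimal sketch; the function name and the numerical values (clock swing, switch width, overlap capacitance, sampling capacitor) are assumptions, not values from the design.

```python
def clock_feedthrough_error(v_clk, w, c_ov, c_h):
    """Sampled-voltage error caused by clock feedthrough (equation 3.20).

    v_clk : clock swing [V]
    w     : switch width [m]
    c_ov  : overlap capacitance per unit width [F/m]
    c_h   : sampling capacitor [F]
    """
    c_overlap = w * c_ov
    # Capacitive divider between the overlap and the sampling capacitor.
    return v_clk * c_overlap / (c_overlap + c_h)


# Hypothetical values: 3.3 V clock, 0.3 fF/um overlap, 100 fF sampling capacitor.
for width_um in (1.0, 2.0, 4.0):
    err = clock_feedthrough_error(3.3, width_um * 1e-6, 0.3e-9, 100e-15)
    print(f"W = {width_um:.0f} um -> error of about {err * 1e3:.2f} mV")
```

Since the input voltage does not appear among the arguments, the result behaves as a constant offset rather than as a distortion term.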

The first technique to reduce the charge injection uses a CMOS switch. When the switch turns off, the opposite charges (electrons and holes) injected by the NMOS and PMOS devices cancel each other. The clock feedthrough can be minimized but not completely canceled, because the two devices do not have the same overlap capacitance.

Another approach to remove the charge injection is to add a second transistor, called a "dummy" switch. As shown in figure 3.11, the dummy switch $M_2$ is driven by the complementary clock. When $M_1$ turns off, $M_2$ turns on, so the channel charge that $M_1$ injects onto the capacitor is absorbed by the channel of $M_2$. Assuming that half of the channel charge of $M_1$ is injected onto the capacitor, the charge is completely absorbed when the width of $M_2$ is half that of $M_1$. The effect of clock feedthrough can be suppressed with the same method.
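The half-size condition can be checked with a short first-order derivation. It assumes that exactly half of the channel charge of $M_1$ reaches the sampling capacitor, that both transistors have the same length and threshold voltage, and that the source and drain of $M_2$ are both tied to the sampling node; the symbols $\Delta q_1$, $\Delta q_2$, $W_{1,2}$ and $L_{1,2}$ are introduced here for illustration. The charge injected by $M_1$ and the charge absorbed by $M_2$ are:

$$\Delta q_1 = \frac{1}{2} W_1 L_1 C_{ox} (V_{DD} - V_{in} - V_{TH})$$

$$\Delta q_2 = W_2 L_2 C_{ox} (V_{DD} - V_{in} - V_{TH})$$

so that $\Delta q_1 = \Delta q_2$ for $L_2 = L_1$ and $W_2 = W_1/2$. For the clock feedthrough, $M_1$ couples to the node through one overlap capacitance $W_1 C_{ov}$, while $M_2$ couples through two of them with the opposite clock edge:

$$V_{error} \approx V_{CLK} \frac{(2W_2 - W_1) C_{ov}}{W_1 C_{ov} + 2 W_2 C_{ov} + C_H}$$

which also vanishes for $W_2 = W_1/2$. In practice the equal split of the channel charge is only approximate, so the cancellation is partial.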

# 3.3 A/D Converter Architectures

The ADC architecture determines how well it can meet the required targets for our application. The A/D converter must be fast in order to realize a high frame readout speed sensor. In this section, several high-speed ADC architectures are reviewed.

## 3.3.1 Flash ADC

The best-known architecture for a high-speed analog-to-digital converter is the flash converter, which is the simplest and the fastest converter [13]. In this structure an array of comparators compares the input signal voltage with a series of increasing reference voltages. The comparator outputs constitute a thermometer code, which is converted to a binary-weighted output code. The flash architecture exhibits good performance and can be easily implemented by replicating a simple comparator and adding a decoder.

Figure 3.12 shows a block diagram of an N-bit flash ADC. The converter is composed of $2^N - 1$ comparators, a resistor ladder providing $2^N - 1$ references and a decoder. The ladder subdivides the main reference into $2^N - 1$ equally spaced voltages and the comparators convert the sampled input signal into a thermometer code, which the decoder converts into a binary output. The performance of the flash ADC is determined by that of its comparators, which usually use a clocked architecture comprising a preamplifier and a latch. Because all comparators operate in parallel, the flash ADC can achieve high speed (several GS/s) [14, 15, 16, 17].

The flash ADC suffers from a number of drawbacks such as comparator offset, high power consumption and large area [18, 19]. Since $2^N - 1$ comparators are required, their number grows exponentially with the number of bits, and a high-resolution flash ADC requires a large silicon area and a high power consumption. Furthermore, the large number of comparators brings further problems such as deviation of the references generated by the resistor ladder, comparator offset, a large input capacitance and kickback noise. Hence, the flash architecture is not suitable for a column-level A/D converter.
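The flash conversion described in this section can be illustrated with an ideal behavioral model. This is only a sketch: the function name, the choice of reference taps at $k V_{ref}/2^N$ and the ones-counting decoder are assumptions, and no comparator offset or noise is modeled.

```python
def flash_adc(v_in, v_ref, n_bits):
    """Ideal N-bit flash conversion: returns (thermometer code, binary code)."""
    n_comparators = 2 ** n_bits - 1
    # Reference taps of the resistor ladder (assumed at k * v_ref / 2^N).
    thresholds = [v_ref * k / 2 ** n_bits for k in range(1, n_comparators + 1)]
    # All comparators decide at the same time: 1 if the input is above the tap.
    thermometer = [1 if v_in >= t else 0 for t in thresholds]
    # Simplest possible decoder: count the ones of the thermometer code.
    code = sum(thermometer)
    return thermometer, code


thermometer, code = flash_adc(v_in=0.6, v_ref=1.0, n_bits=3)
print(thermometer, format(code, "03b"))

# The comparator count doubles with every additional bit.
for n in (4, 6, 8, 10):
    print(f"{n}-bit flash ADC: {2 ** n - 1} comparators")
```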



Figure 3.12: Block diagram of an N-bit flash ADC.

## 3.3.2 Two-Step ADC

In order to avoid some of the problems encountered with the flash ADC, the two-step ADC, also called half-flash ADC, was developed [20, 21]. The two-step architecture is one of the residue-type ADC structures. As shown in figure 3.13, it is composed of a coarse ADC, a digital-to-analog converter, a subtractor, an inter-stage gain block, a fine ADC and a bit combiner.

The operation principle is as follows. First, the sampled input signal is quantized by a coarse ADC, which provides the B1 most significant bits (MSB). After the coarse quantization, the coarse output, which still contains a large quantization error, is converted back into an analog value by a D/A converter. This analog value is subtracted from the input signal and the resulting residue is quantized by a fine ADC, which provides the B2 least significant bits (LSB). The fine converter must have a full-scale range of 1 LSB of the coarse quantizer ($FS_{fine} = FS_{coarse}/2^{B1}$) and a precision of the order of half an LSB of the overall ADC. A high-precision fine converter is therefore required. To relax the accuracy requirement on the fine ADC, an amplifier with a gain of $2^{B1}$ is used before the fine quantization. The fine ADC then works with the same full-scale range as the coarse ADC, and the two can use identical stages. The outputs of the coarse and fine ADCs are combined by an error correction logic, giving an overall resolution of (B1 + B2) bits.

Figure 3.13: Two-step ADC architecture.

This architecture uses fewer comparators than the full-flash ADC, so a higher resolution can be obtained without a large area and a high power consumption [22, 23]. However, the D/A converter needs to have the full accuracy of the overall ADC. Furthermore, a sample-and-hold amplifier is employed to store the residue of the input signal for the fine quantization, so that the coarse and fine ADCs can operate concurrently during one clock cycle. In this way the conversion time can be significantly decreased.
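The coarse/residue/fine sequence can be sketched with an ideal behavioral model. The function names are hypothetical, all blocks are assumed ideal (no offset, gain error or DAC error) and the bit combiner is a plain concatenation without error correction.

```python
def quantize(v, full_scale, bits):
    """Ideal quantizer returning an integer code in [0, 2^bits - 1]."""
    code = int(v / full_scale * 2 ** bits)
    return max(0, min(2 ** bits - 1, code))


def two_step_adc(v_in, v_ref, b1, b2):
    """Ideal two-step conversion with B1 coarse and B2 fine bits."""
    coarse = quantize(v_in, v_ref, b1)   # B1 most significant bits
    v_dac = coarse * v_ref / 2 ** b1     # D/A conversion of the coarse code
    residue = v_in - v_dac               # subtractor output
    amplified = residue * 2 ** b1        # inter-stage gain of 2^B1
    fine = quantize(amplified, v_ref, b2)  # B2 least significant bits
    return (coarse << b2) | fine         # bit combiner


code = two_step_adc(v_in=0.6, v_ref=1.0, b1=2, b2=2)
print(format(code, "04b"))

# Comparator count of a 2+2 bit two-step ADC versus a 4-bit flash ADC.
print((2 ** 2 - 1) + (2 ** 2 - 1), "versus", 2 ** 4 - 1)
```

For inputs away from the decision levels, the result matches that of a direct 4-bit quantization while using 6 comparators instead of 15.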

## 3.3.3 Subranging ADC

The subranging converter architecture is based on a two-step architecture [24]. However, in the subranging architecture no subtraction stage between the coarse and fine converter is used. Figure 3.14 shows the typical architecture of the subranging ADC. It is composed of a sample-and-hold circuit, a coarse converter, a fine converter, a resistor ladder, a multiplexer and a bit combiner.



Figure 3.14: Subranging ADC architecture.

Compared with the flash ADC, the subranging converter has less area and lower power dissipation [25, 26]. The number of comparators is $2^{N/M}$, where N is the overall ADC resolution and M is the number of stages. Furthermore the resistor ladder can generate $2^N - 1$ references. The operation principle is as follows. First, the sampled input signal is quantized by a coarse ADC that determines the B1 most significant bits. Once the coarse information is obtained, the fine resistor ladder taps are addressed by the result of the coarse ADC. Then the fine conversion takes place and delivers the B2 least significant bits. The overall (B1 + B2)-bit output is produced by a combiner.

The conversion in the subranging architecture needs a clock multiplexer instead of the single clock used in a flash ADC. Therefore the conversion speed decreases as the number of subranging stages increases [27]. The comparators of the coarse converter require less gain than those of the fine converter, which must have the accuracy of the overall ADC resolution.



Figure 3.15: Pipeline ADC architecture.

## 3.3.4 Pipeline ADC

The pipeline architecture is another residue-type ADC [28, 29]. The principle is based on cascading several low-resolution stages to obtain a high overall resolution. For example, a 10-bit ADC can be built from a series of ten 1-bit ADC stages. Each stage performs a coarse A/D conversion and computes the quantization error. In order to achieve a high conversion rate, all stages operate concurrently.

Figure 3.15 shows the architecture of the pipeline ADC. It consists of a cascade of identical stages separated by sample-and-hold blocks, which are part of the sub-converter stages. Following the S/H circuit, a sub-converter comprising a sub-ADC, a sub-DAC, a subtractor and an inter-stage gain amplifier produces a $B_i$-bit digital output. The operation principle is as follows. First the sampled input signal is quantized by the sub-ADC and then reconstructed by the sub-DAC, which must have the accuracy of the overall ADC resolution. This reconstructed analog signal is subtracted from the sampled input signal of the stage. After the subtraction, the residue is amplified by the gain stage and then applied to the following sub-converter stage.

Figure 3.16: Folding ADC architecture.

Because the bits from each stage are determined at different points in time, with each stage introducing at least 1/2 clock cycle of latency, all the bits corresponding to the same sampled input signal are time-aligned with shift registers. Furthermore a redundancy scheme often called "digital error correction" is used to deal with the non-idealities of the sub-ADCs, sub-DACs and gain stages [30, 31]. Note that each stage can produce its bits while the previous stage starts processing the next sample. Therefore the overall throughput does not depend on the number of stages, but is limited by the speed of one stage.
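The residue chain of the 1-bit-per-stage example mentioned above can be sketched as follows. The model is ideal and purely sequential: the function name is hypothetical, and neither the concurrent operation of the stages, the shift-register alignment nor the digital error correction is modeled.

```python
def pipeline_adc(v_in, v_ref, n_stages):
    """Ideal pipeline conversion with 1-bit stages: returns (bits, code)."""
    bits = []
    v = v_in
    for _ in range(n_stages):
        bit = 1 if v >= v_ref / 2 else 0   # 1-bit sub-ADC
        bits.append(bit)
        v_dac = bit * v_ref / 2            # 1-bit sub-DAC
        v = 2 * (v - v_dac)                # residue amplified by 2 for the next stage
    code = 0
    for bit in bits:                       # MSB comes from the first stage
        code = (code << 1) | bit
    return bits, code


bits, code = pipeline_adc(v_in=0.59375, v_ref=1.0, n_stages=4)
print(bits, code)
```

In hardware each loop iteration corresponds to a separate stage working on a different sample, which is why the latency grows with the number of stages while the throughput does not.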

## 3.3.5 Folding ADC

The folding ADC is based on the flash architecture combined with a two-step approach [32]. However, it has fewer comparators than the flash ADC and thus exhibits less area and lower power consumption [33, 34]. The number of comparators is determined by the folding factor, which represents the number of times the signal is folded. Figure 3.16 shows the architecture of the folding ADC. It is composed of a coarse flash ADC, a fine flash ADC, a folding circuit and a bit combiner.

No sample-and-hold circuit is required in the folding ADC. The architecture uses analog preprocessing to transform the input signal into a repetitive output signal that is applied to the fine converter. The B1 most significant bits are determined by the coarse converter, which also determines the number of times the signal is folded. The B2 least significant bits are determined by the fine quantizer, which converts the preprocessed "folded" signal into the fine code. The two ADCs operate in parallel, so a high conversion rate can be achieved. The overall (B1 + B2)-bit output is produced by a combiner. In this way, the number of comparators can be significantly reduced.

The folded signal is similar to the residue signal in a two-step ADC, but it is not generated from the coarse quantization. However, in a CMOS circuit the folder transfer curve is rounded and only the zero-crossings are accurate. In fact, most folding ADCs do not use the folded waveforms, but only their zero-crossings. The number of folding stages can be further reduced by interpolation.
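The folding principle can be illustrated with an idealized software model in which the folder has a perfect triangular transfer curve, unlike the rounded curve of a real CMOS folder. The function name, the triangular fold and the reversal of the fine code on odd folds are assumptions of this sketch.

```python
def folding_adc(v_in, v_ref, b1, b2):
    """Ideal folding conversion with a folding factor of 2^B1."""
    n_folds = 2 ** b1
    fold_width = v_ref / n_folds

    # Analog preprocessing: ideal triangular fold, independent of the coarse ADC.
    folded = v_in % (2 * fold_width)
    if folded > fold_width:
        folded = 2 * fold_width - folded

    # Coarse ADC: identifies in which fold the input lies (B1 MSBs).
    coarse = min(int(v_in / fold_width), n_folds - 1)

    # Fine ADC: quantizes the folded signal (B2 LSBs).
    fine = min(int(folded / fold_width * 2 ** b2), 2 ** b2 - 1)
    if coarse % 2 == 1:
        # On odd folds the triangle runs downwards, so the fine code is reversed.
        fine = 2 ** b2 - 1 - fine

    return (coarse << b2) | fine


for v in (0.09375, 0.34375, 0.59375, 0.84375):
    print(v, format(folding_adc(v, 1.0, 2, 2), "04b"))
```

The fine quantizer only ever sees a signal spanning one fold, which is why it needs $2^{B2} - 1$ comparators rather than $2^{B1+B2} - 1$.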

## 3.3.6 Successive Approximation Register ADC

The architecture of a successive approximation register (SAR) ADC is shown in the top part of figure 3.17. The basic converter consists of a sample-and-hold amplifier, a comparator, a digital-to-analog converter (DAC) and a SAR logic.

The SAR ADC employs a successive approximation algorithm [35]. The operation principle is as follows. At the beginning of the conversion the most significant bit (MSB) is set to 1 by the SAR logic. Then the sampled input signal is compared to the output signal of the D/A converter, which at this step is the midscale of the reference voltage $(V_{ref})$. If the input signal is larger, the MSB remains at 1. Otherwise, the MSB of the register is cleared to 0. The SAR logic then switches on the next bit and another comparison is performed. The procedure continues all the way down to the least significant bit (LSB). Once this is done, the conversion is complete and the digital output is available in the SAR logic. The bottom part of figure 3.17 shows an example of a 4-bit conversion procedure, for which the output value is 1001. A complete conversion in the SAR architecture requires N switching and comparison operations to convert the input signal into an N-bit digital output.
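The successive approximation procedure can be written as a short behavioral model. This is a minimal sketch with an ideal DAC and comparator; the function name is hypothetical, and the input of $0.6\,V_{ref}$ is chosen here only because it leads to the same 1001 code as the example of figure 3.17.

```python
def sar_adc(v_in, v_ref, n_bits):
    """Ideal N-bit successive approximation: one comparison per bit, MSB first."""
    code = 0
    for bit_index in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit_index)        # SAR logic sets the bit under test
        v_dac = v_ref * trial / 2 ** n_bits    # ideal DAC output for the trial code
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit at 1
        # otherwise the bit stays cleared to 0
    return code


code = sar_adc(v_in=0.6, v_ref=1.0, n_bits=4)
print(format(code, "04b"))
```

The loop body runs exactly N times, matching the N switching and comparison operations of an N-bit conversion.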

The two critical components of a SAR ADC are the comparator and the DAC. The comparator must resolve small differences between $V_{in}$ and $V_{DAC}$ within the specified time. The speed of a SAR ADC is limited by the settling time of the DAC, which must settle to within 1/2 LSB of the overall ADC resolution. The linearity and accuracy of this architecture also depend on the performance of the DAC. The SAR ADC is frequently the architecture of choice for medium-to-high resolution at moderate sampling rates. The primary advantages of SAR ADCs are low power consumption, high resolution and accuracy [36, 37]. It is therefore suitable for a power-efficient ADC design, which is discussed in more detail in section 3.4.

Figure 3.17: Successive approximation ADC architecture (top) and its operation principle (bottom).

## 3.3.7 Sigma-Delta ADC

All the ADC architectures described in the previous sections are often called Nyquist-rate ADCs. This section presents the sigma-delta converter, also called oversampling ADC because its sampling rate is much higher than the Nyquist rate [38, 39]. A block diagram of a sigma-delta ADC is shown in figure 3.18. The converter consists of a S/H, a sigma-delta modulator, a low-pass digital filter and a down sampler. The sigma-delta modulator includes an integrator, an internal A/D converter and a D/A converter used in the feedback path.

Figure 3.18: Sigma-delta ADC architecture.

The sigma-delta converter's primary internal cells are the $\Sigma\Delta$ modulator and the decimation filter. The $\Sigma\Delta$ modulator samples the input signal at a very high rate into a 1-bit stream. It uses a technique called noise shaping that pushes low-frequency quantization noise up to higher frequencies, outside the band of interest. The out-of-band quantization noise can then be removed by a digital low-pass filter following the modulator. After the low-pass filtering, the digital signal can be downsampled to the Nyquist rate without affecting the signal-to-noise ratio. The combined operation of low-pass filtering and downsampling is known as decimation.

The resolution of the sigma-delta ADC can be improved by increasing the order of the modulator [40]. Higher-order modulators shape the quantization noise to even higher frequencies than lower-order modulators. However, second- and higher-order modulators have disadvantages, including increased complexity, multiple loops and greater design difficulty.
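The modulator and decimation steps can be illustrated with a first-order discrete-time model. This is a sketch under simplifying assumptions: the input is normalized to the range (-1, 1), the internal A/D and D/A converters are 1-bit and ideal, and the decimation filter is a plain block average, which is much cruder than a practical decimation filter. All function names are hypothetical.

```python
def sigma_delta_modulator(samples):
    """First-order sigma-delta modulator producing a 1-bit stream."""
    integrator = 0.0
    feedback = 0.0                            # output of the 1-bit feedback DAC
    bits = []
    for x in samples:
        integrator += x - feedback            # integrate the difference
        bit = 1 if integrator >= 0.0 else 0   # 1-bit internal ADC
        feedback = 1.0 if bit else -1.0       # 1-bit DAC levels of +1 and -1
        bits.append(bit)
    return bits


def decimate(bits, ratio):
    """Crude decimation: average each block of `ratio` bits and downsample."""
    outputs = []
    for start in range(0, len(bits) - ratio + 1, ratio):
        block = bits[start:start + ratio]
        outputs.append(2.0 * sum(block) / ratio - 1.0)  # map back to (-1, 1)
    return outputs


# A constant input of 0.3, oversampled by a factor of 1024 (assumed values).
bitstream = sigma_delta_modulator([0.3] * 4096)
print(decimate(bitstream, 1024))
```

Each decimated value stays close to the applied input, and the residual error shrinks as the averaging length grows.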



Figure 3.19: ADC architecture power efficiency comparison.

# 3.4 Column-Level ADC Suitable to Vertex Detector

With more than 30 different MIMOSA<sup>1</sup> prototypes designed and fabricated, the column-parallel readout architecture has become increasingly popular as a way to increase the readout frequency, allowing up to 10 k frames/s to be read. The sensors foreseen for the outer layers, which are the largest ones and account for about 90% of the total VTX surface, have less stringent constraints in terms of spatial resolution and read-out speed. A single-point resolution of 3-4 $\mu$m combined with an integration time shorter than 100 $\mu$s is expected to constitute a valuable trade-off. In this case, the design effort focuses on minimizing the power consumption. A larger pixel pitch of 35 $\mu$m combined with a 4-bit ADC is proposed, reducing the power consumption while keeping the necessary spatial resolution. A power-efficient column-level ADC architecture is therefore necessary for the outer layers.

<sup>1</sup> Standing for Minimum Ionizing particle MOS Active pixel sensor.

In order to achieve an integration time of 100 $\mu$s in a full-size sensor (about $2 \times 2$ cm<sup>2</sup>), the ADC accommodating the pixel read-out in parallel is required to work at a frequency of 6.25 MHz (160 ns/row). Figure 3.19 shows the power efficiency of the different ADC architectures [41]. The operation rate of the ADC for the sensor is located in the blue zone. It follows that the successive approximation register (SAR) ADC is the best choice for our application. The SAR ADC employs only one comparator and thus has a low power consumption at moderate speed. As described in Chapter 4, the power consumption can be brought down to an ultra-low value by using an intelligent logic, and techniques are developed to achieve a significantly smaller area while maintaining the conversion of signals at the expected frequency.
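The 6.25 MHz figure follows from simple rolling-shutter arithmetic, sketched below. The function and key names are hypothetical; the calculation assumes that one row is converted at a time by every column ADC and that the 35 $\mu$m pitch of the outer-layer proposal applies along the column.

```python
def column_adc_requirements(integration_time_s, row_time_s, pixel_pitch_m):
    """Conversion rate and column length implied by a rolling-shutter readout."""
    rows_per_frame = integration_time_s / row_time_s
    return {
        "conversion_rate_hz": 1.0 / row_time_s,   # one conversion per row
        "rows_per_frame": rows_per_frame,
        "column_length_m": rows_per_frame * pixel_pitch_m,
    }


req = column_adc_requirements(
    integration_time_s=100e-6,  # 100 us integration time
    row_time_s=160e-9,          # 160 ns per row
    pixel_pitch_m=35e-6,        # 35 um pixel pitch
)
print(f"{req['conversion_rate_hz'] / 1e6:.2f} MHz")
print(f"{req['rows_per_frame']:.0f} rows")
print(f"{req['column_length_m'] * 100:.2f} cm")
```

The resulting column length of roughly 2.2 cm is consistent with the sensor size of about 2 cm quoted above.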

# 3.5 Summary

The performance specifications of the ADC, including DC and dynamic specifications, have been reviewed in this chapter. Some error reduction techniques have been described. A brief introduction to the different ADC architectures has been presented, including the flash ADC, two-step ADC, subranging ADC, pipeline ADC, folding ADC, successive approximation register ADC and sigma-delta ADC. The power efficiency of these architectures has been compared and the SAR architecture has been chosen for the column-level ADC in the CMOS sensor prototype. A detailed description of the design of the sensor prototype is presented in the next chapter.

# **Bibliography**

- [1] R. J. Baker, *CMOS Mixed-Signal Circuit Design*, 2nd ed. Wiley-IEEE Press, 2009.
- [2] R. van de Plassche, *CMOS Integrated Analog-to-Digital and Digital-to-Analog Converters*, 2nd ed. Boston: Kluwer Academic Publishers, 2003.
- [3] B. Razavi, *Design of Analog CMOS Integrated Circuits*. New York: McGraw-Hill, 2001.
- [4] Y. Tsividis, *Operation and Modeling of the MOS Transistor*, 2nd ed. New York: Oxford University Press, 1999.
- [5] B. Razavi, *Principles of Data Conversion System Design*. Piscataway, NJ: IEEE Press, 1995.
- [6] M. Gustavsson, J. J. Wikner, and N. N. Tan, *CMOS Data Converters for Communications*. Boston: Kluwer Academic Publishers, 2003.
- [7] R. Gregorian, *Introduction to CMOS OP-AMPS and Comparators*. New York: John Wiley & Sons, 1999.
- [8] B. Razavi and B. A. Wooley, "Design techniques for high-speed, high-resolution comparators," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1916–1926, Dec. 1992.
- [9] B. J. Sheu and C. Hu, "Switch-induced error voltage on a switched capacitor," *IEEE J. Solid-State Circuits*, vol. 19, no. 4, pp. 519–525, Aug. 1984.
- [10] G. Wegmann, E. A. Vittoz, and F. Rahali, "Charge injection in analog MOS switches," *IEEE J. Solid-State Circuits*, vol. 22, no. 6, pp. 1091–1097, Dec. 1987.
- [11] B. J. Sheu, J. Shieh, and M. Patil, "Modeling charge injection in MOS analog switches," *IEEE Trans. Circuits and Systems*, vol. 34, no. 2, pp. 214–216, Feb. 1987.
- [12] W. B. Wilson, H. Z. Massoud, E. J. Swanson, R. T. George, and R. B. Fair, "Measurement and modeling of charge feedthrough in n-channel MOS analog switches," *IEEE J. Solid-State Circuits*, vol. 20, no. 6, pp. 1206–1213, Dec. 1985.
- [13] A. Ismail and M. Elmasry, "A 6-bit 1.6-GS/s low-power wideband flash ADC converter in 0.13-$\mu$m CMOS technology," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 1982–1990, Sep. 2008.
- [14] C. Paulus, H. M. Bluthgen, M. Low, E. Sicheneder, N. Bruls, A. Courtois, M. Tiebout, and R. Thewes, "A 4GS/s 6b flash ADC in 0.13μm CMOS," *Symposium on VLSI Circuits*, pp. 420–423, Jun. 2004.
- [15] M. Kijima, K. Ito, K. Kamei, and S. Tsukamoto, "A 6b 3GS/s flash ADC with background calibration," *IEEE Custom Integrated Circuit Conference, CICC*, pp. 283–286, Sep. 2009.
- [16] K. Lad and M. Bhat, "A 1-V 1-GS/s 6-bit low-power flash ADC in 90-nm CMOS with 15.75 mW power consumption," *International Conference on Computer Communication and Informatics (ICCCI)*, pp. 1–4, Jan. 2013.
- [17] J. Kim, B. Sung, W. Kim, and S. Ryu, "A 6-b 4.1-GS/s flash ADC with time-domain latch interpolation in 90-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 1429–1441, Jun. 2013.
- [18] K. Shushihara and A. Matsuzawa, "A 7b 450MSample/s 50mW CMOS ADC in 0.3mm<sup>2</sup>," *IEEE International Solid-State Circuit Conference, ISSCC*, pp. 170–171, 2002.
- [19] K. Deguchi, N. Suwa, M. Ito, T. Kumamoto, and T. Miki, "A 6-bit 3.5-GS/s 0.9-V 98-mW flash ADC in 90-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 10, pp. 2303–2310, Oct. 2008.
- [20] Y. Chung and J. Wu, "A CMOS 6-mW 10-bit 100-MS/s two-step ADC," *IEEE J. Solid-State Circuits*, vol. 45, no. 11, pp. 2217–2226, Nov. 2010.
- [21] Z. Cao and S. Yan, "A 52mW 10b 210MS/s two-step ADC for digital-IF receivers in 0.13$\mu$m CMOS," *IEEE Custom Integrated Circuit Conference, CICC*, pp. 309–312, Sep. 2008.
- [22] H. van der Ploeg, G. Hoogzaad, H. A. H. Termeer, M. Vertregt, and R. L. J. Roovers, "A 2.5-V 12-b 54-Msample/s 0.25-$\mu$m CMOS ADC in 1-mm<sup>2</sup> with mixed-signal chopping and calibration," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1859–1867, Dec. 2001.
- [23] H. Pan, M. Segami, M. Choi, L. Cao, and A. A. Abidi, "A 3.3-V 12-b 50-MS/s A/D converter in 0.6-μm CMOS with over 80-dB SFDR," *IEEE J. Solid-State Circuits*, vol. 35, no. 12, pp. 1769–1780, Dec. 2000.
- [24] J. Mulder, C. M. Ward, C. Lin, D. Kruse, J. R. Westra, M. Lugthart, E. Arslan, R. J. van de Plassche, K. Bult, and F. M. L. van der Goes, "A 21-mW 8-b 125-MSample/s ADC in 0.09-mm<sup>2</sup> 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2116–2125, Dec. 2004.
- [25] S. K. Gowdhaman and M. S. Baghini, "6-bit low-power subranging ADC with increased throughput," *IEEE Midwest Symposium on Circuit and Systems, MWSCAS*, pp. 497–500, Aug. 2010.
- [26] Y. Asada, K. Yoshihara, Y. Urano, M. Miyahara, and A. Matsuzawa, "6-bit, 7mW, 250fj, 700MS/s subranging ADC," *IEEE Asian Solid-State Circuit Conference, ASSCC*, pp. 141–144, Nov. 2009.
- [27] Y. Shimizu, S. Murayama, K. Kudoh, H. Yatsuda, and A. Ogawa, "A 30mW 12b 40MS/s subranging ADC with a high-gain offset-canceling positive-feedback amplifier in 90nm digital CMOS," *IEEE International Solid-State Circuit Conference, ISSCC*, pp. 802–811, Feb. 2006.
- [28] Y. Chiu, P. R. Gray, and B. Nikolic, "A 14-b 12-MS/s CMOS pipeline ADC with over 100-dB SFDR," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2139–2151, Dec. 2004.
- [29] D. Kurose, T. Ito, T. Ueno, T. Yamaji, and T. Itakura, "55-mW 200-MSPS 10-bit pipeline ADCs for wireless receivers," *IEEE J. Solid-State Circuits*, vol. 41, no. 7, pp. 1589–1595, Jul. 2006.
- [30] K.-H. Lee, Y.-J. Kim, K.-S. Kim, and S.-H. Lee, "14 bit 50 MS/s 0.18 $\mu$m CMOS pipeline ADC based on digital error calibration," *Electronics Letters*, vol. 45, no. 21, pp. 1067–1069, Oct. 2009.
- [31] C. Tseng, H. Chen, W. Shen, W. Cheng, and H. Chen, "A 10-b 320-MS/s stage-gain-error self-calibration pipeline ADC," *IEEE J. Solid-State Circuits*, vol. 47, no. 6, pp. 1334–1343, Jun. 2012.
- [32] Z. Wang, H. Pan, C. Chang, H. Yu, and M. Chang, "A 600-MSPS 8-bit folding ADC in 0.18μm CMOS," *Symposium on VLSI Circuits*, pp. 424–427, Jun. 2004.
- [33] W. R. W. Ahmad, S. L. M. Hassan, I. S. A. Halim, N. E. Abdullah, and I. Mazlan, "High speed with low power folding and interpolating ADC using two types of comparator in CMOS 0.18μm technology," *IEEE Symposium on Humanities, Science and Engineering Research, SHUSER*, pp. 715–720, Jun. 2012.
- [34] S. Oza and N. M. Devashrayee, "Low power folding and interpolating ADC using 0.35-μm technology," *Nirma University International Conference on Engineering, NUiCONE*, pp. 1–7, Dec. 2011.
- [35] Y. Chen, S. Tsukamoto, and T. Kuroda, "A 9b 100MS/s 1.46mW SAR ADC in 65nm CMOS," *IEEE Asian Solid-State Circuit Conference, ASSCC*, pp. 145–148, Nov. 2009.
- [36] P. Kamalinejad, S. Mirabbasi, and V. C. M. Leung, "An ultra-low power SAR ADC with an area-efficient DAC architecture," *IEEE International Symposium on Circuits and Systems, ISCAS*, pp. 13–16, May 2011.
- [37] W. Pang, C. S. Wang, Y. Chang, N. Chou, and C. K. Wang, "A 10-bit 500-KS/s low power SAR ADC with splitting comparator for bio-medical applications," *IEEE Asian Solid-State Circuit Conference, ASSCC*, pp. 149–152, Nov. 2009.
- [38] G. Mitteregger, C. Ebner, S. Mechnig, T. Blon, C. Holuigue, and E. Romani, "A 20-mW 640-MHz CMOS continuous-time ΣΔ ADC with 20-MHz signal bandwidth, 80-dB dynamic range and 12-bit ENOB," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2641–2649, Dec. 2006.
- [39] M. Z. Straayer and M. H. Perrott, "A 12-bit 10-MHz bandwidth continuous-time $\Sigma\Delta$ ADC with a 5-bit 950-MS/s VCO-based quantizer," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 805–814, Apr. 2008.
- [40] J. Garcia, S. Rodriguez, and A. Rusu, "A low-power CT incremental 3rd order $\Sigma\Delta$ ADC for biosensor applications," *IEEE Trans. Circuits and Systems I: Regular Papers*, vol. 60, no. 1, pp. 25–36, Jan. 2013.
- [41] B. Murmann, "ADC performance survey 1997-2012." Aug. 2012. [Online]. Available: http://www.stanford.edu/~murmann/adcsurvey.html

# Chapter 4

# Design of a Sensor Prototype

The performance and characteristics of different ADC architectures have been described in the previous chapter, and the architecture suitable for the CMOS pixel sensor has been selected. In this chapter, a sizable sensor prototype consisting of a pixel array integrated with 4-bit column-level ADCs is presented. The first section describes the system-level design of the prototype chip. The following sections present the circuit implementations, including the pixel and the ADC, and describe the design requirements and considerations in more detail. The simulation results of the prototype chip are then presented, and finally a short conclusion on this design is provided.

## 4.1 Global Architecture

The sensor prototype is composed of a matrix of $48 \times 64$ pixels with 4-bit column-parallel analog-to-digital converters (ADCs). Figure 4.1 shows the overall architecture of the proposed CPS. The sensor is composed of a pixel array, column sample-and-hold (S/H) circuits, 4-bit column-level ADCs, reference buffers, a latch array, memory buffers, 8 to 1 multiplexers, timing control circuits, a finite state machine sequencer, a JTAG controller, a row sequencer, bias circuits and analog drivers. Each pixel, with a 35 $\mu$m pitch, incorporates in-pixel amplification to mitigate the noise sources affecting the signal, and a correlated double sampling operation to subtract the average pixel noise. The pixel array is read out in a rolling shutter mode, steered by a row sequencer located on the left side. In order to realize a fast frame rate, the 48 pixels of each row are read out simultaneously in 160 ns (one horizontal scanning time).



Figure 4.1: Global architecture of the CMOS pixel sensor.

As described later, the ADCs accommodating the pixel readout in parallel complete the conversion by performing a multi-bit/step approximation. The ADC design resembles the successive approximation register (SAR) architecture, featuring low power consumption at moderate speed (several MS/s). Previous prototypes showed that the noise floor of the pixel is about 1 mV. Since the particle position reconstruction improves when using the charge center of gravity, the small signals (a few mV) close to the noise level are particularly important. In order to improve the resolution of the particle position reconstruction, the least significant bit (LSB) is set at the level of the noise. Earlier physics studies show that a coarse encoding of the high amplitudes delivered by pixels in a cluster does not degrade the resolution. Therefore a variable charge encoding is employed, ranging from a maximum of 4 bits for signals of small magnitude to only 2 bits for large signals. After the A/D conversion, the digital outputs are loaded into memory buffers and then serially transmitted to the outside through the 8 to 1 multiplexers. The setting parameters of the sensor are remotely programmable through the JTAG circuits. In order to compare the performance of the readout circuits, the chip is integrated with eight analog drivers.

Figure 4.2: SB pixel architecture.

## 4.2 Pixel Circuit

In order to improve the spatial resolution and tracking performances of detectors equipped with CMOS sensors, the performance of the pixel circuit such as the signal-to-noise ratio should be increased. Therefore an in-pixel amplifier is used in the pixel circuit. The objective of the in-pixel amplifier is to maximize the signal-to-noise ratio for a given pixel pitch size and Nwell charge collection diode size, and minimize the power consumption. Also the pixel circuit should exhibit a small pixel-to-pixel performance difference due to the CMOS process variation.

Figure 4.2 shows the schematic of the pixel, which has been widely used in previous sensors (MIMOSA 26, designed for the EUDET beam telescope [1], [2], and ULTIMATE, equipping the STAR-PXL subsystem [3], [4]). The pixel concept combines in-pixel amplification with a correlated double sampling operation. In a twin-well CMOS technology, the difficulty of in-pixel circuit design is that only NMOS transistors can be used, because


Table 4.1: The simulation results of the pixel circuit

| Parameter                     | Value                                          |
|-------------------------------|------------------------------------------------|
| Pixel size                    | $35 \ \mu\mathrm{m} \times 35 \ \mu\mathrm{m}$ |
| Power supply                  | 3.3 V                                          |
| ENC (equivalent noise charge) | 11 e <sup>-</sup>                              |
| Conversion gain               | $80~\mu\mathrm{V/e^-}$                         |
| Current in pixel              | $3 \mu A$                                      |
| Current in SF                 | $50 \mu A$                                     |
| Read-out time                 | 160 ns                                         |

any additional Nwell used to fabricate PMOS transistors would compete with the sensing Nwell/Pepi diode for charge collection.

The collected charges are converted into a signal voltage by an Nwell/Pepi diode. In order to maximize the signal-to-noise ratio, a common source amplifier with enhanced gain and a low-offset feedback was employed [5]. The amplifier uses a cascode architecture comprising transistors $M_2$, $M_3$, $M_4$ and $M_5$. With a biasing transistor ($M_6$), the gain of the amplifier is significantly improved. The feedback is composed of a transistor ($M_1$) and a capacitor ($C_1$), which form a low-pass filter with a large time constant ($C_1/g_{m1}$) and bias the sensing diode ($D_1$) through a highly resistive diode ($D_2$). Due to the low-pass filter and the diode capacitances, the resulting discharge time is very long, which is acceptable at low occupancy rates. Because both the discharge time and the diode capacitance are very large, low frequency noise is stored at the output of the amplifier. A correlated double sampling (CDS) stage based on a clamping technique ($M_7$ and $C_2$) is therefore used to reduce the low frequency noise. Both capacitors are implemented with NMOS transistors. The pixel is selected by a select transistor ($M_9$), and the pixel output signal after clamping is buffered to the ADC by a source follower ($M_8$). At any given time during the scan, only one row is powered on, which reduces the power consumption of the pixel array to that of a single row.

The designed layout is simulated using Spectre, taking into account the extracted parasitic capacitances. The pixel circuit is powered from a 3.3 V analog supply, and its readout time is 160 ns.



Figure 4.3: ADC block diagram.

The pixel circuit has been simulated at room temperature. The simulation results are summarized in table 4.1.
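As a rough consistency check, the simulated ENC and conversion gain of table 4.1 can be combined into an output-referred noise voltage. The sketch below assumes that both quantities refer to the same output node, which the table does not state explicitly; the function name is introduced here for illustration only.

```python
def output_noise_volts(enc_electrons, conversion_gain_v_per_e):
    """Output-referred noise voltage from ENC and conversion gain.

    Assumption: ENC and conversion gain are referred to the same node.
    """
    return enc_electrons * conversion_gain_v_per_e


ENC_ELECTRONS = 11        # equivalent noise charge, table 4.1
CONVERSION_GAIN = 80e-6   # volts per electron, table 4.1

noise_v = output_noise_volts(ENC_ELECTRONS, CONVERSION_GAIN)
print(f"output-referred noise: {noise_v * 1e3:.2f} mV")
```

The product, about 0.88 mV, is of the same order as the 1 mV noise floor quoted for previous prototypes.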

## 4.3 Column-Level ADC

The fundamental building blocks of this ADC are an S/H circuit, a comparator, digital logic and a DAC. In this section, the design considerations of these building blocks are described.

## 4.3.1 Design Requirements

In this application, the design of the column-parallel ADC is constrained by several factors. The ADC must convert continuously and therefore cannot have any dead time. In order to read out a 2 cm long sensor in about 100 $\mu$s or less [6], the column ADC requires a high sampling rate. Because the cooling system is limited and itself contributes to the material budget, the power consumption of the column ADC must be minimized and should be less than 500 $\mu$W [7]. The dimensions of the dead zones should be minimized so that the pixel array fits in the space available in the vertex detector environment. The readout chain ought to introduce very low noise (less



Figure 4.4: ADC approximation procedure diagram.

than 1 mV) in order to get a reasonable signal to noise ratio. Also, offset compensations between different column ADCs should be considered.

## 4.3.2 Operation Principle

The main components of the ADC are a sample-and-hold (S/H) circuit, a digital-to-analog converter (DAC), a comparator, and a finite state machine (FSM) [8], [9]. A block diagram of this ADC is shown in figure 4.3. A pipelined front-end S/H is employed to sample and amplify the pixel signal. A switched DAC generates the reference voltage based on the digital value computed by the FSM. The comparator includes a buffer, a preamplifier and a dynamic latch, and decides whether the difference between the sampled signal and the DAC output is positive or negative, serially producing the digital output bits. According to the comparison result, the digital logic performs the multi-bit approximation algorithm and drives the switches of the DAC. Additionally, the Clock Manager block and the *Overthreshold* signal are used to generate a dedicated power-saving timing.

The flow of the ADC approximation procedure is shown in figure 4.4. As in a successive approximation ADC, this architecture performs four comparisons and requires four clock cycles to produce a digital output. One of the major differences is that the common-mode voltage of the reference DAC gradually increases from ground ($V_{threshold}$) to $V_{ref}$. Previous prototypes showed that the noise floor of the pixel is about 1 mV. In order to improve the resolution of the particle position reconstruction, the least significant



Figure 4.5: Input/output characteristic of an ideal multiple-bit/step ADC.

bit (LSB) is matched to the level of the noise.

Earlier physics studies have shown that a coarse encoding of the high amplitudes delivered by the pixels in a cluster does not degrade the resolution of the reconstructed impact position. Therefore a variable charge encoding is employed, ranging from a maximum of 4 bits for signals of small magnitude to only 2 bits for large signals. The ADC operates as follows. In the sampling phase, the output of the DAC is switched to ground ($V_{threshold}$). Next, the comparator makes the first comparison. If $V_{in}$ is higher than $V_{threshold}$, the DAC reference steps upward by four least significant bit (LSB) voltages for the next comparison. Otherwise, the conversion stops and produces the digital output value, and the DAC remains connected to ground until the next conversion. In the second phase, the comparator compares again. If the comparison result is positive, the switching sequence repeats the same upward transition. Otherwise, the ADC performs a downward successive approximation until the output is decided, with the DAC reference changing in steps of one LSB voltage. In the third phase, the ADC repeats the same procedure. However, the successive step of the



Figure 4.6: ADC operation waveforms showing (a) operation plan (b) timing control.

reference increases to two LSB voltages for downward successive approximation. At the last phase, the output is produced directly according to the comparison result. Therefore 8 references are required in this ADC. Figure 4.5 shows the input/output characteristic of an ideal multiple-bit/step ADC.

## 4.3.3 Optimal Power Saving

Given that the hit pixel density in the outer layers of the ILD-VTX is of the order of a few per thousand, the ADC is designed to operate in two modes in order to minimize the power consumption. The ADC employs a threshold voltage ($V_{threshold}$) as a trigger. If the pixel signal ($V_{in}$) is higher than $V_{threshold}$, the ADC works in active mode and performs the conversion. Otherwise, the ADC works in inactive mode and sleeps until the next conversion.

The operation plan of this ADC is shown in figure 4.6(a). In order to increase the readout speed, the auto-zeroing and sampling operations are implemented asynchronously, thus acting as a pipelined stage. As a consequence, this ADC architecture requires a separate sample-and-hold (S/H) circuit. The conversion starts after auto-zeroing and converts the signal voltage from the last sampling. Furthermore, as described below,

this approach improves the operation speed and eliminates fixed pattern noise (FPN) of the column ADC.

Figure 4.6(b) shows the timing control of the ADC. At a constant supply voltage, the power consumption of both digital and analog circuits is directly proportional to their operating speed. This suggests that the ADC power dissipation can be scaled by increasing the clock period and reducing the analog bias currents if the ADC does not always run at a fixed (maximum) sampling rate (i.e., speed-on-demand). These approaches have been implemented successfully in [10], [11]. Alternatively, power scaling can be achieved at a constant operating rate by clock-gating the digital circuits and power-gating the analog circuits, switching between sleep and active conversions. For digital circuits, the clock is disabled during an inactive conversion. For analog circuits, the bias current is controlled by a switch similar to [12], so that the circuit works in the discrete-time domain. In our application, the operation speed (i.e., sampling rate) is fixed, and therefore the latter approach has been used. Because the sampling circuit runs throughout the entire operation plan in order to avoid dead time, power scaling can only be applied in the auto-zeroing and bit-cycling phases. During auto-zeroing, the dynamic latch of the comparator and the digital logic are disabled by a clock-gating control. During the conversion, however, events occur at irregular intervals, hence the analog bias current and clock are gated according to the first comparison result. If the first comparison result is zero (i.e., $V_{in} < V_{threshold}$), the bias current and clock are disabled. Specifically, the signal *Sleepmask* clock-gates the dynamic latch and the finite state machine, and during an active conversion the signal *Overthreshold* in figure 4.3 power-gates the analog bias current and clock. This approach significantly reduces the power consumption.

## 4.3.4 Sample-and-hold

### 4.3.4.1 Capacitive Feedback Analysis

The sample-and-hold circuit is based on a switched-capacitor (SC) circuit. Figure 4.7 shows the typical architecture of a switched-capacitor circuit. It is composed of switches, a sampling capacitor ($C_s$), a feedback capacitor ($C_f$) and an operational amplifier. The operation needs non-overlapping clocks in order to sample and amplify the input signal. The SC circuit is often used as an integrator. The input signal is sampled first in the



Figure 4.7: Typical sample-and-hold circuit.

sampling capacitor  $(C_s)$  and in the next clock phase the sampled charge is moved to the feedback capacitor  $(C_f)$ . Then the voltage at the output can be amplified by the gain stage. The transfer function of the SC circuit is given by:

$$\frac{V_{out}}{V_{in}} = -\frac{C_s}{C_f} \tag{4.1}$$

The noise power of the output samples is equal to the power spectral density of  $V_{out}$  (during amplification phase) integrated over all frequencies.



Figure 4.8: Small signal model of the switched-capacitor (SC) circuit.

In order to determine the stability and closed loop gain characteristics, the loop gain of the SC circuit is analyzed [13]. Figure 4.8 shows the small signal model of the SC circuit. The loop transmission can be given by:

$$T(s) = \beta \cdot G_m \cdot (R_0 \parallel \frac{1}{sC_{Ltot}}) \tag{4.2}$$

where  $\beta$  is the feedback factor and  $C_{Ltot}$  is the total capacitor load. The feedback factor is

$$\beta = \frac{C_f}{C_f + C_s + C_p} \tag{4.3}$$

where $C_p$ is the parasitic capacitance at the input node of the operational amplifier. The total load capacitance is

$$C_{Ltot} = C_L + (1 - \beta)C_f \tag{4.4}$$

From equation 4.2, the loop transmission is equal to  $\beta G_m/sC_{Ltot}$  at high frequencies. Therefore the unity gain (crossover) frequency of T(s) is

$$\left|\frac{\beta G_m}{j\omega_c C_{Ltot}}\right| = 1 \Rightarrow \omega_c = \frac{\beta G_m}{C_{Ltot}} \tag{4.5}$$

Note that the resistance load  $R_0$  is irrelevant at high frequencies. The closed loop transfer function can be given by

$$A(s) = -\frac{C_s}{C_f} \cdot \frac{T(s)}{1 + T(s)} \tag{4.6}$$

Therefore the -3dB frequency of the closed loop circuit is

$$\left|\frac{\omega_c}{j\omega_{-3dB} + \omega_c}\right| = \frac{1}{\sqrt{2}} \Rightarrow \omega_{-3dB} = \omega_c \tag{4.7}$$

Note that the closed loop bandwidth is equal to the unity gain frequency of T(s). In a capacitive feedback circuit, the stability and phase margin can be determined from the phase of $T(j\omega)$ at its unity gain frequency.
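To make the relations in equations 4.3 to 4.5 concrete, the following sketch evaluates the feedback factor, the total load capacitance and the loop crossover frequency. All component values are hypothetical and chosen for illustration only; they are not the design values of this work, and the function names are introduced here.

```python
import math


def feedback_factor(c_f, c_s, c_p):
    """Equation 4.3: beta = C_f / (C_f + C_s + C_p)."""
    return c_f / (c_f + c_s + c_p)


def total_load(c_l, c_f, beta):
    """Equation 4.4: C_Ltot = C_L + (1 - beta) * C_f."""
    return c_l + (1.0 - beta) * c_f


def crossover_frequency_hz(g_m, beta, c_ltot):
    """Equation 4.5, converted from rad/s to Hz."""
    return beta * g_m / c_ltot / (2.0 * math.pi)


# Hypothetical values, for illustration only
C_S, C_F, C_P, C_L = 400e-15, 100e-15, 100e-15, 500e-15  # farads
G_M = 1e-3                                                # siemens

beta = feedback_factor(C_F, C_S, C_P)
c_ltot = total_load(C_L, C_F, beta)
f_c = crossover_frequency_hz(G_M, beta, c_ltot)
print(f"beta = {beta:.3f}, C_Ltot = {c_ltot * 1e15:.1f} fF, f_c = {f_c / 1e6:.1f} MHz")
```

Since equation 4.7 gives $\omega_{-3dB} = \omega_c$, the printed crossover frequency is also the closed loop bandwidth under this single-pole model.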

### 4.3.4.2 Operational Amplifier

The operational amplifier used in a switched capacitor circuit is an important building block, which limits the performance such as the accuracy, the speed, the noise and the power consumption. In this section, the design considerations of the operational amplifier are described.

In a switched-capacitor circuit, the finite open loop gain of the amplifier causes a static error. The operational amplifier must also settle quickly after voltage steps at the input in order to reduce the dynamic error. The step response of the SC circuit can be given by [14]

$$V_{out}(t) = -\frac{C_s}{C_f} \cdot V_{step} \cdot \frac{T_0}{1 + T_0} \cdot (1 - e^{-t/\tau}) \tag{4.8}$$

where  $T_0/(1+T_0)$  results in the static error  $\varepsilon_0$  and  $1-e^{-t/\tau}$  results in the dynamic error  $\varepsilon_d$ , respectively. Here,  $T_0$  is equal to  $\beta \cdot G_m R_0$  and  $\tau$  is equal to  $1/\omega_{-3dB}$ .

The required gain of the operational amplifier can be derived from the tolerable static error, which is

$$\varepsilon_0 = \frac{1}{T_0} = \frac{1}{\beta A_0} \tag{4.9}$$

where  $A_0$  is the finite gain of the amplifier. The settling time  $t_s$  can be defined by the tolerable dynamic error, which is

$$\varepsilon_{d,tol} = e^{-t_s/\tau} \Rightarrow t_s = -\frac{1}{\omega_{-3dB}} \cdot \ln(\varepsilon_{d,tol}) \tag{4.10}$$

where $\omega_{-3dB}$ is equal to $\beta G_m/C_{Ltot}$. Here $G_m/C_{Ltot}$ can be considered as the unity gain frequency of the operational amplifier. The switched-capacitor circuit operates with non-overlapping clocks. Thus the number of time constants required within half a period of the sampling clock defines the minimum bandwidth, which is

$$t_s = -\frac{1}{\omega_{-3dB}} \cdot \ln(\varepsilon_{d,tol}) < \frac{1}{2} \frac{1}{f_{CLK}} \tag{4.11}$$

where  $\omega_{-3dB}$  is equal to  $\omega_c$ . Therefore equation 4.11 can be changed to

$$\frac{f_c}{f_{CLK}} > -\frac{\ln(\varepsilon_{d,tol})}{\pi} \tag{4.12}$$

For example, if the tolerable dynamic error is 1%, the ratio $f_c/f_{CLK}$ must exceed about 1.5. If the error is tightened to $10^{-6}$, the required ratio grows only about threefold, to 4.4.
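The two figures quoted above follow directly from equation 4.12. A minimal sketch of the calculation is given here; the function name is introduced for illustration.

```python
import math


def min_bandwidth_ratio(eps_tol):
    """Equation 4.12: lower bound on f_c / f_CLK for a tolerable dynamic error."""
    return -math.log(eps_tol) / math.pi


for eps in (1e-2, 1e-6):
    print(f"eps = {eps:g}: f_c/f_CLK > {min_bandwidth_ratio(eps):.2f}")
```

The bounds evaluate to about 1.47 and 4.40, which round to the values of 1.5 and 4.4 given in the text. The logarithmic dependence explains why four orders of magnitude in accuracy cost only a factor of three in bandwidth.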

The dominant noise in a capacitive feedback amplifier is thermal noise, which is related to the $kT/C_{Ltot}$ noise. In order to reduce the noise, the load capacitance needs to be increased. However, driving a larger load requires a larger amplifier current. Therefore



Figure 4.9: (a) Sample and hold architecture and (b) related timing diagram.

a tradeoff should be taken between noise and power consumption.
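The order of magnitude of this tradeoff can be illustrated with the basic $\sqrt{kT/C}$ expression. The sketch below is a simplification: it ignores the feedback factor and any excess noise factor of the amplifier, and the capacitor values and the 300 K temperature are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def ktc_noise_vrms(c_farad, temp_k=300.0):
    """RMS thermal noise voltage sqrt(kT/C) sampled on a capacitor."""
    return math.sqrt(K_B * temp_k / c_farad)


# Hypothetical load capacitances, for illustration only
for c in (500e-15, 2e-12):
    print(f"C = {c * 1e15:.0f} fF: {ktc_noise_vrms(c) * 1e6:.0f} uV rms")
```

Quadrupling the capacitance only halves the rms noise, while the current needed to keep the same bandwidth grows roughly in proportion to the load.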

### 4.3.4.3 S/H Implementation

This ADC employs a sample-and-hold (S/H) circuit at the front-end to eliminate the conversion dead time. In order to achieve high speed and low noise, a pipelined dual correlated double sampling (CDS) architecture is proposed. Figure 4.9(a) shows the architecture of the S/H circuit, which is composed of a charge redistribution circuit (i.e., the first CDS circuit), an auto-zeroing capacitor ($C_{offset}$) and two analog memories ($C_1$ and $C_2$). The first CDS circuit is a commonly used switched-capacitor circuit, which consists of an operational transconductance amplifier (OTA), an input capacitor ($C_s$), a feedback capacitor ($C_f$) and MOS transistor switches.

The timing diagram is shown in figure 4.9(b). This dual CDS architecture achieves high noise suppression because the FPN cancellation is performed twice: one CDS eliminates the offset of the pixel and the other that of the OTA. The first CDS sequence is as follows. First, the pixel output signal charge is placed on $C_s$ during the sampling (Read) phase. After that, a pixel clamp operation causes the reset level of the pixel output to appear at $V_{pixel\_out}$. This eliminates the offset of the pixel outputs, which causes FPN. During the amplification (Calib) phase, the MOS transistor switch controlled by Calib is reconnected to the pixel output signal, and the charge transfers from $C_s$ to $C_f$. The sampled signal voltage is therefore amplified by the ratio of $C_s$ to $C_f$. The use of column S/H circuits itself causes FPN due to variations among the OTAs. To reduce the offset of the column OTA, a second CDS circuit is employed, which consists of an auto-zeroing capacitor ($C_{offset}$), two analog memories ($C_1$ and $C_2$) and MOS transistor switches. The auto-zeroing capacitor stores the offset of the OTA during the sampling phase and corrects the output value during the amplification phase. The two memory capacitors are used to realize a pipelined stage: while one is sampling the input signal, the other is holding the output voltage to be processed. This pipelined architecture strongly increases the readout speed.

The gain of this S/H circuit is given by

$$G_{S/H} = \frac{V_{out}}{V_{pixel\_out}} \approx \frac{C_s}{C_f} \cdot \frac{C_{offset}}{C_{offset} + C_{1,2}}. \tag{4.13}$$

Here, $C_s/C_f$ is the gain of the switched-capacitor circuit. The value of $C_{1,2}$ is chosen to limit the kT/C noise effect. In order to maximize the gain, $C_{offset}$ should be maximized; however, that would produce a large parasitic capacitance, which in turn requires a larger OTA drive current. Therefore a tradeoff must be made between gain and power.
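The attenuation introduced by the capacitive divider in equation 4.13 can be explored with a short sketch. The capacitor values below are hypothetical and serve only to show how the gain approaches $C_s/C_f$ as $C_{offset}$ grows; they are not the values used in this design.

```python
def sh_gain(c_s, c_f, c_offset, c_mem):
    """Equation 4.13: S/H gain including the C_offset / C_mem divider."""
    return (c_s / c_f) * c_offset / (c_offset + c_mem)


# Hypothetical values: C_s/C_f = 5 and a 100 fF memory capacitor
C_S, C_F, C_MEM = 500e-15, 100e-15, 100e-15

for c_offset in (100e-15, 400e-15, 900e-15):
    gain = sh_gain(C_S, C_F, c_offset, C_MEM)
    print(f"C_offset = {c_offset * 1e15:.0f} fF: G_S/H = {gain:.2f}")
```

With these numbers the gain rises from 2.5 to 4.5 as $C_{offset}$ increases from 100 fF to 900 fF, which illustrates the diminishing return that motivates the tradeoff.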

The performance of the S/H circuit can be affected by non-idealities such as capacitor mismatch, finite operational amplifier (opamp) gain and incomplete settling. To reduce the error due to capacitor mismatch, a symmetric capacitor layout is mandatory. The opamp must have a high gain and a large bandwidth to meet the accuracy and speed requirements. For an n-bit converter, the error due to finite gain and incomplete settling must be less than 1/2 LSB. Therefore the opamp gain and unity-gain bandwidth have to satisfy the following condition

$$\left(\frac{1}{A_0 \cdot \beta} + e^{-\omega_{\mu} \cdot \beta/2f_s}\right) < \frac{1}{2^{n+1}} \tag{4.14}$$

where  $A_0$  is the opamp dc gain,  $\beta = C_f/(C_s + C_f)$  denotes the feedback factor,  $\omega_\mu$  is the



Figure 4.10: OTA schematic.

opamp unity-gain bandwidth and  $f_s$  is the ADC sampling frequency. For simplicity, each of the errors should be less than 1/4 LSB. Thus, the two requirements for the opamp are

$$A_0 > \frac{2^{n+2}}{\beta} \tag{4.15}$$

$$\omega_{\mu} > \frac{2f_s \cdot (n+2) \cdot ln2}{\beta}.\tag{4.16}$$

The speed of an S/H circuit is determined by the settling time of the opamp, which can be divided into the nonlinear slewing time and the quasilinear settling time. For our application, the period of the sample-and-hold circuit is 160 ns and the time allocated for settling (*Calib*) is 50 ns. The opamp unity gain bandwidth is directly related to the capacitive load: the larger the load capacitance, the more power is required to achieve a given bandwidth. From equations 4.15 and 4.16, the required gain and unity gain bandwidth of the OTA for a 4-bit ADC can be calculated.

The simplest way to design a high-gain amplifier is to use a telescopic cascode architecture. The single-stage telescopic cascode architecture can achieve the same gain as a two-stage amplifier with only two current legs, therefore having maximum power efficiency. A drawback of the telescopic amplifier is its low output swing. In order to obtain a high output voltage swing, an auxiliary biasing branch is inserted in the structure. Figure 4.10 shows the schematic of the high-gain OTA.

Figure 4.11: Loop gain and phase margin of the OTA.

The gain of the telescopic amplifier can be given by

$$A_v \cong g_{m1,2}[(g_{m3,4}r_{o3,4}r_{o1,2}) \parallel (g_{m7,8}r_{o7,8}r_{o5,6})] \tag{4.17}$$

where $g_{mi}$ and $r_{oi}$ are the transconductance and the output resistance of transistor $M_i$, respectively. The unity gain frequency of the amplifier is given by

$$\omega_u = \frac{g_{m1,2}}{C_{Ltot}} \tag{4.18}$$

where $g_{m1,2}$ is the transconductance of the input transistors M1 and M2, and $C_{Ltot}$ is the load capacitance. The input referred noise voltage can be given by

$$\overline{V_n^2} = \frac{16}{3}kT \cdot \frac{1}{g_{m1,2}} \left(1 + \frac{g_{m5,6}}{g_{m1,2}}\right) + 2\frac{K_N}{(WL)_{1,2}C_{ox}f} + 2\frac{K_P}{(WL)_{5,6}C_{ox}f} \frac{g_{m5,6}^2}{g_{m1,2}^2} \tag{4.19}$$

| Parameter                          | Value                                           |
|------------------------------------|-------------------------------------------------|
| Power supply                       | 3 V                                             |
| Loop gain                          | 46.5 dB                                         |
| Phase margin                       | 86.5°                                           |
| Closed loop bandwidth              | 19.1 MHz ( $C_{load} = 500 \text{ fF}$ )        |
| Offset with Monte Carlo simulation | Mean value = $0.39 \text{ mV}$                  |
| Offset with Monte Carlo simulation | Standard deviation = $2.7 \text{ mV}$           |
| Current                            | $87 \mu A$                                      |
| Input referred noise               | $68.3 \mu V$                                    |
| Total area of S/H                  | $35 \ \mu\mathrm{m} \times 193 \ \mu\mathrm{m}$ |

Table 4.2: The simulation results of the OTA

where $k$ is the Boltzmann constant, $T$ is the absolute temperature in kelvin, $g_{mi}$ is the transconductance of transistor $M_i$, and $K_N$, $K_P$ are the process-dependent constants of the NMOS and PMOS transistors, respectively. The first term is thermal noise and the last two terms are flicker noise, also called 1/f noise. The flicker noise can be ignored. From equation 4.19, the input referred noise can be reduced by increasing the transconductance of the input transistors or by decreasing the transconductance of the load transistors.

The gain of the sample-and-hold circuit is designed to be 4. Considering the parasitic capacitance at the input node, the feedback factor is set to 0.13. According to equations 4.15 and 4.16, the required loop gain and closed loop bandwidth are calculated as 36 dB and 13.2 MHz. In practice, the actual opamp gain and bandwidth should be larger than these calculated values to allow for process variation. Figure 4.11 shows the simulated loop gain and phase margin curves. The simulation results of the capacitive feedback OTA are summarized in table 4.2.
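The requirements quoted above can be checked numerically. The sketch below evaluates equations 4.15 and 4.16 for a 4-bit converter, expressed as the loop gain $A_0\beta$ and the closed loop bandwidth $\omega_\mu\beta/2\pi$. It assumes that the half period $1/(2f_s)$ in equation 4.16 is replaced by the 50 ns settling window; this substitution is an assumption that is not stated explicitly in the text, and the function names are illustrative.

```python
import math


def required_loop_gain_db(n_bits):
    """Minimum loop gain A0*beta from equation 4.15, in dB."""
    return 20.0 * math.log10(2 ** (n_bits + 2))


def required_closed_loop_bw_hz(n_bits, t_settle_s):
    """Minimum closed loop bandwidth omega_mu*beta/(2*pi) from equation 4.16.

    Assumption: the half period 1/(2*f_s) is replaced by the settling
    window t_settle_s.
    """
    omega = (n_bits + 2) * math.log(2.0) / t_settle_s  # rad/s
    return omega / (2.0 * math.pi)


N_BITS = 4         # ADC resolution
T_SETTLE = 50e-9   # settling (Calib) window from the text, in seconds
BETA = 0.13        # feedback factor from the text

loop_gain_db = required_loop_gain_db(N_BITS)
bw_hz = required_closed_loop_bw_hz(N_BITS, T_SETTLE)
print(f"loop gain > {loop_gain_db:.1f} dB")
print(f"closed loop bandwidth > {bw_hz / 1e6:.1f} MHz")
# Corresponding open loop requirements on the OTA itself
print(f"A0 > {loop_gain_db - 20.0 * math.log10(BETA):.1f} dB")
print(f"unity gain frequency > {bw_hz / BETA / 1e6:.0f} MHz")
```

Under this assumption the bounds come out at about 36.1 dB and 13.2 MHz, in agreement with the values stated above. Dividing by the feedback factor gives open loop requirements of roughly 54 dB and 100 MHz.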

## 4.3.5 Comparator

### 4.3.5.1 Principle

The comparator is one of the most important functional blocks in the SAR ADCs. The performance of the converter strongly depends on its constituent comparator to achieve



Figure 4.12: Input/output characteristic of an ideal comparator and a high-gain amplifier.

high resolution and speed. The comparator can be considered as a 1-bit converter, which compares the given input signal with a reference and then produces an output voltage depending on the polarity of the input. Here, the output voltage works as a logic output of ONE or ZERO. The critical performance parameters of the comparator are gain, speed and offset. In the following sections the parameters are described in more detail.

Figure 4.12 (left) shows the input/output characteristic of an ideal comparator with infinite gain [15], indicating a steep transition where $V_{in}$ is equal to $V_{ref}$. This nonlinear characteristic can be approximated by that of a high-gain amplifier, as shown on the right of figure 4.12. Here, the slope of the transfer curve around the crossing point ($V_{in} = V_{ref}$) is equal to the DC gain of the amplifier. Therefore, in order to achieve high resolution, the gain of the amplifier should be increased. However, a comparator using a high-gain amplifier suffers from trade-offs among speed, gain and power consumption. Usually the comparator incorporates positive feedback to obtain a very large gain and a high speed.

Regenerative latches can be used as a positive-feedback amplifier to realize high gain and speed. In order to avoid unwanted latch-up, a strobe clock signal is used to enable the



Figure 4.13: A simple latch comprising two back-to-back amplifiers.



Figure 4.14: A simplified small signal model of the latch.

latch at the proper time. The latch amplifies small inputs by the positive feedback to a digital output in the regeneration phase. The time response of a simple latch is analyzed in the following.

Figure 4.13 shows a simple latch comprising two identical back-to-back amplifiers with a single-pole response. The simplified small-signal model of the latch is shown in figure 4.14. From the small signal circuit, we can write

$$-G_m V_{in} = \frac{V_{out}}{R_0} + C_L \frac{dV_{out}}{dt} \tag{4.20}$$

$$-G_m V_{out} = \frac{V_{in}}{R_0} + C_L \frac{dV_{in}}{dt} \tag{4.21}$$

The above equations can be rearranged to

$$-G_m R_0 V_{in} = V_{out} + R_0 C_L \frac{dV_{out}}{dt} \tag{4.22}$$



Figure 4.15: Time response of the latch.

$$-G_m R_0 V_{out} = V_{in} + R_0 C_L \frac{dV_{in}}{dt} \tag{4.23}$$

Here,  $G_m R_0$  and  $R_0 C_L$  denote the gain  $(A_0)$  and time constant  $(\tau_0)$  of the amplifiers, respectively. Then we have

$$-A_0 V_{in} = V_{out} + \tau_0 \frac{dV_{out}}{dt} \tag{4.24}$$

$$-A_0 V_{out} = V_{in} + \tau_0 \frac{dV_{in}}{dt} \tag{4.25}$$

Subtracting the second equation from the first one, we have

$$V_{out} - V_{in} = \frac{\tau_0}{A_0 - 1} \cdot \frac{d(V_{out} - V_{in})}{dt} \tag{4.26}$$

If the initial voltage  $(V_{out} - V_{in})|_{t=0} = V_0$ , then

$$V_{out} - V_{in} = V_0 \cdot \exp\left[(A_0 - 1)\frac{t}{\tau_0}\right] \tag{4.27}$$

In a typical latch, $A_0 \gg 1$, resulting in a growing exponential. Therefore $V_{out} - V_{in}$ regenerates very quickly. The regeneration time constant is equal to $\tau_0/(A_0 - 1)$. Figure 4.15 shows the time response of the latch.

The settling time of the latch used for the comparator is the time needed to produce a logic output to trigger the following digital circuit. If  $V_{out} - V_{in}$  is required to reach a



Figure 4.16: Offset compensated comparator with preamplifier.

certain value  $V_1$ , then the settling time is given by

$$T_1 = \frac{\tau_0}{A_0 - 1} ln \frac{V_1}{V_0} \tag{4.28}$$

Equation 4.28 indicates that $T_1$ can be reduced by decreasing $\tau_0$ or increasing $A_0$. The required time should be shorter than the time allocated to the regenerative phase. If $T_1$ exceeds the allocated time, the latch output remains undecided, a phenomenon called "metastability".
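The dependence of the regeneration time on the initial imbalance can be illustrated with equation 4.28. In the sketch below, the time constant, gain and voltage levels are hypothetical values chosen for illustration, not parameters of the latch designed in this work.

```python
import math


def regeneration_time(tau0, a0, v0, v1):
    """Equation 4.28: time for the latch imbalance to grow from v0 to v1."""
    return tau0 / (a0 - 1.0) * math.log(v1 / v0)


# Hypothetical latch parameters, for illustration only
TAU0 = 1e-9   # amplifier time constant in seconds
A0 = 10.0     # amplifier gain
V1 = 1.0      # imbalance needed for a valid logic level, in volts

for v0 in (1e-3, 1e-6):
    t1 = regeneration_time(TAU0, A0, v0, V1)
    print(f"V0 = {v0:g} V: T1 = {t1 * 1e9:.2f} ns")
```

Because $T_1$ grows only logarithmically as $V_0$ shrinks, an initial imbalance a thousand times smaller merely doubles the regeneration time in this example. Metastability therefore becomes a concern only for inputs extremely close to the decision threshold.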

The latch used in the comparator can achieve high gain and speed. However, its resolution is limited by its large offset, which is an important parameter to be considered. In order to reduce the offset of the latch, a preamplifier is usually placed in front of it [16]. Figure 4.16 shows the architecture of the offset compensated comparator. Referred to the comparator input, the latch offset is divided by the gain of the preamplifier, so that the precision is improved. The kickback noise generated by the regenerative latch is also attenuated by the preamplifier. Note that the preamplifier introduces static power consumption, which is larger than the dynamic power consumption of the latch. In order to reduce the static power consumption, the preamplifier can be disabled by a clock signal when appropriate.

### 4.3.5.2 Implementation

The comparator is responsible for resolving small inputs into digital values. It is composed of a buffer, a preamplifier, a regenerative latch and a static flip-flop, as shown in figure



Figure 4.17: Auto-zeroed comparator diagram.

4.17. The preamplifier provides sufficient gain to compensate for the input referred offset voltage of the dynamic latch and isolates the latch kickback noise. Additionally, the buffer is used to improve the drive capability of the input. In order to reduce their contribution to the comparator offset, an output offset storage (OOS) architecture is used.

The differential inputs of the preamplifier are the pixel output and the DAC output, respectively. Specifically, the pixel output voltage is stored on the memory capacitor ($C_1$ or $C_2$). This value is sensitive to the kickback noise, and therefore a buffer is added in front of the preamplifier to mitigate it. Figure 4.18 shows the circuit diagram with the buffer. During the conversion period, either Sample1 or Sample2 is closed and the pixel output voltage is actively settled. When the switch $\overline{Offsetcancel}$ is closed, the charge on the capacitor is shared with the parasitic capacitance ($C_{ip}$). Consequently, a buffer with low input capacitance is employed to reduce this error. Additionally, in order to avoid an accumulated error between different conversions, the charge on the parasitic capacitor should be reset before the sampling switch is closed.

The preamplifier employs two stages to obtain reasonable gain and speed, as shown in figure 4.19. The first stage is a differential nMOS input pair with diode-connected loads, which determines the gain of the preamplifier. The second stage is a pMOS source follower (SF), which has no body effect and improves the speed. Due to the required bias current, the preamplifier consumes static power, which is larger than that of the dynamic latch. In order to reduce the average current draw, the preamplifier is switched by M9



Figure 4.18: Circuit diagram with buffer.



Figure 4.19: Switched preamplifier schematic.

to turn off the current when appropriate. During the conversion period, when the first comparison result is zero (i.e.,  $V_{in} < V_{threshold}$ ), M9 is turned off and the current through the preamplifier is disabled until the next conversion. The power savings are proportional to the amount of time that the preamplifier is disabled.

The preamplifier biases are chosen to satisfy four specifications: offset, noise, gain and speed. The offset and noise of the preamplifier are eliminated by the OOS architecture.



Figure 4.20: Dynamic error versus the settling time of the ADC array.

Therefore only gain and speed are considered. Note that the input to the comparator has been amplified by the sample and hold, and the preamplifier must settle the input in half of the clock period, with the dynamic latch using the other half. Thus, the requirements for the preamplifier gain and speed are given by [17]

$$A_V \frac{C_0}{C_0 + C_L} > \frac{V_{OS}}{V_{FS}} \cdot \frac{2^{n+2}}{G_{S/H}} \tag{4.29}$$

$$\omega_{-3dB} > 2f_{clk} \cdot (n+2) \cdot ln2 \tag{4.30}$$

where  $C_0$  is the auto-zeroing capacitor limited by the kT/C noise constraint,  $C_L$  is the input capacitance of the latch,  $V_{OS}$  is the input referred offset voltage and  $V_{FS}$  is the full scale input voltage. Note that  $A_V \approx g_{m1}/g_{m3}$ , where  $g_m$  is the transconductance of the transistor. Thus, in order to improve the preamplifier gain with limited current, the current efficiency  $(g_m/I_D)$  of the input devices should be improved, which suggests the MOS transistors operate in moderate to weak inversion. However, the cutoff frequency  $(f_T)$  of the transistors drops significantly in weak inversion, and therefore a suitable current density  $(I_D/W)$  in moderate inversion has been chosen to satisfy the operating

| Parameter    | Value                                          |
|--------------|------------------------------------------------|
| Power supply | 3 V                                            |
| Gain         | $16.3~\mathrm{dB}$                             |
| Bandwidth    | 186 MHz ($C_{load} = 100 \text{ fF}$)          |
| CMRR         | 60 dB                                          |
| Offset       | Mean value = 0.87 mV                           |
|              | Standard deviation = 6.6 mV                    |
| Current      | $60 \mu A$                                     |
| Output swing | 1.8 V                                          |
| Area         | $35 \ \mu\mathrm{m} \times 28 \ \mu\mathrm{m}$ |

Table 4.3: The simulation results of the preamplifier

speed constraint.

In order to allocate the settling time for the amplifier, the time required for the offset cancellation is analyzed. When the switch  $\overline{Offsetcancel}$  is closed or opened, the comparator needs time to settle. Figure 4.20 shows the dynamic error versus the settling time during the offset cancellation. Here, the simulation is performed with 500 ADCs because the sensor prototype is intended to be extended to a full-size sensor of  $2 \times 2$  cm<sup>2</sup> in the future. For our application, the readout time of each row is set to 160 ns. The time allowed for the offset cancellation is therefore set to 40 ns in order to obtain a low dynamic error, and the settling time allocated for the amplifier is 10 ns. According to equations 4.29 and 4.30, the gain and bandwidth of the amplifier should be larger than 8 dB and 66 MHz, respectively. The simulation results of the amplifier are summarized in table 4.3.
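As a quick check of the bandwidth figure, equation 4.30 can be evaluated numerically. The sketch below assumes that the half clock period in equation 4.30 is identified with the 10 ns allocated to the amplifier (i.e. $2f_{clk} = 1/t_{settle}$) and that $n = 4$; the function names are illustrative only. A small helper also converts the quoted gains from dB to linear ratios.

```python
import math


def required_bandwidth_hz(t_settle_s, n_bits):
    """Minimum -3 dB bandwidth from equation 4.30.

    Assumption: the half clock period equals the allocated settling time,
    so 2*f_clk = 1/t_settle.
    """
    omega = (n_bits + 2) * math.log(2) / t_settle_s  # rad/s
    return omega / (2 * math.pi)                     # Hz


def db_to_linear(gain_db):
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (gain_db / 20)


f_min = required_bandwidth_hz(t_settle_s=10e-9, n_bits=4)
print(f"required bandwidth: {f_min / 1e6:.1f} MHz")   # about 66 MHz
print(f"required gain 8 dB     = {db_to_linear(8.0):.2f} V/V")
print(f"simulated gain 16.3 dB = {db_to_linear(16.3):.2f} V/V")
```

Under these assumptions the result is about 66 MHz, which agrees with the bandwidth requirement quoted above; the simulated 186 MHz in table 4.3 leaves a comfortable margin.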

The dynamic latch employs a cascode architecture, as shown in figure 4.21. It does not consume static current and is therefore suitable for a power-efficient design. The latch is a conventional sense-amplifier flip-flop [18], which must be reset after every bit decision. The operating principle is as follows. When  $Clk\_comp$  is low, the two switches M7 and M8 are closed and transistor M13 is turned off, so the latch outputs are reset to low. When  $Clk\_comp$  goes high, the two switches M7 and M8 are opened, and transistor M13 is turned on and starts the regeneration. The input devices (M1-M2) compare the two input voltages since the gain from the inputs to their drains can cause a large difference in



Figure 4.21: Dynamic latch with current source.

Table 4.4: The simulation results of the dynamic latch

| Parameter         | Value                                          |
|-------------------|------------------------------------------------|
| Power supply      | 3 V                                            |
| Response time     | 2 ns                                           |
| Offset            | Mean value = $61.6 \mu V$                      |
|                   | Standard deviation $= 2 \text{ mV}$            |
| Input capacitance | 89.2 fF                                        |
| Area              | $35 \ \mu\mathrm{m} \times 31 \ \mu\mathrm{m}$ |

their drain voltages. This causes a difference in the current through the regenerative loads (M3-M6). As a result, the branch currents get disturbed depending on the input voltages, and therefore the latch regeneration is triggered.

The offset voltage of this dynamic latch can be expressed as [19]

$$V_{OS} = \Delta V_{TH1,2} + \frac{(V_{GS} - V_{TH})_{1,2}}{2} \left(\frac{\Delta S_{1,2}}{S_{1,2}} + \frac{\Delta R}{R}\right) \tag{4.31}$$

where  $\Delta V_{TH1,2}$  is the threshold voltage mismatch of the input transistors (M1-M2),  $(V_{GS}-V_{TH})_{1,2}$  is the overdrive voltage of the input differential pair,  $\Delta S_{1,2}$  is the physical dimension mismatch between M1 and M2, and  $\Delta R$  is the load resistance mismatch introduced by M3-M6. Note that the first term is a static offset that depends on the size of the differential pair, and the second term is a dynamic offset that scales with the overdrive voltage of the input pair. Thus, the offset can be mitigated by enlarging the differential pair and reducing  $(V_{GS}-V_{TH})_{1,2}$. However, a large input transistor introduces a large parasitic capacitance, which decreases the gain of the preamplifier. In this dynamic latch, the size of the input differential pair is set by considering equation 4.29. A simple way to reduce the input overdrive voltage is to control the tail current of the input pair. Therefore a biased MOS transistor (M14) is cascoded at the bottom of the switched MOS transistor (M13), as shown in figure 4.21. The offset can be significantly reduced with this method. In the Monte Carlo simulation, this dynamic latch has an offset of less than 2 mV. The simulation results of the dynamic latch are summarized in table 4.4.
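The effect of the overdrive voltage in equation 4.31 can be illustrated numerically. The sketch below is only an illustration: the function name and all mismatch values are hypothetical and are not taken from the design or from the Monte Carlo simulation.

```python
def latch_offset(delta_vth, v_overdrive, ds_over_s, dr_over_r):
    """Input-referred offset of the latch following equation 4.31 (volts).

    delta_vth   : threshold mismatch of M1-M2 (V)
    v_overdrive : (V_GS - V_TH) of the input pair (V)
    ds_over_s   : relative dimension mismatch of M1-M2
    dr_over_r   : relative load mismatch of M3-M6
    """
    return delta_vth + 0.5 * v_overdrive * (ds_over_s + dr_over_r)


# Hypothetical mismatch values, chosen only to show the trend.
DELTA_VTH = 1e-3   # 1 mV
DS_OVER_S = 0.01   # 1 %
DR_OVER_R = 0.01   # 1 %

for v_ov in (0.2, 0.1):
    v_os = latch_offset(DELTA_VTH, v_ov, DS_OVER_S, DR_OVER_R)
    print(f"overdrive {v_ov * 1e3:.0f} mV -> offset {v_os * 1e3:.1f} mV")
```

With these example numbers, halving the overdrive halves the dynamic term while the static term is unchanged, which is the motivation for limiting the tail current with M14.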

## 4.3.6 Digital Logic

The digital logic directly drives the switches in the DAC and must respond fast to ensure a sufficiently short settling time of the DAC. The logic of a successive approximation ADC is usually implemented with a finite state machine. It is based on a shifter and consumes energy that grows approximately with the number of shifters. Note that a multi-bit/step approximation algorithm starting from the ground reference is used for our application, and four comparison periods are required for a 4-bit ADC. Therefore, a total of 8 states are required in the state machine.

In order to implement a power-efficient logic controller, the digital logic uses a Clock Manager to generate the necessary clock signals. Figure 4.22 shows a schematic and a timing diagram of the control logic, which covers the auto-zeroing operation and the four bit-cycles. Sleepmask is the control signal for clock-gating the dynamic latch (i.e., signal Clk\_comp) and the state machine (i.e., signal Clk\_fsm) during auto-zeroing. As shown in figure 4.22(b), both clocks are disabled while Sleepmask is high. The Overthreshold signal is used to power-gate the comparator and the state machine during bit-cycling, according to the first comparison result. If the first comparison result is high,



Figure 4.22: (a) clock manager and (b) related timing diagram.

the *Overthreshold* value stays high. Then the relevant DAC reference is switched from ground  $(V_{threshold})$  to  $V_{ref}$ . If the first comparison result is low, the *Overthreshold* value goes low and the relevant DAC reference is kept connected to ground. Thus, the comparator and the logic are put to sleep until the next conversion. This timing arrangement avoids unnecessary energy consumption.
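The gating behaviour described above can be summarized with a small behavioural model. This is a minimal sketch under stated assumptions, not the schematic of figure 4.22(a): it assumes that both clocks are simple AND-gated versions of the main clock, that *Overthreshold* starts high at the beginning of each conversion, and that four bit-cycles are used. The function names are hypothetical.

```python
def gated_clock(clk, sleepmask, overthreshold):
    """Clock seen by the latch and the state machine (assumed AND gating).

    The clock is blocked during auto-zeroing (sleepmask high) and after a
    first comparison below threshold (overthreshold low).
    """
    return clk and (not sleepmask) and overthreshold


def active_bit_cycles(first_comparison_high, n_cycles=4):
    """Count the bit-cycles in which the comparator and logic are clocked."""
    overthreshold = True  # assumed high at the start of each conversion
    active = 0
    for cycle in range(n_cycles):
        if gated_clock(True, False, overthreshold):
            active += 1
        if cycle == 0 and not first_comparison_high:
            overthreshold = False  # sleep until the next conversion
    return active


print("cycles clocked with hit:   ", active_bit_cycles(True))
print("cycles clocked without hit:", active_bit_cycles(False))
```

In this model a conversion without a hit clocks the comparator once instead of four times, which is where the saving in dynamic power comes from.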

## 4.3.7 DAC

In a typical SAR ADC, the DAC is implemented with a capacitor array, which is a set of binary-weighted capacitors and an extra unit capacitor. For an n-bit converter, the number of unit capacitors in a capacitor array is  $2^n$  and the unit capacitor size is set by the kT/C noise specification. In this ADC, the unit capacitor is chosen to be at least 100 fF. Therefore, the capacitor array can occupy a large area and needs a large current to drive. Simulation of a full-scale array (about  $576 \times 576$) with a column pitch of  $35 \mu m$  in our application shows that the total capacitor network needs eight analog buffers to satisfy



Figure 4.23: DAC diagram.



Figure 4.24: Simulated DAC (a) output voltage and (b) dynamic error.

the speed constraint of the ADC. During the bit-cycling, the capacitor is charged by a buffered analog voltage and the amount of charge is proportional to the size of the capacitor array. Thus, the total switching energy in each conversion can be derived as [20]

$$E_{D/A} = N \sum_{i=1}^{n} \alpha_{D_i} C_{unit} V_{ref}^2 \tag{4.32}$$



Figure 4.25: Simulated (a) DNL and (b) INL versus code.

where N is the number of columns and  $\alpha_{D_i}$  is a parameter corresponding to the digital input code  $D_i$. In order to minimize the energy of the binary-weighted capacitor array, the unit capacitor should be minimized; however, that would not satisfy the kT/C noise requirement.

Therefore, the concept of completely removing the capacitor network is proposed. Because this ADC only needs eight references, the DAC can be implemented with a switch multiplexer, as shown in figure 4.23. Here, the switches are implemented with CMOS switches. A small switch size is employed in this DAC to reduce the capacitive load of the reference buffers and thus improve the speed. With the switches directly driven by the digital logic, the DAC still needs eight reference buffers. In simulations of the full-size array, these eight buffers are sufficient to drive the whole switched DAC. Figure 4.24 shows the output voltages and dynamic error of the DAC in a full-size array. With this approach, the DAC takes a much smaller area and consumes less power while using the same number of reference buffers.
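For a sense of scale of the capacitor array that the switch multiplexer replaces, the short sketch below evaluates the kT/C noise of a single 100 fF unit capacitor and the total capacitance of $2^n$ unit capacitors per column over 576 columns. The temperature of 300 K and the function names are assumptions made for this illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_vrms(c_farad, temp_k=300.0):
    """rms kT/C noise voltage of a sampled capacitor (temperature assumed)."""
    return math.sqrt(K_BOLTZMANN * temp_k / c_farad)


def array_capacitance(c_unit, n_bits, n_columns):
    """Total capacitance of 2**n unit capacitors in each of n_columns columns."""
    return (2 ** n_bits) * c_unit * n_columns


C_UNIT = 100e-15  # 100 fF, the minimum unit capacitor quoted in the text

print(f"kT/C noise of one unit capacitor: {ktc_noise_vrms(C_UNIT) * 1e3:.2f} mV rms")
print(f"capacitance per column: {array_capacitance(C_UNIT, 4, 1) * 1e12:.1f} pF")
print(f"capacitance for 576 columns: {array_capacitance(C_UNIT, 4, 576) * 1e12:.1f} pF")
```

Under these assumptions a 4-bit array amounts to 1.6 pF per column and roughly 0.9 nF for the full-size sensor, all of which would have to be charged through the reference buffers.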

## 4.4 Simulation Results and Layout

The simulation is performed with a basic clock of 100 MHz. Simulation results show that the equivalent noise charge (ENC) of the pixel circuit is  $11e^-$  with a pixel conversion gain of  $80~\mu\text{V/e}^-$ . The ADC features a variable encoding. Fig.4.25 shows the simulated

Table 4.5: Performance Summary

| Parameter                            | Value                                      |
|--------------------------------------|--------------------------------------------|
| CMOS technology                      | $0.35~\mu\mathrm{m}~2\mathrm{P4M}$         |
| Array size                           | $48 \times 64$                             |
| Pixel size                           | $35~\mu\mathrm{m} \times 35~\mu\mathrm{m}$ |
| ENC                                  | 11 e<sup>-</sup>                           |
| Conversion gain                      | $80~\mu\mathrm{V/e^-}$                     |
| Current in one row                   | $53 \mu A$                                 |
| ADC resolution                       | 4/3/2                                      |
| LSB                                  | 1 mV                                       |
| Input range                          | 16 mV (single-ended)                       |
| ADC DNL                              | 0.14/-0.09 LSB                             |
| ADC INL                              | 0.05/-0.15 LSB                             |
| Conversion time                      | 80 ns                                      |
| Row time                             | 160 ns                                     |
| Inactive power (without hit) per ADC | $486~\mu W @ 3V$                           |
| Active power (with hit) per ADC      | $714~\mu W @ 3V$                           |
| ADC active area                      | $35 \times 545 \ \mu\mathrm{m}^2$          |

differential nonlinearity (DNL) and integral nonlinearity (INL) of the ADC. The maximum DNL and INL are 0.14/-0.09 LSB and 0.05/-0.15 LSB, respectively. The conversion time of the ADC is 80 ns at a sampling frequency of 6.25 MHz, and the ADC consumes 486  $\mu$W without a hit, which is by far the most frequent case. This value rises to 714  $\mu$W with a hit. The specifications and simulation results are summarized in Tab.4.5.

This prototype sensor was designed and fabricated in a 0.35  $\mu$ m 2-poly 4-metal CMOS process through the Austria Mikro System (AMS) company. The layout of the whole chip is shown in figure 4.26. The total area of the chip is 4 × 4.8 mm<sup>2</sup>. The pixel array has 48 × 64 pixels with 35- $\mu$ m pitch. Column-parallel ADC arrays are located below the pixel array. The area of an ADC is 35 × 545  $\mu$ m<sup>2</sup>.




Figure 4.26: Layout of the sensor prototype.

# 4.5 Summary

In this chapter, a CMOS pixel sensor integrated with 4-bit column-level ADCs aiming for the ILD-VTX outer layers has been presented. The architecture includes a pixel array of 48 columns of 64 pixels, column-level ADCs and a peripheral digital read-out microcircuit. Each pixel combines in-pixel amplification with a correlated double sampling operation. The ADC uses efficient power management to minimize digital power dissipation. For further power saving, the static bias current in the preamplifier can be switched off dynamically. The proposed switching DAC leads to both lower energy and smaller area. The post simulation results demonstrate the power and area efficiency. In the next chapter, the test results of the prototype will be presented.


# **Bibliography**

- [1] C. Hu-Guo et al., "CMOS pixel sensor development: a fast read-out architecture with integrated zero suppression," *Journal of Instrumentation - JINST*, vol. 4, p. P04012, Apr. 2009.

- [2] C. Hu-Guo, J. Baudot, G. Bertolone, A. Besson, A. S. Brogna, C. Colledani, G. Claus, R. De Masi, Y. Degerli, A. Dorokhov, G. Doziere, W. Dulinski, X. Fang, M. Gelin, M. Goffe, F. Guilloux, A. Himmi, K. Jaaskelainen, M. Koziel, F. Morel, F. Orsini, M. Specht, Q. Sun, and O. Torheim, "First reticule size MAPS with digital output and integrated zero suppression for EUDET-JRA1 beam telescope," Nucl. Instr. and Meth. Phys. Res. A, vol. 623, pp. 480–482, Nov. 2010.
- [3] A. Dorokhov, G. Bertolone, J. Baudot, C. Colledani, G. Claus, Y. Degerli, R. De Masi, M. Deveaux, G. Dozière, W. Dulinski, M. Gélin, M. Goffe, A. Himmi, Ch. Hu-Guo, K. Jaaskelainen, M. Koziel, F. Morel, C. Santos, M. Specht, I. Valin, G. Voutsinas, and M. Winter, "High resistivity CMOS pixel sensors and their application to the STAR PXL detector," Nucl. Instr. and Meth. Phys. Res. A, vol. 650, pp. 174–177, Sep. 2011.
- [4] I. Valin, C. Hu-Guo, J. Baudot, G. Bertolone, A. Besson, C. Colledani, G. Claus, A. Dorokhov, G. Doziere, W. Dulinski, M. Gelin, M. Goffe, A. Himmi, K. Jaaskelainen, F. Morel, H. Pham, C. Santos, S. Senyukov, M. Specht, G. Voutsinas, J. Wang, and M. Winter, "A reticle size CMOS pixel sensor dedicated to the STAR HFT," *Journal of Instrumentation - JINST*, vol. 7, p. C01102, Jan. 2012.
- [5] A. Dorokhov, "Optimization of amplifiers for monolithic active pixel sensors," in *Topical Workshop on Electronic for Particle Physics*, Prague, Czech Republic, Sep. 2007.
- [6] M. Winter, J. Baudot, A. Besson, G. Claus, A. Dorokhov, M. Goffe, Ch.Hu-Guo, F. Morel, I. Valin, G. Voutsinas, and L. Zhang, "Development of CMOS pixel sensors fully adapted to the ILD vertex detector requirements," presented at the International Workshop on Future Linear Colliders (LCWS'11), Granada, Spain, Sep. 2011.


- [7] M. Winter, "Power consumption of CMOS sensors for an ILD vertex detector," presented at the Linear Collider Power Distribution and Pulsing Workshop, LAL, Orsay, May 2011.

- [8] J. L. McCreary and P. R. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques-Part I," *IEEE J. Solid-State Circuits*, vol. 10, no. 6, pp. 371–379, Dec. 1975.
- [9] N. Verma and A. P. Chandrakasan, "An ultra low energy 12-bit rate-resolution scalable SAR ADC for wireless sensor nodes," *IEEE J. Solid-State Circuits*, vol. 42, no. 6, pp. 1196–1205, Jun. 2007.
- [10] K. Gulati and H. S. Lee, "A low-power reconfigurable analog-to-digital converter," IEEE J. Solid-State Circuits, vol. 36, no. 12, pp. 1900–1911, Dec. 2001.
- [11] G. Geelen, E. Paulus, D. Simanjuntak, H. Pastoor, and R. Verlinden, "A 90 nm CMOS 1.2 V 10 bit power and speed programmable pipelined ADC with 0.5 pJ/conversion-step," in *IEEE Int. Solid-State Circuits Conf. (ISSCC 2006) Dig. Tech. Papers*, San Francisco, CA, Feb. 2006, pp. 214–215.
- [12] I. Ahmed and D. A. Johns, "A 50 MS/s (35 mW) to 1-KS/s (15 μW) power scaleable pipeline 10 bit ADC with minimal bias current variation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC 2005) Dig. Tech. Papers*, San Francisco, CA, Feb. 2005, pp. 280–281.
- [13] M. Tian, V. Visvanathan, J. Hantgan, and K. Kundert, "Striving for small-signal stability," *IEEE Circuits and Devices Magazine*, vol. 17, no. 1, pp. 31–41, Jan. 2001.
- [14] H. C. Yang and D. J. Allstot, "Considerations for fast settling operational amplifiers," IEEE Trans. Circuit and Systems, vol. 37, no. 3, pp. 326–334, Mar. 1990.
- [15] B. Razavi, Principles of data conversion system design. Piscataway, NJ: IEEE Press, 1995.
- [16] B. Razavi and B. A. Wooley, "Design techniques for high-speed, high-resolution comparators," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1916–1926, Dec. 1992.


- [17] B. P. Ginsburg and A. P. Chandrakasan, "Dual time-interleaved successive approximation register ADCs for an ultra-wideband receiver," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 247–257, Feb. 2007.

- [18] J. Montanaro, T. Witek, K. Anne, J. Black, E. M. Copper, W. Dobberpuhl, M. Donahue, J. Eno, W. Hoeppner, D. Kruckemyer, H. Lee, C. M. Lin, C. Madden, D. Murray, H. Pearce, S. Santhanam, J. Snyder, R. Stephany, and C. Thierauf, "A 160-MHz, 32-bit, 0.5-W CMOS RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
- [19] S. Jiang, M. A. Do, K. S. Yeo, and W. M. Lim, "An 8-bit 200-MSample/s pipelined ADC with mixed-mode front-end S/H circuit," *IEEE Trans. Circuit Syst. I, Reg. Papers*, vol. 55, no. 6, pp. 1430–1440, Jul. 2008.
- [20] C. Liu, S. Chang, G. Huang, and Y. Lin, "A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 731–740, Apr. 2010.

# Chapter 5

# **Experimental Results**

The prototype sensor described in the previous chapter was fabricated in a 0.35  $\mu$ m 2-poly 4-metal CMOS process. In this chapter, the test results of the chip are described. In the first section, the test board and measurement setup for the sensor prototype are described. Then the laboratory test results are presented, which have been performed on pixels and column ADCs in order to determine the basic performances including temporal noise, fixed pattern noise (FPN), equivalent noise charge (ENC), charge collection efficiency (CCE), charge-to-voltage conversion factor (CVF) and nonlinearity.

# 5.1 Test Board and Setup

The test board on which the chip is bonded, named the proximity board, is designed with minimal features in order to minimize its size. It includes only a few front-end electronics, such as signal buffering and amplification for the critical signals. The buffered signals can be transmitted from the chip to the outside or from the outside to the chip. Test points for all the biases are provided on the proximity board, so that the biases can be measured, or injected from outside, if needed. Figure 5.1 shows a photograph of the proximity board used in the experimental evaluation.

Owing to the LVDS circuit inside the chip, the proximity board is supplied with a 100 MHz basic clock. A slow-control JTAG circuit is integrated in the chip in order to remotely program the parameters of the bias reference, signal selection, pattern value, ADC selection and voltage reference. The JTAG configuration is done by software running under Windows. The synchronization of the chip is realized by the two signals START and



Figure 5.1: Proximity test board.

SPEAK. The 48 columns are multiplexed to 12 serial digital outputs, combined with a programmable pattern header. There are 8 parallel analog outputs, which are buffered to increase the drive capability. The analog buffers are designed with a gain of 2.35. The analog and digital outputs have their own markers in each matrix in order to trigger the Data Acquisition System (DAQ). In order to obtain the characteristics of the ADC, two test voltages are built into the chip. They can be injected from outside if necessary. The 8 voltage references required by the DAC can also be supplied by external sources if needed.

The data are acquired by a logic analyzer and an NI FlexRIO DAQ system. The proximity board is interfaced with the DAQ system by two auxiliary boards. The first is the digital auxiliary board, which was previously used for the test of MIMOSA 22 [1]. It generates the default 100 MHz clock with a quartz oscillator mounted on the board, and buffers the digital outputs and the JTAG configuration signals. This board makes all the signals from the chip available in the LVDS standard and transmits them through RJ45 connectors. It also provides the power supply of the chip and of the proximity board. The second is the RJ45 or PXIe digital board, which accommodates the RJ45 connectors from the digital auxiliary board and converts them to VHDCI connectors (LVDS) for the NI DAQ system and to Berg connectors for the logic analyzer.



Figure 5.2: Microphotograph of wire-bonded chip.

The chip integrates eight analog drivers in order to allow comparison with the performance of the readout circuits. Another board, called the "analog auxiliary board", can be used for measuring the analog outputs only; it was previously used for the MIMOSA 22 analog test. This board converts the 8 analog signals into differential signals for transmission over a long distance. The acquisition of the 8 channels is done by two VME and USB2 imager cards [2]. Each card has four 12-bit ADCs operating at 40 MHz and four buffers, which are used to perform the CDS operation and therefore allow the performance of the pixel to be measured.

The prototype chip is unpackaged in order to avoid package parasitic capacitances and resistances. The unpackaged chip is mounted on the Printed Circuit Board (PCB) and wire bonded directly on the back of the board in order to perform test with a <sup>55</sup>Fe source. The microphotograph of the fabricated chip is shown in Fig.5.2.

## 5.2 Test Results

MIMOSA 31 has been tested since late November 2012. Preliminary test results indicate that MIMOSA 31 should meet the requirements of the design specifications. Laboratory tests have been performed in three parts: pixel array with column ADCs, pixel test and column ADC test. The test results are used to determine the basic performances including temporal noise, fixed pattern noise (FPN), equivalent noise charge (ENC), charge collection efficiency (CCE), charge-to-voltage conversion factor (CVF) and nonlinearity.

Figure 5.3: Normalized response versus threshold voltage.

## 5.2.1 Noise Performance

The noise performance is derived from the transfer curves, which have been obtained by sweeping an external threshold voltage. Using the method described in [3], each curve is normalized by calculating the probability of "1" over a large number of cycles. The ADC starts a conversion when the signal is larger than the threshold voltage. Therefore the ADC can be considered as a comparator responding with an "S" curve. The output is either "1" or "0". When the signal is close to the threshold voltage, the ADC is self-triggered and gives a random output. Fig.5.3 shows the transfer curves of the pixel array with ADCs. All the measurements are performed with a 100 MHz main clock frequency. From these curves, which follow the cumulative distribution, one can estimate the temporal noise and the FPN. These results have been analyzed using the ROOT software. Fig.5.4 shows that the measured temporal noise and FPN are 1.36 mV ( $\sim 23~e^-$ ) and 0.98 mV ( $\sim 16~e^-$ ), respectively.
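The extraction of temporal noise and FPN from S-curves can be sketched as follows. This is not the ROOT procedure used for the measurements: it is a minimal moment-based alternative that assumes Gaussian noise, takes the temporal noise as the mean of the per-column widths and the FPN as the spread of the per-column midpoints, and uses hypothetical function names and synthetic data.

```python
import math
import statistics


def s_curve(v_th, mean, sigma):
    """Probability of reading "1" at threshold v_th, assuming Gaussian noise."""
    return 0.5 * (1.0 - math.erf((v_th - mean) / (sigma * math.sqrt(2.0))))


def mean_sigma_from_curve(thresholds, probs):
    """Estimate midpoint and width of one S-curve from the moments of its slope."""
    weights, centers = [], []
    for i in range(len(thresholds) - 1):
        weights.append(probs[i] - probs[i + 1])  # probability mass in this step
        centers.append(0.5 * (thresholds[i] + thresholds[i + 1]))
    total = sum(weights)
    mean = sum(w * c for w, c in zip(weights, centers)) / total
    var = sum(w * (c - mean) ** 2 for w, c in zip(weights, centers)) / total
    return mean, math.sqrt(var)


def noise_summary(curves):
    """curves: list of (thresholds, probs), one entry per column."""
    fits = [mean_sigma_from_curve(t, p) for t, p in curves]
    temporal = statistics.mean([sigma for _, sigma in fits])
    fpn = statistics.pstdev([mean for mean, _ in fits])
    return temporal, fpn


# Synthetic example in mV: three columns with 1 mV noise and shifted midpoints.
thresholds = [-6.0 + 0.05 * i for i in range(241)]
curves = [(thresholds, [s_curve(v, m, 1.0) for v in thresholds])
          for m in (0.0, 0.5, -0.5)]
temporal, fpn = noise_summary(curves)
print(f"temporal noise ~ {temporal:.2f} mV, FPN ~ {fpn:.2f} mV")
```

On measured data the step differences are noisy, so a fit of the error function to each curve is usually more robust than raw moments.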



Figure 5.4: (a) Measured temporal noise and (b) fixed pattern noise

## 5.2.2 Pixel Performance

The prototype chip has 8 analog channels in order to measure the pixel performance. A DAQ system developed by the IPHC group is used to acquire the raw data. It includes eight 12-bit ADCs on the boards and quantizes the signal into digital codes that are stored in a memory. A row of eight pixels is read in 160 ns at a 100 MHz clock frequency. The output signal is sampled twice (Read and Calib), and the useful information is therefore calculated as the difference between two successive frames. This approach is based on the clamping technique, which performs in-pixel correlated double sampling (CDS).

The experimental data have been analyzed with the MIMOSA Analysis Framework (MAF) [4] and the ROOT analysis software [5]. MAF is implemented in the LabVIEW environment. This software is dedicated to laboratory tests with <sup>55</sup>Fe source calibration. It was optimized for off-line data treatment and has proved to be efficient and reliable. MAF has been used for the off-line calibration of other CMOS sensor prototypes.

The analog test has been performed by using a soft X-ray source of  $^{55}$ Fe, which emits 5.9 keV photons. Each photon generates a charge of 1640 electrons, and therefore can be used to calibrate the pixel conversion gain and equivalent noise charge (ENC). In the single pixel, the calibration peak (assuming 100% charge collection efficiency) corresponding




Figure 5.5: Calibration results with 5.9 keV X-ray photons for a single pixel. The tested pixel is with a standard epitaxial layer.

to 5.9 keV is  $\sim 237$  ADC units with a Gaussian fit function, as shown in Fig.5.5. The calibration peak of the source corresponds to the relatively rare events in which all the charge liberated by a photon is collected by one diode. This is the reason why the calibration peak has a low amplitude. The ADC has a 12-bit resolution and is mounted on the acquisition board. Therefore, the conversion performance can be obtained from the readout chain. The measured charge-to-voltage conversion factor (CVF) obtained for a single pixel is  $\sim 60~\mu\text{V/}e^-$ . Here, the tested pixel has a standard epitaxial layer.

In order to study the charge distribution, the performance of pixel clusters was analyzed. The software allows clusters of different sizes  $(2 \times 2, 3 \times 3 \text{ and } 5 \times 5)$  to be built around the central pixel. Fig.5.6 shows the total charge collection peak in different clusters. With the calibration peak, the charge collection efficiency (CCE) has been evaluated. The measured CCEs on p1, p4, p9 and p25 are 18%, 49%, 66% and 74%, respectively.

The noise ( $\sim 2.69$  ADC) is a measured root mean square (rms) value derived from temporal noise and pedestal noise. The measured equivalent noise charge (ENC) corresponds to  $\sim 18.6~\rm e^{-}_{rms}$ . The row temporal noise was affected by the clamping voltage. Since the clamping capacitor is implemented with a MOS transistor, the kT/C noise will



Figure 5.6: Total charge collection with 5.9 keV X-ray photons for different clusters. The test is performed at room temperature.

be slightly reduced by increasing the clamping voltage. Moreover, this high voltage causes the source follower to operate in the linear region.
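The quoted ENC follows directly from the <sup>55</sup>Fe calibration: the 5.9 keV peak at about 237 ADC units corresponds to 1640 electrons, which fixes the number of electrons per ADC unit. The sketch below reproduces this arithmetic; the function names are illustrative only.

```python
ELECTRONS_PER_PHOTON = 1640  # charge generated by one 5.9 keV photon


def electrons_per_adc_unit(peak_adc, electrons=ELECTRONS_PER_PHOTON):
    """Calibration factor from the position of the calibration peak."""
    return electrons / peak_adc


def enc_electrons(noise_adc, peak_adc, electrons=ELECTRONS_PER_PHOTON):
    """Equivalent noise charge from the rms noise expressed in ADC units."""
    return noise_adc * electrons_per_adc_unit(peak_adc, electrons)


print(f"calibration: {electrons_per_adc_unit(237):.2f} e-/ADC unit")
print(f"ENC: {enc_electrons(2.69, 237):.1f} e- rms")
```

With the measured values of 2.69 ADC units of noise and a peak at 237 ADC units, this gives about 18.6 electrons rms, in agreement with the ENC stated above.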

## 5.2.3 ADC Performance

In order to obtain the basic performance of the column-level ADC, the test was performed without pixel signal. The test bench, as shown in figure 5.7, is used to measure the temporal noise, the FPN and the nonlinearities. The digital control is composed of a JTAG configuration to set the timing, and a digital oscilloscope to monitor the signal




Figure 5.7: Test bench of the ADC measurement.



Figure 5.8: Timing diagram of the ADC measurement.

generator. The threshold voltage generator generates two kinds of voltages: one is used as a common mode voltage, and the other as a variable threshold signal. The digital board provides the power supply and the data transmission. The digital output of the column-level ADC is buffered with LVDS drivers, because of the large parasitic capacitance of the lines, and captured by the logic analyzer with a PC interface. A digital marker is also provided to the logic analyzer to synchronize the ADC.

The ADC was measured with two built-in test voltages, giving an analog stimulus with variable amplitude similar to the signal from the pixel. Each ADC can be disabled by a JTAG register. The digital control JTAG configuration is implemented in a LabVIEW



Figure 5.9: Normalized number of counts versus threshold voltage.



Figure 5.10: (a) Measured temporal noise and (b) fixed pattern noise

environment. A timing diagram for the ADC measurement is shown in figure 5.8. The signals *Read* and *Calib* are used for CDS, and the signals *Sample1* and *Sample2* are used for the pipelined sample-and-hold. The signal *Sample1* samples the pixel signal of row n while *Sample2* samples the pixel signal of row n+1. The ADC requires 8 references, which can be provided by either the JTAG controlled DAC references or the external source via




Figure 5.11: Basic histogram test setup.



Figure 5.12: Number of counts versus ADC output code.

the pads.

Fig.5.9 shows the normalized response versus threshold voltage, measured on all the column ADCs. All the ADCs measured here are operated with a 100 MHz main clock frequency. From these curves, which follow the cumulative distribution, one can estimate the standard deviation  $\sigma$  of the temporal noise and of the FPN. These results have been analyzed using the ROOT software. Figure 5.10 shows the measured temporal noise and FPN of the column ADCs, which are 0.96 mV<sub>rms</sub> and 0.40 mV<sub>rms</sub>, respectively.

In order to obtain the nonlinearity of the ADC, the decision levels, i.e. the input voltages at all code boundaries, need to be determined. A popular way is to use the



Figure 5.13: (a) Measured DNL and (b) INL versus code.

histogram testing method to find the code transitions. The basic histogram test setup is shown in figure 5.11. It employs a very slow ramp as the input signal. The speed of the ramp is adjusted to provide the required nonlinearity resolution, e.g. an average of 10 outputs for each code represents a resolution of 1/10 LSB. Figure 5.12 shows the number of counts of each code. The differential nonlinearity (DNL) can be derived from the total number of occurrences of each code. The digital outputs are processed in histograms as follows. Step one removes the over-ranged bins, i.e. 0 and 12 for this ADC. Step two divides the count of each code by the average count. Step three subtracts one from the result. The result is the DNL, which is given by

$$DNL_b = \frac{Count_b}{Average\ count} - 1 \tag{5.1}$$

The integral nonlinearity (INL) is obtained simply using the running sum of the DNL, given by

$$INL_b = \sum_{i=1}^{b-1} DNL_i \tag{5.2}$$
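The three histogram steps and the running sum can be written compactly. The sketch below is a minimal illustration with made-up counts; it assumes that the first and last bins are the over-ranged ones and takes the INL of a code as the sum of the DNL of the preceding codes, following equation 5.2. The function name is hypothetical.

```python
def dnl_inl(counts):
    """Compute DNL and INL (in LSB) from a code histogram.

    counts[code] is the number of occurrences of each output code; the first
    and last bins are assumed to be over-ranged and are discarded.
    """
    inner = counts[1:-1]                           # step one: drop over-ranged bins
    average = sum(inner) / len(inner)
    dnl = [c / average - 1.0 for c in inner]       # steps two and three
    inl = []
    running = 0.0
    for d in dnl:
        inl.append(running)                        # sum of DNL over preceding codes
        running += d
    return dnl, inl


# Made-up histogram: two over-ranged bins around four valid codes.
example_counts = [500, 10, 12, 8, 10, 500]
dnl, inl = dnl_inl(example_counts)
print("DNL:", [round(d, 2) for d in dnl])
print("INL:", [round(v, 2) for v in inl])
```

In this example a code that is hit 20% more often than average has a DNL of about +0.2 LSB, and the INL of the following code increases by the same amount.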

This ADC features a variable encoding. Fig. 5.13 shows the measured differential nonlinearity (DNL) and integral nonlinearity (INL) obtained from a single column ADC. The measured extreme values of the DNL and INL are 0.49/-0.28 LSB and 0.29/-0.20 LSB, respectively. The conversion time of the ADC is 80 ns at a sampling frequency of 6.25 MHz, and the ADC consumes 486 $\mu$W without a hit, which is by far the most frequent case. This value rises to 714 $\mu$W with a hit. These figures indicate an average power consumption per column of the order of 487 $\mu$W, assuming a typical occupancy of $\sim 0.5\%$ [6] in the whole sensor. This value rises slightly to 489 $\mu$W with a safety factor of 3. The experimental results are summarized in Tab. 5.1.

Table 5.1: Performance Summary of the Sensor Prototype

| Parameter                            | Value                                          |
|--------------------------------------|------------------------------------------------|
| CMOS technology                      | $0.35~\mu\mathrm{m}~2\mathrm{P4M}$             |
| Array size                           | $48 \times 64$                                 |
| Pixel size                           | $35 \ \mu\mathrm{m} \times 35 \ \mu\mathrm{m}$ |
| Row time                             | 160 ns                                         |
| Current per pixel                    | $3~\mu\mathrm{A}$                              |
| ENC                                  | $18.6 \; {\rm e}^{-}_{rms}$                    |
| Conversion gain                      | $60~\mu\mathrm{V/e^-}$                         |
| ADC resolution                       | 4/3/2 bits                                     |
| Conversion time                      | 80 ns                                          |
| Temporal noise                       | $0.96~\mathrm{mV}_{rms}$                       |
| Column FPN                           | $0.40~\mathrm{mV}_{rms}$                       |
| ADC DNL                              | 0.49/-0.28 LSB                                 |
| ADC INL                              | 0.29/-0.20 LSB                                 |
| Inactive power (without hit) per ADC | $486~\mu\mathrm{W}\ @\ 3~\mathrm{V}$           |
| Active power (with hit) per ADC      | $714~\mu\mathrm{W}\ @\ 3~\mathrm{V}$           |
| ADC active area                      | $35 \times 545 \ \mu \mathrm{m}^2$             |
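The average power per column quoted in the text can be reproduced with a simple weighted sum. The sketch below assumes that the occupancy is the fraction of conversions performed in active mode; the function name is hypothetical.

```python
def average_power_uw(p_inactive_uw, p_active_uw, occupancy):
    """Occupancy-weighted mean of the inactive and active ADC power (in uW)."""
    return (1.0 - occupancy) * p_inactive_uw + occupancy * p_active_uw


# Measured values from Table 5.1 and the ~0.5% occupancy quoted in the text.
typical = average_power_uw(486.0, 714.0, 0.005)
# Same estimate with a safety factor of 3 applied to the occupancy.
with_margin = average_power_uw(486.0, 714.0, 3 * 0.005)
```

Because hits are rare, the average stays within about 1% of the inactive power, which is why the inactive mode dominates the power budget.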

### 5.3 Summary

MIMOSA 31 is the first CMOS sensor prototype integrating 4-bit column-level ADCs for the ILD-VTX outer layers. The preliminary test results indicate that MIMOSA 31 should meet the design specifications. The characterization of MIMOSA 31 will be completed with beam tests. The prototype sensor was designed to the specifications of the full scale sensor (about $2 \times 2$ cm<sup>2</sup>), and can therefore be easily extended in the future. In the next chapter, improvements on MIMOSA 31 are presented.


#### **Bibliography**

- [1] G. Claus, W. Dulinski, K. Jaaskelainen, M. Specht, and M. Goffe, "Mimosa 22 testing preparation," presented at the EUDET-JRA1 meeting, DESY, Hamburg, Jan. 2008.
- [2] A. Besson, G. Claus, G. Deptuch, M. Deveaux, W. Dulinski, G. Gaycken, D. Grandjean, A. Himmi, K. Jaaskelainen, P. Jalocha, and M. Winter, "A portable system for monolithic active pixel sensors characterization," presented at the IEEE NSS/MIC Conference, Rome, Italy, Oct. 2004.
- [3] Y. Degerli, A. Besson, G. Claus, M. Combet, A. Dorokhov, W. Dulinski, M. Goffe, A. Himmi, Y. Li, and F. Orsini, "Development of binary readout CMOS monolithic sensors for MIP tracking," *IEEE Trans. Nucl. Sci.*, vol. 56, no. 1, pp. 354–363, Feb. 2009.
- [4] A. Besson, "Mimosa analysis framework (MAF) used in test beam," presented at the EUDET-JRA1 meeting, Geneva, Jan. 2007.
- [5] "ROOT." [Online]. Available: http://root.cern.ch/drupal
- [6] R. de Masi and M. Winter, "Improved estimate of the occupancy by beamstrahlung electrons in the ILD vertex detector," arXiv:0902.2707v1 [physics.ins-det], 2009.

# Chapter 6

# Improvements on MIMOSA 31

Improvements on the CMOS pixel sensors for the outer layers have been studied since the sensor prototype was designed and fabricated. Due to the massive beamstrahlung background expected at the ILC, and the correspondingly increased raw data rates, it is necessary to develop a faster, sparsified readout architecture for the CMOS pixel sensors. Based on the physical model of clusterized hit patterns in CMOS sensors, a zero suppression algorithm and circuit have been analyzed to reduce the sensor data rate by more than an order of magnitude, and to reduce the load on data acquisition by performing a first reconstruction on-chip. In this chapter, a zero suppression method for the digital outputs of MIMOSA 31 is described in more detail. The following sections also present optimizations such as power saving techniques for the column ADC, which can be extended to the discriminator.

### 6.1 Zero Suppression for MIMOSA 31

The raw data flow of CMOS pixel sensors integrated with column-level 4-bit ADCs can reach up to 80 Gbits/s per frame, which is impossible to transmit off-chip. This calls for a fast zero-suppression circuit in order to increase the readout speed. The zero-suppression micro-circuit provides a data compression factor ranging from 10 to 1000, depending on the hit density per frame.

In this section, we start by presenting the physical characteristics used to estimate the number of hits due to particles, and how these hits are modeled in a pixel matrix with 4-bit digital outputs. Then the sparsified readout architecture suitable for MIMOSA 31 is presented and the separated sparse banks in the pixel matrix are described.

Figure 6.1: Characteristics of the full size sensor in the outer layers.

#### 6.1.1 Physical Characteristics

Figure 6.1 shows the characteristics of the full size sensor in the outer layers. The beam structure anticipated for the ILC consists of trains of 2820 bunch crossings (BXs) separated by 369 ns, each train being followed by a gap of 199 ms. This physics environment can be used to estimate the number of hits in the sensor. With a frame readout time of typically 100 $\mu$s, there are 271 bunch crossings per frame. The hit density in the outer layers is about 0.03 hits/cm<sup>2</sup>/BX, which corresponds to 8.1 hits/cm<sup>2</sup> in one frame. The full size sensor with a pixel pitch of 35 $\mu$m is about 2 × 2 cm<sup>2</sup> and contains a matrix of 576 × 576 pixels. Each frame therefore contains about 33 hits in a sensor of the outer layers. Assuming each hit is reconstructed as a regular cluster of 5 pixels (one central pixel with four surrounding pixels), the occupancy of the pixel matrix is about 0.05%.
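The chain of estimates above can be checked numerically. The sketch below uses only the values quoted in this section; the variable names are illustrative.

```python
# Inputs quoted in the text.
frame_time_s = 100e-6              # frame readout time
bx_spacing_s = 369e-9              # time between bunch crossings
hit_density_per_cm2_per_bx = 0.03  # hit density in the outer layers
pixel_pitch_cm = 35e-4             # 35 um pixel pitch
n_rows = n_cols = 576              # full size pixel matrix
pixels_per_cluster = 5             # central pixel plus four neighbours

# Bunch crossings integrated in one frame.
bx_per_frame = frame_time_s / bx_spacing_s
# Hit density integrated over one frame.
hits_per_cm2_per_frame = hit_density_per_cm2_per_bx * bx_per_frame
# Sensitive area of the matrix (about 2 x 2 cm^2).
area_cm2 = (n_rows * pixel_pitch_cm) * (n_cols * pixel_pitch_cm)
hits_per_frame = hits_per_cm2_per_frame * area_cm2
# Fraction of pixels fired in one frame.
occupancy = hits_per_frame * pixels_per_cluster / (n_rows * n_cols)

print(f"{bx_per_frame:.0f} BX/frame, {hits_per_frame:.0f} hits/frame, "
      f"occupancy {100 * occupancy:.3f}%")
```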

#### 6.1.2 Hit Recognition

The CMOS sensor operates in a rolling shutter mode and transmits the serialized digital outputs by activating a multiplexer in 16 clock cycles. The digital matrix is processed row by row, and the hits it contains are encoded and addressed. This is performed in terms of states, each of which contains the column address of the first hit pixel and the digital outputs of the associated pixels. Figure 6.2 shows the concept of a digital matrix with hits. Each row includes up to M states to be processed, where M is derived from a statistical study based on the highest occupancy expected in the pixel matrix.

Figure 6.2: Schematic of the pixels delivering signal above threshold.

#### 6.1.3 Data Sparsification Algorithm

The data sparsification architecture for the CMOS sensors can be chosen according to the readout strategy. As introduced in the previous section, the integration time of the CMOS sensor in the outer layers is less than 100 $\mu$s in row-by-row rolling shutter mode. Therefore the data sparsification algorithm for analyzing the pixel matrix is similar to that of MIMOSA 26, which is based on a row-by-row sparse data scan readout.

In CMOS pixel sensors, it is assumed that a regular cluster is composed of a central pixel and four crown pixels, as shown in figure 6.3. For a one-dimensional row analysis, the combination of sparse scan and state encoding can be used to identify and encode patterns of hits in a row. The sparse data scan operates as a sliding window, processing each row from left to right. Each hit is recognized by providing the address of the first pixel in a cluster and the digital output values of the hit pixels. The operation principle is shown in figure 6.4.

Figure 6.3: Concept of a hit represented by a regular cluster.

Figure 6.4: Concept of sparse scan including searching states and identifying hits with a group of three pixels.

The architecture performs zero suppression by encoding groups of neighbouring hit pixels in each row. With the state encoding, the pixel matrix is processed line by line within several segments. The circuit for identifying and encoding states, shown in figure 6.5, has already been used successfully in MIMOSA 26.

The state encoding circuit employs a pipelined datapath. The operation principle is as follows. The digital outputs of each ADC are transmitted to a combinational state encoder, going through a NOR gate to identify every pixel that corresponds to the first pixel in a state. For each pixel identified as the first pixel of a state, an enable signal is validated, together with the corresponding 4-bit digital outputs for every such pixel.

In order to select the enabled state, a sparse scan based on priority look-ahead (PLA) encoding is performed from left to right until it is blocked by the enabled first pixel. Then the corresponding digital outputs are selected. This stage also handles the column address encoding. In the next stage, the encoded state, including the column address and the digital outputs, is transmitted.

Figure 6.5: Block diagram of the state encoding circuit.
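The row-level behaviour of the sparse scan and state encoding can be sketched in software. This is a behavioural illustration only, not the pipelined circuit: it assumes that a pixel is hit when its ADC code is non-zero, that a state groups three contiguous pixels starting at the first hit pixel, and that a row delivers at most `max_states` states. The function and parameter names are hypothetical.

```python
def encode_row(row, group_size=3, max_states=None):
    """Encode one row of ADC codes as a list of (column, codes) states.

    row        : list of ADC codes, 0 meaning "no hit" (assumption)
    group_size : number of contiguous pixels covered by one state
    max_states : optional limit on the states extracted from the row
    """
    states = []
    col = 0
    while col < len(row):
        if row[col] != 0:
            if max_states is not None and len(states) >= max_states:
                break  # remaining hits in this row are lost (overflow)
            # State = address of the first hit pixel + codes of the group.
            states.append((col, tuple(row[col:col + group_size])))
            col += group_size  # the scan resumes after the encoded group
        else:
            col += 1  # zero-suppressed pixel, nothing is transmitted
    return states


example_row = [0, 0, 3, 9, 2, 0, 0, 0, 5, 0]
states = encode_row(example_row)
# states == [(2, (3, 9, 2)), (8, (5, 0))]
```

Only two states are produced for the ten-pixel example row, which is the source of the data compression; a group starting near the end of the row is simply shorter.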

#### 6.1.4 Separated Sparse Banks in Pixel Matrix

The long path of the sparse scan through the state encoding circuit dictates the speed of the look-ahead. The delay through the sparse chain sets the minimum allowed clock period, while the hit density sets the maximum number of states to be extracted from a line, and therefore the minimum number of clock periods required. As a consequence, with a 2 cm long sensor, the required clock frequency becomes higher than the maximum clock frequency. It is therefore necessary to split the state encoding circuit into separated sparse banks, each of which finds up to N states resulting from the encoding of 3 contiguous hit pixels. The outputs of the different banks are collected in the next stage by a multiplexer structure that selects the states for the entire line.

In the simulation of the state encoding circuit, up to six states are extracted during the processing of each line. Each hit is represented by a regular cluster of $3 \times 3$ pixels and can be encoded in three states. This allows up to 2 hits per line to be processed in a matrix without missing hits. The maximum tolerated segment width for a given hit density can therefore be calculated from the maximum allowed number of hits per bank.

The probability of having exactly n hits in a bank is given by the Poisson distribution,

$$P(n,\lambda_l) = \frac{\lambda_l^n \times e^{-\lambda_l}}{n!} \tag{6.1}$$

where $\lambda_l$ is the average number of hits per line. If $\lambda_l \ll 1$, $e^{-\lambda_l}$ can be approximated by $(1 - \lambda_l)$, and equation 6.1 becomes:

$$P(n,\lambda_l) \approx \frac{\lambda_l^n \times (1-\lambda_l)}{n!} \approx \frac{\lambda_l^n}{n!} \tag{6.2}$$

The average number of hits per line can be calculated as:

$$\lambda_l = \frac{hits/frame}{number\ of\ lines} \tag{6.3}$$

To size the separated banks, one starts from the average number of hits per frame and divides it by the number of lines to obtain the average number of hits per line. This number is then divided by the assumed number of banks. With the cumulative Poisson distribution function, the probability of a given number of hits within each bank can be found. This probability is used to verify whether the number of banks is sufficient to keep the probability of exceeding the number of hits that a bank can process below an acceptable level. As a design rule of thumb, a probability of $10^{-3}$ has been considered acceptable for the CMOS sensors. As a result, the pixel matrix of MIMOSA-out can be divided into 9 banks of 64 columns.
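The bank sizing procedure can be sketched with equations 6.1 and 6.3. The inputs below (33 hits per frame, 576 lines, 9 banks) come from this chapter, while the limit of 2 hits per bank is an assumption made for illustration. The sketch also ignores the fact that a cluster spans several lines, so it is an order-of-magnitude check rather than the study used for the design.

```python
import math


def poisson_pmf(n, lam):
    """Equation 6.1: probability of exactly n hits for a mean of lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)


def overflow_probability(lam, max_hits):
    """Probability of more than max_hits hits (1 - cumulative Poisson)."""
    return 1.0 - sum(poisson_pmf(n, lam) for n in range(max_hits + 1))


hits_per_frame = 33     # from section 6.1.1
n_lines = 576           # lines of the full size matrix
n_banks = 9             # banks of 64 columns
max_hits_per_bank = 2   # assumption for this illustration

lam_line = hits_per_frame / n_lines   # equation 6.3
lam_bank = lam_line / n_banks         # average hits per line in one bank
p_overflow = overflow_probability(lam_bank, max_hits_per_bank)

print(f"hits/line = {lam_line:.4f}, hits/line/bank = {lam_bank:.5f}, "
      f"P(overflow) = {p_overflow:.1e}")
```

Under these assumptions the overflow probability is far below the $10^{-3}$ rule of thumb, which leaves margin for larger clusters and for the safety factor applied to the hit density.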

### 6.2 Self-Timed ADC

The ADC in MIMOSA 31 has proven to offer attractive performances while maintaining low power consumption and small area. In order to further reduce the power consumption, an ultra-low power column-level ADC has been developed. The new solution consists of an improved sample-and-hold circuit and a self-timed comparator. The total power consumption is significantly reduced while the ADC keeps a high conversion speed. The post-simulation results demonstrate that the power consumption is reduced by close to 53% while all the other parameters are kept identical.

Figure 6.6: Improved ADC block diagram.

The circuits including sample-and-hold and bit-cycling have been optimized to further reduce the power consumption of the column-level ADC in MIMOSA 31. The multi-bit/step self-triggered ADC has demonstrated its power and area efficiency within the expected frequency range. As a result, the multi-bit/step architecture is preferred. In this section, the column-level multi-bit/step ADC is reviewed, and techniques are described in more detail to achieve ultra-low power consumption while maintaining the conversion of signals with the frequencies expected.

#### 6.2.1 Architecture Design

Figure 6.6 shows the overall architecture of the self-timed 4-bit ADC, which is similar to that of MIMOSA 31. The main components of the ADC are a front-end sample-and-hold (S/H), a reference voltage based digital-to-analog converter (DAC), a self-timed comparator, and digital logic. A pipelined front-end S/H is employed to sample and amplify the pixel signal. A switched DAC generates a reference voltage based on the digital value computed by the logic. The comparator includes a buffer, a preamplifier and a dynamic latch to decide whether the DAC output is positive or negative, serially producing the digital output bits. Based on the output of the comparator, the digital logic performs the multi-bit approximation algorithm and drives the switches of the DAC. Additionally, a *Done* feedback signal from the comparator is employed for self-timed bit-cycling.

Figure 6.7: (a) Enhanced sample and hold architecture and (b) related timing diagram.

#### 6.2.2 Enhanced S/H Circuit

This ADC still employs a sample-and-hold (S/H) circuit at the front-end to eliminate the conversion dead time. In order to achieve high speed and low noise, a pipelined correlated double sampling (CDS) architecture is used. The S/H is enhanced by using a single correlated double sampling (SCDS) architecture to reduce the fixed pattern noise (FPN) from the pixel and the operational amplifier. This avoids an extra auto-zeroing capacitor, minimizing the power consumption and area while maintaining the conversion of signals within the expected frequency range. Figure 6.7(a) shows the schematic of the proposed CDS circuit. The CDS circuit is composed of an OTA, an input capacitor ($C_s$), a feedback capacitor ($C_f$), two analog memories ($C_1$ and $C_2$) and MOS transistor switches.

The timing diagram is shown in figure 6.7(b). The S/H operation is controlled by two non-overlapping clock phases (Read and Calib) and two sampling phases (Sample1 and Sample2). During the Read period, the pixel output voltage is sampled on $C_s$ while the offset voltage ($V_{os}$) of the OTA is sampled on both $C_s$ and $C_f$. Thus $C_s$ is charged to $V_{in} - V_{os}$, and $C_f$ is charged to $V_{os}$. After that, a pixel clamp operation sets the pixel output to its reset level. This eliminates the offset of the pixel outputs, which causes FPN. Then Calib is turned on to process the pixel signal level stored in $C_s$. Since capacitors $C_s$ and $C_f$ hold the OTA offset voltage, they are always connected to the virtual ground node X. When the switch driven by Calib turns on, the total charge entering node X is $C_sV_{in} + C_fV_{out} = 0$, which leads to a relation between $V_{in}$ and $V_{out}$ that is independent of $V_{os}$ [1], [2]. Therefore the OTA offset is eliminated and the charge in $C_s$ is transferred into $C_f$. As a result, the transfer function of the CDS circuit is given by

$$V_{out} = \frac{C_s}{C_f} V_{in}. \tag{6.4}$$

The two memory capacitors, sized according to the kT/C noise constraint, are used to realize a pipelined stage. While one is used to sample the input signal, the other is used to hold the output voltage to be processed. This pipelined architecture strongly increases the readout speed.

The performance of the S/H circuit can be affected by non-idealities such as capacitor mismatch, finite operational amplifier (opamp) gain and incomplete settling. To reduce the error due to capacitor mismatch, a symmetric capacitor layout is necessary. During the Calib intervals the output is pulled to $V_{os}$, and the opamp must have a high slew rate and a fast settling time to enable $V_{out}$ to slew back. The closed-loop gain of the stage is also affected by the dc gain of the opamp. Therefore the opamp must have high gain and wide bandwidth to meet the accuracy and speed requirements.

As mentioned in 4.3.4.3, the simplest way to design a high-gain amplifier is to use a telescopic cascode architecture. The single-stage telescopic cascode architecture is similar to that of MIMOSA 31; it can achieve the same gain as a two-stage amplifier with only two current legs, and therefore has maximum power efficiency. One issue of the telescopic amplifier is its low output swing. In order to get a high output voltage swing, an auxiliary biasing branch is inserted in the structure. Figure 6.8 shows the schematic of the high-gain OTA.

Figure 6.8: OTA schematic.

Figure 6.9: Loop gain and phase margin of the OTA.

| Parameter             | Value                                          |
|-----------------------|------------------------------------------------|
| Power supply          | 3 V                                            |
| Loop gain             | 41.3 dB                                        |
| Phase margin          | 87.9°                                          |
| Closed loop bandwidth | $25.5 \text{ MHz} (C_{load} = 240 \text{ fF})$ |
| Current               | $40~\mu\mathrm{A}$                             |
| Input referred noise  | $75~\mu\mathrm{V}$                             |
| Total area of S/H     | $35~\mu\mathrm{m} \times 129~\mu\mathrm{m}$    |

Table 6.1: The simulation results of the OTA

The gain of the sample-and-hold circuit is designed to be 4 ($C_s = 400$ fF, $C_f = 100$ fF). Considering the parasitic capacitance at the input node, the feedback factor is set to 0.18. According to equations 4.15 and 4.16, the required loop gain and closed-loop bandwidth are 36 dB and 13.2 MHz, respectively. In practice, the actual opamp gain and bandwidth should be larger than these calculated values to allow for process variation. Figure 6.9 shows the simulated loop gain and phase margin curves. The simulation results of the capacitive feedback OTA are summarized in table 6.1.
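The numbers above can be related with textbook first-order relations for a capacitive feedback stage. The sketch below assumes the usual feedback factor $\beta = C_f / (C_s + C_f + C_p)$ and a static gain error of $1/(1+T)$ for a loop gain $T$; these are not necessarily the exact forms of equations 4.15 and 4.16, and the parasitic capacitance $C_p$ is inferred here rather than quoted from the design.

```python
def parasitic_from_beta(c_s_ff, c_f_ff, beta):
    """Input parasitic (fF) implied by beta = C_f / (C_s + C_f + C_p)."""
    return c_f_ff / beta - c_s_ff - c_f_ff


def static_gain_error(loop_gain_db):
    """Relative closed-loop gain error 1 / (1 + T) for a loop gain T in dB."""
    loop_gain = 10.0 ** (loop_gain_db / 20.0)
    return 1.0 / (1.0 + loop_gain)


c_s_ff, c_f_ff, beta = 400.0, 100.0, 0.18
ideal_gain = c_s_ff / c_f_ff                       # equation 6.4
c_par_ff = parasitic_from_beta(c_s_ff, c_f_ff, beta)
error_required = static_gain_error(36.0)           # required loop gain
error_simulated = static_gain_error(41.3)          # Table 6.1

print(f"gain = {ideal_gain:.0f}, implied C_p = {c_par_ff:.1f} fF")
print(f"gain error: {100 * error_required:.2f}% at 36 dB, "
      f"{100 * error_simulated:.2f}% at 41.3 dB")
```

Under these assumptions the simulated loop gain roughly halves the static gain error relative to the minimum requirement, which is the margin the text asks for against process variation.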

#### 6.2.3 Self-Timed Comparator

The efficiency of the comparator is improved by using self-timed bit-cycling, which relaxes the preamplifier settling time and thus reduces the current. With the self-timed comparator, the total power consumption is reduced by up to 54% in inactive mode and 40% in active mode, compared with the ADC in MIMOSA 31.

One disadvantage of the ADC architecture is the feedback required between successive clock periods, which limits the speed of the converter. Specifically, the result of the previous comparison is necessary to generate an improved estimate for determining the next bit. While this feedback path is entirely digital, its latency must be minimized to allow maximum time for analog signals to settle in the switched DAC array and preamplifiers.

In order to decrease the latency during bit-cycling, a self-timing technique is used [3], [4], wherein the latch triggers the start of the next bit-cycle as soon as it has determined a value. Self-timing is particularly useful because the latch typically resolves in much less than 2 ns (10% of the clock period).

Figure 6.10: Waveforms showing (a) standard bit-cycling (b) self timed bit-cycling.

A standard timing scheme for controlling bit-cycling is shown in figure 6.10(a). The dynamic latch starts comparing during the first half of the clock cycle. After the latch output has settled, the remainder of the second half of the clock cycle is used by the DAC and preamplifiers to settle for the next bit. Figure 6.10(b) shows the self-timed bit-cycling used in this design. During the bit-cycling, the regeneration of the latch is always triggered by the rising edge of Clk\_comp. In the first bit-cycle, the latch resolves quickly, asserting the Done signal (NAND of the latch outputs). This causes the rising edge of Clk\_fsm, the clock for the finite state machine of the digital logic, to set the next bit immediately, rather than waiting for the falling edge of Clk\_comp. Then the DAC and preamplifiers start settling to the corresponding value. Consequently, their settling time can be increased to nearly one whole clock cycle. This relaxes the requirements for the preamplifier, and correspondingly reduces the power consumption.

If the latch becomes metastable and does not trigger, the rising edge of Clk\_fsm is triggered by ClkADC instead, and the bit-cycling continues as normal. In this way, the self-timed bit-cycling increases the time available for settling the DAC array and preamplifiers by up to 60%.
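One reading of the timing figures quoted in this section is sketched below. It assumes a 20 ns clock period (2 ns being 10% of it), a standard scheme that leaves half a period for settling, and a 4 ns allowance for the latch decision and logic in the self-timed scheme. The 4 ns value is an assumption chosen so that the result matches the 16 ns settling time allocated later in this section.

```python
def settling_time_ns(clock_period_ns, decision_delay_ns, self_timed):
    """Time left for the DAC and preamplifier to settle in one bit-cycle."""
    if self_timed:
        # The next bit is set as soon as the latch has decided.
        return clock_period_ns - decision_delay_ns
    # Standard scheme: settling is confined to the second half of the cycle.
    return clock_period_ns / 2.0


clock_period_ns = 20.0    # 2 ns is quoted as 10% of the clock period
decision_delay_ns = 4.0   # assumption: latch decision plus logic delay

standard = settling_time_ns(clock_period_ns, decision_delay_ns, False)
self_timed = settling_time_ns(clock_period_ns, decision_delay_ns, True)
increase = self_timed / standard - 1.0

print(f"standard: {standard:.0f} ns, self-timed: {self_timed:.0f} ns, "
      f"increase: {100 * increase:.0f}%")
```

With these assumptions the settling time grows from 10 ns to 16 ns, consistent with the increase of up to 60% quoted above.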



Figure 6.11: Self-timed comparator diagram.



Figure 6.12: Switched preamplifier schematic.

The remaining limiting part of the converter is the comparator, which is crucial for the overall power consumption. It is composed of a buffer, a preamplifier, and a regenerative latch, as shown in figure 6.11. It has been modified by simplifying the preamplifier and the dynamic latch. The preamplifier provides sufficient gain to compensate for the input referred offset voltage of the dynamic latch and isolates the latch kickback noise. Additionally, the buffer is used to improve the drive capability of the input. In order to reduce their contribution to the comparator offset, an output offset storage (OOS) architecture is used.

| Parameter    | Value                                          |
|--------------|------------------------------------------------|
| Power supply | 3 V                                            |
| Gain         | 13.2 dB                                        |
| Bandwidth    | $45.8 \text{ MHz} (C_{load} = 100 \text{ fF})$ |
| Current      | $20 \ \mu\mathrm{A}$                           |
| Output swing | 1.2 V                                          |
| Area         | $35 \ \mu \text{m} \times 19 \ \mu \text{m}$   |

Table 6.2: The simulation results of the preamplifier

Figure 6.13: Dynamic latch with current source.

The preamplifier employs a differential nMOS input pair with triode loads to get reasonable gain and speed, as shown in figure 6.12. The pMOS transistors M3 and M4 operate in the linear region and behave as resistors. Due to the required bias current, the preamplifier consumes static power, which is larger than the power consumed by the dynamic latch. In order to reduce the average current draw, the preamplifier can be disabled by using M5 to turn off the current when appropriate. During the conversion period, when the first comparison result is zero (i.e., $V_{in} < V_{threshold}$), M5 is turned off and the current through the preamplifier is disabled until the next conversion. The power savings are proportional to the amount of time that the preamplifier is disabled.

Figure 6.14: Simulated (a) DNL and (b) INL versus code.

The settling time allocated for the DAC and amplifier is set to 16 ns. According to equations 4.29 and 4.30, the required gain and bandwidth of the amplifier should be larger than 8 dB and 41 MHz, respectively. The simulation results of the amplifier are summarized in table 6.2.
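To give a feel for the bandwidth figures, the sketch below uses a single-pole settling model in which the residual error after a time $t$ is $e^{-t/\tau}$ with $\tau = 1/(2\pi f_{-3\,dB})$. This first-order model is an assumption made for illustration; equations 4.29 and 4.30 may include additional terms.

```python
import math


def residual_settling_error(bandwidth_hz, settling_time_s):
    """Relative error left after settling_time_s for a single-pole response."""
    tau = 1.0 / (2.0 * math.pi * bandwidth_hz)
    return math.exp(-settling_time_s / tau)


settling_time_s = 16e-9   # settling time allocated in the text
residual_required = residual_settling_error(41e6, settling_time_s)
residual_simulated = residual_settling_error(45.8e6, settling_time_s)

print(f"residual error: {100 * residual_required:.2f}% at 41 MHz, "
      f"{100 * residual_simulated:.2f}% at 45.8 MHz")
```

In this model the required bandwidth corresponds to a little more than four time constants within the 16 ns window, and the simulated bandwidth of table 6.2 leaves some additional margin.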

Figure 6.13 shows the schematic of the dynamic latch. Because a dynamic latch does not consume static current, it is suitable for an energy-efficient design. The latch has been designed using a NAND gate to enable the duty cycle. When the latch regeneration is triggered, the *Done* signal is pulled high to enable the asynchronous control clock.

#### 6.2.4 Simulation Results

The self-timed ADC has been designed in a 0.35 $\mu$m 2-poly 4-metal CMOS process. In order to reduce the effect of crosstalk and leave enough space for transmission lines, the layout was drawn with a smaller width, slightly increasing the length. The active area of the ADC is 35 $\times$ 590 $\mu$m<sup>2</sup>. Figure 6.14 shows the simulated differential nonlinearity (DNL) and integral nonlinearity (INL) of the ADC. The extreme values of the DNL and INL are 0.11/-0.16 LSB and 0.10/-0.06 LSB, respectively. The conversion time of the ADC is 80 ns at a sampling rate of 6.25 MS/s. With the self-timed technique, the power consumption is reduced to 225 $\mu$W without a hit, which is by far the most frequent case. This value rises to 425 $\mu$W with a hit. These figures indicate an average power consumption per column of the order of 226 $\mu$W, assuming a typical occupancy of $\sim 0.5\%$ in the whole sensor. This value rises slightly to 228 $\mu$W with a safety factor of 3. The specifications and simulation results are summarized in table 6.3.

| Item                                | Data                                  |
|-------------------------------------|---------------------------------------|
| Process                             | $0.35~\mu\mathrm{m}~\mathrm{CMOS}$    |
| Pitch size                          | $35~\mu\mathrm{m}$                    |
| Resolution                          | 4/3/2 bits                            |
| Input range                         | 16 mV (single-ended)                  |
| Least significant bit (LSB) voltage | 1 mV                                  |
| Sampling rate                       | 6.25 MS/s                             |
| Conversion time                     | 80 ns                                 |
| ADC DNL                             | 0.11/-0.16 LSB                        |
| ADC INL                             | 0.10/-0.06 LSB                        |
| Inactive power (without hit)        | $225~\mu\mathrm{W} \ @ \ 3\mathrm{V}$ |
| Active power (with hit)             | $425~\mu\mathrm{W} \ @ \ 3\mathrm{V}$ |
| Active area                         | $35 \times 590 \ \mu \mathrm{m}^2$    |

Table 6.3: Performance summary
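The power savings relative to the ADC of MIMOSA 31 follow directly from the figures of Tables 5.1 and 6.3. The sketch below compares the measured MIMOSA 31 values with the simulated self-timed values; the helper name is hypothetical, and the comparison mixes measurement and simulation as the text does.

```python
def relative_saving(reference_uw, improved_uw):
    """Fraction of the reference power saved by the improved design."""
    return 1.0 - improved_uw / reference_uw


# MIMOSA 31 ADC (measured, Table 5.1) versus self-timed ADC (simulated).
saving_inactive = relative_saving(486.0, 225.0)
saving_active = relative_saving(714.0, 425.0)

# Occupancy-weighted average for the self-timed ADC at ~0.5% occupancy.
occupancy = 0.005
average_self_timed_uw = (1.0 - occupancy) * 225.0 + occupancy * 425.0

print(f"saving: {100 * saving_inactive:.1f}% inactive, "
      f"{100 * saving_active:.1f}% active, "
      f"average {average_self_timed_uw:.0f} uW per column")
```

Since the ADC is inactive almost all the time, the overall saving is essentially the inactive-mode saving of about 54%.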

### 6.3 Extended Self-Timed Discriminator

The self-timed technique can be extended to the discriminator in order to further reduce the power consumption. The discriminator has been optimized based on the one in ULTIMATE, which is fabricated in a 0.35 $\mu$m technology. Figure 6.15 shows the schematic of the discriminator, which employs four cascaded stages to get a high resolution. It is composed of preamplifiers, source followers and a dynamic latch. In order to reduce the offset, a combination of output offset storage and input offset storage architectures is used.

In order to reduce the average current draw, the preamplifiers and source followers are gated by a MOS switch that turns off their current when appropriate. During the operation period, the regeneration of the latch is always triggered by the rising edge of *Latch*. When the comparison result is stable, the MOS switch is turned off and the current through the preamplifiers and source followers is disabled until the next conversion. The power savings are proportional to the amount of time that the preamplifiers and source followers are disabled. Figure 6.16 shows the schematics of the switched preamplifier and source follower.

Figure 6.15: Schematic of the discriminator.

Figure 6.16: Enabled (a) preamplifier and (b) source follower.

The control logic is shown in figure 6.17(a); it uses a NAND gate and a D flip-flop to generate the enable signal. The self-timed timing scheme is shown in figure 6.17(b), which takes the *Read* and *Calib* operations into account. In the sampling operation, the enable signal is generated by the rising edge of *Read*. Then the preamplifiers and source followers are switched on and start amplifying the input signal. During the comparison period, the latch resolves quickly, asserting the *Done* signal (NAND of the latch outputs). Then the enable signal is deasserted to switch off the preamplifiers and source followers until the next conversion. Using this self-timed method, the power consumption is reduced by up to 26.5% in simulation without introducing extra control signals.

Figure 6.17: (a) Enable signal generation and (b) related timing.

### 6.4 Summary

Based on the physical environment of the ILC vertex detector, zero suppression algorithms for MIMOSA 31 have been analyzed to reduce the output data rates. The self-timed ADC employs an enhanced sample-and-hold circuit and self-timed bit-cycling to obtain an ultra-low power consumption while keeping a reasonable area. The post-simulation results show that the power consumption can be reduced by up to 53% while all the other parameters are kept identical. The self-timed technique can also be extended to the discriminator to save power.


### **Bibliography**

- [1] S. Lim, J. Cheon, Y. Chae, W. Jung, D. Lee, M. Kwon, K. Yoo, S. Han, and G. Han, "A 240-frames/s 2.1-Mpixel CMOS image sensor with column-shared cyclic ADCs," *IEEE J. Solid-State Circuits*, vol. 46, no. 9, pp. 2073–2083, Sep. 2011.
- [2] C. C. Enz and G. C. Temes, "Circuit techniques for reducing the effects of op-amp imperfections: Autozeroing, correlated double sampling, and chopper stabilization," *Proc. IEEE*, vol. 84, no. 11, pp. 1584–1614, Nov. 1996.
- [3] G. Promitzer, "12-bit low-power fully differential switched capacitor noncalibrating successive approximation ADC with 1 MS/s," *IEEE J. Solid-State Circuits*, vol. 36, no. 7, pp. 1138–1143, Jul. 2001.
- [4] B. P. Ginsburg and A. P. Chandrakasan, "Dual time-interleaved successive approximation register ADCs for an ultra-wideband receiver," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 247–257, Feb. 2007.

# Conclusions and Perspectives

#### Conclusions

The International Linear Collider (ILC) physics programme expresses a growing need for high precision flavour tagging, especially of short lived particles (e.g. charmed mesons and tau leptons) through their decay vertex. This requires an excellent vertexing and tracking system in order to reconstruct the secondary vertices and to measure the momenta of tracks precisely. This objective translates into the need for a vertex detector more precise than the existing state of the art. Taking advantage of the ILC running conditions, which are much less demanding than those at the Large Hadron Collider (LHC), physics driven specifications such as spatial resolution can be prioritized at the expense of read-out speed or radiation tolerance. CMOS Pixel Sensors (CPS), also called Monolithic Active Pixel Sensors (MAPS), have demonstrated good performances with respect to the specifications of the vertex detector. They can easily match the targeted granularity and material budget, and do not require a cooling system, which would add material in the fiducial volume of the vertex detector.

The topic of this thesis was the design of a CMOS pixel sensor prototype adapted to the ILD vertex detector (VTX) outer layers. The ILD VTX imposes stringent requirements on the CMOS pixel sensors. There are two different geometries for the VTX. One of them (VTX-SL) features 5 equidistant single layers, while an alternative option (VTX-DL) features 3 double layers. Sensors equipping the innermost layer in both geometries should exhibit a single point resolution better than 3 $\mu$m associated with a very short integration time because of the beamstrahlung background. This requirement motivates an R&D effort concentrating on a high read-out speed design. The sensors envisioned for the outer layers are subject to less stringent constraints. A single point resolution of 3-4 $\mu$m combined with an integration time shorter than 100 $\mu$s is expected to constitute a valuable trade-off. In this case, the design effort focuses on minimizing the power consumption. A larger pixel pitch of 35 $\mu$m combined with a 4-bit ADC is proposed, thereby reducing the power consumption while keeping the necessary spatial resolution.

The prototype sensor (called MIMOSA 31) includes a pixel array of 48 columns of 64 pixels, column-level ADCs and a peripheral digital read-out micro-circuit. The pixels are read out in row-by-row rolling shutter mode. Each pixel combines in-pixel amplification with a correlated double sampling operation, an approach already used successfully in previous sensors (MIMOSA 26 designed for the EUDET beam telescope and ULTIMATE equipping the STAR-PXL sub-system). The column-level ADC, which processes the pixel readout in parallel, completes the conversion by performing a multi-bit/step approximation. The ADC design resembles the successive approximation register (SAR) architecture, featuring low power consumption with moderate speed (several MS/s). Previous prototypes showed that the noise floor of the pixel is about 1 mV. Since the particle position reconstruction improves when using the charge center of gravity, the small signals (a few mV) close to the noise are the most important. In order to improve the resolution of the particle position reconstruction, the least significant bit (LSB) is set at the level of the noise. Earlier physics studies show that a rough encoding of the high amplitudes delivered by some pixels in a cluster does not degrade the resolution. Therefore a variable charge encoding is employed, ranging from a maximum of 4 bits for signals of small magnitude to only 2 bits for large signals. After the A/D conversion, the digital outputs are loaded into memory buffers and then serially transmitted off-chip through the 8 to 1 multiplexer. The setting parameters of the sensor are remotely programmable through the JTAG circuits. In order to compare the performance of the readout circuits, the chip also integrates eight analog drivers.

Given that the hit pixel density in the outer layers of the ILC VTX is of the order of a few per thousand, the ADC is designed to operate in two modes (active and inactive) in order to minimize the power consumption. The ADC employs a threshold voltage to trigger the conversion. If the pixel signal is higher than the threshold, the ADC works in active mode and performs the conversion; otherwise, it works in inactive mode and sleeps until the next conversion. This scheme significantly reduces the power dissipation.
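
The threshold-triggered behaviour described above can be illustrated with a minimal behavioural sketch. It is not the MIMOSA 31 circuit: the function name `adc_column_step`, the 2 mV threshold used in the example, the 1 mV LSB and the plain uniform 4-bit quantizer are all assumptions made for illustration (the actual ADC uses a variable 4-bit to 2-bit encoding).

```python
def adc_column_step(signal_mv, threshold_mv, lsb_mv=1.0, n_bits=4):
    """Behavioural model of one column ADC for one pixel sample.

    Returns (mode, code). All numerical defaults are illustrative
    assumptions, not values taken from the MIMOSA 31 design.
    """
    if signal_mv <= threshold_mv:
        # Inactive mode: no conversion, the ADC sleeps until the next sample.
        return "inactive", 0
    # Active mode: uniform quantization (assumed), clipped to full scale.
    full_scale = 2 ** n_bits - 1
    code = min(int(signal_mv / lsb_mv), full_scale)
    return "active", code


# Example with a hypothetical 2 mV threshold.
for sample_mv in (0.5, 5.2, 100.0):
    print(sample_mv, adc_column_step(sample_mv, threshold_mv=2.0))
```

Since only a few pixels per thousand are hit, almost all samples take the inactive branch in such a model, which is where the power saving comes from.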

This prototype sensor was designed and fabricated in a 0.35 $\mu$m 2-poly 4-metal CMOS process, covering an area of 4 × 4.8 mm<sup>2</sup>. The prototype chip is unpackaged in order to avoid package parasitic capacitances and resistances. The unpackaged chip is mounted on a Printed Circuit Board (PCB) and wire bonded directly on the back of the board in order to perform tests with a <sup>55</sup>Fe source. The test board on which the chip is bonded, named the "proximity board", was designed with minimal features in order to minimize its size and component cost. It includes only a few front-end electronics, such as signal buffering and amplification for the critical signals. The proximity board is interfaced with the DAQ system by two auxiliary boards. The chip integrates 8 analog drivers in order to allow the performance of the readout circuits to be compared. Another board, called the analog auxiliary board, was provided for measuring the analog outputs only. The acquisition of the 8 channels is done by two VME and USB2 imager cards. Each card has four 12-bit, 40 MHz ADCs and four buffers, used to perform the CDS operation, from which the performance of the pixel can be obtained.

MIMOSA 31 has been under test since late November 2012. The laboratory tests were organized in three parts: noise performance, pixel tests and column-level ADC tests. The test results are used to determine the basic performance figures, including temporal noise, fixed pattern noise (FPN), equivalent noise charge (ENC), charge-to-voltage conversion factor (CVF), charge collection efficiency (CCE) and nonlinearity.

The transfer curves, obtained by sweeping the external threshold voltage, characterize the noise performance. From these curves, which follow a cumulative distribution, one can estimate the temporal noise and the FPN. The measured temporal noise and FPN of the pixel array with column ADCs are 1.36 mV ($\sim 23~e^-$) and 0.98 mV ($\sim 16~e^-$), respectively. With the pixel test, the measured equivalent noise charge (ENC) corresponds to 18.6 $e_{rms}^{-}$. The measured charge-to-voltage conversion factor (CVF) obtained for a single pixel is $\sim 60~\mu V/e^-$. In order to study the charge distribution, the performance of pixel clusters was analyzed. The measured charge collection efficiencies (CCE) on p1 (central pixel), p4 ($2 \times 2$ pixels), p9 ($3 \times 3$ pixels) and p25 ($5 \times 5$ pixels) are 18%, 49%, 66% and 74%, respectively. With the ADC test, the nonlinearity has been analyzed. The measured extreme values of the differential nonlinearity (DNL) and integral nonlinearity (INL) are 0.49/-0.28 LSB and 0.29/-0.20 LSB, respectively. The conversion time of the ADC is 80 ns at a sampling frequency of 6.25 MHz. The ADC consumes 486 $\mu$W without hit, which is by far the most frequent case, and 714 $\mu$W with a hit. These values indicate an average power consumption per column of the order of 487 $\mu$W, assuming a typical occupancy of $\sim 0.5\%$ over the whole sensor. This value rises slightly to 489 $\mu$W with a safety factor of 3.
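
The average power figures quoted above can be cross-checked with a short calculation. The sketch below assumes that the average is a simple occupancy-weighted mean of the two measured values and that the safety factor of 3 is applied to the occupancy; both are assumptions about how the numbers were derived, although they do reproduce the quoted 487 $\mu$W and 489 $\mu$W.

```python
P_NO_HIT_UW = 486.0   # measured ADC power without hit (inactive mode)
P_HIT_UW = 714.0      # measured ADC power with hit (active mode)
OCCUPANCY = 0.005     # typical occupancy of ~0.5 %


def mean_column_power_uw(occupancy):
    """Occupancy-weighted mean power per column (assumed model)."""
    return (1.0 - occupancy) * P_NO_HIT_UW + occupancy * P_HIT_UW


typical = mean_column_power_uw(OCCUPANCY)
with_margin = mean_column_power_uw(3 * OCCUPANCY)  # safety factor of 3
print(round(typical), round(with_margin))  # prints: 487 489
```

Because the hit fraction is so small, the average stays within a few $\mu$W of the inactive-mode value even with the safety margin.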

MIMOSA 31 is the first CMOS sensor prototype integrating 4-bit column-level ADCs for the ILC VTX outer layers. The preliminary test results indicate that MIMOSA 31 should meet the requirements of the design specifications. The characterization of MIMOSA 31 will be completed by the beam tests. The prototype sensor was designed with the specifications of the full size sensor (about  $2 \times 2 \text{ cm}^2$ ), and therefore can be easily extended in the future.

#### Perspectives

Improvements to the CMOS pixel sensor for the outer layers have been studied since the sensor prototype was designed and fabricated. Based on the physical model of clusterized hit patterns in MIMOSA 31, a zero suppression algorithm and circuit have been analyzed to reduce the sensor data rate by more than an order of magnitude. Due to the massive beamstrahlung background expected at the ILC, and the correspondingly increased raw data rates, it is necessary to develop a faster, sparsified readout architecture for MIMOSA 31 in the future.

In order to further reduce the power consumption, an ultra-low power self-timed ADC is proposed. The new solution consists of an improved sample-and-hold (S/H) circuit and uses a self-timed technique. The S/H is enhanced by using a single correlated double sampling (CDS) architecture to reduce the fixed pattern noise (FPN) from the pixel and the operational amplifier. This avoids an extra auto-zeroing capacitor, minimizing the power consumption and area while maintaining the conversion of signals within the expected frequency range. The efficiency of the comparator is improved by using a self-timed timing scheme, and the preamplifier settling time is relaxed by the self-timed bit-cycling. Together with the self-timed comparator, the total power consumption is reduced by up to 54% in inactive mode and 40% in active mode, compared with the ADC in MIMOSA 31. The next step is therefore to fabricate a prototype integrating the self-timed ADC and to compare its performance with that of MIMOSA 31.

The prototype sensor, manufactured in a 0.35 $\mu$m CMOS process, has proven to offer attractive performance. However, it is far from exploiting the full potential of CMOS sensor technology, which depends on the fabrication parameters. For example, the number of metal layers, limited to 4, substantially complicates the integration of the ADC. The next step of the development therefore relies on a 0.18 $\mu$m process, which features several improvements with respect to the 0.35 $\mu$m one. This R&D programme is currently being pursued by the IPHC group. With a smaller feature size, the higher intrinsic speed of the circuits will improve the frame rate of the sensor. The power consumption will also be significantly reduced because of the overall reduction of capacitances and the lower supply voltage. Besides the improved ionizing radiation tolerance due to the thinner gate oxide, the number of metal layers is increased to 6-7, making interconnection easier and decreasing the dead zone. The process provides deep wells, allowing both types of transistors to be used inside the pixel circuit. Furthermore, the resistivity of the epitaxial layer amounts to several $k\Omega \cdot cm$, which improves the charge collection efficiency.

# Appendix A

# JTAG Configurations

The instruction register of the JTAG controller is loaded with the code of the desired operation to perform or with the code of the desired data register to access. Table A.1 shows the JTAG instruction registers in MIMOSA 31.

Table A.1: JTAG instruction registers

| Instruction | 5-Bit Code<sub>16</sub> | Selected Register | Size |
|-------------|-------------------------|-------------------|------|
| ID_CODE     | 0E                        | DEV_ID            | 32   |
| BIAS_GEN    | 0F                        | BIAS_DAC          | 160  |
| PATT_LINE0  | 10                        | PATTERN_LINE0     | 96   |
| DIS_ADC     | 11                        | DIS_ADC           | 48   |
| PIX_SEQ     | 12                        | SEQ_PIX           | 112  |
| MONITORING1 | 13                        | MONITORING1       | 30   |
| PATT_LINE1  | 14                        | PATT_LINE1        | 96   |
| ADC_SEQ     | 15                        | SEQUENCER_ADC     | 160  |
| MONITORING2 | 17                        | MONITORING2       | 8    |
| RO_MODE2    | 1C                        | ReadOut Mode 2    | 8    |
| RO_MODE1    | 1D                        | ReadOut Mode 1    | 8    |
| RO_MODE0    | 1E                        | ReadOut Mode 0    | 8    |
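
For test software, the instruction set of Table A.1 can be held in a small lookup table. The sketch below is only one possible representation: the names `JTAG_INSTRUCTIONS` and `instruction_bits` are hypothetical, and the bit string is written MSB first, since the shift order on the JTAG chain is not specified here.

```python
# Instruction name -> (5-bit code, size in bits of the selected register),
# transcribed from Table A.1.
JTAG_INSTRUCTIONS = {
    "ID_CODE": (0x0E, 32),
    "BIAS_GEN": (0x0F, 160),
    "PATT_LINE0": (0x10, 96),
    "DIS_ADC": (0x11, 48),
    "PIX_SEQ": (0x12, 112),
    "MONITORING1": (0x13, 30),
    "PATT_LINE1": (0x14, 96),
    "ADC_SEQ": (0x15, 160),
    "MONITORING2": (0x17, 8),
    "RO_MODE2": (0x1C, 8),
    "RO_MODE1": (0x1D, 8),
    "RO_MODE0": (0x1E, 8),
}


def instruction_bits(name):
    """Return the 5-bit instruction code as a binary string, MSB first."""
    code, _size = JTAG_INSTRUCTIONS[name]
    return format(code, "05b")


print(instruction_bits("BIAS_GEN"))  # prints: 01111
```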

## A.1 DEV\_ID Register

The device identification register has a fixed value inside the chip, as shown in table A.2.

| Bit  | Bit Name | Default Value Code<sub>16</sub> | ASCII | HEX |
|------|----------|----------------------------------|-------|-----|
| 31-0 | ID_CODE  | 4D333101                         | M     | 4D  |
|      |          |                                  | 3     | 33  |
|      |          |                                  | 1     | 31  |
|      |          |                                  | SOH   | 01  |

Table A.2: ID code of MIMOSA 31
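
The byte-wise reading of the ID code can be checked in a few lines. This is only an illustrative decoding of the value in Table A.2; the variable names are hypothetical.

```python
ID_CODE = 0x4D333101  # fixed DEV_ID value from Table A.2

raw = ID_CODE.to_bytes(4, "big")     # most significant byte first
chip_name = raw[:3].decode("ascii")  # the three printable characters
last_byte = raw[3]                   # 0x01, the ASCII control code SOH

print(chip_name, hex(last_byte))  # prints: M31 0x1
```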

## A.2 BIAS\_DAC Register

The BIAS\_DAC register sets 20 DAC registers simultaneously, as shown in table A.3.

| Bit Range | DAC Number | Internal Name  | Description                |
|-----------|------------|----------------|----------------------------|
| 159-152   | DAC19      | outVclpPix     | Pixel clamping voltage     |
| 151-144   | DAC18      | iPix           | Pixel source follower bias |
| 143-128   | DAC17-16   | iRefTest<1:0>  | Reference test voltage     |
| 127-120   | DAC15      | iRefADCCM      | Common mode voltage        |
| 119-56    | DAC14-7    | irefADC<7:0>   | ADC reference voltage      |
| 55-48     | DAC6       | iBiasADC<2>    | S/H bias                   |
| 47-40     | DAC5       | iBiasADC<1>    | Amplifier bias             |
| 39-32     | DAC4       | iBiasADC<0>    | Buffer bias                |
| 31-24     | DAC3       | iLVDSRx        | LVDS PAD bias              |
| 23-16     | DAC2       | iLVDSTx        | LVDS PAD bias              |
| 15-8      | DAC1       | iBiasAnaDriver | Analogue Buffer bias       |
| 7-0       | DAC0       | iBiasBuffer    | Reference Buffer bias      |

Table A.3: Bias generation register
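
As an illustration of the register layout, the 160-bit BIAS_DAC word can be assembled from 20 DAC settings. The sketch assumes that each DAC takes 8 consecutive bits with DAC0 at bits 7-0 and DAC19 at bits 159-152, as the table suggests; the function name `pack_bias_dac` and the example DAC values are hypothetical and are not recommended settings.

```python
def pack_bias_dac(dac_values):
    """Pack 20 8-bit DAC codes into one 160-bit integer.

    dac_values[k] is the code of DACk, placed at bits 8k+7 .. 8k.
    """
    if len(dac_values) != 20:
        raise ValueError("expected exactly 20 DAC values")
    word = 0
    for k, value in enumerate(dac_values):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"DAC{k} value {value} does not fit in 8 bits")
        word |= value << (8 * k)
    return word


# Hypothetical example: only DAC19 (outVclpPix) and DAC0 (iBiasBuffer) set.
example = [0] * 20
example[19] = 0xAB
example[0] = 0x12
bias_word = pack_bias_dac(example)
print(f"{bias_word:040X}")  # 160 bits = 40 hex digits
```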

## A.3 PATT\_LINE Register

The 'patt\_line' register (192 bits wide) emulates the ADC outputs. Two modes control its use: en\_patt\_only and en\_linemarker. When en\_patt\_only is active (high level), the pixel matrix is ignored and replaced by a virtual matrix made up of 'patt\_line0' and 'patt\_line1'. This test mode emulates the pixel response with the contents of patt\_line0 and patt\_line1 in order to verify the digital processing. In the en\_linemarker mode, two rows are added at the end of the matrix in the readout output.

Table A.4: Pattern line register

| Bit Range | Register Name | Basic Configuration Code<sub>16</sub> |
|-----------|---------------|----------------------------------------|
| 191-96    | patt_line0    | AAAA                                   |
| 95-0      | patt_line1    | AAFFFF1                                |

## A.4 DIS\_ADC Register

This register is used to disable the ADC of a specific column if it is noisy. The default value of the DIS\_ADC register is 0. In MIMOSA 31, DisADC<47> is on the left side while DisADC<0> is on the right side.

Table A.5: Disable ADC register

| 47 MSB     | 0 LSB     |
|------------|-----------|
| DisADC<47> | DisADC<0> |
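
A 48-bit DIS\_ADC value can be built from a list of columns to switch off. The sketch assumes that setting bit DisADC<n> to 1 disables the corresponding ADC, which is consistent with the default value of 0 but is not stated explicitly; the function name `dis_adc_mask` and the column indices in the example are hypothetical.

```python
N_COLUMNS = 48  # one DisADC bit per column ADC


def dis_adc_mask(noisy_columns):
    """Return the DIS_ADC value with DisADC<n> set for each listed index n."""
    mask = 0
    for index in noisy_columns:
        if not 0 <= index < N_COLUMNS:
            raise ValueError(f"DisADC index {index} out of range")
        mask |= 1 << index  # assumed polarity: 1 = ADC disabled
    return mask


# Hypothetical example: disable the leftmost and rightmost column ADCs.
mask = dis_adc_mask([47, 0])
print(f"{mask:012X}")  # prints: 800000000001
```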

## A.5 PIX\_SEQ Register

The PIX\_SEQ register is 112 bits wide. It contains all the parameters needed to generate the pixel readout sequence, as shown in table A.6.

| Bit Range | Bit Name   | Signal Name | Basic Config |
|-----------|------------|-------------|--------------|
| 111-96    | DataRdPix  | sel_row_int | FFFF         |
| 95-80     | DataClp    | clamp       | 01C0         |
| 79-64     | DataCalib  | calib       | FC00         |
| 63-48     | DataRdDsc  | read        | 001F         |
| 47-32     | DataLatch  | latch       | C000         |
| 31-0      | DataPwrOn  | pwr_on      | FFFFFFFF     |

Table A.6: Pixel sequencer configuration
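
The basic configuration of Table A.6 can be assembled into a single 112-bit word by placing each field at its bit range. This is a minimal sketch: the helper name `pack_fields` and the tuple layout are hypothetical, and the order in which the bits are shifted into the chip is not covered.

```python
# (field name, MSB, LSB, basic configuration value), from Table A.6.
PIX_SEQ_FIELDS = [
    ("DataRdPix", 111, 96, 0xFFFF),
    ("DataClp", 95, 80, 0x01C0),
    ("DataCalib", 79, 64, 0xFC00),
    ("DataRdDsc", 63, 48, 0x001F),
    ("DataLatch", 47, 32, 0xC000),
    ("DataPwrOn", 31, 0, 0xFFFFFFFF),
]


def pack_fields(fields):
    """Place each value at its bit range and return the combined integer."""
    word = 0
    for name, msb, lsb, value in fields:
        width = msb - lsb + 1
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lsb
    return word


pix_seq_word = pack_fields(PIX_SEQ_FIELDS)
print(f"{pix_seq_word:028X}")  # prints: FFFF01C0FC00001FC000FFFFFFFF
```

The same helper could be reused for any other register that is described as a list of bit ranges.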

## A.6 ADC\_SEQ Register

The ADC\_SEQ register (160 bits wide) contains all the parameters needed to generate the ADC sequence, as shown in table A.7.

| Bit Range | Bit Name      | Signal Name  | Basic Config |
|-----------|---------------|--------------|--------------|
| 159-144   | dclkadc       | clkadc       | AAAA         |
| 143-128   | dsleepmask    | sleepmask    | 00FF         |
| 127-112   | doffsetcancel | offsetcancel | 000F         |
| 111-96    | dsample1_0    | sample1_0    | 0000         |
| 95-80     | dsample1_1    | sample1_1    | 7FFE         |
| 79-64     | dsample2_0    | sample2_0    | 7FFE         |
| 63-48     | dsample2_1    | sample2_1    | 0000         |
| 47-32     | dinitadc      | initadc      | 000E         |
| 31-16     | Unused        | -            | -            |
| 15-0      | Unused        | -            | -            |

Table A.7: ADC sequencer configuration

# Appendix B

# Schematic of the Test Board



Figure B.1: First page of the schematic.



Figure B.2: Second page of the schematic.



Figure B.3: Third page of the schematic.