

# Multilevel Half-Rate Phase Detector Using the Gate Diffusion Input Technique to Recover Data and Clocks

# Kanagavalli S A<sup>1</sup> & Padma S I<sup>2</sup> & Mustafa Nawaz S M<sup>3</sup>

<sup>1</sup>PG SCHOLAR, PET ENGINEERING COLLEGE <sup>2</sup>ASSISTANT PROFESSOR, PET ENGINEERING COLLEGE <sup>3</sup>ASSISTANT PROFESSOR, PET ENGINEERING COLLEGE

**Abstract** – Recent communication systems in the electronics industry focus on high-speed signal processing applications, with the assistance of optical-to-electrical data communication and multiplexing techniques such as TDMA, CDMA, and OFDMA. Such communication places a high priority on recovering the clock and reducing jitter in the clock and data, so that synchronous operations such as retiming and demodulation perform well. This work presents a Half-Rate (HR) Bang-Bang (BB) Phase Detector (PD) technique for clock and data recovery that makes multiple decisions about the sign and magnitude of the phase shift between the PD inputs. The proposed architecture introduces the GDI (Gate Diffusion Input) technique to reduce the number of transistors in the HR-BBPD and ML-HR-BBPD used for data and clock recovery. The designs are implemented in 90nm CMOS technology using the DSCH3 and MICROWIND software, and the CMOS and GDI versions are compared in terms of area, delay, and power.

**Key Words:** GDI (Gate Diffusion Input), Half Rate (HR), Phase Detector (PD), Bang Bang (BB), Multilevel (ML), ML-HR-BBPD.

## **1. INTRODUCTION**

Clock and data recovery (CDR) is a key function in many serial communication systems, from optical to electrical communications, and especially in high-speed signaling. The performance of the clock recovery is crucial for the reliability of the communication system, in particular for synchronous operations such as the retiming and demodulation of the input data. Jitter in the clock, defined as the uncertainty in the edge placement in the clock waveform, distorts the data signal waveforms. This jitter translates into a deviation of the oscillator phase from its ideal value, which results in phase noise. Although other systems such as delay-locked loops or phase-interpolator-based CDRs are used in some cases, phase-locked loops (PLLs) are the most widespread systems for implementing a reference-less CDR. A PLL-based CDR is composed of a voltage-controlled oscillator (VCO), which generates the required clock; a phase detector (PD), which compares the phase of the generated clock with that of the randomized input data; and a charge pump (CP), which charges or discharges a loop filter (LF) to generate the required control signal for the VCO. The PD is one of the critical blocks of the CDR, as it determines the phase error between the input data and the clock, which sets the control voltage for the VCO and therefore the correct alignment between the clock and data edges. Although a linear PD is sometimes used, a binary or bang-bang (BB) PD is usually preferred in high-speed CDRs because of its simplicity, good phase adjustment, high-speed operation, and low power. The BBPD provides a binary output, which gives information about the sign of the phase shift between its inputs, i.e., whether the clock is lagging or leading the input data. The Alexander PD and its variations, such as the inverse Alexander PD in which the outputs (Early and Late) are inverted (Late and Early), are the most commonly used PDs in high-speed designs.
Other topologies have been presented, but with increased complexity. All these Alexander-based PDs work at a full-rate clock frequency, which means that the VCO oscillates at the same frequency as the data rate of the input data. At high speed, a half-rate PD (HR-PD) is very useful to relax the requirements of the VCO and increase the throughput of the system. CDRs implemented with an HR-PD sense the input data at full rate but use a VCO running at half the input rate. This technique also relaxes the speed requirement of the PD. In this work, we propose a new multilevel HR BB PD (ML-HR-BBPD). Thanks to the ML operation, which provides information about both the sign and the magnitude of the phase difference between the PD inputs, the bit error rate (BER) of the output data and the jitter of the clock generated with a PLL-based CDR are improved compared with the conventional two-level HR-PD. Although ML-BBPDs have already been proposed in some PLL-based CDRs with very interesting results, to the best of our knowledge they have never been proposed for an HR system. The main objective of this work is, therefore, to provide an ML alternative to the conventional HR-PD and to compare the two topologies. For that purpose, the two PDs have been included in a PLL-based CDR system that is used as a testbench for the comparison. The conventional single-level HR-PD topology is presented first, followed by the proposed ML-HR-PD and the details of its sub-blocks. The main performance results of the proposed detector in a 5 Gb/s HR CDR circuit in 28nm FDSOI are then compared with those of the conventional detector.

A VCO with a typical gain of 0.5 GHz/V generates the required clock signal. In the proposed ML-HR-PD topology, four phases of the clock are needed (0°, 45°, 90°, 135°) plus the negated one (180°), while the conventional topology needs two phases (the clock at 0° and its quadrature replica at 90°), their respective opposite (180°), and an extra delayed phase. Although coupled LC-VCOs could be used to generate the required phases, a four-stage differential ring oscillator is preferred in applications where phase noise is not critical, as it offers lower power consumption and area as well as direct generation of the required clock phases. The same number of stages is required for both PD topologies, so the proposed topology adds no extra difficulty or power. State-of-the-art VCOs allow an estimated power consumption as low as 180 µW at frequencies higher than 5 GHz.
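The difference between the binary and the multilevel detector described in this introduction can be sketched behaviorally. The Python model below is only an illustration: the function names, the sign convention (positive error returns a positive output), the 45° decision threshold, and the output levels 1 and 2 are all assumptions and are not taken from the circuit itself.

```python
def bang_bang_pd(phase_error_deg: float) -> int:
    """Two-level (binary) detector: only the sign of the phase error survives."""
    if phase_error_deg > 0:
        return 1
    if phase_error_deg < 0:
        return -1
    return 0


def multilevel_pd(phase_error_deg: float, threshold_deg: float = 45.0) -> int:
    """Multilevel detector: sign plus a coarse magnitude (1 = small, 2 = large).

    threshold_deg is an assumed decision boundary, chosen for illustration only.
    """
    sign = bang_bang_pd(phase_error_deg)
    magnitude = 2 if abs(phase_error_deg) >= threshold_deg else 1
    return sign * magnitude


if __name__ == "__main__":
    # Print the binary and multilevel decisions side by side.
    for error in (-60.0, -10.0, 10.0, 60.0):
        print(error, bang_bang_pd(error), multilevel_pd(error))
```

In this sketch the binary detector gives the same correction for a 10° and a 60° error, while the multilevel one distinguishes them, which is the extra information a loop can use to correct large errors more strongly.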



Fig 1: GDI-ML-HR-BBPD DESIGN

## **2. METHODOLOGIES**

Intensified research in low-power, high-speed embedded systems such as mobiles and laptops has led VLSI technology to scale down to the nanometre regime, allowing more functionality to be integrated on a single chip. The wish to improve the performance of logic circuits, once based on traditional CMOS technology, has resulted in the development of many logic design techniques during the last two decades. One form of logic that is popular in low-power digital circuits is pass-transistor logic (PTL). Formal methods for deriving pass-transistor logic have been presented for nMOS. They are based on a model in which a set of control signals is applied to the gates of the nMOS transistors and another set of data signals is applied to their sources. Some of the main advantages of PTL over standard CMOS design are 1) high speed, due to the small node capacitances; 2) low power dissipation, as a result of the reduced number of transistors; and 3) lower interconnection effects, due to the small area. However, the implementation of PTL has two basic problems: 1) operation is slow at reduced power supply, because the threshold voltage drop across the single-channel pass transistor results in low drive current; and 2) because the high input voltage level at the regenerative inverter is not Vdd, the PMOS device in the inverter is not fully turned OFF, and hence the direct-path static power dissipation is significant. GDI is a technique suitable for the design of fast, low-power circuits using a reduced number of transistors compared with traditional CMOS design and existing PTL techniques.
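Both PTL problems stem from the same threshold-voltage drop, which can be written as a first-order relation. This is only a rough sketch that ignores body effect; the symbols $V_{Tn}$ (nMOS threshold voltage), $V_{in,high}$ (high level reaching the inverter input) and $V_{SG,p}$ (source-gate voltage of the inverter PMOS) are introduced here for illustration and do not appear in the original text.

$$V_{in,high} = V_{dd} - V_{Tn}, \qquad V_{SG,p} = V_{dd} - V_{in,high} = V_{Tn}$$

Under these assumptions the PMOS of the inverter sees a source-gate voltage of about $V_{Tn}$ instead of 0 V, so it sits near the edge of conduction rather than fully OFF, which is the origin of the direct-path static current.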



Fig 2: GDI NAND SCHEMATIC

Fig 2 shows the GDI NAND gate, which is equivalent to a GDI AND gate followed by a NOT gate. Here the input B is applied at the drain of NMOS (n1). The output of the GDI NAND gate is high if any of the inputs is low, as shown in Table 1.

| INPUT A | INPUT B | OUTPUT Y |
|---------|---------|----------|
| 0       | 0       | 1        |
| 0       | 1       | 1        |
| 1       | 0       | 1        |
| 1       | 1       | 0        |

Table 1: NAND Gate Truth Table



Fig 3: GDI XOR SCHEMATIC

Fig 3 shows the GDI XOR gate. It is functionally similar to the CMOS XOR gate but takes only 4 transistors; compared with the conventional CMOS XOR, it uses far fewer transistors and therefore less area, delay, and power. The truth table of the GDI XOR gate is shown in Table 2.

| INPUT A | INPUT B | OUTPUT Y |
|---------|---------|----------|
| 0       | 0       | 0        |
| 0       | 1       | 1        |
| 1       | 0       | 1        |
| 1       | 1       | 0        |

Table 2: XOR Gate Truth Table
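The NAND and XOR gates described above each combine one GDI cell with an inverter, which accounts for the 4 transistors. A minimal switch-level sketch in Python follows. It assumes an idealized GDI cell that passes its P input when the gate G is 0 and its N input when G is 1, with full-swing logic levels; the function names are hypothetical and the input assignments are the commonly used ones, not taken from the schematics.

```python
from itertools import product


def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: the pMOS passes P when G = 0, the nMOS passes N when G = 1."""
    return n if g else p


def gdi_not(a: int) -> int:
    # Inverter: P tied to logic 1, N tied to logic 0 (2 transistors).
    return gdi_cell(a, 1, 0)


def gdi_nand(a: int, b: int) -> int:
    # GDI AND (P = 0, N = B) followed by a GDI inverter: 4 transistors in total.
    return gdi_not(gdi_cell(a, 0, b))


def gdi_xor(a: int, b: int) -> int:
    # The cell passes B when A = 0 and NOT B when A = 1; with the inverter: 4 transistors.
    return gdi_cell(a, b, gdi_not(b))


if __name__ == "__main__":
    # Columns: A, B, NAND, XOR (compare with Tables 1 and 2).
    for a, b in product((0, 1), repeat=2):
        print(a, b, gdi_nand(a, b), gdi_xor(a, b))
```

The model ignores threshold drops and drive strength, so it only checks the logic function, not the electrical behavior.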



Fig 3: GDI AND SCHEMATIC

Fig 3 (GDI AND schematic) shows the GDI AND gate. It is functionally similar to the conventional CMOS AND gate but takes only 2 transistors; compared with the conventional CMOS AND, it uses far fewer transistors and therefore less area, delay, and power. The truth table of the GDI AND gate is shown in Table 3.

| INPUT A | INPUT B | OUTPUT Y |
|---------|---------|----------|
| 0       | 0       | 0        |
| 0       | 1       | 0        |
| 1       | 0       | 0        |
| 1       | 1       | 1        |

Table 3: AND Gate Truth Table

e-ISSN: 2395-0056 p-ISSN: 2395-0072

It can be seen that a large number of functions can be implemented using the basic GDI cell. The MUX is the most complex function that can be implemented with a single GDI cell: it requires only 2 transistors, whereas it requires 8-12 transistors with traditional CMOS or PTL design. Many functions can be implemented efficiently in GDI in terms of transistor count. Table 4 shows the comparison between GDI and the static CMOS design in terms of transistor count. From this comparison it can be seen that, using the GDI technique, AND, OR, Function1, Function2, XOR, and XNOR can be implemented more efficiently. However, NAND and NOR each require 4 transistors, the same as in static CMOS design. NAND and NOR are the universal logic gates, with which any Boolean function can be implemented, and they are very efficient and popular in the static design style. Function1 and Function2 form a universal set for GDI and consist of only two transistors each, compared with NAND and NOR. These functions can be used to synthesize other functions more effectively than NAND and NOR gates. Hence the circuit with the minimum power-delay product is taken as the trade-off and is considered the circuit that incorporates maximum optimization. Efforts are being made to implement a D-FF with the minimum possible power dissipation. After all the study that has been made of the various circuits, the GDI technique
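The single-cell functions named in this section can be checked with a small Boolean model. The sketch below assumes the usual GDI input assignments (gate G, pMOS source P, nMOS source N), with Function1 taken as A'B and Function2 as A' + B; these assignments and all function names are assumptions for illustration, since the text does not list them.

```python
def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized 2-transistor GDI cell: output follows P when G = 0, N when G = 1."""
    return n if g else p


def f1(a: int, b: int) -> int:
    # Function1 = A'B: G = A, P = B, N = 0.
    return gdi_cell(a, b, 0)


def f2(a: int, b: int) -> int:
    # Function2 = A' + B: G = A, P = 1, N = B.
    return gdi_cell(a, 1, b)


def gdi_or(a: int, b: int) -> int:
    # OR: G = A, P = B, N = 1.
    return gdi_cell(a, b, 1)


def gdi_and(a: int, b: int) -> int:
    # AND: G = A, P = 0, N = B.
    return gdi_cell(a, 0, b)


def gdi_mux(a: int, b: int, c: int) -> int:
    # MUX = A'B + AC: G = A (select), P = B, N = C.
    return gdi_cell(a, b, c)


if __name__ == "__main__":
    # Columns: A, B, F1, F2, OR, AND; each function uses one cell (2 transistors).
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, f1(a, b), f2(a, b), gdi_or(a, b), gdi_and(a, b))
```

Each function above is a single call to `gdi_cell`, which mirrors the 2-transistor count quoted for these functions, including the MUX.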

## **RESULT AND DISCUSSION**



Fig 4: ML\_HR\_BBPD MICROWIND DESIGN.

| Property                    | Value                                                          |
|-----------------------------|----------------------------------------------------------------|
| Technology and design rules | CMOS 90nm, 6 Metal Copper - strained SiGe - LowK (default.rul) |
| Boxes                       | 22655/300000 (7.0% full)                                       |
| Text                        | 65/2090 (3.3% full)                                            |
| Main array                  | 1505 x 68 (30.1% full)                                         |
| Layout width                | 101.7 µm (2034 lambda)                                         |
| Layout height               | 50.2 µm (1004 lambda)                                          |
| Layout surface              | 5105.3 µm2 (0.0 mm2)                                           |

Fig 5: ML\_HR\_BBPD DESIGN PROPERTIES.

| Parameter      | CMOS HR_BBPD | CMOS ML_HR_BBPD | GDI HR_BBPD | GDI ML_HR_BBPD |
|----------------|--------------|-----------------|-------------|----------------|
| Symbols        | 224          | 363             | 195         | 293            |
| Node           | 90           | 151             | 76          | 113            |
| Lines          | 50           | 52              | 37          | 52             |
| Delay Register | 845          | 1363            | 575         | 881            |

Table 4: COMPARISON CMOS & GDI (Multilevel Half-Rate Phase Detector for Clock and Data Recovery Circuits, Dsch31)

| Parameter          | CMOS HR_BBPD | CMOS ML_HR_BBPD | GDI HR_BBPD | GDI ML_HR_BBPD |
|--------------------|--------------|-----------------|-------------|----------------|
| Width (µm)         | 102.1        | 101.7           | 101.7       | 101.7          |
| Width (lambda)     | 2042         | 2034            | 2034        | 2034           |
| Height (µm)        | 34.6         | 58.6            | 33.4        | 50.2           |
| Height (lambda)    | 692          | 1172            | 668         | 1004           |
| Surface area (µm2) | 3532.7       | 5959.6          | 3396.8      | 5105.3         |
| Power (mW)         | 0.263        | 0.831           | 0.154       | 0.431          |
| Boxes              | 16296        | 26941           | 14777       | 22655          |

Table 5: COMPARISON CMOS & GDI PROPERTIES (Multilevel Half-Rate Phase Detector for Clock and Data Recovery Circuits, Microwind31)
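The CMOS and GDI figures reported in Tables 4 and 5 can be turned into relative savings with a few lines of Python. The sketch below only re-uses the tabulated surface area, power, and "Delay Register" values; the dictionary layout and the helper name `percent_reduction` are illustrative assumptions, and the Delay Register entry is treated as a plain number because the tables give no unit for it.

```python
def percent_reduction(cmos_value: float, gdi_value: float) -> float:
    """Relative saving of the GDI design with respect to the CMOS design, in percent."""
    return 100.0 * (cmos_value - gdi_value) / cmos_value


# (CMOS value, GDI value) pairs copied from Tables 4 and 5.
RESULTS = {
    "HR_BBPD": {
        "area_um2": (3532.7, 3396.8),
        "power_mw": (0.263, 0.154),
        "delay_register": (845, 575),
    },
    "ML_HR_BBPD": {
        "area_um2": (5959.6, 5105.3),
        "power_mw": (0.831, 0.431),
        "delay_register": (1363, 881),
    },
}

# Percentage reduction for every design and metric.
REDUCTIONS = {
    design: {metric: percent_reduction(*pair) for metric, pair in metrics.items()}
    for design, metrics in RESULTS.items()
}

if __name__ == "__main__":
    for design, metrics in REDUCTIONS.items():
        for metric, value in metrics.items():
            print(f"{design:11s} {metric:15s} {value:5.1f} % lower with GDI")
```

With the tabulated values this gives a power saving of roughly 41% for the HR_BBPD and 48% for the ML_HR_BBPD, while the area saving is smaller, about 4% and 14% respectively.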

## **CONCLUSION**

In this work, all the logic gates were designed using the GDI (Gate Diffusion Input) technique, and those gates were then used to construct the DFF, HR-BBPD, and ML-HR-BBPD. The CMOS and GDI designs were compared in terms of area, delay, and power in 90nm CMOS technology. In the Microwind results, the GDI versions consume 0.154 mW (HR-BBPD) and 0.431 mW (ML-HR-BBPD) against 0.263 mW and 0.831 mW for CMOS, with surface areas of 3396.8 µm2 and 5105.3 µm2 against 3532.7 µm2 and 5959.6 µm2. A large number of functions can be implemented using the basic GDI cell. The MUX is the most complex function that can be implemented with a single GDI cell: it requires only 2 transistors, whereas it requires 8-12 transistors with traditional CMOS or PTL design. Many functions can be implemented efficiently in GDI in terms of transistor count.
