## SA 19.1: A 10Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission

Richard C. Walker, Kuo-Chiang Hsieh, Thomas A. Knotts, Chu-Sun Yen

Hewlett-Packard Laboratories, Palo Alto, CA

With Internet host counts doubling every five to seven months, there is a pressing need for high-speed interconnect circuits in routers, switches, and computer systems [1]. These transmitter (TX) and receiver (RX) chips for 10Gb/s serial data transmission provide clock generation, 16:1 multiplexing, clock recovery, 1:16 demultiplexing and loss of signal (LOS) detection. No errors are seen in a 24-hour test period across 21 feet of 0.190" diameter coax, implying a BER better than  $10^{-14}$  and demonstrating the feasibility of short-distance 10Gb/s transmission using copper-based links. The monolithic 3.0W TX and 5.5W RX chips are implemented in a 25GHz f<sub>T</sub> Si-bipolar process [2]. A data rate of 40% of f<sub>T</sub> is the highest reported architectural performance for a chipset at this level of integration.
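The step from an error-free 24-hour run to a BER bound can be checked with a short calculation. The sketch below assumes independent bit errors and a 95% confidence level. Both are assumptions of this illustration rather than details given in the text, and the function names are hypothetical.

```python
import math

def bits_transmitted(rate_bps: float, seconds: float) -> float:
    """Total number of bits sent at a constant line rate."""
    return rate_bps * seconds

def ber_upper_bound(n_bits: float, confidence: float) -> float:
    """Upper bound on BER after n_bits with zero observed errors.

    Assumes independent errors, so P(no errors) = (1 - BER)^n ~ exp(-n * BER).
    Solving exp(-n * BER) = 1 - confidence gives the bound.
    """
    return -math.log(1.0 - confidence) / n_bits

n = bits_transmitted(10e9, 24 * 3600)   # 24 hours at 10 Gb/s = 8.64e14 bits
bound = ber_upper_bound(n, 0.95)        # assumed 95% confidence level
print(f"bits sent: {n:.3e}")
print(f"BER upper bound: {bound:.2e}")
```

Under these assumptions the bound falls below $10^{-14}$, consistent with the figure quoted above.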

A simplified block diagram of the TX chip is shown in Figure 1. 16-bit parallel data is provided to the chip along with a 625MHz clock. The on-chip PLL locks the internal 2.5GHz multi-phase VCO to the incoming clock using a wide-range phase/frequency detector. The input data is serialized in two stages and drives a high-speed output amplifier.
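A two-stage 16:1 serialization can be sketched behaviorally as follows. The split into four 4:1 multiplexers followed by a final 4:1 multiplexer, and the bit ordering, are assumptions of this illustration; the text states only that serialization takes two stages. The function names `mux4` and `serialize16` are hypothetical.

```python
from typing import List

def mux4(lanes: List[List[int]]) -> List[int]:
    """Round-robin 4:1 multiplexer: interleave four equal-length lanes."""
    assert len(lanes) == 4
    return [bit for group in zip(*lanes) for bit in group]

def serialize16(words: List[List[int]]) -> List[int]:
    """Two-stage 16:1 serialization of 16-bit words (assumed bit order).

    Stage 1: four 4:1 muxes turn 16 inputs at 625 Mb/s into 4 lanes at 2.5 Gb/s.
    Stage 2: one 4:1 mux interleaves the 4 lanes into a single 10 Gb/s stream.
    """
    inputs = [[w[i] for w in words] for i in range(16)]   # 16 parallel inputs
    # Lane k carries bits k, k+4, k+8, k+12 so that stage 2 emits bits in order.
    lanes = [
        mux4([inputs[k], inputs[k + 4], inputs[k + 8], inputs[k + 12]])
        for k in range(4)
    ]
    return mux4(lanes)

word_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
word_b = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0]
serial = serialize16([word_a, word_b])
print(serial == word_a + word_b)
```

The rates follow from the clocks named in the text: 625MHz × 4 = 2.5GHz, and 2.5GHz × 4 phases gives 10Gb/s.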

The RX chip uses a multiphase 2.5GHz clock generator consisting of a 4-stage (8-phase) ring oscillator followed by phase interpolators (Figure 2). The four main clock phases at $0^{\circ}$, $90^{\circ}$, $180^{\circ}$ and $270^{\circ}$ are used to sequentially latch the incoming data bits. An intermediate $45^{\circ}$ phase and two non-critical phases at $22^{\circ}$ and $67^{\circ}$ are used to implement phase and data-lock detectors. The PLL and state machine are otherwise identical to those previously described [3].
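The relationship between the four main clock phases and the incoming bits can be illustrated with a small timing model. It assumes ideal clocks and that bit i is latched by phase i mod 4. The constant and function names are hypothetical.

```python
CLOCK_PERIOD_PS = 400.0          # one period of the 2.5 GHz clock
UNIT_INTERVAL_PS = 100.0         # one bit period at 10 Gb/s
PHASES_DEG = (0, 90, 180, 270)   # main sampling phases from the text

def sample_times_ps(n_cycles):
    """Sampling instants (ps) of each phase over n_cycles clock periods."""
    return {
        p: [c * CLOCK_PERIOD_PS + p / 360.0 * CLOCK_PERIOD_PS
            for c in range(n_cycles)]
        for p in PHASES_DEG
    }

def demux_1_to_4(bits):
    """Assign serial bit i to the phase that latches it (i mod 4)."""
    return {p: bits[k::4] for k, p in enumerate(PHASES_DEG)}

times = sample_times_ps(2)
lanes = demux_1_to_4([1, 0, 1, 1, 0, 0, 1, 0])
print(times[90])   # [100.0, 500.0]
print(lanes[0])    # [1, 0]
```

Adjacent phases are 90° apart, which is 100ps or one unit interval at 10Gb/s, so each phase latches every fourth bit at 2.5Gb/s. The intermediate 45° phase falls 50ps, or half a unit interval, after the 0° sample.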

A conventional 8-phase 2.5GHz ring oscillator requires four 50ps delay stages, with each delay stage contributing 45° to the oscillation period (Figure 3a). Because it is not possible to achieve a 50ps variable-delay cell over process and temperature, a "leap-frog" VCO structure is implemented to accommodate larger nominal gate delays. In Figure 3b, the input of delay stage n is phase advanced by summing in a signal from the output of stage n-2. Each stage of a 2.5GHz oscillator contributes 67.5° (75ps) rather than the 45° (50ps) of a conventional circuit.
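The delay figures quoted above follow directly from the per-stage phase shift and the 400ps period of a 2.5GHz oscillation. A minimal check, using a hypothetical helper name:

```python
def stage_delay_ps(phase_deg: float, freq_hz: float) -> float:
    """Delay corresponding to a given phase shift at the oscillation frequency."""
    period_ps = 1e12 / freq_hz
    return phase_deg / 360.0 * period_ps

conventional = stage_delay_ps(45.0, 2.5e9)   # conventional ring: 45 degrees per stage
leap_frog = stage_delay_ps(67.5, 2.5e9)      # leap-frog ring: 67.5 degrees per stage
print(conventional, leap_frog)               # 50.0 75.0
print(leap_frog / conventional)              # 1.5
```

The leap-frog structure therefore tolerates a gate delay 1.5 times longer than the conventional ring at the same oscillation frequency.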

In the RX, a three-stage amplifier, shown in Figure 4, conditions incoming 10Gb/s data. Each stage is an $f_T$-doubler circuit formed by two cross-coupled differential pairs (an architecture often used in small-signal linear amplifiers) [4]. The enhanced small-signal bandwidth of this circuit improves the jitter performance of the amplifier. A similar circuit is used in the TX output driver.

A conventional multi-clock-phase 4-to-1 data multiplexer (MUX), shown in Figure 5a, experiences an unequal propagation delay at the output MXOUT from clocks applied to lower CLK00 and upper CLK90 switches. This effect produces unequal data widths at the output. An improved version of a 4:1 data MUX is shown in Figure 5b, where $f_T$-doubler circuits replace the lower switches. A primary 4:1 MUX core (Q1 to Q8) and a replica (Q11 to Q18) are connected so that CLK00 is applied, with proper level shift, at the lower and upper switches of the primary and replica cores. A similar connection is applied from CLK90. The different propagation delays from CLK00 and CLK90 to the output are now common to all four data channels. Combining the $f_T$-doubler and the replica MUX, an equal-data-width 4:1 10Gb/s MUX is achieved.
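The effect of mismatched clock-to-output delays on data width can be illustrated with a toy timing model. It assumes that output transitions are launched alternately by CLK00 and CLK90 edges, nominally one unit interval apart. The delay values are hypothetical and chosen only for illustration, and treating the improved circuit as having equal effective delays is a simplification of the replica arrangement described above.

```python
def output_bit_widths_ps(d_clk00_ps, d_clk90_ps, ui_ps=100.0):
    """Widths of the four output bits in one 2.5 GHz clock period.

    Each output transition is shifted from its nominal time by the
    clock-to-output delay of the clock that launches it.
    """
    nominal = [k * ui_ps for k in range(5)]                # 0, 100, ..., 400 ps
    # Launching clock per transition: CLK00, CLK90, CLK00, CLK90, CLK00
    delays = [d_clk00_ps, d_clk90_ps] * 2 + [d_clk00_ps]
    edges = [t + d for t, d in zip(nominal, delays)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Hypothetical delays: the two clock paths differ by 10 ps
conventional = output_bit_widths_ps(d_clk00_ps=30.0, d_clk90_ps=20.0)
# Equalized paths: both clocks see the same effective delay
balanced = output_bit_widths_ps(d_clk00_ps=25.0, d_clk90_ps=25.0)
print(conventional)   # [90.0, 110.0, 90.0, 110.0]
print(balanced)       # [100.0, 100.0, 100.0, 100.0]
```

In this model a delay mismatch of Δ between the two clock paths alternately narrows and widens the output bits by Δ, while any delay common to both paths only shifts the whole stream in time.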

Figure 6 shows the transmitter output eye with a $2^{31}-1$ PRBS pattern, along with the same signal after 21 feet of 0.190" coax. Under these conditions, with differential drive, the measured link BER is better than $10^{-14}$. Measurement of the four output phases of the non-retimed mux shows a systematic skew of less than 3ps. This performance is due to careful layout and design of the precision 8-phase ring oscillator and mux. The non-systematic TX jitter is 3.3ps rms and 18ps peak-to-peak.

An enhanced thick-film hybrid is used for mounting both chips. Gold thick-film paste is printed across a 25mil thick alumina substrate and photolithographically etched back to achieve 0.002" lines and spaces. The hybrids are mounted in a custom 100-pin Kovar QFP package with ceramic sidewalls. The package achieved better than 20dB return loss up to 10GHz for the differential high-speed inputs and outputs.
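For reference, a return loss figure converts to a reflection coefficient magnitude and a reflected power fraction using the standard definitions. The helper names below are hypothetical.

```python
def reflection_magnitude(return_loss_db: float) -> float:
    """Voltage reflection coefficient magnitude |Gamma| for a given return loss."""
    return 10 ** (-return_loss_db / 20.0)

def reflected_power_fraction(return_loss_db: float) -> float:
    """Fraction of incident power that is reflected."""
    return 10 ** (-return_loss_db / 10.0)

gamma = reflection_magnitude(20.0)
power = reflected_power_fraction(20.0)
print(f"|Gamma| = {gamma:.3f}, reflected power = {power:.1%}")
```

A return loss better than 20dB thus corresponds to a reflection coefficient magnitude below 0.1, or less than 1% of the incident power reflected.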

Figures 7 and 8 are micrographs of the 2.6x4.4mm<sup>2</sup> TX and 3.9x4.4mm<sup>2</sup> RX die. Both designs use full-custom layout for the critical multiphase-VCO and $f_T$-doubler circuitry. The lower-speed MUX, DEMUX, and state-machine logic are implemented in ECL gate-array cells for ease of design modifications.

## Acknowledgments:

The authors thank J. Kerley and L. Dove for hybrid and package development, and J. Norman for microassembly of the TX chip prototypes.

## References:

[1] Lottor, M., "Internet Growth (1981-1991)," RFC 1296, Stanford Research Institute, Jan., 1992.

[2] Huang, W., et al., "A high-speed bipolar technology featuring self-aligned single-poly base and submicrometer emitter contacts," IEEE Electron Device Letters, vol. 11, no. 9, pp. 412-414, Sept., 1990.

[3] Walker, R., C. Stout, C-S. Yen, "A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection," ISSCC Digest of Technical Papers, pp.246-247, Feb., 1997.

[4] Battjes, "Amplifier Circuit," U.S. Patent 3633120, assigned to Tektronix Corp., 1972.
















Figure 1: Transmitter block diagram.





Figure 2: Receiver block diagram.





Figure 3: (a) Conventional 4-stage 2.5GHz oscillator requiring 50ps gate delays; (b) "Leap-frog" architecture using 75ps gates.





Figure 4: 3 Stage f<sub>T</sub>-doubler input amplifier.





Figure 5: Left circuit is conventional 4:1 multiplexer. Right circuit is improved non-retimed mux with  $f_T$ -doubler core.







Figure 6: Top trace is transmitter output eye diagram. Bottom trace after 21 feet of 0.190" diameter coax. Horizontal scale: 20 ps/div.





Figure 7: TX die micrograph.





Figure 8: RX die micrograph.

