Integrated Hybrid Sub-Aperture Beamforming and Time-Division Multiplexing for Massive Readout in Ultrasound Imaging

This paper demonstrates hybrid sub-aperture beamforming (SAB) with time-division multiplexing (TDM) for massive interconnect reduction in ultrasound imaging systems. A single-chip front-end system prototype has been fabricated in 180-nm HV BCD technology that combines 5×1 SAB with 8×1 TDM to efficiently reduce the number of receive signal interconnects by a factor of 40. The system includes on-chip high-voltage (HV) pulsers capable of generating unipolar pulses up to 70 V in transmit (TX) mode. The receiver (RX) chain consists of a T/R switch, a variable-gain low-noise amplifier (VG-LNA) with 4-step gain control (15-32 dB) for time-gain compensation followed by a programmable switched-capacitor analog delay-and-sum beamformer. The proof-of-concept prototype operates at a 200-MHz clock frequency and the SAB provides 32-step fine delays with a maximum delay of 310 ns corresponding to better than <inline-formula><tex-math notation="LaTeX">$\lambda$</tex-math></inline-formula>/20 delay quantization at 5 MHz. With these specifications, the SAB is capable of beam steering from 0<inline-formula><tex-math notation="LaTeX">$^{\circ }$</tex-math></inline-formula> to 45<inline-formula><tex-math notation="LaTeX">$^{\circ }$</tex-math></inline-formula> for a 5-element subarray with 150-micron pitch (<inline-formula><tex-math notation="LaTeX">$\lambda$</tex-math></inline-formula>/2), providing a near-ideal phased array imaging performance. The sub-aperture beamformer is followed by the TDM system where each of the 8 channels is sampled at a rate of 25 MS/s after an anti-aliasing bandpass filter. The full functionality of the prototype chip is validated through electrical and acoustic measurements on a 1-D capacitive micromachined ultrasonic transducer (CMUT) array designed for intracardiac echocardiography (ICE).


I. INTRODUCTION
AS THE number of elements increases in 1-D and 2-D ultrasound imaging arrays, reducing the number of interconnects becomes critical for implementation feasibility and cost, especially for catheter-based systems. Recent progress in microfabrication and interconnect technology allows transducer arrays to be integrated with interface integrated circuits (interface ICs) using flip-chip bonding and transducer-on-CMOS approaches. These methods, together with on-chip signal processing circuitry such as sub-aperture beamforming (SAB), also known as microbeamforming, have enabled commercial implementations in transesophageal echocardiography (TEE) systems with over 2000 elements [1], [2]. These systems still use over 100 cables to connect the imaging head to the backend system, which makes the approach challenging for catheter-based imaging systems introduced into the body through the vasculature. Although some 3-D imaging intracardiac echocardiography (ICE) catheters have recently been developed [3], these systems have complex interconnect structures and achieve only about 10× reduction in interconnect cabling for a 900-element 2-D array; image quality can be compromised by the large sub-array size and by electrical crosstalk over flexible-printed-circuit-based cabling. Clearly, further interconnect reduction is necessary to increase the number of independent channels for image quality and to enable coaxial cabling, which reduces electrical crosstalk, at a reasonable cost.
A variety of approaches have been proposed in the literature for cable reduction in ultrasound systems. For the transmit operation, digital programmable transmit beamformers (TX-BF) integrated on the ASIC, enabled by a high-voltage technology, can provide beam steering and focusing. As shown in an earlier work, the digital beamformer can be programmed at high speed over a single cable, so this approach efficiently reduces the number of cables for TX operation [4]. Reduction of electrical connections on the receive (RX) side is more challenging, as ideally the RF data from all receiver channels should be collected simultaneously to reduce motion artifacts in real-time imaging. The most straightforward approach uses a separate cable for each channel to connect the transducers to the backend system, but once the channel count exceeds several hundred this becomes prohibitive for probes inserted into the body, such as TEE and ICE probes.
Fig. 1. Simplified conceptual diagram of on-chip micro-beamforming (sub-aperture beamforming, SAB) with time-division multiplexing (TDM). The signals coming from transducer array elements in a sub-aperture are delayed with respect to each other (shown as empty and filled boxes in the micro-beamformer) before being summed, sampled, and held for on-chip time-division multiplexing. The backend system samples the TDM signal properly to reconstruct the signals from each sub-aperture. Therefore, a transducer array of M × N elements can be read out over a single path to the processing system by dividing the array into sub-arrays of M elements and using an N-channel TDM block.
Sub-aperture beamforming (SAB), also known as pre-steering or μ-beamforming, in RX is a technique commonly used for cable reduction where the signals from a group of RX channels are combined with beamforming phase/time delays at the probe tip before the summed signal is sent out to the backend system over cables [5], [6], [7], [8], [9], [10], [11]. As described below, delay resolution and accuracy, and the large size of the sub-aperture lead to quantization errors and compromise image quality as compared to fully sampled arrays [12]. Although this approach is used with sub-apertures with up to 15 elements in ICE catheter probes, further reduction of cable connections is necessary for large 2-D arrays and low-cost 1-D arrays with large element count for catheter and portable ultrasound applications [3], [12], [13].
Time-division multiplexing (TDM), where signals from several RX channels are sampled simultaneously and sent over a single cable, is another cable reduction method [14], [15]. Using the same high-speed clock and synchronization, a fast ADC can be used to digitize the signals with direct digital demodulation (DDD) [16]. This approach is attractive because of its ease of implementation, silicon area savings, and lower power consumption compared to on-chip ADCs, at the cost of pushing hardware complexity to the backend. In addition, it provides access to the raw RX channel data. When properly designed and implemented, TDM can provide crosstalk of about −30 dB or below between its output channels using μ-coaxial cables with 8×1 cable reduction, which is suitable for applications like ICE [17]. Since cable reduction in TDM is tied directly to higher sampling rates in the backend, large multiplexing ratios are limited by noise folding over the broader bandwidth and by the bandwidth requirements placed on small-sized micro-coaxial cables in catheter applications.
For massive cable reduction beyond single-stage SAB or TDM, one realizes that the bandwidth of the SAB signals is essentially the same as individual channels. Therefore, cascading SAB with TDM allows one to use each method with optimum parameters and practical configuration and provides a simpler solution as compared to on-chip analog to digital conversion [8].
In this paper, we implement a hybrid RX front-end architecture to further reduce the number of interconnects by combining SAB and TDM techniques for massive readout. To the best of our knowledge, this is the first time these methods are combined and demonstrated in a single chip along with high voltage transmit circuitry. An ICE imaging system operating around 5-MHz center frequency is used as a case study and a reduction ratio of 40 is demonstrated by cascading 5×1 SAB with 8×1 TDM. Section II presents the proposed hybrid architecture describing the system-level details and specifications for the particular implementation. The circuit-level implementation is described in Section III. The electrical and acoustical characterization results are discussed in Section IV, including the comparison with the state-of-the-art solutions. The concluding remarks are presented in Section V.

II. THE PROPOSED HYBRID ARCHITECTURE
The overall hybrid architecture is shown in Fig. 1, where the transducer array is divided into sub-arrays each having M channels. The signals from the transducers in the sub-array are delayed with respect to each other depending on the steering angle and added to generate a pre-beamformed signal (the boxes are the time delays for each delay line). Each of these analog summed signals obtained from a group of M elements is then filtered to remove out-of-band noise and prevent aliasing, and sampled sequentially for TDM at a frequency of N × f_s, where f_s is higher than the required Nyquist sampling frequency for the ultrasound signal. These N sampled-and-held analog signals are fed into a single cable, resulting in an N × M reduction in cable count. The TDM approach consumes a small fraction of the power and silicon real estate of the system. Furthermore, it has been shown that, when properly designed, TDM can provide low crosstalk levels for the multiplexed signals on the same cable, making it a preferred method for secondary cable reduction [16].
The main design parameter of the TDM system is the sampling rate. For an 8×1 TDM system, this rate is set to 200 MS/s for signals with 5-MHz center frequency and about 10-MHz bandwidth, as in an earlier work [4]. Determining the system parameters for the SAB system, however, is more complicated. As shown in Fig. 2, the sub-arrays collectively provide an approximation of the ideal circular focusing delay pattern, where each sub-array is a linearization of the delay over a small range. The difference between the approximation and the ideal curvature (phase error) determines the performance of the SAB system. As analyzed in [12], the phase error in the far field can be written as a function of the size of the sub-array (M × pitch), the quantization of the time delay in terms of the wavelength, Q_SAB, and the phase difference on the array elements due to the pitch, φ_p, where m_0 is the center element number for a sub-array with an odd number of elements. In general, for phased array operation the pitch is selected to be λ/2 at the center frequency. Therefore, close-to-ideal image reconstruction requires small delay quantization and a trade-off between the size of the sub-array and the cable reduction ratio. The analysis for a 1-D array shows that for a 5-element sub-array with λ/2 pitch, a Q_SAB of λ/8 or better is needed for sidelobes better than −30 dB in the far field [12]. For this work, the time delay step and total delay range are calculated based on the number of elements, array pitch, and view angle. Assuming 150-μm pitch (half wavelength) for the center frequency of 5

III. CIRCUIT IMPLEMENTATION
Although the main focus is the demonstration of the interconnect reduction in receive mode, to show the feasibility of a completely integrated system, a full transceiver system with transmit pulsers is designed and implemented. The overall architecture, as illustrated in Fig. 3, consists of an ultrasound transceiver (US-TRX) for each element of the array. The US-TRX includes the HV pulser, the HV switch, and the low-noise amplifier (LNA) with time-gain compensation circuitry. In the hybrid SAB-TDM architecture, each element is also followed by a programmable analog delay line. A group of elements shares their outputs to enable beamforming. A TDM block at the end of the receiver chain of the whole array sends the independent echo data out over a common data path to the backend system.

A. Ultrasound Transceiver
High-voltage pulses are used to excite array elements for acoustic wave generation with sufficient amplitude. Therefore, an HV technology is used for on-chip HV pulse generation. A digital pulser as depicted in Fig. 4 is designed to generate unipolar HV pulses. It uses laterally-diffused metal oxide semiconductor (LDMOS) transistors at its output stage. These double-diffused transistors have a high drain-source junction breakdown voltage (BVDSS), but they need to be carefully designed since their gate-oxide breakdown voltage is much lower than BVDSS. The input pulse generated by the TX control circuitry triggers the pulser to generate a high-voltage pulse with a width equal to the input signal pulse width. The input signal is shifted up from 1.8 V to 5 V through level shifter stages to provide full swing for the HV transistors. This results in the creation of 70-V pulses at the output node. A protection level shifter with a Zener diode is added to safely drive the HV PMOS transistor, limiting the current drawn from the HV supply, which avoids a voltage drop across the entire power routing. The pulser is designed to drive a 15-pF capacitive load at 5 MHz with unipolar pulses up to 70-V amplitude.
Fig. 3. Simplified block diagram of the proposed hybrid SAB with TDM and the timing diagram of the TDM block. A group of elements consists of 5 sub-elements, each of which has an ultrasound transceiver and a 32-step beamformer element. Note that each element has an HV pulser and a T/R switch to protect the LNA during the transmit operation. Eight identical groups of elements are time-multiplexed to transfer the received echo data out to the backend system. Only one group of 5 elements was implemented, which is shown in dark gray.
In the RX chain, an HV switch isolates the low-voltage (LV) receive circuitry from the HV pulses during TX operation, assuring safe operation of the LV circuitry by bypassing the RX circuitry during TX mode. The parasitics added to the RX chain should be carefully monitored so that the gain and noise performance of the RX chain can be adjusted in subsequent design steps.
The RX analog front-end (AFE) includes a variable-gain LNA (VG-LNA), which combines the time-gain compensation (TGC) circuit with an LNA to further save silicon area and power. The schematic of the VG-LNA structure is depicted in Fig. 5. It is designed to amplify input echo signals with a dynamic range (DR) of 76 dB and to compensate for the time-dependent attenuation caused by the propagation of acoustic waves in a medium, using 4-step gains of 15 dB, 21 dB, 27 dB, and 32 dB. The gain compensation can be calculated based on the desired imaging depth and the attenuation characteristic of the medium. A 2-bit TGC code selects the gain step and also switches the Miller compensation capacitor to preserve the bandwidth and a 60° phase margin. To further reduce the power consumption, the LNA is turned off during the TX mode by an enable signal.
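As a rough illustration of how a gain step might be chosen from imaging depth and medium attenuation, the sketch below estimates the round-trip loss and picks the nearest of the four VG-LNA settings. The attenuation coefficient of 0.5 dB/(cm·MHz), the nearest-step rule, and the mapping of code 0 to the lowest gain are assumptions made for this example; they are not taken from the design.

```python
GAIN_STEPS_DB = (15.0, 21.0, 27.0, 32.0)  # VG-LNA gain settings quoted in the text


def tgc_code(depth_cm, f_mhz=5.0, alpha_db_per_cm_mhz=0.5):
    """Return the 2-bit gain code closest to the gain wanted at a given depth.

    alpha_db_per_cm_mhz is an ASSUMED attenuation coefficient, and the
    ordering "code 0 = lowest gain" is also an assumption.
    """
    # Echoes travel to the reflector and back, hence the factor of two.
    round_trip_loss_db = 2.0 * alpha_db_per_cm_mhz * f_mhz * depth_cm
    # Start from the lowest gain and add the loss to be compensated.
    target_db = GAIN_STEPS_DB[0] + round_trip_loss_db
    # Choose the available step nearest to the target (saturates at 32 dB).
    return min(range(len(GAIN_STEPS_DB)),
               key=lambda code: abs(GAIN_STEPS_DB[code] - target_db))


if __name__ == "__main__":
    for depth in (0.0, 1.0, 2.0, 5.0):
        code = tgc_code(depth)
        print(f"depth {depth:.1f} cm -> code {code} ({GAIN_STEPS_DB[code]:.0f} dB)")
```

With only 17 dB of gain range, the highest setting is reached after a few centimeters under these assumptions, so deeper echoes would rely on the remaining dynamic range of the receive chain.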

B. Sub-Aperture Beamforming in RX
The simplified conceptual diagram of the hybrid SAB and TDM architecture is depicted in Fig. 1. Each element of the sub-aperture should be properly fine-delayed and summed to form the desired signal. This is essentially a reconstruction of the signal from acoustic waves coming back from a certain focal point, obtained by delaying the element signals relative to each other and coherently summing them within a sub-array. Therefore, the micro-beamformer should provide appropriate delays with adaptable delay steps while remaining simple enough to keep on-chip area and power consumption low.
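The relative delays for a given steering angle can be sketched as follows. This is a minimal far-field model, not the authors' design procedure: the sound speed of 1500 m/s is inferred from the stated 150-μm pitch being λ/2 at 5 MHz, and the 10-ns step follows from a 32-step delay line with a 310-ns maximum. The function name and rounding rule are assumptions of this example.

```python
import math

PITCH_M = 150e-6                  # element pitch (lambda/2 at 5 MHz, from the text)
F0_HZ = 5e6                       # center frequency
C_M_PER_S = 2 * PITCH_M * F0_HZ   # sound speed implied by the pitch: about 1500 m/s
DELAY_STEP_S = 10e-9              # fine-delay step: 310 ns spread over 31 steps
MAX_CODE = 31                     # 32-step delay line, codes 0..31


def steering_delay_codes(angle_deg, n_elements=5):
    """Quantized per-element delay codes (units of DELAY_STEP_S) for one sub-array."""
    # Far-field delay increment between neighbouring elements.
    dt = PITCH_M * math.sin(math.radians(angle_deg)) / C_M_PER_S
    raw = [m * dt for m in range(n_elements)]
    # Shift so the smallest delay is zero (handles negative angles too).
    shift = min(raw)
    codes = [round((r - shift) / DELAY_STEP_S) for r in raw]
    if max(codes) > MAX_CODE:
        raise ValueError("steering angle exceeds the fine-delay range")
    return codes


if __name__ == "__main__":
    for angle in (0, 15, 30, 45):
        codes = steering_delay_codes(angle)
        print(angle, [c * 10 for c in codes], "ns")
```

Under these assumptions, steering to 45° needs about 283 ns across the five elements, which fits within the 310-ns fine-delay range.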
For implementing the micro-beamformer, an analog delay-and-sum topology is utilized in the RX chain employing switched-capacitor delay units. The total RX delay is calculated based on a focal point and the depth of view (DoV). This delay is divided into a coarse delay and a fine delay. The coarse delay can be implemented in a high-performance backend system. The fine delay can be implemented on-chip, which significantly reduces the complexity of system implementation on the ASIC side, considering the delay range of less than 1 μs for each element of the sub-array. The analog delay uses sample-and-hold unit cells comprising a sampling capacitor unit and MOS switches for sampling and readout. The second switch allows each stored sample to be shared (read out) with the rest of the channels in the sub-array at a certain time, thus coherently summing the data received from the sub-array transducer elements. The buffer added at the end of each channel prevents loading effects. The RX beamformer uses a single low-voltage differential signaling (LVDS) data line for loading beamforming delay patterns generated by the backend system. To control the time delays for each element in the sub-aperture, serial-in-parallel-out (SIPO) shift registers are used. Each sub-array of 5 elements requires 25 bits of delay data, held in five 5-bit shift registers. An FPGA in the backend system generates a 31-bit data packet for the fine delay profile, in which the first 6 bits are used to validate the received data. For instance, a data packet of "1111110001000011001000011001001" generates 20-ns, 30-ns, 40-ns, 60-ns, and 90-ns delays for the first to the fifth element of the sub-array, respectively. As illustrated in Fig. 6, the control signal generation for each sub-aperture consists of two blocks. As the timing diagram in Fig. 6 shows, the sample clock generation block is shared between the elements of the sub-array, while each element has its own hold clock generation block.
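The packet layout can be checked against the example above with a short sketch. It assumes that the 6-bit validation field is the all-ones pattern seen in the example, that each 5-bit field is sent most-significant bit first, and that one code step equals 10 ns; these choices reproduce the quoted delays but are inferred from the example rather than stated as a specification.

```python
HEADER = "111111"        # ASSUMED validation pattern, copied from the example packet
BITS_PER_ELEMENT = 5     # one 5-bit shift register per element
DELAY_STEP_NS = 10       # ASSUMED LSB weight (32 steps, 310 ns maximum)


def encode_delay_packet(delays_ns):
    """Build the header plus 5-bit-per-element packet for a list of delays in ns."""
    fields = []
    for delay in delays_ns:
        code, remainder = divmod(delay, DELAY_STEP_NS)
        if remainder or not 0 <= code < 2 ** BITS_PER_ELEMENT:
            raise ValueError(f"delay {delay} ns is not representable")
        fields.append(format(code, f"0{BITS_PER_ELEMENT}b"))
    return HEADER + "".join(fields)


def decode_delay_packet(packet, n_elements=5):
    """Validate the header and return the per-element delays in ns."""
    expected_len = len(HEADER) + n_elements * BITS_PER_ELEMENT
    if len(packet) != expected_len or not packet.startswith(HEADER):
        raise ValueError("invalid packet")
    payload = packet[len(HEADER):]
    return [int(payload[i:i + BITS_PER_ELEMENT], 2) * DELAY_STEP_NS
            for i in range(0, len(payload), BITS_PER_ELEMENT)]


if __name__ == "__main__":
    print(decode_delay_packet("1111110001000011001000011001001"))
```

Decoding the example packet under these assumptions yields the five delays listed in the text, and encoding those delays returns the same 31-bit string.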

C. Time-Division Multiplexing in RX
TDM is a well-known approach in communication systems where multiple channels share one path and are assigned specific time slots. The overall diagram of TDM is shown in Fig. 7. It uses an analog multiplexer and digital counting logic, which saves power and keeps the design simple. Considering the Nyquist rate for sampling, echo signals need to be sampled at a rate of at least 20 MS/s for a 7-MHz array with 80% bandwidth. Since TDM is synchronized with the ADCs in the backend system, the 200-MHz clock frequency is enough to provide 25 MS/s for each channel in the 8×1 TDM circuitry. The TDM circuitry consists of 8 channels, each of which has sample-and-hold switches, a buffer, and link training switches. The link training is performed to synchronize the clocking between the TDM block and the ADCs of the backend system. In the first phase of the link training, the TDM multiplexer generates a training sequence in which the first channel is connected to a fixed voltage significantly higher than that of the other channels. The backend system processes the received TDM sequence to determine the first channel. Next, the phase of the clock is adjusted to find the optimum alignment between ADC sampling and the TDM period. In each sample-and-hold unit cell, the sampling capacitor is chosen to be around 300 fF as a compromise between kT/C noise and silicon area. The TDM circuitry is followed by a current-feedback, source-degenerated push-pull buffer for driving a 75-Ω cable termination over a bandwidth of about 400 MHz. The details of the TDM circuitry and its demonstration with low crosstalk levels are described in earlier publications [4], [18].
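The first phase of link training and the subsequent channel recovery can be sketched in software as follows. This is an idealized model with hypothetical function names: it locates the slot carrying the elevated training level and then de-interleaves the stream. The second phase (clock phase adjustment) and all analog effects such as settling and crosstalk are not modeled.

```python
def tdm_multiplex(channels):
    """Interleave N equal-length channel sample lists into one TDM stream."""
    return [sample for frame in zip(*channels) for sample in frame]


def find_first_slot(training_stream, n_channels=8):
    """Return the slot index whose samples have the largest mean level.

    During training the first channel is tied to a level well above the
    others, so this slot marks channel 0 in the received stream.
    """
    means = []
    for slot in range(n_channels):
        samples = training_stream[slot::n_channels]
        means.append(sum(samples) / len(samples))
    return means.index(max(means))


def tdm_demultiplex(stream, first_slot, n_channels=8):
    """Split a TDM stream into per-channel lists, starting at channel 0."""
    aligned = stream[first_slot:]
    usable = len(aligned) - len(aligned) % n_channels  # drop a partial last frame
    return [aligned[ch:usable:n_channels] for ch in range(n_channels)]


if __name__ == "__main__":
    offset = 3  # the backend starts capturing mid-frame (unknown in practice)
    training = tdm_multiplex([[1.0] * 4] + [[0.1] * 4 for _ in range(7)])
    slot = find_first_slot(training[offset:])
    data = tdm_multiplex([[10 * ch + t for t in range(4)] for ch in range(8)])
    print(slot, tdm_demultiplex(data[offset:], slot)[:2])
```

Because the ADC and the multiplexer share the same clock, the slot found during training remains valid for the data that follows, which is the property this sketch relies on.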

IV. EXPERIMENTAL RESULTS AND DISCUSSION
The proof-of-concept front-end SoC prototype has been fabricated in 180-nm HV BCD technology. In this particular design, one sub-array block along with the TDM block has been implemented, which is sufficient to evaluate the proposed interconnect reduction ratio. It consists of US-TRXs (pulser, T/R switch, LNA, and buffer) with receive 5×1 SAB and 8×1 TDM, and occupies 1.28 × 1.415 mm² including the test pads, as shown in Fig. 8. The floor plan of each element, designed for the 150-μm pitch of a 1-D ultrasound array, is also shown in Fig. 8; each element occupies an area of 1 mm × 150 μm. The power and area breakdowns are depicted in Fig. 9. With a 1.8-V supply for all building blocks in RX, each element in a 5-element sub-aperture consumes 4.65 mW. The beamforming block and its clock generation circuitry account for 86% of the power consumption of each element and occupy 37% of each element's pitch area, whereas the TDM consumes negligible area and power, as expected.

A. Electrical Characterization
The prototype ASIC was wire-bonded to a custom printed circuit board (PCB) to electrically characterize the ASIC. The test PCB was then connected to a custom backend system [4] which provides the required power lines and control signals as well as programming the delay pattern and handling TDM output signals. Fig. 10 shows the measured output of pulsers, driving an effective capacitive load of 15 pF (equivalent capacitance of the test 1-D CMUT array). It shows the successful generation of both single and multi-cycle excitation of a 60-V unipolar pulse for conventional ultrasound imaging and pulsed Doppler mode operation, respectively. All pulses have 80-ns pulse width with the rise and fall times of about 5 ns in Fig. 10.
For RX characterization, the input test signals were applied to the RX pads on the ASIC side from five synchronized signal generators. Fig. 11 shows the measured frequency response of the LNA for four different time-gain compensation settings. It illustrates 4 different gains (15 dB, 21 dB, 27 dB, 32 dB) with a 3-dB bandwidth of 12 MHz for elements with a center frequency of 5 MHz. The measured input-referred (IR) voltage noise density is, on average, less than 8.6 nV/√Hz at 5 MHz without calibration, in agreement with the simulations. To test the sub-aperture beamformer delay circuitry, five-cycle tone-burst sinusoidal signals with a frequency of 5 MHz were applied to the input of single RX channels in the sub-aperture, and the time-domain outputs of the sub-array were recovered using the TDM backend system. Fig. 12 shows the results, where the measured delays follow the programmed time delays of 10 ns, 20 ns, 30 ns, 110 ns, 210 ns, and the maximum fine delay of 310 ns. The difference between the desired and measured delays was below 2 ns for the largest delay in the sub-array, which should not impact the image reconstruction in this frequency range. Nevertheless, the errors on the SAB delays due to system variation can be measured. These variations can be calibrated in the backend system as part of the digital post-processing of the RF data for accurate imaging performance.

B. Acoustic Characterization
For acoustic characterization of the full system, the fabricated prototype ASIC was connected to a CMUT array coated with Parylene on a PCB and immersed in a water tank. The CMUT array (Vermon SA, France) used in the characterization was designed for ICE catheters and was similar to the arrays used in earlier publications with element capacitance of about 15 pF and 200-μm pitch [19]. This array was chosen as it was available commercially and operated in a stable manner. Although the pitch is larger than the desired 150-μm, it was still suitable for ASIC characterization in terms of frequency response and steering angle allowing for comparison with calculated beam patterns. The PCB was connected to a handle PCB for characterization, where the handle PCB communicated with the backend system utilizing high-speed ethernet cables. The measurement setup is shown in Fig. 13.
To test the functionality of the pulser, a hydrophone was used as a receiver while one of the pulsers generated a 45-V unipolar pulse with 90-ns pulse width, with the CMUT element biased at 80 V. The pulse level was kept low so as not to exceed the CMUT array's maximum voltage level. Fig. 14 shows the time and frequency response of the pulser and the CMUT element as detected by the hydrophone. The pulse detected 1.5 cm away from the element has a center frequency of 6 MHz with ∼80% fractional bandwidth (FBW), showing that the pulser can drive the CMUT elements and that the CMUT array is suitable for sub-aperture beamformer characterization at 5 MHz.
To characterize the beam steering capability of the sub-aperture beamforming electronics at different angles, a large piezoelectric transducer was placed in the normal direction of the CMUT array to generate sufficient pressure and a plane-wave-like incident field at 5 MHz, and the array was steered electronically. The output of the SAB chip was recorded as a function of the programmed steering angle. To compare the results with the model, the output of the 5-element SAB with 200-μm pitch was simulated for the same steering angles. Fig. 15 shows the comparison of the simulated and measured amplitude variation. The results are in good agreement with the simulations, where the half-power beamwidth is about 7°. Also, note that even with a 200-μm-pitch array, the SAB electronics can steer the beam to 9 different angles.
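A simple narrowband model of this experiment is sketched below: a plane wave arrives at a fixed incidence angle while the programmed steering angle is swept, and the normalized delay-and-sum amplitude is computed with the delays rounded to the 10-ns step. The sound speed of 1500 m/s, the single-frequency approximation, and the function name are assumptions of this sketch, and it is not the simulation used for Fig. 15.

```python
import cmath
import math


def sab_response(steer_deg, incidence_deg=0.0, n_elements=5,
                 pitch_m=200e-6, f_hz=5e6, c_m_per_s=1500.0, step_s=10e-9):
    """Normalized delay-and-sum amplitude for a plane wave at one frequency.

    c_m_per_s is an ASSUMED sound speed. A constant delay offset common to
    all elements is omitted because it does not change the magnitude.
    """
    steer_dt = pitch_m * math.sin(math.radians(steer_deg)) / c_m_per_s
    wave_dt = pitch_m * math.sin(math.radians(incidence_deg)) / c_m_per_s
    total = 0j
    for m in range(n_elements):
        applied = round(m * steer_dt / step_s) * step_s  # quantized programmed delay
        residual = m * wave_dt - applied                 # uncompensated arrival time
        total += cmath.exp(-2j * math.pi * f_hz * residual)
    return abs(total) / n_elements


if __name__ == "__main__":
    for angle in range(-20, 21, 5):
        print(f"{angle:+3d} deg: {sab_response(angle):.3f}")
```

Sweeping the steering angle with normal incidence mirrors the measurement described above; the amplitude peaks when the programmed angle matches the incidence angle and falls off on either side.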
As the last and complete characterization of the ASIC and CMUT array, the system is used with the TDM backend system. Fig. 16 illustrates the overall performance of SAB with TDM. A piezoelectric transducer was used to generate the acoustic waves in the direction of the main lobe (normal to the array) and the backend generated control signals are used to steer the direction of the received wavefront. The upper and lower plots show the received signals at the output of the beamformer and the reconstructed signals in the backend system received from the TDM output for three different angles, respectively. The recovered TDM output has slightly lower bandwidth which is mainly because of the bandpass filtering before TDM and also in the TDM handle PCB. The results collectively show that 40× reduction in RX interconnect is feasible with the hybrid SAB and TDM system with a suitable number of steering angles for ICE imaging systems. Table I compares this work with the state-of-the-art interconnect reduction solutions for ultrasound imaging systems.
This work provides the highest cable reduction ratio with simultaneous RX signal collection. It offers both finer delay quantization and a larger total delay than other similar ICE ASICs. This results in a smaller delay angle step size (approximately λ/22 at 5 MHz and 150-μm pitch), which improves the image quality. The power consumption per channel is 4.65 mW, which is suitable for practical implementation with large-element-count arrays. Although the ASIC is demonstrated with a CMUT array here, this pitch-matched system can be used equally well with bulk piezoelectric transducer and PMUT-based imaging arrays [22], [23], [24]. Some recent works [11], [20] use fully digital beamformers. Both are receiver ASICs that do not have on-chip high-voltage pulsers. While their channel reduction capability is comparable to this work, they have higher power consumption per element and require higher clock rates. Relative to the works in [8], [14], this work achieves higher channel reduction using the same technology node, with about 4 times more steering angles and 3 times higher delay resolution. Note that a direct comparison of power consumption is difficult to make because the designs use different parameters and target different applications. The real-time 3-D ICE in [3] has about 2 times smaller channel reduction in comparison to this work while consuming less power, but it lacks a pitch-matched design. In summary, this work pushes the state-of-the-art in terms of "complete" transmit-receive ASICs for intracardiac ultrasound imaging with the highest channel reduction and the finest delay resolution in terms of wavelength and delay range. These parameters are critical not only for channel reduction but also for the quality of the resulting real-time images. Note that the receiver channel reduction capability shown here is generic.
As mentioned earlier, with HV TX beamformer integration, the transmit operation is controlled with a single cable [4]. On the receive side, the reduction factor due to SAB can be increased by increasing the number of elements in the sub-aperture. For example, it is possible to group up to 15-20 elements in some 2-D ICE arrays [3]. Similarly, if the system bandwidth is lower, such as 3-5 MHz used in most diagnostic ultrasound systems, the TDM factor can be increased to 16× with 200 MS/s rate as used here. Thus, the channel reduction factor can be increased beyond 200×. Therefore, the limit of reduction depends on the tolerable image quality and bandwidth of the ultrasound imaging system.
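The scaling argument can be made concrete with a small calculation. The sketch below, with a hypothetical helper name, returns the overall reduction factor, the per-channel sampling rate left after multiplexing, and whether that rate still exceeds twice the highest signal frequency. Treating 10 MHz and 5 MHz as the upper band edges of the two cases is an assumption based on the bandwidth figures quoted in the text.

```python
def hybrid_readout(sub_aperture_size, tdm_ratio, tdm_rate_sps, f_max_hz):
    """Return (reduction factor, per-channel rate in S/s, Nyquist check)."""
    reduction = sub_aperture_size * tdm_ratio    # SAB factor times TDM factor
    per_channel_sps = tdm_rate_sps / tdm_ratio   # each slot gets an equal share
    meets_nyquist = per_channel_sps > 2 * f_max_hz
    return reduction, per_channel_sps, meets_nyquist


if __name__ == "__main__":
    # Demonstrated configuration: 5x1 SAB with 8x1 TDM at 200 MS/s.
    print(hybrid_readout(5, 8, 200e6, 10e6))
    # Extrapolated configuration: 15-element sub-aperture, 16x1 TDM, 3-5 MHz band.
    print(hybrid_readout(15, 16, 200e6, 5e6))
```

For the extrapolated case the factor is 15 × 16 = 240 with 12.5 MS/s per channel, consistent with the statement that the reduction can exceed 200× when the signal band tops out near 5 MHz.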

V. CONCLUSION
In this paper, we present a hybrid technique for massive interconnect reduction in ultrasound imaging systems, which cascades sub-aperture beamforming (SAB) with time-division multiplexing (TDM). The proof-of-concept prototype, fabricated in HV BCD technology, combines 5×1 SAB with 8×1 TDM in RX, enabling cable reduction by a factor of 40. Electrical characterization of the system shows suitable performance for an ICE system in terms of amplification, 10-ns delay resolution, and 310-ns total delay for the beamformer at 5 MHz, in agreement with simulations. Acoustic characterization performed on a commercially available 1-D CMUT array demonstrates the functionality of both the HV pulsers and the cascaded SAB and TDM system. This pitch-matched ASIC is agnostic to transducer type and can be used equally well with bulk piezoelectric transducer and PMUT-based imaging arrays. This promising approach is scalable to a higher number of elements, and it is especially suitable for dense 2-D arrays with over 1000 elements.