Vertical Stacked LEGO-PoL CPU Voltage Regulator

Abstract: This paper presents a 48 V-1 V merged-two-stage hybrid-switched-capacitor converter with a Linear Extendable Group Operated Point-of-Load (LEGO-PoL) architecture for ultra-high-current microprocessors, featuring 3-D stacked packaging and coupled inductors for miniaturized size and vertical power delivery. The architecture is highly modular and scalable. The switched-capacitor circuits are connected in series on the input side to split the high input voltage into multiple stacked voltage domains. The multiphase buck circuits are connected in parallel to distribute the high output current into multiple parallel current paths. It leverages the advantages of switched-capacitor circuits and multiphase buck circuits to achieve soft charging, current sharing, and voltage balancing. The inductors of the multiphase buck converters are used as current sources to soft-charge and soft-switch the switched-capacitor circuits, and the switched-capacitor circuits are utilized to ensure current sharing among the multiphase buck circuits. A 780 A vertical stacked CPU voltage regulator with a peak efficiency of 91.1% and a full load efficiency of 79.2% at an output voltage of 1 V with liquid cooling is built and tested. This is the first demonstration of a 48 V-1 V CPU voltage regulator to achieve over 1 A/mm² current density and the first to achieve 1,000 W/in³ power density. It regulates output voltage between 0.8 V and 1.5 V through the entire 780 A current range.
Index Terms: Dc-dc power conversion, hybrid switched-capacitor circuit, voltage regulator, series-input-parallel-output architecture, vertical power delivery, coupled inductor

I. INTRODUCTION
As the data center industry continues to trend towards consuming more power, efficient power delivery architecture is increasing in significance [5], [6]. Microprocessors consume higher power per square millimeter in an increasingly larger die area [7], [8], requiring very efficient and miniaturized power delivery from high voltage. Future high performance computing systems (CPUs, GPUs, and TPUs) comprise billions of transistors switching at very fast speeds, and consume hundreds of amperes of current at very low voltage (e.g., ≥250 A, ≤1.5 V) in a small footprint area [9], [10]. High-efficiency, high-power-density, high-bandwidth power electronics are needed to support high performance computing systems. Figure 1 shows the rapid growth of power consumption (= power area density × die area) of microprocessors against the development of the process nodes. The power area density of microprocessors has exceeded 4.5 W/mm², resulting in stringent requirements to improve the power density, and in particular, the current area density, of power electronics. Making the footprint area of voltage regulators smaller than the microprocessor can enable many system level opportunities.
This paper is a combination and extension of four previously published conference papers, "LEGO-PoL: A 93.1% 54V-1.5V 300A Merged-Two-Stage Hybrid Converter with a Linear Extendable Group Operated Point-of-Load (LEGO-PoL) Architecture" in IEEE COMPEL 2019 [1], "LEGO-PoL: A 48V-1.5V 300A Merged-Two-Stage Hybrid Converter for Ultra-High-Current Microprocessor" in IEEE APEC 2020 [2], "A Merged-Two-Stage LEGO-PoL Converter with Coupled Inductors for Vertical Power Delivery" in IEEE ECCE 2020 [3], and "3D LEGO-PoL: A 93.3% Efficient 48V-1.5V 450A Merged-Two-Stage Hybrid Switched-Capacitor Converter with 3D Vertical Coupled Inductors" in IEEE APEC 2021 [4].
Another emerging trend for efficient power delivery in data centers is powering servers from a high voltage (e.g., 48 V). Delivering power at 48 V reduces power distribution loss and improves UPS deployment flexibility [11]. Various topologies for 48 V-PoL applications have been proposed, including single-stage architectures [12]-[16] and two-stage architectures [17]-[24]. Single-stage architectures are attractive for their low component count, but they often have difficulty achieving high control bandwidth and high output current capability. Two-stage architectures are more suitable for high output current and high control bandwidth applications. They typically consist of an unregulated stage and a regulation stage. The unregulated stage converts a high input voltage (e.g., 48 V) to a lower bus voltage (e.g., ≤12 V) with high efficiency. The regulation stage regulates the output voltage with a high control bandwidth. The unregulated stage can be a transformer-based topology [17]-[19] or a switched-capacitor-based topology [20]-[24]. Transformer-based topologies can achieve high heavy-load efficiency but may have inferior light-load efficiency and power density due to transformers. Switched-capacitor-based topologies are becoming increasingly popular due to their transformerless design. They offer advantages in reduced device voltage stress and current stress, and can provide soft charging and soft switching, but usually require resonant inductors to achieve high performance.
This paper develops a modular and scalable 48 V-1 V CPU voltage regulator solution, the Linear Extendable Group Operated Point-of-Load (LEGO-PoL) architecture, which can achieve extreme current area density and perform vertical power delivery. This architecture decouples the high voltage stress and high current stress with automatically balanced building blocks for modularity and scalability. It has a single magnetic component (the output inductor). The size of the dc decoupling capacitors between two stages is very small. The inductive energy storage per watt of the system is low. The switched-capacitor stage can operate at a low switching frequency (e.g., ≤300 kHz) to achieve high efficiency. The buck stage can operate at a much higher switching frequency (e.g., ≥1 MHz) to achieve a high control bandwidth. The coupled inductors further reduce the output inductor size and improve the transient performance of the system [25]-[32].
A 3D stacked 48 V-1 V CPU voltage regulator with vertical coupled inductors was fabricated and tested to deliver 780 A of output current with a 0.8 V-1.5 V regulated output voltage range. Vertically delivering the power enables the prototype to achieve a high current area density. The prototype achieved a peak efficiency of 91.1% and a full load efficiency of 79.2%, a current area density of 1.017 A/mm², and a power density of 1000 W/in³ at 1 V, 780 A, and 1 MHz buck switching frequency. The system is liquid cooled when operating above 450 A. The semiconductor junction temperature is maintained below 95°C in all operating conditions.
The remainder of this paper is organized as follows: Section II introduces the principles of the LEGO-PoL architecture and its operation mechanisms, including soft charging, automatic current sharing, and automatic voltage balancing. Section III presents the design considerations of the 3D stacked 48 V-1 V point-of-load converter. The experimental setup and measurement results of the prototype are presented in Section IV. Section V provides discussion and further improvements to achieve better performance. Finally, Section VI concludes this paper. An extended discussion of the automatic current balancing mechanism is provided in the Appendix.

II. LEGO-POL ARCHITECTURE
A. Principles of the LEGO-PoL Architecture
Figure 2 shows the key principles of the merged-two-stage LEGO-PoL architecture with N submodules. One LEGO-PoL submodule comprises two building blocks: a 2:1 switched-capacitor unit and an M-phase buck unit. The 2:1 switched-capacitor unit operates with fixed complementary 50% duty cycles (φ_1 and φ_2). It is operated at a low frequency to  . By equally distributing high input voltage stress and high output current stress into each module, the architecture can utilize lower rated semiconductor devices with uniformly distributed heat dissipation across the submodules.

B. Operation Mechanisms of the LEGO-PoL Architecture
The LEGO-PoL architecture decouples the voltage stress, current stress, and dynamic requirements and addresses these design challenges. In this section, we present the soft switching, soft charging, current sharing, and voltage balancing mechanisms in detail. Figures 3 and 4 show a topology and operational waveforms of a LEGO-PoL converter comprising three (N = 3) 2:1 switched-capacitor units and three single-phase (M = 1) buck units to illustrate the operation mechanisms. The switched-capacitor units in Fig. 3 are simplified from Fig. 2. The series-connected switches in Fig. 2, such as Q_3 and Q_{3i−2}, can be merged as one switch, Q_3 in Fig. 3. One switch (Q_{3N}), one capacitor (C_{2N}), and two synchronous rectifier switches Q_{S(4N+3)} and Q_{S(4N+4)} in the third 2:1 switched-capacitor unit can be removed because the voltage across C_{2N} is zero. Switch Q_6 is connected to the output of the third 2:1 switched-capacitor unit.
1) Soft Switching and Soft Charging: There are two design options for the LEGO-PoL converter. The first is when the LEGO-PoL converter is designed with a very low parasitic inductance (e.g., 1 nH) along the current path between the switched-capacitor stage and the buck stage. In this option, the capacitors of the switched-capacitor stage are used as the input capacitors of the buck stage, and the decoupling capacitor between the two stages is eliminated. Buck inductor currents can only conduct through each switched-capacitor unit when the high-side buck switches (Q_Hx) are turned on. Zero-current switching (ZCS) is achieved in the switched-capacitor units by coordinating the switching sequences of the switched-capacitor units and buck units. The key principle is to change the state of the switched-capacitor units during the freewheeling state of the buck units (Fig. 4). This is of particular importance when selecting the switching frequencies of the switched-capacitor stage and buck stage. In contrast to the resonant hybrid-switched-capacitor converter, which achieves soft charging operation by placing an inductor between two capacitors, the LEGO-PoL architecture achieves it by utilizing the inductors in the buck stage. The capacitors of the switched-capacitor units are always charged and discharged by the buck stage, which acts as a current source. The second design option applies when the LEGO-PoL converter does not have a low enough parasitic inductance (e.g., >1 nH). In this option, a capacitor between the two stages is used. This capacitor is large enough to filter the high frequency pulsating current from the buck stage, and small enough to maintain low charge sharing loss. The current flowing in the switched-capacitor units is the input current of the buck units, filtered by the parasitic inductance and filter capacitance. The switching frequencies of the switched-capacitor units and buck units can be independently selected.
This design option is implemented in the developed prototype, using the switching control demonstrated in Fig. 4. The design of this filter capacitor is discussed in Section III-B.
2) Automatic Current Sharing and Voltage Balancing: The automatic current sharing and voltage balancing mechanism of the LEGO-PoL architecture can be explained by analyzing the current flow in the two switching phases in Fig. 3. In each switching cycle, capacitors C_2 and C_4 are charged by one current source in φ_1, and discharged by another current source in φ_2. Due to the charge balancing requirements of the switched capacitors, the two current sources have to be equal, leading to current sharing between two adjacent modules, and sequentially current sharing to all modules. For example, C_2 is discharged by i_L2 in φ_1, and charged by i_L1 in φ_2. C_3 is charged by i_L2 in φ_1, and discharged by i_L2 in φ_2. As described in Appendix I, the charge balance requirement of C_2 and C_3 in one switching period forces i_L1 to be equal to i_L2 in steady state operation. Benefiting from a similar switched-capacitor mechanism, the automatic current sharing leads to automatic voltage balancing between the series stacked switched capacitors. With these features, the LEGO-PoL architecture can handle a very high output current.
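The charge balance argument can be written out for a single capacitor. The following is only a simplified sketch, not the derivation of Appendix I. It follows the example assignment in the text (C_2 charged by i_L1 in φ_2 and discharged by i_L2 in φ_1) and assumes, for illustration, that both buck units run at the same duty ratio D, that each phase lasts half of the switched-capacitor period T_SC, and that I_L1 and I_L2 denote cycle-averaged inductor currents. These symbols are introduced here for illustration.

```latex
% Net charge delivered to C_2 over one switched-capacitor period T_SC.
% D * I_Lx is the average buck input current drawn through the capacitor.
\Delta Q_{C_2}
  = \underbrace{D\, I_{L1}\, \frac{T_{SC}}{2}}_{\text{charged in } \varphi_2}
  - \underbrace{D\, I_{L2}\, \frac{T_{SC}}{2}}_{\text{discharged in } \varphi_1}

% Periodic steady state requires zero net charge:
\Delta Q_{C_2} = 0
\quad\Longrightarrow\quad
I_{L1} = I_{L2}
```

Under these assumptions, any mismatch between the two average currents would make the capacitor voltage drift from cycle to cycle, so equal currents are the only periodic steady state.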
3) Passive Phase Current Balancing: The LEGO-PoL architecture automatically balances the current between each submodule. The phase currents within each module can be balanced by coordinating the selection of the switching frequencies of the switched-capacitor stage and buck stage. Since the LEGO-PoL architecture removes the large dc decoupling capacitor between the two stages, the virtual intermediate bus voltages contain a higher ripple, which may cause phase current mismatch in a buck unit. There are a few methods to balance the phase currents in the presence of this ripple, including current mode control. The duty ratios of the buck unit switches can be actively modulated to compensate for the input voltage ripple and balance phase currents. Another way is to use a passive phase rotating scheme to balance the current, as depicted in Fig. 5. Let f_Buck and f_SC be the switching frequencies of the buck stage and switched-capacitor stage, respectively. When f_Buck is chosen relative to f_SC such that an odd number (seven, in the example of Fig. 5) of buck switching occurrences happen during a half switching cycle of the switched-capacitor stage, the buck switches rotate, each taking a turn as the one turned on at the highest V_BUS ripple. This results in identical average input voltage across all buck switch nodes.
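The rotation can be illustrated with a small counting sketch. It assumes that a "buck switching occurrence" means one high-side turn-on of any phase, so that an M-phase unit produces M·f_Buck/(2·f_SC) occurrences per half switched-capacitor cycle, and that the phases fire in a fixed order. Both the function names and this counting model are illustrative assumptions rather than the paper's formulation. The frequencies are chosen so that the count is seven with four phases at 1 MHz, which gives a switched-capacitor frequency close to the 286 kHz quoted for the prototype.

```python
def buck_events_per_half_sc_cycle(f_buck, f_sc, phases):
    """Buck turn-on events (summed over all phases) in half an SC period."""
    return phases * f_buck / (2.0 * f_sc)


def phase_at_fixed_ripple_position(events_per_half_cycle, phases, half_cycles):
    """Phase index that fires at the same ripple position in each half cycle.

    Phases are assumed to fire in the fixed order 0, 1, ..., phases - 1, so
    the phase index advances by events_per_half_cycle every half cycle.
    """
    return [(k * events_per_half_cycle) % phases for k in range(half_cycles)]


M = 4                         # phases per buck unit (from the text)
F_BUCK = 1.0e6                # buck switching frequency in Hz (from the text)
F_SC = M * F_BUCK / (2 * 7)   # assumed: gives seven events per half SC cycle

events = round(buck_events_per_half_sc_cycle(F_BUCK, F_SC, M))
sequence = phase_at_fixed_ripple_position(events, M, 2 * M)

print(f"f_SC = {F_SC / 1e3:.1f} kHz, events per half cycle = {events}")
print("phase seen at the ripple peak, per half cycle:", sequence)
```

In this model every phase takes a turn at the ripple peak because the event count and the phase count share no common factor; an odd count guarantees this for four phases.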

III. VERTICAL STACKED LEGO-POL CONVERTER DESIGN
Vertical power delivery for microprocessors can increase current area density (A/mm²), create space for communication interconnects, reduce I²R losses in the power delivery network, and improve transient response by reducing parasitic impedances [35]-[38]. This section details how to design a vertical stacked LEGO-PoL converter exceeding 1 A/mm², approaching the area power density of the silicon core (Fig. 1). Figure 6 shows the schematic of a 48 V-1 V 780 A LEGO-PoL converter with three series-stacked 2:1 switched-capacitor units and three parallel-connected four-phase buck units with  The twelve phases each deliver a peak current of 65 A with regulated output voltage. To enable vertical power delivery, the inductors are used as a link between the motherboard and the remainder of the converter. Parasitic resistance (R_par) and inductance (L_par) are considered to design a high frequency input filter capacitor (C_filter) for the buck stage. Detailed design considerations of this vertical stack procedure for a high current density of 1 A/mm² are provided in the following subsections. Figure 7 and Figure 8 show the PCB layout and mechanical demonstration of the 48 V-1 V 780 A vertical stacked LEGO-PoL converter with lateral or vertical power delivery from the motherboard to the CPU.

A. Series-Stacked Switched-Capacitor Stage
Three series stacked 2:1 switched-capacitor units split the 48 V input voltage into smaller 16 V voltage domains to enable the utilization of low-voltage-rating devices with low on-resistance. The voltage stresses on the active switches are either V_BUS or 2V_BUS, as the voltage blocked by the switches is always clamped by the capacitors. Due to the low voltage stress and low switching frequency, switches in the switched-capacitor stage are implemented as standard MOSFETs.
In many resonant switched-capacitor designs, the capacitors need to be carefully selected because the capacitance value  board (PCB) area. On the top layer, MOSFETs and capacitors are placed as close as possible to reduce parasitics. Then, the empty space is filled with copper traces and capacitors. On the bottom layer, the capacitors are fully modularized to optimize the current path and reduce the PCB conduction loss. The capacitance of the switched capacitors is selected to optimize the power density, efficiency, and intermediate bus voltage ripple. The bus voltage ripple can be designed considering the voltage rating of the semiconductor devices. With a 3 V intermediate bus voltage ripple selected for this application, more layers of capacitors enable lower switching frequency operation and higher efficiency but deteriorate the power density, as depicted in Fig. 9. In the prototype, two layers of 0805 capacitors are stacked, giving the switched-capacitor stage a switching frequency of 286 kHz, a power density of 2600 W/in³, and a full load efficiency of 95.6%.
Figure 10 shows the gate drive structure for three series stacked 2:1 switched-capacitor units. A pair of 50% complementary gate driver signals (φ_1 and φ_2) is used. This is a simple and scalable charge pump circuit for generating the bias voltage for the floating MOSFETs and can be fully integrated.

B. Virtual Intermediate Bus Parasitics
Due to the high frequency operation of the buck stage, a small parasitic inductance between the two stages can cause current ringing and increase the stress in the switched-capacitor stage. Figure 11 shows an equivalent RLC circuit of one submodule of the LEGO-PoL converter. The switched-capacitor stage is modeled as a sawtooth voltage source whose frequency is twice that of the switched-capacitor stage (2f_SC). The buck unit is modeled as a pulse wave current source with 4 times the switching frequency of the buck stage (4f_Buck). L_par and R_par are the lumped parasitic inductance and resistance along the current paths in the switched-capacitor stage. C_filter is a small input capacitor of the buck stage to smooth the high frequency current. The input current of the four-phase buck units (i_Buck) is clamped by the inductor current, while the current going through the switched-capacitor units (i_SC) is determined by the RLC filter and has ringing. This issue commonly exists in merged-two-stage designs [33], [34]. L_par contributes to filtering and larger values are beneficial, but layouts that deliberately increase L_par often increase resistance and require extra space. Here L_par is determined by the practical value achieved in our assembly. R_par is minimized to improve efficiency, and thus C_filter is the only design parameter for the filter. The cutoff frequency of the RLC filter is $f_c = \frac{1}{2\pi\sqrt{L_{par} C_{filter}}}$. The switches of the switched-capacitor stage, interconnects between the switched-capacitor stage, and PCB trace are all sources of parasitic inductance and resistance. For the vertical stacked design introduced in Section III, the calculated parasitic inductance is 2.7 nH and parasitic resistance is 6.1 mΩ.
The buck switching frequency f_Buck is 1 MHz. Figure 12 shows a Bode plot of the magnitude response of the RLC filter, considering the buck switched current as the input and the current i_SC as the output. The response is plotted for various values of C_filter. A small filter capacitance of 0.5 µF amplifies the high frequency current, which can cause increased losses. As the filter capacitance increases, the filter is able to adequately damp the current ringing.
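The trend described above can be reproduced with a short numerical sketch. It assumes a small-signal model in which the sawtooth voltage source is treated as an ac short, so the buck current source sees C_filter in parallel with the series L_par and R_par branch and i_SC/i_Buck = 1/(s²·L_par·C_filter + s·R_par·C_filter + 1). The parasitic values and the 0.5 µF case come from the text; the 2 µF and 10 µF values are hypothetical comparison points and are not the prototype's design value.

```python
import math

L_PAR = 2.7e-9    # parasitic inductance in H (from the text)
R_PAR = 6.1e-3    # parasitic resistance in ohm (from the text)
F_BUCK = 1.0e6    # buck switching frequency in Hz (from the text)


def cutoff_frequency(c_filter, l_par=L_PAR):
    """Undamped natural frequency of the L_par / C_filter pair, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_par * c_filter))


def current_gain(freq, c_filter, l_par=L_PAR, r_par=R_PAR):
    """|i_SC / i_Buck| at freq for the assumed second-order current divider."""
    s = 1j * 2.0 * math.pi * freq
    return abs(1.0 / (s * s * l_par * c_filter + s * r_par * c_filter + 1.0))


ripple_freq = 4 * F_BUCK  # four interleaved phases pulse at 4 * f_Buck

# 0.5 uF is the case named in the text; 2 uF and 10 uF are hypothetical.
for c_filter in (0.5e-6, 2e-6, 10e-6):
    f_c = cutoff_frequency(c_filter)
    gain = current_gain(ripple_freq, c_filter)
    print(f"C = {c_filter * 1e6:4.1f} uF  f_c = {f_c / 1e6:5.2f} MHz  "
          f"gain at 4 MHz = {gain:.3f}")
```

With 0.5 µF the natural frequency sits close to the 4 MHz pulsation frequency, so the gain exceeds one in this model; larger capacitance moves the cutoff well below 4 MHz and attenuates the pulsating current.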

C. Parallel Interleaved Buck Stage
Three parallel-connected four-phase buck units equally share 780 A of output current. Each individual phase delivers 65 A. The peak virtual intermediate bus voltage is 9.5 V (nominally 8 V, with a peak-to-peak voltage ripple of 3 V). The reduced stress enables the use of low voltage, high current semiconductor devices and small magnetics. State-of-the-art control strategies for multiphase buck converters can be adopted. In a traditional 12-phase buck converter design, the controller needs to balance the current of all phases. Due to automatic current sharing, the controller only needs to balance the current of the four phases within each submodule. This unique feature allows the LEGO-PoL architecture to be scaled to a very high current without adding significant control complexity. This is a key advantage of LEGO-PoL compared to traditional two-stage intermediate bus architectures with numerous parallel units, which require active control for current balancing.
Figure 7c shows the component placement of the buck stage. Two 5 mm × 6 mm DrMOSes are placed on the top and two are placed on the bottom. Each phase is designed to have the same PCB pattern from the input node to each of the coupled inductor interconnect nodes to minimize the current mismatch. Capacitors, as designed in Section III-B, are placed in the center of the board to filter the high frequency current. Figure 7d shows the PCB layout of the interposer board. The interposer board decouples the design constraints of the buck board and the coupled inductor. Interconnect A is for the buck PCB connection, while interconnect B is for the coupled inductor. The interposer board has 4 layers, 2 oz copper thickness, and 0.6 mm board thickness.
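The per-module stress figures quoted above follow from simple division, sketched below. The variable names are illustrative, and the ideal duty ratio assumes a lossless buck stage fed from the nominal bus voltage, which is a simplification.

```python
V_IN = 48.0            # input voltage in V
V_OUT = 1.0            # output voltage in V
I_OUT = 780.0          # total output current in A
N_SUBMODULES = 3       # series-stacked 2:1 switched-capacitor units
PHASES_PER_UNIT = 4    # phases in each buck unit
BUS_RIPPLE_PP = 3.0    # peak-to-peak intermediate bus ripple in V

# Each series-stacked unit sees an equal share of the input voltage.
domain_voltage = V_IN / N_SUBMODULES

# The 2:1 switched-capacitor unit halves its domain voltage.
bus_nominal = domain_voltage / 2.0

# The peak bus voltage sits half the peak-to-peak ripple above nominal.
bus_peak = bus_nominal + BUS_RIPPLE_PP / 2.0

# The output current is shared across all parallel phases.
phase_current = I_OUT / (N_SUBMODULES * PHASES_PER_UNIT)

# Ideal (lossless) buck duty ratio from the nominal bus voltage.
ideal_duty = V_OUT / bus_nominal

print(f"domain voltage : {domain_voltage:.1f} V")
print(f"bus voltage    : {bus_nominal:.1f} V nominal, {bus_peak:.1f} V peak")
print(f"phase current  : {phase_current:.1f} A")
print(f"ideal duty     : {ideal_duty:.3f}")
```

A measured duty ratio would be expected to sit somewhat above this ideal value once conduction drops are included.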

D. Vertical Four-Phase Coupled Inductor
A coupled inductor can reduce the steady-state current ripple in each phase of the buck stage, and achieve fast dynamic performance with small leakage inductance [26]. In the 3D packaged prototype, three four-phase coupled inductors link the buck stage and output board with vertical windings. Figure 14 shows the coupled inductor, which is fabricated with Ferroxcube 3F4 MnZn ferrite. The footprint of the core is 13 mm × 12 mm, and its height is 5.25 mm. This design is optimized to minimize the core and winding losses at an operating point of 20 A per phase. The designed core has higher density and lower dc resistance than the design presented in [3]. To enable vertical power delivery, the machined copper windings enter from the bottom of the core, make a 90° rotation within the core, and exit from the top of the core to the output motherboard. The empty area within the core between windings can be adjusted to control the leakage flux path, which determines the transient and ripple performance [31]. An extended discussion about the coupled inductor design and optimization parameters is provided in [32].
Table I lists the key parameters of the four-phase buck units with the coupled inductors. The leakage inductance per phase is 12.4 nH, to achieve a targeted maximum output current slew rate of 5 A/ns. The overall system transient inductance is 1.03 nH, as there are 12 total phases among the three buck units. The peak-to-peak phase current ripple of the buck units is 10.9 A with the coupled inductor. To achieve the same transient current speed using uncoupled discrete inductors, four 12.4 nH discrete inductors must be used. This would yield a 70.5 A peak-to-peak phase current ripple, as simulated in Fig. 15. This coupled inductor is designed to handle a phase current mismatch of 10% of the full load current (65 A per phase) without saturation. If the phase currents are well balanced, the coupled inductor will not saturate.
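The transient inductance and the uncoupled ripple figure can be cross-checked with the textbook buck relations. This is a rough sketch and does not replace the simulation of Fig. 15. It assumes that the 12 leakage inductances act in parallel during a load transient, and that an uncoupled phase sees the ideal ripple V_out·(1 − D)/(L·f_Buck) with D = V_out/V_BUS at the nominal 8 V bus.

```python
L_LEAK = 12.4e-9       # leakage inductance per phase in H (from the text)
TOTAL_PHASES = 12      # three four-phase buck units
V_OUT = 1.0            # output voltage in V
V_BUS = 8.0            # nominal virtual intermediate bus voltage in V
F_BUCK = 1.0e6         # buck switching frequency in Hz
RIPPLE_COUPLED = 10.9  # reported coupled-inductor ripple in A (from the text)

# Assumed: all phase leakage inductances are in parallel during a transient.
transient_inductance = L_LEAK / TOTAL_PHASES

# Ideal buck duty ratio and ripple of an uncoupled 12.4 nH inductor.
duty = V_OUT / V_BUS
ripple_uncoupled = V_OUT * (1.0 - duty) / (L_LEAK * F_BUCK)

# How much the coupling reduces the per-phase ripple in this estimate.
ripple_reduction = ripple_uncoupled / RIPPLE_COUPLED

print(f"transient inductance : {transient_inductance * 1e9:.2f} nH")
print(f"uncoupled ripple     : {ripple_uncoupled:.1f} A peak-to-peak")
print(f"ripple reduction     : {ripple_reduction:.1f}x")
```

Both estimates land close to the 1.03 nH and 70.5 A figures quoted above, which suggests those numbers follow from these standard relations.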
Figure 16 compares the size of the coupled inductor against four Coilcraft SLR1050A 85 nH discrete inductors, which achieve a similar peak-to-peak phase current ripple as the coupled inductor. A more detailed comparison of the two solutions is presented in Table I. The coupled inductor achieves a much lower leakage inductance while maintaining a similar peak-to-peak phase current ripple as the discrete inductor solution. It also has a lower dc resistance and lower core loss. The volume of the coupled inductor is only 57.7% of that of the four discrete inductors. Figure 7e shows the PCB layout of the output board. The output board combines the current and hosts the output capacitors. Outside of the power stage area, terminal connections are placed to connect the converter to electronic loads. Each module has four terminals, labeled "interconnect B" for the four-phase coupled inductor. The remainder of the space is used for 14 × 1206 capacitors per module. In the prototype, 220 µF capacitors are used to satisfy a ±2% output voltage ripple requirement. The effective total converter output capacitance is 5.75 mF at an output voltage of 1 V. Figure 7f shows the full 3D assembly drawing of the prototype. The switched-capacitor stage and the buck stage each occupy about one-half of the system volume. Power is vertically delivered from 48 V on the bottom to 1 V on the top. The overall height of the prototype is 16.65 mm.
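The reported densities can be checked against the overall height. The sketch below takes the 1.017 A/mm² current area density and the 16.65 mm height from the text and assumes that the converter volume is a rectangular box, footprint times height. This assumption and the variable names are illustrative only.

```python
MM_PER_INCH = 25.4

I_OUT = 780.0                 # full load current in A
V_OUT = 1.0                   # output voltage in V
CURRENT_AREA_DENSITY = 1.017  # A/mm^2 at 780 A (from the text)
HEIGHT_MM = 16.65             # overall prototype height in mm (from the text)

# Footprint implied by the reported current area density.
footprint_mm2 = I_OUT / CURRENT_AREA_DENSITY

# Assumed box volume, converted from mm^3 to in^3.
volume_in3 = footprint_mm2 * HEIGHT_MM / MM_PER_INCH ** 3

# Power density at 1 V and full load.
power_density = V_OUT * I_OUT / volume_in3


def densities_at(i_out, v_out=V_OUT):
    """Current area density (A/mm^2) and power density (W/in^3) at i_out."""
    return i_out / footprint_mm2, v_out * i_out / volume_in3


print(f"footprint     : {footprint_mm2:.0f} mm^2")
print(f"volume        : {volume_in3:.3f} in^3")
print(f"power density : {power_density:.0f} W/in^3")
```

The box-volume estimate lands within about one percent of the reported 1000 W/in³, so the height, footprint, and density figures are mutually consistent. The helper `densities_at` can be used to scale both densities to a lower load, such as the 450 A air-cooled limit.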

IV. EXPERIMENTAL VERIFICATION
A 48 V to 1 V, 780 A vertical stacked LEGO-PoL converter was fabricated and tested. Figure 17 shows the 3D structure and assembly procedure of the prototype. The input voltage range is from 36 V to 54 V, and the output voltage range is from 0.8 V to 1.5 V. Three submodules were used, as per the schematic in Fig. 6. Table II
A. Experimental Setup
Figure 18 shows the experimental setup to characterize the performance of the vertical stacked LEGO-PoL converter. All of the necessary equipment is placed in a standard 1U server rack setup. A Tektronix MDO4140C oscilloscope is used to measure operation waveforms. Five Agilent 34401A digital multimeters are used to take automated measurements of the input voltage, input current, output voltage, output current, and DrMOS junction temperature. A BK Precision 9117 3 kW, 80 V, 120 A dc power source is used. Rideon RSN-50 and Rideon RSC-1000 current shunts are used for input and output current measurement. Two electronic loads, a Chroma 63103A 240 A load and a Chroma 63203 600 A load, are used. The vertical stacked LEGO-PoL prototype was tested under two different cooling conditions: air cooling (Fig. 18b) and liquid cooling (Fig. 18c). Two 36 CFM fans are used for the air cooling. Mineral oil is used for the liquid cooling, and two 36 CFM fans and a pump are used to circulate the liquid.
Figure 19 shows the measured waveforms of the switched-capacitor stage at a 48 V input voltage, 1 V output voltage, and an output current of 780 A. The input voltage is shown on top, and the differential voltage across C_2 and C_4, as well as the leftmost node voltage of C_5 (denoted V_CF5 on the schematic of Fig. 6), are shown below the input voltage. Figure 20 shows voltage ripple due to the higher bias voltage, and module #3 has the lowest voltage ripple. Figure 21 shows the switch node voltages of each of the four phases of the second buck module.
The envelope of the switch nodes is equal to V_BUS2, which is the input voltage of the buck unit. The four phases are interleaved, with a duty cycle of 15.7%. The switching frequency is 1 MHz. Figure 22 shows the output voltage and virtual intermediate bus voltage waveforms in response to a buck switch duty ratio change from 15% to 20% at an output load current of 150 A. Figure 23 shows the input and output voltage ripple waveforms at 48 V input and 1 V/780 A output. The steady state output voltage ripple is 18 mV with 5.75 mF output capacitance. The input capacitor voltage ripple is 400 mV. Figure 24 shows a closed-loop transient test for a 50% load step. A classic voltage mode feedback PI controller is used for this experiment. The three virtual intermediate bus voltages and output voltage in response to an output current load step between 50 A and 450 A are measured. The merged-two-stage operation maintains stable intermediate bus voltage without a large decoupling capacitor, with expected ripple due to the increase in output load current. Due to the limited controller bandwidth, a 120 mV peak-to-peak voltage excursion is observed during the transient. Advanced control methods,
Fig. 27. Loss breakdown and calculated efficiencies of the switched-capacitor stage, buck stage, and total system at 1.0 V and 1.5 V output conditions. P_MOSFET, P_C, P_Cfilter, and P_Copper are the losses of the MOSFETs, switched capacitors, filter capacitors, and copper traces (including connectors and PCB) in the switched-capacitor stage. P_DrMOS, P_CoupL, and P_Copper are the losses of the DrMOS, coupled inductors, and copper traces in the buck stage.
A detailed theoretical loss breakdown is provided in Fig. 27 for 1.0 V and 1.5 V output voltage. The loss breakdown was performed with the experimental duty ratio of the buck stage and the junction temperature of the DrMOS in Fig. 28b.
Losses from the switched-capacitor stage include loss from the MOSFETs, the flying capacitors, the filter capacitors, and the copper traces. Losses from the buck stage include loss from the DrMOS, the coupled inductors (both core loss and conduction loss), and the copper traces. The DrMOS switching and conduction loss dominates the loss of the overall system due to its high-switching-frequency operation and high output current. The switched-capacitor stage maintains high efficiency (above 95.5%) throughout the entire load range at the 1 V output voltage condition. The overall system efficiency curve mirrors the shape of the buck efficiency curve, with a larger slope as the load current increases due to increased conduction loss. The converter achieves peak efficiency at around 20% of full power, dominated by the efficiency curve of the buck stage.
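A rough power budget at full load illustrates why the buck stage dominates. This is a back-of-the-envelope sketch, not the loss breakdown of Fig. 27. It assumes that the two stages are cascaded so that their efficiencies multiply, and that the 95.6% full load efficiency quoted for the switched-capacitor stage in Section III-A applies at the 1 V, 780 A operating point.

```python
P_OUT = 780.0      # output power in W at 1 V, 780 A
ETA_TOTAL = 0.792  # full load system efficiency (from the text)
ETA_SC = 0.956     # switched-capacitor stage full load efficiency (assumed
                   # to apply at this operating point)

# Input power and total loss implied by the system efficiency.
p_in = P_OUT / ETA_TOTAL
total_loss = p_in - P_OUT

# Loss in the first (switched-capacitor) stage, which processes p_in.
sc_loss = p_in * (1.0 - ETA_SC)

# The remainder is attributed to the buck stage in this cascade model.
buck_loss = total_loss - sc_loss
eta_buck = ETA_TOTAL / ETA_SC

print(f"input power : {p_in:.1f} W")
print(f"total loss  : {total_loss:.1f} W")
print(f"SC loss     : {sc_loss:.1f} W")
print(f"buck loss   : {buck_loss:.1f} W (buck efficiency {eta_buck:.1%})")
```

Under these assumptions the buck stage accounts for roughly four fifths of the full load loss, consistent with the statement that the DrMOS loss dominates.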

B. Operation and Performance
Without liquid cooling, the maximum output current of the system is 450 A. Figure 28a shows a thermal image of the converter at V_out = 1.5 V and I_out = 450 A. Two 36 CFM fans are used, and the PCBs reach a temperature of 78.7°C. Figure 28b shows a graph of the DrMOS junction temperature (using the built-in temperature sensing pin) for both the air cooled and liquid cooled operation at V_out = 1.5 V. The junction temperature reaches 94.3°C at I_out = 450 A. By employing liquid cooling, the junction temperature of the DrMOS reaches 93.9°C at I_out = 780 A.
Table III compares key metrics of the vertical stacked LEGO-PoL converter with other state-of-the-art 48 V-to-1 V point-of-load voltage regulator designs. The converter presented in this work achieves the highest reported output current capability at either 450 A with air cooling or 780 A with liquid cooling. This work achieves both the highest power density and the highest current area density, at 577 W/in³ and 0.587 A/mm² for air cooling and 1000 W/in³ and 1.017 A/mm² for liquid cooling. This is the first demonstration of a 48 V to 1 V point-of-load CPU voltage regulator to achieve a current area density of over 1 A/mm² and the first to achieve 1,000 W/in³ power density. This work achieves a peak efficiency of 91.1% and a full load efficiency of 85.7% with air cooling (79.2% with liquid cooling), which is comparable to other high-density designs. The switching frequency of the voltage regulation stage is 1 MHz, among the highest for a 48 V CPU voltage regulator demonstration. The coupled inductors give this work the smallest transient inductance and lowest inductive energy storage per watt (defined as the total (1/2)LI² energy storage divided by the output power rating P, ignoring the current ripple) when compared to other work. Table III is visualized in Fig. 29 and Fig. 30. The switching frequency is represented by a color gradient.
This work achieves the highest current area density while maintaining state-of-the-art peak efficiency, achieves the highest power density while maintaining high full-load efficiency, and switches at a high frequency of 1 MHz with interleaving.
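The density figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming both current-density figures are in A/mm² and that the air-cooled and liquid-cooled operating points refer to the same physical converter (so that footprint and volume are unchanged). The function names and the inductor values passed to `inductive_energy_per_watt` are hypothetical and serve only to illustrate the definition $\frac{1}{2}LI^2 / P$ given in the text.

```python
# Operating points quoted in the text (units assumed as labeled in the keys).
AIR = {"I_out_A": 450.0, "W_per_in3": 577.0, "A_per_mm2": 0.587}
LIQUID = {"I_out_A": 780.0, "W_per_in3": 1000.0, "A_per_mm2": 1.017}


def implied_footprint_mm2(point):
    """Footprint area implied by output current divided by current area density."""
    return point["I_out_A"] / point["A_per_mm2"]


def density_ratio_matches_current_ratio(a, b, key, rel_tol=0.01):
    """True if a density metric scales with output current between two points."""
    current_ratio = a["I_out_A"] / b["I_out_A"]
    density_ratio = a[key] / b[key]
    return abs(density_ratio - current_ratio) <= rel_tol * current_ratio


def inductive_energy_per_watt(L_total_H, I_A, P_W):
    """Total (1/2) L I^2 energy storage divided by output power, ripple ignored."""
    return 0.5 * L_total_H * I_A ** 2 / P_W


# Both operating points should imply the same footprint if the units are consistent.
print(implied_footprint_mm2(AIR), implied_footprint_mm2(LIQUID))
print(density_ratio_matches_current_ratio(AIR, LIQUID, "W_per_in3"))
print(density_ratio_matches_current_ratio(AIR, LIQUID, "A_per_mm2"))
```

If the two implied footprints agree and both ratio checks pass, the air-cooled and liquid-cooled density figures are mutually consistent with a fixed converter size and an output current that rises from 450 A to 780 A.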

V. DISCUSSIONS AND FURTHER IMPROVEMENTS
The LEGO-PoL design presented in this paper combines many state-of-the-art technologies to achieve extreme power density and efficiency. Some of its advantages come from the topology, architecture, and magnetics design; others come from the possibility of packaging it vertically together with the microprocessors, which reduces the loss and parasitics in the interconnects and allows more cores to be placed closer to each other with high-speed communication.
We envision that both the silicon power density and server power density will continue to increase in the near future. Both current area density (A/mm²) and power density (W/in³) of CPU voltage regulators are important design targets. The presented prototype achieves its peak efficiency at around 20% of its thermal design power (TDP). Depending on the application, different microprocessors (e.g., CPUs, GPUs, XPUs) need performance optimized at different fractions of TDP, leading to different design tradeoffs. Different priorities among efficiency, density, and transient performance also lead to different tradeoffs. Challenges and pathways to achieving over 4.5 W/mm² area density, matching that of a state-of-the-art silicon core, while maintaining high efficiency across the entire operating range include:
1) The DrMOS devices we used limit current area density, efficiency, and switching frequency. Better low-voltage power devices, whether based on Si or wide-bandgap semiconductors, are expected to be instrumental in overcoming all three of these limitations.
2) The height of the LEGO-PoL prototype is limited by the vertical coupled inductors and capacitors. Switching at a higher frequency, enabled by better switches; optimizing the magnetics design with a priority on reducing thickness; and more advanced capacitor technologies can further reduce the height and weight of the system.
3) The current throughput of the prototype is limited by the thermal rating of the switches. Better cooling technology, and semiconductor devices that can work at a higher temperature (such as GaN devices), are promising ways to improve the power density and the system efficiency at full load.
4) In the prototype, passive components (capacitors and magnetics) contribute an order of magnitude more volume and weight than the semiconductor devices. As shown in Fig. 31, semiconductor devices contribute only 4% of the system weight and 2% of the system volume. Devices that can switch efficiently at a higher frequency can further reduce the passive component sizes.
5) Printed circuit boards (PCBs) and copper interconnects occupy a large percentage of the system weight and volume. Advanced packaging techniques are needed to further reduce the size and improve the current density.
[Figure caption fragment, beginning lost in the source: "… Table III. These designs switch at different frequencies and have different regulation capabilities. The LEGO-PoL converter switches at 1 MHz. The full load efficiency of the Vicor product is not available and is estimated."]

VI. CONCLUSION
This paper presents a vertical stacked 48 V to 1 V CPU voltage regulator with a linear-extendable group operated point-of-load architecture. By merging the operation of a switched-capacitor stage and a multiphase buck stage, the advantages of both can be leveraged while decoupling the design challenges of high efficiency, high density, and high control bandwidth. A vertical stacked design enabled by a multiphase coupled inductor is presented to achieve a high power density and high current area density. The system is highly modular and scalable. The power converter modules are vertically stacked and have the potential to reach the current area density of silicon microprocessors by enabling vertical power delivery from the motherboard to the CPU. A 48 V to 1 V, 780 A CPU voltage regulator is built and tested with air cooling and liquid cooling, achieving a 91.1% peak efficiency, a 1000 W/in³ power density, and a 1.017 A/mm² current area density.

APPENDIX I CURRENT SHARING AND VOLTAGE BALANCING
A large-signal average analysis is performed to illustrate the principle of the automatic current sharing and voltage balancing mechanisms. In the three-submodule system depicted in Fig. 3, assume that the duty ratio of the buck converter, i.e., the duty ratio of the high-side switches, is $D$; that each inductor $L_x$ has a series resistance $R$; that the large-signal average current of $L_x$ is $i_{Lx}$; and that the large-signal average voltage of $C_y$ is $v_{Cy}$, where $x \in \{1, 2, 3\}$ and $y \in \{2, 4\}$. The large-signal average models are given in (3). Note that $v_{C1}$, $v_{C3}$, and $v_{C5}$ cancel out in (4); they do not affect the large-signal dynamics. The charge balance requirement of capacitors $C_2$ and $C_4$ leads to the automatic current sharing mechanism among $L_1$, $L_2$, and $L_3$.
Assuming that $L_1 = L_2 = L_3 = L$ and $C_2 = C_4 = C$, the second-order differential equations (5) for the currents of the three-submodule LEGO-PoL system can be obtained from (3). The matrix $M$ in (5) is real and symmetric, so it can be diagonalized as $M = Q \Lambda Q^{-1}$, where $Q$ is a matrix composed of the eigenvectors ($e_1$, $e_2$, and $e_3$) and $\Lambda$ is a diagonal matrix composed of the eigenvalues ($\lambda_1$, $\lambda_2$, and $\lambda_3$) of $M$. Denoting $Y = Q^{-1} X$, (5) can be rewritten as (6). Since $i_{L1} - i_{L3}$ is linearly proportional to $y_2$, (6) yields a second-order differential equation that describes the large-signal dynamics of the current difference between $i_{L1}$ and $i_{L3}$. The natural frequency of this second-order system is $\omega_n = \frac{D}{2\sqrt{LC}}$, the damping ratio is $\zeta = \frac{R}{D}\sqrt{\frac{C}{L}}$, the decay rate is $\alpha = \frac{R}{2L}$, and the quality factor is $Q = \frac{D}{2R}\sqrt{\frac{L}{C}}$. The current difference responds to perturbations like a second-order system and gradually decays to zero in periodic steady state. As $i_{L1}$ and $i_{L3}$ converge, based on (5), since $y_3$ is proportional to $i_{L1} - 2i_{L2} + i_{L3}$ and $y_3$ damps to zero, all currents are equal in steady state. The current sharing mechanism of the LEGO-PoL converter is very similar to that of the series-capacitor buck converter [13].
As the current differences between inductors are zero, the average voltages of $C_2$ and $C_4$, $v_{C2}$ and $v_{C4}$, reach $\frac{2V_{in}}{3}$ and $\frac{V_{in}}{3}$, respectively, because the average voltages across all switch nodes need to be equal. $v_{C1}$, $v_{C3}$, and $v_{C5}$ are set by the switched-capacitor mechanism due to the small filtering capacitor $C_{filter}$. This guarantees automatic voltage balancing of the LEGO-PoL architecture. Figure 32 compares the large-signal average model against SPICE simulation results. In periodic steady state, the large-signal currents satisfy $i_{L1} = i_{L3}$ and $\frac{di_{L1}}{dt} = \frac{di_{L3}}{dt} = 0$. This mechanism holds the large-signal average of $v_{C2}$ at $\frac{2}{3}V_{in}$. The transient of the capacitor voltage follows second-order dynamics similar to those of $i_{L1} - i_{L3}$ (similar damping ratio and quality factor) and gradually damps to $\frac{2}{3}V_{in}$ following the same oscillation, so $v_{C2}$ is automatically maintained at this value.
This analysis can be extended and generalized for a LEGO-PoL converter with $N$ submodules. Assume that $L_1 = L_2 = \dots = L_N = L$ with series resistance $R_1 = R_2 = \dots = R_N = R$, that $C_2 = C_4 = \dots = C_{2(N-1)} = C$, and that the duty ratio of the high-side switches of all buck units is $D$. Decoupling the large-signal average model of the system into modes $y_k$ in the same way gives the solutions
$$y_1(t) = K_{11} e^{-2\alpha t} + K_{12}, \quad k = 1,$$
$$y_k(t) = K_{k1} e^{(-\alpha + \beta_k)t} + K_{k2} e^{(-\alpha - \beta_k)t}, \quad k \ge 2,$$
where $K_{11}$, $K_{12}$, $K_{k1}$, and $K_{k2}$ are constant coefficients, $\alpha = \frac{R}{2L}$, and $\beta_k = \frac{1}{2}\sqrt{\left(\frac{R}{L}\right)^2 - \frac{D^2 \lambda_k}{LC}}$. There are three cases for the solution of $y_k(t)$ with $k \ge 2$: two distinct real roots, repeated roots, and complex roots.
In all three cases, since $\alpha$ is positive, $y_1(t)$ damps to $K_{12}$ and $y_k(t)$ with $k \ge 2$ damps to zero as $t \to \infty$. Therefore, in periodic steady state, the large-signal inductor currents of the LEGO-PoL architecture with $N$ submodules and total system output current $I_O$ settle to the same constant value $K_{12} = \frac{I_O}{N}$, i.e., $i_{L1} = i_{L2} = \dots = i_{LN} = \frac{I_O}{N}$.
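The exponents $-\alpha \pm \beta_k$ quoted above can be tied back to a characteristic equation. The following worked step is a sketch under one assumption: that each decoupled mode obeys a homogeneous second-order equation whose coefficients are inferred here from the stated $\alpha$ and $\beta_k$, rather than copied from the source's numbered equations.

```latex
\begin{align}
  % Assumed modal equation (inferred from the stated solutions):
  \frac{d^2 y_k}{dt^2} + \frac{R}{L}\,\frac{d y_k}{dt}
      + \frac{D^2 \lambda_k}{4LC}\, y_k &= 0 \\
  % Characteristic equation and its roots:
  s^2 + \frac{R}{L}\, s + \frac{D^2 \lambda_k}{4LC} = 0
  \quad\Rightarrow\quad
  s_{1,2} &= -\frac{R}{2L} \pm \frac{1}{2}
      \sqrt{\left(\frac{R}{L}\right)^2 - \frac{D^2 \lambda_k}{LC}}
      = -\alpha \pm \beta_k \\
  % Zero eigenvalue: one root at the origin gives the constant term K_{12}.
  \lambda_1 = 0 \quad\Rightarrow\quad s_{1,2} &\in \{\, 0,\; -2\alpha \,\}
\end{align}
```

Under this assumption, the three cases correspond to the sign of the discriminant: $\left(\frac{R}{L}\right)^2 > \frac{D^2\lambda_k}{LC}$ gives two distinct real roots, equality gives a repeated root at $-\alpha$, and $\left(\frac{R}{L}\right)^2 < \frac{D^2\lambda_k}{LC}$ gives a complex pair with real part $-\alpha$. For $\lambda_k > 0$ the product of the roots is positive and their sum is $-2\alpha < 0$, so both roots have negative real parts in every case.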
The voltages of the other capacitors are then balanced by the switched-capacitor mechanism due to the existence of $C_{filter}$. The charge balancing mechanism of the series capacitors guarantees automatic current sharing and automatic voltage balancing for the LEGO-PoL architecture with $N$ submodules.
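The modal picture in this appendix can be explored numerically. The sketch below is illustrative only and rests on two assumptions that are not taken from the source: the coupling matrix $M$ is modeled as the Laplacian of an $N$-node path graph (chosen because, for $N = 3$, its eigenvectors are proportional to $(1,1,1)$, $(1,0,-1)$, and $(1,-2,1)$, matching the modes described in the text), and the component values are hypothetical. The expressions for $\alpha$, $\beta_k$, $\omega_n$, $\zeta$, and the quality factor follow the formulas stated in the appendix.

```python
import numpy as np


def path_laplacian(n):
    """Assumed form of M: Laplacian of an n-node path graph (not from the source)."""
    m = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    m[0, 0] = m[-1, -1] = 1.0
    return m


def modal_analysis(R, L, C, D, n):
    """Return alpha, eigenvalues, eigenvectors, the two roots per mode, and discriminants."""
    alpha = R / (2.0 * L)
    lam, vecs = np.linalg.eigh(path_laplacian(n))  # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)                  # remove tiny negative round-off
    disc = (R / L) ** 2 - D ** 2 * lam / (L * C)   # its sign selects the root case
    beta = 0.5 * np.sqrt(disc.astype(complex))
    return alpha, lam, vecs, -alpha + beta, -alpha - beta, disc


def classify(disc, ref):
    """Name the root case for one mode; ref sets the scale for 'zero'."""
    if np.isclose(disc, 0.0, atol=1e-9 * ref):
        return "repeated real roots"
    return "two distinct real roots" if disc > 0 else "complex roots"


def second_order_parameters(R, L, C, D):
    """Parameters of the i_L1 - i_L3 mode, using the expressions in the appendix."""
    return {
        "omega_n": D / (2.0 * np.sqrt(L * C)),
        "zeta": (R / D) * np.sqrt(C / L),
        "alpha": R / (2.0 * L),
        "Q": (D / (2.0 * R)) * np.sqrt(L / C),
    }


# Hypothetical component values, for illustration only.
R_EX, L_EX, C_EX, D_EX = 1e-3, 100e-9, 100e-6, 0.25

alpha, lam, vecs, s_plus, s_minus, disc = modal_analysis(R_EX, L_EX, C_EX, D_EX, 3)
for k in range(3):
    print(k + 1, lam[k], s_plus[k], s_minus[k], classify(disc[k], (R_EX / L_EX) ** 2))
print(second_order_parameters(R_EX, L_EX, C_EX, D_EX))
```

With these assumptions, the first mode has one root at (numerically) zero, which carries the common-mode current, while every other mode has roots with negative real parts, so the current differences decay. Changing `n` extends the same check to more submodules.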