MSC-PoL: Hybrid GaN–Si Multistacked Switched-Capacitor 48-V PwrSiP VRM for Chiplets

This article presents a multistack switched-capacitor point-of-load (MSC-PoL) voltage regulation module (VRM) with coupled magnetics for ultrahigh-current chiplet systems. In the MSC-PoL architecture, the stacked switched-capacitor cells split the high input voltage into several intermediate voltage rails, which are loaded with the switched-inductor cells to achieve soft charging and voltage regulation. Automatic capacitor voltage balancing and inductor current sharing are realized during the soft charging process. Many inductors of the switched-inductor cells are coupled into one and operated in interleaving to reduce the inductor current ripple and boost the transient speed. A 48-to-1-V/450-A VRM containing two MSC-PoL modules is built and tested, leveraging high-voltage GaN devices for the front end and high-current silicon devices for the back end. Two ladder-structured coupled inductor designs are developed and compared, one of which installs a leakage magnetic plate to adjust the leakage inductance for lower current ripple. Featuring 3-D stacked packaging, the entire power stage, gate drivers, and bootstrap circuits of one MSC-PoL module are enclosed into a <inline-formula><tex-math notation="LaTeX">$\frac{1}{16}$</tex-math></inline-formula>-brick/0.31-in<inline-formula><tex-math notation="LaTeX">$^{3}$</tex-math></inline-formula>/6-mm-thick package. The peak efficiency, the full-load efficiency, and the full-load power density (including both gate loss and size) of the MSC-PoL prototype with and without using the leakage plate are 91.7% and 89.5%, 85.8% and 85.6%, and 621 and 724 W/in<inline-formula><tex-math notation="LaTeX">$^{3}$</tex-math></inline-formula>, respectively. The 6-mm-thick MSC-PoL converter can be embedded into the chiplet or CPU socket, enabling power supply in package for extreme efficiency, density, and control bandwidth.


I. INTRODUCTION
As Dennard scaling tapered out, the processor performance-per-watt improvement gained from advances in the fabrication process gradually faded away [1], [2], [3], [4]. To meet the growing computational demand of artificial intelligence (AI) applications and cloud computing, microprocessors have entered a new era, where multiple cores are integrated on one chip and many chiplets are colocated on one interposer [5], incessantly pushing toward larger die area and higher power consumption. However, the continuous scaling of computing systems is hitting both the power wall and the memory wall (see Fig. 1) [9]. With billions of transistors, high-performance microprocessors nowadays can consume hundreds of amperes of current at very low voltage (<1 V), greatly increasing the conduction loss on power distribution networks (PDNs) and narrowing the tolerance for supply voltage variations [10]. Besides, the development of AI algorithms dramatically boosts the memory bandwidth demand. These trends have brought severe challenges to designing highly sophisticated signal and power networks, which require high converter efficiency, high control bandwidth, and high signal and power integrity.
A recent trend in data centers is to replace the ac power distribution with 48-54 V dc distribution networks on the server racks [11]. To deliver power from the 48-V dc bus to low-voltage chiplets, conventional voltage regulation solutions rely heavily on onboard power conversion, with little or no conversion stress inside the processor package [see Fig. 2(a)]. The onboard point-of-load (PoL) converters can be generally classified into two categories: two-stage architecture [12], [13], [14], [15], [16], [17], [18] and single-stage architecture [19], [20], [21], [22], [23]. In two-stage architectures, an intermediate dc voltage bus is employed to decouple the voltage conversion stress and transient dynamics between the two converter stages. The first stage is usually a transformer-based converter (e.g., LLC converter) [12], [13] or a switched-capacitor (SC) circuit [14], [15], [16], [17], [18] functioning as a fixed-ratio dc transformer (DCX), and the second stage is a multiphase buck switching at high frequencies for high control bandwidth. Compared to transformer-based topologies, SC converters utilize capacitors to undertake the major voltage stress for the large step-down ratio and can substantially reduce the converter size due to the superior capacitor energy storage density. By merging the two stages, one can soft charge the SC circuits to reduce the charge sharing loss [27], [28], [29], [30], allowing the use of smaller capacitors or lower switching frequency. Single-stage architectures that have a low component count and fewer power conversion stages can attain high efficiency and high power density, but they might experience difficulty realizing high control bandwidth. Although the onboard power conversion solutions are currently the mainstream due to mature techniques and easier implementation, their long PDN traces lead to high conduction loss, and their large onboard areas impede microprocessors from communicating with peripherals, limiting the efficiency, power density, as well as control and communication bandwidth.

Fig. 1. As microprocessors develop from single-core monolithic die to multicore multiple chiplets, modern computing systems are hitting both power wall and memory wall (replotted from [6]). Process node geometry and die area of selected high-performance-tier GPUs in [7] and [8] are plotted along the scaling curve of GPU thermal design power.

[...] [24], [25], [26], and this article (including gate loss).
An alternative 48-to-1-V voltage regulation solution is to embed a substantial part of, or the complete, power conversion circuitry into the processor package, enabling ultracompact power-supply-in-package (PwrSiP) systems [31], as shown in Fig. 2(b). With PwrSiP voltage regulation, power conversion stress is shifted from onboard circuits to in-package circuits. The shortened interconnection lengths can significantly reduce PDN losses and improve signal integrity, making PwrSiP extremely attractive for powering future high-current microprocessors. Fig. 3 shows an example PwrSiP implementation, where a voltage regulation module (VRM) is copackaged with a chiplet or CPU. To fit into the chiplet/CPU socket, the VRM is required to have both small area and low z-height. Typically, the VRM height is set by the magnetic components, whose sizes are limited by the fundamental tradeoff between transient and ripple performance. Coupled magnetics with interleaving operation can obtain both high di/dt in transient and low current ripple in steady state, substantially reducing dc energy storage and magnetic size [33], [34], [35], [36].

GaN switches can be utilized in the SC stage to undertake high voltage stress, while silicon switches can be used in the regulation stage to undertake high current stress. The hybrid GaN-Si switch combination maximizes the advantages of the latest GaN FETs and silicon MOSFETs [14], [32].
In pursuit of an ultracompact chiplet/CPU VRM with miniaturized z-height for PwrSiP power conversion, this article presents a multistack switched-capacitor point-of-load (MSC-PoL) architecture with coupled magnetic components, as demonstrated in Fig. 4. Multiple SC cells are stacked in front and break down the high input voltage into many intermediate voltage rails, which are loaded with switched-inductor current sources to perform soft charging and voltage regulation. Different from the two-stage PoL architectures, the intermediate voltage rail herein is not necessarily a fixed dc bus but may step between several dc levels at different switching states [37], [38]. The dc rail voltage is provided by the capacitor network of the SC stage, and thus, large intermediate bus capacitors can be eliminated. The switched-inductor cell will be connected to the intermediate voltage rail at the desired voltage level and will be disconnected when the voltage rail shifts to other voltages. Many inductors of the switched-inductor cells are merged into one and operated in interleaving. Through soft charging multiple SCs with one single coupled magnetic component, the MSC-PoL architecture can minimize both capacitor and magnetic size, achieving extremely low z-height as well as high efficiency and high transient speed.
To validate the MSC-PoL architecture, a 48-to-1-V 6-mm-thick MSC-PoL VRM with 3-D stacked ladder-core coupled inductors is built and tested. A 0.8-mm-thick leakage magnetic plate is designed to adjust the leakage inductance for lower current ripple. The MSC-PoL VRM leverages a hybrid GaN-Si switch combination and encloses all the components of the power stage, bootstrap, and gate driver circuits into a 1/16-brick module with an ultracompact size of 0.31 in³. Two MSC-PoL modules can support up to 450-A load current with over 724-W/in³ power density. The peak efficiency (including gate loss) of the MSC-PoL prototype with and without using the leakage plate is 91.7% and 89.5%, respectively.
The rest of this article is organized as follows. Section II introduces the multistack SC architecture together with several example topology implementations. Section III presents a specific 48-to-1-V MSC-PoL topology, clarifies its working principles, and analyzes its dynamic performance with small-signal modeling. Section IV elaborates the design of the MSC-PoL converter, including the ladder-structured coupled inductor, gate driver circuits, and 3-D stacked packaging. Detailed experimental results are presented in Section V. Section VI presents the performance discussions and comparison. Finally, Section VII concludes this article.

II. MULTISTACK SC ARCHITECTURE
There are many different ways of implementing the SC cells and the switched-inductor current sources of the multistack SC architecture. The SC cells can be implemented as any SC structure that can leverage soft charging, such as Dickson-derived topologies or flying-capacitor-derived topologies; the switched-inductor cells functioning as voltage regulators can be implemented as pulsewidth modulation (PWM) or resonant converters, such as buck, series-capacitor buck (SCB), and SEPIC converters. One can combine different SC and switched-inductor cells to meet diverse design requirements.

Fig. 5 shows an MSC-PoL architecture based on modular "H-bridge" structures. The SC cell is configured as a 2:1 H-bridge circuit with one terminal connected to the input side, one terminal connected to ground, and two intermediate voltage rails each providing half of the input voltage. The two voltage rails are loaded with switched-inductor circuits that function as voltage regulators and can soft charge and discharge the flying capacitor of the H-bridge SC cell. The MSC-PoL architecture is modular and extendable. One can stack many H-bridge structures to interface with higher voltages (e.g., 96 V, 192 V) or parallel multiple voltage regulator structures to support higher output currents. Redundant switches within the stacked H-bridges or between the SC stage and the switched-inductor stage are merged to reduce component count and power loss [16]. The switched-inductor current sources are operated in interleaving to decrease the output current ripple.
Figs. 6-8 show several example MSC-PoL topologies with 16 output phases. The 16-phase inductors can be implemented as eight two-phase coupled inductors, four four-phase coupled inductors, or one 16-phase coupled inductor. The 16-phase switched-inductor cells can be implemented as multiphase buck (see Figs. 6 and 7), multiphase SCB, or a hybrid (see Fig. 8). Fig. 7 shows an alternative implementation of the MSC-PoL architecture, which is capable of producing multiple output voltages. The current sources are connected in parallel but are separately regulated to supply different output voltage levels.

[...] the unbalanced voltages and currents caused by nonideal factors, including resistance variation between phases [39], phase-shift error [40], and source impedance [41].

This article is a new development of the family of linear-extendable-group-operated point-of-load (LEGO-PoL) [27] and virtual intermediate bus point-of-load (VIB-PoL) [14] converters with a better balance among power density, efficiency, and component count. Key contributions include: 1) proposing a new category of hybrid-SC architectures and introducing a new way of merging two stages with floating voltage rails; 2) designing a novel coupled inductor structure with a leakage plate to adjust leakage inductance and current ripple; and 3) building a 48-to-1-V VRM with both excellent efficiency and power density. A detailed topological comparison between the MSC-PoL converter and published works is summarized in Appendix III.

III. 48-TO-1-V MSC-POL CPU VOLTAGE REGULATOR
This section presents the operation principles and small-signal models of a 48-to-1-V 450-A MSC-PoL converter.

A. Topology and Operation Principle
Fig. 9 shows the 48-to-1-V MSC-PoL topology. It consists of one H-bridge SC cell stacked on top of two four-phase SCB cells. The H-bridge SC cell steps down $V_{in}$ by half and distributes 24 V to each SCB cell. Two switches at the output terminals of the H-bridge are merged with the input switches of the SCB circuits. Voltage conversion ratios or power ratings can be extended by stacking more H-bridges or paralleling more SCB phases [37]. In Fig. 9, the maximum drain-source voltage stress is labeled beside each switch. Switches in the H-bridge SC cell can use high-voltage GaN FETs to undertake high voltage stress, while switches in the SCB cells can utilize low-voltage low-resistance silicon MOSFETs to support large current output.

The four interleaved inductors of each SCB cell are coupled in parallel, leading to reduced inductor current ripple at 4× the switching frequency. In Fig. 10, the two SCB cells are operated with a 180° phase shift as an example. Other phase shifts between SCB cells (e.g., 135° or 225°) and alternative coupled inductor solutions (e.g., coupling all eight inductors in parallel) can also be applied to realize eight-phase interleaving with further reduced ripple amplitudes and increased ripple frequency for inductor and output currents. The flying capacitor $C_{fly}$ in the H-bridge SC cell is soft charged and discharged in turns by the first two SCB phases (i.e., phases 1A and 1B), while the blocking capacitors $C_{1X}$-$C_{3X}$ in each SCB cell are soft charged and discharged by neighboring inductor currents. As a result, the 48-to-1-V MSC-PoL topology is capable of automatic voltage balancing for all the capacitors and automatic current sharing for all the parallel output branches.

Based on inductor volt-second balance, the steady-state output voltage can be expressed as

$$V_o = \frac{D\,V_{in}}{8} \qquad (1)$$

where $D$ is the duty ratio, and $D = \frac{1}{6}$ for the 48:1 voltage conversion ratio. As indicated by (1), the steady-state operation of the MSC-PoL converter resembles that of a multiphase buck converter, but with a reduced input voltage of one-eighth the original value.
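The steady-state relation can be checked numerically. The following is a minimal sketch, assuming that the one-eighth factor is the product of the 2:1 H-bridge SC cell and a 4:1 step-down contributed by each four-phase SCB cell, and that the duty ratio cannot exceed one over the number of SCB phases (25% here). The function name and its parameters are hypothetical and are not taken from the prototype firmware.

```python
def msc_pol_duty(v_in, v_out, sc_ratio=2, scb_phases=4):
    """Duty ratio implied by V_o = D * V_in / (sc_ratio * scb_phases).

    sc_ratio and scb_phases are illustrative parameters (assumption):
    2 for the 2:1 H-bridge SC cell, 4 for the four-phase SCB cell.
    """
    step_down = sc_ratio * scb_phases          # 8 for the 48-to-1-V design
    duty = v_out * step_down / v_in
    max_duty = 1.0 / scb_phases                # assumed ceiling: 25% for four phases
    if not 0.0 < duty <= max_duty:
        raise ValueError(f"duty ratio {duty:.3f} is outside (0, {max_duty:.2f}]")
    return duty


duty = msc_pol_duty(v_in=48.0, v_out=1.0)
equivalent_buck_input = 48.0 / 8               # input voltage of the equivalent buck
print(f"D = {duty:.4f}, equivalent buck input = {equivalent_buck_input:.1f} V")
```

With a 48-V input and a 1-V output, the sketch returns a duty ratio of one-sixth, which leaves some headroom below the assumed 25% ceiling for transient regulation.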

B. Dynamic Modeling and Analysis
This subsection analyzes the transient performance of the MSC-PoL converter through small-signal modeling. For the four-phase coupled inductor, the dynamic winding voltages and currents are related by an inductance matrix

$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} = \begin{bmatrix} L_{11} & L_{12} & L_{13} & L_{14} \\ L_{21} & L_{22} & L_{23} & L_{24} \\ L_{31} & L_{32} & L_{33} & L_{34} \\ L_{41} & L_{42} & L_{43} & L_{44} \end{bmatrix} \frac{d}{dt} \begin{bmatrix} i_1 \\ i_2 \\ i_3 \\ i_4 \end{bmatrix}. \qquad (2)$$

Two effective discrete inductances, the transient inductance ($L_{tr}$) and the steady-state inductance ($L_{ss}$), can be defined, which have the same transient speed and the same current ripple as the coupled inductor, respectively [34]. If the four-phase coupled inductor is symmetrically coupled, the summation of each column in the inductance matrix is the transient inductance for each phase: $L_{tr} = \sum_{j=1}^{4} L_{jk}$ $(k = 1, \ldots, 4)$. Applying switching-cycle averaging and small-signal approximation to the MSC-PoL converter yields the small-signal circuit model, as demonstrated in Fig. 11. It can be treated as the combination of two SCB small-signal circuits [42] linked by the flying capacitor $C_{fly}$. $R_{eq}$ is the equivalent series resistance at each phase that captures the power losses. Based on (2) and Fig. 11, the overall converter dynamics can be modeled as in (3). In (3), the impacts of both the flying capacitor and the blocking capacitors are eliminated when summing up the dynamic equations for the eight phases. Detailed derivations are provided in Appendix I. Accordingly, the input-to-output and the control-to-output transfer functions are given in (4).

Fig. 12. Two four-phase coupled inductor designs based on (a) a ladder core and (b) a ladder core plus a leakage plate. The ladder core is made of DMR51W ($\mu_r = 900$), while the leakage plate is made of DMR53 ($\mu_r = 900$), a higher frequency magnetic material to enhance the leakage flux path.

Equations (3) and (4) indicate that the overall system dynamics and transfer functions of the MSC-PoL converter are the same as those of a multiphase buck with $v_{in}/8$ input voltage and $L_{tr}/8$ output inductance. Therefore, it can be controlled by typical control methods for a multiphase buck (e.g., voltage mode control or constant-ON-time control), except that the duty ratio is limited to within 25%, which might restrain its maximum transient speed.
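To make the equivalence to a multiphase buck more concrete, the sketch below computes the transient inductance as a column sum of an inductance matrix and then evaluates textbook second-order step-response metrics for a generic averaged buck stage. It is an illustration under stated assumptions and does not reproduce (3) or (4): the matrix entries, output capacitance, load resistance, and series resistance are hypothetical placeholder values, and the model neglects capacitor ESR and control delay.

```python
import math


def transient_inductance(l_matrix, phase=0):
    """Column sum of the inductance matrix, i.e., L_tr of one phase."""
    return sum(row[phase] for row in l_matrix)


def second_order_metrics(l_eq, c_out, r_load, r_series):
    """Textbook metrics for G(s) = 1 / (L*C*s^2 + (L/R + r*C)*s + 1 + r/R)."""
    gain = 1.0 + r_series / r_load
    omega_n = math.sqrt(gain / (l_eq * c_out))
    zeta = (l_eq / r_load + r_series * c_out) / (2.0 * math.sqrt(l_eq * c_out * gain))
    if zeta < 1.0:
        overshoot = math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))
    else:
        overshoot = 0.0                      # no overshoot when not underdamped
    settling_2pct = 4.0 / (zeta * omega_n)   # common underdamped approximation
    return omega_n, zeta, overshoot, settling_2pct


# Hypothetical symmetric four-phase matrix: self 200 nH, mutual -61 nH.
L_SELF, L_MUTUAL = 200e-9, -61e-9
l_matrix = [[L_SELF if i == j else L_MUTUAL for j in range(4)] for i in range(4)]

l_tr = transient_inductance(l_matrix)        # 17 nH with the values above
l_eq = l_tr / 8                              # eight phases in parallel per module
omega_n, zeta, overshoot, t_settle = second_order_metrics(
    l_eq, c_out=1e-3, r_load=1.0 / 225.0, r_series=0.5e-3)  # all assumed values
print(f"L_tr = {l_tr * 1e9:.1f} nH, zeta = {zeta:.2f}, "
      f"overshoot = {overshoot * 100:.0f} %, t_s = {t_settle * 1e6:.1f} us")
```

The placeholder matrix is chosen so that the column sum equals 17 nH. A smaller transient inductance raises the natural frequency and shortens the settling time, while the overshoot also depends on the damping set by the other circuit parameters.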

IV. CONVERTER DESIGN WITH 3-D STACKED PACKAGING
To validate the MSC-PoL architecture, a 48-to-1-V, 450-A, 6-mm-thick MSC-PoL VRM with 3-D stacked ladder-core coupled inductors is built and tested. This section elaborates the design of the ultrathin MSC-PoL VRM, including coupled inductors, gate driver circuits, and 3-D stacked packaging.

A. Ladder-Structured Coupled Inductor
In the 48-to-1-V MSC-PoL converter, each SCB cell requires a four-phase coupled inductor. Fig. 12 shows two ladder-structured coupled inductor designs based on: 1) a ladder core only and 2) a ladder core plus a leakage plate. The ladder magnetic core, made of DMR51W ($\mu_r = 900$), couples four horizontally arranged windings in parallel. Stacking the leakage plate on top creates a low-reluctance path for the leakage magnetic flux, and the resulting larger leakage inductance can reduce the inductor current ripple, achieving higher efficiency. In a fully symmetric coupled inductor structure, the frequency of the leakage magnetic flux is four times the switching frequency. As a result, the leakage plate adopts a higher frequency magnetic material, DMR53 ($\mu_r = 900$), for lower core loss.

Fig. 13 annotates the design dimensions for the ladder core. Due to printed circuit board (PCB) layout constraints, the overall core and winding shapes are determined by three dimension variables: $X_{Leg}$, $H_{Leg}$, and $H_{tot}$. In this article, the geometries of the ladder core are optimized for the minimum sum of conduction loss and core loss. Since the ac root-mean-square (RMS) current is negligible at heavy load, the winding conduction loss is calculated based only on the dc resistance. The core loss is predicted using the improved generalized Steinmetz equation (iGSE) [43], where the power loss density of each core segment can be expressed as

$$P_v = \frac{1}{T}\int_0^T k_i \left|\frac{dB}{dt}\right|^{\alpha} (\Delta B)^{\beta-\alpha}\,dt, \qquad k_i = \frac{k}{(2\pi)^{\alpha-1}\int_0^{2\pi} |\cos\theta|^{\alpha}\, 2^{\beta-\alpha}\, d\theta} \qquad (5)$$

where $B$ is the flux density of the core segment, $\Delta B$ is its peak-to-peak value, $T$ is the switching period, and $k$, $\alpha$, and $\beta$ are the material Steinmetz coefficients provided by the manufacturer.

Fig. 13. Annotated design dimensions for the ladder core. To fit the PCB layout, the entire inductor shape can be determined by three dimension variables: $X_{Leg}$, $H_{Leg}$, and $H_{tot}$. Predicted core loss for geometry optimization is based on the flux density in each core segment (labeled in blue) using iGSE.

Note that the predicted core loss from iGSE does not capture the impacts of temperature and dc flux density, and the calculated winding conduction loss does not include the loss from winding soldering and the winding return path on the PCB. However, the resistance of the soldering and the PCB return path is less dependent on inductor geometry and is relatively constant. Therefore, the calculated inductor loss herein can still provide good guidance for optimizing the dimensions of the coupled inductor. Advanced core loss modeling tools, such as neural network models, can be used to estimate the core loss under particular operating conditions (e.g., waveform, temperature, and dc bias) [44].
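As an illustration of how an iGSE-based loss density can be evaluated from a sampled flux density waveform, the sketch below implements the standard iGSE for one period of a waveform without minor loops. It is not the authors' optimization code: the function name is hypothetical, and the Steinmetz coefficients in the usage lines are placeholders rather than DMR51W or DMR53 data. For a sinusoidal waveform, the iGSE reduces to the classic Steinmetz equation, which gives a convenient sanity check.

```python
import math


def igse_loss_density(b_samples, period, k, alpha, beta, n_theta=2000):
    """iGSE loss density for one period of uniformly sampled flux density.

    b_samples covers exactly one period (the first sample is not repeated at
    the end). Minor loops are not separated, so the whole waveform is treated
    as a single major loop (a simplification of the full method).
    """
    n = len(b_samples)
    dt = period / n

    # k_i = k / ((2*pi)^(alpha-1) * 2^(beta-alpha) * integral of |cos|^alpha)
    d_theta = 2.0 * math.pi / n_theta
    cos_integral = sum(abs(math.cos((i + 0.5) * d_theta)) ** alpha
                       for i in range(n_theta)) * d_theta
    k_i = k / ((2.0 * math.pi) ** (alpha - 1.0) * 2.0 ** (beta - alpha) * cos_integral)

    delta_b = max(b_samples) - min(b_samples)      # peak-to-peak flux density
    integral = 0.0
    for i in range(n):
        db_dt = (b_samples[(i + 1) % n] - b_samples[i]) / dt
        integral += abs(db_dt) ** alpha * dt
    return k_i * delta_b ** (beta - alpha) * integral / period


# Sanity check with a sinusoid and placeholder coefficients (assumptions).
K, ALPHA, BETA = 1.0, 1.5, 2.5
F_SW, B_PEAK, N = 500e3, 0.05, 4000
b_sine = [B_PEAK * math.sin(2.0 * math.pi * i / N) for i in range(N)]
p_igse = igse_loss_density(b_sine, 1.0 / F_SW, K, ALPHA, BETA)
p_steinmetz = K * F_SW ** ALPHA * B_PEAK ** BETA
print(f"iGSE / Steinmetz ratio for a sinusoid: {p_igse / p_steinmetz:.4f}")
```

In a geometry sweep, a function of this kind would be called once per core segment with the segment's own flux density waveform, and the loss densities would then be weighted by the segment volumes.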
In (5), the flux density of each core segment can be calculated based on the equivalent magnetic models in Fig. 14. Fig. 14(a) plots the magnetic circuit model. Each core leg is modeled as a leg reluctance $\mathcal{R}_L$ in series with a magnetomotive force source. The top and bottom core segments between two legs are lumped as a header reluctance $\mathcal{R}_H$. The leakage flux path of each phase is modeled as a parallel leakage reluctance $\mathcal{R}_K$. Generally, for a ladder-structured coupled inductor, $\mathcal{R}_K$ is not identical for all the phases. The $\mathcal{R}_K$ discrepancy tends to increase as the phase number increases, but for the designed four-phase coupled inductor, the difference is small enough that $\mathcal{R}_K$ can be analyzed using average values in most cases. Adding the leakage plate will reduce $\mathcal{R}_K$, but it is still much larger than the core reluctances $\mathcal{R}_L$ and $\mathcal{R}_H$. Applying circuit duality to the magnetic circuit model yields the inductance dual model, as shown in Fig. 14(b). The magnetic flux in each core segment can be calculated by probing the current in the inductance dual model and dividing it by the corresponding reluctance. Detailed derivations of the magnetic flux density are provided in Appendix II.

Fig. 15 demonstrates the optimization process for the ladder-core coupled inductor (without the leakage plate) under the conditions of 125-A average current (31.25 A/phase) and 500-kHz switching frequency. Given a specific inductor height $H_{tot}$, the optimized inductor geometries are obtained from the inductor loss contour plot by sweeping $X_{Leg}$ and $H_{Leg}$, as shown in Fig. 15(a). The optimized inductor loss versus $H_{tot}$ is plotted in Fig. 15(b). Weighing the tradeoff between inductor loss and height, $H_{tot}$ is selected as 2.9 mm. Key parameters for the optimal coupled inductor design are listed in Table I.

The two coupled inductor structures are verified by both FEM and SPICE simulations. Fig. 18 shows the FEM magnetic field simulation in ANSYS. In Fig. 18(a), a magnetostatic simulation is performed to display the dc flux distribution when each phase conducts 31.25-A dc current (125 A in total). The dc flux density in the core leg is 0.066 T without the leakage plate. After installing the leakage plate, it increases to 0.28 T, but it is still much lower than the saturation flux density (0.5 T) of the magnetic material used. Therefore, both coupled inductors can support 125-A dc current, which is sufficient for the MSC-PoL converter designed in this article. Although adding the leakage plate will reduce the saturation current limit, this is acceptable in most cases because the current rating of a coupled inductor is usually constrained by unbalanced phase currents and semiconductor devices. In Fig. 18(b), a transient magnetic field simulation is conducted for one switching cycle (2 μs), displaying the ac flux density at t = 1 μs, when it reaches its peak in the middle core header and the third core leg. Detailed simulated ac flux density versus time is provided in Appendix II. As shown in Fig. 18(b), the ac flux density is similar with or without the leakage plate. This indicates that the core losses of the two coupled inductors are comparable, though they might be influenced by the dc bias. The dc flux distribution and the ac flux density waveforms (in Fig. 37) are relatively balanced across the four phases, indicating that the ladder-structured coupled inductor is quite symmetric.

Fig. 19 shows the SPICE simulation of the 48-to-1-V MSC-PoL converter when using different coupled inductor designs as well as discrete inductors of equivalent $L_{ss}$ and $L_{tr}$. Simulations with coupled inductors are based on the inductance matrix extracted from ANSYS. Simulated steady-state inductor current ripples and transient output voltages during a duty ratio step change are plotted in the figure.
Since the transfer function $G_{dv_o}$ in (4) is a second-order system, its step response is characterized by the maximum percent overshoot ($M_p$) and the 2% settling time ($t_s$). A lower $L_{tr}$ results in a faster transient with a shorter $t_s$, but $M_p$ is not necessarily smaller, for it is also related to other circuit parameters. Therefore, as implied by Fig. 19(a), the ladder-core coupled inductor can achieve as fast a transient speed as small 17-nH discrete inductors while maintaining as low a current ripple as large 140-nH discrete inductors. If the leakage plate is added with 1-mm extra thickness, the coupled inductor can further reduce the current ripple to an extremely low level [see Fig. 19(b)], significantly decreasing switching-related loss and improving converter efficiency. The disadvantages of adding the leakage plate are slower transient speed, lower saturation current limit, and larger thickness.

B. Gate Driver Circuits and 3-D Stacked Packaging
GaN FETs with higher voltage ratings are used for $S_{0X}$-$S_{1X}$ in the SC cell to undertake high voltage stress; silicon MOSFETs with lower voltage ratings are used for $S_{2X}$-$S_{8X}$ in the SCB cells to undertake high current stress. The hybrid GaN-Si switch combination maximizes the advantages of material characteristics and state-of-the-art performance of GaN FETs and silicon MOSFETs. Fig. 20 plots the detailed gate driver and bootstrap circuit design for one MSC-PoL module. Supported by an external voltage rail $V_{drive}$ ($V_{drive}$ = 8 V), the bootstrap chain creates multiple floating dc voltages referenced to the floating switch source terminals. In each SCB cell, half-bridge gate drivers (UCC27282) are used to drive $S_{2X}$-$S_{4X}$ and $S_{6X}$-$S_{8X}$, and low-side gate drivers (LM5114) are used to drive $S_{5X}$. In the H-bridge SC cell, high-side gate drivers (LTC4440-5) and a 5-V low-dropout (LDO) regulator are utilized for driving the GaN switches $S_{0X}$-$S_{1X}$. The PWM input side of each gate driver is ground referenced and powered by $V_{drive}$. The driving output side is powered by the bootstrap chain for the floating switches or by $V_{drive}$ for the grounded switches.
Detailed PCB layout and 3-D stacked packaging of the MSC-PoL VRM are plotted in Fig. 21. The VRM measures 31.9 mm × 26.6 mm in area, and the overall height is only 6 mm (7 mm if including the leakage plate). All the power devices are placed on the top side of the PCB, while the coupled inductors and gate drivers are stacked on the bottom side. Placing all the power components on one side simplifies the cooling requirements by enabling single-sided heat dissipation. The bootstrap circuit chain is laid out in the center of the converter, and symmetrically located on its two sides are the H-bridge SC cell as well as the two four-phase SCB cells (cells A and B).
To minimize both converter height and onboard area, a 3-D stacked inductor-driver packaging is implemented, as shown in Fig. 21(b). At the bottom side of the PCB, the coupled inductors are stacked on top of the gate drivers with a copper backbone inserted in between to draw the high output currents out. The winding structures of the two inductors are symmetric to bring all the output currents to the middle, which helps to shorten the layout length of PCB traces and reduce the conduction loss of the overall system. All components including the power stage, bootstrap chain, gate driver circuits, and coupled inductors are packaged into a 1/16-brick module with an ultracompact size of 0.31 in³ and an ultrathin thickness of 6 mm. Only PWM pins, a voltage rail $V_{drive}$, and an optional heat sink are needed to operate the MSC-PoL VRM.
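The quoted package volume and power density figures can be cross-checked from the box dimensions. The sketch below is a back-of-the-envelope calculation, assuming that the 450-A, 1-V output is shared equally by the two modules (225 W each) and that the module footprint is unchanged when the leakage plate raises the height from 6 to 7 mm. The helper names are hypothetical.

```python
MM3_PER_IN3 = 25.4 ** 3          # cubic millimeters in one cubic inch


def box_volume_in3(length_mm, width_mm, height_mm):
    """Box volume in cubic inches from dimensions in millimeters."""
    return length_mm * width_mm * height_mm / MM3_PER_IN3


def power_density_w_per_in3(p_out_w, volume_in3):
    """Output power divided by box volume."""
    return p_out_w / volume_in3


P_MODULE_W = 450.0 * 1.0 / 2     # assumption: 450 A at 1 V split over two modules

vol_no_plate = box_volume_in3(31.9, 26.6, 6.0)
vol_with_plate = box_volume_in3(31.9, 26.6, 7.0)
density_no_plate = power_density_w_per_in3(P_MODULE_W, vol_no_plate)
density_with_plate = power_density_w_per_in3(P_MODULE_W, vol_with_plate)

print(f"volume without plate: {vol_no_plate:.2f} in^3")
print(f"density without plate: {density_no_plate:.0f} W/in^3")
print(f"density with plate:    {density_with_plate:.0f} W/in^3")
```

Under these assumptions, the calculation gives about 0.31 in³ for the 6-mm module and roughly 724 and 621 W/in³ without and with the leakage plate, in line with the figures quoted for the prototype.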

V. EXPERIMENTAL RESULTS

A. Prototype and Testbench
A 48-to-1-V/450-A MSC-PoL prototype comprising two parallel-connected MSC-PoL modules is fabricated and tested. As shown in Fig. 22(a), each MSC-PoL module is enclosed within a 31.9 mm × 26.6 mm × 6 mm box volume, which is comparable in size to a U.S. quarter. The step-by-step packaging procedure of the stacked inductor-driver structure is plotted in Fig. 22(b). With the ultracompact size and the ultrathin thickness, the MSC-PoL VRM can be embedded into an FCLGA-3647 socket to power an Intel Xeon Platinum 8280 CPU (205 W), enabling PwrSiP voltage regulation, as demonstrated in Fig. 23.

Fig. 24(a) plots the block diagram of the full prototype power stage, which contains 16 output phases. Fig. 24(b) shows the phase-shift strategy, which realizes 16-phase interleaving, leading to 16× ripple frequency and reduced ripple amplitude for the output current. The general operation principle as well as the steady-state and dynamic modeling of each MSC-PoL module is the same as in Section III.

Fig. 25 shows the complete hardware prototype including the power stage, the signal interface board, and two F28388D controllers. A heat sink (SKV38538514-CU) equipped with a dc fan (9GA0312P3J001) is placed on top of each MSC-PoL module through the thermal interface. The heat sink covers all the power devices placed on the top side of the PCB. Benefiting from the single-side heat dissipation, the heat sink can easily take away most of the heat generated by the power devices. Other advanced heat dissipation solutions (e.g., liquid cooling) can also be applied to bring the heat out from a compact PwrSiP system.

Fig. 26 shows the experimental testbench. Four digital multimeters (Agilent 34401A) are utilized in combination with the BenchVue software platform to set up an automatic efficiency measurement system. Two current shunts (Riedon RSN-50 and RSC-1000), calibrated by an Agilent 34330A, are connected in series at the input and output for precise current measurement. A dc power source (BK Precision 9117) is used to provide the 48-V input.

In the following experiments, the MSC-PoL prototype is tested based on the component parameters in Table III and the phase-shift strategy in Fig. 24(b), unless otherwise specified. Measured experimental results when using the different coupled inductor designs in Table II are compared and discussed.

B. Steady-State Operation
This subsection demonstrates the steady-state operation of the MSC-PoL prototype when delivering power from 48 to 1 V and switching at 400 kHz. The leakage plate is installed on the coupled inductor for lower current ripple.

Fig. 27 shows the measured waveforms of switch drain-source voltages and two intermediate rail voltages. The maximum switch voltage stresses are labeled beside the waveforms, which are 24 V for $S_{0X}$, 30 V for $S_{1A/C}$, 18 V for $S_{1B/D}$, 12 V for SCB high-side switches ($S_{2X}$-$S_{4X}$), and 6 V for SCB low-side switches ($S_{5X}$-$S_{8X}$), consistent with the analysis in Fig. 9. The two intermediate rail voltages $V_{Rail1A}$ and $V_{Rail1B}$ refer to the voltages of the positive and negative terminals of the flying capacitor $C_{fly}$. $V_{Rail1A}$ shifts between 24 and 48 V, while $V_{Rail1B}$ alternates between 0 and 24 V. By turning ON $S_{1X}$, each SCB cell is switched onto the corresponding voltage rail when the rail is at 24 V.

Fig. 28 shows the measured waveforms of switch node voltages and output voltage ripples. The phase-shift strategy in Fig. 24 is applied: 1) the phase shifts among the four SCB cells are 202.5° between cells A and B, 112.5° between cells B and C, and 202.5° between cells C and D; and 2) neighboring phases within each SCB cell are shifted by 90°. As shown in Fig. 28(b), the applied phase-shift scheme enables 16-phase interleaving, yielding greatly reduced ripple amplitude with $16f_{sw}$ ripple frequency for the output voltage. The peak-to-peak steady-state output voltage ripple is less than 10 mV.

Fig. 29 shows the measured capacitor dc voltages and ac voltage ripples when delivering 400-A load current. As indicated by Fig. 29(a), both the flying capacitor and the blocking capacitors can maintain stable voltages at heavy load, functioning like dc sources with the expected dc values. As shown in Fig. 29(b), the capacitor ac voltage ripples remain less than 0.8 V at 400-A load current (i.e., 89% of the full load).
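The stated cell-to-cell and phase-to-phase shifts can be checked for uniform interleaving with a few lines of code. The following is a small sketch, assuming that the cell shifts accumulate from cell A to cell D and that each cell holds four phases spaced by 90°. The function name is hypothetical.

```python
def phase_angles(cell_shifts_deg, phases_per_cell=4):
    """Sorted phase angles (degrees) of all output phases.

    cell_shifts_deg lists the shift of each cell relative to the previous one;
    the first cell is taken as the 0-degree reference (assumption).
    """
    angles = []
    cell_offset = 0.0
    for shift in [0.0] + list(cell_shifts_deg):
        cell_offset += shift
        for p in range(phases_per_cell):
            angles.append((cell_offset + p * 360.0 / phases_per_cell) % 360.0)
    return sorted(angles)


angles = phase_angles([202.5, 112.5, 202.5])   # A->B, B->C, C->D shifts
spacings = {round(b - a, 6) for a, b in zip(angles, angles[1:])}
print(len(angles), "phases, spacing set:", spacings)
```

With the shifts listed above, the 16 phase angles fall on a uniform 22.5° grid, which is the condition for the output ripple to appear at 16 times the switching frequency.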

C. Transient Performance
The settling time for the output to stay within a 5% error band of the final voltage is 26 μs with the leakage plate and 18 μs without the leakage plate. As discussed in Section IV-A, adding the leakage plate will reduce the current ripple but also slow down the transient speed due to the larger leakage inductance, resulting in a longer settling time. However, one MSC-PoL module contains eight output phases in parallel. This narrows the transient performance difference between the two coupled inductor designs since both of them have a very small total output leakage inductance, which is comparable to the parasitic trace inductance. Therefore, after adding the leakage plate, the MSC-PoL VRM still maintains a fast transient speed. Besides, the flying capacitor and the blocking capacitor voltages remain stable during the open-loop duty ratio step change.

Fig. 31 shows the measured waveforms of closed-loop transient experiments. A typical voltage-mode feedback control with a proportional-integral compensator is implemented, which changes the duty ratio based on the error between the reference and output voltages. The output load current is programmed to step between 50 and 150 A with a 4-A/μs downslope. As indicated by the figure, the maximum voltage overshoot is less than 80 mV during this 100-A load step (44% of the full load). The flying capacitor and blocking capacitor voltages also remain stable in the closed-loop transient test. Demonstrating the extreme transient performance of the converter is beyond the scope of this article. The maximum load current slew rate in the test is limited by the electronic load and the output capacitance. The transient performance can be further enhanced by increasing the control loop bandwidth (e.g., reducing the delay of the controller and gate drivers), minimizing the output capacitance, or using advanced nonlinear controls (e.g., constant-ON-time control).
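A settling time of this kind can be extracted from a sampled waveform with a short routine. The sketch below is one possible post-processing approach, not the measurement procedure used for the prototype: it assumes that the error band is defined relative to the final sample and that the record is long enough for the response to settle. The synthetic first-order response is only a test signal.

```python
import math


def settling_time(times, values, band=0.05):
    """Time after which the waveform stays within band * |final value|.

    The final value is taken as the last sample (assumption), and the time
    origin is the first sample, which is assumed to coincide with the step.
    """
    final = values[-1]
    tolerance = band * abs(final)
    settle_index = 0
    for i, v in enumerate(values):
        if abs(v - final) > tolerance:
            settle_index = i + 1     # first sample after the last excursion
    return times[settle_index] - times[0]


# Synthetic test signal: first-order response with a time constant of 1 us.
TAU = 1e-6
times = [i * 0.01 * TAU for i in range(1001)]
values = [1.0 - math.exp(-t / TAU) for t in times]
t_settle = settling_time(times, values)
print(f"5% settling time: {t_settle / TAU:.2f} time constants")
```

For an ideal first-order response, the 5% settling time is about three time constants, which the synthetic example reproduces. Measured waveforms usually need filtering before such a routine is applied, since switching ripple can cross the band edge repeatedly.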

D. Efficiency Measurement
The efficiencies of the MSC-PoL prototype with and without the leakage plate are measured at multiple switching frequencies. The gate drivers and the bootstrap chain are powered by an auxiliary dc-dc converter, and the gate losses are estimated by $Q_g V_{drive} f_{sw}$. $V_{drive}$ is the voltage of the auxiliary power rail, and $V_{drive} = 8$ V in all the experiments. $Q_g$ is the summed gate charge of all switches during one switching cycle. $Q_g V_{drive} f_{sw}$ captures all the losses of the entire bootstrapping and driving circuitry, including the losses on the bootstrap diodes, the LDOs, and the switch gate drive. Figs. 32 and 33 summarize the 48-to-1-V efficiencies of the MSC-PoL prototype with and without the leakage plate, respectively. Efficiencies at different switching frequencies, excluding and including the gate losses, are collected and compared. As shown in the figures, the MSC-PoL prototype with the leakage plate has a higher efficiency than without it. As the switching frequency increases, there is a tradeoff between the decreased ac conduction losses and the increased switching-related losses (including switching losses, deadtime losses, parasitic loop inductance losses, etc.). With the leakage plate, the inductor current ripple is already very small. Increasing the switching frequency does not significantly reduce the ac conduction losses, so the increased switching-related losses dominate. In this case, a higher switching frequency yields a lower efficiency. Without the leakage plate, the inductor current ripple is large, and increasing the switching frequency can greatly reduce the ac conduction losses. The decreased ac conduction losses dominate the frequency impact at light load, but at heavy load, the increased switching-related losses are predominant.
Consequently, a higher switching frequency leads to a higher efficiency at light load but a lower efficiency at heavy load. At full load, where the current ripple amplitude has little influence on the total power losses, the MSC-PoL prototype achieves a similar efficiency with either coupled inductor design at the same switching frequency. The efficiency measurement results indicate that, excluding the gate losses, the MSC-PoL prototype with the leakage plate achieves 93.1% peak efficiency at 140 A/400 kHz and 86.2% full-load efficiency at 450 A/400 kHz. In contrast, the MSC-PoL prototype without the leakage plate achieves 91% peak efficiency at 150 A/602 kHz and 84.6% full-load efficiency at 450 A/602 kHz. The gate drive losses are estimated as 2.48 W at 400 kHz, 3.10 W at 500 kHz, and 3.74 W at 602 kHz.
Fig. 34 shows the thermal image of the MSC-PoL prototype under dc fan and heat sink cooling. After operating at 450-A full load for more than 10 min, the hot-spot temperature of the heat sink remains around 45 °C when the ambient temperature is around 20 °C. Featuring single-side heat dissipation, the MSC-PoL prototype greatly simplifies its cooling design, enabling long-term operation at heavy load while keeping a cool temperature.
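The gate-loss estimate $Q_g V_{drive} f_{sw}$ and its effect on the reported efficiency can be cross-checked from the numbers quoted in this subsection. The sketch below is a hedged sanity check, not the measurement procedure: it assumes an exact 1-V output (so 450 A corresponds to 450 W) and that the quoted gate losses apply to the whole prototype; the function names are hypothetical.

```python
V_DRIVE = 8.0  # auxiliary rail voltage in volts (from the text)

# Reported gate-drive loss estimates: switching frequency (Hz) -> loss (W).
REPORTED_GATE_LOSS_W = {400e3: 2.48, 500e3: 3.10, 602e3: 3.74}


def implied_gate_charge(p_gate_w, f_sw_hz, v_drive=V_DRIVE):
    """Back out Q_g (coulombs) from P_gate = Q_g * V_drive * f_sw."""
    return p_gate_w / (v_drive * f_sw_hz)


def efficiency_with_gate_loss(p_out_w, eff_excluding_gate, p_gate_w):
    """Fold the gate loss into an efficiency that was measured without it."""
    p_in_w = p_out_w / eff_excluding_gate
    return p_out_w / (p_in_w + p_gate_w)


if __name__ == "__main__":
    for f_sw, p_gate in REPORTED_GATE_LOSS_W.items():
        q_g_nc = implied_gate_charge(p_gate, f_sw) * 1e9
        print(f"{f_sw / 1e3:.0f} kHz: Q_g ~ {q_g_nc:.0f} nC")
    # Full load with the leakage plate: assumed 1 V x 450 A, 86.2% excluding gate loss.
    eta = efficiency_with_gate_loss(450.0, 0.862, REPORTED_GATE_LOSS_W[400e3])
    print(f"full-load efficiency including gate loss ~ {eta:.1%}")
```

The three loss figures imply nearly the same total gate charge (about 775 nC), as expected if $Q_g$ is independent of frequency, and folding 2.48 W into the 86.2% full-load figure gives about 85.8%, matching the full-load efficiency reported with gate loss included.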

VI. PERFORMANCE DISCUSSIONS AND COMPARISON
The 48-to-1-V MSC-PoL CPU VRM is a combination of many state-of-the-art technologies, including multistack SC architecture, soft charging technique, hybrid GaN-Si switch combination, coupled magnetics, and 3-D stacked packaging. It achieves an ultracompact size with both a small area and a low z-height. The overall VRM height is only 6 mm (7 mm if adding the leakage plate), making it an extremely attractive PwrSiP solution for CPU voltage regulation from 48 V.
An appropriate coupled inductor design can be selected based on specific application requirements. Adding the leakage plate reduces the inductor current ripple, and the resulting smaller RMS and peak current values decrease the conduction loss, switching loss, and parasitic inductance loss, yielding a higher efficiency. The tradeoff is the increased VRM height and slower transient response. However, with eight-phase (or 16-phase) interleaving, the coupled inductor that uses the leakage plate can still achieve a fast transient speed, as demonstrated in Section V-C. Although the light-load efficiencies of the two coupled inductor designs are quite different, their heavy-load efficiencies are very close at the same operating frequency.
A detailed loss breakdown of the 48-to-1-V/400-kHz MSC-PoL prototype (with the leakage plate) is plotted in Fig. 35. The power loss breakdown contains: 1) losses of the H-bridge SC stage, including switching and conduction losses of the GaN switches ($S_{0X}$-$S_{1X}$) as well as ESR loss of the flying capacitors ($C_{fly}$); 2) losses of the SCB stage, including switching and conduction losses of the MOSFETs ($S_{2X}$-$S_{8X}$), ESR loss of the blocking capacitors ($C_{1X}$-$C_{3X}$), and core loss and winding loss of the coupled inductors; 3) parasitic loop inductance loss estimated by $\frac{1}{2} L_{loop} i_L^2 f_{sw}$; and 4) deadtime loss, PCB trace conduction loss, and gate loss estimated by $Q_g V_{drive} f_{sw}$. The PCB trace conduction loss at each load condition is estimated based on the measured trace resistance (measured by BK Precision 2840) and the simulated RMS current value. At light load, gate loss, core loss, and switching loss are predominant. When the load current increases to 170 A, where the peak efficiency is achieved, the major power losses are relatively evenly distributed among switching loss, conduction loss, and gate loss. As the load current keeps rising, the low-side conduction loss and parasitic loop inductance loss increase dramatically and dominate at 450-A full load. To further improve the efficiency and power density, multiple switches and gate drivers can be integrated together to reduce the parasitic loop inductance, especially for the SCB stage.
Table IV compares several key metrics of the MSC-PoL prototype with other state-of-the-art 48-to-1-V PoL voltage regulators. The full-load power density with and without the leakage plate is 621 and 724 W/in$^{3}$, respectively. A performance metric represented as the connection curve of the efficiency and power density points at full load and peak-efficiency load is introduced and plotted in Fig. 36.
The MSC-PoL prototype presented in this article expands the performance boundary of PoL VRMs by pushing toward higher efficiency and higher power density.
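The two power density figures can be related by a rough back-of-the-envelope check. The arithmetic below is a sketch under stated assumptions, not the authors' calculation: it assumes a 1-V/450-A output shared by two 0.31-in$^{3}$ modules, and that the leakage plate only increases the height from 6 to 7 mm while the footprint is unchanged.

```latex
\begin{align*}
\rho_{\text{no plate}} &\approx \frac{1\,\text{V} \times 450\,\text{A}}{2 \times 0.31\,\text{in}^{3}}
  \approx 726\ \text{W/in}^{3} \quad (\text{reported: } 724\ \text{W/in}^{3}), \\
\rho_{\text{plate}} &\approx 724\ \text{W/in}^{3} \times \frac{6\,\text{mm}}{7\,\text{mm}}
  \approx 621\ \text{W/in}^{3}.
\end{align*}
```

The small gap between 726 and 724 W/in$^{3}$ is within what rounding of the quoted module volume would explain; the height scaling reproduces the reported 621 W/in$^{3}$.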

VII. CONCLUSION
This article presents the MSC-PoL PwrSiP VRM with coupled magnetics to power ultrahigh-current CPU or chiplet systems. In the MSC-PoL architecture, many SC cells are stacked in front and connected with switched-inductor cells for soft charging and voltage regulation. It attains decreased current ripple and boosted transient speed from parallel coupling, as well as reduced charge sharing loss and automatic capacitor-voltage/inductor-current balancing from soft charging. A 48-to-1-V MSC-PoL topology is developed, and its steady-state and transient performance are analyzed. The 48-to-1-V MSC-PoL converter has a small-signal model and transfer functions similar to those of a multiphase buck. Therefore, typical buck control methods (e.g., voltage-mode and constant-ON-time controls) can be directly applied with a 25% duty ratio limit. To validate the MSC-PoL architecture, a 48-to-1-V/450-A prototype containing two MSC-PoL modules is built. Two coupled inductor designs based on a ladder-structured magnetic core are developed and compared. A leakage magnetic plate of 0.8-mm thickness is designed to adjust the leakage inductance for lower current ripple. Benefiting from the 3-D stacked inductor-driver packaging, one MSC-PoL module encloses all the circuits and components into a $\frac{1}{16}$-brick/0.31-in$^{3}$/6-mm-thick package, achieving 724-W/in$^{3}$ power density. It leverages a hybrid GaN-Si switch combination for maximized benefits from the latest GaN and silicon devices. When including the gate loss, the MSC-PoL prototype with the leakage plate achieves 91.7% peak efficiency at 170 A/400 kHz and 85.8% full-load efficiency at 450 A/400 kHz. In contrast, the MSC-PoL prototype without the leakage plate achieves 89.5% peak efficiency at 210 A/400 kHz and 85.6% full-load efficiency at 450 A/400 kHz. The MSC-PoL VRM achieves both excellent efficiency and power density compared to state-of-the-art VRM designs.
It can be further embedded into the CPU/chiplet socket for PwrSiP voltage regulation with extreme efficiency, density, and control bandwidth.

APPENDIX I DERIVATIONS OF THE SMALL-SIGNAL MODEL
This appendix presents the detailed derivations for the small-signal model. According to Fig. 11, the dynamic modeling equation for each phase can be obtained as (8). Here, $f(\hat{i}_{LkX})$ ($k = 1, \ldots, 4$, $X = A$ or $B$) is the voltage drop across the inductor winding, $R_{eq}$, and the output port at each phase: $f(\hat{i}_{LkX}) = s\sum_{n=1}^{4} L_{kn}\hat{i}_{LnX} + \hat{i}_{LkX} R_{eq} + v_o$.
By summing up the equations in (8), the impacts of the flying capacitor ($C_{fly}$) and the blocking capacitors ($C_{1X}$-$C_{3X}$) are eliminated, and the overall converter dynamic equation can be derived as shown in (3).

APPENDIX II DERIVATIONS OF THE MAGNETIC FLUX DENSITY
This appendix analytically derives the ac magnetic flux density in a ladder-structured coupled inductor based on its inductance dual model. The presented MSC-PoL converter operates the four-phase coupled inductor similarly to an interleaved multiphase buck: four windings are driven by interleaved square wave voltages shifting between $(1 - \frac{1}{D})v_o$ and $v_o$. Denote the winding voltages as $v_{L1}$-$v_{L4}$. The magnetic flux of each segment in the ladder magnetic core can be mapped to the corresponding inductor current in the inductance dual model. As shown in Fig. 14(b), the ac current of the inductor $1/\mathcal{R}_L$ is directly determined by its parallel voltage source: $di_{\mathcal{R}_{Lk}}/dt = v_{Lk} \cdot \mathcal{R}_L$. Accordingly, the ac flux density in the $k$th core leg can be derived as in (11), where $S_{Leg}$ is the cross-sectional area of each core leg. Equation (11) can also be developed from Faraday's law. It implies that the ac flux density in one core leg is related only to its own winding voltage and is independent of the other phases.
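The Faraday's-law route mentioned above can be sketched as follows. This is a hedged illustration and not necessarily the exact form of (11): it assumes single-turn windings, an ideal square-wave winding voltage that equals $v_o$ for $(1-D)T$ and $(1 - \frac{1}{D})v_o$ for $DT$, and $T = 1/f_{sw}$. The symbol $\Delta B_{Leg,k}$ (peak-to-peak ac flux density in the $k$th leg) is introduced here for illustration.

```latex
% Assumptions: single-turn windings, ideal square-wave winding voltage, T = 1/f_sw.
\begin{align*}
v_{Lk}(t) &= \frac{d\Phi_k}{dt} = S_{Leg}\,\frac{dB_{Leg,k}}{dt}, \\
\Delta B_{Leg,k} &= \frac{1}{S_{Leg}} \int_{0}^{(1-D)T} v_o \, dt
  = \frac{(1-D)\, v_o}{S_{Leg}\, f_{sw}}, \\
0 &= \left(1 - \tfrac{1}{D}\right) v_o \cdot D T + v_o \cdot (1-D)\, T .
\end{align*}
```

The last line is the volt-second balance over one switching period, which confirms that the two voltage levels are mutually consistent. Only the winding's own voltage appears in the expression, in line with the statement that the leg flux density is independent of the other phases.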
In Fig. 14(b), $1/\mathcal{R}_H \gg 1/\mathcal{R}_K$ even with the leakage plate. Therefore, the voltage across the inductor $1/\mathcal{R}_H$ is primarily determined by the voltage division along the series-connected $1/\mathcal{R}_{K1}$-$1/\mathcal{R}_{K4}$. Similar to (11), the ac flux density in the core headers (i.e., segments between core legs) can be derived as in (12), where $S_{Head}$ is the cross-sectional area of each core header; $\mathcal{R}_{K1}$-$\mathcal{R}_{K4}$ can be obtained from the inductance matrix extracted in ANSYS simulation. In Section IV-A, to simplify the calculation, $\mathcal{R}_{K1}$-$\mathcal{R}_{K4}$ are treated as identical in the inductor optimization process, as their differences are small. Fig. 37 compares the calculated and simulated ac flux density for the two coupled inductor designs. As indicated in the figure, the ac flux density is almost the same with or without the leakage plate. The calculated and simulated results match well, validating the theoretical analysis.

APPENDIX III COMPARISON WITH PUBLISHED WORK
This appendix provides a topological comparison between the presented MSC-PoL converter and the previously published LEGO-PoL and VIB-PoL converters. Each of the topologies has its own pros and cons. Normalized switch power stress is utilized as one performance metric: Normalized switch power stress = $V_{ds} I_{ds.rms}$, where $V_{ds}$ is the maximum switch voltage stress and $I_{ds.rms}$ is the switch RMS current (ignoring both the capacitor voltage ripple and inductor current ripple). A topology with a lower normalized switch power stress tends to have lower switch power losses and a smaller switch package. The topological comparison of the three PoL converters is summarized in Table V. The MSC-PoL and VIB-PoL converters focus on reducing the switch power stress. The LEGO-PoL converter can offer better transient performance due to the decoupled two-stage operation and the absence of duty ratio