On-Orbit Implementation of Discrete Isolation Schemes for Improved Reliability of Serial Communication Buses

Abstract: Serial communication buses are used in electronic systems to interconnect sensors and other devices, but two of the most widely used protocols, I²C and SPI, are vulnerable to bus-wide failures if even one device on the bus malfunctions. For aerospace applications demanding increasingly distributed processing and sensing capability, the compounding risk to system reliability as device count scales becomes a limiting factor in mission scope, performance, and lifetime. We propose a simple external circuit to be added to each node on a communication bus that automatically isolates the node in the event of device failure. By automatically isolating failed devices, the integrity of the bus is preserved without requiring additional signals or processing overhead from the host controller. In this article, I²C and SPI isolation circuits are simulated, fabricated, and experimentally verified to be effective at preserving bus integrity in the event of peripheral device failure. Generalized reusable circuit blocks were designed and integrated into three spacecraft systems for the successful NASA V-R3x mission deployed in January 2021. The addition of serial bus isolation significantly improved system reliability for the V-R3x mission by eliminating single-point failure modes of the I²C and SPI buses interconnecting sensors and radios necessary for mission success. The developed protection schemes are a new tool for decoupling system reliability from serial bus device count and can readily be integrated into existing aerospace systems.


I. INTRODUCTION
Many robotic, automotive, and aerospace systems rely on serial communication buses to integrate the multitude of distributed devices necessary for system operation [1]. As more commercial-off-the-shelf (COTS) electronics find their way into these applications, so do many COTS practices and protocols [2]. Two of the most prevalent COTS serial communication protocols, Inter-Integrated Circuit (I²C) and Serial Peripheral Interface (SPI), are inherently vulnerable to single-point failures capable of disabling entire branches of the distributed system [3]. However, the embedded systems literature lacks a concise discussion of how these serial communication buses can fail, the impact bus failure poses to the system, and how to readily protect I²C and SPI buses from their inherent single-point failure modes without custom device or controller-level modifications to the system [4].
Their simplicity and ubiquitous hardware support make I²C and SPI communication buses an integral part of modern embedded systems applications requiring distributed sensor and computing nodes. For example, consider the design task of selecting digital temperature and inertial sensor components needed at multiple locations along a robotic arm. Of the 109 digital temperature sensors currently sold by Texas Instruments, 93% rely on either I²C or SPI as a communication interface [5]. The prevalence of I²C and SPI buses spans most major electronics manufacturers as well. Of the inertial measurement sensors currently available from COTS suppliers such as Digi-Key Electronics and manufactured by Analog Devices, Bosch Sensortec, or STMicroelectronics, all 31 products (100%) rely on either I²C or SPI as an interface [6].

A. Motivation
As illustrated in Fig. 1, isolation along a bus can provide significant improvements to system-level reliability by eliminating the single-point failure modes inherent to these protocols. However, component manufacturers do not currently offer product solutions to meet this need, nor would a single-component solution meet the specific size and reliability requirements for individual applications. Products marketed as "digital isolator" or "hot-swap controller" might sound applicable, but these products do not pertain to fault isolation and would be unable to prevent a misbehaving signal from dominating the logic state of the bus [7]. One could imagine a custom isolation circuit solution using digital switches, but this is rarely feasible since it would require a prohibitive amount of dedicated input/output (I/O) signals as well as intricate computing routines to locate and isolate a failure.
Therefore, as aerospace applications continue to scale the number of COTS I²C and SPI devices in their systems, the need for serial bus isolation grows, since commercial solutions do not yet exist and thus far embedded systems literature remains focused on custom device/controller-level solutions [4]. This article introduces a new approach for improving reliability of existing COTS serial buses and provides the community with the tools necessary to implement it in their own systems. To the best knowledge of the authors, this work is the first reported autonomous isolation circuit solution for the protection of serial communication buses.

B. Contributions
The key contributions of this work are as follows.
1) A summary of modern I²C and SPI communication bus architecture, failure modes, and the impact they have on overall system reliability.
2) Two discrete protection circuits that autonomously isolate a node in the event of peripheral device failure.
3) A means of simulating multi-node I²C and SPI bus performance using LTspice for characterization and validation of the proposed serial bus protection schemes.
4) The development, testing, and successful implementation of the proposed isolation circuits on-board three spacecraft deployed to Low-Earth Orbit.
5) The schematics, PCB layouts, and component lists necessary for general application of the proposed serial bus protection circuits.

II. BACKGROUND INFORMATION
This section provides an overview of the I²C and SPI protocols necessary for understanding their impact on system-wide reliability and the resulting constraints imposed on potential isolation schemes.

A. I²C Protocol
An I²C bus comprises two signal lines, clock (SCL) and data (SDA), to achieve bidirectional communication between a host controller and up to 128 peripheral devices along the bus [8]. Hardware implementation of the I²C protocol relies on an "open-drain" I/O scheme, in addition to buffer and control logic, for bidirectional data flow. Figure 2 illustrates the basic topology of an I²C bus with emphasis on the open-drain transistor architecture. The external pull-up resistor (R_PULL-UP) plays a vital role in the I²C implementation by ensuring a logic-high condition, while a logic-low condition is achieved by enabling the internal NMOS FET and pulling the signal to ground. I²C communication speeds are predefined and range from 100 kHz (Standard-mode) and 400 kHz (Fast-mode) up to 5 MHz (Ultra Fast-mode). The protocol also defines "clock stretching," which permits the peripheral device to forcibly decrease the SCL frequency by holding the line low, thereby making both SCL and SDA signals bidirectional. Typical communication exchanges are punctuated with special start, stop, acknowledge (ACK), and not-acknowledge (NACK) conditions. After a start condition, the host controller uses the next 7 bits to address a specific device along the I²C bus using a previously established device ID. The data signal is sampled on the falling-edge (transitioning from logic-high to logic-low) of the clock signal [9].
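As a minimal sketch of the addressing step described above, the first byte sent after a start condition can be modeled as the 7-bit device address followed by a single read/write bit. The function name and the example address 0x48 are illustrative assumptions and do not come from this article.

```python
def i2c_address_byte(address: int, read: bool) -> int:
    """Pack a 7-bit I2C device address and the R/W bit into one byte.

    The address occupies the upper 7 bits; the least significant bit is
    1 for a read and 0 for a write.
    """
    if not 0 <= address <= 0x7F:
        raise ValueError("a 7-bit I2C address must be in the range 0x00-0x7F")
    return (address << 1) | (1 if read else 0)


if __name__ == "__main__":
    # 0x48 is a hypothetical device address chosen only for illustration.
    print(hex(i2c_address_byte(0x48, read=False)))
    print(hex(i2c_address_byte(0x48, read=True)))
    # Seven address bits give 2**7 = 128 distinct values.
    print(2 ** 7)
```

Seven address bits yield 128 possible values, which is the origin of the device-count limit quoted above.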

B. SPI Protocol
In contrast to I²C, SPI is a unidirectional protocol, illustrated in Fig. 3, that requires three signal lines: clock (SCK), serial data out (SDO), and serial data in (SDI), plus an additional chip select (CS) signal dedicated to each peripheral device communicating with the host controller [10]. An SPI bus therefore requires 3 + n signal lines (where n is the number of devices), which is 1 + n more than I²C, but it is also capable of clock speeds exceeding 50 MHz. At the integrated circuit level, a "push-pull" transistor architecture, as illustrated in Fig. 3, is used to implement the SPI protocol. Additionally, it is necessary for the SDI line to exhibit tri-state behavior (logic-high, logic-low, and high-impedance states) to ensure each peripheral on the bus is able to properly drive the host's SDI input.
Unlike I²C, the SPI protocol is capable of various modes that define the clock polarity and phase on which to transmit data. The result is a single SPI bus able to service peripheral devices with a clock signal that idles at a logic-low condition alongside peripheral devices expecting a clock signal that idles at a logic-high condition. Peripheral addressing is performed using the CS signal rather than a 7-bit address as in I²C.
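The four SPI modes mentioned above are conventionally numbered by clock polarity (CPOL) and clock phase (CPHA). The following sketch tabulates that common convention; the type and function names are illustrative assumptions rather than anything defined in this article.

```python
from typing import NamedTuple


class SpiMode(NamedTuple):
    cpol: int  # clock polarity: idle level of SCK (0 = low, 1 = high)
    cpha: int  # clock phase: 0 = sample on first clock edge, 1 = on second


# Conventional mode numbering: mode = 2 * CPOL + CPHA.
SPI_MODES = {
    0: SpiMode(cpol=0, cpha=0),
    1: SpiMode(cpol=0, cpha=1),
    2: SpiMode(cpol=1, cpha=0),
    3: SpiMode(cpol=1, cpha=1),
}


def idle_clock_level(mode: int) -> int:
    """Return the logic level at which SCK idles for the given SPI mode."""
    return SPI_MODES[mode].cpol


if __name__ == "__main__":
    for number, mode in SPI_MODES.items():
        level = "high" if idle_clock_level(number) else "low"
        print(f"mode {number}: CPOL={mode.cpol} CPHA={mode.cpha}, SCK idles {level}")
```

Modes 0 and 1 idle the clock low and modes 2 and 3 idle it high, which is why one bus can serve peripherals with either expectation as long as the host switches mode per transaction.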

C. Serial Bus Reliability
The ease of implementing and using an I²C or SPI bus comes at the cost of bus-wide fault tolerance. The open-drain and push-pull transistor architectures used in these serial protocols are inherently vulnerable to single-point failures capable of disabling the entire communication bus. For example, any disruption resulting in a data/clock signal shorting to ground or V_DD will prevent the host and any additional peripheral devices from operating that signal line. Although these inherent trade-offs are tolerable for small device count systems, the risk becomes far greater for high device count or high reliability applications looking to utilize the vast ecosystem of sensors and devices reliant on I²C or SPI communication.
As the number of devices on a serial bus increases, so does the risk of bus-wide failure. The probability of I²C or SPI bus failure can be modeled as a series system, in which the bus survives only if every component survives, so the probability of survival is the product of the individual component survival probabilities:
$$P_{\text{bus}} = 1 - \prod_{i=1}^{n} (1 - P_i) \tag{1}$$
where $P_i$ is the probability that component $i$ fails [11]. For systems comprised of components with the same probability of failure, system failure can be described as:
$$P_{\text{bus}} = 1 - (1 - P)^{n} \tag{2}$$
where $P$ is the probability of failure (same for all components) and $n$ is the number of components in the system [11]. Consider an electronic system containing six temperature sensors relaying data to a host controller via a single I²C bus. If reliability analysis determines this sensor has a 10% failure rate during the designated lifetime of the system, then there is a 47% probability that at least one sensor within the system fails during the lifetime of the system. If the failure condition compromises either SDA or SCL pins, then there is a 47% likelihood of compromising the entire I²C bus.
The likelihood of I²C or SPI bus failure is further increased for electronic systems operating in harsh environments (e.g., increased temperatures, thermal cycling, ionizing radiation). Any environmentally-induced degradation of semiconducting devices resulting in increased device failure rate has the potential to affect communication buses as well [12]. This is especially relevant for aerospace systems, where the cumulative degradation of microelectronics in ionizing radiation environments is well studied and known to cause n-type metal-oxide-semiconductor field-effect transistors (n-MOSFET) to eventually fail "on" and p-MOSFET devices to fail "off" [13]. Therefore, it can be reasoned that a radiation-induced failure of the n-MOSFET device inherent to the open-drain I²C architecture will cause the device to fail "on," thereby shorting the specific signal to ground and disabling the remainder of the I²C bus.
This reasoning holds for the SPI architecture as well, although the voltage of the idle logic condition (i.e. SPI mode) will likely play a role in the rate of radiation-induced degradation as a result of different charge transport behavior depending on the bias condition of the transistor [14].
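The series-system model used in this section can be checked numerically. The sketch below assumes independent device failures and that any single device failure disables an unprotected bus; the function name is an illustrative assumption.

```python
import math
from typing import Iterable


def bus_failure_probability(device_failure_probs: Iterable[float]) -> float:
    """Probability that at least one device on an unprotected bus fails.

    Series-system model: the bus survives only if every device survives,
    so P_bus = 1 - prod(1 - P_i). Failures are assumed independent.
    """
    survival = math.prod(1.0 - p for p in device_failure_probs)
    return 1.0 - survival


if __name__ == "__main__":
    # Six sensors, each with a 10% lifetime failure probability (the example in the text).
    p_bus = bus_failure_probability([0.10] * 6)
    print(f"P_bus = {p_bus:.4f}")
```

For the six-sensor example this evaluates to roughly 0.47, matching the figure quoted in the text.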

III. ISOLATION CIRCUIT DESIGN AND THEORY OF OPERATION
An autonomous device isolation and bus protection scheme was developed separately for I²C and SPI communication buses in order to accommodate the different electrical behavior of the two protocols. Functional requirements for the isolation circuits were as follows:
1) Capable of detecting a failed peripheral device and electrically isolating its signals from the remainder of the bus.
2) Comprised of minimal components that are more reliable than the isolation target.
3) Require no additional I/O pins to operate.
4) Minimal impact on bus performance.
5) Minimal increase in power consumption.
Figure 4 illustrates the desired isolation behavior (idealized) in the context of a digital timing diagram for an I²C bus containing one host and at least two peripheral devices. As shown, a failure on Device 1 results in device-level shorts of SDA and SCL to ground. With isolation, the host is able to successfully continue operation with the remaining peripherals on the bus, as depicted by the proper data and clock signals reaching Device 2.

A. I²C Isolation Design
A digital buffer circuit with timeout capability was developed for autonomous isolation of failed I²C devices sharing a common bus. Shown in Fig. 5 is the resulting schematic for a single I²C isolation block; one block is required for each of the SDA and SCL lines and is placed in series between the host controller and the isolated peripheral device(s). Signal names with subscript "Bus" are all connected to the host controller, whereas signals with subscripts containing "Dev" connect individually to the device(s) that the isolation circuit is responsible for monitoring. Upon powering the system, the I²C isolation circuit will begin charging C1, requiring about 1.5 ms (as configured in Fig. 5) before the bus is fully functional and can be used as normal. During standard operation, C1 will begin discharging each time the signal is pulled low. Under normal circumstances, the pulse length for the logic-low condition is too short to trigger the timeout function of the circuit. However, in the event of a failed peripheral device pulling its SDA or SCL line to ground, C1 will continue discharging (for about 1.5 ms) until the voltage at node "A" drops below the threshold of Q4 and turns it off. With Q4 off, R5 is able to pull up the voltage at node "B," which turns Q3 on. With Q3 on, the Q1/Q2 transistor pair is forced to remain in cutoff despite any subsequent activity on the bus, resulting in strong isolation of the failed device line from the remainder of the bus for as long as the line remains shorted to ground. If the isolated device is able to recover from its failed state, connection will automatically be restored with the bus after 1.5 ms. The timeout behavior of this isolation circuit can be approximated as a simple resistor-capacitor time constant dictated by the values of C1 and R6:
$$V_A = V_S \, e^{-t/(RC)} \tag{3}$$
where $V_A$ is the voltage at point "A" in volts, $V_S$ is the supply voltage, $t$ is time in seconds, $R$ is the resistance in ohms, and $C$ is the capacitance in farads [15].
In the context of the isolation circuit in Fig. 5, the timeout is the time at which the voltage at node "A" decays to the threshold of Q4:
$$t = -RC \, \ln\!\left(\frac{V_A}{V_S}\right) \tag{4}$$
where $V_A$ is the threshold voltage of Q4 (1.5 V), and we assume: $V_S$ is 3.3 V, there is adequate current sinking capability of the host and peripheral devices, and idealized transistor behavior.
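To make the timeout relationship concrete, the sketch below treats node "A" as an ideal RC discharge from the supply voltage down to the Q4 threshold, using the 3.3 V and 1.5 V values stated above. The function names are illustrative, and no specific R6 or C1 values are assumed; instead the RC product needed for a 1.5 ms timeout is computed.

```python
import math


def rc_timeout_s(r_ohms: float, c_farads: float, v_supply: float, v_threshold: float) -> float:
    """Time for an ideal RC node to discharge from v_supply to v_threshold."""
    if not 0.0 < v_threshold < v_supply:
        raise ValueError("threshold must lie between 0 V and the supply voltage")
    return r_ohms * c_farads * math.log(v_supply / v_threshold)


def rc_product_for_timeout(timeout_s: float, v_supply: float, v_threshold: float) -> float:
    """RC product (in seconds) that yields the requested timeout."""
    if not 0.0 < v_threshold < v_supply:
        raise ValueError("threshold must lie between 0 V and the supply voltage")
    return timeout_s / math.log(v_supply / v_threshold)


if __name__ == "__main__":
    # Values from the text: 3.3 V supply, 1.5 V threshold, 1.5 ms timeout.
    rc = rc_product_for_timeout(1.5e-3, v_supply=3.3, v_threshold=1.5)
    print(f"required RC product: {rc * 1e3:.3f} ms")
```

Under these idealized assumptions the required RC product is about 1.9 ms; any resistor and capacitor pair with that product gives the same nominal timeout, so the choice between them can be driven by leakage, tolerance, and part availability.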
The autonomous isolation capability comes at a cost of increased power consumption beyond that of a standard I²C bus. During normal operation at V_DD = 3.3 V, the primary power draw is through Q4, which remains on, drawing V_DD/R5 (about 33 µA). If the circuit enters isolation mode, Q4 is disabled and Q3 turns on, drawing V_DD/R2 (about 330 µA) for the duration of the failure. In practice, if a peripheral failure happens to occur during a transaction on the I²C bus, the host should wait at least 1.5 ms before attempting the transaction again. Note this is not the exact behavior originally depicted in the Fig. 4 timing diagram. This compromise in response time in exchange for the autonomous isolation behavior was chosen to make the scheme generally applicable to a variety of applications, and was achieved by adjusting the values of C1 and R6 to be tolerant of transient anomalous behavior.
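On the host side, the recommendation to wait at least 1.5 ms before retrying can be expressed as a small wrapper. This is only a sketch: it assumes the host's I²C driver reports a failed transaction by raising `OSError`, and the names `transact_with_retry`, `retries`, and `wait_s` (with its 2 ms default margin) are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

ISOLATION_TIMEOUT_S = 1.5e-3  # isolation timeout reported in the text


def transact_with_retry(
    transaction: Callable[[], T],
    retries: int = 2,
    wait_s: float = 2e-3,  # assumed margin above the 1.5 ms timeout
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a bus transaction, pausing before each retry so that a failed
    peripheral has time to be isolated from the bus."""
    if wait_s < ISOLATION_TIMEOUT_S:
        raise ValueError("wait_s must be at least the isolation timeout")
    last_error: Optional[OSError] = None
    for attempt in range(retries + 1):
        try:
            return transaction()
        except OSError as exc:  # assumption: the driver signals bus errors this way
            last_error = exc
            if attempt < retries:
                sleep(wait_s)
    assert last_error is not None
    raise last_error
```

The `sleep` parameter is injectable so the wrapper can be exercised without real delays; on a microcontroller it would be replaced by the platform's own delay routine.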
The capacitance and series resistance of every I²C bus implementation is inherently application specific. In some cases, it may be necessary to adjust the pull-up resistors (R1 and R3 in Fig. 5) to achieve the necessary rise times for proper I²C bus operation. A lower resistance value for R1 and R3, such as 4.7 kΩ or 3.3 kΩ, will produce faster rise times at the expense of increased power consumption.
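The trade-off between rise time and power can be estimated with a first-order model in which the pull-up resistor charges a lumped bus capacitance. The sketch below uses the 30%-to-70% rise-time definition of the I²C specification, for which an ideal RC network gives t_r = ln(0.7/0.3)·R·C, or about 0.8473·R·C. The 50 pF bus capacitance and the 10 kΩ baseline are hypothetical values chosen for illustration, and the model ignores the isolation circuit's own transistors.

```python
import math

# 30%-to-70% rise time of a single-pole RC network: t_r = ln(0.7 / 0.3) * R * C
RISE_FACTOR = math.log(0.7 / 0.3)  # about 0.8473


def rise_time_s(r_pullup_ohms: float, c_bus_farads: float) -> float:
    """First-order estimate of the 30%-70% rise time of an open-drain line."""
    return RISE_FACTOR * r_pullup_ohms * c_bus_farads


def pullup_current_a(v_dd: float, r_pullup_ohms: float) -> float:
    """Static current through the pull-up while the line is held low."""
    return v_dd / r_pullup_ohms


if __name__ == "__main__":
    C_BUS = 50e-12  # hypothetical lumped bus capacitance
    V_DD = 3.3
    for r in (10e3, 4.7e3, 3.3e3):  # 10 kOhm is an assumed baseline value
        t_r = rise_time_s(r, C_BUS)
        i_low = pullup_current_a(V_DD, r)
        print(f"R = {r / 1e3:4.1f} kOhm: t_r = {t_r * 1e9:6.1f} ns, I_low = {i_low * 1e6:6.1f} uA")
```

With these assumed values, only the 4.7 kΩ and 3.3 kΩ cases fall under the 300 ns Fast-mode limit, at the cost of a proportionally larger current whenever the line is low.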

B. I²C SPICE Simulations
The proposed I²C isolation circuit was simulated in LTspice [16] under various conditions of peripheral device failure. The simulated performance of an SCL signal before, during, and after inducing a failure of a peripheral device is illustrated in Fig. 6 for an I²C bus operating at 100 kHz containing one host (black curve) and one peripheral "Device 1" (red curve). As shown in Fig. 6, the simulation begins with an initial SCL signal burst ("A") driven by the host (SCL_Bus), which correctly propagates to Device 1 through the isolation block, as represented by the black and red curves overlapping each other. Before the SCL burst is finished, a failure is induced ("B") in Device 1 by enabling a switch pulling SCL_Dev1 to ground. As expected, the entire bus is pulled low for 1.35 ms until ("C") Device 1 isolation is triggered and the remainder of the bus returns to its logic-high idling state. A second SCL burst is then performed by the host on SCL_Bus to demonstrate full functionality of the bus during peripheral failure. Next, the induced failure of Device 1 is removed ("D"), allowing SCL_Dev1 to return to idling at a logic-high state. Finally, the full functionality of the bus is verified by performing another SCL burst ("E"), which shows both signals behaving properly.

C. SPI Isolation Design
Fault isolation for an SPI bus was achieved with less complexity than I²C due to the inherent unidirectional nature of the SPI protocol. Series resistors positioned near the peripheral device are sufficient for autonomous isolation of SCK and SDO signals in the event of a failed peripheral, but special care is needed for SDI to ensure tri-state behavior is preserved for the remainder of the bus. A set of example SPI isolation circuits is shown in Fig. 7 using a single series resistor for SCK and SDO signals, and a PNP transistor controlled by the existing CS signal for SDI.
The SPI isolation scheme presented here can accommodate any combination of the four SPI modes. During normal operation, the series resistors (R1 and R2 in Fig. 7) do not alter the logic level of the input signal to the peripheral [10]. However, in the event of one or more failures affecting a peripheral's SCK or SDI signals, a voltage drop will form across the series resistor and become an effective means of isolation between the affected signals and the bus. This technique relies on the host controller to source/sink enough current to maintain the voltage drop across the resistor, and is therefore dependent on a series resistance sufficient to minimize current draw while preserving signal speed. A separate isolation scheme is necessary for SDI since it is driven by the peripheral and is therefore of opposite directionality to SCK and SDO.
To prevent a failure from disrupting the necessary tri-state behavior of SDI, the existing (peripheral-specific) CS signal is used to enable/disable a PNP transistor connecting SDI_DEV1 with the remainder of SDI. In the event of a failure affecting a peripheral SDO, this isolation scheme prevents the device from dominating the bus when the host is communicating with other peripherals. These schemes provide effective isolation from both possible fault scenarios: signal short to ground, or signal short to V_DD.
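The dependence on host drive capability described above can be estimated with Ohm's law. The sketch below computes the worst-case DC current the host must supply when one or more peripheral pins are shorted to the opposite rail behind their series resistors. The 3.3 V supply is an assumption carried over from the I²C analysis, and the function name is illustrative.

```python
def fault_current_a(v_dd: float, r_series_ohms: float, failed_lines: int = 1) -> float:
    """Worst-case DC current the host drives into shorted peripheral pins.

    Each failed pin is assumed to be a hard short to the opposite rail,
    so the full supply voltage appears across its series resistor.
    """
    if r_series_ohms <= 0:
        raise ValueError("series resistance must be positive")
    return failed_lines * v_dd / r_series_ohms


if __name__ == "__main__":
    V_DD = 3.3  # assumed supply voltage
    for r in (1e3, 10e3):
        i_fault = fault_current_a(V_DD, r)
        print(f"R = {r / 1e3:4.1f} kOhm: {i_fault * 1e3:.2f} mA per failed line")
```

Under these assumptions, a 1 kΩ resistor costs 3.3 mA per failed line and a 10 kΩ resistor 0.33 mA, which should be compared against the host pin's rated drive current when many nodes share a signal.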
A consequence of relying on the device-specific CS signal for SDI isolation is the need for a separate SDI circuit for every device on the bus regardless of the bus topology. Additionally, depending on the size of the SPI bus, the performance requirements, individual driving power of each peripheral on the bus, and the transistors used in the isolation circuit, it may be necessary to adjust the series resistance to meet the power requirements of the host controller and speed of the bus. Values of 1 kΩ to 10 kΩ are sufficient for R1 and R2 for buses with 24 devices or fewer operating below 1 MHz for the 10 kΩ case, and below 5 MHz for the 1 kΩ case. Finally, it is important to place the series resistors close to the peripheral device to minimize the amount of disruption (such as RC filtering or early termination) to the transmission line behavior of the signal.
V-R3x mission success relied on operation of 45 I²C devices and 24 SPI devices across three 1U-sized CubeSats. The likelihood of at least partial mission success was significantly improved by applying the developed serial bus isolation techniques, but only approximations of the reliability improvements can be made without performing extensive accelerated aging studies for each device along the bus. To illustrate the reliability improvement from serial bus isolation, suppose each of the 45 I²C devices used across the V-R3x spacecraft is assumed to have a 1% failure rate over the duration of the six-year mission. Given this arbitrary failure rate, (2) shows that the probability of serial bus failure (and therefore mission failure) compounds to over 36% without isolation, and approaches 100% as the device failure rate approaches 16%. With serial bus isolation, by contrast, the likelihood of total bus failure as a result of I²C device malfunction becomes negligible compared to the reliability of other elements in the system (i.e., host controller, isolation components, etc.).
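The two figures quoted above follow directly from the identical-component series model, shown here as a worked calculation under the same assumption of independent device failures:

```latex
\begin{align*}
P_{\text{bus}} &= 1 - (1 - P)^{n} \\
P = 0.01,\; n = 45: \quad P_{\text{bus}} &= 1 - 0.99^{45} \approx 0.364 \\
P = 0.16,\; n = 45: \quad P_{\text{bus}} &= 1 - 0.84^{45} \approx 0.9996
\end{align*}
```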
Figure 9 illustrates top and bottom views of a PyCubed avionics board used in each of the three V-R3x flight units. The insets of Fig. 9 highlight the assembled isolation circuits, which include an SPI bus containing five devices (top) and an I²C bus with three devices (bottom). Schematics, parts lists, and PCB layout examples for general application versions of these circuits are provided in Appendix A with the intention of enabling others to readily apply them to their own systems.
The specific components listed in Appendix A and used in the isolation circuits on-board V-R3x were carefully selected to maximize radiation tolerance using a "careful COTS" approach to avoid requiring costly and long lead-time radiation-hardened parts [18]. The component selection process is discussed below for the relevant LEO radiation environment typical of small spacecraft missions. However, many other commercially available components are also suitable for these circuits, and future applications should leverage the discrete nature of the circuit design by choosing parts that best fit the size, availability, and environmental needs of their system. Discrete component radiation performance was considered when selecting parts for the I²C and SPI isolation circuits used in the V-R3x spacecraft. Commercial components were chosen with experimentally-determined radiation performance necessary for the isolation circuits to exceed our mission duration requirements for a small spacecraft orbiting Earth at an altitude of approximately 500 km. These parts will generally offer improved radiation performance over alternative COTS offerings. However, radiation-induced degradation rates are heavily dependent on application design, operation, and orbital environment.
An exhaustive component selection discussion based on radiation performance criteria is beyond the scope of this work; however, an overview of the selected transistors offers good insight into the process. Selections focused on the popular 2N2222 (NPN) and complementary 2N2907 (PNP) bipolar transistors which have been thoroughly studied by the radiation effects community [13], [19]. The MBT2222 (dual NPN) from ON Semiconductor and MMBT2907 (single PNP) from Diodes Incorporated were chosen based on total ionizing dose (TID) performance of at least 10 krad accumulated dose before parameters deviated beyond manufacturer specification [20], [21]. A dual N-channel MOSFET, BSS138, also manufactured by Diodes Incorporated, was chosen based on the demonstrated TID tolerances [22] and heavy ion performance [23] for the BSS1xx family of power MOSFETs.
Assembled isolation circuits were characterized during development to ensure the serial protocols remained functional with and without a failure present on the bus. Measurements were conducted using a 2-channel Keysight DSOX3012T oscilloscope with two 10:1 500 MHz probes. Figure 10 contains the resulting traces of a 400 kHz SCL signal of an I²C bus probed at the input (SCL_Bus, black) and output (SCL_Dev1, red) of the implemented isolation circuit. The marked regions within Fig. 10 identify a series of activities performed during an evaluation of the I²C bus before, during, and after an induced failure at SCL_Dev1. The exercise begins with a full 128-address bus scan ("A" in Fig. 10), which can be seen successfully propagating through the isolation circuit. Shortly after completing the bus scan, a failure of Device 1 is induced ("B") using an external switch to manually pull SCL_Dev1 low. As seen in Fig. 10, both signals remain low for 1.24 ms until isolation is automatically triggered and SCL_Bus properly recovers ("C"). Next, a second scan is successfully performed along the bus ("D") despite SCL_Dev1 remaining pulled low. Finally, the induced failure of Device 1 is removed ("E") to demonstrate the isolation circuit returning to normal operation in the event of a failed peripheral device recovering. The rise times for this specific implementation of an I²C bus containing four total devices (one host, three peripherals), each with individual isolation circuits, were measured to be 230 ns for SCL_Bus and 110 ns for SCL_Dev. The measured bus performance is well within the I²C-bus specification maximum rise time requirement of 300 ns for Fast-mode [8].

V. CONCLUSION
Two isolation schemes were developed for I²C and SPI communication buses that are effective at preventing bus-wide failure in the event of peripheral device malfunction without requiring additional I/O or processing overhead. Isolation designs were simulated in LTspice before implementation and characterization of hardware, which verified I²C operation up to 400 kHz (Fast-mode) and SPI speeds of 5 MHz before, during, and after inducing device failure on the bus. Reusable design blocks are provided, and the successful application of the design blocks is shown for V-R3x, a three-spacecraft mission successfully deployed in January 2021. Serial bus isolation was found to significantly reduce the likelihood of system failure for the V-R3x mission. The developed isolation schemes provide aerospace applications and many other electronic systems a means of significantly improving system reliability using readily available commercial components.

A. Reusable Design Blocks
Complete isolation circuits for groups of one or more peripheral devices are schematically shown and provided with an example 2-layer PCB layout in  Table I for I²C and Table II