Power Management to Meet Thermal Safe Power in Fault-Tolerant Embedded Systems

Abstract: Thermal Design Power (TDP) has been used as the chip-level power constraint in fault-tolerant embedded systems. As a chip-level constraint, however, TDP can be either pessimistic or thermally unsafe. When TDP is pessimistic, meeting it can increase the rate of missed real-time constraints, because Dynamic Thermal Management (DTM) is triggered more frequently. When TDP is not pessimistic, it can be thermally unsafe and lead to thermal violations. Thermal Safe Power (TSP) is a core-level power constraint defined as a function of the number of simultaneously operating cores, and employing it can improve both efficiency and schedulability. This comment improves the efficiency and the schedulability rate of one of the methods proposed in the literature by employing TSP.


OVERVIEW
Thermal Design Power (TDP) is specified by chip manufacturers as the chip-level power constraint for a specific chip [1]. As a chip-level power constraint, TDP can be either pessimistic or thermally unsafe. When TDP is a pessimistic constraint, meeting it increases the rate of missed real-time constraints, because Dynamic Thermal Management (DTM) is triggered more frequently [2]. Otherwise, TDP is thermally unsafe and can lead to thermal violations. Consequently, in this paper, we employ Thermal Safe Power (TSP) instead of TDP. TSP is an abstraction that provides a thermally safe power constraint at the core level as a function of the number of simultaneously operating cores [2]. Meeting the TSP constraint ensures that DTM is not triggered and guarantees that thermal violations are prevented [2][3][6].
In addition to meeting the power constraint of the system, real-time requirements, i.e., meeting the task deadlines, must be considered in multicore real-time embedded systems [4][5][6]. Moreover, peak-power-aware real-time embedded systems should remain highly reliable in the presence of different types of faults, so employing a fault-tolerant technique in such systems is mandatory [1][3][5]. Some fault-tolerant techniques, such as checkpointing and task re-execution, tolerate only transient faults, whereas N-Modular Redundancy (NMR) is a common approach that handles both transient and permanent faults. However, NMR incurs power, time, and temperature overhead. Because meeting deadlines is a requirement in real-time embedded systems, applying NMR to TDP-constrained systems in a straightforward way may not be feasible; NMR must be applied intelligently to limit the extra time and power overhead. Note that in real-time embedded systems, where the set of tasks is known in advance, an efficient solution is needed that satisfies both the timing constraints and the reliability requirements. Accordingly, the literature on real-time embedded systems proposes a specific mapping/scheduling policy for each task model. This paper proposes a Thermal-Aware N-Modular Redundancy technique (TA-NMR) that exploits the TSP constraint instead of a fixed TDP constraint to improve the efficiency and the schedulability rate of the method proposed in [1] for multicore platforms.

MOTIVATIONAL EXAMPLE
This example shows the advantage of using TSP instead of TDP in fault-tolerant systems. We consider a 4-core system with TDP = 5 W. The system executes a task graph with five dependent tasks {T1, …, T5} that share a deadline D = 80 ms. Fig. 1a shows the dependencies between the tasks. The table in Fig. 1b lists the worst-case execution time of each task at the maximum supply voltage and the maximum operating frequency, together with the maximum power consumption of the tasks on the cores. Fig. 1 presents two possible schedules under different power constraints. Fig. 1c shows the schedule obtained by applying the method proposed in [1], which misses the timing constraint. In Fig. 1d, we instead consider the Thermal Safe Power (TSP) constraint as the core-level power constraint [2]; the schedule produced by our proposed method meets the TSP and timing constraints simultaneously.

THERMAL-AWARE N-MODULAR REDUNDANCY TECHNIQUE (TA-NMR)
In this section, we introduce the proposed method, which meets the TSP constraint to improve the efficiency and the schedulability rate. To do this, we replace equations 12 and 13 of [1] with the following equation; in effect, we add the TSP constraint to the constraints of the problem definition in [1]. Accordingly, the power consumption of each underlying core at each time slot t must not exceed the core-level power constraint, i.e., TSP:

$$\forall k:\quad P_k(t) \le P_{\mathrm{TSP},k}\bigl(m(t)\bigr) \quad \text{at each time } t \qquad (1)$$

where $P_k(t)$ is the power consumption of core k at time t, and $P_{\mathrm{TSP},k}(m(t))$ is the TSP value for core k when $m(t)$ cores operate simultaneously at time t. We then enforce the TSP constraint in the algorithm of [1]: instead of meeting TDP, we consider the TSP value for each configuration of cores.
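The per-core constraint in (1) can be illustrated with a minimal Python sketch. It is not the algorithm of [1] or the TSP tool of [2]. The function name `meets_tsp`, the table `EXAMPLE_TSP_W`, and all wattage values are hypothetical and chosen only for illustration. For simplicity, the sketch assumes that a single TSP value applies to every core for a given number of active cores, and that a core counts as active whenever its power exceeds an idle threshold.

```python
from typing import Dict, Sequence

# Hypothetical TSP table for a 4-core chip: number of simultaneously
# active cores -> per-core power limit in watts. Illustrative values only;
# real values would come from a thermal analysis of the specific chip.
EXAMPLE_TSP_W: Dict[int, float] = {1: 3.0, 2: 2.5, 3: 2.0, 4: 1.5}


def meets_tsp(
    schedule: Sequence[Sequence[float]],
    tsp_by_active: Dict[int, float],
    idle_threshold: float = 0.0,
) -> bool:
    """Check the per-core TSP constraint in every time slot.

    schedule[t][k] is the power (W) drawn by core k during time slot t.
    A core is treated as active if its power exceeds idle_threshold
    (an assumption of this sketch).
    """
    for slot in schedule:
        # Count the cores that operate simultaneously in this slot.
        active = sum(1 for power in slot if power > idle_threshold)
        if active == 0:
            continue  # no core is running, nothing to check
        limit = tsp_by_active[active]
        # Every core must stay at or below the limit for this core count.
        if any(power > limit for power in slot):
            return False
    return True
```

With the hypothetical table above, a slot with two cores at 2.0 W each passes, since the limit for two active cores is 2.5 W, whereas a slot with all four cores at 1.8 W fails, since the limit for four active cores is 1.5 W. The limit tightens as more cores run at once, which is the dependence on the number of active cores that a fixed chip-level TDP does not capture.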

EVALUATION
In this section, we evaluate the effectiveness of TA-NMR. We used a tool chain comprising [1], HotSpot [7], QUILT [8], and TSP [2], and ran the real-life applications of [1]. Fig. 2 shows the schedulability of the two methods. As seen in this figure, TA-NMR achieves a schedulability of 100%, while TP3M [1] meets the deadlines in 93.2% of cases on average.

CONCLUSIONS
This comment improves the efficiency and the schedulability rate of one of the methods proposed in the literature by employing TSP instead of TDP. We evaluated the modified method under various system configurations and workloads. Our experiments show that, in realistic scenarios, the schedulability of TA-NMR is higher than that of the method proposed in the literature.