Goal-Oriented Data-Driven Control for a Holistic Thermal Management System of an Electric Vehicle

This work presents a goal-oriented data-driven control approach that uses Bayesian Optimization (BO) to train an MPC in a cascade control scheme. Unlike most works, which focus on control parameters, this work focuses on the system matrices. The training scheme is used to develop a controller for a complicated holistic thermal management system (TMS) that provides cooling and lubrication for the electric and mechanical components in one circuit. The control goal is to reduce the total energy consumption rather than to track a reference. Compared to a basic controller, the goal-oriented trained controller reduces the energy cost by up to 1.8% in simulation studies. Thanks to the reduction of the operating temperature, the thermal lifetime of one EM is extended by 5.31 times and that of the other EM by 5.38 times. The TMS can be analysed theoretically so that the system matrices can be restricted to certain search spaces during BO to avoid contradicting the physics of the real system. Contrary to the literature, the restriction neither speeds up BO nor achieves a better control goal. Therefore, goal-oriented control with restrictions based on prior knowledge does not guarantee better control for similar systems.

Note to Practitioners: A holistic TMS is a complex system to identify or control. Goal-oriented data-driven control makes it possible to design a high-performance controller for it, and for similarly complex systems, with less complication. Importantly for practitioners, the resulting controller reduces energy consumption and extends the thermal lifetime of the EM at the same time. Simulation results also show that restricting the search space of BO based on theoretical analysis does not guarantee a controller with better control. It is therefore unnecessary for practitioners to spend extensive effort on analysing the system and providing such restrictions.


I. INTRODUCTION
The authors are with the Institute for Mechatronic Systems, Technical University of Darmstadt, 64287 Darmstadt, Germany (e-mail: yikao.tao@tu-darmstadt.de; jason931014@163.com; gao@ims.tu-darmstadt.de; liu@ims.tu-darmstadt.de; rinderknecht@ims.tu-darmstadt.de).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TASE.2023.3304521.
Digital Object Identifier 10.1109/TASE.2023.3304521

Thermal management systems (TMS) and their controls are essential for the reliability and the efficiency of the battery, the electric motor (EM) and the power electronics (PE) in an electric vehicle [1]. Most research has focused on the TMS of batteries but not on that of powertrains [2], [3]. Moreover, the transmission fluid circuit has often been neglected, although regulating the lubricant temperature is essential for transmission efficiency and for preventing degradation of the lubricant itself [4]. The TMS in this paper is a holistic system for EM, PE and transmission. It was developed for the pioneering project "Speed4E", which introduced a two-motor multi-speed electric powertrain [5]. Fig. 1(a) shows the powertrain assembly, in which two EMs, controlled by two PEs, propel the vehicle through two sub-transmissions (ST). This powertrain has improved the power density and the efficiency, especially under partial load (see Appendix I). A water-containing fluid [6] developed by the company FUCHS SCHMIERSTOFFE GMBH provides cooling for the PEs and EMs and lubrication for the STs. Fig. 1(b) illustrates the holistic TMS, which is further elaborated in Section III-A.
Model-based methods have commonly been used to develop advanced thermal management controllers for temperature tracking. A Lyapunov-based nonlinear controller was developed in [7] and [8] to regulate the cooling system of an internal combustion engine. For a multiple-loop TMS in [9] and [3], several model-based controllers were developed to regulate the temperatures of the engine, the EM and the battery.
System identification is crucial for model-based controllers. Often, the full physics of the system is too complicated or inaccessible to be modelled accurately by a fully parametric, physics-based model [10]. On the other hand, a fully data-driven model can be considered uninterpretable [11]. Moreover, an identified model with the least output prediction error may not be the best model for control [12]. Since control is the goal of modelling, why not train a model-based controller with respect to the control goal, i.e. develop a controller with a goal-oriented data-driven approach? Bayesian Optimization (BO) is well suited to this task, thanks to its ability to perform efficient gradient-free search [13]. The goal-oriented control concept has been applied to tune control parameters in recent years: the cost matrices of a linear-quadratic regulator were trained by BO in [14] to regulate a quadrotor; parameters of model predictive control (MPC) were tuned by BO in [15] and [16], considering estimated noise in the objective and the minimum worst case, respectively. However, the model parameters still have to be identified. Applying the goal-oriented control concept to model parameters is rare. Besides, the above-mentioned works only reported results for low-dimensional systems, which also makes it easier to identify the model parameters.
This work proposes a goal-oriented MPC as the outer controller in a cascade control scheme to regulate the holistic TMS. In particular, in contrast to existing works on goal-oriented control, the system matrices of the MPC are trained by BO. Similar to the advantage of a grey-box model [17], the structure of the state-space model makes it possible to interpret the trained model. Naturally, prior knowledge based on the theoretical analysis of the plant can be applied to the goal-oriented control with the intention of ruling out the part of the search space that contradicts the physics of the plant [12]. This work showcases one approach to this idea and evaluates its effect, neither of which has been discussed in existing works. Instead of tracking reference values as in existing works on thermal management, the MPC regulates the operating temperatures of the components so that the total energy consumption of both the powertrain and the TMS is optimized. This work also investigates the effect of the controller on the thermal lifetime of the EMs, which has not been discussed in the context of thermal management. To the best knowledge of the authors, a holistic TMS providing cooling and lubrication in one circuit is novel, and a simulative showcase of its control has not been reported.
This paper is organized as follows: Section II formulates the problem. Background knowledge is presented in Section III. In Section IV, the structures of BO and the simulation are given. The model of each component in the TMS is discussed with a focus on the power loss. The MPC controller and BO are presented in detail. A theoretical analysis of the model in the MPC is conducted, which provides restrictions on the search space of BO. In Section V, the results for two scenarios are presented. The performance of the trained controller is compared to that of a basic controller. It is also compared to the controller trained with the restricted search space. A conclusion is presented in Section VI.

II. PROBLEM FORMULATION
Consider an unknown, discrete-time, nonlinear system

$$x_{k+1} = f(x_k, u_k), \tag{1}$$

where $k = 0, 1, 2, \ldots, N-1$; $x_k \in \mathbb{R}^{n_x}$ and $u_k \in \mathbb{R}^{n_u}$ denote the state and the input at time $k$, respectively. The input is determined by a feedback control law $\mathcal{C}$,

$$u_k = \mathcal{C}(x_k), \tag{2}$$

by minimizing the cost value $J \in \mathbb{R}$ defined by the cost function $\mathcal{J}$ with given initial state $x_0$, i.e.

$$\min_{\mathcal{C}} \; J, \tag{3.a}$$

$$J = \mathcal{J}\left(X_0^N, U_0^{N-1}\right), \tag{3.b}$$

subject to (1). $X_0^N = (x_0, x_1, x_2, \ldots, x_N)$ denotes a sequence of the state, while $U_0^{N-1}$ is similarly defined. The dynamics in (1) describe the holistic TMS shown in Fig. 1(b). Solving this problem is equivalent to achieving the control goal, as the cost function calculates the total energy consumption of the powertrain and the TMS (see Section IV-E1).

To solve (3), a Linear Time Invariant (LTI) MPC parameterized by a design vector $\theta$ determines the control law as

$$u_k = \mathcal{C}_{\mathrm{MPC}}(x_k; \theta). \tag{4}$$

The design vector is trained by BO so as to solve

$$\min_{\theta \in \Theta} \; J, \tag{5.a}$$

$$J = J_\theta(\theta), \tag{5.b}$$

subject to (1) and (4), where $\Theta \subset \mathbb{R}^{n_\theta}$ is the design space; $J_\theta : \mathbb{R}^{n_\theta} \to \mathbb{R}$ maps the design vector to the cost. $J_\theta$ is unknown, while the cost value can be obtained by (3.b).
An LTI model is chosen to reduce the dimension of the design vector. For regulation, a controller with linearized models has often been found sufficient to control a nonlinear system [18].

III. BACKGROUND

A. Holistic Thermal Management System
Fig. 1(b) depicts the holistic TMS. The total flowrate is distributed by a 3-way valve (V1) to both PEs. The fluids in the two paths subsequently flow through the cooling jackets of the respective EM. A common rail gathers and distributes the fluid to the injection paths for injection lubrication and to the rotor shafts of both EMs, which are connected to the input shafts of the STs [19]. Dip lubrication takes place in addition to injection lubrication, where churning causes foaming. Two dry sump pumps (DSP) pump the foamed fluid into the surge tank, where a degassing module settles the fluid to prevent air bubbles from entering the circuit. The primary task of both DSPs is to keep the fluid in the transmission oil sump at a low level, so that the power loss due to dip lubrication is minimized [19]. Two centrifugal pumps (CP) are placed upstream of the radiator and of V1, respectively. The fluid temperature is measured at different positions by the temperature sensors (TS). A differential pressure sensor is used to measure the pressure drop in the radiator to calculate the flowrate. The fluid level in the surge tank is measured and should not exceed its limit.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

B. MPC
MPC is a feedback control algorithm that is suitable for multi-input multi-output systems with constraints [20]. The state-space model in an LTI MPC is given by (6), in which $y_k \in \mathbb{R}^{n_y}$ is the system output, $k$ is the discrete time step and $\Delta t$ is the time interval. The state transition matrix $A \in \mathbb{R}^{n_x \times n_x}$, the input matrix $B \in \mathbb{R}^{n_x \times n_u}$ and the output matrix $C \in \mathbb{R}^{n_y \times n_x}$ are time invariant. The system is subject to a set of linear inequalities (7), where $\mathbf{1} = (1, 1, \ldots, 1)^T \in \mathbb{R}^{n_{con}}$ and $n_{con}$ is the number of constraints. The matrices $F \in \mathbb{R}^{n_{con} \times n_x}$ and $G \in \mathbb{R}^{n_{con} \times n_u}$ are chosen to describe constraints such as $x_{min} \le x \le x_{max}$ and $u_{min} \le u \le u_{max}$ elementwise.
At time step $k$, the input is determined by solving the minimization problem (8) subject to (6) and (7). Here, $\lVert v \rVert_S^2$ denotes the quadratic form $v^T S v$ for any $v \in \mathbb{R}^{n_v}$ and symmetric $S \in \mathbb{R}^{n_v \times n_v}$. The positive semi-definite matrices $Q_y \in \mathbb{R}^{n_y \times n_y}$ and $Q_u \in \mathbb{R}^{n_u \times n_u}$ are weighting matrices. The subscript $k + k_p \mid k$ denotes the predicted value at the $k_p$-th time step in the prediction horizon $N_p$ given the current time $k$. $y_{ref}$ and $u_{ref}$ denote the reference output and control.
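For orientation, one common way to write an LTI MPC with the ingredients described in this subsection is sketched below. It is an illustrative form under assumptions, not necessarily the exact formulation used in this work: the forward-Euler structure of the model and the summation limits of the cost are assumed.

```latex
\begin{aligned}
x_{k+1} &= x_k + \Delta t \,\bigl(A x_k + B u_k\bigr), \qquad y_k = C x_k, \\
F x_k + G u_k &\le \mathbf{1}, \\
\min_{u_{k|k},\,\ldots,\,u_{k+N_p-1|k}} \;& \sum_{k_p=1}^{N_p}
  \Bigl( \lVert y_{k+k_p|k} - y_{\mathrm{ref}} \rVert_{Q_y}^2
       + \lVert u_{k+k_p-1|k} - u_{\mathrm{ref}} \rVert_{Q_u}^2 \Bigr).
\end{aligned}
```

Only the first input of the optimized sequence is applied before the problem is solved again at the next time step, which is what makes the scheme a feedback law.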

C. BO With a Nonparametric Regression Model
Since the mapping $J_\theta$ in (5.b) is unknown, BO is performed with a nonparametric regression model to solve (5). This approach consists of a learning stage and an optimization stage. In the learning stage, a Gaussian Process (GP) is used to approximate the distribution $p(J = J_\theta(\theta) \mid \theta)$. In the optimization stage, exploitation and exploration are balanced by maximizing an acquisition function [13], which considers not only the prediction (the expectation) but also the uncertainty (the variance) of the GP regression. These two stages are performed iteratively.
1) Learning Stage With GP: The observations $J$, in our context the cost in (3.b), are assumed to be random variables, and the distribution of any finite number of observations is assumed to be jointly Gaussian. One simulation or experiment generates a pair $(\theta, J)$. $N_D$ such pairs form a dataset $D = \{(\theta_1, J_1), (\theta_2, J_2), \ldots, (\theta_{N_D}, J_{N_D})\}$. The joint distribution of the observations is formulated in (9), where $\theta_*$ is the test point (the new design vector in our context) and $\hat{J}_*$ is the predicted observation at $\theta_*$. $\mu \in \mathbb{R}^{N_D+1}$ is the expectation and $\Sigma \in \mathbb{R}^{(N_D+1) \times (N_D+1)}$ is the covariance matrix, which describes the correlation between observations. A squared exponential kernel is used to form $\Sigma$ in (10), where $k_{ij}$ is the element in the $i$-th row and $j$-th column of the matrix $K \in \mathbb{R}^{N_D \times N_D}$; the remaining row and column of $\Sigma$ hold the kernel values between $\theta_*$ and the points in $D$. Derived from (9) and (10), the conditional probability $p(\hat{J}_* \mid \theta_*, D)$, i.e. the posterior distribution, is given by (11). Between 10 and 50 randomized simulations or experiments, depending on the problem [13], form an initial data set $D_{ini}$.
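The learning stage can be illustrated with a minimal GP regression sketch in plain Python. It is not the implementation used in this work: the constant prior mean, the unit kernel hyperparameters `sigma_f` and `length`, the small `noise` jitter and all function names are assumptions made for illustration.

```python
import math


def sq_exp_kernel(a, b, sigma_f=1.0, length=1.0):
    """Squared exponential kernel between two design vectors (assumed hyperparameters)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sigma_f ** 2 * math.exp(-0.5 * d2 / length ** 2)


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_chol(low, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(thetas, costs, theta_star, noise=1e-8):
    """Posterior mean and variance of the cost at theta_star given the data set."""
    n = len(thetas)
    mean = sum(costs) / n  # constant prior mean: an assumption of this sketch
    k_mat = [[sq_exp_kernel(thetas[i], thetas[j]) + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    k_star = [sq_exp_kernel(t, theta_star) for t in thetas]
    low = cholesky(k_mat)
    alpha = solve_chol(low, [c - mean for c in costs])
    v = solve_chol(low, k_star)
    mu = mean + sum(ks * a for ks, a in zip(k_star, alpha))
    var = sq_exp_kernel(theta_star, theta_star) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mu, max(var, 0.0)
```

At a design vector that is already in the data set, the posterior mean reproduces the observed cost and the variance collapses; far away from all data, the prediction falls back to the prior mean with the prior variance.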
2) Optimization Stage: Given the posterior distribution in (11), the expected improvement (EI) acquisition function is maximized over $\theta$ to suggest the next design vector [21]. It is given by (12.a), where $J^-$ is the smallest observation so far, i.e. $J^- = \inf\{J_1, J_2, \ldots, J_{N_D}\}$. Under the GP framework, (12.a) can be evaluated analytically in closed form, where $\Phi$ and $\psi$ are the cumulative distribution function and the probability density function of a standard normal distribution, respectively [12].
The suggested design vector and the corresponding observation are added into data set D. With an adequate number of iterations, the minimum value of the function J θ can be found.
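The closed-form EI for a minimization problem can be sketched as follows. This is the textbook expression for a Gaussian posterior, given here as an assumption about the form used; the function name and arguments are hypothetical.

```python
import math


def expected_improvement(mu, sigma, j_best):
    """Textbook EI for minimization: E[max(0, j_best - J)] with J ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(j_best - mu, 0.0)
    z = (j_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (j_best - mu) * cdf + sigma * pdf
```

The first term rewards candidates whose predicted cost is below the best observation (exploitation), while the second term rewards candidates with a large predictive standard deviation (exploration).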

D. Thermal Network
The thermodynamics of each component are modelled by the thermal network (TN) modelling method. A thermal system is divided into a number of sub-volumes called nodes. The thermal properties of each node are considered to be concentrated at its central nodal point. Each node is represented by two TN elements, a temperature $T$ and a capacity $C$ (thermal mass), as shown in Fig. 2. The thermal dynamics of a node with finite capacity are given by (13), where $\dot{Q}$ is the rate of heat generation; $\dot{Q}_{in}$ is the rate of heat transfer into the node; $t$ is the time over which the heat is flowing; $m$ is the mass and $c_p$ is the specific heat capacity.
Nodes are connected to each other with thermal resistances $R$. For instance, node 2 in Fig. 2 can be modelled as in (14), where $h$ is the heat transfer coefficient and $A$ is the surface area.
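A lumped node of this kind can be sketched in a few lines of Python. The sketch assumes the usual energy balance $m c_p \, dT/dt = \dot{Q} + \dot{Q}_{in}$ with convective resistance $R = 1/(hA)$ and explicit Euler integration; the function names and the numerical values in the usage note are illustrative only and are not taken from the TMS model.

```python
def convective_resistance(h, area):
    """Thermal resistance R = 1 / (h * A) in K/W."""
    return 1.0 / (h * area)


def temperature_rate(temp, q_gen, neighbours, mass, c_p):
    """dT/dt of one node; neighbours is a list of (temperature, resistance) pairs."""
    q_in = sum((t_other - temp) / r for t_other, r in neighbours)
    return (q_gen + q_in) / (mass * c_p)


def simulate_node(temp0, q_gen, t_fluid, h, area, mass, c_p, dt=0.5, steps=2000):
    """Explicit Euler simulation of a single node cooled by a fluid at t_fluid."""
    r = convective_resistance(h, area)
    temp = temp0
    for _ in range(steps):
        temp += dt * temperature_rate(temp, q_gen, [(t_fluid, r)], mass, c_p)
    return temp
```

For example, a node with 100 W of heat generation, $h = 100$ W/(m²K) and $A = 0.1$ m² settles 10 K above the fluid temperature, since the steady state satisfies $T = T_{fluid} + \dot{Q} R$.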

IV. GOAL-ORIENTED CONTROL

A. System Structure
Fig. 3 illustrates the structure of the goal-oriented control training. The whole system is developed in Matlab/Simulink with a TMS model from the software Cruise-M.
The plant S consists of the TMS model and the vehicle longitudinal dynamics (VLD) model.The VLD model, see Appendix I, determines the required output power to follow a Markov-chain-based stochastic synthesized cycle based on the Common Artemis Driving Cycles (CADC).Synthesized driving cycles maintain the original stochastic characteristics and can be extended according to requirements [23].A predictive energy management strategy (EMS) from a preliminary work [5] determines the gear selection of ST2 and the torques of both EMs based on the required output power and the current speed.These variables are contained in "measurements vehicle" and fed into the TMS model, where the power losses of all components, the thermodynamics of the powertrain and the TMS are modelled.The "measurements TMS" contains the temperatures of both EMs and PEs as well as the signals mentioned in Section III-A.
The cascade control scheme consists of an outer controller, which is an MPC controller based on Section III-B parameterized by a design vector θ, and an inner controller, consisting of several PI controllers for the actuators (see Section III-A).The estimator provides estimations of several unmeasured signals.
The MPC is evaluated with the cost function $\mathcal{J}$. Its value $J$ is stored together with $\theta$ as a pair in the data set $D$. BO suggests $\theta_*$ as the design vector for the next simulation.

B. Modelling of the TMS
The TMS model, whose topology follows Fig. 1(b), was reported in [24].The modelling of the pipes is omitted here, but the TN model of each component is introduced with a focus on the temperature-dependent power losses.Additionally, the estimation of the EM thermal lifetime is presented.
In each TN model, the mass and the surface area come from the CAD data of each component. Specific heat capacities are determined by the materials. The heat transfer coefficient between the TMS and each component is determined from the CFD simulation results of the respective project partners, which are named in the following subsections.
1) Thermal Model of PE: Both PEs are modelled with a lumped-element approach according to (13) and (14), in which the power loss is the heat generation rate. The power loss is modelled as a look-up table, with the speed and the torque as inputs, based on simulation results at an operating temperature of 65 °C from the project partner Lenze SE. According to the manufacturer, the power loss ($P_L$) doubles with every temperature increase of 60 °C.
2) Thermal Model of EM: Fig. 4 shows the structures of the TN models of EM1 (an induction motor) and EM2 (a permanent magnet motor). The temperature distribution is assumed to be rotationally symmetric. To further reduce the model complexity, the TN model is not discretized in the axial direction [25]; the capacities along the axial direction are concentrated at a single point. The TN model considers copper losses, core losses and mechanical losses as the rates of heat generation [26], which are listed in Table I. The power loss of each part is modelled as a look-up table based on simulations at 120 °C from the project partner Institute for Drive Systems and Power Electronics, Leibniz University Hannover. The copper loss grows linearly with the temperature, with a temperature coefficient $\alpha$ of 0.00393 per °C for copper. The copper loss of an EM is thus reduced by 3.93% with a temperature drop of 10 °C. The core loss and the mechanical loss are assumed to be independent of the temperature.
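The two temperature dependencies described above can be written as simple scaling rules applied to the look-up-table value at its reference temperature. The sketch below assumes an exponential doubling law for the PE loss and a linear law for the copper loss, both anchored at the stated reference temperatures (65 °C and 120 °C); the function and parameter names are hypothetical.

```python
def pe_loss(p_ref, temp_c, t_ref_c=65.0, doubling_c=60.0):
    """PE power loss, assumed to double for every `doubling_c` degrees above the reference."""
    return p_ref * 2.0 ** ((temp_c - t_ref_c) / doubling_c)


def copper_loss(p_ref, temp_c, t_ref_c=120.0, alpha=0.00393):
    """Copper loss, assumed linear in temperature around the 120 degC reference value."""
    return p_ref * (1.0 + alpha * (temp_c - t_ref_c))
```

With these rules, a look-up-table value of 100 W for the PE becomes 200 W at 125 °C, and a copper loss of 100 W at 120 °C drops to about 96.07 W at 110 °C, which matches the 3.93% reduction per 10 °C stated in the text.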
3) Estimation of the Thermal Lifetime of EM: Insulation protects the windings against thermal, mechanical and electrical stresses. The high-temperature environment determines its aging lifetime, which is considered as the thermal lifetime of the EMs [27]. Both EMs use class F insulation systems, whose aging lifetime almost doubles with a temperature drop of 10 °C (see Fig. 5).
TABLE I
LOSSES IN THE THERMAL MODELS OF THE EM

To estimate the thermal lifetime under various temperatures, it is assumed that the aging incurred at different temperatures is linearly additive [29]: each duration $t_{T_i}$ spent at an operating temperature $T_i$ is weighed against the thermal lifetime $L(T_i)$ at that temperature, which is determined by Fig. 5.
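A linear damage accumulation of this kind can be sketched as follows. The lifetime curve of Fig. 5 is not reproduced here; instead, `lifetime_at` uses the doubling-per-10-°C rule with a placeholder anchor of 20 000 h at 155 °C, which is an assumption of this sketch and not a value read from the figure. All names are hypothetical.

```python
def lifetime_at(temp_c, l_ref_hours=20000.0, t_ref_c=155.0, doubling_c=10.0):
    """Stand-in for the lifetime curve: lifetime doubles per `doubling_c` degrees of cooling."""
    return l_ref_hours * 2.0 ** ((t_ref_c - temp_c) / doubling_c)


def equivalent_lifetime(durations_h, temps_c, lifetime_fn=lifetime_at):
    """Lifetime under a temperature profile, assuming linearly additive aging."""
    total = sum(durations_h)
    # Each interval consumes the fraction t_i / L(T_i) of the available life.
    damage = sum(t / lifetime_fn(temp) for t, temp in zip(durations_h, temps_c))
    return total / damage
```

For a profile that spends equal time at 145 °C and 155 °C, the sketch returns about 26 667 h, which lies between the two single-temperature lifetimes and closer to the hotter one, because the hot interval consumes life twice as fast.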

4) Thermal Model of ST: ST1 is a planetary gearbox, whose TN model is beyond the scope of the system-level modelling of this work. It is modelled with a lumped-element approach according to (13) and (14), in which the total power loss of ST1 is the heat generation rate. The project partner Gear Research Center (FZG), Technical University of Munich, provides the simulated power losses at different lubricant temperatures, which are modelled as several look-up tables. The surface area and the heat transfer coefficient are determined by the CFD simulation [19].
ST2 is a three-stage two-speed spur gearbox, which is modelled with a light-grey-box approach (see Appendix II).
Fig. 6 shows the simulated power losses of both STs at different lubricant temperatures. Between -15 Nm and 15 Nm, the simulation points are spaced more closely. In this range, ST1 operates with a slightly higher power loss when the lubricant temperature drops from 70 °C to 50 °C. Outside this range, the power loss of ST1 decreases when the lubricant is cooled down. Over the whole torque range, ST2 operates with a higher power loss when the lubricant temperature drops.
5) Surge Tank, Radiator, Pumps and Valves: The surge tank is modelled as a simple volume according to its design. The radiator is modelled as a 3-D look-up table (inputs: temperature difference between air and coolant, coolant mass flowrate and air mass flowrate; output: rate of heat flow) with data from the project partner BMW Group. The air mass flowrate is determined by the air temperature and the vehicle speed.
The CPs and DSPs are modelled according to their characteristic curves (inputs: mass flowrate and rotational speed; output: pressure increase) provided by the suppliers BÖHLER-UDDEHOLM Deutschland GmbH and Pierburg Pump Technology GmbH. Their efficiencies are assumed to be 50%.
The 3-way valve is modelled as a perpendicular T-shape intersection with two valves controlling the flowrates according to the given signal.Its pressure loss is given by [30].

C. MPC Controller in the Cascade Control Scheme
The MPC controller provides the reference values for the inner controller to regulate both CPs and the valves.
1) Formulation of the State-Space Model: The state vector $x = [T_2, T_3, T_4, T_{EM1}, T_{EM2}, T_{PE1}, T_{PE2}]^T$ consists of the temperatures of the fluid and the components. $T_i$ denotes the fluid temperature measured by the sensor TS$i$. The input $u$ consists of: a) the manipulated variables (MV) $u_1$ controlling the actuators, and b) the inputs $u_2$, which are the speed $n$ and torque $Tq$ of both EMs and the vehicle speed $v$. In the MV, $\dot{V}$ represents the total flowrate, which is converted to $n_{CP}$ (the speed of the CPs) in the inner controller; $\varphi_{V1}$ denotes the position of V1; $r_{V2}$ and $r_{V3}$ denote the bypass ratios of V2 and V3. The power consumption of the powertrain, including the mechanical output power and the power losses of all components, should be included in the output $y$, as it is part of the total power consumption that is to be minimized. The output power can be ignored, since it is not influenced by the TMS. Therefore, only the total power loss of the powertrain, $P_{L,pt}$, is included in the output $y = [x^T \; P_{L,pt}]^T$.
Before the formulation of the dynamics, all variables are normalized so as to keep the elements of the system matrices at a similar order of magnitude. The normalization and its inverse are given by (20.a) and (20.b), where $\bar{x}$, $\bar{y}$ and $\bar{u}$ denote the elementwise normalized state, output and input (see Appendix III for $z_{max}$ and $z_{min}$ of each variable). The dynamics after the normalization follow (21), where $\Delta t$ is set to 0.5 s; $I_7$ is a $7 \times 7$ identity matrix and $c_{8*}$ represents the 8-th row of the matrix $C$.
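The normalization and one step of the normalized dynamics can be sketched as below. Both are assumptions for illustration: min-max scaling to $[0, 1]$ is assumed because the normalized MV are later constrained to that interval, and the forward-Euler update is assumed because the analysis later moves $\Delta t$ to the left-hand side. The function names are hypothetical.

```python
def normalize(z, z_min, z_max):
    """Assumed min-max normalization: maps each element of z to [0, 1]."""
    return [(zi - lo) / (hi - lo) for zi, lo, hi in zip(z, z_min, z_max)]


def denormalize(z_bar, z_min, z_max):
    """Inverse of normalize: maps normalized values back to physical units."""
    return [zb * (hi - lo) + lo for zb, lo, hi in zip(z_bar, z_min, z_max)]


def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def step(x_bar, u_bar, a_mat, b_mat, dt=0.5):
    """Assumed forward-Euler update: x_next = x + dt * (A x + B u)."""
    ax = matvec(a_mat, x_bar)
    bu = matvec(b_mat, u_bar)
    return [x + dt * (a + b) for x, a, b in zip(x_bar, ax, bu)]
```

Working in normalized variables means that a single search interval such as $[-2, 2]$ is meaningful for every entry of the system matrices, regardless of the physical units involved.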
2) Optimization: The MV are determined by solving the minimization problem (22.a) subject to (21) and the accompanying constraints, where $Q_y$ and $Q_{u_1}$ are weighting scalars, since only one element from $\bar{y}$ and $\bar{u}_1$ is considered, respectively. A linear transformation $F_1$ constrains the temperatures in $\bar{x}_k$ to their respective operational ranges. A linear transformation $F_2$ constrains each variable in $\bar{u}_1$ to $[0, 1]$. The second part of (22.a) indicates the power consumption of the pumps when $Q_{u_1}$ is properly trained by BO, given the linear correlation between the power consumption of the pumps and the flowrate [31]. Before entering the inner controller, $\bar{u}_1$ is converted to $u_1$ by (20.b).
3) Theoretical Analysis of the TMS: By analysing the thermodynamics of the TMS, the correlation between the left- and the right-hand sides of (21.a) can be obtained, which determines the signs of the elements in $A$, $B$ and $C$.
Moving $\Delta t$ in (21.a) to the denominator of the left-hand side yields a vector of normalized rates of temperature change in discretized time, $(\bar{x}_{k+1} - \bar{x}_k)/\Delta t$. By the first law of thermodynamics and heat transfer theory, and assuming constant specific heat capacity, the rate of temperature change $dT/dt$ of a mass is positively correlated with its rate of heat generation, the heat transfer coefficient and the temperature of the other object, while it is negatively correlated with the temperature of the mass itself. Ergo, the elements of the matrices in (21.a) can be restricted as in (23), where the signs indicate positive or negative elements and a star indicates an unclear sign. For instance, consider the first row of (23) and (21.a): the rate of change of $T_2$ increases/decreases if any of $T_2$, $\dot{V}$ and $v$ decreases/increases; the sign is unclear in the case of $\varphi_{V1}$, $r_{V2}$ and $r_{V3}$; the rate of change of $T_2$ increases/decreases if any of the remaining variables increases/decreases.
The deduction for matrix $A$: the diagonal elements are negative, since they correspond to the temperature of the mass. The rest are positive, since they correspond to the temperature of the other object. The deduction for matrix $B_1$: $u_1$ manipulates the flowrate in the TMS, which is positively correlated with the heat transfer coefficient. The correlation is positive if the local flowrate increases with the variable in $u_1$, and vice versa. Unclear cases leave the entry unrestricted. The deduction for matrix $B_2$: the speed and torque of the EM are positively correlated with the rate of heat generation. The vehicle speed is positively correlated with the heat flow rate of the radiator and therefore negatively correlated with the rate of temperature change.
Based on the temperature dependency of the power losses discussed in Sections IV-B2, III and IV, the last element of the output in (21.b) follows from a correspondingly restricted row $c_{8*}$. The first two elements of $c_{8*}$ are zero, since the fluid temperatures measured by the sensors TS2 and TS3 do not directly influence the power loss.
Remark: The analysis in this subsection is based on thermodynamics and heat transfer.The resulting positive and negative correlations between the rate of temperature change and the variables remain true for the real system with nonlinearity.

D. Inner Controller and Estimator
An inner controller regulates the actuators with separate PI controllers, as illustrated in Fig. 7: in (a) and (b), $\dot{V}$ and $\varphi_{V1}$ from $u_1$ are the reference values used to regulate the targets $n_{CP,target}$ and $\varphi_{V1,target}$ with the help of the measured total flowrate $\dot{V}_{act}$ and the actual position of V2 reported by its controller; in (c), the reference bypass flowrate $\dot{V}_{V2,ref}$ is calculated based on the ratio $r_{V2}$ from $u_1$. The target angular position is regulated with the help of the estimated bypass flowrate $\tilde{\dot{V}}_{V2}$. The control of V3 is the same as that of V2; in (d), the speeds of both DSPs are regulated so that $h_{Trans}$ follows a constant reference value [19]. All PI controllers are parametrized manually so that all actuators respond to the target values fast enough but without oscillation.
In the estimator block, a nonlinear Kalman filter estimates $h_{Trans}$ based on the measured fluid level of the surge tank $h_{Tank}$, the flowrate into the transmission and the geometry of the oil sump. The flowrates through V2 and V3 are estimated by a pipe network analysis model [32].

E. Bayesian Optimization
1) Design Vector and Cost Function: The design vector $\theta$ consists of the elements of the matrices $A$ and $B$, the last row of the matrix $C$, the length of the prediction horizon and two weighting scalars, where $B = [B_1 \; B_2]$ and $b_{i*}$ is the $i$-th row of $B$. The search space for each element of the system matrices is $[-2, 2]$. The search space for $N_p$ is set to $\{1, 2, 3, \ldots, 50\}$ and that of the weighting scalars is $[0, 10]$. The cost function of the BO is formulated considering two aspects: a) the total energy consumption of the powertrain and the TMS; b) the penalty to constrain the TMS and the components to their operational ranges.
The total energy consumption is obtained by summing the total power over all time steps of a simulation, where $\Delta t$ denotes the length of a time step and $i \in \{1, 2, \ldots, N_{sim}\}$ is the time step in a simulation with $N_{sim}$ time steps. The valves are assumed to consume constant power and are therefore omitted from the optimization. The total power of the powertrain consists of the mechanical output power and the power losses. The output power can be omitted from the cost function for optimization, since it is not influenced by the TMS. Therefore, the energy cost, i.e. the energy consumption aspect of the cost function, is formulated as the total power losses of all components plus the power consumption of the pumps. In addition to the temperature constraints on $x$, the fluid levels, i.e. $h_{Tank}$ and $h_{Trans}$, should also remain within a certain range. The penalty is built from a function $g$, which evaluates the over- and undershoot of its input $w$ beyond its maximum and minimum, respectively, by their average value over the simulation. A log-function helps to smooth the value while maintaining its monotonicity, which makes the sampling process in BO more efficient [18].
The total cost combines the energy cost and the penalties, where a log-function is adopted for the energy cost analogously to the penalty; the coefficients $\alpha_P = 0.1$, $\alpha_h = 1000$ and $\alpha_T = 10$ are chosen according to the order of magnitude of the respective variables and adjusted after several trials.
2) Restricting the Search Space: The search spaces of the entries in $\theta$ can be restricted based on the analysis in Section IV-C3, with $(0, 2]$ for positive entries and $[-2, 0)$ for negative entries. The performance of the MPC controllers trained with restricted and unrestricted search spaces is compared in Section V-C.
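Turning a sign pattern into per-entry search intervals is mechanical, as the following sketch shows. The encoding of the pattern with the characters `'+'`, `'-'` and `'*'`, the small `eps` used to approximate the open interval ends, and the function names are assumptions of this sketch.

```python
def a_sign_pattern(n):
    """Sign pattern of A from the analysis: negative diagonal, positive elsewhere."""
    return [['-' if i == j else '+' for j in range(n)] for i in range(n)]


def bounds_from_signs(pattern, limit=2.0, eps=1e-6):
    """Map each sign to a (lower, upper) search interval for BO."""
    table = {
        '+': (eps, limit),     # approximates (0, limit]
        '-': (-limit, -eps),   # approximates [-limit, 0)
        '*': (-limit, limit),  # unclear sign: entry stays unrestricted
    }
    return [[table[sign] for sign in row] for row in pattern]
```

The restriction halves the interval of every entry with a known sign, while entries marked with a star keep the full range $[-2, 2]$.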

V. SIMULATION RESULTS
This section discusses the simulation results of the goal-oriented MPC with and without restricted search space.

A. Basic Controller
A basic controller developed in the preliminary work [32] serves as a reference. Its primary goal is to keep the TMS and the powertrain components within their operational ranges, while the pumps should not consume excessive energy. The whole simulation workflow in Fig. 3 is adopted without the BO block, and the basic controller replaces the MPC.
The basic controller determines the reference total flowrate based on the temperature limits of all components. In order to keep every component under its temperature limit, the maximum operational temperature $T_{max}$ is offset by a buffer value $T_{buffer}$. All individual required flowrates enter the sum-up logic in Fig. 8(b). A PE and its respective EM are located on the same path. Therefore, the larger of their required flowrates is chosen. The sum over both paths is the reference total flowrate. Besides, the ratio between the flowrates in the two paths is used to determine the angular position of V1. Overall, the basic controller demands minimum pump power as long as the temperatures of all components are not close to their limits.
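The sum-up logic described above can be sketched as follows. How each individual required flowrate is derived from the temperature limits is not covered here; the dictionary keys, the function name and the convention of returning the share of path 1 are assumptions of this sketch.

```python
def sum_up(required):
    """Combine individual required flowrates into a total flowrate and a path split.

    `required` maps 'PE1', 'EM1', 'PE2' and 'EM2' to their required flowrates.
    """
    # A PE and its EM share one path, so the larger demand governs that path.
    path1 = max(required["PE1"], required["EM1"])
    path2 = max(required["PE2"], required["EM2"])
    total = path1 + path2
    # Share of the total flowrate sent to path 1, used to set the V1 position.
    share_path1 = path1 / total if total > 0.0 else 0.5
    return total, share_path1
```

For example, required flowrates of 2 and 3 on the first path and 1 and 5 on the second give a total of 8 with 37.5% routed to the first path.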

B. Goal-Oriented MPC
The MPC is trained with a synthetic driving cycle based on CADC following Section III-C with 10 initial simulations of randomized design vectors.Fig. 9 illustrates the iterations of BO without restricted search spaces, in which the black stars mark the total cost of each iteration, the red line the lowest total cost until current iteration, and the green square the lowest total cost of 700 iterations.The BO returns a total cost of 15 if the design vector forms an inadmissible MPC.As BO proposes every iteration according to the acquisition function, the total cost does not necessarily converge.It is also partly due to the high dimensionality of the design vector.
The trained MPC is tested on a testing driving cycle generated by the same Markov-chain model. Fig. 10 shows the vehicle speed. The initial temperature of all components is set to 70 °C, and that of the fluid to 50 °C. Fig. 11 shows the energy cost (see Section IV-E1) of the basic controller and the deviation of the trained MPC from it. A negative deviation means a reduction of the energy cost brought by the MPC. Controlled by the basic controller, the energy cost is 4.11 kWh (8.21 kWh/100 km), while the MPC controller reduces the value by 0.07 kWh (1.8%). Furthermore, the change of operating temperatures (see Fig. 12(b)) extends the thermal lifetime of EM1 by 5.31 times (from $3.04 \times 10^8$ hours to $1.94 \times 10^9$ hours) and that of EM2 by 5.38 times (from $5.43 \times 10^6$ hours to $3.43 \times 10^7$ hours) based on the estimation model from Section IV-B3.
A higher total flowrate is demanded by the goal-oriented MPC, as shown in Fig. 13. Fig. 14 shows that it distributes a higher proportion of the flowrate to PE2 and EM2, since these two components provide more power, as commanded by the EMS, and hence operate at higher temperatures (the red curves with circles are higher than the ones with squares in Fig. 12(a) and (b)). As the flowrate in the radiator increases, the temperatures of the fluid upstream of V1 and of the fluid in the oil sump are lower than with the basic controller (see Fig. 12(c)). The lower fluid temperature and the higher flowrate through both PEs and both EMs reduce their temperatures compared to the case with the basic controller.
The areas in Fig. 15 show the energy cost deviation caused by each component. Both CPs demand more energy to provide a higher flowrate. The reduction of the lubricant temperature in the oil sump causes a higher power loss of ST1 (blue area) and a lower power loss of ST2 (light green area). Overall, the reduction of the lubricant temperature raises the combined energy cost of the two STs. The deviation caused by both DSPs is negligible. The deviations of both EMs and both PEs are negative, since their operating temperatures are reduced.

C. Comparison Between MPC Controllers Trained With and Without Restricted Search Space
A second MPC controller is trained by BO under the same conditions as in the previous subsection but with a restricted search space for the elements of the system matrices (see Section IV-E2). The iterations are presented in Fig. 16. The lowest total cost occurs at the 353rd iteration with a value of 6.72. With the restricted search space, the total costs of all iterations are concentrated in a smaller value range (cf. Fig. 9). However, the restriction neither reduces the total cost nor speeds up BO in this study.
Fig. 17 compares the energy cost of the two BO-trained controllers. The TMS regulated by the restrictedly trained MPC incurs a higher energy cost, with a deviation of $1.1 \times 10^{-2}$ kWh.

D. Results in Summary
For verification in a further scenario, the goal-oriented MPC is additionally trained under the same conditions as in the previous subsection on a synthetic driving cycle based on the Worldwide Harmonized Light Vehicles Test Cycle (WLTC). The test results are listed in Table II.
Since the average speed of the WLTC synthetic driving cycle is lower than that of the CADC, the energy cost in the WLTC synthetic driving cycle is lower. Compared to the basic controller, the unrestrictedly trained MPC reduces the energy cost by 0.04 kWh (1 %) in the test on the WLTC synthetic driving cycle. The operating temperatures of EM1 and EM2 are reduced, so that their estimated thermal lifetimes are extended by 4.68 times and 7.41 times, respectively. As in the CADC synthetic driving cycle, the restrictedly trained MPC reduces neither the total cost nor the energy cost.

VI. CONCLUSION AND FUTURE WORK
This work presented a BO-based goal-oriented data-driven MPC training scheme with a particular focus on the system matrices of the MPC. The goal-oriented MPC serves as the outer controller in a cascade control scheme for the holistic thermal management of a two-drive electric vehicle and reduces the total energy consumption of both the powertrain and the TMS. To ensure safe operation, a penalty is introduced in the cost function of BO to keep the powertrain and the TMS from operating outside their operational ranges.
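The role of the penalty can be illustrated with a minimal sketch. The exact penalty used in the paper is not given in this section, so the linear form, the weight, the signal name `T_EM1` and the limits below are all assumptions made for illustration.

```python
def total_cost(energy_cost, signals, limits, penalty_weight=1.0):
    """Energy cost plus a penalty for every signal outside its allowed range.

    Assumption: the penalty grows linearly with the distance to the admissible interval.
    """
    penalty = 0.0
    for name, value in signals.items():
        low, high = limits[name]
        # Distance outside [low, high]; zero when the value is inside the range.
        violation = max(low - value, 0.0, value - high)
        penalty += penalty_weight * violation
    return energy_cost + penalty


# Hypothetical limits and signal values, not data from the paper.
limits = {"T_EM1": (0.0, 200.0)}
inside = total_cost(4.0, {"T_EM1": 150.0}, limits)
outside = total_cost(4.0, {"T_EM1": 210.0}, limits, penalty_weight=0.5)
```

With such a term, a design vector that saves energy by violating an operational range is still ranked worse by BO than an admissible one.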
The training scheme is applied and tested on two synthetic driving cycles. Compared to a basic controller, the goal-oriented MPC reduces the energy cost by increasing the total flowrate and by regulating the local flowrates. Meanwhile, the thermal lifetimes of EM1 and EM2 are extended drastically.
Based on knowledge of thermodynamics and heat transfer, the signs of some elements in the system matrices can be determined and used to restrict the search space of BO. A second MPC controller was trained with such restrictions. However, contrary to what the literature suggests, the restrictions neither reduced the cost function measuring the control goal nor sped up the optimization. Therefore, restricting the search space of BO is not recommended for similar systems.
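A sign restriction of this kind amounts to halving the search interval of the affected matrix elements. A minimal sketch, assuming a symmetric default bound and a hypothetical sign list (neither is taken from the paper):

```python
def restricted_bounds(default_bound, signs):
    """Build per-element (low, high) search bounds from known signs.

    signs: +1 if the element is known to be positive, -1 if known to be
    negative, 0 if the sign is unknown (unrestricted case).
    """
    bounds = []
    for sign in signs:
        if sign > 0:
            bounds.append((0.0, default_bound))
        elif sign < 0:
            bounds.append((-default_bound, 0.0))
        else:
            bounds.append((-default_bound, default_bound))
    return bounds


# Hypothetical example with three matrix elements.
bounds = restricted_bounds(1.0, [1, -1, 0])
```

Passing all zeros reproduces the unrestricted search space, so the two training runs compared above differ only in this list.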
In the follow-up project of Speed4E, the goal-oriented controller training scheme is to be tested on the test bench.

APPENDIX A
Fig. 19 shows the topology of the Speed4E powertrain. The vehicle dynamics are calculated as

$$P_{req} = \left(e_i m a + c_{roll} m g + \tfrac{1}{2} c_{aero} \rho_{air} A_f v_{veh}^{2}\right) v,$$

where $P_{req}$ is the requested power; $e_i$ is the mass factor, which accounts for the rotating components in the powertrain; $m$ is the total mass of the vehicle; $a$ is the vehicle acceleration; the rolling resistance is calculated from the coefficient $c_{roll}$ and the gravity force $mg$; the aerodynamic resistance is calculated from the coefficient $c_{aero}$, the air density $\rho_{air}$, the frontal area $A_f$ and the vehicle velocity $v$. The output power of the powertrain is

$$P_{req} = Trq_{req} \cdot \omega_{wheel}, \qquad Trq_{req} = i_{ST1}\, Trq_{1} + \frac{P_{L,ST1}(\omega_{1}, Trq_{1})}{\omega_{1}} \dots$$

where $Trq_{req}$ is the requested output torque of the powertrain; $Trq_{EM}$ is the output torque of an electric motor; $\omega$ is the angular velocity; $i$ is the gear ratio (fixed for ST1 and depending on gear selection for ST2); $P_L$ is the power loss of an ST, which depends on the speed and the torque as well as, for ST2, the gear selection. The parameters can be found in [5].
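The requested-power relation of Appendix A can be written as a short function. This is a sketch that reads the relation as the sum of inertial, rolling and aerodynamic forces multiplied by the vehicle velocity; the argument names and the numerical values in the example are hypothetical, and the actual parameters are those referenced in [5].

```python
def requested_power(v, a, m, e_i, c_roll, c_aero, rho_air, A_f, g=9.81):
    """Requested power in W for SI inputs.

    P_req = (e_i*m*a + c_roll*m*g + 0.5*c_aero*rho_air*A_f*v**2) * v
    """
    inertia_force = e_i * m * a                        # acceleration incl. rotating masses
    rolling_force = c_roll * m * g                     # rolling resistance
    aero_force = 0.5 * c_aero * rho_air * A_f * v**2   # aerodynamic drag
    return (inertia_force + rolling_force + aero_force) * v


# Hypothetical values: constant 10 m/s, rolling resistance switched off.
p_aero_only = requested_power(v=10.0, a=0.0, m=1500.0, e_i=1.1, c_roll=0.0,
                              c_aero=0.3, rho_air=1.2, A_f=2.0)
```

At standstill the function returns zero regardless of the acceleration, since all force terms are multiplied by the velocity.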
All the power is provided by the battery, $P_{batt}$.

APPENDIX B
The TN model of ST2 is modelled as in [4]. Such a TN model considers 1) convection between housing and lubricant; lubricant and gear/pinion; lubricant and shaft; lubricant and bearings; and 2) conduction between shaft and gear/pinion; shaft and bearings. Fig. 20 illustrates the TN model of a pair of spur gears. The temperature-dependent heat sources are the mechanical bearing power losses $P_{VL}$, provided by project partner IMKT, and the meshing losses $P_{VZ}$, provided by FZG. They are modelled as several look-up tables. As Fig. 20 shows, the lubricant is considered as a single thermal mass and its internal temperature distribution is neglected. Several TN models are connected according to the topology of ST2 illustrated in Fig. 19. Each shaft is simplified as a thermal mass. The heat exchange rates are calculated according to [4].
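The behaviour of one thermal mass in such a network can be sketched with a generic lumped-parameter energy balance. The form $C\,\dot{T} = \sum_j (T_j - T)/R_j + P$, the explicit Euler integration and all numbers below are assumptions for illustration; the heat exchange rates of the actual model follow [4].

```python
def thermal_node_step(T, neighbors, heat_source, capacity, dt):
    """One explicit Euler step of C*dT/dt = sum((T_j - T)/R_j) + P.

    neighbors: list of (T_j, R_j) pairs, i.e. neighbor temperature and
    thermal resistance of the convective or conductive link to it.
    """
    heat_flow = sum((T_j - T) / R_j for T_j, R_j in neighbors)
    return T + dt * (heat_flow + heat_source) / capacity


# Hypothetical lubricant node at 50 degC coupled to a 70 degC gear.
T_next = thermal_node_step(T=50.0, neighbors=[(70.0, 2.0)],
                           heat_source=0.0, capacity=100.0, dt=1.0)
```

In the full model, `heat_source` would be read from the look-up tables of the bearing and meshing losses at the current temperature.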

APPENDIX C
Table III lists the parameters used in (20.a) and (20.b). The parameters of $n_{EM}$ and $Tq_{EM}$ are chosen according to the operational range; the maximum speed is set to 200 km/h, as the available speed profiles mostly stay below this limit; the temperatures of the fluid and of all components are assumed to remain below 200 °C; the maximum volume flowrate with two CPs according to simulation is 18 L/min; the minimum allowed volume flowrate is set to 6.3 L/min; the parameters of $\phi_V$ are set according to the valve specification; the maximum bypass rate is limited to 0.4; the maximum total power loss is set according to the look-up tables of all components.
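How such limits are typically used for normalization can be sketched as follows. Equations (20.a) and (20.b) are not reproduced in this section, so the min-max form below is an assumption; the flowrate limits are the two values quoted above (both taken to be in L/min), and the lower temperature bound of 0 °C is hypothetical.

```python
def normalize(value, low, high):
    """Scale a value from the range [low, high] to [0, 1] (min-max scaling)."""
    if high <= low:
        raise ValueError("high must exceed low")
    return (value - low) / (high - low)


# Flowrate limits quoted in the text, assumed to be in L/min.
q_min_norm = normalize(6.3, 6.3, 18.0)
q_max_norm = normalize(18.0, 6.3, 18.0)
# Temperature with an assumed lower bound of 0 degC and the 200 degC upper limit.
t_norm = normalize(100.0, 0.0, 200.0)
```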

Manuscript received 30 June 2023; accepted 3 August 2023. Date of publication 17 August 2023; date of current version 8 August 2024. This article was recommended for publication by Associate Editor S.-L. Chen and Editor Q. Zhao upon evaluation of the reviewers' comments. This work was supported by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) under Project Speed4E. (Corresponding author: Yikai Tao.)
$[k_{1*}, \ldots, k_{N_D *}]^{T}$ describes the interrelation between the test point and each point in the dataset, where $k_{i*} = k(\theta_i, \theta_*)$. The hyperparameters $\sigma_0$ and $\lambda_0$ are the amplitude and the length scale of the kernel function, respectively. $\|\cdot\|_2$ represents the Euclidean norm. (10.a) expresses the property that the closer two design vectors are, i.e. the smaller the Euclidean norm of their difference, the more correlated the two observations are.
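The distance-based correlation can be made concrete with a squared-exponential kernel. Equation (10.a) is not reproduced in this section, so the exact form below, in particular the use of $\sigma_0^2$ as prefactor and the factor 2 in the exponent, is an assumption; the function names are illustrative.

```python
import math


def se_kernel(theta_i, theta_j, sigma0, lambda0):
    """Assumed kernel: sigma0^2 * exp(-||theta_i - theta_j||_2^2 / (2 * lambda0^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(theta_i, theta_j))
    return sigma0 ** 2 * math.exp(-sq_dist / (2.0 * lambda0 ** 2))


def k_star(dataset, theta_star, sigma0, lambda0):
    """Kernel values between the test point theta_star and every dataset point."""
    return [se_kernel(theta_i, theta_star, sigma0, lambda0) for theta_i in dataset]


# Hypothetical two-dimensional design vectors.
dataset = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
k_vec = k_star(dataset, [0.0, 0.0], sigma0=2.0, lambda0=1.0)
```

The entries of `k_vec` decrease with the distance to the test point, which is the property described in the text.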

Fig. 6. Power losses of ST1 and ST2 at 20,000 rpm with the lubricant at different temperatures.

Fig. 16. BO total cost vs. iteration: MPC trained with restricted search space.

TABLE II TEST RESULTS IN CADC AND WLTC SYNTHETIC DRIVING CYCLES

TABLE III PARAMETERS FOR NORMALIZATION