Falsifying Cyber-Physical Systems – a Hybrid Optimization-Free and an Optimization-Based Line-Search Approach

Abstract—Cyber-physical systems (CPSs) are complex and exhibit both continuous and discrete dynamics, hence it is difficult to guarantee that they satisfy given specifications, i.e., the properties that must be fulfilled by the system. Falsification of temporal logic properties is a testing approach that searches for counterexamples of a given specification, which can be used to increase the confidence that a CPS does fulfill its specifications. Falsification can be done using random search methods or optimization methods. In this paper, a method based on combining random parameters together with considering extreme combinations of parameter values is proposed. Evaluation results on benchmark problems show that this method performs well on many of the problems. Optimization methods are needed when optimization-free methods do not perform well in falsification. The efficiency of the falsification is affected by the optimization methods used to search for inputs that might falsify the specifications. This paper presents a new optimization method for falsification, Line-search falsification, where optimization is done over line segments through a vector of inputs in the n-dimensional parameter space. The evaluation results on the benchmark problems show that using this method improves the falsification performance by reducing the number of simulations necessary to falsify a specification.


I. Introduction
Cyber-physical systems (CPSs) bridge the cyber-world of communications and computing to the physical world. CPSs are widely adopted in many areas like autonomous systems and smart grids. These systems exhibit both continuous and discrete dynamics. CPSs are often safety-critical systems, e.g., autonomous cars and medical devices. Two commonly used methods for assessing the correctness of CPSs are formal verification and testing. Formal verification of systems exhibiting a combination of discrete and continuous dynamics is, in general, an undecidable problem, as discussed in [1]. Thus, testing is a widely used approach in the design process of complex systems.
CPSs are often developed using a model-based development (MBD) paradigm [2]. For many industrial applications, there are no mathematical models to analyze, and it is only possible to simulate the system under test (SUT). Thus formal verification is not a viable option.
Testing as a process of software verification and validation can only prove the presence of faults, not their absence. Testing consists of a set of activities that try to detect faults so they can be fixed later. That a fault has not been identified during the testing runs does not mean that no fault exists.
Simulation-based approaches [3] are methods that can be used to test CPSs, like the falsification of temporal logic specifications. Falsification of temporal logic properties is an approach to find counterexamples for given specifications of CPSs that can be used to increase the confidence that a CPS does fulfill its specifications. The falsification process is performed over a quantitative semantics of the specification. Quantitative semantics uses an objective function that models the distance to falsifying the specification. An objective function value is calculated to guide the falsification process towards an input that falsifies the specification. A numerical optimiser searches for an input that minimises the objective function, thus falsifying the specification if possible. An objective function is determined by the definition of quantitative semantics using temporal logic formalisms [4]. Metric Interval Temporal Logic (MITL) [5] and Signal Temporal Logic (STL) [6] are common temporal logic formalisms that have been used in the literature.
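The robustness-guided falsification loop described above can be sketched as follows. This is a minimal illustration, not Breach's actual API; the names `robustness` and `sample` are hypothetical stand-ins for simulating the SUT and proposing candidate inputs.

```python
import random

def falsify(robustness, sample, max_sims=1000, seed=0):
    """Generic falsification loop: search for an input whose
    objective (robustness) value is negative, i.e. a counterexample."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(max_sims):
        x = sample(rng)            # propose a candidate input vector
        val = robustness(x)        # simulate the SUT and evaluate the spec
        if val < best_val:
            best_x, best_val = x, val
        if val < 0:                # specification falsified
            return ("Falsified", x)
    return ("Not Falsified", best_x)

# Toy SUT: the spec "output stays below 5" is violated when x[0] + x[1] > 5.
rob = lambda x: 5.0 - (x[0] + x[1])
samp = lambda rng: [rng.uniform(0, 4), rng.uniform(0, 4)]
result, point = falsify(rob, samp)
```

A negative objective value certifies a counterexample, so the loop can stop at the first such input.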
For testing purposes, SUTs are considered to be black box where only the input-output behavior of the SUT can be observed. S-TaLiRo [7] and Breach [8] are two MAT-LAB/Simulink toolboxes used for testing and falsification. S-TaLiRo finds counterexamples using MITL properties, while Breach performs the falsification using STL. In this paper, Breach is used.
In [9], the concept of valued booleans (VBools) was introduced to express two different quantitative semantics, Max and Additive semantics. In the Additive semantics, the robustness of each clause in conjunction can affect the final robustness of the whole formula where the robustness is a value assigned in the VBools. Disjunction and temporal operators are defined in terms of conjunction for the Additive semantics.
The objective functions and the optimization methods affect the performance of the falsification, as discussed in [10]. When gradients are not available, reliable, or practical to obtain, simulation-based approaches are often used for falsification; hence gradient-free optimization methods are used during falsification. In [11], different gradient-free optimization methods are reviewed in a systematic comparison and evaluation on a set of problems. The results show that there is no single optimization method that outperforms the others. Moreover, all the evaluated optimization methods provide the best possible solution for at least some of the test problems.
A direct-search method is a gradient-free method that uses and compares only function values to directly determine the candidate points which are vectors of input parameters, and it does not require the explicit calculation of derivatives [12]. Thus, this method can be applied to non-smooth optimization problems [13]. Nelder-Mead (NM) [14] is one direct-search method.
Model-based search methods are another class of gradient-free methods that construct and use a surrogate model of the objective function to guide the search process [15]. Bayesian optimization [16] is a sequential model-based approach and a global optimization method for black-box functions [17] that is able to deal with stochastic noise in the function evaluations. Bayesian optimization is best suited for continuous domains and a relatively low number of dimensions. For falsification the number of parameters is typically quite high; industrial problems can have 20 to 100 different optimization variables during the falsification and include both continuous and discrete domains. Thus, Bayesian optimization methods are not directly applicable to falsification problems.
Two optimization methods that are suitable for falsification and were evaluated in [10] are NM and SNOBFIT [18]. Both are gradient-free optimization methods. NM starts with a simplex of n + 1 points, where n is the number of dimensions. In each iteration, the points are sorted from the lowest to the highest objective function value. Then, NM attempts to replace the point with the highest objective function value with a new point obtained by reflection, expansion or contraction. If all of these fail, the entire simplex shrinks towards the point with the lowest function value, and n new points are generated by NM.
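The NM steps just described (sort, reflect, expand, contract, shrink) can be illustrated with a minimal sketch. This is a textbook variant with standard coefficients, not the exact fminsearch implementation used later in the paper.

```python
def nelder_mead(f, x0, step=1.0, max_iter=200, tol=1e-8):
    """Minimal Nelder-Mead sketch: reflection, expansion,
    contraction, and shrink, with standard coefficients."""
    n = len(x0)
    # Initial simplex: x0 plus n points perturbed along each axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)                       # lowest value first
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of all points except the worst one.
        c = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        refl = [c[i] + (c[i] - worst[i]) for i in range(n)]
        if f(refl) < f(best):
            exp = [c[i] + 2 * (c[i] - worst[i]) for i in range(n)]
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = [c[i] + 0.5 * (worst[i] - c[i]) for i in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:                                  # shrink towards best
                simplex = [best] + [
                    [best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Converges to the minimum of a simple quadratic at (1, -2).
xmin = nelder_mead(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [0.0, 0.0])
```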
SNOBFIT works by building surrogate models around each evaluated point. The optimization proceeds by processing parameter values and the corresponding function values, and recommending a new set of parameter values to evaluate. SNOBFIT typically has a fast local search and contains parameters that control the balance between the local and the global search of the optimization. The reader is referred to [18] for a detailed description of the inner workings of SNOBFIT.
The three semantics evaluated in [10] were Max, Additive, and constant quantitative semantics. The latter semantics works by assigning a constant positive value if the specification is fulfilled and a constant negative value if it is falsified. A constant objective value does not hold any information for the optimization methods about how far away from or close to falsifying the specification the current point is. Surprisingly, SNOBFIT using a constant objective value performed better than NM with Max and Additive semantics. This interesting and unexpected result leads us to ask: how can SNOBFIT perform better than other optimization methods without any information from the objective values? The answer seems to be that when SNOBFIT does not get any information from the constant semantics, it tends to drive the new parameter values towards the corner points, i.e., the combinations of extreme values of the allowed parameter ranges of the SUT. This was the primary inspiration for us to suggest, in this paper, a new optimization method that features the benefits of SNOBFIT but is less complex. This method is suggested as an optimization-free method for falsification, not as a general gradient-free optimizer.
Evaluation results on the benchmark problems [19], [20] in this paper show that some specifications are easily falsified by the extreme values of the parameter ranges, i.e., the corner points, or with simple random searches. In these two cases, the falsification process can be done without using any optimization method. Hence, to combine the advantages of both corner points and random methods, a combination of the corner point and random approach, called Hybrid Corner-Random (HCR), is evaluated in this paper. This method works well for specifications where at least one of the corner points or random points in the parameter space is successful in falsifying the specification. This leads us to introduce three algorithms: (i) Corners, (ii) Uniform-Random, (iii) Hybrid Corner-Random. In our evaluation, the latter method surprisingly outperforms the NM method on the benchmark problems, irrespective of which semantics is used.
For specifications and examples where HCR does not work, an optimization-based approach for falsification is better. In this paper, we propose an alternative optimization algorithm, called Line-search Falsification (LSF), for falsification that has a performance on par with, or better than, SNOBFIT but with a less complex design. The algorithm is inspired by the crawling procedure of NM for local optimization but can also get out of local optima and continue the optimization from a new but related point. Evaluation results on the benchmark examples in this paper show that the LSF method performs better than NM and SNOBFIT.
The main contributions of this paper are:
• The Hybrid Corner-Random method, which uses a combination of the corner points and random methods.
• Line-search Falsification, a gradient-free optimization method.
• An evaluation of the proposed methods on the ARCH benchmark [19] and additional benchmark problems [20].
This paper is organized as follows: Section II introduces the quantitative semantics and different ways to define the objective functions used by the falsification process. Section III introduces the requirements and definitions. Section IV proposes the suggested HCR method. Section V introduces the LSF method. Section VI introduces the benchmark problems evaluated in this paper. Section VII evaluates the performance of the suggested methods on the chosen benchmark problems. Finally, Section VIII summarizes the contributions.

II. Quantitative Semantics and Objective Functions
In this paper, STL [6] is used to model the specifications. The syntax of STL is defined as follows:

ϕ ::= µ | ¬ϕ | ϕ ∧ ψ | ϕ ∨ ψ | □[a,b] ϕ | ♦[a,b] ϕ

where the predicate µ is µ ≡ µ(s) > 0 and s is a signal; ϕ and ψ are STL formulas; □[a,b] denotes the globally operator between times a and b (with a < b); ♦[a,b] denotes the finally operator between a and b.
The satisfaction of the formula ϕ for the discrete signal s at the discrete-time instant k is defined as:

(s, k) ⊨ µ iff µ(s[k]) > 0
(s, k) ⊨ ¬ϕ iff (s, k) ⊭ ϕ
(s, k) ⊨ ϕ ∧ ψ iff (s, k) ⊨ ϕ and (s, k) ⊨ ψ
(s, k) ⊨ □[a,b] ϕ iff (s, k′) ⊨ ϕ for all k′ ∈ [k + a, k + b]
(s, k) ⊨ ♦[a,b] ϕ iff (s, k′) ⊨ ϕ for some k′ ∈ [k + a, k + b]

Instead of only checking the boolean satisfaction of an STL formula, the notion of a quantitative value, i.e., an objective value, will be defined to measure how far away a specification is from being falsified. A Valued Boolean (VBool) [9] (v, y) is a combination of a Boolean value v (true ⊤, or false ⊥) together with a real number y that is a measure of how true or false the VBool is. This value will be used as a measure of how convincingly a test passed, or how severely it failed, respectively. In the original VBool definition, y is defined to always be non-negative. However, in this paper, we use the convention that y is negative when v is false, and positive otherwise.
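A VBool with the signed convention used in this paper can be sketched as a small data type; the class and method names are illustrative, not part of the VBool papers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VBool:
    """Valued Boolean: a truth value plus a signed robustness measure.
    Following this paper's convention, y > 0 when v is true and
    y < 0 when v is false."""
    v: bool
    y: float

    def __neg__(self):
        # VBool negation: flip both the truth value and the sign of y.
        return VBool(not self.v, -self.y)

passed = VBool(True, 2.5)   # a test that passed with margin 2.5
failed = -passed            # its negation: failed with value -2.5
```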

A. Quantitative Semantics
Using VBools, we define two quantitative semantics: Max, which is essentially the same as the standard STL quantitative semantics, and Additive. For these semantics we define the respective and, or, always, and eventually operators.
For conjunction, the two semantics differ only in the two cases where the truth values of the operands are the same. Using the de Morgan laws, the or operator can be defined in terms of and, as ϕ ∨ ψ = ¬(¬ϕ ∧ ¬ψ), where VBool negation is defined as ¬(v, y) = (¬v, −y).
For the Max semantics, the always operator over an interval [a, b] is straightforwardly defined in terms of and, as

□[a,b] ϕ = ϕ_a ∧ ϕ_{a+1} ∧ · · · ∧ ϕ_b,

where ϕ is a finite sequence of VBools defined for all the discrete time instants in [a, b].
For the Additive semantics, though, always is a bit more involved; its definition uses the simulation step size δt, which makes the quantitative value independent of the simulation time, together with an auxiliary accumulation operator #. Furthermore, the eventually operator is, for both semantics, defined over an interval [a, b] in terms of always, as

♦[a,b] ϕ = ¬□[a,b] ¬ϕ.

The quantitative semantics are used to assign an objective function value given a set of n parameters. The formal relations between the satisfaction of a specification ϕ and the value of the objective function f : Rⁿ → R are [4]: f(x) > 0 implies that the resulting signal s satisfies ϕ, and f(x) < 0 implies that s falsifies ϕ, where f(x) refers to the objective function value for the input x.
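Collapsing VBools to their signed values, the Max semantics reduces to min/max combinations, as in standard STL robustness. This sketch covers only the Max semantics; the Additive formulas, which involve δt and the # operator, are not reproduced here.

```python
def v_and(a, b):
    # Max semantics conjunction: the weaker (smaller) operand wins.
    return min(a, b)

def v_or(a, b):
    # Disjunction via de Morgan: the stronger (larger) operand wins.
    return max(a, b)

def always(vals):
    # Globally over a window: the minimum robustness in the window.
    return min(vals)

def eventually(vals):
    # Finally over a window: the maximum robustness in the window.
    return max(vals)

# Signal s sampled at discrete instants; spec: always (s < 3).
s = [1.0, 2.5, 2.0, 0.5]
rob = always([3.0 - x for x in s])   # positive value: spec satisfied
```

A positive result means the specification holds with that margin; a negative result would be a falsifying trace.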

III. Requirements and Definitions
For the falsification process of a SUT, each input parameter x_i to the system is selected within a finite range (l_i, u_i), with lower bound l_i and upper bound u_i, where l_i < u_i. Since we have n input parameters, there is a lower bound l = (l_1, l_2, . . . , l_n)ᵀ and an upper bound u = (u_1, u_2, . . . , u_n)ᵀ. A vector of inputs x = (x_1, x_2, . . . , x_n)ᵀ is defined within the range (l, u).
To clarify the meaning of a corner point, an example is given here. Assume a system with three inputs x_1, x_2, x_3, each defined in a specific range (l_i, u_i). A corner point is a point where every input takes either its lower or its upper bound; with three inputs there are thus 2³ = 8 corner points. In this paper, it is assumed that we can only simulate the system and do not have access to the internals of the system, thus a gradient-based optimization method cannot be used. Instead, we have to use a gradient-free optimization method.
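Enumerating the corner points, i.e., all combinations of lower and upper bounds, can be sketched as follows; the bounds are illustrative.

```python
from itertools import product

def corner_points(l, u):
    """All 2^n corner points of the box (l, u): every combination
    of each dimension taking its lower or its upper bound."""
    return [list(p) for p in product(*zip(l, u))]

# Three inputs, so 2^3 = 8 corner points (illustrative bounds).
corners = corner_points([0, 0, -1], [100, 325, 1])
```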

IV. Hybrid Corner-Random Method
Falsification can be done with or without using an optimization method. Optimization-based approaches are more complex than optimization-free methods, since it is necessary to associate objective function values to simulation traces together with a strategy to choose new parameter combinations to explore based on the objective function values. To motivate the extra complexity in optimization-based approaches we start by investigating how far we can get by using a simple strategy based on using random-search together with exploring extreme parameter values.
So in this section, we introduce the HCR falsification method, an optimization-free method that explores corner points, i.e., extreme values within the allowed parameter ranges, in combination with evaluating random parameter values also within the allowed parameter ranges.
We have observed that a successful strategy in many cases is to focus on the corner points. However, if more simulations are allowed, a random strategy will eventually falsify more specifications than a corner-based strategy. A corner-based strategy performs well on some of the models and specifications, while a random strategy performs better on some other problems. We thus propose combining the two strategies. The HCR falsification method is presented in Algorithm 1.

Algorithm 1 Hybrid Corner-Random (HCR)
1: curr simulation = 0;
2: x = a corner point;
3: while curr simulation < max simulations do
4:   Simulate system with x as input.
5:   curr simulation ++;
6:   Evaluate spec at x.
7:   if the spec is falsified then
8:     Return ⟨Falsified, x⟩.
9:   else
10:    if the previous point was random and unvisited corners remain then
11:      x = the next corner point;
12:    else
13:      x = a uniformly random point within (l, u);
14:    end if
15:  end if
16: end while
17: Return ⟨Not Falsified, x⟩.

In Algorithm 1, it does not matter which semantics is used. The algorithm just evaluates whether the specification is falsified or not at the currently selected point.

A. Output
The output of Algorithm 1 is the tuple ⟨Falsified/Not Falsified, x⟩, where the first item tells if the specification was falsified or not and the second item x is an n-dimensional parameter point where 1) If the specification is falsified, f(x) < 0.
2) If the maximum number of simulations is reached and the specification is not falsified, x is the last evaluated point.

B. Hybrid Corner-Random Process
Algorithm 1 starts with a corner point, x, in line 2. In each iteration, while the number of simulations, curr simulation, is less than max simulations, the algorithm picks a corner or random point x. Then, the system is simulated at the current point x to evaluate the specification. If the specification is falsified, the algorithm terminates, lines 7-8. Otherwise, a new random or corner point x is picked for the next iteration, lines 9-15.
It should be mentioned here that the number of corners depends on the number of system inputs. Since the number of corners is constant, if there are no new corners to select, the algorithm will continue working with only random points, lines 10-14, until the maximum number of simulations has been reached.
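Under the assumption that the alternation works as described (corner, random, corner, and so on, falling back to random-only once all corners are used), Algorithm 1 can be sketched as:

```python
import random
from itertools import product

def hcr(robustness, l, u, max_sims=1000, seed=0):
    """Hybrid Corner-Random sketch: alternate corner points and
    uniformly random points; continue random-only once all corners
    have been visited."""
    rng = random.Random(seed)
    corners = [list(p) for p in product(*zip(l, u))]  # all 2^n corners
    take_corner, x = True, None
    for _ in range(max_sims):
        if take_corner and corners:
            x = corners.pop(0)                 # next unvisited corner
        else:
            x = [rng.uniform(a, b) for a, b in zip(l, u)]
        if robustness(x) < 0:                  # simulate + evaluate spec
            return ("Falsified", x)
        take_corner = not take_corner          # alternate corner/random
    return ("Not Falsified", x)

# Toy spec that is falsified only near the corner (u1, u2).
res, pt = hcr(lambda x: 1.9 - (x[0] + x[1]), [0, 0], [1, 1])
```

Because the corner (1, 1) is guaranteed to be visited within the first few simulations, this toy specification is always falsified.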
Although this method, in general, performs better than the corners and random methods used individually, as will be shown in Section VII, for many of the benchmark problems, the method could not falsify in a reasonable number of iterations. Hence, for the examples where using Corners, Uniform-Random (UR) and HCR is not successful, an optimization method can be used. In the next section, a new method, LSF, will be introduced.
V. Line-search Falsification

LSF, proposed in this paper, is a gradient-free optimization-based approach that combines a local search with the ability to explore new areas of the parameter space in case the local optimization is unable to falsify the specification. The method initially picks a random line and optimizes over that line to find a local minimum on it. The method is presented in Algorithm 2 and has been implemented for the falsification of CPSs using Breach [8]. In the following sections, the method will be compared with two optimization methods, NM and SNOBFIT, and also with HCR.

A. Output
The output of LSF is the tuple ⟨Falsified/Not Falsified, x⟩, where the first item tells if the specification was falsified or not and the second item x is an n-dimensional parameter point where 1) If the specification is falsified, f(x) < 0.
2) If the maximum number of simulations is reached and the specification is not falsified, x is the point with the minimum positive objective value, f (x) ≥ 0.

B. Line-search Falsification Process
LSF needs three points during the optimization process, and a new point is generated in each iteration. All points have to be inside the parameter range, or on a lower or upper bound.
The LSF method is presented in Algorithm 2. First one point has to be identified, line 1. This point is the middle point x = (l + u)/2 in each of the n dimensions. Also, note that this point could instead be chosen randomly within (l, u). The SUT is then simulated with x as input and the corresponding objective function value f(x) is computed. This is handled by the call to Eval(x), which returns f(x). If f(x) < 0, then the specification is falsified and the algorithm terminates. Otherwise, if curr simulations is larger than one, the heuristic H 1 is called, lines 7-9. When curr simulations is not larger than 1, H 1 is not called, as will be discussed later. Then, SelectPoints(x) is called, line 10.
1) SelectPoints: The three points, x M , x L , and x R , are defined and generated in the SelectPoints(x) function. SelectPoints(x) picks a random line that goes through x and computes the end-points x L and x R , lines 23-24. There are four options for how to pick these end-points, which we now describe. The Eval(x) function, lines 15-21, memoizes simulations in a map of (x, f(x)) evaluations: if x is not in the evaluations map, the SUT is simulated with x as input, f(x) is computed using the quantitative semantics, the pair (x, f(x)) is stored in the map, and curr simulations is incremented; otherwise, the value f(x) of x is simply looked up in the evaluations map.
Option 1: x R is a point where one of the dimensions is on the boundary of l or u.
Option 2: x R is a point where half of the dimensions are on the boundary of l or u.
Option 3: x R is a corner point.
Option 4: x M = x; x R and x L are points where at least one of the dimensions is on the boundary of l or u, or can be corner points.
Formally, let z+ k ∈ R >0 , 1 ≤ k ≤ n, be the smallest positive value such that the point on the chosen line at distance z+ k from x lies on the boundary in at least k distinct dimensions, and let z− k ∈ R <0 be the corresponding smallest-magnitude negative value. In the sequel, the end-points defined by z+ 1 give Option 1, z+ n/2 Option 2, and z+ n Option 3. Let z j,k = (z− j , z+ k ), 1 ≤ j, k ≤ n, for Option 4.
For Options 1-3, define the end-points for a given z+ k as x L = x and x R the point on the line at z+ k, with x M lying between them on the line. For Option 4, define the end-points for a given z j,k as x L the point at z− j, x R the point at z+ k, and x M = x. Note that which option to use is predetermined before Algorithm 2 starts, and the same option is used during the entire execution.

Fig. 1 exemplifies these four options in two dimensions, n = 2, i.e., two input parameters with respective lower and upper bounds. We are allowed to take points within the input parameter box (l, u). Thus, the four corner points are (l 1 , l 2 ), (l 1 , u 2 ), (u 1 , l 2 ), and (u 1 , u 2 ). To get a clear understanding of how to choose a line according to Option 4, figures 1c and 1d are given. In Fig. 1c, where k = 1 and j = 1, we work with a line segment where x R is on the upper bound of the first dimension, u 1 , and x L is on the lower bound of the first dimension, l 1 . The line can be a line that passes through x, but it can also be two line segments that connect x to x L and x to x R , as is shown in Fig. 1d, where k = 2 and j = 1. In this graph, x L is on the lower bound of the first dimension, l 1 , and x R is a corner point.

3) Comparison among the four options to choose a line: Which option performs best for a given SUT and specification is heavily application dependent, and thus comparing them has no clear outcome.
In Option 1 and Option 2, we work with lines that cut off at the upper or lower bound in one and half of the dimensions, respectively. In these options, one of the points, x R , is on the bound of the box. In Option 3, one of the points, x R , is on a corner point. On the other hand, in Option 4, x R and x L are either on a bound of the box or on corner points. Option 1 and Option 2 search more points on the boundaries, while Option 3 searches more corner points and towards them if the optimization process guides in that direction. Since lines are chosen randomly, options 1-3 may lead to using short lines that do not guide the process towards falsification or might get stuck for some iterations in a local area. On the other hand, Option 4 works with long lines, extending between the boundaries, which has a higher chance of finding a better point. A comprehensive comparison from the practical perspective based on the evaluated examples in this paper will be given in Section VII.
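One way to realize the end-point computation for Options 1-3 is to step along a random direction until at least k dimensions hit a bound, clipping the result to the box. This is a sketch under that interpretation, not the paper's exact formulas.

```python
def boundary_scalars(x, d, l, u):
    """For each dimension, the positive step t at which x + t*d
    reaches its lower or upper bound (infinity if d_i == 0)."""
    ts = []
    for xi, di, li, ui in zip(x, d, l, u):
        if di > 0:
            ts.append((ui - xi) / di)
        elif di < 0:
            ts.append((li - xi) / di)
        else:
            ts.append(float("inf"))
    return ts

def end_point(x, d, l, u, k):
    """x_R for the k-th option: step until at least k dimensions are
    on a bound (z_k is the k-th smallest boundary scalar), then clip
    so the point stays inside the box (l, u)."""
    z = sorted(boundary_scalars(x, d, l, u))[k - 1]
    return [min(max(xi + z * di, li), ui)
            for xi, di, li, ui in zip(x, d, l, u)]

# Midpoint of the unit box, direction (1, 0.5): one bound vs. a corner.
xR1 = end_point([0.5, 0.5], [1.0, 0.5], [0, 0], [1, 1], k=1)  # Option 1
xR2 = end_point([0.5, 0.5], [1.0, 0.5], [0, 0], [1, 1], k=2)  # Option n
```

With k = n and clipping, the end-point lands on a corner of the box, matching the intuition behind Option 3.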
Admittedly, there are many possible ways to pick the lines. How best to choose a line is an open question for future research.
4) Algorithm's main loop in FalsifyLine: Now it is time to perform the falsification process over a line by calling FalsifyLine, line 12. Before calling this function, we need to save the current point x in a variable x old, line 11. FalsifyLine tries to find a new point with a lower objective function value, the ultimate goal being to find a point with a negative objective value.
The FalsifyLine algorithm is the main loop of the LSF approach. The SUT is simulated and evaluated at the three parameter points x M , x L , and x R by calling the Eval function, lines 32-34. If the specification is falsified, i.e., the objective function value is negative, the algorithm terminates. Otherwise, if the specification is not falsified at any of the points x L , x M , or x R , the algorithm starts searching for new points.
In this algorithm, we work with the chosen line and the three points generated by the SelectPoints function inside FalsifyLine. A counter, iterations without improvement, counts the number of iterations that we stay with a line inside FalsifyLine, until iterations without improvement reaches max iterations.
There are three cases, depending on which of the points x M , x L , and x R has the smallest objective function value. In each case, the corresponding part of the line is searched, and one of the three old points is replaced with the new point.
If we can improve the point x and find a better point with a lower objective function value, iterations without improvement is reset to zero, line 67. Otherwise, iterations without improvement is increased by 1, line 69. When iterations without improvement reaches max iterations, we leave the currently chosen line and pick a new one.
When iterations without improvement has reached max iterations, the FalsifyLine process finishes and returns a value for x. Now we are back at line 3. If the specification is not falsified yet, curr simulation has not reached max simulations, and curr simulation > 1, then Heuristic H1 is called in line 8.

5) Heuristic H 1 : In the function Heuristic H 1 , we compare the objective function value of x old with that of x. If f(x old ) < f(x), it means that FalsifyLine could not find a point with a lower objective function value than x old . Hence, to force the algorithm to never return the same x as was given to FalsifyLine, we set x to the second point with the lowest objective function value from the last call to FalsifyLine. This heuristic helps the algorithm avoid getting stuck in a local minimum. For this purpose, we first save x old = x in line 11. This procedure is presented in Heuristic H 1 . It should be mentioned here that we did not present Heuristic H 1 earlier because it cannot be applied until the FalsifyLine function has been called at least once.
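Heuristic H 1 can be sketched as follows, assuming access to the points evaluated by the last call to FalsifyLine; the function signature is illustrative.

```python
def heuristic_h1(x_old, x, f, evaluated_points):
    """Heuristic H1 sketch: if FalsifyLine returned no improvement
    over x_old, continue from the second-best evaluated point
    instead of repeating the same x."""
    if f(x_old) < f(x):
        # No improvement: take the point with the second-lowest value.
        ranked = sorted(evaluated_points, key=f)
        return ranked[1] if len(ranked) > 1 else ranked[0]
    return x

# Toy 1-D objective; x_old was already better than the returned x.
pts = [[0.1], [0.4], [0.9]]
f = lambda p: p[0]
new_x = heuristic_h1([0.1], [0.9], f, pts)   # second-best point is [0.4]
```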
The algorithm runs until a falsifying point is found, i.e., a point with a negative objective function value, or until curr simulation reaches max simulations. If no falsifying point is found, the algorithm returns the point with the lowest objective function value.
The next section describes the benchmark problems that will be used to evaluate the suggested methods.

VI. Benchmark Problems
In this paper, we consider the simulation-based falsification for all benchmark examples of the ARCH workshop [19] and some additional benchmark problems [20]. The paper [19] describes a friendly competition in the ARCH19 workshop for the falsification of temporal logic specifications over CPSs. Table I demonstrates the STL specifications for all the example models. These examples are briefly introduced in the following.
For more details about the two variants of input signals, see [19].

1) Automatic Transmission (AT):
The controller of this model selects a gear 1 to 4, depending on inputs throttle and brake; rotations per minute (ω); and car speed (v) [21]. Two different control inputs are considered for this example: • Instance 1: 0 ≤ throttle ≤ 100 and 0 ≤ brake ≤ 325, where both can be active at the same time. • Instance 2: Constrained input signals with discontinuities at most every 5 time units.

2) Chasing Cars (CC):
The model is a simple model of an automatic chasing car [22]. The CC model consists of five cars, where the first car is driven by inputs (throttle and brake), and the other four cars are driven by Hu et al.'s algorithm. The location of the five cars y 1 , y 2 , y 3 , y 4 , y 5 is the system output. The inputs are: • Instance 1: The input specifications allow any piece-wise continuous signals. • Instance 2: The input specifications constrain inputs to piece-wise constant signals with control points for every 5 seconds, 20 segments.

3) Fuel Control of an Automotive Power Train (AFC):
This system is modeled on [23]. The constrained input signal fixes the throttle θ to be piece-wise constant with 10 uniform segments over a time horizon of 0 with two modes and the engine speed ω to be constant with 900 ≤ ω < 1100.

4) Neural Network Controller (NN):
This benchmark is based on the neural network of the NARMA-L2 [24] controller. It is designed for a system that levitates a magnet above an electromagnet at a reference position. A reference value Ref for the position, where 1 ≤ Ref ≤ 3, is the only input of this model. The current position of the levitating magnet Pos is the output. The two variants of inputs considered for this model are: • Instance 1: The input specification requires discontinuities to be at least 3 time-units long. • Instance 2: An input signal with exactly three constant segments is required.

5) Aircraft Ground Collision Avoidance System (F16):
The F16 aircraft and its inner-loop controller are modeled for ground collision avoidance. 16 continuous variables with piecewise nonlinear differential equations are modeled [25]. The system is required to always avoid hitting the ground during its maneuvers, starting from all the initial conditions for roll, pitch, and yaw in the range [0.2π, 0.2833π] × [−0.54π, −0.5π] × [0.25π, 0.375π].
6) Steam Condenser with Recurrent Neural Network Controller (SC): This is a dynamic model of a steam condenser based on energy balance and cooling water mass balance, controlled with a Recurrent Neural network in feedback [26]. The input can vary in the range [3.99, 4.01], and the input signal should be piece-wise constant with 20 evenly spaced segments.

7) Wind Turbine (WT):
A simplified wind turbine model from [27] is considered. The input is the wind speed v, while the outputs are blade pitch angle θ, generator torque M g,d , rotor speed Ω, and demanded blade pitch angle θ d . The wind speed is constrained by 8.0 ≤ v ≤ 16.0.

1) Automatic Transmission (AT ):
The inputs to the model are the throttle and brake of a vehicle. The outputs of the model are the vehicle speed v, the engine speed ω, and the gear, see [28] for details. This example has different specifications from the ARCH example presented in Section VI-A1.
2) Third Order ∆ − Σ Modulator: The third order ∆ − Σ modulator is a model of a technique for analog-to-digital conversion. It has one input U, three states x 1 , x 2 , x 3 , and three initial conditions x init 1 , x init 2 , x init 3 ; see [29] for details.

3) Static Switched (SS) System: The static switched system is a model without any dynamics that is included as a simple case. The model is inspired by [30].

VII. Experimental Setup and Results
In this section, the Corners and UR methods are compared with the HCR approach. The LSF method is compared with NM, SNOBFIT, and also HCR. Finally, all methods are compared together. The two semantics Max and Additive are considered in this paper. The results are shown in Tables II to XI. Each falsification run is set to have max simulations = 1000. There are 20 falsification runs for each method and objective function, to account for most algorithms' random nature.
For Corners, UR, and HCR, it does not matter which semantics is used; thus, they are evaluated with only one semantics.
1) Corners Implementation Setup: Each specification and example has a different number of corners. For some examples, the number of corners is less than the maximum number of simulations. Then, the Corners method terminates when all corners have been evaluated.
2) Uniform-Random Implementation Setup: For each of 20 falsification runs, uniformly distributed random points are generated from different seeds, i.e., 20 different seeds are considered.
3) Hybrid Corner-Random Implementation Setup: The HCR method starts with a Corners point, and the next point is a UR point. It alternates between Corners and UR points until the maximum number of simulations, here 1000, is reached or a falsifying point is found. The number of corners is limited and depends on how many corner points the SUT has; once all corners have been evaluated, the HCR algorithm continues using only UR points.

4) Nelder-Mead Implementation Setup: This algorithm is implemented as fminsearch [31] in MATLAB. Before starting NM, 100 random sampling points are generated for each example. After evaluating these 100 points, the point with the lowest objective value is picked as the starting point if none of them falsifies the specification. NM needs a simplex of n + 1 points for n-dimensional vectors, while we have only one point; fminsearch generates the remaining n points of the simplex from this minimum point. The minimum point plays the role of x0 in the MATLAB call fminsearch(f, x0), where the search starts at the point x0 and tries to find a local minimum.
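The HCR alternation can be sketched as follows (a minimal Python sketch, assuming a scalar robustness objective where a negative value means the specification is falsified; all names are illustrative):

```python
import random
from itertools import product

def hcr_falsify(objective, bounds, max_simulations=1000, seed=0):
    """Hybrid Corner-Random: alternate between corner points and
    uniform-random points until a falsifying point is found or the
    simulation budget is spent. Returns the falsifying point or None."""
    rng = random.Random(seed)
    corners = [list(c) for c in product(*bounds)]  # all 2^n corner points
    for k in range(max_simulations):
        if k % 2 == 0 and corners:
            x = corners.pop(0)  # next unused corner point
        else:
            # uniform-random point; also used once corners are exhausted
            x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if objective(x) < 0:  # negative robustness => falsified
            return x
    return None
```

For example, with objective(x) = 10 − x[0] and bounds [(8.0, 16.0)], falsification occurs within the first few simulations (either a random point above 10 or the corner [16.0]).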
Two parameter values of fminsearch are relevant here. The first is the maximum number of iterations, whose default is 200 × n. The second is the maximum number of function evaluations, which is set to 150 in this paper.
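The sampling-and-restart strategy around NM can be sketched as follows (a minimal Python sketch; the local_search hook stands in for MATLAB's fminsearch, and all names except the 100-sample count are illustrative):

```python
import random

def nm_style_falsify(objective, bounds, local_search, n_samples=100,
                     max_simulations=1000, seed=0):
    """Sample random points, start a local search (e.g. Nelder-Mead)
    from the best one, and restart with fresh samples if the local
    search exits without falsifying. Negative objective => falsified."""
    rng = random.Random(seed)
    simulations = 0
    while simulations < max_simulations:
        # 100 uniformly random candidate starting points
        samples = [[rng.uniform(lo, hi) for lo, hi in bounds]
                   for _ in range(n_samples)]
        simulations += n_samples
        best = min(samples, key=objective)
        if objective(best) < 0:       # a random sample already falsifies
            return best
        x, used = local_search(objective, best)  # fminsearch-like step
        simulations += used
        if x is not None and objective(x) < 0:
            return x                  # the local search falsified
    return None                       # budget spent without falsifying
```

The restart on local-search failure mirrors the regeneration of 100 new random points described below; a real implementation would plug in an actual Nelder-Mead routine as local_search.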
Unless falsification occurs, NM exits when one of the following conditions holds: • The maximum number of iterations has been reached. • The maximum number of function evaluations has been reached. • The change in the objective value or in the simplex size is below a given tolerance.
If one of the above conditions is fulfilled, 100 new random points are generated, and the NM process restarts.

5) Line-search Falsification Implementation Setup: The value of max iterations is constant and equal to 3.
6) Tables Setup: The first column denotes the specifications and the optimization-based and optimization-free methods; the next columns show the Max and Additive semantics. For each method and semantics, two values are presented. The first value is the success rate of falsification in percent; since there are 20 falsification runs for each parameter value and objective function, the success rate is a multiple of 5%. The second value, inside parentheses, is the average number of simulations per successful falsification. The LSF results presented in the tables are those for Option 4.

A. Hybrid Corner-Random vs. Corners and Uniform-Random
First, the Corners, UR, and HCR methods are compared. According to Tables II–XI, for many specifications, at least one of Corners and UR works well. We now go through all examples in detail.
The AT example, Instance 1 in Table II, is one of the examples where, for all specifications, one of Corners and UR, or even both, successfully falsify the specification with a 100% success rate. ϕ AT 1 is falsified with Corners. Both Corners and UR work well for ϕ AT 2 . For the other specifications, UR works well. Thus, with only one of Corners or UR, we cannot get a falsifying point for all specifications, but the HCR method is fully successful (100%) for all specifications of this example. As can be seen, for Instance 2 of the AT example, the specifications ϕ AT 1 , ϕ AT 7 , ϕ AT 8 , and ϕ AT 9 are hard to falsify using Corners and UR. Thus, the HCR method does not perform as well; in this situation, optimization methods are needed, as will be discussed later. On the other hand, the specifications ϕ AT i for i = 2, . . . , 6 are falsified by both Corners and UR, so HCR is also good for these specifications.
In the CC example, for both Instances 1 and 2, Table III, all specifications can be falsified easily, except ϕ CC 4 , for which an optimization method is needed. Also, for ϕ CC 5 of Instance 1, UR beats Corners, and thus HCR improves on Corners.
In the AFC example, Table IV, the Corners method is the best method for both specifications, requiring fewer simulations than UR. HCR thus improves on UR here.
In the NN example, Table V, for Instance 1 of specification ϕ NN 1 , HCR improves the falsification and beats the Corners method. For Instance 2 of this specification, both Corners and UR work well, and thus so does HCR. On the other hand, for Instance 1 of ϕ NN 2 , none of the optimization-free methods can falsify. For Instance 2 of this specification, Corners is successful, and thus HCR works well.
For the F16 example, Table VI, using an optimization-based approach improves the falsification process. UR falsifies in only one out of 20 runs; in this case, it is not a good method and does not help HCR to be more efficient. In this example, the number of corners is 8, so most iterations of HCR are done using random points.
The SC example, Table VII, is one example where it does not matter which semantics or optimization method is used, since none of them can falsify the specification. [26] used a Simulated Annealing global search in combination with an optimal control based local search on the infinite-dimensional input space to successfully falsify this specification.
The WT example, Table VIII, is falsified easily by both Corners and UR. Compared to the first three specifications, ϕ WT 4 is the hardest and needs more simulations, but Corners, UR, and HCR are still all successful in falsifying it.
For the AT example, Table IX, all specifications ϕ AT i for i = 1, . . . , 5, 8 could be falsified using Corners, while UR failed for ϕ AT 1 (T = 30), (T = 40) and ϕ AT 4 ; thus, HCR is successful here. On the other hand, UR performs better than Corners and HCR for ϕ AT 6 (T = 12) and ϕ AT 7 , where HCR is more successful than Corners. Only for ϕ AT 6 , both (T = 10) and (T = 12), and ϕ AT 7 is it better to use an optimization-based approach. The modulator example, Tables X and XI, is falsified on the corners, so HCR performs well.
For a summarized comparison between Corners, UR, and HCR, a cactus plot is shown in Fig. 2. The results presented in this plot relate to all examples, even those where optimization methods were required for falsification. As can be seen in Fig. 2, while UR is more successful than Corners, Corners falsifies much faster and with fewer simulations in those cases that are falsified on the corner points. Thus, HCR manages to falsify more examples than either of the other two methods.

B. Hybrid Corner-Random Vs. Optimization Methods
In this section, we discuss the advantages of HCR and why one should first try to falsify with HCR before using any optimization-based method. We also compare HCR with NM, SNOBFIT, and LSF.
For ϕ AT 2 , Table II, NM is not 100 percent successful, and SNOBFIT and LSF need more simulations than Corners and HCR, while HCR manages to falsify the property with just a few simulations. Similarly, as can be seen for Instance 2, ϕ CC 2 , in Table III, and in Tables IX and XI, NM, SNOBFIT, and LSF need more simulations than Corners and HCR.
To compare Corners, UR, and HCR with NM, SNOBFIT, and LSF, a cactus plot is shown in Fig. 3 for those examples where an optimization-based approach is not needed to improve the falsification process. As can be seen in this figure, HCR beats all the other methods and manages to falsify more examples than any of the optimization methods.
As can be seen in Fig. 3, HCR also requires fewer simulations than all the other methods. As a result, HCR is a viable method for falsifying those examples where the optimization-based methods do not perform well. Since HCR can falsify many examples with relatively few simulations, and hence in a short time, it is useful to first try HCR on any example, and only if it does not manage to falsify the property, use an optimization-based approach to improve the falsification process.

C. Line-search Falsification using the four options

Section V introduces four different options for LSF to pick a line. A cactus plot comparing these options is given in Fig. 4. Option 1, where we work with the lines that cut off at the [...] more successful in the falsification process because it uses the combination of the three other options. Hence, in the tables, we only quote the results using Option 4 for LSF.
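The idea of optimizing over line segments through the parameter space can be illustrated with the following sketch, where the line runs from the current point to a randomly chosen corner (this particular line choice is only an illustration and does not reproduce any of the four options; a negative objective value again means the specification is falsified):

```python
import random

def line_search_falsify(objective, bounds, max_iterations=3,
                        samples_per_line=20, seed=0):
    """Repeatedly minimize the objective over a line segment through
    the current point in the n-dimensional parameter space."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in bounds]  # random start point
    for _ in range(max_iterations):
        corner = [rng.choice((lo, hi)) for lo, hi in bounds]
        best, best_val = x, objective(x)
        # sample the segment x + t * (corner - x) for t in (0, 1]
        for i in range(1, samples_per_line + 1):
            t = i / samples_per_line
            p = [xi + t * (ci - xi) for xi, ci in zip(x, corner)]
            val = objective(p)
            if val < best_val:
                best, best_val = p, val
        x = best                 # continue from the best point on the line
        if best_val < 0:         # negative robustness => falsified
            return x
    return None
```

In the actual LSF method, the line through the current point is chosen according to one of the four options of Section V rather than toward a random corner; the sketch only conveys the segment-sampling structure.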

D. Line-search Falsification vs. Nelder-Mead and SNOBFIT
This part compares the results of the new LSF method with NM and SNOBFIT for those specifications and examples that cannot be falsified by Corners and UR. Most of the available benchmark problems in this paper are easily falsified using the Corners or UR methods; only for a subset of the specifications and models are optimization methods necessary. These are specifications 1 and 7–9 of the AT example for Instance 2, Table II; the fourth specification of both Instances 1 and 2 of the CC example, Table III; the second specification of the NN example for Instance 1, Table V; the F16 example, Table VI; specification 6 for both T = 10 and T = 12 and specification 7 of AT , Table IX; and the first specification of the Modulator example, Table X. For some specifications and systems, there is a clear tendency for a specific optimization method and semantics to perform better. Instance 2 of the AT example, Table II, is an excellent example of the performance of LSF. Except for ϕ AT 1 , which can be falsified using a method not considered in this paper [19], LSF manages to falsify 100% of the other specifications, while NM cannot falsify any of the ϕ AT i with i = 7, 8, 9. There is also a significant advantage for LSF relative to SNOBFIT for these specifications.
In the CC example, Instance 1, Table III, for ϕ CC 4 , NM is not successful in the falsification process, while SNOBFIT and LSF are successful in some runs, with SNOBFIT working better than LSF. In Instance 2 of this example, LSF achieves a 10% success rate, showing that it is possible to falsify this specification, while neither NM nor SNOBFIT is successful.
In the NN example, for Instance 1 of specification ϕ NN 2 (Table V), LSF and SNOBFIT perform similarly, and both outperform NM. The same holds for the F16 example, Table VI.
In the AT example, Table IX, only for the two specifications ϕ AT 6 (T = 10) and (T = 12) does NM perform better than the other two methods. Compared to SNOBFIT, which is not successful for ϕ AT 6 (T = 12), LSF performs better and manages to falsify this specification in some runs. ϕ AT 7 is the only specification of this example where SNOBFIT works better than LSF.
In the ∆ − Σ Modulator example, Table X, LSF manages to falsify in 100% of the runs, which is a slight improvement over SNOBFIT and a big improvement over NM.
An aggregated comparison is shown in Figure 5. As can be seen, and as is evident from the discussion above, LSF outperforms NM. LSF can falsify using fewer simulations, which is important since simulations are costly. LSF also works better than SNOBFIT while being simpler. Comparing the semantics, Additive works slightly better than Max.

VIII. Conclusions
In this paper, two methods have been proposed to enhance the falsification of CPSs. The first method, Hybrid Corner-Random, is an optimization-free combination of Corners and Uniform-Random. For systems that are falsified on the parameter corners or by randomly chosen parameters, this method manages to falsify with the fewest simulations. On the other hand, there are specifications and models for which neither Corners nor UR is efficient, and then optimization-based methods are needed.
The second proposed method is a gradient-free optimization-based method, called Line-search Falsification, that optimizes over line segments through a vector of inputs in the n-dimensional parameter space. This method has been compared to the NM and SNOBFIT methods and evaluated on standard benchmark problems. An aggregated comparison among all methods presented and discussed in this paper, for all examples, is shown in Figure 6. As can be seen, HCR not only manages to falsify more examples than Corners and UR but also has a better performance than NM.

A key strength of using model-based development methods is that the models can be used for various purposes, including testing. Testing benefits from running a large number of simulations. If only software models are needed to run the simulations, testing can take advantage of computing clusters to run the models in parallel, using thousands of computing nodes. Since, in industry, it is possible to run the different methods in parallel on a computing cluster, many more examples can be run in a shorter time using optimization-free methods, thus raising the confidence in the correctness of the SUT.
For future work, it would be interesting to evaluate the suggested LSF method on industrial applications, as well as to extend the set of benchmark problems.