Local Virtual Times Analysis in PCS Model

Abstract: Efficient scalability and process synchronization are critical for achieving high performance in distributed computing environments. Scalability is usually analyzed using intensive case studies, which give an answer only for a particular set of model parameters. We found an efficient way to analyze the time evolution in models simulated with the Parallel Discrete Event Simulation (PDES) approach. The essential feature of PDES is the concept of local virtual time (LVT) associated with the evolution of each process of the model. The LVTs of the processes evolve during the simulation and form a complicated profile. These profiles resemble the profiles of surface growth in physical devices. In physics, researchers use the concept of universality, which helps to divide the different regimes of surface growth into classes; each class is described by universal laws and does not depend on the details of the model. We demonstrate the applicability of this concept and present a model of LVT profile evolution in the Personal Communication Service (PCS) model. The PCS network consists of a square grid of radio ports that serve users in their zone (cell). We build the LVT-PCS model, which describes the evolution of the LVT profile associated with the PCS model. We simulate the PCS model using the ROSS simulator (optimistic PDES) and compare the results with those simulated by our LVT-PCS model. We found that the profile demonstrates a property known in physics as the roughening transition. We estimate the values of the “critical” exponents for the two models, which seem to belong to the same universality class. We believe that the similarity we found can be helpful for the preliminary analysis of model scalability, process desynchronization, and possible deadlocks.


I. INTRODUCTION
Parallel discrete event simulation (PDES) is a promising approach for the simulation of extreme-scale systems with a wide range of applications. Developing a new parallel simulation system or building it on top of an existing system is challenging due to the large set of parameters that affect the resulting performance. To achieve high parallel simulation performance, developers must solve various fundamental problems: synchronization, load balancing, error recovery, and network overhead. Moreover, PDES performance depends on the simulation environment, such as the programming language, computer platform and infrastructure, processor type, and other characteristics. All these factors are interrelated, which makes performance prediction almost impossible. Each particular system needs deep analysis and proper parameter selection before its large-scale simulation.
To gain more insight into the behavior of a simulation, developers commonly use tracing. Tracing records events of interest during program execution and stores the history for later analysis. The method provides detailed information about a particular system running in a specific environment. However, tracing is not free from drawbacks. Firstly, tracing is memory-consuming: each event must be recorded and analyzed, and a typical simulation produces terabytes of trace data. Secondly, tracing events perturbs the simulation program and significantly slows down the simulation. In other words, tracing makes it hard to achieve an unbiased observation of the execution.
There are dozens of studies dedicated to enhancing and optimizing tools and simulation techniques [1], [2], performance analysis, and the description of visualization tools [3], [4], [5], [6], [7]. Testing tools with various sets of parameters and comparing them is time-consuming and expensive.
In this paper, we present an approach that may predict the performance of a PDES system under given circumstances without the direct investigation of the model behavior using the above-mentioned techniques. The approach is sometimes referred to as "simulation of simulation", or meta-simulation. We describe a general model of local virtual time evolution (LVT model) in optimistically synchronized parallel discrete-event simulation, capturing the essential properties of the actual simulation models.
For a given PDES model, the LVT model may predict possible deadlocks, the degree of scalability, and the degree of desynchronization between parallel processes. It is important to stress that our approach provides information complementary to other performance analysis techniques rather than a very detailed view of the model under consideration.
The method of performance analysis using the LVT profile has several advantages: 1) the LVT model describes a pattern of LVT profile growth in a conservative or optimistic synchronization algorithm on a given communication topology, which makes the model independent of the particular implementation of the algorithms and of the simulation environment; 2) it makes it possible to establish functional dependencies between the parameters of the model, such as the means of the distributions, the number of logical processes (LPs), and the network topology, and the resulting efficiency of the computations. For typical PDES models, the theoretical predictions for the limit of an infinite number of LPs are already reached at sizes of $10^3$-$10^4$ LPs.
The LVT models demonstrate similarities with interface growth models in physics. The evolution of the LVT profile in the conservative PDES algorithm belongs to the Kardar-Parisi-Zhang universality class [8], and the profile of the LVT model for the optimistic algorithm belongs to the universality class of directed percolation [9]. We present here the LVT-PCS model, which belongs to the universality class of the random deposition [10] process. The theoretical predictions of such an approach can be highly applicable to a large class of existing models in PDES simulations.
The paper is organized as follows. A short description of the LVT model is given in Section II. Works related to the efficiency analysis of PDES models are reviewed in Section III. In Section IV, we present the validation of the LVT model by comparing simulation results in LVT and the model of Personal Communication Service (PCS) for a widely used PDES simulator named ROSS [11]. Finally, the discussion and conclusion are presented in Section V.

II. LVT PROFILE EVOLUTION
In the PDES framework [12], the model system is simulated as a set of independent logical processes that evolve their states along the time axis. The LPs generate time-stamped events and exchange them with each other. The events are stored in the queues of the receiving LPs in time-stamp order. An LP processes an event and updates its Local Virtual Time (LVT) to the time of the processed event, so that the LVT profile grows during the simulation. In optimistically synchronized models, the time profile not only grows in the time direction but can also go back in time through the rollback mechanism [13].
We use the following assumptions in the model of LVT profile evolution [9]:
• events are Poisson arrivals, so the time between them is exponentially distributed,
• the graph of possible LP communications (dependences) is known in advance,
• the number of rollbacks is given as a simulation parameter.
The simulation starts with a flat profile, $\tau_i = 0$, $i = 0, \ldots, N-1$, where $N$ is the total number of LPs and $\tau_i$ is the local virtual time of the $i$-th LP. Firstly, we increase the LVT of every LP by an exponentially distributed random value $\eta_i$ with unit mean:
$$\tau_i(t+1) = \tau_i(t) + \eta_i, \tag{1}$$
where $t$ is a simulation time step. At this stage, we simulate the profile's growth, assuming that no causality violation has occurred. After this step, we simulate a rollback. The parameter $b$ is the mean rollback depth (number of rollbacks). We draw an exponentially distributed random value $k$ with mean $b$, and then randomly choose $kN$ LPs whose local time will roll back. We simulate rollbacks as follows: if the LVT of an LP is greater than the LVT of one of its neighbors (randomly chosen), then the LVT is reduced to the neighbor's time:
$$\tau_i \leftarrow \min(\tau_i, \tau_r), \tag{2}$$
where $r$ is the index of the neighboring LP.
After each time step we calculate the observables:
1) The average LVT $\bar{\tau}(t)$:
$$\bar{\tau}(t) = \frac{1}{N} \sum_{i=0}^{N-1} \tau_i(t). \tag{3}$$
2) The average speed of the profile $v(t)$:
$$v(t) = \bar{\tau}(t+1) - \bar{\tau}(t). \tag{4}$$
3) The average width of the profile $w^2(t)$:
$$w^2(t) = \frac{1}{N} \sum_{i=0}^{N-1} \left[\tau_i(t) - \bar{\tau}(t)\right]^2. \tag{5}$$
The final results are calculated as the average over $R$ independent runs of the random process, e.g., $\langle v(t) \rangle = \frac{1}{R} \sum_{j=1}^{R} v_j(t)$ and $\langle w^2(t) \rangle = \frac{1}{R} \sum_{j=1}^{R} w_j^2(t)$, where $j$ labels the runs.
In the language of computer science, the average speed of the profile $v$ is associated with the efficiency of the simulation, which is the average load of the parallel processes. The second observable, the average width of the profile $w^2$, reflects the desynchronization between the processing elements.
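The growth and rollback steps described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a one-dimensional ring as the communication graph, picks the LPs for rollback with replacement, applies rollbacks sequentially, and reads the rollback rule as "an LP that is ahead of its randomly chosen neighbor falls back to the neighbor's time". The function name `simulate_lvt` and its arguments are hypothetical.

```python
import random


def simulate_lvt(n_lps=64, b=1.0, steps=200, seed=0):
    """Toy LVT profile evolution; returns (mean speed, mean squared width)."""
    rng = random.Random(seed)
    tau = [0.0] * n_lps                      # flat initial profile
    prev_mean = 0.0
    speeds, widths = [], []
    for _ in range(steps):
        # Growth: every LP advances by an exponential increment with unit mean.
        for i in range(n_lps):
            tau[i] += rng.expovariate(1.0)
        # Rollback: draw k with mean b, then make int(k * N) rollback attempts.
        k = rng.expovariate(1.0 / b) if b > 0 else 0.0
        for _ in range(int(k * n_lps)):
            i = rng.randrange(n_lps)               # assumption: picks with replacement
            r = (i + rng.choice((-1, 1))) % n_lps  # assumption: ring neighbours
            if tau[i] > tau[r]:
                tau[i] = tau[r]                    # fall back to the neighbour's time
        # Observables after the step: average LVT, speed, squared width.
        mean = sum(tau) / n_lps
        speeds.append(mean - prev_mean)
        widths.append(sum((x - mean) ** 2 for x in tau) / n_lps)
        prev_mean = mean
    return sum(speeds) / steps, sum(widths) / steps
```

With `b = 0` no rollbacks occur, so the profile grows by independent unit-mean increments and the measured speed stays close to one. Increasing `b` lowers the speed, because every successful rollback can only reduce an LVT.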
The model allows the analysis of the essential fundamental properties of the PDES algorithm. For example, zero speed of the LVT profile indicates zero utilization of processing time or possible deadlocks. A growing width of the profile shows imperfect synchronization between LPs, which also decreases the efficiency. The larger the difference between the LPs' times, the more rollbacks may occur. A divergent width may also indicate insufficient load balance (some LPs are always ahead of the others). The LVT profile model enables computation of the functional dependency between the number of rollbacks and the parameters of the communication topology between LPs. In [14], it was shown that additional rare long-range communications between LPs in the conservative PDES scheme significantly decrease the desynchronization.
As mentioned above, a distinctive feature of the model is its similarity to the models of growing interfaces in physics. Interestingly, models with entirely different rules form so-called universality classes, which are characterized by a set of universal critical exponents. These exponents describe the model behavior near the critical point. The models of LVT profiles are known to belong to such universality classes [15], [16]. One may suggest that most variations of the Time Warp algorithm will not change the universality behavior of the systems.

III. RELATED WORK
Various approaches have been employed to study the performance of parallel discrete event simulations or other HPC systems. All of them may be classified into three groups: 1) performance analysis based on tracing and visualization, 2) performance analysis of modified PDES algorithms (e.g., GVT computation, optimism control, cancellation strategy), 3) meta-simulation and analytical models.

A. Performance analysis based on tracing and visualization
One approach to the analysis of parallel program performance is to analyze execution traces and provide visualization of collected log files. Various tools have been developed to perform such analyses. Usually, they support major parallel programming methodologies, not only PDES programs. Examples of such tools are VAMPIR [3], Jumpshot [17], TAU [18], Scalasca [5], and EXPERT [19]. They may build state diagrams, activity charts, timeline displays, and other statistics from a given trace file. Some of the tools, e.g., Jumpshot and TAU, may generate event traces that can be displayed with other special tools.
Some software, such as Projections [6] or the special visualization tool for ROSS [7], [11], is designed specifically for PDES applications. These tools provide an additional view of the PDES program's execution, which may be compared with the capabilities of our LVT model. For example, one of the capabilities of Projections is to draw real- and virtual-time active LPs (see Figs. 12-15 in [6]). Our model also measures the number of active LPs, assuming that an active LP is a process that increases its local virtual time at a given step of the simulation. In [7], Figures 7 and 8 show the difference between virtual time and GVT for the best and worst efficiency configurations. These pictures are similar to the LVT profile in our model. Moreover, the width of the profile (or the spread around the mean value) is larger for the inefficient configuration and negligible for the best efficiency configuration. This illustrates that the observables we study in our models are of interest to PDES developers.
Most of the described tools collect event statistics in terms of real time. As an alternative, Ravel traces an event history transferred into logical time, inferred from happened-before relationships [4]. Such an approach leads to a much more scalable representation of PDES applications without loss of communication and dependency relationships between processes. Our approach, however, is closer to real-time tracing than to logical tracing, even though it simulates the growth of local virtual (logical) times.
Some tools focus on particular aspects of parallel programs, such as memory consumption or load balancing. ScalaMemTrace collects memory traces in order to detect memory inefficiencies [20]. Such analysis is also crucial for PDES efficiency. However, we do not consider any questions regarding memory, load balancing, and robustness in our model.
We refer readers to a review [21] for more information on the visualization tools. As stated in that paper, one of the challenges of visualization approaches is scalability. The scale of collected performance data grows exponentially, which makes creating scalable performance visualization both essential and challenging. Our LVT model, in contrast, aims to provide a scaled view of the growth pattern of LVTs, and it does not require large memory consumption.

B. Modification of PDES algorithms
Another group of studies aimed at increasing the performance of PDES computations focuses on developing new mechanisms and modifications of the original Time Warp algorithm. The implementation of such aspects as GVT computation, cancellation strategy, and optimism control may differ between simulators.
Optimism control is an approach that adds restrictions to message communication or event scheduling to reduce the number of rollback events (using a simulated time window, limiting the number of events each LP may execute beyond GVT, or sending only guaranteed correct messages). In [22], information about future incoming events is collected via additional communication between neighboring LPs (which restricts the natural parallelization) and then used for better synchronization. Other examples of control mechanisms are the Moving Time Window [23], Breathing Time Window [24], Probabilistic Cost Expectation Function Protocol [25], PADOC (Probabilistic Adaptive Direct Optimism Control) [26], and the Switch Time Warp mechanism [27], [28]. In addition to the time window, it is possible to use speculative computations. For example, in [29], nodes are allowed to speculatively execute future events while in the "idle" phase. When the synchronization is complete, the speculative computation is tested for correctness and either committed or discarded.
In many PDES models, only a tiny subset of simulation threads is active at a given period. This fact is exploited in Demand-Driven PDES [30]. The key idea behind DD-PDES is to identify threads that have no events to process and exclude them from the CPU and GVT computation until they receive a special message. The DD-PDES mechanism is orchestrated by a dedicated "controller" thread.
Another aspect of Time Warp implementation is related to the computation of Global Virtual Time. The computation of GVT is an expensive operation because the LPs usually have to stop the simulation while the LVTs are collected. With the increasing number of shared-memory multicore machines, various non-blocking algorithms have become popular. For instance, in a multi-threaded PDES environment, it is possible to compute GVT without blocking operations [31].
Most of the PDES modifications can be mapped onto the LVT model for the optimistic algorithm. However, some mechanisms may significantly affect the appearance of the LVT profile. We do not analyze, for example, the possibility of mapping the PDES modification reported in [32], where the authors developed an optimism control technique that rolls back all processes to GVT at stochastically selected real-time intervals.

C. Meta-simulation and analytical models
Meta-simulation is a performance prediction approach based on the simulation of a PDES execution, or "simulation of simulation". This method provides a performance prediction without the need to develop and run the actual parallel discrete-event simulation. Meta-simulation allows implementing different PDES algorithms in a common environment to avoid time-consuming testing in a realistic simulation environment. Thus, such factors as a programming language, network structure, or model properties are significant. On the other hand, as meta-simulation uses many abstractions (e.g., negligible rollback cost, zero-delay communication), the results of performance predictions may differ from real ones.
For example, SimSim [33] is a sequential simulator for parallel and distributed simulation systems. In SimSim, a real simulation model is executed on a set of modeled processors connected via a modeled network. Then the events are executed as usual (considering the order of the size of the messages). The resulting execution time is then scaled by some factor for the baseline system. A similar technique is used in the Scalable Simulation Framework (SSF) [34], [35] and in special performance analyzers [36], [37].
There also exist purely theoretical ways to analyze PDES performance, for example, Markov models. In [38], a Markov model of the Time Warp algorithm was used to estimate such performance measures as the fraction of time the processors remain idle, the expected length of a rollback, the expected number of processed uncommitted events, the expected number of processed events above the GVT, the effective message density, and the probability of rollback. The model results are in good agreement with PHOLD simulation; however, the model is hardly adaptable to a particular simulated system.
Another interesting work worth mentioning here is reported in [39]. The goal of the work was to estimate the influence of the communication topology, the lookahead, and the computation and communication delays on the simulation performance using an analytical model of the null-message conservative algorithm. The performance metric was defined as the ratio between the simulation end time and the total run time for all LPs. The approach is similar to the one we use in this paper because it focuses on estimating the speed of the LPs' progress.
Our LVT model lies somewhere between the meta-simulation and theoretical categories because it does not simulate the whole PDES execution but instead specifies the local virtual time growth in particular PDES synchronization algorithms, based on a set of theoretical assumptions.

IV. VALIDATION OF LVT MODEL
This section describes how to establish a relationship between the LVT model and other PDES models. We use the Personal Communication Service network model (PCS) [40], [41], [42] as an example.

A. PCS network model
A personal communication service network is a network of distributed radio ports, each having a set of radio channels and users. The users send and receive phone calls by using these radio channels. When a user moves from one cell to another during a phone call, the network attempts to reallocate the call to the new cell (a port coverage area). If all channels in the new cell are busy, the phone call terminates.
The PCS model is implemented on the ROSS simulator [11]. Let us describe the system in the language of logical processes and events. In this model, logical processes simulate the work of the radio ports (one LP = one radio port). Logical processes are located on the square lattice (Figure 1). Four different types of events are implemented in the system:
1) NextCall: a call arrival at a cell;
2) CompletionCall: a completed call at a cell;
3) MoveOut: a call moving out of the current cell (remote event);
4) MoveIn: the arrival of a hand-off call at a cell.
In PCS, the time between events is exponentially distributed. The following parameters set the means of the distributions:
• MOVE_CALL_MEAN: the mean time between MoveOut events
• NEXT_CALL_MEAN: the mean time between NextCall events
• CALL_TIME_MEAN: the mean call duration.
In the PCS model, most events are processed locally by the LPs, but some events, e.g., those of type MoveOut, are generated by LPs for other LPs. Such events, traversing between LPs, are called remote events. The percentage of remote events is available in the output statistics. We denote this percentage by $p'$. In other words, the LPs are located on the square lattice with periodic boundary conditions, and the interaction between the closest LPs occurs with the probability $p'$.
Besides the fraction of remote events $p'$, the ROSS statistics provide such data as the total number of events processed, the number of rolled-back events, and the event rate. Moreover, to compare the behavior of the LVTs precisely, it is possible to trace the LVTs during the simulation.
It is important to note that the statistics of the computations are collected not by each LP but by specially introduced Kernel Processes (KPs), which aggregate a group of LPs in order to process the event list for those LPs as a single list and to make the GVT computation and fossil collection more effective [11].

B. Parameters mapping
In the LVT model, we have two major parameters: the growth rate $q$ and the parameter of communication topology $p$, which regulates the number of interprocess communications (remote events). The growth rate $q$ is defined by the formula
$$q = \frac{1}{1+b}, \tag{9}$$
where $b$ is the average "length" of rollbacks, provided that the average increase of the profile is equal to unity. The parameter $q$ may be interpreted as the number of processed events divided by the sum of processed and rolled-back events.
In the PCS network model, most of the events are processed locally on the LP that generated them, but sometimes LPs may send events to the neighboring cells. Such remote interactions between LPs occur naturally when events of type MoveIn or MoveOut are generated. The percentage of remote events is not a parameter of the system but a statistical output of the simulation. We denote it by $p'$ to emphasize the connection with the parameter $p$ of the LVT model. The fraction of remote events $p'$ depends on the number of MoveOut events, and it is inversely related to the parameter MOVE_CALL_MEAN.
As the analogue of the growth rate $q$, in the PCS network model we define the number $q'$:
$$q' = 1 - \frac{\text{number of rollbacks}}{\text{total number of processed events}}. \tag{10}$$
Both $q'$ and $p'$ are interrelated variables that depend on the parameter MOVE_CALL_MEAN.
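As a small worked illustration of the parameter mapping, the sketch below computes $q'$ from two event counters according to formula (10) and converts a growth rate into a mean rollback depth. The conversion assumes the relation $q = 1/(1+b)$, which matches the interpretation of $q$ as processed events divided by processed plus rolled-back events. The function names and the example counters are hypothetical and are not taken from the ROSS output format.

```python
def pcs_growth_rate(processed_events, rollbacks):
    """q' = 1 - rollbacks / processed events, as in formula (10)."""
    if processed_events <= 0:
        raise ValueError("processed_events must be positive")
    return 1.0 - rollbacks / processed_events


def rollback_depth(q):
    """Mean rollback depth b, assuming q = 1 / (1 + b)."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return 1.0 / q - 1.0


# Made-up counters, for illustration only.
q_prime = pcs_growth_rate(processed_events=1000, rollbacks=250)
b = rollback_depth(q_prime)
```

For these illustrative counters, a quarter of the processed events were rolled back, so $q' = 0.75$ and the corresponding LVT model would be run with $b = 1/3$.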

C. LVT-PCS model
We define the LVT-PCS model as a particular case of the LVT model described in Section II. The central point in mapping the PCS model to the LVT-PCS model is identifying the parameters $p$, $q$ necessary for the LVT-PCS model simulation with the parameters $p'$, $q'$ extracted from the PCS model simulation.
The protocol of the simulations is as follows:
1) simulate the PCS network model with the parameter MOVE_CALL_MEAN chosen from the range [300, 4500],
2) trace the LVTs during the simulation,
3) save the corresponding values of $q'$ and $p'$,
4) simulate the LVT model with $p = p'$ and $q = q'$,
5) compare the average speed of the profile in LVT with the simulation performance in PCS,
6) compare the average speed and the average width of the profile in both models.
We simulate the PCS network model using 64 cores of a computing node with two Intel(R) Xeon Platinum 8164 2.0 GHz processors and 2x768 GB of 2666 MHz DDR4 onboard memory. The number of LPs and KPs is equal to 256. We study how the average local virtual time grows during the simulation. Thus, the average speed of the profile is calculated as the slope of the function $LVT(t)$, where $t$ counts batches of processed events, each equal to 256 events per LP. In ROSS, the statistics are produced at the moment of GVT computation and other service computations. The average width of the profile is calculated by formula (5) and then averaged over $t$ after some cutoff time. The final result is also averaged over ten independent runs.
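The post-processing of the traced LVTs can be sketched as follows. The example assumes that the trace is available as a list of snapshots, one per batch, each holding the LVTs of all LPs. This layout and the name `profile_observables` are assumptions, not the ROSS trace format. The speed is taken as the least-squares slope of the average LVT versus the batch index, and the width is the mean squared deviation from the average LVT, averaged over the snapshots after a cutoff.

```python
def profile_observables(trace, cutoff=0):
    """Return (speed, width) for a trace given as a list of LVT snapshots."""
    # Average LVT and squared width of every snapshot.
    means = [sum(row) / len(row) for row in trace]
    widths = [
        sum((x - m) ** 2 for x in row) / len(row)
        for row, m in zip(trace, means)
    ]
    # Least-squares slope of the average LVT against the batch index.
    n = len(means)
    t_mean = (n - 1) / 2.0
    m_mean = sum(means) / n
    num = sum((t - t_mean) * (m - m_mean) for t, m in enumerate(means))
    den = sum((t - t_mean) ** 2 for t in range(n))
    speed = num / den
    # Width averaged over the snapshots after the cutoff.
    tail = widths[cutoff:]
    return speed, sum(tail) / len(tail)


# Synthetic two-LP trace used only to exercise the function.
toy_trace = [[float(t), float(t) + 2.0] for t in range(10)]
speed, width = profile_observables(toy_trace, cutoff=5)
```

In the synthetic trace both LPs advance by one unit per batch and stay two units apart, so the slope is one and the squared width is one. Averaging over independent runs would be a further loop around this function.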

D. Results
The main result of our simulation is that the behavior of LVT profiles in both models is qualitatively similar. The more remote events there are, the lower the speed of the LVT growth, the event rate, and the efficiency, and the larger the average width of the profile. The comparison of the average speed of the profile in the LVT and PCS models and the event rate in the PCS model is given in Figures 2 and 3. The average speed in the LVT model reflects the utilization of events in PCS models (i.e., the event rate).
We simulate the PCS model with the ROSS simulator [11]. The dependence of the event rate $ER$ on the value of the parameter $q$ is shown in Figure 2. We fit the event rate data using the expression $ER = ER_0 (q - q_c)^{\nu} + \mathrm{const}$, and estimate the values $q_c = 0.100 \pm 0.001$ and $\nu = 1.5 \pm 0.1$. We use the pairs of values $p'$, $q'$ extracted from the PCS model simulation to simulate the LVT-PCS model with the parameters $p = p'$, $q = q'$. Figure 3 demonstrates the resulting dependence of the average profile speed on the parameter $q$. There are three ranges of $q$ with different speed dependence. Firstly, the velocity is zero below the value $q_c \approx 0.132 \pm 0.005$, which is the roughening critical point [43]. Secondly, in the range $q_c < q < 0.3$, we approximate the data using the expression $v = v_0 (q - q_c)^{\nu} + \mathrm{const}$. We estimate the value of the exponent $\nu = 1.5 \pm 0.01$, which is smaller than in the directed percolation model ($\nu_{DP} \approx 1.73$) [43].
Thirdly, there is the regime of random deposition for $q$ close to 0.8, in which the profile grows randomly at each LP and the average speed reaches the value one, the maximum possible profile speed.
We found that the value of the exponent $\nu$ in the PCS model and the LVT-PCS model is close to 1.5, and the critical value $q_c$ is around 0.1. Therefore, the LVT-PCS model reflects the critical properties of the PCS model, which were not known before our analysis based on the similarity of the profile with those in the statistical physics of surface growth. We have to note that the value of the exponent is smaller than those of the physical models because of the particular topology of the PCS model. We also check how the values of the profile speed and the event rate depend on the value of the parameter $p'$, which reflects the random selection of the neighbor site (cell) in the PCS model. Figure 4 shows the dependence of the average LVT speed and the event rate on the fraction of remote events $p'$. The plots look similar.
So, the analysis shows that the average speed of the profile in the LVT model indeed corresponds to such a popular measure of PDES efficiency as the event rate [44], [45]. Slight differences in the behavior of these observables may be explained by the presence of additional correlations in the actual models, which are not taken into account in the LVT model.
The average width of the profile in both models is qualitatively similar (Figures 5, 6), even though in the PCS network mode. As was stated above, the average width of the LVT profile reflects the desynchronization between logical processes. In actual systems, it is possible to estimate the desynchronization by tracing the local virtual times. This characteristic is not very meaningful by itself. However, it correlates with the efficiency of the simulation. When the average width of the profile is low, the speed of the profile (i.e., the efficiency) is high, and vice versa. Our results confirm this fact.
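A power-law fit of the kind used for the exponent estimates can be sketched in a few lines. The sketch is a simplification and not the fitting procedure of the paper: it sets the additive constant to zero, scans candidate values of $q_c$ over a grid, and for each candidate performs a linear least-squares fit of $\log v$ against $\log(q - q_c)$, keeping the candidate with the smallest residual. The function name `fit_power_law` and the data are hypothetical; the data are synthetic and generated from an exact power law.

```python
import math


def fit_power_law(qs, vs, qc_grid):
    """Fit v = v0 * (q - qc)**nu by scanning qc; returns (qc, nu, v0)."""
    best = None
    ys = [math.log(v) for v in vs]
    n = len(qs)
    for qc in qc_grid:
        if qc >= min(qs):
            continue  # log(q - qc) is undefined for this candidate
        xs = [math.log(q - qc) for q in qs]
        # Linear regression y = log(v0) + nu * x.
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        nu = sxy / sxx
        log_v0 = my - nu * mx
        sse = sum((y - (log_v0 + nu * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, qc, nu, math.exp(log_v0))
    _, qc, nu, v0 = best
    return qc, nu, v0


# Synthetic data from v = 2 * (q - 0.1)**1.5, for illustration only.
qs = [0.15, 0.18, 0.20, 0.22, 0.25, 0.28, 0.30]
vs = [2.0 * (q - 0.1) ** 1.5 for q in qs]
qc, nu, v0 = fit_power_law(qs, vs, [i / 1000 for i in range(150)])
```

On noise-free synthetic data the scan recovers the parameters used to generate it. With real measurements, the residual as a function of $q_c$ is much flatter, so the grid resolution and the fitted range of $q$ affect the quoted uncertainties.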

V. DISCUSSION
We described a possible approach to performance analysis of PDES systems based on the local virtual time profile evolution simulation. The model captures the essential properties of PDES and may predict the possible deadlocks, efficiency behavior, and desynchronization depending on the distribution means and communication topology.
The LVT-PCS model was validated by comparison with the PCS network model running on the ROSS simulator. The simulations of the LVT-PCS and PCS models with the same input parameters demonstrate qualitative similarities in LVT evolution. The average speed of the profile in the LVT-PCS model indeed reflects the efficiency of the PCS model, or the event rate. The PCS model's efficiency approaches zero when the parameter $q$ is close to its critical value $q_c$. The same behavior is observed for the average speed in the LVT-PCS model.
[Figure, panel b): The average width of the profile in the PCS network model as a function of $q'$.]
His current research focuses on high-performance algorithms and methods in computational physics, machine learning in physical problems, and software for high-performance computing in physics. He has co-authored more than 150 journal and conference research papers. He is a fellow of the American Physical Society.