A Memory Constrained Approximate Bayesian Inference Approach Using Incremental Construction of Clique Trees

Bayesian Network (BN) based inference algorithms can be used to estimate several metrics that are useful in a variety of applications, including circuit design. The complexity of inference using the belief propagation algorithm increases exponentially with clique size, necessitating partitioning based on a clique size constraint. Repeatedly recompiling the BN to construct the Clique Tree (CT) is expensive. In this paper, we propose a novel algorithm for the incremental construction of the CT model. Nets are added to the CT as long as the clique size constraint is not violated. We show that our algorithm always produces a valid CT that satisfies all required properties and gives a large runtime improvement over complete recompilation. When the clique size constraint is violated, we simplify the CT using a combination of exact and approximate marginalization. The approximate CT serves as a starting point for the construction of the CT for the remaining nets. The algorithm gives us a set of CTs for each partition. We show that the marginals and the joint probabilities between variables can be computed in a consistent manner. As a result, we obtain better estimates of correlations between nets. The proposed framework results in a significant reduction in error for circuits across three benchmark suites, including some very large circuits in the EPFL suite.


I. INTRODUCTION
Bayesian Network (BN) based algorithms have been used for probabilistic inference in a wide variety of applications. A popular method for inference is the Belief Propagation (BP) algorithm [1]. This method does not work directly with the BN model but with a derived graphical representation called the clique tree (CT) model (also called a join tree or junction tree), obtained through a process known as compilation of the network. Inference using the BP algorithm is carried out over this CT [1]. Exact inference in BNs is known to be #P-complete, which necessitates partitioning and simplification.
In this paper, our focus is approximate inference when there are clique size limitations due to memory constraints. The Directed Acyclic Graph (DAG) corresponding to digital circuits can be modeled as a BN [2]- [5]. Many digital circuits (even relatively small ones) have long and nested reconvergent loops, which lead to large clique sizes and pose a huge challenge for the BP algorithm. Therefore, in this paper, we discuss our algorithms and inference tasks in the context of digital circuits.
The authors in [6] propose a partitioning technique called Multiply Sectioned Bayesian Networks (MSBN) for large domains. Exact inference can be carried out only if the network partitions follow a set of constraints, which cannot be satisfied in all cases. In practice, since the partitions can be quite large, the inference algorithms are constrained by the available memory and computational resources. A special case of this method is the hierarchical model proposed in [7]. Neither method guarantees a bound on the clique sizes within a partition. Simplification of the network based on queries, followed by exact inference, is proposed in [1], [8], but it is query dependent and not suitable when the metrics of interest are the marginal/joint probabilities. A related technique is the conditioning method [1]. This is similar to the idea of estimating signal probabilities by identifying supergates and conditioning on fanout nodes that reconverge [9].
One way to mitigate the effect of large clique sizes is to use approximate inference techniques such as probabilistic logic sampling, evidence pre-propagated importance sampling, adaptive importance sampling, and likelihood weighting. In the context of digital circuits, these have been used for inference in [4], [10]-[12]. Sampling methods have also been used for approximate inference in MSBNs [13]. However, these methods are inflexible in the sense that any small change in the circuit requires a complete re-simulation. Moreover, the estimates obtained depend strongly on parameters such as the random seed, the number of samples, and the sample generation algorithm. Another approximate technique is the Loopy BP method, which involves iterative message passing over cluster graphs with loops. However, in the presence of multiple loops, convergence may be difficult to achieve and, even when achieved, does not guarantee correctness [1], [14].
The authors in [15], [16] propose to cascade BN models of partitions, with an approximate BN model between partitions. Two issues must be addressed when partitioning BNs. The first is accuracy. For accuracy, it is important to model correlations between the variables (nets) that are required by all subsequent partitions. We will refer to these nets as interface nets. Fig. 1 shows the partitioning of a sample BN; nets d and e are the interface nets that lie at the boundary. The simplest model [17] assumes that all interface nets are independent, which leads to large estimation errors in many circuits. Another approach, suggested in [15], is to cascade the BN model of the next partition with a model that approximately captures the JPD of the interface nets via a tree network built using pairwise correlations of selected nets. As will be seen in the results section, for many circuits the tree network used in [15] does not approximate the JPD of the interface nets well. Moreover, finding the tree can be time-consuming when the number of interface nets is large. The second issue is to choose the nodes (nets) in a partition such that the clique size constraint is met. One possibility is to iteratively select larger subgraphs of the BN and recompile them until the size constraint is violated. However, this is a time-consuming process that involves many recompilations. In [15], the partition sizes are chosen from a predetermined set depending on heuristic upper bounds on the maximum clique size. This pessimistic approach leads to smaller partitions, thus requiring more approximations. It is also possible to use other partitioning and approximation techniques that have been proposed for signal and switching probability estimation in circuits. The method proposed by Costa et al. [18] uses probability polynomials, with a limit on the length of the reconvergent paths.
Several local Ordered Binary Decision Diagram (OBDD) based heuristics have been proposed that limit either the number of input nodes in each partition [19] or the number of topological levels considered for each OBDD [20], [21]. The drawback of these algorithms is that they require us to decide a priori the number of topological levels used to model correlations. They also do not guarantee a maximum clique size within a partition. Besides, BNs allow for a much richer set of approximations than these methods.
To avoid the runtime overhead associated with iterative recompilation of large subgraphs of the BN, it is possible to incrementally modify the clique tree as new nets are added. The method is also attractive because additions or deletions of gates in parts of the circuit can be easily accommodated. Incremental compilation of the CT model has been explored in previous works [22], [23]. In [22], incremental addition of links is performed by first forming a cluster graph using a set of rules and then converting the cluster graph into a junction tree. This conversion is performed using heuristic-based graph transformations. A difficulty is choosing a set of heuristics such that the clique size constraints are met. A method to directly obtain the modified junction tree is proposed in [23]. Here, the minimal section of the clique tree that needs modification is identified using the Maximal Prime Subgraph Decomposition (MPSD) [24] of the Bayesian Network. Based on the moralized graph, the junction tree is converted into another graphical representation called the MPD join tree. This approach requires storing multiple intermediate representations corresponding to the entire clique tree.

A. Contributions
Our first contribution is a novel algorithm for the incremental construction of the CT model. We show that the algorithm is sound; that is, it always results in a valid CT after the addition of nets. This method provides a significant speedup over repeated recompilation-based partitioning. In contrast to the MPSD based approach, our algorithm works directly with the clique tree graph and hence does not require multiple intermediate representations. It also reduces the computational effort required for re-triangulation. Our algorithm adds as many nets as possible to an existing CT under the input clique size constraints, thus partitioning the circuit. Unlike previous works [6], [15], we do not need to explicitly partition at the circuit level, where it is difficult to estimate clique sizes.
Our second contribution is a method to simplify the existing CT to obtain the interface CT that captures the JPD of the interface nets, using a combination of exact and approximate marginalization. Since we build the CTs incrementally, it is possible to use a CT-based interface model as a starting point for the next partition. Our algorithm results in a set of clique trees corresponding to each partition. We show that marginals are preserved across partitions; that is, the marginal extracted from any partition has the same value. We also show that the JPD between variables in different partitions can be obtained in a consistent manner by conditioning on the interface variables.
The complexity of the model can be tuned to trade off accuracy against runtime. The proposed framework allows the flexibility of incorporating incremental additions/deletions to portions of the network. Moreover, it is compatible with existing efforts towards software and hardware acceleration of BP methods [25]-[29].
Our framework supports the inference tasks used in BN models. In this paper, we validate the proposed technique by computing two metrics commonly used in circuits. The first is the signal probability, which is a key metric used in many design automation algorithms, including testing [9], [30], reliability [31]-[35], and probabilistic design [36]-[39]. We demonstrate significantly better accuracy in nearly all circuits belonging to three combinational benchmark suites, including several very large circuits in the EPFL suite. We show that our technique also provides good estimates of the joint probabilities of gate inputs, which are used to obtain rapid estimates of reliability metrics [32], [40].
The rest of this paper is organized as follows. Section II provides background on BN models and inference techniques. We describe the proposed overall algorithm in Section III, the proposed incremental approach for building clique tree models in Section IV, and a novel clique tree-based interface model in Section V. The results are presented in Section VI. Finally, we present our conclusions.

II. BACKGROUND
We briefly describe the steps involved in converting a gate-level description of the circuit to a CT model. Fig. 2 shows the steps for a sample circuit. First, an undirected model called the moralized graph is formed from the BN by removing edge orientations and adding pairwise edges between the parents of each net in the graph. Next, the chordal completion of the moralized graph is obtained by adding edges so that no chordless loop of length greater than three remains. A fully connected component (a clique) in a graph is a maximal clique if it is not contained within any other clique. The maximal cliques of this chordal completion form the nodes of the CT model. Each node C_i contains a set of nets denoted as Scope(C_i). The cliques are connected such that the resulting CT satisfies the Running Intersection Property (RIP), which states that if a variable belongs to two cliques C_i and C_j, it also belongs to the scope of all cliques on the path connecting C_i and C_j in the CT. For example, net d, present in cliques C_1 and C_5, is also present in the connecting cliques C_3 and C_4 in Fig. 2. Each edge in the CT is associated with a set of sep-set variables, which is the intersection of the scopes of the adjoining cliques. The sep-sets are marked in gray in Fig. 2.
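The compilation steps described above can be illustrated with a small, dependency-free Python sketch. It is not the paper's implementation (which uses NetworkX): the circuit is passed as a hypothetical `parents` dictionary, the elimination order is supplied by the caller, and the maximal cliques are joined by a maximum-weight spanning tree over sep-set sizes, a standard way of obtaining a CT that satisfies the RIP (the text above does not say how the cliques are connected). A direct RIP check is included.

```python
from itertools import combinations


def moralize(parents):
    """Moral graph: drop edge directions and connect the parents of every net."""
    adj = {n: set() for n in parents}
    for n, ps in parents.items():
        for p in ps:
            adj[n].add(p)
            adj[p].add(n)
        for a, b in combinations(ps, 2):  # moralizing edge between co-parents
            adj[a].add(b)
            adj[b].add(a)
    return adj


def eliminate(adj, order):
    """Triangulate by eliminating nets in `order`; return the maximal cliques."""
    g = {n: set(nbs) for n, nbs in adj.items()}
    cliques = []
    for v in order:
        cliques.append(frozenset(g[v] | {v}))
        for a, b in combinations(g[v], 2):  # fill-in edges among remaining neighbours
            g[a].add(b)
            g[b].add(a)
        for nb in g.pop(v):
            g[nb].discard(v)
    return [c for c in cliques if not any(c < d for d in cliques)]


def clique_tree(cliques):
    """Maximum-weight spanning forest over sep-set sizes (Kruskal with union-find).
    Pairs with an empty sep-set are never joined, so disjoint CTs stay separate."""
    root = list(range(len(cliques)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    candidates = sorted(
        ((len(a & b), i, j) for (i, a), (j, b) in combinations(enumerate(cliques), 2)),
        reverse=True,
    )
    edges = []
    for w, i, j in candidates:
        ri, rj = find(i), find(j)
        if w > 0 and ri != rj:
            root[ri] = rj
            edges.append((i, j))
    return edges


def satisfies_rip(cliques, edges):
    """RIP check: for every net, the cliques containing it form a connected subtree."""
    adj = {i: set() for i in range(len(cliques))}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for net in set().union(*cliques):
        holders = {i for i, c in enumerate(cliques) if net in c}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in holders and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if seen != holders:
            return False
    return True


# Example: d = f(a, b), e = f(b, c), g = f(d, e); net b reconverges at g.
parents = {"a": (), "b": (), "c": (), "d": ("a", "b"), "e": ("b", "c"), "g": ("d", "e")}
cliques = eliminate(moralize(parents), ["a", "c", "g", "d", "e", "b"])
edges = clique_tree(cliques)
print(sorted("".join(sorted(c)) for c in cliques), satisfies_rip(cliques, edges))
# ['abd', 'bce', 'bde', 'deg'] True
```

The elimination order strongly affects clique sizes; a heuristic order such as min-fill would normally be used instead of a hand-picked one.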
In digital circuits, each net n_i is treated as a binary random variable associated with a conditional probability distribution (CPD) P(n_i | Par_{n_i}), where Par_{n_i} are the parents of n_i in the BN model. The CPD is determined by the functionality of the gate driving the net. The CPD of each net in the BN model is assigned to a clique that contains its scope. The joint probability distribution (JPD) of all nets in the network can be written as

$$P(n_1, n_2, \ldots, n_{|N|}) = \prod_{i=1}^{|N|} P(n_i \mid Par_{n_i}).$$

The exact BP algorithm involves passing messages along the edges of the CT in two passes: an upward pass (from the leaf nodes to the root node) and a downward pass (from the root node to the leaves). The calibrated clique beliefs (β_i) and sep-set beliefs (µ_ij) are obtained after message passing. The clique beliefs can be used to compute marginal probabilities using

$$P(n) = \sum_{\mathrm{Scope}(C_i) \setminus \{n\}} \beta_i, \quad \text{for any clique } C_i \text{ with } n \in \mathrm{Scope}(C_i).$$

All joint probabilities can be obtained from the calibrated CT using a dynamic programming approach [1].
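As a small worked example of the factorization and of reading a marginal off a clique belief, the sketch below forms the belief of a three-net clique by brute-force enumeration and sums out the other nets. The helper names are hypothetical. In this toy case the clique contains d and all of its ancestors, so the product of CPDs already equals the calibrated belief; in general β_i is produced by BP, not by enumeration, which would be exponential.

```python
from itertools import product


def gate_cpd(fn):
    """Deterministic CPD P(n | p1, p2) of a two-input gate, keyed by (n, p1, p2)."""
    return {(n, p1, p2): 1.0 if n == fn(p1, p2) else 0.0
            for n, p1, p2 in product((0, 1), repeat=3)}


def clique_belief(scope, factors):
    """Brute-force product of factors over the binary nets in `scope`.
    factors: list of (vars, table); table keys are value tuples ordered like vars."""
    belief = {}
    for assign in product((0, 1), repeat=len(scope)):
        val = dict(zip(scope, assign))
        p = 1.0
        for vars_, table in factors:
            p *= table[tuple(val[v] for v in vars_)]
        belief[assign] = p
    return belief


def marginal_one(scope, belief, net):
    """P(net = 1): sum the belief over all other nets in the scope."""
    k = scope.index(net)
    return sum(p for assign, p in belief.items() if assign[k] == 1)


scope = ["a", "b", "d"]                       # clique containing d = AND(a, b)
factors = [(("a",), {(0,): 0.5, (1,): 0.5}),  # P(a)
           (("b",), {(0,): 0.4, (1,): 0.6}),  # P(b)
           (("d", "a", "b"), gate_cpd(lambda x, y: x & y))]
beta = clique_belief(scope, factors)
print(marginal_one(scope, beta, "d"))  # ≈ 0.3 = P(a=1) * P(b=1)
```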
If each parameter is stored in single-precision format, a clique of size N requires $4 \cdot 2^{N}$ bytes to store the clique belief. Thus, both the space and time complexity of BP are exponential in the maximum clique size of the CT model. Table I tabulates the max-clique size and the memory required to store the calibrated CT beliefs for various benchmark circuits. The space complexity is very high even for some very small circuits such as int2float, x1, and c1908. Hence, this method becomes intractable for circuits with large cliques and requires approximations.

Algorithm 1 (excerpt):
9: while N_act ≠ ∅ do
10:   N_i ← N_act.pop(nets with min. topological level)
11:   CTF, N_a ← ModifyCTF(CTF, N_i, mcs_p)
12:   IN.add(parents of nets in N_i \ N_a)

The inputs to Alg. 1 are a circuit with its associated DAG, G, and the clique size constraints. The circuit could have disjoint DAGs; therefore, we use the term Clique Tree Forest (CTF) to denote the collection of corresponding CTs. We perform some pre-processing steps, described further in Sec. VI. The CTF is built incrementally, starting with the primary inputs (line 2). If the inputs are independent, the CTF contains disjoint cliques. Once the inputs are added to the CTF, they are removed from G (line 3). At any point in the construction, nets whose parents are present in the CTF (zero in-degree in G) are ready for addition. These nets are called active nets (N_act). Lines 4-15 describe how the CT is built incrementally. Starting with the set of active nets that have the lowest topological level, we add nets to the CTF under a maximum clique size constraint mcs_p using the function ModifyCTF (described in Alg. 2). Nets that are successfully added to the CTF (N_a) are removed from G. Parents of the deferred nets are added to the set of interface nets (IN). The list of active nets is updated with the fanouts of N_a whose parents are both present in the CTF. The iteration continues until N_act is empty. BP is then used to obtain the calibrated CT beliefs (β, µ) (line 18).
The marginals are obtained by marginalizing the appropriate clique beliefs (line 20). If the set of interface nets is not empty, the CTF for the construction of the next partition is obtained by approximating the CTF of the current partition using the function BuildInterfaceCTF (described in Alg. 3) (line 22). The algorithm ends once all nets have been added. To allow inference of queries other than the marginals, we store the CTF corresponding to each partition in a list (L_CTF) and the set of cliques present in each interface CTF (LC_im).
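The bookkeeping of active, added and deferred nets in Alg. 1 can be sketched as follows. This is a simplified illustration under stated assumptions: `can_add` is a hypothetical predicate standing in for ModifyCTF and its clique-size test, nets are processed one topological level at a time, and the CT itself is not modelled.

```python
def topo_levels(parents):
    """Topological level: primary inputs are 0, a gate is 1 + max level of its parents."""
    level = {}

    def lv(n):
        if n not in level:
            level[n] = 0 if not parents[n] else 1 + max(lv(p) for p in parents[n])
        return level[n]

    for n in parents:
        lv(n)
    return level


def one_partition(parents, in_ctf, can_add):
    """Simplified sketch of the Alg. 1 loop for one partition.

    parents: net -> tuple of parent nets (primary inputs map to ()).
    in_ctf:  set of nets already in the CTF; updated in place.
    can_add: hypothetical stand-in for ModifyCTF's clique-size test (True = fits).
    Returns (nets added in this partition, interface nets IN)."""
    level = topo_levels(parents)
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    deferred, added, interface = set(), [], set()

    def ready(n):  # active: not yet placed, all parents already in the CTF
        return n not in in_ctf and n not in deferred and all(p in in_ctf for p in parents[n])

    active = {n for n in parents if parents[n] and ready(n)}
    while active:
        low = min(level[n] for n in active)
        batch = sorted(n for n in active if level[n] == low)  # N_i
        active -= set(batch)
        for n in batch:
            if can_add(n):
                in_ctf.add(n)
                added.append(n)
            else:
                deferred.add(n)
                interface.update(parents[n])  # parents of deferred nets join IN
        for n in batch:
            if n in in_ctf:
                active.update(c for c in children[n] if ready(c))
    return added, interface


parents = {"a": (), "b": (), "c": ("a", "b"), "d": ("c", "b"), "e": ("d", "a")}
in_ctf = {"a", "b"}
added, IN = one_partition(parents, in_ctf, can_add=lambda n: n != "d")
print(added, sorted(IN))  # ['c'] ['b', 'c']
```

Because the deferred net d never enters the CTF, its fanout e never becomes active in this partition, and the parents of d form the interface for the next one.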

IV. ALGORITHM FOR INCREMENTAL CONSTRUCTION OF CTF
In this section, we describe the methods used to incrementally construct clique trees. As in all previous works [2], [15], [32], [41], we pre-process the network so that each net has two parent nodes. We first describe the algorithm and then show that the algorithm will always result in a valid CT.

A. Modifying existing CTF to add a single net
Let us first consider the addition of a single net n with parent nodes p 1 and p 2 to a CTF. This will introduce a moralizing edge between p 1 and p 2 and an additional clique C n with scope equal to [p 1 , p 2 , n]. The CPD of the new net P (n|p 1 , p 2 ) forms the factor associated with the new clique C n . The existing CTF now needs to be modified so that this clique can be added while ensuring that the CTF remains valid.

1) Algorithm:
The modifications required depend on the location of the parent nets in the existing CTF. We identify three possible cases and discuss the algorithm used in each.
Case 1: If both p_1 and p_2 are contained in the same clique C, we connect C_n to C as shown in Fig. 3. If C is contained in C_n, C is replaced by C_n.
Case 2: If p_1 and p_2 belong to cliques C_1 and C_2 in disconnected CTs, C_n is connected to both C_1 and C_2 as shown in Fig. 4. If either C_1 or C_2 is a non-maximal clique, it is replaced by C_n.
Case 3: p_1 and p_2 belong to different cliques in the same CT.
3. Find the cliques in the path that contain nets ∉ S. Such cliques are called retained cliques. In the example, C_3 is a retained clique.
4. Find the modified clique tree ST' using the following steps.
For the example, ST' is highlighted in teal in Fig. 5d.
4.1 Construct an elimination graph for the nets in S as follows. Corresponding to each path clique C in ST, add a fully connected component between the nets in Scope(C) ∩ S, and introduce a moralizing edge between the two parent nets. Fig. 5c shows the elimination graph for the example. This is the portion that needs re-triangulation.
4.2 Triangulate the elimination graph and obtain the corresponding CT. The factors associated with cliques that are not retained are reassigned to cliques in ST' containing the entire scope of those cliques.
5. Remove the impacted subgraph ST from the CTF.
6. Connect ST' to the CTF via the set of cliques adjacent to ST in the existing CTF. Cliques in this adjacency set are reconnected to the cliques in ST' that contain the corresponding sep-set in the input CTF.
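Case 3 can be made concrete with a short sketch that finds the path cliques, the set S, the retained cliques, and the edges of the elimination graph of step 4.1. Steps 1 and 2 are not reproduced in this text, so the sketch assumes that S consists of the sep-set nets along the path plus the two parents (in line with the description of S for Alg. 2); the data layout and function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def path_cliques(tree, scopes, p1, p2):
    """Shortest clique path from any clique holding p1 to the nearest clique holding p2.
    tree: clique id -> neighbour ids; scopes: clique id -> set of nets.
    Returns None when no such path exists (p1, p2 in different CTs, i.e. Case 2)."""
    starts = [c for c in tree if p1 in scopes[c]]
    prev = {c: None for c in starts}
    queue = deque(starts)  # multi-source BFS
    while queue:
        c = queue.popleft()
        if p2 in scopes[c]:
            path = []
            while c is not None:
                path.append(c)
                c = prev[c]
            return path[::-1]
        for nb in tree[c]:
            if nb not in prev:
                prev[nb] = c
                queue.append(nb)
    return None


def elimination_graph(tree, scopes, p1, p2):
    """Case 3 sketch: path cliques, S, retained cliques and elimination-graph edges.
    ASSUMPTION: S = sep-set nets along the path plus the two parents."""
    path = path_cliques(tree, scopes, p1, p2)
    S = {p1, p2}
    for a, b in zip(path, path[1:]):
        S |= scopes[a] & scopes[b]
    retained = [c for c in path if scopes[c] - S]  # cliques holding nets outside S
    edges = {frozenset((p1, p2))}  # moralizing edge between the parents
    for c in path:
        for u, v in combinations(sorted(scopes[c] & S), 2):
            edges.add(frozenset((u, v)))
    return path, S, retained, edges


tree = {"C1": ["C2"], "C2": ["C1", "C3"], "C3": ["C2"]}
scopes = {"C1": {"a", "b"}, "C2": {"b", "c", "x"}, "C3": {"c", "d"}}
path, S, retained, edges = elimination_graph(tree, scopes, "a", "d")
print(path, retained, sorted(tuple(sorted(e)) for e in edges))
# ['C1', 'C2', 'C3'] ['C2'] [('a', 'b'), ('a', 'd'), ('b', 'c'), ('c', 'd')]
```

In this example the moralizing edge a-d closes the chordless loop a-b-c-d, which is exactly the part that step 4.2 must re-triangulate, while C2 keeps its private net x and is retained.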
2) Soundness of the algorithm: We now argue that if the input CTF is valid, incremental modification using the algorithms described in Sec. IV-A1 results in a valid CTF.
In Cases 1,2, any clique C ⊂ C n is removed from the CTF. Similarly, in Case 3, the CT obtained from the elimination graph has only maximal cliques by construction.
ST' is obtained by finding the CT corresponding to the elimination graph and connecting the new and retained cliques to it. Let S_c be the set S_a \ S. The chordal graph corresponding to the existing CTF has a perfect elimination order such that no fill-in edges are introduced on elimination. Even after the addition of the moralizing edge between the parents (p_1, p_2), the nets in S_c can be eliminated in this order without adding any fill-in edges. Therefore, the cliques containing these nets are retained as is in the final CTF. Elimination of the nets in S could potentially introduce fill-in edges, as they are part of the chordless loops introduced by the moralizing edge. The graph corresponding to the nets in S is precisely the elimination graph, which is re-triangulated in step 4.

Algorithm 2 (excerpt):
3: if (v_i ∈ Case 1) or (v_i ∈ Case 2) then
4:   Modify CTF as described in Sec. IV-A1
5:   N_i.remove(v_i)
6:   N_a.add(v_i)
     N_s = Choose a subset of N_g for addition
22:  Group nets in N_s with overlapping paths and add to L_g

Alg. 2 shows the main steps involved in modifying the CTF to add a set of nets under a given maximum clique size constraint. Adding nets that belong to the first two cases does not change the size of the existing cliques. Therefore, these nets are added to the CTF sequentially (lines 3-6). For nets belonging to Case 3, we identify and store the path cliques on the shortest path connecting the parents (lines 7-8). If the identified paths overlap, adding the nets sequentially can result in large clique sizes. Therefore, the new nets are divided into subsets, each composed of nets with overlapping paths (line 12). This is achieved by breaking the subgraph of the CTF over all path cliques into disjoint subtrees. Each subtree corresponds to a subset of new nets. All nets in a subset N_g are added together. ST is the subgraph of the CTF corresponding to the union of the path cliques for the nets in N_g. The modified subtree ST' is obtained using steps 2-4 in Sec. IV-A1 (lines 14-16).
The set S now contains all sep-set nets present in ST and the parents of all nets in N_g. If the maximum clique size in ST' is less than mcs_p, ST is replaced by ST' using steps 5-6 in Sec. IV-A1 (lines 17-19). Otherwise, we choose a smaller subset N_s for addition and defer the remaining nets to the next partition. Nets in N_s are regrouped based on overlapping paths, and the new groups are added to the list of subsets (lines 20-22). The algorithm ends when the list becomes empty.
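The grouping of Case 3 nets with overlapping paths (line 12) can be sketched as a connected-components pass over the CT subgraph induced by all path cliques. This is one possible reading of the step, with hypothetical names; under this reading, two paths whose cliques are merely adjacent also fall into the same group.

```python
def group_by_overlap(tree, paths):
    """Group new nets whose path cliques lie in the same connected component
    of the CT subgraph induced by all path cliques.
    tree: clique id -> neighbour ids; paths: new net -> list of its path cliques."""
    nodes = set().union(*paths.values())
    comp = {}
    for start in sorted(nodes):
        if start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:  # depth-first search restricted to path cliques
            c = stack.pop()
            for nb in tree[c]:
                if nb in nodes and nb not in comp:
                    comp[nb] = start
                    stack.append(nb)
    groups = {}
    for net, path in paths.items():
        groups.setdefault(comp[path[0]], []).append(net)
    return sorted(sorted(g) for g in groups.values())


tree = {"C1": ["C2"], "C2": ["C1", "C3"], "C3": ["C2", "C4"],
        "C4": ["C3", "C5"], "C5": ["C4", "C6"], "C6": ["C5"]}
paths = {"n1": ["C1", "C2"], "n2": ["C2", "C3"], "n3": ["C5", "C6"]}
print(group_by_overlap(tree, paths))  # [['n1', 'n2'], ['n3']]
```

Here n1 and n2 share C2 and must be added together, whereas n3 can be processed independently because C4 is not a path clique.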
We use a hash table to perform these operations efficiently. As the CTF is constructed, we build a hash table that is indexed by the net and contains all the cliques in which the net is present. As a consequence, many of the steps involve a simple look-up and finding set intersections. These operations are fast as both nets and cliques are represented using integer identifiers. All graph operations were performed using the NetworkX library [42].
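A minimal version of such a net-to-clique hash table is shown below, assuming integer net and clique identifiers as in the text. The class and method names are hypothetical; the point is that checking for Case 1 reduces to intersecting two small sets.

```python
from collections import defaultdict


class CliqueIndex:
    """Hash table net -> ids of cliques containing it, kept in sync with the CTF."""

    def __init__(self):
        self.scope = {}                  # clique id -> frozenset of net ids
        self.holders = defaultdict(set)  # net id -> set of clique ids

    def add_clique(self, cid, nets):
        self.scope[cid] = frozenset(nets)
        for n in nets:
            self.holders[n].add(cid)

    def remove_clique(self, cid):
        for n in self.scope.pop(cid):
            self.holders[n].discard(cid)
            if not self.holders[n]:
                del self.holders[n]

    def shared_cliques(self, p1, p2):
        """Cliques containing both parents; a non-empty result means Case 1."""
        return self.holders.get(p1, set()) & self.holders.get(p2, set())


idx = CliqueIndex()
idx.add_clique(0, [1, 2, 4])
idx.add_clique(1, [2, 3, 5])
print(idx.shared_cliques(1, 4), idx.shared_cliques(1, 3))  # {0} set()
idx.remove_clique(0)
print(idx.shared_cliques(1, 4))  # set()
```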

V. INTERFACE CTF
Once the maximum clique size constraint is reached, the joint probabilities of the cliques and sep-sets (β, µ) are inferred from the CTF using the BP algorithm. At this point, we need an approximate CTF that serves as the starting point for the next partition. We call this approximate CTF the interface CTF. To allow nets to be added in the next partition, we need to leave a margin. Therefore, we restrict the clique size in the interface CTF to a threshold mcs_im < mcs_p. By definition, the model must contain all interface nets, IN (parents of nets that have not yet been added to the CTF). It must also contain a minimal subset of non-interface nets required to ensure that all CTs in the CTF remain valid and that the JPD of the interface nets is well approximated.

A. Approach to obtain the interface CTF
Alg. 3 describes the proposed methodology to obtain the interface model. It has four main steps, which we now discuss in detail.
Step 1: Identify the minimal subgraph connecting interface nets: We first identify the set of cliques that contain one or more interface nets and find the subgraph of the CTF that connects these cliques (line 2). Although this subgraph can already be used to infer the exact JPD over the interface nodes, it is not the minimal subgraph. This is because the same set of interface nets could be present in multiple cliques, some of which can be removed from the subgraph without violating the RIP. Starting from the leaf nodes, we recursively remove cliques that contain the same set (or a subset) of interface nets as their neighbor (line 3).
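Step 1 can be sketched on a plain adjacency dictionary: first repeatedly strip leaf cliques that hold no interface net, which leaves the subtree connecting the interface cliques, and then strip leaves whose interface nets are all present in their only neighbour. Names and data layout are hypothetical, and the sketch is not claimed to follow the exact pruning order of Alg. 3.

```python
def minimal_interface_subtree(tree, scopes, IN):
    """Sketch of Step 1: prune the CTF to a small subgraph covering the interface nets.
    tree: clique id -> neighbour ids; scopes: clique id -> set of nets; IN: interface nets.
    Returns the pruned adjacency (clique id -> set of neighbour ids)."""
    adj = {c: set(nbs) for c, nbs in tree.items()}
    iface = {c: scopes[c] & IN for c in adj}

    def drop(c):
        for nb in adj.pop(c):
            adj[nb].discard(c)

    # (a) keep only the part connecting cliques that hold interface nets:
    #     repeatedly remove leaves (or isolated cliques) with no interface net
    changed = True
    while changed:
        changed = False
        for c in list(adj):
            if len(adj[c]) <= 1 and not iface[c]:
                drop(c)
                changed = True
    # (b) remove leaves whose interface nets are all present in their neighbour
    changed = True
    while changed:
        changed = False
        for c in list(adj):
            if len(adj[c]) == 1:
                (nb,) = adj[c]
                if iface[c] <= iface[nb]:
                    drop(c)
                    changed = True
    return adj


tree = {"C1": ["C2"], "C2": ["C1", "C3"], "C3": ["C2", "C4"], "C4": ["C3"]}
scopes = {"C1": {"a", "b"}, "C2": {"b", "c"}, "C3": {"c", "d"}, "C4": {"d", "e"}}
print(minimal_interface_subtree(tree, scopes, IN={"b", "c"}))  # {'C2': set()}
```

Removing a leaf whose interface nets all appear in its neighbour cannot disconnect the cliques holding any interface net, so the RIP is preserved for them.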
Step 2: Collapse cliques and marginalize non-interface nets: In this step, we attempt to marginalize the non-interface nets present in the interface CTF obtained after Step 1.
First (lines 4-7), the non-interface nets present in a single clique are removed from the clique scope, and the corresponding clique beliefs (β) are marginalized. If the resultant clique is a non-maximal clique, then it is removed from CTF, and its neighbors are connected to the containing clique. In case there are multiple containing cliques, we choose the one with the least degree for connection.
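Marginalizing a net that occurs in a single clique, and removing the clique if it becomes non-maximal, can be sketched as follows. Beliefs are stored as dictionaries from value tuples to probabilities (an assumption of this sketch), and the names are hypothetical. By the RIP, if any clique contains the shrunken scope then a neighbouring clique does too, so this sketch only considers neighbours as containers; reconnecting to a non-adjacent clique would create a cycle.

```python
def sum_out(scope, table, net):
    """Marginalize `net` out of a clique belief stored as {value tuple: probability}."""
    k = scope.index(net)
    out = {}
    for assign, p in table.items():
        key = assign[:k] + assign[k + 1:]
        out[key] = out.get(key, 0.0) + p
    return scope[:k] + scope[k + 1:], out


def absorb_if_non_maximal(tree, scopes, c):
    """If a neighbour contains Scope(c), delete c and reconnect its other neighbours
    to that container (the one with the least degree, as in the text)."""
    containers = [d for d in tree[c] if set(scopes[c]) <= set(scopes[d])]
    if not containers:
        return False
    keep = min(containers, key=lambda d: len(tree[d]))
    for nb in tree.pop(c):
        tree[nb].discard(c)
        if nb != keep:
            tree[nb].add(keep)
            tree[keep].add(nb)
    del scopes[c]
    return True


uniform = {(a, b, x): 1 / 8 for a in (0, 1) for b in (0, 1) for x in (0, 1)}
scope, belief = sum_out(("a", "b", "x"), uniform, "x")  # x appears only in this clique
print(scope, belief[(0, 1)])  # ('a', 'b') 0.25

tree = {"C1": {"C2", "C3"}, "C2": {"C1"}, "C3": {"C1"}}
scopes = {"C1": scope, "C2": ("a", "b", "c"), "C3": ("b", "d")}
absorb_if_non_maximal(tree, scopes, "C1")
print(tree)  # {'C2': {'C3'}, 'C3': {'C2'}}
```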
If a non-interface net (n) is instead present in multiple cliques, we form a new clique by collapsing all cliques containing n (lines 8-9). The branches of the collapsed cliques are reconnected to the new clique. The belief of the new clique is formed by computing the joint probability over the collection of collapsed cliques and then marginalizing out n. While this step preserves the JPD, the size of the new clique can be very large.

Algorithm 3 (excerpt):
1: ▷ Step 1: Identify the minimal subgraph connecting IN
2: CTF ← Subgraph of CTF connecting IN
3: Prune CTF by recursively removing redundant leaf nodes
4: ▷ Step 2: Collapse cliques and marginalize non-interface nets
5: N_I ← Nets ∈ CTF \ IN ▷ Non-interface nets
6: Marginalize the subset of N_I present in only a single clique
7: Remove resultant non-maximal cliques, reconnect neighbors
8: N_Im ← Subset of N_I that can be marginalized within mcs_im
9: Collapse cliques and marginalize out N_Im from CTF
10: ▷ Step 3: Trim large cliques with size > mcs_im
11: L_c ← List of cliques with size > mcs_im
12: Compute metrics maxMI, MLMI for nets in all cliques in L_c
13: N ← Non-interface nets present in cliques in L_c
14: while L_c ≠ ∅ do ▷ CTF has cliques with size > mcs_im
15:   if N == ∅ then N ← Interface nets present in cliques in L_c
16:   n = N.pop(net with the least maxMI)
17:   L_cn ← List of cliques containing n
19:   L_r = ∅ ▷ List of cliques where n is retained
20:   if L_s ≠ ∅ then
21:     L_st ← Disjoint subtrees in CTF over L_s
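For the collapsing step, the joint over two adjacent calibrated cliques follows from the standard identity P(C_i ∪ C_j) = β_i β_j / µ_ij; collapsing more cliques multiplies all their beliefs and divides by the sep-set beliefs of the internal edges. The sketch below (hypothetical names, dictionary-based tables) collapses a pair and then sums out the shared non-interface net.

```python
from itertools import product


def collapse_pair(si, bi, sj, bj, sep, mu, net):
    """Join two adjacent calibrated cliques and sum out `net`.

    Uses P(C_i ∪ C_j) = beta_i * beta_j / mu_ij, which holds for calibrated beliefs
    of neighbouring cliques. Scopes are tuples; tables map value tuples to probabilities.
    Returns (new scope, new belief)."""
    scope = list(dict.fromkeys(si + sj))  # ordered union of the two scopes
    out_scope = tuple(v for v in scope if v != net)
    out = {}
    for assign in product((0, 1), repeat=len(scope)):
        val = dict(zip(scope, assign))
        m = mu[tuple(val[v] for v in sep)]
        p = 0.0 if m == 0 else (bi[tuple(val[v] for v in si)] *
                                bj[tuple(val[v] for v in sj)] / m)
        key = tuple(val[v] for v in out_scope)
        out[key] = out.get(key, 0.0) + p
    return out_scope, out


# a ~ Bernoulli(0.5), n = a, b = NOT n; n is a non-interface net in two cliques.
bi = {(a, n): 0.5 if a == n else 0.0 for a in (0, 1) for n in (0, 1)}  # beta over (a, n)
bj = {(n, b): 0.5 if b != n else 0.0 for n in (0, 1) for b in (0, 1)}  # beta over (n, b)
mu = {(0,): 0.5, (1,): 0.5}                                             # mu over (n,)
scope, belief = collapse_pair(("a", "n"), bi, ("n", "b"), bj, ("n",), mu, "n")
print(scope, belief)  # ('a', 'b') {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
```

The correlation between a and b, which flows only through n, survives the marginalization, which is why this step preserves the JPD at the cost of a larger clique.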
Step 3: Trim large cliques with size > mcs_im: Next (lines 10-30), we trim large cliques while ensuring that the CTs remain valid and that correlations between signals are preserved as much as possible. We use the Mutual Information (MI) metric to estimate correlations between nets. MI measures how far the JPD p_XY is from the product of the marginals p_X · p_Y (it is the Kullback-Leibler divergence between the two), and is defined as

$$MI(X;Y) = \sum_{x,y} p_{XY}(x,y) \log \frac{p_{XY}(x,y)}{p_X(x)\, p_Y(y)}.$$
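The MI formula can be checked numerically on binary nets. The sketch assumes the natural logarithm (the text does not fix a base), so MI is in nats; it returns 0 for independent nets and log 2 for two identical uniform nets. The function name is hypothetical.

```python
from math import log


def mutual_information(pxy):
    """MI (in nats) of two binary nets from their joint table {(x, y): p}."""
    px = {x: pxy[(x, 0)] + pxy[(x, 1)] for x in (0, 1)}
    py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}
    return sum(p * log(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)  # 0 * log 0 is taken as 0


independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
identical = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
print(mutual_information(independent), mutual_information(identical))  # 0.0 and ≈ 0.6931 (= log 2)
```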
While the JPD of signals belonging to the same clique can be obtained easily, computation for signals belonging to different cliques in a CT requires variable elimination, which is both memory-intensive and time-consuming. Therefore, we trim cliques based on M I between nets that belong to the same clique. We define the following metrics to guide our clique reduction technique.

a. Maximum Local Mutual Information, MLMI
For each clique in the CTF, we compute the MI between all nets and the interface nets present in its scope. The MLMI of a net n contained in clique C is computed as the maximum MI using

$$MLMI(n, C) = \max_{m \in \mathrm{Scope}(C) \cap IN,\ m \neq n} MI(n; m).$$

In this step, the interface CTF is simplified by locally marginalizing out nets from individual cliques. Lines 10-29 contain the steps used to trim cliques. We first identify the set of large cliques with size > mcs_im (L_c) and compute the metrics maxMI and MLMI for all nets contained in them. Then, we find the set of non-interface nets present in these cliques and add them to a list N. We start with the net n that has the least maxMI. In case of a tie, we prefer the net present in fewer cliques. We find the list of cliques containing n (L_cn) and add the cliques with size ≤ mcs_im to a list L_s. Next, we identify the subgraph of the CTF corresponding to the cliques in L_s and divide it into a set of disjoint subtrees (L_st). Net n is retained in the subtree that contains the clique with the largest MLMI (L_r). In case of a tie, we choose the subtree with the largest number of cliques. Net n is locally marginalized out from all other containing cliques and sep-sets. Any resultant non-maximal clique is removed, and its neighbors are connected to the containing clique.
If large cliques still exist after removing non-interface nets, we add the interface nets present in those cliques to N. Nets are locally marginalized until the maximum clique size constraint is met. An interface net must be contained in at least one clique in the interface CTF. Therefore, if an interface net is locally marginalized from all containing cliques, an independent clique containing it is added to the CTF.
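The MLMI of a net can be computed directly from a single clique belief, since only nets within the same clique are compared. The sketch below assumes dictionary-based beliefs, natural logarithms, and a value of 0 when the clique holds no other interface net; all names are hypothetical, and the maxMI metric is not implemented because its definition is not reproduced here.

```python
from math import log


def pair_marginal(scope, belief, u, v):
    """Joint table of nets u, v obtained by summing a clique belief over the other nets."""
    i, j = scope.index(u), scope.index(v)
    out = {(x, y): 0.0 for x in (0, 1) for y in (0, 1)}
    for assign, p in belief.items():
        out[(assign[i], assign[j])] += p
    return out


def mutual_information(pxy):
    """MI (nats) of two binary variables; 0 * log 0 is taken as 0."""
    px = {x: pxy[(x, 0)] + pxy[(x, 1)] for x in (0, 1)}
    py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}
    return sum(p * log(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)


def mlmi(scope, belief, net, interface):
    """MLMI of `net` in one clique: max MI with the other interface nets in its scope.
    Returns 0.0 if there is no such net (an assumption of this sketch)."""
    others = [m for m in scope if m in interface and m != net]
    return max((mutual_information(pair_marginal(scope, belief, net, m)) for m in others),
               default=0.0)


# Clique (a, b, c): b is a copy of a, c is independent; interface nets are b and c.
scope = ("a", "b", "c")
belief = {(a, b, c): (0.25 if a == b else 0.0)
          for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(mlmi(scope, belief, "a", {"b", "c"}), mlmi(scope, belief, "c", {"b", "c"}))
# log 2 ≈ 0.693 for a, then 0.0 for c
```

In this toy clique, c carries no information about the other interface net, so it would be a natural candidate for local marginalization, whereas a is strongly correlated with b.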
Step 4: Re-assignment of clique factors: For each CT in the CTF, a root node is chosen at random. The factor of the root node is the same as its clique belief (lines 32-34). All other nodes are assigned factors by iterating through them in preorder, i.e., from the root node to the leaves. An unvisited neighbor C_j of a node C_i is assigned the conditional probability P(C_j | C_i) as its factor, which can be computed as $\psi_j = \beta_j / \mu_{ij}$ (lines 35-39).
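The factor re-assignment of Step 4 can be sketched as a traversal from the root in which each newly reached clique receives β_j / µ_ij. On a two-clique example, the product of the new factors reproduces the calibrated joint and sums to one. Names and data layout are hypothetical.

```python
from itertools import product


def reassign_factors(tree, root, beliefs, sep_beliefs):
    """Step 4 sketch: the root keeps its belief; each other clique C_j, reached from its
    neighbour C_i on the way down from the root, gets psi_j = beta_j / mu_ij.
    beliefs: clique id -> (scope, table); sep_beliefs: frozenset({i, j}) -> (scope, table)."""
    factors = {root: beliefs[root]}
    stack, seen = [root], {root}
    while stack:  # depth-first, so every clique is handled after its parent
        ci = stack.pop()
        for cj in tree[ci]:
            if cj in seen:
                continue
            seen.add(cj)
            stack.append(cj)
            sj, bj = beliefs[cj]
            ss, mu = sep_beliefs[frozenset((ci, cj))]
            idx = [sj.index(v) for v in ss]
            psi = {}
            for a, p in bj.items():
                m = mu[tuple(a[k] for k in idx)]
                psi[a] = p / m if m > 0 else 0.0
            factors[cj] = (sj, psi)
    return factors


def product_of_factors(factors, scope):
    """Evaluate the product of all factors over every assignment of `scope`."""
    out = {}
    for assign in product((0, 1), repeat=len(scope)):
        val = dict(zip(scope, assign))
        p = 1.0
        for s, t in factors.values():
            p *= t[tuple(val[v] for v in s)]
        out[assign] = p
    return out


# a ~ Bernoulli(0.5), n = a, b = NOT n; cliques C1 = (a, n) and C2 = (n, b).
beliefs = {"C1": (("a", "n"), {(a, n): 0.5 if a == n else 0.0 for a in (0, 1) for n in (0, 1)}),
           "C2": (("n", "b"), {(n, b): 0.5 if b != n else 0.0 for n in (0, 1) for b in (0, 1)})}
seps = {frozenset(("C1", "C2")): (("n",), {(0,): 0.5, (1,): 0.5})}
factors = reassign_factors({"C1": ["C2"], "C2": ["C1"]}, "C1", beliefs, seps)
joint = product_of_factors(factors, ("a", "n", "b"))
print(joint[(0, 0, 1)], joint[(1, 1, 0)], sum(joint.values()))  # 0.5 0.5 1.0
```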

B. Properties of the interface CTF
The following properties are satisfied by the CTF obtained after the simplification steps described in Sec. V-A. Property I: All CTs in the interface CTF are valid. Interface CTF obtained after the first step is valid since it is a subgraph of the existing CTF that connects the cliques containing nets in IN . The CTs in CTF obtained after the subsequent steps are valid because of the following reasons.
• It contains only maximal cliques. Any non-maximal cliques generated during the simplification of the CTF are removed (lines 7, 27 in Alg. 3).
• It consists of disjoint trees.
In Steps 2 and 3, the neighbors of the collapsed and non-maximal cliques are reconnected to the CTF such that the connectivity of the CTs is preserved. Therefore, the subgraph of the CTF over any net is connected.
• In Step 4, the re-assignment of factors is done using the calibrated CTF beliefs such that the product of the factors is a valid JPD.

Property II: The JPD of nets present within a clique in the interface CTF is the same as that in the partition from which it is derived. The interface CTF is built from calibrated CTs in which each clique belief is the JPD of the nets present in the clique. After Step 2, the entire JPD is preserved because this step collapses all containing cliques before marginalization. In Step 3, nets are locally marginalized from individual cliques without collapsing the cliques containing the net. Therefore, in this step too, the JPD of the nets contained within a clique is preserved.
In contrast, only the marginals are preserved across partitions in the partitioning approaches described in [15], [16].
Note: The JPD of nets present in different cliques in the interface model may not be preserved across partitions. This is because nets that are locally marginalized from a clique are also locally marginalized from the corresponding sep-sets.
Property III: Marginals are preserved across partitions. The marginal probabilities can be computed from any containing partition or interface CTF. This directly follows from Property II.
Property IV: The JPD of nets can be approximated both within and across partitions. The JPD of nets present in the same CT can be estimated by performing variable elimination on the CT [1]. The approximate JPD of nets a and b belonging to adjacent partitions can be obtained by conditioning on a subset of interface nets (S_in) present within a single clique in the interface CTF via

$$P(a, b) = \sum_{S_{in}} P(a, S_{in}) \, P(b \mid S_{in}).$$

This always results in a valid JPD, since the joints within any clique in the interface CTF are consistent in adjacent partitions (by Property II). Similarly, the JPD of nets belonging to non-adjacent partitions can be obtained using the chain rule by conditioning on cliques in subsequent interface CTFs.
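Property IV's conditioning formula can be evaluated directly from two tables, one from each adjacent partition. This is a sketch with hypothetical names; P(S_in) is taken from the later partition's table, which by Property II agrees with the earlier one. The result is exact only when a and b are conditionally independent given S_in, as in the toy example below.

```python
def cross_partition_joint(p_a_s, p_b_s):
    """P(a, b) ≈ sum_s P(a, s) * P(b | s), conditioning on interface nets S_in.

    p_a_s: {(a, s): prob} from the earlier partition (s is a tuple of S_in values).
    p_b_s: {(b, s): prob} from the later partition; P(s) is taken from this table."""
    p_s = {}
    for (_, s), p in p_b_s.items():
        p_s[s] = p_s.get(s, 0.0) + p
    out = {}
    for (a, s), pa in p_a_s.items():
        for (b, s2), pb in p_b_s.items():
            if s2 == s and p_s[s] > 0:
                out[(a, b)] = out.get((a, b), 0.0) + pa * pb / p_s[s]
    return out


# Interface net s with P(s = 1) = 0.3; a = s in one partition, b = NOT s in the next.
p_a_s = {(0, (0,)): 0.7, (1, (1,)): 0.3, (0, (1,)): 0.0, (1, (0,)): 0.0}
p_b_s = {(1, (0,)): 0.7, (0, (1,)): 0.3, (0, (0,)): 0.0, (1, (1,)): 0.0}
print(cross_partition_joint(p_a_s, p_b_s))
```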

VI. RESULTS
We evaluated the algorithm by computing two commonly used circuit metrics, namely the signal probability and the joint probability between gate inputs. We used circuits from three combinational benchmark suites, namely ISCAS'85 [43], LGSynth'91 [44], and the recent EPFL'15 [45] benchmarks. The benchmarks were synthesized with the Cadence Genus v15.2 tool using the Faraday 55-nm technology library. All experiments were carried out on a 3.7-GHz Intel i7-8700 Linux system with 64-GB memory. As a proof of concept, the proposed algorithms were implemented in Python3 with the NetworkX library. The error is estimated by comparing the inferred signal probabilities with those obtained from zero-delay simulation of 100,000 uniformly distributed random vectors. For all experiments presented in this section, we assume independence among the primary inputs.

A. Pre-processing circuit netlist
As in all previous works [2], [15], [32], [41], gates with fanin greater than two are replaced by equivalent combinations of two-input gates. The signal probability at the output of an inverter or buffer is a function of only its input probability. However, chains of inverters introduced during timing optimization can increase the reconvergence depth. Thus, we remove all such gates from the circuit and connect their fanout nodes to the source node driving the chain. Even though this increases fanout, it can potentially reduce the reconvergence depth. To maintain functional equivalence, we alter the CPDs of the fanout gates to account for the inversion.
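Both pre-processing steps can be sketched briefly. `decompose` splits an associative k-input gate (AND, OR, XOR) into a balanced tree of two-input gates; NAND and NOR would additionally need the inversion applied only at the final gate, which this sketch does not handle. `fold_inverters` maps every net to the source driving its inverter/buffer chain and the parity of inversions along it, which is the information needed to complement the corresponding input in the fanout gate's CPD. The netlist representation and the naming callback are hypothetical.

```python
def decompose(net, op, inputs, fresh):
    """Split a k-input associative gate (AND, OR or XOR) into a balanced tree of
    two-input gates of the same type; `net` drives the final gate.
    fresh: callable returning a new internal net name (hypothetical naming scheme).
    Returns a list of (output, op, (in1, in2)) tuples."""
    gates, level = [], list(inputs)
    while len(level) > 2:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            out = fresh()
            gates.append((out, op, (level[i], level[i + 1])))
            nxt.append(out)
        if len(level) % 2:
            nxt.append(level[-1])  # odd input passes to the next level
        level = nxt
    gates.append((net, op, tuple(level)))
    return gates


def fold_inverters(drivers):
    """For every net, walk back through INV/BUF chains to the first other driver.
    drivers: net -> (kind, inputs). Returns net -> (source net, inverted?)."""
    def resolve(n):
        inverted = False
        while n in drivers and drivers[n][0] in ("INV", "BUF"):
            inverted ^= drivers[n][0] == "INV"
            n = drivers[n][1][0]
        return n, inverted
    return {n: resolve(n) for n in drivers}


names = iter(f"t{i}" for i in range(100))
print(decompose("o", "AND", ["a", "b", "c", "d", "e"], lambda: next(names)))
# [('t0', 'AND', ('a', 'b')), ('t1', 'AND', ('c', 'd')), ('t2', 'AND', ('t0', 't1')), ('o', 'AND', ('t2', 'e'))]
drivers = {"x": ("AND", ("a", "b")), "y": ("INV", ("x",)),
           "z": ("INV", ("y",)), "w": ("BUF", ("y",))}
print(fold_inverters(drivers))
# {'x': ('x', False), 'y': ('x', True), 'z': ('x', False), 'w': ('x', True)}
```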

B. Formation of the clique tree model
The variable elimination algorithm [1] is used for triangulating the moralized graphs. The elimination order is found using the 'min-fill' metric, with the 'min-neighbors' metric used to break ties [1]. Recomputing the number of fill-in edges for every node each time a variable is eliminated increases the execution time. Therefore, we adopt the methodology suggested in [46] to compute only the change in the number of fill-in edges.
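A minimal sketch of greedy min-fill elimination with min-neighbors tie-breaking, on an undirected edge list. The selective update is an assumption and not necessarily the scheme of [46]: after eliminating x, only x's former neighbors and their neighbors can change fill-in count (their neighbor sets changed, or a new fill edge joined two of their neighbors), so only those nodes are recomputed. A final tie-break on the node name keeps the order deterministic.

```python
from itertools import combinations


def fill_in(adj, v):
    """Number of edges needed to turn the neighbourhood of v into a clique."""
    return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])


def min_fill_elimination(edges):
    """Return (elimination order, maximal cliques) of the triangulated graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    fill = {v: fill_in(adj, v) for v in adj}
    order, cliques = [], []
    while adj:
        # min-fill first, then min-neighbours (current degree), then name.
        x = min(adj, key=lambda v: (fill[v], len(adj[v]), str(v)))
        nbrs = adj.pop(x)
        del fill[x]
        cliques.append(frozenset(nbrs | {x}))
        for a, b in combinations(nbrs, 2):  # add fill-in edges
            adj[a].add(b)
            adj[b].add(a)
        for n in nbrs:
            adj[n].discard(x)
        order.append(x)
        # Only former neighbours of x and their neighbours can change fill-in.
        affected = set(nbrs)
        for n in nbrs:
            affected |= adj[n]
        for v in affected:
            fill[v] = fill_in(adj, v)
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    return order, maximal
```

For the 4-cycle A-B-C-D-A, every node initially has fill-in 1, so A is eliminated first (name tie-break), the fill edge B-D is added, and the maximal cliques are {A, B, D} and {B, C, D}, giving a maximum clique size of 3.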
To evaluate the performance of the proposed method for incremental clique tree construction, we compare our algorithm with recompilation-based partitioning. In our approach, each iteration adds a set of nodes by incrementally modifying the existing CT. In the second approach, the augmented BN subgraph is recompiled from scratch to find the CT. In both cases, nets are added until the clique size constraint is violated. Figure 6 compares the percentage of nets that can be added to the first partition using both approaches as the max-clique size threshold is varied. Results are shown for two sample benchmarks; the trend is similar for the other benchmarks. For vda, our method compares very well with complete recompilation. For c6288, there are significant differences at larger thresholds; however, such clique sizes are impractical owing to the associated space complexity. For $mcs_p$ values of 20-30, the difference in the percentage of nets added is reasonable. Table II compares the runtime and the percentage of nets added to the first partition for both approaches with $mcs_p = 20$. We show the results for a subset of benchmarks; the results for the others are similar. The incremental approach reduces runtime by 89% on average, at the cost of an average 4% reduction in the percentage of nets that can be added to the partition while satisfying the max-clique size constraint. Thus, we pay a small penalty in partition size for a substantial decrease in runtime.
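The recompilation baseline can be sketched as a greedy loop. This is an illustration of the comparison approach only, not the incremental algorithm (Alg. 1), which is not reproduced here. It assumes that nets are offered in topological order and that each compilation consists of moralizing the induced BN subgraph and triangulating it with min-fill from scratch. All function names and the example circuit are hypothetical.

```python
from itertools import combinations


def moralize(parents, nodes):
    """Moral graph (adjacency sets) of the BN sub-graph induced by `nodes`."""
    adj = {v: set() for v in nodes}
    for v in nodes:
        ps = [p for p in parents.get(v, []) if p in adj]
        for p in ps:                      # parent-child arcs become undirected edges
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):  # "marry" the co-parents of v
            adj[p].add(q)
            adj[q].add(p)
    return adj


def _fill(adj, v):
    return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])


def max_clique_size(adj):
    """Largest clique created by greedy min-fill elimination (full recomputation)."""
    adj = {v: set(n) for v, n in adj.items()}
    best = 0
    while adj:
        x = min(adj, key=lambda v: (_fill(adj, v), len(adj[v]), str(v)))
        best = max(best, len(adj[x]) + 1)
        for a, b in combinations(adj[x], 2):
            adj[a].add(b)
            adj[b].add(a)
        for n in adj.pop(x):
            adj[n].discard(x)
    return best


def grow_partition(parents, topo_order, mcs_p):
    """Recompile after every added net; stop at the first violation of mcs_p."""
    part = []
    for net in topo_order:
        if max_clique_size(moralize(parents, part + [net])) > mcs_p:
            break
        part.append(net)
    return part


# Hypothetical circuit: c = AND(a, b), d = OR(a, b), e = AND(c, d).
PARENTS = {"c": ["a", "b"], "d": ["a", "b"], "e": ["c", "d"]}
TOPO_ORDER = ["a", "b", "c", "d", "e"]
```

With `mcs_p = 3`, adding e marries c and d and creates the clique {a, b, c, d}, so the partition stops at [a, b, c, d]; with `mcs_p = 4` all five nets fit. Every call to `max_clique_size` repeats the whole triangulation, which is the cost the incremental construction avoids.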

C. Impact of clique size constraints
As discussed in Sec. III, the proposed framework uses parameters $mcs_p$ and $mcs_{im}$ as the maximum clique size constraints for the partition and the interface CTF, respectively. Table III compares the runtime and the maximum estimation error in signal probabilities obtained using the proposed approach for various values of $mcs_p$ and $mcs_{im}$.
Owing to memory constraints, we limit our experiments to maximum clique sizes of 25. We show results for a subset of benchmarks; the results for the others are similar. If $mcs_p$ is large, more nets are included in each partition, reducing the error due to approximation. On the other hand, the runtime increases, since the number of computations required to calibrate the CT is exponential in the clique sizes. A larger $mcs_{im}$ reduces the number of approximations required to obtain the interface model from the clique tree of the previous partition. However, it leaves little room for adding nets to the next partition, leading to more partitions. If $mcs_{im}$ is very small, the estimation error is large. As a compromise, we set $mcs_{im} = mcs_p - 5$. Based on runtime considerations, we choose $mcs_p = 20$ for further experiments.
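The memory limit follows from simple arithmetic on dense belief tables. The sketch below assumes binary nets and one 8-byte float per table entry; the storage format actually used in the implementation is not stated in this section.

```python
def belief_table_bytes(clique_size, states=2, bytes_per_entry=8):
    """Bytes for one dense clique belief table: states**clique_size entries."""
    return states ** clique_size * bytes_per_entry


for k in (15, 20, 25):  # e.g. mcs_im = 15 and mcs_p = 20 in the chosen setting
    print(f"clique size {k}: {belief_table_bytes(k) / 2**20:g} MiB")
```

This prints 0.25 MiB, 8 MiB and 256 MiB: under these assumptions each additional variable doubles the table, so a single clique at the size limit of 25 needs 32 times the memory of one at $mcs_p = 20$, and calibration work grows by the same factor.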

D. Comparison with existing interface models
We now compare the performance of the proposed interface model with two existing models. Hard Partitioning (HP) [16] is the simplest interface approximation; it assumes that all interface nets are independent. In the second model (referred to as ApproxCBN in [15]), the JPD of the interface nets is modeled via a Tree Distribution (TD) obtained using the pairwise mutual information (MI) between interface nets that have common parents, children, or grandparents. All other interface nets are assumed to be independent.
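The tree-structure step of a TD-style interface model can be sketched as a Chow-Liu construction. This is an assumption: the text only says that the TD is obtained from pairwise MI, and the standard way to do that is a maximum-weight spanning tree over MI. The sketch takes the candidate pairs (nets with common parents, children, or grandparents) and their 2x2 joint tables as given inputs. Nets left unconnected are treated as independent, and HP corresponds to an empty edge set.

```python
import math


def mutual_information(joint):
    """MI (in nats) of two binary nets from their 2x2 joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (px[x] * py[y]))
               for x, row in enumerate(joint) for y, p in enumerate(row) if p > 0)


def tree_distribution_edges(pair_joints):
    """Maximum-weight spanning forest over the candidate pairs (Chow-Liu style)."""
    parent = {}

    def find(v):  # union-find with path halving
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    ranked = sorted(pair_joints, key=lambda e: mutual_information(pair_joints[e]),
                    reverse=True)
    edges = []
    for u, v in ranked:  # Kruskal: keep the strongest pairs that do not form a cycle
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            edges.append((u, v))
    return edges


# Hypothetical candidate pairs among interface nets x, y, z.
PAIR_JOINTS = {
    ("x", "y"): [[0.40, 0.10], [0.10, 0.40]],
    ("y", "z"): [[0.30, 0.20], [0.20, 0.30]],
    ("x", "z"): [[0.35, 0.15], [0.15, 0.35]],
}
```

Here x-y has the largest MI (about 0.193 nats) and x-z the next largest, so the tree keeps those two edges and drops y-z. The TD would then factor the interface JPD as P(x) P(y | x) P(z | x).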
In [15], partition sizes were small owing to the limited computational resources available at the time. For a fair comparison with the proposed model, we use Alg. 1 and modify the BuildInterfaceCTF function to obtain the HP and TD interface models. The first partition is the same in all three cases; since the interface models differ, the subsequent partitions differ as well. Signal probabilities can be inferred exactly for circuits such as parity, decod, adder, and alu_ctrl, where the maximum clique size is less than 20, so our results include only circuits that required partitioning. Table IV compares the number of partitions, the error statistics, and the runtime required for signal probability estimation using the different interface models. Overall, compared with the TD method, we achieve average reductions of 46%, 48%, and 61% in RMSE, maximum error, and the percentage of nets with error > 5%, respectively. Looking at the results in more detail, we divide the circuits into two categories.
• Category I: For all circuits in this category, the proposed method results in a substantial reduction in the maximum error and in the percentage of nets with large errors compared to both HP and TD. In some benchmarks, such as c1908, c7552, bar (128-bit barrel shifter), max (128-bit), voter, and the int2float converter, there is a dramatic drop in error. A significant reduction in the maximum error is also seen for some very large arithmetic EPFL circuits, such as divisor (64-bit) and multiplier (64-bit). The error is reduced for log2, but its maximum error remains large.
• Category II: In terms of error, all methods give similar results. However, compared to TD, we obtain a reduction in the percentage of nets with large errors and a significant improvement in runtime. For a few circuits (sqrt, mem_ctrl), the maximum error is still large.
The first six circuits have short reconvergent paths, which are typically handled by all methods. Other methods, such as the Correlation Coefficient Method (CCM) [41] and the Weighted Averaging Algorithm (WAA) [47], also work well for these circuits, and OBDD-based methods [20], [21] can handle them easily. In contrast, mem_ctrl has long, nested reconvergent paths. Most of the short paths are handled, as evidenced by the low RMSE, but the maximum error is not reduced much.
As expected, the runtime of the proposed model is larger than that of HP, and the number of partitions is higher than for both the TD and HP models. However, the runtime is lower than that of TD for nearly all circuits. Even with a Python implementation, the runtime is of the order of a few seconds for most circuits. The largest runtimes are for log2 and mem_ctrl, but they are still substantially lower than those required for TD. This is because the proposed model computes correlations only between variable pairs that belong to a common clique, whereas TD requires variable elimination to compute the joint probabilities of structurally related variable pairs. The runtime increases with the number of variable pairs for which correlations must be computed.
An important application of signal probability estimation is reliability analysis and the analysis of probabilistic circuits. Signal correlation coefficients are often used to obtain rapid reliability estimates [32], [40]; these methods essentially require the JPD of gate inputs. Table V shows the corresponding errors for the ISCAS'85 benchmarks. For most circuits, the error is small. Although the maximum error is 0.1 for c3540, the RMSE is relatively low, indicating that the error is small for most input pairs. Therefore, the proposed methodology can be used in conjunction with existing techniques to obtain accurate reliability estimates.
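When two gate inputs share a clique, their JPD comes from a single calibrated clique belief by marginalization, with no further variable elimination. The sketch below assumes beliefs stored as `{assignment_tuple: probability}` dictionaries over a hypothetical clique (s, a, b). It also assumes the ratio form of the signal correlation coefficient, C(a, b) = P(a=1, b=1) / (P(a=1) P(b=1)), used in correlation-coefficient methods such as CCM; whether [32], [40] use exactly this form is not stated here.

```python
# Hypothetical calibrated clique belief over (s, a, b), stored as {assignment: probability}.
# Here a = s and b = AND(s, t) with s, t uniform; t has already been summed out.
VARS = ("s", "a", "b")
BELIEF = {(0, 0, 0): 0.5, (1, 1, 0): 0.25, (1, 1, 1): 0.25}


def marginal(belief, variables, keep):
    """Sum a clique belief down to the variables listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, p in belief.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def correlation_coefficient(pair):
    """C(a, b) = P(a=1, b=1) / (P(a=1) P(b=1)); equals 1 when a and b are independent.

    Undefined (ZeroDivisionError) if either net is constantly 0.
    """
    p11 = pair.get((1, 1), 0.0)
    pa = p11 + pair.get((1, 0), 0.0)
    pb = p11 + pair.get((0, 1), 0.0)
    return p11 / (pa * pb)


AB = marginal(BELIEF, VARS, ("a", "b"))
```

Here P(a=1) = 0.5, P(b=1) = 0.25 and P(a=1, b=1) = 0.25, so C(a, b) = 2: the inputs are strongly positively correlated, which an independence assumption (C = 1) would miss.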

VII. CONCLUSIONS
We propose an algorithm for incremental construction of the CT model under clique size constraints. We show that the resultant CT is valid. This approach gives a significant runtime improvement over complete recompilation-based partitioning. We also propose a method to find the interface CT such that the marginals are preserved and the beliefs within cliques are consistent across partitions. Thus, a valid JPD can be extracted, which is needed for inference of evidence-based queries. The input max-clique size constraints can be altered to trade off accuracy against runtime. The proposed framework provides a significant improvement in the accuracy of signal probabilities. It also gives good estimates of the joint probabilities of gate inputs, which are used to obtain rapid estimates of reliability metrics.
Some circuits still pose a challenge in terms of the maximum error achieved. Constructing separate BNs for different outputs or clusters of outputs can help achieve better estimates. Other possibilities for reducing the space complexity include compressing clique beliefs and using approximate message-passing algorithms.