Learning restricted Boolean network model by time-series data

Restricted Boolean networks are simplified Boolean networks that are required for either negative or positive regulations between genes. Higa et al. (BMC Proc 5:S5, 2011) proposed a three-rule algorithm to infer a restricted Boolean network from time-series data. However, the algorithm suffers from a major drawback, namely, it is very sensitive to noise. In this paper, we systematically analyze the regulatory relationships between genes based on the state switch of the target gene and propose an algorithm with which restricted Boolean networks may be inferred from time-series data. We compare the proposed algorithm with the three-rule algorithm and the best-fit algorithm based on both synthetic networks and a well-studied budding yeast cell cycle network. The performance of the algorithms is evaluated by three distance metrics: the normalized-edge Hamming distance μhame, the normalized Hamming distance of state transition μhamst, and the steady-state distribution distance μssd. Results show that the proposed algorithm outperforms the others according to both μhame and μhamst, whereas its performance according to μssd is intermediate between best-fit and the three-rule algorithms. Thus, our new algorithm is more appropriate for inferring interactions between genes from time-series data.


Introduction
A key goal in systems biology is to characterize the molecular mechanisms governing specific cellular behaviors and processes. This entails selecting a model class for representing the system structure and state dynamics, followed by the application of computational or statistical inference procedures to reveal the model structure from measurement data. The models of gene regulatory networks run the gamut from coarse-grained discrete networks to the detailed description of stochastic differential equations [1]. They provide a uniform way to study biological phenomena (e.g., cell cycle) and diseases (e.g., cancer) and ultimately lead to systems-based therapeutic strategies [2].
Boolean networks, and the more general class of probabilistic Boolean networks, are one of the most popular approaches for modeling gene networks. The inference of gene networks from high-throughput genomic data is an ill-posed problem. There exists more than one model that can explain the data. The search space for potential regulator sets and their corresponding Boolean functions generally increases exponentially with the number of genes in the network and the number of regulatory genes. It is particularly challenging in the face of small sample sizes, because the number of genes typically is much greater than the number of observations. Thus, estimates of modeling errors, which themselves are determined from the measurement data, can be highly variable and untrustworthy. Many inference algorithms have been proposed to elucidate the regulatory relationships between genes. Mutual information (MI) is an information-theoretic approach that can capture the nonlinear dependence between random variables. REVEAL is the first information-based algorithm to infer the regulatory relationships between genes [3]. However, a small MI does not necessarily mean that no regulatory relationship exists between genes (false negative). Conversely, a large MI does not necessarily mean a real regulatory relationship. 'False-positive' relationships often result from indirect interactions between two genes. The data processing inequality (DPI) and conditional mutual information (CMI) are two methods used to reduce the problem of false positives [4,5]. Another information-based method is the minimum description length principle (MDL), which achieves a good trade-off between model complexity and fit to the data [6][7][8][9][10]. The coefficient of determination (CoD) selects a set of predictors whose expression levels can be used to better predict the expression of a target gene relative to the best possible prediction in the absence of observations [11,12]. The best-fit extension incorporates inconsistencies generated from measurements or other unknown latent factors by constructing a network that makes as few misclassifications as possible [13,14]. Any prior knowledge about the network structure or dynamics likely improves inference accuracy, especially for small sample sizes. Theoretical considerations and computational studies suggest that gene regulatory networks might operate close to a critical phase transition between ordered and disordered dynamical regimes [15,16]. Liu et al. proposed a method to embed such a criticality assumption into the inference procedure. Such regularization of the sensitivity can both improve the inference and move the inferred networks closer to criticality [17].
A restricted Boolean network is a simplified Boolean model that has been used to study dynamical behavior of the yeast cell cycle [18][19][20][21][22][23][24]. In this model, the regulatory relationship between genes is either upregulation or downregulation. The output of the target gene is mainly dominated by the summation of its input genes. When the input summation is zero, the output state will remain as the current state of the target gene. The inference algorithm mentioned above generally cannot deal with this situation, and thus may not be appropriate to infer such network models. Recently, Higa et al. proposed a 'three-rule algorithm' to construct a restricted Boolean network from time-series data [25]. Their idea is that the consecutive state transitions of the system must be driven by some constraints, which can be induced from the small perturbations between two similar system states (detailed rules are provided in Section 3.1). However, the perturbations in microarry data sometimes may be caused by stochastic biological randomness or measurement process instead of real changes in gene expression level. This makes the three-rule algorithm inevitably lead to some incorrect constraints. In this paper, we propose a systematic method to infer a restricted Boolean network based on the state transitions of the target gene. Results of simulated networks and a modeled yeast cell cycle show that the proposed algorithm is more robust to noise than the three-rule method.
This paper is organized as follows: Background information and definitions are given in Section 2. Section 3 presents a brief introduction to the three rules; after which, we systematically analyze the regulatory relationships between input genes and their target gene and propose an inference algorithm. Section 4 and Section 5 present results for the simulated networks and for the cell cycle model of budding yeast. Concluding remarks are given in Section 6.

Boolean networks
A Boolean network G(V, F) is defined by a set of nodes V = {x 1 , …, x n }, x i ∈ {0, 1} and a set of Boolean functions F = {f 1 , …, f n } and f i : 0; 1 f g k i → 0; 1 f g. Each node x i represents the expression state of gene x i , where x i = 0 means that the gene is off, and x i = 1 means it is on. Each node x i is assigned a Boolean function f i x 1 ; …; x k i ð Þ with k i specific input nodes, which is used to update its value. Under the synchronous updating scheme, all genes are updated simultaneously according to their corresponding update functions. The network's state at time t is represented by a binary vector x(t) = (x 1 (t), …, x n (t)). In the absence of noise, the state of the system at the next time step is The long-run behavior of a deterministic Boolean network (BN) depends on the initial state, and the network will eventually settle down and cycle endlessly through a set of states called an attractor cycle. The set of all initial states that reach a particular attractor cycle forms the basin of attraction (BOA) for the cycle. Following a perturbation, the network in the long run may randomly escape an attractor cycle, be reinitialized, and then begin its transition process anew. For a BN with perturbation probability p, its corresponding Markov chain possesses a steady-state distribution. It has been hypothesized that attractors or steady-state distributions in Boolean formalisms correspond to different cell types of an organism or to cell fates. In other words, the phenotypic traits are encoded in the attractors [1]. There are two ways to define the perturbation probability p. One is that each gene can flip its state according to an i.i.d random perturbation vector γ = (γ 1 , ⋯, γ n ), where γ i ∈ {0, 1}, the ith gene flips if and only γ i = 1, and p = P(γ i = 1) for i = 1, 2, ⋯, n. The other is each state x(t) can transit to any other state with the same probability p. In this situation, at each time step, state x(t) will transit to the next state according to F with probability 1 + p − 2 n * p and other states with probability p. In this paper, we adopt the later definition of the perturbation probability p.

Restricted Boolean networks
Restricted Boolean networks are simplified Boolean networks in which the regulatory relationships between genes obey the following convention: a ij = 1 represents a positive regulation from gene x j to x i (activation); a ij = − 1 represents a negative regulation from gene x j to x i (inhibition); and a ij = 0 means that x j has no effect on x i . The Boolean function f i x 1 ; …; x k i ð Þis defined as [18] x i t þ 1 ð Þ¼ This model is 'restricted' in the sense that functions satisfying formula (2) constitute a subset of the class of all Boolean functions. The number of restricted functions decreases dramatically as the input degree k i increases. For example, there are 12 (< 2 2 2 ¼ 16) restricted functions for k i = 2, and only 60 functions ( << 2 2 3 ¼ 256 ) for k i = 3. The restricted model significantly reduces the model space, which is beneficial for inference, given a limited number of noisy high-throughput data.

Three-rule method
A time-series observation can be treated as a trajectory (or random walk) of the state space of the network used to model a real biological system. The three-rule method proposed by Higa et al. is to induce the constraints between genes from the small difference between two similar states and the difference between their next states [25]. Given an m-point time series S = {S(1), S(2), …, S(m)} of gene expression profiles, where S(t) ∈ {0, 1} n for t = 1, 2, …, m, the three rules are as follows: Rule 1: Let S(t − 1), S(t), and S(t + 1) be three consecutive states. If S(t − 1) and S(t) differ by a single gene x k , then for each gene x i such that x i (t) ≠ x i (t + 1), we have x k directly regulates x i ; that is, a ik ≠ 0.
Rule 2: Only the active genes at time t can possibly regulate genes at time t + 1.
Rule 3: Given two similar states S(t 1 ) and S(t 2 ), the difference between S(t 1 + 1) and S(t 2 + 1) must result from the genes in their predecessors S(t 1 ) and S(t 2 ) that are expressed differently.
Both rules 1 and 3 can also be extended to situations where S(t − 1) and S(t) or S(t 1 ) and S(t 2 ) differ in more than one gene. Cyclically applying these rules to any two states may lead to a group of constraint inequalities between variables a ij . Many available constraint satisfaction problem solvers (CSPs) [26] can be used to solve the possible regulatory relationships of one gene to the target gene. Rules 1 and 3 may give incorrect relationships if applied to noisy data; in other words, they are very sensitive to the noise inherent in data. We demonstrate this by using a small network that contains only four genes (see Figure 1). An arrow represents positive regulation, a line segment with a bar at the end represents negative regulation, and the dotted loop on x 2 indicates that this gene downregulates itself. The time-series data at the right in Figure 1 are extracted from the network in Figure 1. Between S(1) and S (2), only x 2 changes from 1 to 0, and only x 3 flips from 0 to 1 in the successive states S(2) and S(3). We can conclude that x 2 must inhibit x 3 by applying rule 1, which means a 32 = − 1 because turning off x 2 turns on x 3 . If S(2) becomes 1001 owing to noise, then we will also have that gene x 4 inhibiting x 2 , which means a 24 = − 1.

Analysis of regulatory relationships based on constraints
In this section, we study the regulatory relationships based on the constraint inequalities in formula (2) and how the target gene switches from one state to another. The target gene can switch in one of four ways: 0 → 0, 0 → 1, 1 → 0, or 0 → 1. Given an input state, inactive genes have no effect on the target gene, which may help reduce the constraint inequalities of the summation ∑ j a ij x j (t) (1 ≤ j ≤ k i ). Because the null input provides no constraints between a ij , we only need to investigate the non-null input situations.
First, consider the simplest situation where there is only one regulatory gene x j 1 . If gene x j 1 is active and the target gene x i switches from 0 to 1, then gene x j 1 must activate the target gene x i (which means a ij 1 ¼ 1). On the contrary, if the target gene x i switches from 1 to 0, then it must be inhibited by x j 1 (which means a ij 1 ¼ −1). When the target gene x i remains in state 1, we have a ij 1 x j 1 ≥0 (which means a ij 1 ¼ 1). When the target gene x i remains in state 0, we have a ij 1 x j 1 ≤0 (which means a ij 1 ¼ −1 ). We present the four possible regulatory relationships a ij 1 in Table 1.
When there are two regulatory genes x j 1 and x j 2 , we only consider the input states 01, 10, and 11. If only one input gene is active, such as x j 1 x j 2 ¼ 01, then we can directly determine a ij 2 from Table 1, whereas a ij 1 remains totally nondeterminant because it has no effect on the target gene. If both gene x j 1 and gene x j 2 are active, then we need to know whether or not the target gene x i switches its state. First, if x i switches from 1 to 0, then we have . We call these later cases 'semi-determined' because there are two possible combinations of a ij 1 and a ij 2 in each case. In Table 2, we present the 12 possible regulatory relationships of a ij 1 and a ij 2 for two input genes.
Analogously, the regulatory relationships for three input genes are shown in Table 3. There are 10 semi-determined cases, and most of them occur when the target gene x i does not change. Some of the semi-determined cases in Tables 2 and 3 may become determined if some a ij are determined. For example, given a ij 1 þ a ij 2 ≤0 for (3) in Table 2, we can determine a ij 2 ¼ 1 if a ij 1 is determined to be 1. However, a ij 1 still remains semi-determined (either 1 or −1) if a ij 1 is determined to be −1. As the number of regulatory genes increases, the proportion of semi-determined cases increases significantly. We will not extend the above analysis to situations of more than three input genes. In most reference studies, the limit k i ≤ 3 is generally respected to mitigate model complexity, particularly for small sample sizes.
Given a target gene x i and its predictor genes If N −1 ij > N 1 ij , then a ij is likely to be −1; otherwise, it is likely to be 1. The larger the value of d ij , the greater the determination of a ij . In order to reduce the semidetermined cases, we first find the one with the largest determination, say, a ij, , and determine its value by the majority rule. Then, we apply the value of a ij to those inequalities including it to solve other semi-determined a ip (p ≠ j, 1 ≤ p, j ≤ k i ). By repeating this process, we can reduce the number of semi-determined cases and determine the values of other a ip accordingly.

Error analysis
Given a predictor set for gene x i , the basic inconsistency is the discrepancy in the determination of a ij , and we define the error resulting from such an inconsistency by . A second kind of inconsistency arises from the null input. Specifically, the target gene x i cannot flip its state under null input situations. Moreover, if it is negatively self-regulated (self-degradation), it cannot be active when its input genes are null. The number of such inconsistencies defines the error ε null i , which is listed in Table 4 for self-degradation and no self-degradation, respectively. The total error of a predictor set is defined by Generally, a consistent predicator set should have the minimal error and the minimal number of regulatory genes simultaneously.
No, totally undetermined; −1 or 1, semi-determined. Table 1 Regulatory relationships for one input gene

A small example
We now apply the above analysis to infer the predicator set for gene x 3 in Figure 1. Based on Tables 1,2,3,4, the results for all possible one-and two-input genes at each time point are presented in Tables 5,6,7,8, respectively. In those six possible predictor sets, the minimal error is achieved by x 1 and x 2 , which are just the regulatory genes of x 3 .

Inference algorithm
Given a time series S = {S(1), S(2), …, S(m)}, the minimal error predictor sets may not be unique. Each of them can be viewed as fitting the target gene in a different way. We employ the heuristic that if one gene occurs Table 5 Regulatory relationships a 3j for one input x 1 (or x 2 or x 4 ) at each time step No, totally undetermined; −1 or 1, semi-determined.
frequently in those sets, then it is highly probably to be a true regulatory gene. Combining them may give a more reliable prediction and can also help alleviate the constraint of using at most three input genes for a target gene. Given a target gene x i , we propose the following algorithm to infer its regulatory gene set: 1. Calculate the total error of each combination of one, two, or three regulatory gene sets P(x i ). 2. Sort the predictor sets in ascending order of their errors. 3. If a gene appears in the first l sets with a frequency greater than or equal to 50%, then it is selected as a regulatory gene.

Implementation
As mentioned in the introduction, many algorithms have been proposed to infer gene regulatory networks. A recent study shows that the best-fit algorithm appears to give the best results for the recovery of regulatory relationships among REVEAL, BIC, MDL, uMDL, and Best-Fit [27]. In this paper, we compare the performance of the three-rule algorithm, the best-fit algorithm and the proposed algorithm based on both synthetic networks as well as on a well-studied budding yeast cell cycle network. We have implemented the three-rule algorithm and our proposed algorithm based on the PBN Toolbox (http:// code.google.com/p/pbn-matlab-toolbox/), which includes the implementation of best-fit algorithm and the calculation of the steady state distribution and other intervention modules for Boolean networks. Genetic regulatory networks are commonly believed to have sparse connectivity topology. To evaluate the inference algorithms based on simulated time series of network states, we have restricted the random BNs to resemble this property of biological networks. Specifically, we have generated random BNs with a scale-free topology, and each gene has at most five predictors: ¼ max n i¼1 k i ≤5. We uniformly assign each gene 1 to K regulators that upregulate (1) or downregulate (−1) it. The average connectivity of random networks is (1 + K)/2.
In order to compare the performance of the three algorithms with the ground-truth network, we use the following three distances [28,29]: (1) The normalized-edge Hamming distance, where FN and FP represent the number of false-negative and false-positive wires, respectively. P and N represent the total number of positive and negative wires, respectively. Table 9 Average number of true-positive and false-positive connections for three algorithms   The italicized value is solved from the determination a 31 = 1. This Hamming distance reflects the accuracy of the recovered regulatory relationships.
(2) The normalized Hamming distance of state transitions, where f i (•) and f 0 i • ð Þ represent the Boolean function of gene i in the ground-truth network and the inferred network, respectively; x k represents a binary state vector, and ⊕ denotes modulo-2 addition. This Hamming distance indicates the accuracy of the inferred network for predicting the next state of the ground-truth network.
(3)The steady-state distribution distance, where π k and π 0 k are the steady-state distribution of state x k in the ground-truth network and the inferred network, respectively. The steady-state distribution distance reflects the degree of an inferred network approaching the longrun behavior of the ground-truth network.

Simulated results
Owing to the computational complexity and the network state space, which increases exponentially with the number of genes or the network size, all our simulations are based on networks with n = 10 genes. We generate 300 random Boolean networks respectively with maximal input degree K = 3 and K = 5. For each simulated network, we generate about 4 time series so that the total time points add up to 40. Given a specific sample data, the noise is added by flipping the value of each bit with probability 0.05 and 0.10, respectively. The steady-state distribution is calculated by a perturbation parameter p = 0. 0001. For the proposed algorithm, we selected the first l = 10 minimal error predictor sets. For best fit, we selected the minimal error predictor sets from k = 1, 2, 3. In Table 9, we list the average number of true-positive and false-positive connections for K = 3 and K = 5 in different noise intensities.  Figure 2 shows the performance of three algorithms on networks with K = 3 under different noise intensities according to three distance metrics: the normalized-edge Hamming distance μ e ham , the normalized Hamming distance of state transition μ st ham , and the steady-state distribution distance μ ssd . The performance of the three-rule algorithm and the proposed algorithm is very close when there is no noise. However, it differs dramatically in noisy data. Specifically, the performance of the proposed algorithm increases as the sample size increases while that of the three-rule algorithm decreases. The main reason lies in the fact that the proposed algorithm infers the regulatory relation based on the entire time series instead of on a small perturbation between two time points, which makes it more robust against noise than the three-rule algorithm. Given a specific noise intensity η, with more samples, there are more noisy perturbed bits; so, more incorrect connections will be inferred by the three-rule algorithm. Table 9 shows that the number of the false positives of the three-rule algorithm increases more quickly than that of the true positives as the sample size increases. This is the main factor which makes its performance deteriorate even though the sample size increases. Consequently, the three-rule algorithm is very sensitive to noise in the data, and increasing sample size makes no improvement in its performance.
Compared with the best-fit algorithm, the proposed algorithm performs better with respect to μ e ham and μ st ham . In a restricted Boolean network model, the output of states with X j a ij x i t ð Þ ¼ 0 is determined by the current state of the target gene x i . This means that given the same input state, x i may be 1 at one time and be 0 at another time. The best-fit algorithm does not allow such situation, and it will treat such a case in the data as an error. If the target gene x i has three regulators and one downregulates it, then there will be 3 such states out of the 8 possible input states. The influence of such cases on the performance of best-fit algorithm can not be neglected. Additionally, the best-fit algorithm cannot deal with the inconsistency listed in Figure 3. These two factors hurt its performances as compared to the proposed algorithm on μ e ham and μ st ham . Table 9 shows that the number of the true positives of both algorithms is  Figure 3 Comparison of μ e ham , μ st ham , μ ssd for the three algorithms with 0%, 5%, and 10% noises (K = 5).
almost the same, but the number of false positives of the best-fit algorithm is larger than that of the proposed algorithm. Concerning the steady state distribution distance μ ssd , the proposed algorithm performs not so well as the best-fit algorithm. However, their difference decreases as the noise intensity increases. As pointed in [27], the inferred networks with relative more connections can explain the observed data better with respect to steady-state distribution distance μ ssd , even though some are incorrect connections. Because the best-fit algorithm infers more connection than the proposed algorithm (see Table 9), it performs better on μ ssd than the latter. On the other hand, the proposed algorithm is more robust than the best-fit algorithm as it combines those minimal error sets to determine the regulatory gene instead of selecting one. When noise intensity increases, the performance of the best-fit algorithm will drop more quickly than that of the proposed algorithm, which leads to their performance on μ ssd converges. Figure 4 shows the performance of three algorithms on networks with K = 5, which are analogous to the trends observed in Figure 2. The only difference is that the performance of the three algorithms decreases because the networks' complexity makes them hard to infer. In summary, the proposed algorithm performs better than the three-rule algorithm on the three distance metrics in noisy situations, whereas it performs less well than the best-fit algorithm on the steady-state distribution distance. This suggests that it is more feasible to infer the structure of restricted Boolean network model than the three-rule algorithm and best-fit algorithm.

Cell cycle model of budding yeast
The cell cycle is a vital biological process in which one cell grows and divides into two daughter cells. It consists of four phases, G1, S, G2, and M, and is regulated by a highly complex network that is highly conserved among the eukaryotes. From the 800 genes involved in the cell cycle process of budding yeast, Li et al. constructed a network of 11 key regulators: Cln3, MBF, SBF, Cln1, Cdh1, Swi5, Cdc20, Clb5, Sic1, Clb1, and Mcm1 [18]. This restricted Boolean network model (shown in Figure 4A) has an attractor whose biggest basin corresponds to the biological G1 stationary state. The temporal sequence in Table 10 is a pathway from this basin, which follows the biological trajectory of the cell cycle network. We have applied the three algorithms to the above artificial time-series data and show the inferred networks in Figure 4. In the simplified model of the budding yeast cell cycle, there are a total of 34 regulatory relationships (or connections). The three-rule algorithm inferred 10 relationships, all correct (see Figure 4B). The best-fit algorithm inferred 15 correct and 5 incorrect relationships (see Figure 4C). The proposed algorithm inferred 15 correct and 4 incorrect relationships (see Figure 4D). Both best-fit and the proposed algorithms inferred more true regulatory relationships than the three-rule algorithm with some incorrect connections. For studying regulatory relationships, this may be more advantageous because more potential regulatory relationships are made available for biologists to check in the wet lab.
We also ran 100 simulations with 5% and 10% noises for this pathway. Even for the same pathway data, the result of each noisy pathway data differs dramatically. This is not surprising because noise significantly influences the determination of regulatory relations for all algorithms. The performance of the three algorithms on μ e ham , μ st ham , and μ ssd is listed Table 11. The relative performance of the three algorithms for this pathway data is also consistent with the previous simulation results.

Computational issues
When inferring real networks with moderate size, the time complexity of algorithms is a key issue. Almost all   algorithms proposed to date possess exponential complexity. The time complexity of the proposed algorithm and best-fit algorithm is n⋅C k n ⋅m À Á . The most time-consuming process for the three-rule algorithm is to solve the constraint inequalities, and its time complexity is O(n ⋅ c n ⋅ m 2 ) (1 < c < 2). From this point of view, the three-rule algorithm is more time consuming than the other two.
The proposed algorithm is similar in workflow to the best-fit algorithm; however, additional computation time results from three factors: (1) determination of the possible regulatory relationships, (2) determination during error estimation if an output state is correct for a given model according to Equation (2), and (3) combination of the first ten least-error models in the last step.
In practice, however, algorithm complexity is not the limiting factor. As shown in Table 12, for 11, 12, and 13 genes, and for N = 20 and N = 40, the proposed algorithm's computation time is between the best-fit and the threerule algorithms, but the overriding computational issue is computation of the steady-state distribution, which is often required for application. It is for this reason that interest has focused on reducing network complexity [29][30][31].

Conclusion
The model space of Boolean networks is huge and from the point of view of evolution, it is unimaginable for nature to select its operational mechanisms from such a large space. Restricted Boolean networks, as a simplified model, have recently been extensively used to study the dynamical behavior of the yeast cell cycle process. In this paper, we propose a systematic method to infer the restricted Boolean network from time-series data. We compare the performance of the three-rule, best-fit, and the proposed algorithms both on simulated networks and on an artificial model of budding yeast. Results show that our algorithm performs better than the three-rule and best-fit algorithms according to the distance metrics μ e ham and μ st ham , but slightly less well than the best-fit algorithm according to μ ssd . This result indicates that the proposed algorithm may be more appropriate for recovering regulatory relationships between genes under the restricted Boolean network model.
The main advantage of the proposed algorithm is that it is more robust to noise than both the three-rule algorithm and best-fit algorithm. The proposed algorithm infers the regulatory relationships according to the consecutive state transitions of the target gene, instead of the small perturbations between two similar states in the three-rule algorithm. Simulation results show that noise in the data may induce many incorrect constraints by the three-rule algorithm. This hinders its application to noisy samples. Moreover, the proposed algorithm can capture the intrinsic state transition defined in Equation 2, whereas the best-fit algorithm cannot. Hence, because the inference processes of both algorithms try to find the minimal-error predictor set, the proposed algorithm can distinguish error in the data more accurately than the best-fit algorithm. Additionally, combination of the minimal error predictor sets in the proposed algorithm also improves its robustness.
In the Boolean formalism, a single time series (or trajectory) can be treated as a random walk across state space. It is not possible to recover the complex biological system from just one short trajectory by any method. Using heterogeneous data and some a priori knowledge is typically a necessity. A priori knowledge can be incorporated into the proposed algorithm and helps by reducing the search space. For instance, an algorithm might assume a prescribed attractor structure [32]. In our case, if we know that x regulates y, then we only consider those combinations containing x, thereby reducing the search space. Additionally, different methods may focus on different aspects of the inference process. For example, the best-fit algorithm and CoD are mainly concerned with the fitness of the data, whereas MDL-based methods intend to reduce structural risks. Future work will involve combining MDL with the proposed algorithm to reduce the rate of false positives.