Properties of Boolean networks and methods for their tests

Transcriptional regulation networks are often modeled as Boolean networks. We discuss certain properties of Boolean functions (BFs), which are considered as important in such networks, namely, membership to the classes of unate or canalizing functions. Of further interest is the average sensitivity (AS) of functions. In this article, we discuss several algorithms to test the properties of interest. To test canalizing properties of functions, we apply spectral techniques, which can also be used to characterize the AS of functions as well as the influences of variables in unate BFs. Further, we provide and review upper and lower bounds on the AS of unate BFs based on the spectral representation. Finally, we apply these methods to a transcriptional regulation network of Escherichia coli, which controls central parts of the E. coli metabolism. We find that all functions are unate. Also the analysis of the AS of the network reveals an exceptional robustness against transient fluctuations of the binary variables.a


Introduction
Boolean modeling is often used to describe signal transduction and regulatory networks [1][2][3]. Over the last years random Boolean models received much attention to find some generic properties that characterize regulatory networks. In addition to the study of topological features (e.g., [4]), the choice of Boolean functions in such networks is an important question to consider. Many results indicate the importance of functions with a low average sensitivity. For example, it is well known that a low expected average sensitivity is a prerequisite for non-chaotic behavior of random Boolean networks, e.g., [5,6]. Further, so called canalizing functions have been conjectured to be characteristic for biological networks [7]. These functions have a stabilizing effect on the network dynamics [1] and many functions occurring in (non-random) regulative networks are canalizing [7].
In this work we follow a non-random approach to find properties characterizing regulatory networks. Namely, we focus on the properties of Boolean functions in a large *Correspondence: johannes.klotz@uni-ulm.de 1 Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, 89081 Ulm, Germany Full list of author information is available at the end of the article scale Boolean regulatory network model. Our goal is also to provide efficient algorithms to test these properties.
First, we consider the membership of the regulatory functions to certain classes of functions. We first consider unate functions, which are monotone in each of their variables and were shown to be implied by a biochemical model [2].
Next, we present a test using Fourier analysis to test canalizing properties of functions. Canalizing functions are used in signal processing for certain classes of filters [8] and play an important role in random and regulatory Boolean networks, as already mentioned. Interestingly, it has been shown in [9] that a subclass of canalizing functions, namely the nested canalizing functions, is identical to the class of unate-cascade functions, a subclass of the unate functions. The test presented in this work is inspired by [10], where the so-called forcing transform was introduced to test the membership of a function to the class of canalizing functions. Here, we generalize this approach to the Fourier transform, which is a more intuitive and natural approach and furthermore some spectral properties of canalizing functions have already been investigated in [11]. http://bsb.eurasipjournals.com/content/2013/1/1 It is well known that the average sensitivity can be directly obtained from the Fourier spectral coefficients. Further, the Fourier transform turns out to be useful to prove bounds on the average sensitivity. We derive an upper bound for unate functions similar to known results for monotone functions and recall a well-known lower bound on the average sensitivity.
Finally, we apply our tests to a large-scale Boolean model of the transcriptional network of Escherichia coli. We extended the network model of the transcriptional network of E. coli (Covert et al. [3]) by mapping genes to their corresponding fluxes in the flux-balance model presented by [12]. The network has a layered feed-forward structure and shows characteristic topological features, such as a long-tail like out-degree distribution.
Throughout this article we use Fourier analysis to investigate the mentioned properties. In particular we use the concept of restricted functions. Therefore we derived both-way relations between the Fourier coefficients of a Boolean function and its restriction. A very general one-way approach of this relation can be found in [13].
The remainder of this article is organized as follows: In Section 2 we give a short introduction to Boolean functions and networks, discuss some fundamentals of Fourier analysis and investigate the spectra of restricted functions. In Section 3 we discuss certain classes and properties of Boolean functions and show efficient ways to check these properties. We also introduce the average sensitivity and prove an upper bound on it for unate functions. In Section 4 we finally introduce Boolean networks and apply our methods and tests to the regulatory network of E.coli. Some final remarks are given in Section 5.

BFs
A BF f : {−1, 1} n → {−1, 1} maps n-ary binary input tuples to a binary output. In general, not all variables of a function f are relevant. A variable i is called relevant, if there exits at least one argument where the argument x ⊕ e i is obtained from x by changing its i-th entry. In the following, we denote the number of relevant variables by k.
For the sake of simplicity we assume throughout this article, that k = n , i.e., all variables are relevant, but note that the expositions in Section 2.1 are valid in general. The assignment of +1 and −1 chosen to represent the binary in and outputs is somewhat arbitrary. One can interpret the value −1 as "ON" or "TRUE" and +1 as "OFF" or "FALSE".

Fourier analysis
Here we will give a short introduction to the concepts of Fourier analysis so far used in this article. Let us consider x = (x 1 , x 2 , . . . , x N ) as an instance of a product distributed random vector X = (X 1 , X 2 , . . . , X N ) with probability density function Furthermore, let μ i be the expected value of X i , i.e., i be the standard deviation of X i . It can easily be seen that It is well known that any BF f can be expressed by the following sum, called Fourier-expansion [14,15], where [ n] = {1, 2, . . . , n} and For Further, let A ⊂ U andĀ = U \ A, then

Restricted functions
A function is called restricted, if some of the input variables are set to constants, i.e., variables i ∈ K are set to a constant x i = a i . Hence, the number of relevant variables is reduced by |K| . First, we consider the case that only one variable is restricted (K = {i} ). The function obtained in this way is denoted as The following lemma gives a relation between the Fourier coefficients of the original function and its restriction. http://bsb.eurasipjournals.com/content/2013/1/1 Proposition 1. Let the function f (x) be a function in n variables. Consider the restricted function obtained by setting x i = a i , further, letf | x i =a i be denoted asf a i then Proof. Using Equation (4) we can rewrite (7) aŝ By applying (5) and (3) we get Hence, we can combine the two sums in (8) and obtain: where Thus, the sum in Equation (9) can be simplified tô and finallŷ which is the definition of the Fourier coefficients from Equation (4) and concludes the proof.
A closely related property is given by the following proposition. Please note that this result for uniform distributed input variables can also be retrieved using ( [13], Lemma 2.17).

Proposition 2. Let i ∈[ n]
be fixed and denote f | x i =a with f a . For any n-ary BF f, Proof. Starting from the definition we obtain Note that for a = +1 or a = −1 by definition, hence, the proposition follows from Equation (1).
For the general case, that a BF is restricted to more than one input, the following Corollary to Proposition 1 applies:

Classes and properties of functions
In this section, we will present and discuss some classes of BFs, namely unate and canalizing functions. Further, we will discuss properties of functions characterizing their robustness, like for example the AS.

Unate functions
A BF is unate if it is monotone (either increasing or decreasing) in each of its variables, a precise definition will be given below. The class of unate functions is a simple extension of the class of monotone functions defined as follows Now unate functions can be defined as follows.

Definition 2. A BF f is unate, if there exists a vector
a ∈ {−1, 1} n such that the function f (a 1 · x 1 , . . . , a n · x n ) is monotone. http://bsb.eurasipjournals.com/content/2013/1/1 The class of unate functions is closed with respect to restriction, since every restriction of a locally monotone function yields again in a locally monotone function.
To test whether a function is unate or not it is sufficient to use the definition, however, a necessary condition for a function to be unate is given by the following proposition: Proposition 3 (for example [16]). If f is a unate function, then for each relevant variable î

Canalizing functions
A BF is called canalizing, if there exists a canalizing variable x i and a Boolean value a i ∈ {−1, 1} such that the function for all 1} is a constant. If the restricted function, which is obtained by setting x i = 1 − a i , is again canalizing and so on, the function is called nested canalizing.
The following propositions give a relation between the Fourier coefficients and the canalizing property.

Proposition 4. A BF f is canalizing in variable i, if for any constants a
where μ i is the expected value of x i and σ i the corresponding standard derivation.
Proof. Obviously, if a function is canalizing, Using Proposition 1, we get and the proposition follows from Equation (11).
A similar result namely the calculation of the Fourier coefficients of a canalizing BF from the coefficients of the restricted functionsf | x i =a i (U) is addressed in [11]. These results can also be achieved using Proposition 2.
Proposition 4 can easily be extended for nested canalizing functions: Proof. The proof follows from Corollary 1 and Proposition 4.
From Proposition 4 it is clear that the canalizing property can be tested by considering all Fourier coefficients of order one. Using the Fast Walsh Transform [17] this test is as fast as the one presented in [10], however, once we have retrieved the spectra of a function, we can easily compute other properties, such as the AS (see next section).

AS of functions
The AS [18] gives the influence of random disturbance at the input on the output of a BF. This can be interpreted as an indicator for the robustness of this BF and finally for the whole Boolean network.
To define the as we first have to look at the sensitivity s x (f ) of an input argument x ∈ {0, 1} n . It is defined as the number of single bit-flips in x so that the output of the function will change, i.e., s The AS as(f ) is the expected value over all arguments x : It is worth noting that the as depends on the distribution of the input vector. For example, a function having a low AS for the uniform distribution may have a large AS for other distributions. In general, the AS can be as large as the number of relevant variables k, i.e., 0 ≤ as(f ) ≤ k.  Alternatively, the AS can be defined using the notion of influence. The influence I i (f ) of a single input variable i on the functions f is defined as The AS can then be defined as the sum of all influences [19] as The influence I i (f ) for a unate function f is directly related to the corresponding Fourier coefficient: as it was shown for monotone functions in ( [16], Lemma 4.5) and can easily be extended to unate functions. Note that Equation (15) directly gives a proof for Proposition 3.
Hence, for unate functions we can write and from the Cauchy-Schwarz inequality it follows that Together with a lower bound as presented in [19,20] and since 1 −f (∅) 2 = 1 − E[ f ] 2 = Var(f ) we obtain the following proposition.

Proposition 6. Let f be an unate BF with in-degree n, further let σ i be the standard derivation of the i-th input, then the AS of f (as(f )) is bounded by
where Var(f ) denotes the variance of f.
It can be shown that some functions get close to the upper bound. Assuming uniform distribution the upper bound in Equation (18) is smaller than √ n. But it is well known that the AS of the majority function behaves like O( √ n) (see for example [21]).

Application to a regulatory network of E. coli
In the previous sections, we only considered BFs. Now we will focus on BNs. A synchronous BN of N nodes can be described by a graph G = G(V , E) with nodes V ⊆[ N], |V | = N, and edges E ⊆ V × V , and a set of ordered BFs F = (f 1 , f 2 , . . . , f N ), where we also allow a dummy function (see below). Each f i has n i = k i = in-deg(i) relevant variables where in-deg(i) is the in-degree of node i, i.e., the number of edges (j, i) with j ∈ V . In this case a node j is called a controlling node of i. If a node i has in-degree zero, the dummy function is attached and we call it an in-node. Consequently, the number of edges emerging from i is called the out-degree of node i. Usually to each node a binary state variable is assigned, i.e., for node i we assign x i (t) ∈ {−1, +1}. For in-nodes the state can be set by some external process at some time t 0 . The state of all other nodes at time t depend on its BF and the states of all controlling nodes at time instant t − 1.
In this article, we are only considering feed-forward networks, i.e., networks without feedback loops. In such feed-forward BNs, the set of nodes is partitioned in layers L 1 , L 2 , . . . , L l . If a node i is an element of layer L h all controlling nodes are element of layers L m with m < h. The first (highest) layer L 1 consists of the input nodes (in-nodes), while the lowest layer L l consists of the output nodes (out-nodes). In Figure 2 a sample network is depicted.

Structural properties
We applied the tests described in the previous sections to the regulatory network of E. coli [3]. The model provides Boolean formulas that describe how environmental conditions act on gene expression via a transcriptional regulatory network. We extended this network by the mapping of the genes to their corresponding fluxes in the flux-balance model [12]. The network as described in the literature contains functions with irrelevant variables, respectively, redundant edges, which are removed. A list of the affected nodes and the removed edges can be found in the Additional file 1.
The resulting network has a total of N = 3915 nodes and |E| = 4874 edges, where 1, 386 of these nodes are in layer L 1 , i.e., are inputs, hence, 2, 529 nodes have a nondummy function attached. The in-degree and out-degree distributions can be found in Figures 3 and 4. The average in-degree is 1.92724. The out-degree distribution shows a typical long tail behavior [4].
We found that all functions attached to the nodes are unate. Furthermore 2499 functions (98.8% ) are canalizing An overview of the functions, which are not canalizing, can be found in the Additional file 1.

Robustness
To evaluate the robustness of the network we assume in general that the state of nodes can be described by binary random variables. In a first step we assumed that each random variable of each nodes is uniformly distributed. This implies that we consider each node independently, i.e., the topology of the network is ignored. We calculated the AS for all functions in the network. In Figure 5, the resulting AS is plotted versus the bias, which is the probability that the output of the function equals one (a similar analysis Figure 3 In-degree distribution of the investigated network ([3] extended by [12]).

Figure 4 Out-degree distribution of the investigated network ([3] extended by [12]).
appears in [22]). Each color represents a BF with a certain in-degree n. We also included the lower bond and two exemplary upper bounds for n = 5 and n = 8 (Equation (18)). For increasing n the upper bounds will grow, i.e., the bound will move further to the right.
Obviously, functions with a strong bias, i.e., with a high probability to be either −1 or 1, have a low AS. Further it can be seen that the average sensitivities of all functions are very close to the lower bound. The mean value of the AS is 0.918874 . Hence, it can be stated that the AS of this network is rather low. Similar results can be obtained considering the network without the extension as originally defined by Covert et al. [3] and Samal and Jain [23].
In a second step we want to take the topology of the network into account. Therefore, we now assume that only the in-nodes of the network are equally distributed. However, the output of these functions will most certainly not be uniform, i.e., the functions have a bias unequal zero. Since the outputs of these functions serve as inputs of the functions of the next layer, we assume that their input distributions follow the output distribution of the first-layer functions. The output distributions of the second-layer functions serve then as input distributions of the third layer and so on. Obviously this has an impact on the as of the functions.
The results are shown in Figure 6. We did not include any upper bounds in this figure since these now depend on each input distribution (see Proposition 6). It can be seen that the AS is still very close to the lower bound. However, a few functions have a rather large AS, e.g., it can be seen in Figure 6 that two types of functions with indegree K = 2 are very close to their upper bound (which is in this case at as(f ) = 2). These functions have an argument with a sensitivity of 2. Due to the input distribution http://bsb.eurasipjournals.com/content/2013/1/1

Figure 5 AS of functions plotted versus bias of functions (equally distributed inputs).
of these functions this argument has a very large probability (> 98% ) which leads to a very high AS close to 2. Such high AS are normally observed for XOR and related functions. The average value of the AS is 0.908445 , hence the AS of the network further decreases when applying product distributions at the inputs of the functions.

Comparison with random ensembles
The network appears to be more robust against transient errors as for example certain randomly constructed networks. The in-degree distributions of all controlled nodes (in-degree larger zero) is shown in Figure 3. For all nodes with in-degree k we choose a random function out of the set of functions with k relevant variables. For k = 1 this results in E[ as(f )] = 1, for k > 1 we can at least state that E[ as(f )] > k 2 , as it is well known that if we choose randomly from functions, we expect an AS of k 2 . Taking the in-degree distribution into account this implies that the expectation of the AS of all BFs chosen in this way is larger than 1.25.
It is well known that random function ensembles with lower expected AS can be constructed, if functions with a higher bias have higher probability to be chosen [24]. To test if the observed robustness can be explained due to the bias of the functions, we proceeded as follows. Again, a random function is chosen for each node with in-degree k. We determine the frequency distribution for the bias b = P[ f = 1] for all functions of the original network model with a certain in-degree k. The random network is generated by replacing the original functions of the network with functions drawn from an ensemble of functions with the same distribution. For example, if k = 2, roughly 32% of all functions have b = 0.25, while all others have b = 0.75. Hence, with probability 32% we choose a function with b = 0.25, and b = 0.75 otherwise. The data can be found in the Additional file 1. As shown in [25,26], the expectation of the AS is then given by The results obtained are shown in Table 1 sorted according to the in-degree k. For k = 1 and k = 2 the observed mean of as(f ) and the expectation of the random function coincides as only identity functions, respectively, AND or OR functions are chosen. For larger k, the observed mean is always smaller as the expectation of the random function. For some values of k, for example k = 9, both values are close to each other. This is due to the fact that the corresponding functions are highly biased, which means that there are three existing functions with the values for b being 0.00195312, 0.0917969 and 0.994141. In contrast, for k = 11 the mean of the observed values and the http://bsb.eurasipjournals.com/content/2013/1/1

Figure 6 AS of functions plotted versus bias of functions (product distributed inputs).
expectation are far from each other. Indeed out of three functions, there is one function with b = 0.354004 for which, according to Equation (19), the expectation of the AS is 5.03353147.
It should be noted that in random BNs the expectation of the AS is an order parameter [5,6]. That is, if the expectation is less or equal to one many random networks show the so-called ordered behavior. Namely, single transient errors introduced in network nodes (by flipping their state) do not spread through the network with high probability. This ordered behavior is in sharp contrast to the so-called disordered behavior of random networks which is characterized by an expectation of the AS larger one. Indeed, it has been conjectured that biological relevant networks should be ordered (or critical) but not disordered [27]. A further investigation on how canalizing and nested canalizing functions influence the average sensitivity can be found in [7,11].

Impact of mutations on the metabolism
When investigating a regulatory network, the impact of the network on the metabolism is of major interest. Hence, only the stability of nodes in the bottom layer, i.e., Table 1 Fraction of functions with in-degree k, the mean of the AS of all functions with in-degree k, and the expectation of an accordingly chosen random function with same in-degree and same bias distribution (see text and Equation 19) the output of the network, is relevant. In regulatory networks, mutations are a source for errors. We consider two possible types of mutations. First we assume that a part of promoter region of a gene is mutated or deleted. In terms of our network this means that a edge is removed and the corresponding input is set to false (+1 ). The gene may still be transcribed, hence, the node itself remains functional. The second type of mutation is the deletion of a gene or a mutation which leads to disfunctional gene. In this case, the node is constantly set to false. In both cases, the value of one node may change (error). This error is now fed through the out-going edges of this node to other nodes. However, due to the low sensitivity of all functions in the network, the error has no impact on many nodes and, therefore, will in most cases not reach the bottom layer, which is, as mentioned above, the only part of the network, whose stability is crucial. From that point of view it can be stated that these permanent errors behave similar to the transient errors described above and that networks with a low mean AS are robust against such errors.

Summary
It is an important problem to characterize BFs that appear in Boolean models of regulatory networks. This will help to understand the constraints underlying such networks, but can, for example, also help to improve network inference algorithms (see for examples [28,29] for algorithms that utilize the membership to the class of unate functions). In this study, we focused on several properties that have been shown to be of interest in the context of Boolean regulatory networks. Namely, we discussed different classes of BFs such as unate and canalizing functions. Further, sensitivity measures of BFs, like the influence of variables, or the AS are considered. We devised simple algorithms to test these properties. To test canalizing properties of BFs we applied the Fourier representation of BFs where functions are represented as multivariate, multilinear real polynomials. To this end, we introduced two spectral relationships between the socalled restricted BFs and their unrestricted counter part. The Fourier representation is further useful as many interesting properties such as the influence of unate functions or the AS of BFs can easily be characterized in the spectral domain. For example, we show how to obtain theoretical upper bounds on the AS for unate functions using spectral techniques.
As an application of our results, we analyzed an extended [30] regulatory Boolean network model of the central metabolism of E. coli. It turned out that most functions are within the classes of unate functions. Further, the AS of most functions is close to a theoretical lower bound and far from the new upper bound. Especially, functions with large in-degree have low AS even if their so-called bias is close to 0.5 (see Figure 5). We compared our findings to random BNs with similar parameters and find that the investigated networks has an even lower AS. From that we conclude that the whole network is stable, and robust to small changes, e.g., mutations.

Endnote
a Preliminary results of this study have been presented at the 8th International Workshop on Computational Systems Biology (WCSB 2011) and the 3rd International Conference on Bioinformatics and Computational Biology (BICoB 2011).