Harmonic analysis of Boolean networks: determinative power and perturbations

Consider a large Boolean network with a feed forward structure. Given a probability distribution on the inputs, can one find, possibly small, collections of input nodes that determine the states of most other nodes in the network? To answer this question, a notion that quantifies the determinative power of an input over the states of the nodes in the network is needed. We argue that the mutual information (MI) between a given subset of the inputs X={X1,...,Xn} of some node i and its associated function fi(X) quantifies the determinative power of this set of inputs over node i. We compare the determinative power of a set of inputs to the sensitivity to perturbations to these inputs, and find that, maybe surprisingly, an input that has large sensitivity to perturbations does not necessarily have large determinative power. However, for unate functions, which play an important role in genetic regulatory networks, we find a direct relation between MI and sensitivity to perturbations. As an application of our results, we analyze the large-scale regulatory network of Escherichia coli. We identify the most determinative nodes and show that a small subset of those reduces the overall uncertainty of the network state significantly. Furthermore, the network is found to be tolerant to perturbations of its inputs.


Introduction
A Boolean network (BN) is a discrete dynamical system that is used, for example, to study and model a variety of biochemical networks such as genetic regulatory networks. BNs were introduced in the late 1960s by Kauffman [1,2], who proposed to study random BNs as models of gene regulatory networks. Kauffman investigated their dynamical behavior and a phenomenon called self-organization. Aside from their original purpose, BNs were also used to model (small-scale) genetic regulatory networks; for example, in [3][4][5], it was demonstrated that BNs are capable of reproducing the underlying biological processes (i.e., the cell cycle) well. BNs are also used to model large-scale networks, such as the Escherichia coli regulatory network [6], which is analyzed in Section 6. This network is, in contrast to Kauffman's automata and the regulatory networks considered in [3][4][5], not an autonomous system, since the genes' states are determined by external factors.
In the literature addressing the analysis of BNs, it is common to consider measures that quantify the effect of perturbations. Whether a random BN operates in the so-called ordered or disordered regime is determined by whether a single perturbation, i.e., flipping the state of a node, is expected to spread or to die out eventually. Kauffman [2] argues that biological networks must operate at the border between the ordered and the disordered regime; hence, they must be tolerant to perturbations to some extent.
In contrast to measures of perturbations, determinative power in BNs has not received much attention, even though there are several settings where such a notion is of interest. For example, given a feed forward network where the states of the nodes are controlled by the states of nodes in the input layer, we might ask whether a possibly small set of inputs suffices to determine most states, i.e., whether it reduces the uncertainty about the network's states significantly. This can be addressed by quantifying the determinative power of the input nodes. For example, in the E. coli regulatory network, it turns out that a small set of metabolites and other inputs determines most genes that account for E. coli's metabolism (see Section 6).
In this paper, we view the state of each node in the network as an independent random variable. This modeling assumption applies for networks with a tree-like topology, e.g., a feed forward network, and is often applied when studying the effect of perturbations. For this setting, determinative power of nodes and perturbation-related measures are properties of single functions; hence, the analysis of the BN reduces to the analysis of single functions. Our main tool for the theoretical results is Fourier analysis of Boolean functions. Fourier analytic techniques were first applied to BNs by Kesseli et al. [7,8], who derived results related to Derrida plots and convergence of trajectories in random BNs. Ribeiro et al. [9] considered the pairwise mutual information in time series of random BNs, under a setup different from the one we use. Specifically, in [9], the functions are random; whereas here, the functions are deterministic, but the argument is random. Finally, note that part of this paper was presented at the 2012 International Workshop on Computational Systems Biology [10].
http://bsb.eurasipjournals.com/content/2013/1/6

Contributions
Mutual information between a set of inputs to a node and the state of this node is a measure of the determinative power of this set of inputs, as mutual information quantifies mutual dependence of random variables. In order to understand the determinative power and mutual dependencies in Boolean networks, we systematically study the mutual information of sets of inputs and the state of a node. We relate mutual information to a measure of perturbations and prove that (maybe surprisingly) a set of inputs to whose perturbations a function is highly sensitive does not necessarily have determinative power. Conversely, a function must be sensitive to perturbations of a set of inputs which has determinative power. To prove those results, we show that the concentration of weight in the Fourier domain on certain sets of inputs characterizes a function in terms of tolerance to perturbations and determinative power of input nodes. Furthermore, we generalize a result by Xiao and Massey [11], which gives a necessary and sufficient condition for statistical independence of a set of inputs and a function's output in terms of the Fourier coefficients. This result can, for instance, be applied to decide for which classes of functions the algorithm presented in [12], which detects functional dependencies based on estimating mutual information, can succeed and for which it must fail. For unate functions, we show that any relevant input and the function's output are statistically dependent and provide a direct relation between the mutual information and the influence of a variable. The class of unate functions is especially relevant for biological networks, as it includes all linear threshold functions and all nested canalizing functions, and describes functional dependencies in gene regulatory networks well [13]. As an application of the theoretical results in this paper, we show that mutual information can be used to identify the determinative nodes in the large-scale model of the control network of E. coli's metabolism [6].

Outline
The paper is organized as follows. Boolean networks and Fourier analysis of Boolean functions are reviewed in Section 2. In Section 3, the influence and average sensitivity as measures of perturbations are reviewed, and their relation to the Fourier spectrum is discussed. In Section 4, we study the mutual information of sets of inputs and the function's output. Section 5 is devoted to unate functions. Section 6 contains an analysis of the large-scale E. coli regulatory network, using the tools and ideas developed in previous sections.

Preliminaries
We start with a short introduction to Boolean networks and Fourier analysis of Boolean functions, and introduce notation.

Boolean networks
A (synchronous) BN can be viewed as a collection of $n$ nodes with memory. The state of node $i$ at discrete time $t$ is described by a binary variable $x_i(t) \in \{-1,+1\}$. Choosing the alphabet to be $\{-1,+1\}$ rather than $\{0,1\}$, as is more common in the literature on BNs, will turn out to be advantageous later; both choices are equivalent. The state of the network at time $t$ is described by the vector $\mathbf{x}(t) = [x_1(t), \ldots, x_n(t)] \in \{-1,+1\}^n$. The network dynamics are defined by
\[ x_i(t+1) = f_i(\mathbf{x}(t)), \]
where $f_i : \{-1,+1\}^n \to \{-1,+1\}$ is the Boolean function associated with node $i$. At time $t = 0$, an initial state $\mathbf{x}(0) = \mathbf{x}_0$ is chosen. In general, not all arguments $x_1, \ldots, x_n$ of a function $f_i(\mathbf{x})$ need to be relevant. The variable $x_j$, $j \in \{1, \ldots, n\}$, is said to be relevant for $f_i$ if there exists at least one $\mathbf{x} \in \{-1,+1\}^n$ such that changing $x_j$ to $-x_j$ changes the function's value. In most of the BN models in biology, the functions depend on a small subset of their arguments only. Furthermore, not every state must have a function associated with it; states can also be external inputs to the network.
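The synchronous update rule and the notion of a relevant variable can be sketched in a few lines of Python. The three-node network below is a hypothetical toy example (it is not taken from any network discussed in this paper), and node indices are 0-based.

```python
from itertools import product

def step(state, functions):
    """One synchronous update: every node reads the same previous state."""
    return tuple(f(state) for f in functions)

def trajectory(x0, functions, steps):
    """Return the states x(0), x(1), ..., x(steps)."""
    states = [tuple(x0)]
    for _ in range(steps):
        states.append(step(states[-1], functions))
    return states

def is_relevant(f, j, n):
    """x_j is relevant for f if flipping x_j changes f(x) for at least one x."""
    for x in product((-1, 1), repeat=n):
        flipped = x[:j] + (-x[j],) + x[j + 1:]
        if f(x) != f(flipped):
            return True
    return False

# Hypothetical 3-node network over the alphabet {-1, +1}.
functions = [
    lambda x: x[1],                                  # node 0 copies node 1
    lambda x: 1 if x[0] == 1 and x[2] == 1 else -1,  # node 1 = AND(node 0, node 2)
    lambda x: x[0] * x[1],                           # node 2 = product of nodes 0 and 1
]

traj = trajectory((1, -1, 1), functions, 4)
relevant_for_node0 = [j for j in range(3) if is_relevant(functions[0], j, 3)]
```

Here `relevant_for_node0` contains only index 1, which illustrates that a function formally defined on all $n$ states may depend on a small subset of them.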
To study determinative power and tolerance to perturbations, a probabilistic setup is needed. In our analysis, we assume that each state is an independent random variable $X_i \in \{-1,+1\}$. The assumption of independence holds for networks with tree-like topology, but is not feasible for networks with strong local dependencies and feedback loops. However, in many relevant settings, a BN has a tree-like topology, for instance the E. coli network analyzed in Section 6. For a network with few local dependencies, assuming independence will lead to a small modeling error. Major results concerning the analysis of BNs have been obtained under the assumptions stated above, e.g., the annealed approximation [14], an important result on the spread of perturbations in random BNs. Several important results on random BNs, e.g., [14], let the network size $n$ tend to infinity; hence, there are no local dependencies.

Notation
We use $[n]$ for the set $\{1, 2, \ldots, n\}$, and all sets are subsets of $[n]$. With $\sum_{S \subseteq A}(\cdot)$, we mean the sum over all sets $S$ that are subsets of $A$. Throughout this paper, we use capital letters for random variables, e.g., $X$, and lower case letters for their realizations, e.g., $x$. Boldface letters denote vectors, e.g., $\mathbf{X}$ is a random vector, and $\mathbf{x}$ its realization. For a vector $\mathbf{x}$ and a set $A \subseteq [n]$, $\mathbf{x}_A$ denotes the subvector of $\mathbf{x}$ corresponding to the entries indexed by $A$.

Fourier analysis of Boolean functions
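In the following, we give a short introduction to Fourier analysis of Boolean functions. Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a binary, product distributed random vector, i.e., the entries of $\mathbf{X}$ are independent random variables $X_i \in \{-1,+1\}$. Throughout this paper, probabilities $P[\cdot]$ and expectations $E[\cdot]$ are with respect to the distribution of $\mathbf{X}$. We denote $p_i \triangleq P[X_i = 1]$, the variance of $X_i$ by $\mathrm{Var}(X_i)$, its standard deviation by $\sigma_i \triangleq \sqrt{\mathrm{Var}(X_i)}$, and finally $\mu_i \triangleq E[X_i]$. The inner product of $f, g : \{-1,+1\}^n \to \{-1,+1\}$ with respect to the distribution of $\mathbf{X}$ is defined as
\[ \langle f, g \rangle \triangleq E[f(\mathbf{X})\, g(\mathbf{X})] = \sum_{\mathbf{x} \in \{-1,+1\}^n} P[\mathbf{X} = \mathbf{x}]\, f(\mathbf{x})\, g(\mathbf{x}), \]
which induces the norm $\|f\| = \sqrt{\langle f, f \rangle}$. An orthonormal basis with respect to the distribution of $\mathbf{X}$ is
\[ \Phi_S(\mathbf{x}) = \prod_{i \in S} \frac{x_i - \mu_i}{\sigma_i}, \qquad S \subseteq [n]. \]
This basis was first proposed by Bahadur [15]. Thus, each Boolean function $f : \{-1,+1\}^n \to \{-1,+1\}$ can be uniquely expressed as
\[ f(\mathbf{x}) = \sum_{S \subseteq [n]} \hat{f}(S)\, \Phi_S(\mathbf{x}), \tag{3} \]
where $\hat{f}(S) \triangleq \langle f, \Phi_S \rangle$ are the Fourier coefficients of $f$. Note that (3) is a representation of $f$ as a multilinear polynomial. As an example, consider the AND2 function defined as $f_{\mathrm{AND2}}(\mathbf{x}) = 1$ if and only if $x_1 = x_2 = 1$, and let $p_1 = p_2 = 1/2$. According to (3),
\[ f_{\mathrm{AND2}}(\mathbf{x}) = -\tfrac{1}{2} + \tfrac{1}{2} x_1 + \tfrac{1}{2} x_2 + \tfrac{1}{2} x_1 x_2. \]
As a second example, consider PARITY2, i.e., the XOR function, defined as $f_{\mathrm{PARITY2}}(\mathbf{x}) = 1$ if $x_1 = x_2$ and $f_{\mathrm{PARITY2}}(\mathbf{x}) = -1$ for all other choices of $\mathbf{x}$. Written as a polynomial, $f_{\mathrm{PARITY2}}(\mathbf{x}) = x_1 x_2$. We conclude this section by listing properties of the basis functions which are used frequently throughout this paper.
Proposition 1. Let $A \subseteq [n]$ and $S \subset A$, and denote $\bar{S} = A \setminus S$. Then, $\Phi_A(\mathbf{x}) = \Phi_S(\mathbf{x})\, \Phi_{\bar{S}}(\mathbf{x})$. Parseval's identity:
\[ \|f\|^2 = \sum_{S \subseteq [n]} \hat{f}(S)^2 = 1. \]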
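The expansion can be checked numerically by brute force. The following sketch computes all coefficients $\langle f, \Phi_S \rangle$ for the basis $\Phi_S(\mathbf{x}) = \prod_{i \in S} (x_i - \mu_i)/\sigma_i$ by enumerating all $2^n$ inputs; the function names are illustrative, indices are 0-based, and the code assumes $0 < p_i < 1$ so that $\sigma_i > 0$.

```python
from itertools import combinations, product
from math import isclose, prod, sqrt

def fourier_coefficients(f, p):
    """All coefficients <f, Phi_S> for P[X_i = +1] = p[i], with 0 < p[i] < 1."""
    n = len(p)
    mu = [2 * q - 1 for q in p]            # E[X_i]
    sigma = [sqrt(1 - m * m) for m in mu]  # standard deviation of X_i

    def weight(x):
        return prod(p[i] if x[i] == 1 else 1 - p[i] for i in range(n))

    def phi(S, x):
        return prod((x[i] - mu[i]) / sigma[i] for i in S)

    inputs = list(product((-1, 1), repeat=n))
    return {
        S: sum(weight(x) * f(x) * phi(S, x) for x in inputs)
        for d in range(n + 1)
        for S in combinations(range(n), d)
    }

def and2(x):
    return 1 if x == (1, 1) else -1

def parity2(x):
    return x[0] * x[1]

and2_uniform = fourier_coefficients(and2, [0.5, 0.5])
parity2_uniform = fourier_coefficients(parity2, [0.5, 0.5])
and2_biased = fourier_coefficients(and2, [0.3, 0.8])
parseval_biased = sum(c * c for c in and2_biased.values())
```

For uniform inputs, `and2_uniform` reproduces the polynomial $-\tfrac12 + \tfrac12 x_1 + \tfrac12 x_2 + \tfrac12 x_1 x_2$, and `parseval_biased` equals one up to rounding, as Parseval's identity requires for any product distribution.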

Influence and average sensitivity
Next, we discuss measures of perturbations and their relation to the Fourier spectrum. We start with a measure of the perturbation of a single input.

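Definition 1 ([16]). Define the influence of variable $i$ on the function $f$ as
\[ I_i(f) \triangleq P[f(\mathbf{X}) \neq f(\mathbf{X} \oplus e_i)], \]
where $\mathbf{x} \oplus e_i$ is the vector obtained from $\mathbf{x}$ by flipping its $i$th entry.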
By definition, the influence of variable $i$ is the probability that perturbing, i.e., flipping, input $i$ changes the function's output. Influence can be viewed as the capability of input $i$ to change the output of $f$. In BNs, usually, the sum of all influences, i.e., the average sensitivity, is studied.
Definition 2. The average sensitivity of $f$ to the variables in the set $A$ is defined as
\[ I_A(f) \triangleq \sum_{i \in A} I_i(f). \]
The average sensitivity of $f$ is defined as $\mathrm{as}(f) \triangleq I_{\{1,\ldots,n\}}(f)$.
Up to the factor $1/|A|$, $I_A(f)$ is the probability that flipping an input chosen uniformly at random from $A$ changes the function's output. Most commonly, all inputs are taken into account, i.e., the average sensitivity $\mathrm{as}(f)$ is studied. As an example, $\mathrm{as}(f_{\mathrm{PARITY2}}) = 2$ and $\mathrm{as}(f_{\mathrm{AND2}}) = 1$ for uniformly distributed inputs; hence, PARITY2 is more sensitive to single perturbations than AND2. Influence and average sensitivity have the following convenient expressions in terms of Fourier coefficients.
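Influence and average sensitivity can be evaluated directly from their definitions by enumeration. The sketch below uses illustrative helper names and 0-based indices; it reproduces the values quoted above for PARITY2 and AND2 under uniform inputs.

```python
from itertools import product
from math import isclose, prod

def influence(f, i, p):
    """P[f(X) != f(X with entry i flipped)] for P[X_k = +1] = p[k]."""
    n = len(p)
    total = 0.0
    for x in product((-1, 1), repeat=n):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(n))
    return total

def average_sensitivity(f, p, A=None):
    """Sum of the influences of the variables in A (all variables by default)."""
    A = range(len(p)) if A is None else A
    return sum(influence(f, i, p) for i in A)

def and2(x):
    return 1 if x == (1, 1) else -1

def parity2(x):
    return x[0] * x[1]

uniform = [0.5, 0.5]
as_parity2 = average_sensitivity(parity2, uniform)
as_and2 = average_sensitivity(and2, uniform)
```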

Proposition 2. For any Boolean function $f$,
\[ I_i(f) = \frac{1}{\sigma_i^2} \sum_{S \subseteq [n] :\, i \in S} \hat{f}(S)^2, \]
\[ I_A(f) = \sum_{S \subseteq [n]} \hat{f}(S)^2 \sum_{i \in S \cap A} \frac{1}{\sigma_i^2}. \tag{5} \]
Proposition 2 follows directly from Proposition 1 and the definition of $I_A(f)$. From (5), we see that $\mathrm{as}(f)$ is large if the Fourier weight is concentrated on the coefficients of high degree $d = |S|$, i.e., if $\sum_{S : |S| \geq d} \hat{f}(S)^2$ is large (i.e., close to one). For this case, Parseval's identity implies that the $\hat{f}(S)^2$ with $|S| < d$ must be small. As an example, suppose $p_1 = p_2 = p_3 = 1/2$ and consider the AND3 function, i.e., $f_{\mathrm{AND3}}(x_1, x_2, x_3) = 1$ if and only if $x_1 = x_2 = x_3 = 1$. $f_{\mathrm{AND3}}$ is tolerant to perturbations since $\mathrm{as}(f_{\mathrm{AND3}}) = 0.75$, and as Figure 1 shows, its spectrum is concentrated on the coefficients of low degree. In contrast, $\mathrm{as}(f_{\mathrm{PARITY3}}) = 3$ for the PARITY3 function $f_{\mathrm{PARITY3}}(\mathbf{x}) = x_1 x_2 x_3$. Hence, PARITY3 is maximally sensitive to perturbations. Figure 1 shows that its spectrum is maximally concentrated on the coefficient of highest degree.
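The AND3 and PARITY3 examples can be reproduced from the spectrum alone. The sketch below is restricted to uniformly distributed inputs, where $\sigma_i = 1$ and the average sensitivity reduces to the standard identity $\mathrm{as}(f) = \sum_S |S|\, \hat{f}(S)^2$; the helper names are illustrative.

```python
from itertools import combinations, product
from math import isclose, prod

def uniform_spectrum(f, n):
    """Fourier coefficients E[f(X) * prod_{i in S} X_i] for uniform X."""
    inputs = list(product((-1, 1), repeat=n))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in inputs) / len(inputs)
        for d in range(n + 1)
        for S in combinations(range(n), d)
    }

def weight_by_degree(spectrum, n):
    """Total squared Fourier weight on each degree d = |S|."""
    weights = [0.0] * (n + 1)
    for S, c in spectrum.items():
        weights[len(S)] += c * c
    return weights

def spectral_average_sensitivity(spectrum):
    """as(f) = sum_S |S| * f_hat(S)^2, valid for uniform inputs."""
    return sum(len(S) * c * c for S, c in spectrum.items())

def and3(x):
    return 1 if x == (1, 1, 1) else -1

def parity3(x):
    return x[0] * x[1] * x[2]

and3_weights = weight_by_degree(uniform_spectrum(and3, 3), 3)
parity3_weights = weight_by_degree(uniform_spectrum(parity3, 3), 3)
as_and3 = spectral_average_sensitivity(uniform_spectrum(and3, 3))
as_parity3 = spectral_average_sensitivity(uniform_spectrum(parity3, 3))
```

The list `and3_weights` puts more than half of the weight on degree zero, whereas `parity3_weights` puts all weight on degree three, which matches the qualitative description of the two spectra.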
According to (5), $\mathrm{as}(f)$ is small only if the Fourier weight is concentrated on the coefficients of low degree. This is the case either if $f$ is strongly biased (i.e., if $f(\mathbf{x}) = a$ for most inputs $\mathbf{x}$, where $a \in \{-1, 1\}$ is a constant) or if $f$ depends on few variables only. This is in accordance with the results of Kauffman [1]; he found that a random BN operates in the ordered regime if the functions in the network depend on average on few variables.
We will state our results for measures of single perturbations. However, these results also apply to other noise models, specifically to the noise sensitivity of $f$. This is because the noise sensitivity of $f$ is small if $f$ is tolerant to single perturbations. The noise sensitivity of a Boolean function is defined as the probability that the function's output changes if each input is flipped independently with probability $\epsilon$. For uniformly distributed $\mathbf{X}$, $\epsilon\, \mathrm{as}(f)$ is an upper bound for the noise sensitivity; for small values of $\epsilon$, $\epsilon\, \mathrm{as}(f)$ approximates the noise sensitivity well. For the $X_i$ being equally but possibly nonuniformly distributed and a slightly different noise model, it was found in [18] that an analogous upper bound on the noise sensitivity in terms of $\mathrm{as}(f)$ still holds. This result was generalized to product distributed $\mathbf{X}$ in [19].
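The relation between noise sensitivity and average sensitivity can be illustrated for uniform inputs. The sketch below computes the noise sensitivity exactly by enumerating all inputs and all flip patterns, where `eps` is the flip probability per input, and compares it with `eps` times the average sensitivity, the standard bound for the uniform case. Function names are illustrative.

```python
from itertools import product
from math import isclose

def noise_sensitivity(f, n, eps):
    """P[f(X) != f(X')] for uniform X, where X' flips each entry of X w.p. eps."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        for flips in product((False, True), repeat=n):
            y = tuple(-xi if fl else xi for xi, fl in zip(x, flips))
            if f(x) != f(y):
                k = sum(flips)  # number of flipped entries
                total += eps ** k * (1 - eps) ** (n - k)
    return total / 2 ** n

def average_sensitivity_uniform(f, n):
    """Sum over i of the fraction of inputs where flipping entry i changes f."""
    count = 0
    for x in product((-1, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            count += f(x) != f(flipped)
    return count / 2 ** n

def and3(x):
    return 1 if x == (1, 1, 1) else -1

def parity3(x):
    return x[0] * x[1] * x[2]

eps = 0.1
ns_and3 = noise_sensitivity(and3, 3, eps)
ns_parity3 = noise_sensitivity(parity3, 3, eps)
bound_and3 = eps * average_sensitivity_uniform(and3, 3)
bound_parity3 = eps * average_sensitivity_uniform(parity3, 3)
```

For both functions the exact noise sensitivity stays below the bound, and the gap is much smaller for the perturbation-tolerant AND3 than for PARITY3.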

Mutual information and uncertainty
In this section, we study the determinative power of a subset of variables $\mathbf{X}_A$, where $\mathbf{X}_A$ consists of the entries of $\mathbf{X}$ corresponding to the indices in the set $A \subseteq [n]$, over the function's output $f(\mathbf{X})$. As a measure of determinative power, we take the mutual information $MI(f(\mathbf{X}); \mathbf{X}_A)$ between $f(\mathbf{X})$ and $\mathbf{X}_A$, since it quantifies the statistical dependence between the random variables $\mathbf{X}_A$ and $f(\mathbf{X})$. Hence, this section is devoted to the study of $MI(f(\mathbf{X}); \mathbf{X}_A)$. Before giving a formal definition of mutual information, let us start with an example. Consider the PARITY2 function and let its inputs $X_1, X_2$ be uniformly distributed. Intuitively, if $X_1$ has determinative power, knowledge about $X_1$ should provide us with information about $f_{\mathrm{PARITY2}}(\mathbf{X})$. Suppose we know the value of $X_1$, say $X_1 = 1$. Then $f_{\mathrm{PARITY2}}(\mathbf{X}) = X_2$, which is still equal to $1$ or $-1$ with probability $1/2$ each. Hence, knowledge of $X_1$ does not help to predict the value of $f_{\mathrm{PARITY2}}(\mathbf{X})$. Therefore, $X_1$ has no determinative power over $f_{\mathrm{PARITY2}}(\mathbf{X})$. We indeed have $MI(f_{\mathrm{PARITY2}}(\mathbf{X}); X_1) = 0$.
We next define mutual information. Mutual information is the reduction of uncertainty of a random variable Y due to the knowledge of X; therefore, we need to define a measure of uncertainty first, which is entropy. As a reference for the following definitions, see [20].

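Definition 3. The entropy of a discrete random variable $X$ with alphabet $\mathcal{X}$ is defined as $H(X) \triangleq -\sum_{x \in \mathcal{X}} P[X = x] \log_2 P[X = x]$. For a binary random variable $X$ with $p \triangleq P[X = 1]$, the entropy is a function of $p$ only and is given by the binary entropy function
\[ h(p) \triangleq -p \log_2 p - (1-p) \log_2 (1-p). \tag{6} \]
Definition 4. The conditional entropy of $Y$ given $X$ is defined as $H(Y|X) \triangleq \sum_{x \in \mathcal{X}} P[X = x]\, H(Y|X = x)$.
Definition 5. The mutual information of $Y$ and $X$ is defined as $MI(Y; X) \triangleq H(Y) - H(Y|X)$.
The properties of mutual information are what we intuitively expect from a measure of determinative power: If knowledge of $X_i$ reduces the uncertainty of $f(\mathbf{X})$, then $X_i$ determines the state of $f(\mathbf{X})$ to some extent, because then, knowledge about the state of $X_i$ helps in predicting $f(\mathbf{X})$. Furthermore, we require from a measure of determinative power that not all variables can have large determinative power simultaneously. This is guaranteed for mutual information as
\[ \sum_{i=1}^{n} MI(f(\mathbf{X}); X_i) \leq MI(f(\mathbf{X}); \mathbf{X}) \leq 1, \tag{7} \]
which follows from the chain rule of mutual information (as a reference, see [20]) and independence of the $X_i$. Hence, if $MI(f(\mathbf{X}); X_i)$ is large, i.e., close to 1, we can be sure that $X_i$ has determinative power over $f(\mathbf{X})$, since (7) implies that $MI(f(\mathbf{X}); X_j)$ for $j \neq i$ must be small then.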
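For small functions, the mutual information between the output and a set of inputs can be computed directly from the definitions $MI(Y;X) = H(Y) - H(Y|X)$. The following sketch does so by enumeration (illustrative names, 0-based indices) and checks the PARITY2 example as well as the property that the single-variable mutual informations cannot all be large at once.

```python
from itertools import product
from math import isclose, log2, prod

def h(q):
    """Binary entropy in bits; arguments at or beyond the endpoints give 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def mutual_information(f, A, p):
    """MI(f(X); X_A) = H(f(X)) - H(f(X) | X_A) by exhaustive enumeration."""
    n = len(p)
    cells = {}  # x_A -> [P[X_A = x_A], P[X_A = x_A, f(X) = 1]]
    for x in product((-1, 1), repeat=n):
        w = prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(n))
        cell = cells.setdefault(tuple(x[i] for i in A), [0.0, 0.0])
        cell[0] += w
        if f(x) == 1:
            cell[1] += w
    p_one = sum(c[1] for c in cells.values())
    cond = sum(c[0] * h(c[1] / c[0]) for c in cells.values() if c[0] > 0)
    return h(p_one) - cond

def and2(x):
    return 1 if x == (1, 1) else -1

def parity2(x):
    return x[0] * x[1]

uniform = [0.5, 0.5]
mi_parity_single = mutual_information(parity2, (0,), uniform)
mi_parity_all = mutual_information(parity2, (0, 1), uniform)
mi_and_singles = [mutual_information(and2, (i,), uniform) for i in range(2)]
mi_and_all = mutual_information(and2, (0, 1), uniform)
```

For PARITY2, each single input carries no information while both inputs together carry one bit; for AND2, the two single-variable values sum to less than the joint value, which in turn is at most one bit.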

Mutual information and the Fourier spectrum
In order to study determinative power, its relation to measures of perturbations, and statistical dependencies, we start by characterizing the mutual information in terms of Fourier coefficients. Our results are based on the following novel characterization of entropy in terms of Fourier coefficients.
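Theorem 1. Let $f$ be a Boolean function, $\mathbf{X}$ be product distributed, and $A \subseteq [n]$ be fixed. Then,
\[ H(f(\mathbf{X}) | \mathbf{X}_A) = E\left[ h\left( \tfrac{1}{2}\Big( 1 + \sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(\mathbf{X}) \Big) \right) \right], \]
where $h(\cdot)$ is the binary entropy function as defined in (6).
Proof. See Appendix 2. For the special case of uniformly distributed $\mathbf{X}$, a proof appears in [21], in the context of designing S-boxes.
Using the definition of mutual information and $H(f(\mathbf{X})) = h\big(\tfrac{1}{2}(1 + \hat{f}(\emptyset))\big)$, an immediate corollary of Theorem 1 is the following:
Corollary 1. Under the assumptions of Theorem 1,
\[ MI(f(\mathbf{X}); \mathbf{X}_A) = h\left( \tfrac{1}{2}\big( 1 + \hat{f}(\emptyset) \big) \right) - E\left[ h\left( \tfrac{1}{2}\Big( 1 + \sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(\mathbf{X}) \Big) \right) \right]. \tag{8} \]
Theorem 1 (and Corollary 1) shows that the conditional entropy $H(f(\mathbf{X})|\mathbf{X}_A)$ and the mutual information $MI(f(\mathbf{X}); \mathbf{X}_A)$ are functions of the coefficients $\{\hat{f}(S) : S \subseteq A\}$ only. This already hints at a fundamental difference to the average sensitivity, since the average sensitivity depends on the coefficients $\{\hat{f}(S) : |S \cap A| > 0\}$, according to (5).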
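A numerical sanity check of the Fourier characterization of conditional entropy is sketched below. It assumes the identity in the form $H(f(\mathbf{X})|\mathbf{X}_A) = E\big[h\big(\tfrac12(1 + \sum_{S \subseteq A} \hat f(S) \Phi_S(\mathbf{X}))\big)\big]$, which rests on $P[f(\mathbf{X}) = 1 \,|\, \mathbf{X}_A] = \tfrac12(1 + E[f(\mathbf{X}) | \mathbf{X}_A])$, and compares it with the conditional entropy computed from the definition. The majority function and the input probabilities are arbitrary illustrative choices.

```python
from itertools import combinations, product
from math import isclose, log2, prod, sqrt

def h(q):
    """Binary entropy in bits; arguments at or beyond the endpoints give 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def weight(x, p):
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def conditional_entropy_direct(f, A, p):
    """H(f(X) | X_A) from the definition, grouping inputs by their value on A."""
    cells = {}
    for x in product((-1, 1), repeat=len(p)):
        cell = cells.setdefault(tuple(x[i] for i in A), [0.0, 0.0])
        cell[0] += weight(x, p)
        if f(x) == 1:
            cell[1] += weight(x, p)
    return sum(c[0] * h(c[1] / c[0]) for c in cells.values() if c[0] > 0)

def conditional_entropy_fourier(f, A, p):
    """Same quantity, using only the coefficients f_hat(S) with S inside A."""
    n = len(p)
    mu = [2 * q - 1 for q in p]
    sigma = [sqrt(1 - m * m) for m in mu]

    def phi(S, x):
        return prod((x[i] - mu[i]) / sigma[i] for i in S)

    inputs = list(product((-1, 1), repeat=n))
    subsets = [S for d in range(len(A) + 1) for S in combinations(A, d)]
    coeff = {S: sum(weight(x, p) * f(x) * phi(S, x) for x in inputs) for S in subsets}
    return sum(
        weight(x, p) * h((1 + sum(coeff[S] * phi(S, x) for S in subsets)) / 2)
        for x in inputs
    )

def maj3(x):
    return 1 if sum(x) > 0 else -1

p = [0.3, 0.5, 0.8]  # arbitrary product distribution
direct = conditional_entropy_direct(maj3, (0, 2), p)
spectral = conditional_entropy_fourier(maj3, (0, 2), p)
```

Both routes agree up to rounding, and the second one touches only the four coefficients indexed by subsets of $A$.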
We next discuss $MI(f(\mathbf{X}); X_i)$ based on (8). First, note that $MI(f(\mathbf{X}); X_i)$ has previously been studied under the notion of information gain as a measure of 'goodness' for split variables in greedy tree learners [22] and also under the notion of informativeness to quantify voting power [23]. According to (8), the mutual information $MI(f(\mathbf{X}); X_i)$ depends on $f$ only through $\hat{f}(\{i\})$ and $\hat{f}(\emptyset)$; Figure 2 shows $MI(f(\mathbf{X}); X_i)$ as a function of $\hat{f}(\{i\})$ and $\hat{f}(\emptyset)$.
It can be seen that $MI(f(\mathbf{X}); X_i) = 0$, i.e., $f(\mathbf{X})$ and $X_i$ are statistically independent, if and only if $\hat{f}(\{i\}) = 0$. Moreover, $MI(f(\mathbf{X}); X_i)$ is convex in $\hat{f}(\{i\})$. This can be proven by taking the second derivative of (8) with respect to $\hat{f}(\{i\})$ and observing that it is larger than zero for all pairs of values $(\hat{f}(\emptyset), \hat{f}(\{i\}))$ for which $MI(f(\mathbf{X}); X_i)$ is defined.
For general sets $A$, Theorem 2 (proof in Appendix 3) bounds $H(f(\mathbf{X})|\mathbf{X}_A)$ in terms of the Fourier weight $\sum_{S \subseteq A} \hat{f}(S)^2$. It shows that $H(f(\mathbf{X})|\mathbf{X}_A)$ is small if the Fourier weight is concentrated on the variables in the set $A$, i.e., if $\sum_{S \subseteq A} \hat{f}(S)^2$ is close to one. In contrast, as mentioned previously, for $I_A(f)$, it is relevant whether the Fourier weight is concentrated on the coefficients with high degree.

Relation to measures of perturbation
Mutual information and average sensitivity are related as follows.

Theorem 3. For any Boolean function $f$, for any product distributed $\mathbf{X}$,
\[ I_A(f) \geq MI(f(\mathbf{X}); \mathbf{X}_A) - \Psi(\mathrm{Var}\, f(\mathbf{X})), \]
where $\Psi(\cdot)$ is a nonnegative correction term that depends only on the variance of $f(\mathbf{X})$.
Proof. See Appendix 4.
Note that the term $\Psi(\mathrm{Var}\, f(\mathbf{X}))$ is close to zero. Specifically, for any $f(\mathbf{X})$ we have $0 \leq \Psi(\mathrm{Var}\, f(\mathbf{X})) < 0.12$, and for settings of interest, $\Psi(\mathrm{Var}\, f(\mathbf{X}))$ is very close to zero, as explained in more detail in the following. Theorem 3 shows that if $MI(f(\mathbf{X}); \mathbf{X}_A)$ is large (i.e., close to one), $f$ must be sensitive to perturbations of the entries of $\mathbf{X}_A$. Moreover, if $I_A(f)$ is small (i.e., if $f$ is tolerant to perturbations of the entries of $\mathbf{X}_A$), then $MI(f(\mathbf{X}); \mathbf{X}_A)$ must be small (i.e., the entries of $\mathbf{X}_A$ do not have determinative power). For the case that $A = [n]$, Theorem 3 states that the average sensitivity $\mathrm{as}(f)$ is lower-bounded by $MI(f(\mathbf{X}); \mathbf{X})$ minus some small term.
We next discuss the special case that $A = \{i\}$. Theorem 3 evaluated for $A = \{i\}$ yields a lower bound on the influence of a variable in terms of the mutual information of that variable, namely
\[ I_i(f) \geq MI(f(\mathbf{X}); X_i) - \Psi(\mathrm{Var}\, f(\mathbf{X})). \tag{11} \]
Again, $\Psi(\mathrm{Var}\, f(\mathbf{X}))$ is close to zero for settings of interest, as the following argument explains. Inequality (11) is not of interest for small $\mathrm{Var}\, f(\mathbf{X})$, since then $f(\mathbf{X})$ is close to a constant function (i.e., close to $f(\mathbf{X}) = 1$ or $f(\mathbf{X}) = -1$), and $I_i(f)$ and $MI(f(\mathbf{X}); X_i)$ must both be small (i.e., close to zero) anyway. Hence, (11) is of interest when $\mathrm{Var}\, f(\mathbf{X})$ is large, i.e., close to 1; for this case, the term $\Psi(\mathrm{Var}\, f(\mathbf{X}))$ is small (e.g., for $\mathrm{Var}\, f(\mathbf{X}) > 0.8$, $\Psi(\mathrm{Var}\, f(\mathbf{X})) < 0.05$). Observe that, according to (11), if $MI(f(\mathbf{X}); X_i)$ is large, then $I_i(f)$ is also large. This proves the intuitive idea that if an input determines $f(\mathbf{X})$ to some extent, then $f$ must be sensitive to perturbations of this input. Conversely, as mentioned previously, an input $i$ can have large influence and still $MI(f(\mathbf{X}); X_i) = 0$. For example, for the PARITY2 function, we have $I_i(f) = 1$ and $MI(f(\mathbf{X}); X_i) = 0$.
Interestingly, the influence also has an information theoretic interpretation. The following theorem generalizes Theorem 1 in [23].

Theorem 4. For any Boolean function $f$, for any product distributed $\mathbf{X}$,
\[ I_i(f) = \frac{H\big(f(\mathbf{X}) \,\big|\, \mathbf{X}_{[n] \setminus \{i\}}\big)}{h(p_i)}. \]
Proof. See Appendix 5. For uniformly distributed $\mathbf{X}$, a proof appears in [23].
Theorem 4 shows that the influence of a variable is a measure of the uncertainty of the function's output that remains if all variables except variable $i$ are set.
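The information theoretic interpretation of the influence can be made concrete as follows. Once all inputs except $X_i$ are fixed, $f$ is either constant or a copy (possibly negated) of $X_i$, so the remaining entropy is $h(p_i)$ on exactly those assignments where flipping $X_i$ matters. The sketch below checks the resulting relation $H(f(\mathbf{X}) | \mathbf{X}_{[n] \setminus \{i\}}) = I_i(f)\, h(p_i)$ numerically; names, the example function, and the input probabilities are illustrative assumptions.

```python
from itertools import product
from math import isclose, log2, prod

def h(q):
    """Binary entropy in bits; arguments at or beyond the endpoints give 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def weight(x, p):
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def influence(f, i, p):
    """P[f(X) != f(X with entry i flipped)]."""
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += weight(x, p)
    return total

def entropy_given_all_but(f, i, p):
    """H(f(X) | all inputs except X_i), computed from the definition."""
    cells = {}
    for x in product((-1, 1), repeat=len(p)):
        cell = cells.setdefault(x[:i] + x[i + 1:], [0.0, 0.0])
        cell[0] += weight(x, p)
        if f(x) == 1:
            cell[1] += weight(x, p)
    return sum(c[0] * h(c[1] / c[0]) for c in cells.values() if c[0] > 0)

def maj3(x):
    return 1 if sum(x) > 0 else -1

p = [0.3, 0.5, 0.8]  # arbitrary product distribution
lhs = [entropy_given_all_but(maj3, i, p) for i in range(3)]
rhs = [influence(maj3, i, p) * h(p[i]) for i in range(3)]
```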

Statistical independence of inputs to a Boolean function
Next, we characterize statistical independence of $f(\mathbf{X})$ and a set of its arguments $\mathbf{X}_A$ in terms of Fourier coefficients. This result generalizes a theorem derived by Xiao and Massey [11] from uniform to product distributed $\mathbf{X}$.

Theorem 5. Let $A \subseteq [n]$ be fixed, $f$ be a Boolean function, and $\mathbf{X}$ be product distributed. Then, $f(\mathbf{X})$ and the inputs $\mathbf{X}_A$ are statistically independent if and only if $\hat{f}(S) = 0$ for all nonempty $S \subseteq A$.
Proof. See Appendix 6. For uniformly distributed $\mathbf{X}$, i.e., $P[X_i = 1] = 1/2$ for all $i \in [n]$, Theorem 5 has been derived by Xiao and Massey [11]. Note that the proof provided here is also conceptually different from the proof for the uniform case in [11], as it does not rely on the Xiao-Massey lemma.
Theorem 5 shows that a function and small sets of its inputs are statistically independent if the spectrum is concentrated on the coefficients of high degree $d = |S|$. The most prominent example is the parity function of $n$ variables, i.e., $f_{\mathrm{PARITYN}}(\mathbf{x}) = x_1 x_2 \cdots x_n$: For uniformly distributed $\mathbf{X}$, each subset of $n - 1$ or fewer arguments and $f_{\mathrm{PARITYN}}(\mathbf{X})$ are statistically independent. Conversely, if a function is concentrated on the coefficients of low degree $d = |S|$, which is the case for functions that are tolerant to perturbations, then small sets of inputs and the function's output are statistically dependent.
Theorem 5 also has an important implication for algorithms that detect functional dependencies in a BN based on estimating the mutual information from observations of the network's states, such as the algorithm presented in [12]. Theorem 5 characterizes the classes of functions for which such an algorithm may succeed and for which it will fail. Moreover, Theorem 5 shows that in a Boolean model of a genetic regulatory network, a functional dependency between a gene and a regulator cannot be detected based on statistical dependence of a regulator $X_i$ and a gene's state $f_j(\mathbf{X})$, unless the regulatory functions are restricted to those for which $|\hat{f}(\{i\})| > 0$ holds for each relevant input $i$.
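The role of the low-order coefficients for dependency detection can be illustrated numerically. The sketch below computes the largest $|\hat f(S)|$ over nonempty $S \subseteq A$ and the mutual information $MI(f(\mathbf{X}); \mathbf{X}_A)$ for PARITY3, once under uniform inputs and once under a hypothetical biased distribution chosen only for illustration. In the first case both quantities vanish; in the second case both are positive.

```python
from itertools import combinations, product
from math import isclose, log2, prod, sqrt

def h(q):
    """Binary entropy in bits; arguments at or beyond the endpoints give 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def weight(x, p):
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def max_low_order_coefficient(f, A, p):
    """Largest |f_hat(S)| over all nonempty S contained in A."""
    mu = [2 * q - 1 for q in p]
    sigma = [sqrt(1 - m * m) for m in mu]
    inputs = list(product((-1, 1), repeat=len(p)))
    best = 0.0
    for d in range(1, len(A) + 1):
        for S in combinations(A, d):
            c = sum(
                weight(x, p) * f(x) * prod((x[i] - mu[i]) / sigma[i] for i in S)
                for x in inputs
            )
            best = max(best, abs(c))
    return best

def mutual_information(f, A, p):
    """MI(f(X); X_A) by exhaustive enumeration."""
    cells = {}
    for x in product((-1, 1), repeat=len(p)):
        cell = cells.setdefault(tuple(x[i] for i in A), [0.0, 0.0])
        cell[0] += weight(x, p)
        if f(x) == 1:
            cell[1] += weight(x, p)
    p_one = sum(c[1] for c in cells.values())
    return h(p_one) - sum(c[0] * h(c[1] / c[0]) for c in cells.values() if c[0] > 0)

def parity3(x):
    return x[0] * x[1] * x[2]

A = (0, 1)
uniform = [0.5, 0.5, 0.5]
biased = [0.5, 0.5, 0.8]  # hypothetical: only the third input is biased
coef_uniform = max_low_order_coefficient(parity3, A, uniform)
mi_uniform = mutual_information(parity3, A, uniform)
coef_biased = max_low_order_coefficient(parity3, A, biased)
mi_biased = mutual_information(parity3, A, biased)
```

An estimator of mutual information applied to observations of the first two inputs would therefore find no dependency in the uniform case, although both inputs are relevant.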

Unate functions
In this section, we discuss unate, i.e., locally monotone functions.

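Definition 6. A Boolean function $f$ is said to be unate in $x_i$ if for each $\mathbf{x} = (x_1, \ldots, x_n) \in \{-1,+1\}^n$ and for some fixed $a_i \in \{-1,+1\}$,
\[ f(x_1, \ldots, x_i = -a_i, \ldots, x_n) \leq f(x_1, \ldots, x_i = a_i, \ldots, x_n) \]
holds. $f$ is said to be unate if $f$ is unate in each variable $x_i$, $i \in [n]$.
Each linear threshold function and nested canalizing function is unate. Moreover, most, if not all, regulatory interactions in a biological network are considered to be unate. That can be deduced from [13,24], and the basic argument is the following: If an element acts either as a repressor or an activator for some gene, but never as both (which is a reasonable assumption for regulatory interactions [13,24]), then the function determining the gene's state is unate by definition. For unate functions, the following property holds:
Proposition 3. Let $f$ be a Boolean function that is unate in $x_i$. Then,
\[ \hat{f}(\{i\}) = a_i\, \sigma_i\, I_i(f), \tag{12} \]
where $a_i \in \{-1,+1\}$ is the parameter in Definition 6.
Proof. Goes along the same lines as the proof for monotone functions in Lemma 4.5 of [17].
Note that conversely, if (12) holds for each $x_i$, $i \in [n]$, $f$ is not necessarily unate. Inserting (12) into (8) yields
\[ MI(f(\mathbf{X}); X_i) = h\left( \tfrac{1}{2}\big(1 + \hat{f}(\emptyset)\big) \right) - E\left[ h\left( \tfrac{1}{2}\Big(1 + \hat{f}(\emptyset) + a_i\, \sigma_i\, I_i(f)\, \Phi_{\{i\}}(\mathbf{X}) \Big) \right) \right], \tag{13} \]
where the expectation in (13) is with respect to the distribution of $X_i$.
Theorem 6. Let $f$ be a unate Boolean function. Then, $MI(f(\mathbf{X}); X_i) > 0$ for each relevant variable $x_i$.
In a Boolean model of a biological regulatory network, this implies that if the functions in the network are unate, then a regulator and the target gene must be statistically dependent.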
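Unateness and the relation between the first-order coefficient and the influence can be checked by enumeration. The sketch below tests whether a function is unate in $x_i$ and, if so, returns the direction $a_i$; it then verifies the identity $\hat f(\{i\}) = a_i \sigma_i I_i(f)$ for a hypothetical linear threshold function with one negative weight. The identity is stated here as an assumption of the check; it follows from $E[f(\mathbf{X})(X_i - \mu_i)/\sigma_i \,|\, \text{other inputs}] = \sigma_i \cdot \tfrac12 (f|_{x_i = 1} - f|_{x_i = -1})$.

```python
from itertools import product
from math import isclose, prod, sqrt

def unate_direction(f, i, n):
    """Return a_i in {-1, +1} if f is unate in x_i (0-based), else None."""
    signs = set()
    for x in product((-1, 1), repeat=n):
        if x[i] == 1:
            low = x[:i] + (-1,) + x[i + 1:]
            signs.add(f(x) - f(low))  # one of -2, 0, 2
    if 2 in signs and -2 in signs:
        return None                   # f increases and decreases in x_i
    return -1 if -2 in signs else 1   # irrelevant variables are reported as +1

def weight(x, p):
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def influence(f, i, p):
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += weight(x, p)
    return total

def first_order_coefficient(f, i, p):
    """f_hat({i}) = E[f(X) (X_i - mu_i) / sigma_i]."""
    mu = 2 * p[i] - 1
    sigma = sqrt(1 - mu * mu)
    return sum(
        weight(x, p) * f(x) * (x[i] - mu) / sigma
        for x in product((-1, 1), repeat=len(p))
    )

def threshold(x):
    # Hypothetical threshold function: inputs 0 and 2 activate, input 1 represses.
    return 1 if x[0] - x[1] + x[2] > 0 else -1

def parity2(x):
    return x[0] * x[1]

p = [0.3, 0.6, 0.8]  # arbitrary product distribution
directions = [unate_direction(threshold, i, 3) for i in range(3)]
lhs = [first_order_coefficient(threshold, i, p) for i in range(3)]
rhs = [directions[i] * sqrt(1 - (2 * p[i] - 1) ** 2) * influence(threshold, i, p)
       for i in range(3)]
parity_direction = unate_direction(parity2, 0, 2)
```

For the threshold function, `directions` reflects the signs of the weights, whereas PARITY2 is not unate and `parity_direction` is `None`.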

E. coli regulatory network
In [6], the authors presented a complex computational model of the E. coli transcriptional regulatory network that controls central parts of the E. coli metabolism. The network consists of 798 nodes and 1160 edges. Of the nodes, 636 represent genes and of the remaining 162 nodes, most (103) are external metabolites. The rest are stimuli, and others are state variables such as internal metabolites. The network has a layered feed-forward structure, i.e., no feedback loops exist. The elements in the first layer can be viewed as the inputs of the system, and the elements in the following seven layers are interacting genes that represent the internal state of the system. Our experiments revealed that all functions are unate; therefore, the properties derived in Section 5 apply. Note that all functions being unate is a special property of the network, since if functions are chosen uniformly at random, it is unlikely to sample a unate function, in particular if the number of inputs n is large.

Determinative nodes in the E. coli network
We first identify the input nodes that have large determinative power (we will define what that means in a network setting shortly) and then show that a small number thereof reduces the uncertainty of the network's state significantly. Specifically, we show that, on average, the entropy of the nodes' states conditioned on a small set of determinative input nodes is small.
To put this result into perspective, we perform the same experiment for random networks with the same and different topology as the E. coli network. We denote by X = {X 1 , ..., X n }, n = 145 the set of inputs of the feed forward network and assume that the X i are independent and uniformly distributed. The remaining variables are denoted by Y = {Y 1 , ..., Y m }, m = 653 and are a function of the inputs and the network's states, i.e., Y i = f i (X, Y). For our analysis, the distributions of the random variables Y 1 , ..., Y m need to be computed, since some of those variables are arguments to other functions. This can be circumvented by defining a collapsed network, i.e., a network where each state of a node is given as a function of the input nodes only, i.e., Y i = f i (X). The collapsed network is obtained by consecutively inserting functions into each other, until each function only depends on states of nodes in the input layer, i.e., on X. The collapsed network reveals the dependencies of each node on the input variables. Interestingly, in the collapsed network, it is seen that the variables chol_xt>0, salicylate, 2ddglcn_xt>0, mnnh>0, altrh>0, and his-l_xt>0 (here, and in the following, we adopt the names from the original dataset), which appear to be inputs when considering the original E. coli network, turn out to be not. Consider, for example, the node salicylate. The only node dependent on salicylate is mara = (( NOT arca OR NOT fnr) OR oxyr OR salicylate). However, arca = (fnr AND NOT oxyr), and it is easily seen that mara simplifies to mara = 1.
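The collapsing step can be sketched on a toy layered network. In the sketch below, every node and rule name is hypothetical and chosen only for illustration; the rules use Python booleans in place of the AND/OR/NOT expressions of a rule-based model. A node is collapsed by recursively evaluating its parents until only input nodes remain, which yields its truth table over the inputs. The relevant inputs are then read off the table.

```python
from itertools import product

# Hypothetical layered network: node -> (parents, rule on the parents' states).
rules = {
    "g1": (("in_a", "in_b"), lambda a, b: a and not b),
    "g2": (("g1", "in_c"), lambda g1, c: g1 or c),
    "g3": (("g1", "g2"), lambda g1, g2: (not g1) or g2),
}
inputs = ("in_a", "in_b", "in_c")

def evaluate(node, assignment, rules):
    """State of `node` given an assignment of the input nodes."""
    if node not in rules:  # an input node
        return assignment[node]
    parents, rule = rules[node]
    return rule(*(evaluate(q, assignment, rules) for q in parents))

def collapse(node, rules, inputs):
    """Truth table of `node` as a function of the input nodes only."""
    return {
        bits: evaluate(node, dict(zip(inputs, bits)), rules)
        for bits in product((False, True), repeat=len(inputs))
    }

def relevant_inputs(table, inputs):
    """Inputs whose flip changes the collapsed function for some assignment."""
    relevant = []
    for k, name in enumerate(inputs):
        for bits in table:
            flipped = bits[:k] + (not bits[k],) + bits[k + 1:]
            if table[bits] != table[flipped]:
                relevant.append(name)
                break
    return relevant

relevant = {node: relevant_inputs(collapse(node, rules, inputs), inputs) for node in rules}
```

In this toy network, `g3` has two parents but collapses to a constant, so none of the inputs is relevant for it. This is the kind of effect by which apparent dependencies on an input can disappear in a collapsed network.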
Next, we identify the determinative nodes. As argued in Section 4, $MI(f_i(\mathbf{X}); X_j)$ is a measure of the determinative power of $X_j$ over $Y_i = f_i(\mathbf{X})$. This motivates the definition of the determinative power of input $X_j$ over the states in the network as
\[ D(j) \triangleq \sum_{i=1}^{m} MI(f_i(\mathbf{X}); X_j). \]
Note that a small value of $D(j)$ implies that $X_j$ alone does not have large determinative power over the network's states, but $X_j$ may have large determinative power over the network states in conjunction with other variables. In principle, $\sum_{i=1}^{m} MI(f_i(\mathbf{X}); X_j, X_k)$ can be large for some $j, k \in [n]$, even though $D(j)$ and $D(k)$ are equal to zero. This is, however, not possible in the E. coli network since the functions are unate. Specifically, $MI(f_i(\mathbf{X}); X_j, X_k) \neq 0$ implies that $x_j$ or $x_k$ is a relevant variable of $f_i$, and according to Theorem 6, $MI(f_i(\mathbf{X}); X_j) \neq 0$ or $MI(f_i(\mathbf{X}); X_k) \neq 0$. We computed $D(j)$ for each input variable and found that $D(j)$ is large just for some inputs, such as o2_xt (37 bit), leu-l_xt (20.9 bit), glc-d_xt (19.3 bit), and glcn_xt>0 (17 bit), but is small for most other variables. Partly, this can be explained by the out-degree (i.e., the number of outgoing edges of a node) distribution of the input nodes. However, having a large out-degree does not necessarily result in large values of $D(j)$. In fact, in the E. coli network, glc-d_xt, glcn_xt>0, and o2_xt have 99, 93, and 73 outgoing edges, respectively, yet D(glc-d_xt) = 19.3 bit and D(glcn_xt>0) = 17 bit, whereas D(o2_xt) = 37 bit.
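The determinative power of an input, taken here as the sum of $MI(f_i(\mathbf{X}); X_j)$ over all nodes $i$ of a collapsed network, can be computed as sketched below. The three-node collapsed network is a hypothetical toy example with uniform inputs; it deliberately contains a parity node to show how pairs of inputs can be jointly informative although each has zero mutual information with that node.

```python
from itertools import product
from math import isclose, log2, prod

def h(q):
    """Binary entropy in bits; arguments at or beyond the endpoints give 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def mutual_information(f, A, p):
    """MI(f(X); X_A) by exhaustive enumeration."""
    cells = {}
    for x in product((-1, 1), repeat=len(p)):
        w = prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))
        cell = cells.setdefault(tuple(x[i] for i in A), [0.0, 0.0])
        cell[0] += w
        if f(x) == 1:
            cell[1] += w
    p_one = sum(c[1] for c in cells.values())
    return h(p_one) - sum(c[0] * h(c[1] / c[0]) for c in cells.values() if c[0] > 0)

# Hypothetical collapsed network: every node is a function of the inputs only.
functions = [
    lambda x: x[0],                                  # copies input 0
    lambda x: 1 if x[0] == 1 and x[1] == 1 else -1,  # AND of inputs 0 and 1
    lambda x: x[1] * x[2],                           # parity of inputs 1 and 2
]
p = [0.5, 0.5, 0.5]

# D[j]: summed single-input mutual information of input j over all nodes.
D = [sum(mutual_information(f, (j,), p) for f in functions) for j in range(len(p))]
order = sorted(range(len(p)), key=lambda j: -D[j])  # most determinative first
pair_mi_parity = mutual_information(functions[2], (1, 2), p)
```

Input 2 obtains `D[2] == 0` although inputs 1 and 2 together determine the parity node completely, which cannot happen when all functions are unate.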
Denote by $\tau$ a permutation on $[n]$ such that $D(\tau(1)) \geq D(\tau(2)) \geq \ldots \geq D(\tau(n))$, i.e., $\tau$ orders the input nodes in descending order of their determinative power. We next consider $H(\mathbf{Y} | X_{\tau(1)}, \ldots, X_{\tau(l)})$ as a function of $l$ to see whether knowledge of a small set of input nodes reduces the entropy of the overall network state significantly. $H(\mathbf{Y} | X_{\tau(1)}, \ldots, X_{\tau(l)})$ has an interesting interpretation which arises as a consequence of the so-called asymptotic equipartition property [20] (as discussed in greater detail in [25]): Consider a sequence $\mathbf{y}_1, \ldots, \mathbf{y}_k$ of $k$ samples of the random variable $\mathbf{Y}$. For $\epsilon > 0$ and $k$ sufficiently large, there exists a set $A_\epsilon^{(k)}$ of typical sequences $\mathbf{y}_1, \ldots, \mathbf{y}_k$, such that
\[ |A_\epsilon^{(k)}| \leq 2^{k(H(\mathbf{Y}) + \epsilon)} \quad \text{and} \quad P\big[(\mathbf{Y}_1, \ldots, \mathbf{Y}_k) \in A_\epsilon^{(k)}\big] > 1 - \epsilon, \]
where $|A_\epsilon^{(k)}|$ denotes the cardinality of the set $A_\epsilon^{(k)}$. This shows that the sequences obtained as samples of $\mathbf{Y}$ are likely to fall in a set of size determined by the uncertainty of $\mathbf{Y}$. Since the output layer consists of 653 nodes, the network's state space has maximal size $2^{653}$. Since $\mathbf{Y}$ is a function of $\mathbf{X}$, $H(\mathbf{Y}) \leq H(\mathbf{X}) = 145$ bit, where for the last equality, we assume uniformly distributed inputs. Thus, without knowing the state of any input variable, the network's state is likely to be in a set of size roughly $2^{145}$. Given the knowledge about the states $X_{\tau(1)}, \ldots, X_{\tau(l)}$, the state of the network is likely to be in a set of size roughly $2^{H(\mathbf{Y} | X_{\tau(1)}, \ldots, X_{\tau(l)})}$. For a large network, however, $H(\mathbf{Y} | X_{\tau(1)}, \ldots, X_{\tau(l)})$ is expensive to compute, as by definition, with $A = \{\tau(1), \ldots, \tau(l)\}$,
\[ H(\mathbf{Y} | \mathbf{X}_A) = -\sum_{\mathbf{x}_A} \sum_{\mathbf{y}} P[\mathbf{X}_A = \mathbf{x}_A, \mathbf{Y} = \mathbf{y}] \log_2 P[\mathbf{Y} = \mathbf{y} | \mathbf{X}_A = \mathbf{x}_A]. \tag{14} \]
Hence, the number of terms in the sum is exponential in $n$ and $|A|$. An estimate of (14) can be obtained by sampling uniformly at random over $\mathbf{x}_A$ and $\mathbf{y}$. Instead, we will consider the following upper bound, which is inexpensive to compute:
\[ H(\mathbf{Y} | X_{\tau(1)}, \ldots, X_{\tau(l)}) \leq \sum_{i=1}^{m} H(Y_i | X_{\tau(1)}, \ldots, X_{\tau(l)}) \triangleq A(l). \]
The bound above follows from the chain rule for entropy [20]. $H(Y_i | X_{\tau(1)}, \ldots, X_{\tau(l)})$ is inexpensive to compute, since $Y_i$ depends on few variables only (in the E. coli network, on $\leq 8$). For the E. coli network, $A(l)$ is depicted in Figure 3 as a function of $l$. Figure 3 shows that knowledge of the states of the most determinative nodes reduces the uncertainty about the network's states significantly. In fact, the upper bound $A(l)$ is loose; hence, we even expect $H(\mathbf{Y} | X_{\tau(1)}, \ldots, X_{\tau(l)})$ to lie significantly below $A(l)$. Also, note that when $A(l)$ is small, $H(Y_i | X_{\tau(1)}, \ldots, X_{\tau(l)})$ must be small on average; hence, $P[Y_i = 1 | X_{\tau(1)}, \ldots, X_{\tau(l)}]$ is close to one or zero on average. To put $A(l)$ for the E. coli network in Figure 3 into perspective, we compute $A(l)$ for random networks. First, we took the E. coli network and exchanged each function with one chosen uniformly at random from the set of all Boolean functions of corresponding in-degree. We also exchanged each function with one chosen uniformly at random from all unate functions. We repeated this experiment on the topology of the original E. coli network for 25 choices of random and random unate functions, respectively. The mean of $A(l)$, along with one standard deviation from the mean (dashed lines), is plotted in Figure 3 for random and random unate functions. It is seen that fewer inputs determine the output of the original E. coli network, compared to its random counterparts. For example, to obtain $A(l) = 50$, about twice as many inputs need to be known if the functions in the E. coli network are exchanged for functions chosen uniformly at random.
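The upper bound on the conditional entropy of the network state, i.e., the sum of the per-node conditional entropies given the $l$ most determinative inputs, can be sketched on a toy collapsed network. The network below is hypothetical, uses uniform inputs, and consists of unate functions only; the inputs are ranked by their summed single-input mutual information.

```python
from itertools import product
from math import isclose, log2, prod

def h(q):
    """Binary entropy in bits; arguments at or beyond the endpoints give 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def conditional_entropy(f, A, p):
    """H(f(X) | X_A); with A = () this is the entropy H(f(X))."""
    cells = {}
    for x in product((-1, 1), repeat=len(p)):
        w = prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))
        cell = cells.setdefault(tuple(x[i] for i in A), [0.0, 0.0])
        cell[0] += w
        if f(x) == 1:
            cell[1] += w
    return sum(c[0] * h(c[1] / c[0]) for c in cells.values() if c[0] > 0)

# Hypothetical collapsed network with unate node functions.
functions = [
    lambda x: x[0],                                    # copies input 0
    lambda x: 1 if x[0] == 1 and x[1] == 1 else -1,    # AND of inputs 0 and 1
    lambda x: -1 if x[1] == -1 and x[2] == -1 else 1,  # OR of inputs 1 and 2
]
p = [0.5, 0.5, 0.5]
n = len(p)

# Rank the inputs by their summed single-input mutual information.
D = [
    sum(conditional_entropy(f, (), p) - conditional_entropy(f, (j,), p) for f in functions)
    for j in range(n)
]
order = sorted(range(n), key=lambda j: -D[j])

# bound[l]: sum of per-node entropies given the l most determinative inputs.
bound = [
    sum(conditional_entropy(f, tuple(order[:l]), p) for f in functions)
    for l in range(n + 1)
]
```

The list `bound` starts at the sum of the unconditional node entropies, is non-increasing in $l$, and reaches zero once all inputs are known.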
Next, we generated random feed-forward networks with $m = 653$ outputs and $n = 145$ inputs, each input with out-degree 8, i.e., the average out-degree of the inputs in the collapsed E. coli network. Again, we computed $A(l)$ for 25 choices of random and random unate functions, respectively. The mean and one standard deviation from the mean are depicted in Figure 3. The results show that, as expected, for a random feed-forward network, there seems to be no small set of inputs that determines the outputs.

Tolerance to perturbations
Finally, we discuss the average sensitivity of individual functions in the E. coli network. In Section 3, we found that the average sensitivity is small if the Fourier spectrum is concentrated on the coefficients of low degree. This appears to be the case for functions that are highly biased and for functions that depend on few variables only. Figure 4 shows pairs of values $(\mathrm{as}(f), \Pr[f(X) = 1])$ for each function in the E. coli network, again assuming that the $X_i$ are independent and uniformly distributed. We can see from Figure 4 that the average sensitivity of all functions is close to the lower bound on the average sensitivity. Note that the functions with high in-degree $K$ (i.e., number of relevant input variables), which could have average sensitivity up to $K$, also have small average sensitivity, because those functions are highly biased. We can therefore conclude that the functions have small average sensitivity either because they depend on few variables only or because they are highly biased. For other input distributions, i.e., other values of $p = P[X_i = 1]$ for all $i \in [n]$, we obtained the same results.
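The pairs $(\mathrm{as}(f), \Pr[f(X) = 1])$ can be computed by direct enumeration for functions of small in-degree. The sketch below assumes independent, uniformly distributed inputs and takes the average sensitivity to be the expected number of inputs whose flip changes the function value. The helper names and the two example functions are hypothetical.

```python
import itertools

def average_sensitivity(f, k):
    """Expected number of single-bit flips that change f, inputs uniform."""
    total = 0
    for x in itertools.product((0, 1), repeat=k):
        for i in range(k):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            total += f(x) != f(flipped)
    return total / 2 ** k

def bias(f, k):
    """Pr[f(X) = 1] under uniform inputs."""
    return sum(f(x) for x in itertools.product((0, 1), repeat=k)) / 2 ** k

# Two illustrative functions of in-degree 3.
parity3 = lambda x: x[0] ^ x[1] ^ x[2]  # unbiased, maximally sensitive
and3 = lambda x: x[0] & x[1] & x[2]     # highly biased, low sensitivity

pairs = {
    "parity3": (average_sensitivity(parity3, 3), bias(parity3, 3)),
    "and3": (average_sensitivity(and3, 3), bias(and3, 3)),
}
```

The unbiased parity function attains the maximal average sensitivity $K = 3$, whereas the highly biased AND function has average sensitivity 0.75 despite having the same in-degree, which mirrors the observation about biased functions of high in-degree.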

Conclusion
In a Boolean network, tolerance to perturbations, determinative power, and statistical dependencies between nodes are properties of single functions in a probabilistic setting. Hence, we analyzed single functions with product-distributed arguments. We used Fourier analysis of Boolean functions to study the mutual information between a function $f(X)$ and a set of its inputs $X_A$, as a measure of determinative power of $X_A$ over $f(X)$. We related the mutual information to the Fourier spectrum and proved that the mutual information lower bounds the influence, a measure of perturbation. We also gave necessary and sufficient conditions for statistical independence of $f(X)$ and $X_A$. For the class of unate functions, which are particularly interesting for biological networks, we found that mutual information and influence are directly related (not just via an inequality). We also found that $\mathrm{MI}(f(X); X_i) > 0$ for each relevant input $i$, which, as an application, implies that in a unate regulatory network, a gene and its regulator must be statistically dependent.
As an application of our results, we analyzed the large-scale regulatory network of E. coli. We identified the most determinative input nodes in the network and found that it is sufficient to know only a small subset of those in order to reduce the uncertainty of the overall network state significantly. This, in turn, significantly reduces the size of the state space in which the network is likely to be found. A possible direction for future work is to provide an analysis similar to that of the E. coli regulatory network for other Boolean models of biological networks and see whether conclusions similar to those in Section 6 can be reached. One of the main assumptions in our work is the independence among the input variables of the network. It would be interesting to provide methods that can be used beyond this setup. However, deriving such results is challenging because, for dependent inputs, the basis functions do not factorize as in (3), and many results cited and derived in this paper make use of this particular form of the basis functions. In this paper, we focused on generic properties of information-processing networks that may help identify possible principles that underlie biological networks. Assessing our findings from a biological perspective would be an interesting next step.

Appendix 1 Lemma 1
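For the proofs of Theorems 1 and 5, we will need the following lemma:
Lemma 1. Let $f$ be a Boolean function, let $X$ be product distributed, and let $A \subseteq [n]$ and some fixed $x_A \in \{-1, +1\}^{|A|}$ be given. Then,
$$E[f(X) \mid X_A = x_A] = \sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(x_A). \quad (15)$$
Proof. Inserting the Fourier expansion of $f(X)$ given by (3) in the left-hand side of (15) and utilizing the linearity of conditional expectation yields
$$E[f(X) \mid X_A = x_A] = \sum_{S \subseteq [n]} \hat{f}(S)\, E[\Phi_S(X) \mid X_A = x_A].$$
For $S \subseteq A$,
$$E[\Phi_S(X) \mid X_A = x_A] = \Phi_S(x_A).$$
Conversely, for $S \not\subseteq A$,
$$E[\Phi_S(X) \mid X_A = x_A] = 0.$$
To see this, assume without loss of generality that $S = A \cup \{j\}$ and $j \notin A$. Using the decomposition property of the basis function as given in Section 2.3 and the independence of $X_j$ and $X_A$,
$$E[\Phi_{A \cup \{j\}}(X) \mid X_A = x_A] = \Phi_A(x_A)\, E[\Phi_{\{j\}}(X_j)],$$
which is equal to zero as $E[\Phi_{\{j\}}(X_j)] = 0$.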
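Lemma 1 can be checked numerically on a small example. The sketch below assumes the usual orthonormal basis for product distributions, $\Phi_S(x) = \prod_{i \in S} (x_i - \mu_i)/\sigma_i$ with $\mu_i = E[X_i]$ and $\sigma_i^2 = 1 - \mu_i^2$. Equation (3) is not reproduced in this section, so this form is an assumption, as are all function names, the example function `majority`, and the probabilities `p`.

```python
import itertools
import math

def prob(x, p):
    """P[X = x] for x in {-1,+1}^n with p[i] = P[X_i = +1]."""
    return math.prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))

def phi(S, x, p):
    """Assumed basis function: product of standardized coordinates in S."""
    out = 1.0
    for i in S:
        mu = 2 * p[i] - 1
        sigma = math.sqrt(1 - mu * mu)
        out *= (x[i] - mu) / sigma
    return out

def fourier_coefficient(f, S, p):
    """E[f(X) * Phi_S(X)] by enumeration."""
    cube = itertools.product((-1, 1), repeat=len(p))
    return sum(prob(x, p) * f(x) * phi(S, x, p) for x in cube)

def conditional_expectation(f, A, x_A, p):
    """Left-hand side of the lemma: E[f(X) | X_A = x_A] by enumeration."""
    num = den = 0.0
    for x in itertools.product((-1, 1), repeat=len(p)):
        if all(x[i] == v for i, v in zip(A, x_A)):
            w = prob(x, p)
            num += w * f(x)
            den += w
    return num / den

def truncated_expansion(f, A, x_A, p):
    """Right-hand side: sum over S subset of A of f_hat(S) * Phi_S(x_A)."""
    x = [1] * len(p)  # coordinates outside A are never read for S in A
    for i, v in zip(A, x_A):
        x[i] = v
    total = 0.0
    for r in range(len(A) + 1):
        for S in itertools.combinations(A, r):
            total += fourier_coefficient(f, S, p) * phi(S, x, p)
    return total

majority = lambda x: 1 if sum(x) > 0 else -1  # example function on 3 inputs
p = [0.3, 0.5, 0.8]                           # hypothetical input biases
A = [0, 2]

max_gap = max(
    abs(conditional_expectation(majority, A, x_A, p)
        - truncated_expansion(majority, A, x_A, p))
    for x_A in itertools.product((-1, 1), repeat=len(A))
)
```

Under these assumptions, `max_gap` is zero up to floating-point error for every choice of $x_A$, as the lemma predicts.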

Appendix 2 Proof of Theorem 1
First, where (16) follows from an application of Lemma 1. By definition of the conditional entropy, where h(·) is the binary entropy function as defined in (6).
To obtain (17), we used (16). The expectation in (17) is with respect to the distribution of X A . Inserting q(X A ) as given by (16) in (18) concludes the proof.

Appendix 3 Proof of Theorem 2
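First, note that with $q(\cdot)$ as defined in (16), we have
$$E[q(X_A)(1 - q(X_A))] = \frac{1}{4}\left(1 - E\left[\Big(\sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(X_A)\Big)^2\right]\right) = \frac{1}{4}\left(1 - \sum_{S \subseteq A} \hat{f}(S)^2\right), \quad (19)$$
where (19) follows from the orthogonality of the basis functions.
We start with proving the lower bound in Theorem 2. Applying the lower bound on the binary entropy function $h(p) \geq 4p(1 - p)$, given in Theorem 1.2 of [26], on (18) yields
$$H(f(X) \mid X_A) \geq 4\, E[q(X_A)(1 - q(X_A))],$$
and the lower bound in Theorem 2 follows using (19).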
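The lower bound can be illustrated numerically. Applying $h(p) \geq 4p(1 - p)$ to the conditional probability of $f(X) = 1$ given $X_A$ and using the orthonormality of the basis gives $H(f(X) \mid X_A) \geq 1 - \sum_{S \subseteq A} \hat{f}(S)^2$. The sketch below checks this inequality in the special case of uniform inputs, where the basis functions reduce to products of the $\pm 1$ coordinates. The function names and the majority example are hypothetical.

```python
import itertools
import math

def binary_entropy(q):
    """Binary entropy h(q) in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def entropy_and_lower_bound(f, n, A):
    """Return (H(f(X) | X_A), 1 - sum_{S subset of A} f_hat(S)^2), X uniform."""
    cube = list(itertools.product((-1, 1), repeat=n))
    # Fourier weight on subsets of A (uniform case: Phi_S(x) = prod_{i in S} x_i).
    weight = 0.0
    for r in range(len(A) + 1):
        for S in itertools.combinations(A, r):
            coeff = sum(f(x) * math.prod(x[i] for i in S) for x in cube) / len(cube)
            weight += coeff ** 2
    # Conditional entropy as the average of h(P[f = 1 | X_A = x_A]).
    h = 0.0
    for x_A in itertools.product((-1, 1), repeat=len(A)):
        matching = [x for x in cube if all(x[i] == v for i, v in zip(A, x_A))]
        q = sum(f(x) == 1 for x in matching) / len(matching)
        h += binary_entropy(q) / 2 ** len(A)
    return h, 1.0 - weight

majority = lambda x: 1 if sum(x) > 0 else -1  # example function on 3 inputs

h_one, bound_one = entropy_and_lower_bound(majority, 3, [0])
h_none, bound_none = entropy_and_lower_bound(majority, 3, [])
h_all, bound_all = entropy_and_lower_bound(majority, 3, [0, 1, 2])
```

For the majority function with one known input, the bound evaluates to 0.75 bits while the conditional entropy is $h(3/4) \approx 0.81$ bits. The bound is tight when no input or all inputs are known.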

Appendix 4 Proof of Theorem 3
According to Proposition 2, Next, we rewrite the lower bound on H(f (X)|X A ) given by Theorem 2 as By adding H(f (X)) − H(f (X)) on the right-hand side of (23) and using the definition of mutual information, (23)  with (·) as defined in (10). Finally, Theorem 3 follows by combining (22) and (25).

Appendix 5 Proof of Theorem 4
For notational convenience, let A =[ n] \{i}. By definition of the conditional entropy, where h(·) is the binary entropy function as defined in (6).