Harmonic analysis of Boolean networks: determinative power and perturbations

Heckel, Reinhard; Schober, Steffen; Bossert, Martin

doi:10.1186/1687-4153-2013-6

Research
Open access
Published: 04 May 2013

Reinhard Heckel¹,
Steffen Schober² &
Martin Bossert²

EURASIP Journal on Bioinformatics and Systems Biology volume 2013, Article number: 6 (2013)

Abstract

Consider a large Boolean network with a feed forward structure. Given a probability distribution on the inputs, can one find, possibly small, collections of input nodes that determine the states of most other nodes in the network? To answer this question, a notion that quantifies the determinative power of an input over the states of the nodes in the network is needed. We argue that the mutual information (MI) between a given subset of the inputs X={X₁,...,X_n} of some node i and its associated function f_i(X) quantifies the determinative power of this set of inputs over node i. We compare the determinative power of a set of inputs to the sensitivity to perturbations to these inputs, and find that, maybe surprisingly, an input that has large sensitivity to perturbations does not necessarily have large determinative power. However, for unate functions, which play an important role in genetic regulatory networks, we find a direct relation between MI and sensitivity to perturbations. As an application of our results, we analyze the large-scale regulatory network of Escherichia coli. We identify the most determinative nodes and show that a small subset of those reduces the overall uncertainty of the network state significantly. Furthermore, the network is found to be tolerant to perturbations of its inputs.

1 Introduction

A Boolean network (BN) is a discrete dynamical system, which is, for example, used to study and model a variety of biochemical networks such as genetic regulatory networks. BNs were introduced in the late 1960s by Kauffman [1, 2], who proposed to study random BNs as models of gene regulatory networks. Kauffman investigated their dynamical behavior and a phenomenon called self-organization. Aside from their original purpose, BNs were also used to model (small-scale) genetic regulatory networks; for example, in [3–5], it was demonstrated that BNs are capable of reproducing the underlying biological processes (i.e., the cell cycle) well. BNs are also used to model large-scale networks, such as the Escherichia coli regulatory network [6] which is analyzed in Section 6. This network is, in contrast to Kauffman’s automata and the regulatory networks considered in [3–5], not an autonomous system, since the genes’ states are determined by external factors.

In the literature addressing the analysis of BNs, it is common to consider measures that quantify the effect of perturbations. Whether a random BN operates in the so-called ordered or disordered regime is determined by whether a single perturbation, i.e., flipping the state of a node, is expected to spread or die out eventually. Kauffman [2] argues that biological networks must operate at the border of the ordered and disordered regime; hence, they must be tolerant to perturbations to some extent.

In contrast to measures of perturbations, determinative power in BNs has not received much attention, even though there are several settings where such a notion is of interest. For example, given a feed forward network where the states of the nodes are controlled by the states of nodes in the input layer, we might ask whether a possibly small set of inputs suffices to determine most states, i.e., reduces the uncertainty about the network’s states significantly. This can be addressed by quantifying the determinative power of the input nodes. For example, in the E. coli regulatory network, it turns out that a small set of metabolites and other inputs determine most genes that account for E. coli’s metabolism (see Section 6).

In this paper, we view the state of each node in the network as an independent random variable. This modeling assumption applies for networks with a tree-like topology, e.g., a feed forward network, and is often applied when studying the effect of perturbations. For this setting, determinative power of nodes and perturbation-related measures are properties of single functions; hence, the analysis of the BN reduces to the analysis of single functions. Our main tool for the theoretical results is Fourier analysis of Boolean functions. Fourier analytic techniques were first applied to BNs by Kesseli et al. [7, 8]. In [7, 8], results related to Derrida plots and convergence of trajectories in random BNs were derived. Ribeiro et al. [9] considered the pairwise mutual information in time series of random BNs, under a different setup than the one we use. Specifically, in [9], the functions are random; whereas here, the functions are deterministic, but the argument is random. Finally, note that part of this paper was presented at the 2012 International Workshop on Computational Systems Biology [10].

1.1 Contributions

Mutual information between a set of inputs to a node and the state of this node is a measure of the determinative power of this set of inputs, as mutual information quantifies mutual dependence of random variables. In order to understand the determinative power and mutual dependencies in Boolean networks, we systematically study the mutual information of sets of inputs and the state of a node. We relate mutual information to a measure of perturbations and prove that (maybe surprisingly) a set of inputs that is highly sensitive to perturbations might not necessarily have determinative power. Conversely, a set of inputs which has determinative power must be sensitive to perturbations. To prove those results, we show that the concentration of weight in the Fourier domain on certain sets of inputs characterizes a function in terms of tolerance to perturbations and determinative power of input nodes. Furthermore, we generalize a result by Xiao and Massey [11], which gives a necessary and sufficient condition for statistical independence of a set of inputs and a function’s output in terms of the Fourier coefficients. This result can for instance be applied to decide for which classes of functions the algorithm presented in [12], which detects functional dependencies based on estimating mutual information, can succeed or fail. For unate functions, we show that any input and the function’s output are statistically dependent and provide a direct relation between the mutual information and the influence of a variable. The class of unate functions is especially relevant for biological networks, as it includes all linear threshold functions and all nested canalizing functions, and describes functional dependencies in gene regulatory networks well [13]. As an application of the theoretical results in this paper, we show that mutual information can be used to identify the determinative nodes in the large-scale model of the control network of E. coli’s metabolism [6].

1.2 Outline

The paper is organized as follows. Boolean networks and Fourier analysis of Boolean functions are reviewed in Section 2. In Section 3, the influence and average sensitivity as measures of perturbations are reviewed, and their relation to the Fourier spectrum is discussed. In Section 4, we study the mutual information of sets of inputs and the function’s output. Section 5 is devoted to unate functions. Section 6 contains an analysis of the large-scale E. coli regulatory network, using the tools and ideas developed in previous sections.

2 Preliminaries

We start with a short introduction to Boolean networks and Fourier analysis of Boolean functions, and introduce notation.

2.1 Boolean networks

A (synchronous) BN can be viewed as a collection of n nodes with memory. The state of a node i is described by a binary state x_i(t)∈{−1,+1} at discrete time $t \in ℕ$. Choosing the alphabet to be {−1,+1} rather than {0,1}, as is more common in the literature on BNs, will turn out to be advantageous later. However, both choices are equivalent. The state of the network at time t can be described by the vector x(t)= [x₁(t),...,x_n(t)]∈{−1,+1}ⁿ. The network dynamic is defined by

x_{i} (t + 1) = f_{i} (x (t)),

(1)

where f_i:{−1,+1}ⁿ→{−1,+1} is the Boolean function associated with node i. At time t=0, an initial state x(0)=x₀ is chosen. In general, not all arguments x₁,...,x_n of a function f_i(X) need to be relevant. The variable x_j,j∈{1,...,n} is said to be relevant for f_i if there exists at least one x∈{−1,+1}ⁿ, such that changing x_j to −x_j changes the function’s value. In most of the BN models in biology, the functions depend on a small subset of their arguments only. Furthermore, not every state must have a function associated with it; states can also be external inputs to the network.

To study the determinative power and tolerance to perturbations, a probabilistic setup is needed. In our analysis, we assume that each state is an independent random variable X_i with distribution P [X_i=x_i], x_i∈{−1,+1}. The assumption of independence holds for networks with tree-like topology, but is not feasible for networks with strong local dependencies and feedback loops. However, in many relevant settings, a BN has a tree-like topology, for instance the E. coli network analyzed in Section 6. For a network with few local dependencies, assuming independence will lead to a small modeling error. Major results concerning the analysis of BNs have been obtained under the assumptions as stated above, e.g., the annealed approximation [14], an important result on the spread of perturbations in random BNs. Several important results on random BNs, e.g., [14], let the network size n tend to infinity; hence, there are no local dependencies.

2.2 Notation

We use [n] for the set {1,2,...,n}, and all sets are subsets of [n]. With $\sum_{S \subseteq A} (\cdot)$ , we mean the sum over all sets S that are subsets of A. Throughout this paper, we use capital letters for random variables, e.g., X, and lower case letters for their realizations, e.g., x. Boldface letters denote vectors, e.g., X is a random vector, and x its realization. For a vector x and a set A⊆[n], x_A denotes the subvector of x corresponding to the entries indexed by A.

2.3 Fourier analysis of Boolean functions

In the following, we give a short introduction to Fourier analysis of Boolean functions. Let X=(X₁,...,X_n) be a binary, product distributed random vector, i.e., the entries of X are independent random variables X_i,i∈ [n] with distribution P [X_i=x_i],x_i∈{−1,+1}. Throughout this paper, probabilities P [·] and expectations $E [\cdot]$ are with respect to the distribution of X. We denote $p_{i} ≜ P [X_{i} = 1]$ , the variance of X_i by Var(X_i), its standard deviation by $σ_{i} ≜ \sqrt{Var (X_{i})}$ and finally $μ_{i} ≜ E [X_{i}]$ . The inner product of $f, g : \{- 1, + 1\}^{n} \to \{- 1, + 1\}$ with respect to the distribution of X is defined as

〈 f, g 〉 ≜ E [f (X) g (X)] = \sum_{x \in \{- 1, + 1\}^{n}} P [X = x] f (x) g (x)

(2)

which induces the norm $∥ f ∥ = \sqrt{〈 f, f 〉}$ . An orthonormal basis with respect to the distribution of X is

Φ_{S} (x) = \prod_{i \in S} \frac{x_{i} - μ_{i}}{σ_{i}}, S \subseteq [n], S \neq \emptyset

and

Φ_{S} (x) = 1, S = ∅.

This basis was first proposed by Bahadur [15]. Thus, each Boolean function f:{−1,+1}ⁿ→{−1,+1} can be uniquely expressed as

f (x) = \sum_{S \subseteq [n]} \hat{f} (S) Φ_{S} (x),

(3)

where $\hat{f} (S) ≜ 〈 f, Φ_{S} 〉$ are the Fourier coefficients of f. Note that (3) is a representation of f as a multilinear polynomial. As an example, consider the AND2 function defined as f_AND2(x)=1 if and only if x₁=x₂=1, and let p₁=p₂=1/2. According to (3)

f_{AND2} (x) = - \frac{1}{2} + \frac{1}{2} x_{1} + \frac{1}{2} x_{2} + \frac{1}{2} x_{1} x_{2} .

As a second example, consider PARITY2, i.e., the XOR function, defined as f_PARITY2(x)=1 if x₁=x₂=1 or if x₁=x₂=−1, and f_PARITY2(x)=−1 for all other choices of x. Written as a polynomial, f_PARITY2(x)=x₁x₂. We conclude this section by listing properties of the basis functions which are used frequently throughout this paper.

Decomposition: Let A⊆ [n] and S⊂A, and denote $\bar{S} = A ∖ S$ . Then,

Φ_{A} (x) = Φ_{S} (x) Φ_{\bar{S}} (x) .

Orthonormality: For A,B⊆ [n],

E [Φ_{A} (X) Φ_{B} (X)] = \begin{cases} 1, & \text{if } A = B \\ 0, & \text{otherwise.} \end{cases}

Parseval’s identity: For f:{−1,+1}ⁿ→{−1,+1},

E [f {(X)}^{2}] = {∥ f ∥}^{2} = \sum_{S \subseteq [n]} \hat{f} {(S)}^{2} = 1 .
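The expansion (3) and Parseval’s identity can be checked numerically for small n. The following is a minimal brute-force sketch, not part of the original analysis; the helper names prob, phi, and fourier are illustrative assumptions, and variables are indexed from 0 rather than 1.

```python
from itertools import combinations, product
from math import prod, sqrt


def prob(x, p):
    """P[X = x] for independent entries with P[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def phi(S, x, p):
    """Basis function Phi_S(x); the empty product gives Phi_{} = 1."""
    value = 1.0
    for i in S:
        mu = 2 * p[i] - 1          # E[X_i]
        sigma = sqrt(1 - mu * mu)  # standard deviation of X_i
        value *= (x[i] - mu) / sigma
    return value


def fourier(f, p):
    """All coefficients f_hat(S) = <f, Phi_S>, keyed by S as a sorted tuple."""
    n = len(p)
    points = list(product((-1, 1), repeat=n))
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: sum(prob(x, p) * f(x) * phi(S, x, p) for x in points)
            for S in subsets}


def f_and2(x):
    return 1 if x[0] == 1 and x[1] == 1 else -1


# Uniform inputs: reproduces the AND2 polynomial given above.
coeffs_uniform = fourier(f_and2, (0.5, 0.5))
# Biased product distribution: the squared coefficients still sum to one.
coeffs_biased = fourier(f_and2, (0.3, 0.8))
parseval_biased = sum(c * c for c in coeffs_biased.values())
```

For uniform inputs, the four coefficients are −1/2, 1/2, 1/2, and 1/2, in agreement with the polynomial for AND2. The enumeration is exponential in n and is only meant for illustration.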

3 Influence and average sensitivity

Next, we discuss measures of perturbations and their relation to the Fourier spectrum. We start with a measure of the perturbation of a single input.

Definition 1 ([16])

Define the influence of variable i on the function f as

I_{i} (f) = P [f (X) \neq f (X \oplus e_{i})],

where x⊕e_i is the vector obtained from x by flipping its i th entry.

By definition, the influence of variable i is the probability that perturbing, i.e., flipping, input i changes the function’s output. Influence can be viewed as the capability of input i to change the output of f. In BNs, usually, the sum of all influences, i.e., the average sensitivity, is studied.

Definition 2

The average sensitivity of f to the variables in the set A is defined as

I_{A} (f) ≜ \sum_{i \in A} I_{i} (f) .

The average sensitivity of f is defined as $as (f) ≜ I_{\{1, ..., n\}} (f)$ .

I_A(f) captures whether flipping an input chosen uniformly at random from A affects the function’s output. Most commonly, all inputs are taken into account, i.e., the average sensitivity as(f) is studied. As an example, as(f_PARITY2)=2 and as(f_AND2)=1; hence, PARITY2 is more sensitive to single perturbations than AND2. Influence and average sensitivity have the following convenient expressions in terms of Fourier coefficients.

Proposition 1 (Lemma 4.1 of [17])

For any Boolean function f,

I_{i} (f) = \frac{1}{σ_{i}^{2}} \sum_{S \subseteq [n] : i \in S} \hat{f} {(S)}^{2} .

(4)

Proposition 2

For any Boolean function f,

I_{A} (f) = \sum_{S \subseteq [n]} \hat{f} {(S)}^{2} \sum_{i \in S \cap A} \frac{1}{σ_{i}^{2}} .

(5)

Proposition 2 follows directly from Proposition 1 and the definition of I_A(f). From (5), we see that as(f) is large if the Fourier weight is concentrated on the coefficients of high degree d=|S|, i.e., if $\sum_{S : | S | \geq d} \hat{f} {(S)}^{2}$ is large (i.e., close to one). For this case, Parseval’s identity implies that the $\hat{f} {(S)}^{2}$ with |S|<d must be small. As an example, suppose p₁=p₂=p₃=1/2 and consider the AND3 function, i.e., f_AND3(x₁,x₂,x₃)=1 if and only if x₁=x₂=x₃=1. f_AND3 is tolerant to perturbations since as(f_AND3)=0.75, and as Figure 1 shows, its spectrum is concentrated on the coefficients of low degree. In contrast, for $f_{PARITY 3} (x_{1}, x_{2}, x_{3}) ≜ x_{1} x_{2} x_{3}$ , as(f_PARITY3)=3. Hence, PARITY3 is maximally sensitive to perturbations. Figure 1 shows that its spectrum is maximally concentrated on the coefficient of highest degree.
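The influence can be computed either from Definition 1 or from the Fourier expression (4). The sketch below, with illustrative helper names that are not from the paper, does both by exhaustive enumeration and evaluates the average sensitivity of AND3 and PARITY3; variables are indexed from 0.

```python
from itertools import combinations, product
from math import prod, sqrt


def prob(x, p):
    """P[X = x] for independent entries with P[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def phi(S, x, p):
    """Basis function Phi_S(x) for the product distribution given by p."""
    value = 1.0
    for i in S:
        mu = 2 * p[i] - 1
        value *= (x[i] - mu) / sqrt(1 - mu * mu)
    return value


def fourier(f, p):
    """Fourier coefficients f_hat(S), keyed by S as a sorted tuple."""
    n = len(p)
    points = list(product((-1, 1), repeat=n))
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: sum(prob(x, p) * f(x) * phi(S, x, p) for x in points)
            for S in subsets}


def influence(f, p, i):
    """Definition 1: probability that flipping entry i changes f."""
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += prob(x, p)
    return total


def influence_fourier(f, p, i):
    """Equation (4): weight on sets containing i, divided by sigma_i^2."""
    sigma2 = 1 - (2 * p[i] - 1) ** 2
    return sum(c * c for S, c in fourier(f, p).items() if i in S) / sigma2


def f_and3(x):
    return 1 if all(xi == 1 for xi in x) else -1


def f_parity3(x):
    return x[0] * x[1] * x[2]


uniform = (0.5, 0.5, 0.5)
as_and3 = sum(influence(f_and3, uniform, i) for i in range(3))
as_parity3 = sum(influence(f_parity3, uniform, i) for i in range(3))

# Under a biased product distribution both computations should still agree.
biased = (0.3, 0.6, 0.8)
max_gap = max(abs(influence(f_and3, biased, i) - influence_fourier(f_and3, biased, i))
              for i in range(3))
```

With uniform inputs, as_and3 evaluates to 0.75 and as_parity3 to 3, the values quoted in the text.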

According to (5), as(f) is small only if the Fourier weight is concentrated on the coefficients of low degree. This is the case either if f is strongly biased (i.e., if f(x)=a, for most inputs x, where a∈{−1,1} is a constant) or if f depends on few variables only. This is in accordance with the results of Kauffman [1]; he found that a random BN operates in the ordered regime if the functions in the network depend on average on few variables.

We will state our results for measures of single perturbations. However, these results also apply to other noise models, specifically to the noise sensitivity of f. This is because the noise sensitivity of f is small if f is tolerant to single perturbations. The noise sensitivity of a Boolean function is defined as the probability that the function’s output changes if each input is flipped independently with probability ε. For uniformly distributed X, ε·as(f) is an upper bound for the noise sensitivity; for small values of ε, ε·as(f) approximates the noise sensitivity well. For the X_i being equally but possibly nonuniformly distributed and a slightly different noise model, it was found in [18] that ε·as(f) still upper bounds the noise sensitivity. This result was generalized to product distributed X in [19].
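The relation between noise sensitivity and average sensitivity for uniformly distributed inputs can be illustrated by exact enumeration. The sketch below is only an illustration under that uniform assumption; the function names are hypothetical and ε is written eps.

```python
from itertools import product


def noise_sensitivity(f, n, eps):
    """P[f(X) != f(X')], X uniform, X' = X with each entry flipped w.p. eps."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        for flips in product((0, 1), repeat=n):
            k = sum(flips)
            weight = 0.5 ** n * eps ** k * (1 - eps) ** (n - k)
            y = tuple(-xi if flip else xi for xi, flip in zip(x, flips))
            if f(x) != f(y):
                total += weight
    return total


def average_sensitivity(f, n):
    """as(f) for uniform inputs: sum over i of P[flipping entry i changes f]."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                total += 0.5 ** n
    return total


def f_and3(x):
    return 1 if all(xi == 1 for xi in x) else -1


as_and3 = average_sensitivity(f_and3, 3)
ns_large = noise_sensitivity(f_and3, 3, 0.1)    # compare with 0.1 * as_and3
ns_small = noise_sensitivity(f_and3, 3, 0.01)   # compare with 0.01 * as_and3
```

In this example, both noise sensitivities stay below the corresponding bound, and the bound is nearly tight for the smaller flip probability.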

4 Mutual information and uncertainty

In this section, we study the determinative power of a subset of variables X_A, where X_A consists of the entries of X corresponding to the indices in the set A⊆ [n], over the function’s output f(X). As a measure of determinative power, we take the mutual information MI(f(X);X_A) between f(X) and X_A, since MI(f(X);X_A) quantifies the statistical dependence between the random variable X_A and f(X). Hence, this section is devoted to the study of MI(f(X);X_A).

Before giving a formal definition of mutual information, let us start with an example. Consider the PARITY2 function and let its inputs X₁,X₂ be uniformly distributed. Intuitively, if X₁ has determinative power, knowledge about X₁ should provide us with information about f_PARITY2(X). Suppose we know the value of X₁, say X₁=1. Since f_PARITY2(x)=x₁x₂, we have with P [X₂=1]=1/2 that P [f_PARITY2(X)=1]=P [f_PARITY2(X)=1|X₁=1]. Hence, knowledge of X₁ does not help to predict the value of f_PARITY2(X). Therefore, X₁ has no determinative power over f_PARITY2(X). We indeed have MI(f_PARITY2(X);X₁)=0.

We next define mutual information. Mutual information is the reduction of uncertainty of a random variable Y due to the knowledge of X; therefore, we need to define a measure of uncertainty first, which is entropy. As a reference for the following definitions, see [20].

Definition 3

The entropy H(X) of a discrete random variable X with alphabet $\mathcal{X}$ is defined as

H (X) ≜ - \sum_{x \in \mathcal{X}} P [X = x] {log}_{2} P [X = x] .

Definition 4

The conditional entropy H(Y|X) of a pair of discrete and jointly distributed random variables (Y,X) is defined as

H (Y | X) ≜ \sum_{x \in \mathcal{X}} P [X = x] H (Y | X = x) .

Definition 5

The mutual information MI(Y;X) is the reduction of uncertainty of the random variable Y due to the knowledge of X

MI (Y; X) ≜ H (Y) - H (Y | X) .

For a binary random variable X with alphabet $\mathcal{X} = \{x_{1}, x_{2}\}$ and $p ≜ P [X = x_{1}]$ , we have H(X)=h(p), where h(p) is the binary entropy function, defined as

h (p) ≜ - p {log}_{2} p - (1 - p) {log}_{2} (1 - p) .

(6)

The properties of mutual information are what we intuitively expect from a measure of determinative power: If knowledge of X_i reduces the uncertainty of f(X), then X_i determines the state of f(X) to some extent, because then, knowledge about the state of X_i helps in predicting f(X). Furthermore, we require from a measure of determinative power that not all variables can have large determinative power simultaneously. This is guaranteed for mutual information as

\sum_{i = 1}^{n} MI (f (X); X_{i}) \leq MI (f (X); X) \leq 1,

(7)

which follows from the chain rule of mutual information (as a reference, see [20]) and independence of the X_i,i∈[n]. Hence, if MI(f(X);X_i) is large, i.e., close to 1, we can be sure that X_i has determinative power over f(X) since (7) implies that MI(f(X);X_j) for j≠i must be small then.
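Definitions 3 to 5 translate directly into a brute-force computation of MI(f(X);X_A). The following sketch uses illustrative helper names and 0-based variable indices, and evaluates the PARITY2 example together with inequality (7) for AND2 under uniform inputs.

```python
from itertools import product
from math import log2, prod


def prob(x, p):
    """P[X = x] for independent entries with P[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def entropy(dist):
    """Entropy in bits of a distribution given as an iterable of probabilities."""
    return -sum(q * log2(q) for q in dist if q > 0)


def mutual_information(f, p, A):
    """MI(f(X); X_A) = H(f(X)) - H(f(X) | X_A), by exhaustive enumeration."""
    joint = {}
    for x in product((-1, 1), repeat=len(p)):
        key = (tuple(x[i] for i in A), f(x))
        joint[key] = joint.get(key, 0.0) + prob(x, p)
    p_out, p_in = {}, {}
    for (x_a, y), q in joint.items():
        p_out[y] = p_out.get(y, 0.0) + q
        p_in[x_a] = p_in.get(x_a, 0.0) + q
    h_out = entropy(p_out.values())
    h_cond = sum(q_a * entropy(joint.get((x_a, y), 0.0) / q_a for y in (-1, 1))
                 for x_a, q_a in p_in.items() if q_a > 0)
    return h_out - h_cond


def f_parity2(x):
    return x[0] * x[1]


def f_and2(x):
    return 1 if x[0] == 1 and x[1] == 1 else -1


uniform = (0.5, 0.5)
mi_parity_single = mutual_information(f_parity2, uniform, (0,))
mi_parity_all = mutual_information(f_parity2, uniform, (0, 1))
mi_and_single = [mutual_information(f_and2, uniform, (i,)) for i in range(2)]
mi_and_all = mutual_information(f_and2, uniform, (0, 1))
```

For PARITY2, a single input carries no information while both inputs together carry one bit. For AND2, the two single-input values sum to less than the joint value, as (7) requires.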

4.1 Mutual information and the Fourier spectrum

In order to study determinative power, its relation to measures of perturbations, and statistical dependencies, we start by characterizing the mutual information in terms of Fourier coefficients. Our results are based on the following novel characterization of entropy in terms of Fourier coefficients.

Theorem 1

Let f be a Boolean function, let X be product distributed, and let X_A={X_i:i∈A} be a fixed set of arguments, where A⊆[n]. Then,

H (f (X) | X_{A}) = E [h (\frac{1}{2} (1 + \sum_{S \subseteq A} \hat{f} (S) Φ_{S} (X_{A})))],

where h(·) is the binary entropy function as defined in (6).

Proof

See Appendix 2. For the special case of uniformly distributed X, a proof appears in [21], in the context of designing S-boxes. □

Using the definition of mutual information, an immediate corollary of Theorem 1 is the following:

Corollary 1

Let f be a Boolean function, X be product distributed, and X_A={X_i:i∈A}. Then,

\begin{array}{l} MI (f (X); X_{A}) = h (\frac{1}{2} (1 + \hat{f} (\emptyset))) \\ - E [h (\frac{1}{2} (1 + \sum_{S \subseteq A} \hat{f} (S) Φ_{S} (X_{A})))] . \end{array}

(8)

Theorem 1 (and Corollary 1) shows that the conditional entropy H(f(X)|X_A) and the mutual information MI(f(X);X_A) are functions of the coefficients $\{\hat{f} (S) : S \subseteq A\}$ only. This already hints at a fundamental difference to the average sensitivity, since the average sensitivity depends on the coefficients $\{\hat{f} (S) : | S \cap A | > 0\}$ , according to (5).
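As a numerical sanity check of Theorem 1, the conditional entropy can be computed once from Definition 4 and once from the Fourier expression. The sketch below does this for a three-input majority function under a biased product distribution; the function, the distribution, and all helper names are assumptions chosen for illustration.

```python
from itertools import combinations, product
from math import log2, prod, sqrt


def prob(x, p):
    """P[X = x] for independent entries with P[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def phi(S, x, p):
    """Basis function Phi_S(x) for the product distribution given by p."""
    value = 1.0
    for i in S:
        mu = 2 * p[i] - 1
        value *= (x[i] - mu) / sqrt(1 - mu * mu)
    return value


def fourier(f, p):
    """Fourier coefficients f_hat(S), keyed by S as a sorted tuple."""
    n = len(p)
    points = list(product((-1, 1), repeat=n))
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: sum(prob(x, p) * f(x) * phi(S, x, p) for x in points)
            for S in subsets}


def h(q):
    """Binary entropy function (6); q is clamped to guard against rounding."""
    q = min(1.0, max(0.0, q))
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)


def cond_entropy_direct(f, p, A):
    """H(f(X) | X_A) computed from the conditional distribution of f(X)."""
    mass, mass_plus = {}, {}
    for x in product((-1, 1), repeat=len(p)):
        x_a = tuple(x[i] for i in A)
        q = prob(x, p)
        mass[x_a] = mass.get(x_a, 0.0) + q
        if f(x) == 1:
            mass_plus[x_a] = mass_plus.get(x_a, 0.0) + q
    return sum(q_a * h(mass_plus.get(x_a, 0.0) / q_a) for x_a, q_a in mass.items())


def cond_entropy_fourier(f, p, A):
    """H(f(X) | X_A) via Theorem 1, using only coefficients with S inside A."""
    coeffs = {S: c for S, c in fourier(f, p).items() if set(S) <= set(A)}
    return sum(prob(x, p) * h(0.5 * (1 + sum(c * phi(S, x, p) for S, c in coeffs.items())))
               for x in product((-1, 1), repeat=len(p)))


def f_maj3(x):
    return 1 if sum(x) > 0 else -1


p = (0.3, 0.6, 0.8)
direct = cond_entropy_direct(f_maj3, p, (0, 1))
via_fourier = cond_entropy_fourier(f_maj3, p, (0, 1))
```

The two values agree up to floating-point rounding, which is consistent with the statement that only the coefficients on subsets of A matter.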

We next discuss MI(f(X);X_i) based on (8). First, note that MI(f(X);X_i) has previously been studied under the name information gain as a measure of ‘goodness’ for split variables in greedy tree learners [22] and also under the name informativeness to quantify voting power [23]. According to (8), the mutual information MI(f(X);X_i) depends only on $\hat{f} (\{i\})$ , $\hat{f} (\emptyset)$ , and p_i. In contrast, the influence I_i(f) is a function of the coefficients $\{\hat{f} (S) : S \subseteq [n], i \in S\}$ , according to (4). In Figure 2, we depict MI(f(X);X_i) for p_i=0.3 as a function of $\hat{f} (\{i\})$ and $\hat{f} (\emptyset)$ .

It can be seen that MI(f(X);X_i)=0, i.e., f(X) and X_i are statistically independent if and only if $\hat{f} (\{i\}) = 0$ . That can be formalized as follows: MI(f(X);X_i) is convex in $\hat{f} (\{i\})$ . This can be proven by taking the second derivative of (8) and observing that it is larger than zero for all pairs of values ( $\hat{f} (\emptyset)$ , $\hat{f} (\{i\})$ ) for which MI(f(X);X_i) is defined. Next, from (8), we see that MI(f(X);X_i)=0 if $\hat{f} (\{i\}) = 0$ ; hence, it follows that MI(f(X);X_i)=0 if and only if $\hat{f} (\{i\}) = 0$ , which proves the following result:

Corollary 2

Let f be a Boolean function, and X be product distributed. X_i and f(X) are statistically independent if and only if $\hat{f} (\{i\}) = 0$ .

Corollary 2 also follows immediately from a more general result, namely Theorem 5, which is presented later. Recall that for PARITY2, MI(f_PARITY2(X);X₁)=0 and $\hat{f} (\{1\}) = 0$ ; hence, Corollary 2 comes as no surprise.

From Figure 2, it can be seen that the larger $| \hat{f} (\{i\}) |$ , the larger MI(f(X);X_i) becomes. Formally, it follows from the convexity of MI(f(X);X_i) and Corollary 2 that MI(f(X);X_i) is increasing in $| \hat{f} (\{i\}) |$ . Hence, X_i has large determinative power, i.e., MI(f(X);X_i) is large, if and only if $| \hat{f} (\{i\}) |$ is large (i.e., close to one). $| \hat{f} (\{i\}) |$ is trivially maximized for the dictatorship function, i.e., for f(x)=x_i, or its negation, i.e., f(x)=−x_i. The output f(X) of the dictatorship function is fully determined by X_i.

Next, let us consider the (trivial) case where A=[n] and hence X_A=X. Then, $MI (f (X); X) = h (\frac{1}{2} (1 + \hat{f} (\emptyset)))$ . It follows that MI(f(X);X) is maximized for $\hat{f} (\emptyset) = 0$ , i.e., P [f(X)=1]=1/2, i.e., if the variance of f(X) is 1. In general, the closer to zero $\hat{f} (\emptyset)$ is, the larger the mutual information between a function’s output and all its inputs becomes. Let us finally relate the conditional entropy H(f(X)|X_A) to the concentration of the Fourier weight on the coefficients $\{\hat{f} (S) : S \subseteq A\}$ , A⊆ [n].

Theorem 2

Let f be a Boolean function, let X be product distributed, and let X_A={X_i:i∈A} be a fixed set of arguments, where A⊆[n]. Then,

{(1 - \sum_{S \subseteq A} \hat{f} {(S)}^{2})}^{\frac{1}{ln (4)}} \geq H (f (X) | X_{A}) \geq 1 - \sum_{S \subseteq A} \hat{f} {(S)}^{2} .

Proof

See Appendix 3. □

Theorem 2 shows that H(f(X)|X_A) can be approximated with $1 - \sum_{S \subseteq A} \hat{f} {(S)}^{2}$ . It further shows that H(f(X)|X_A) is small if the Fourier weight is concentrated on the variables in the set A, i.e., if $\sum_{S \subseteq A} \hat{f} {(S)}^{2}$ is close to one. In contrast, as mentioned previously, for I_A(f), it is relevant whether the Fourier weight is concentrated on the coefficients with high degree.
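The two-sided bound of Theorem 2 can be probed exhaustively for small n. The sketch below enumerates all 256 Boolean functions of three variables under one assumed biased product distribution and records the largest violation of either inequality over all sets A; the distribution and helper names are illustrative choices, not taken from the paper.

```python
from itertools import combinations, product
from math import log, log2, prod, sqrt

P = (0.3, 0.6, 0.8)                      # assumed P[X_i = +1]
POINTS = list(product((-1, 1), repeat=3))
SUBSETS = [S for r in range(4) for S in combinations(range(3), r)]
EXPONENT = 1 / log(4)                    # the exponent 1 / ln(4)


def prob(x):
    return prod(P[i] if xi == 1 else 1 - P[i] for i, xi in enumerate(x))


def phi(S, x):
    value = 1.0
    for i in S:
        mu = 2 * P[i] - 1
        value *= (x[i] - mu) / sqrt(1 - mu * mu)
    return value


def h(q):
    q = min(1.0, max(0.0, q))            # guard against rounding
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)


def cond_entropy(f, A):
    """H(f(X) | X_A) for a function given as a dict from points to outputs."""
    mass, mass_plus = {}, {}
    for x in POINTS:
        x_a = tuple(x[i] for i in A)
        mass[x_a] = mass.get(x_a, 0.0) + prob(x)
        if f[x] == 1:
            mass_plus[x_a] = mass_plus.get(x_a, 0.0) + prob(x)
    return sum(q_a * h(mass_plus.get(x_a, 0.0) / q_a) for x_a, q_a in mass.items())


def worst_violation():
    """Largest amount by which either inequality of Theorem 2 is violated."""
    worst = float("-inf")
    for table in product((-1, 1), repeat=len(POINTS)):
        f = dict(zip(POINTS, table))
        coeffs = {S: sum(prob(x) * f[x] * phi(S, x) for x in POINTS) for S in SUBSETS}
        for A in SUBSETS:
            weight = sum(c * c for S, c in coeffs.items() if set(S) <= set(A))
            gap = max(0.0, 1.0 - weight)  # clamp tiny negative rounding errors
            entropy = cond_entropy(f, A)
            worst = max(worst, entropy - gap ** EXPONENT, gap - entropy)
    return worst


violation = worst_violation()
```

A value of violation that is zero up to rounding indicates that both inequalities hold for every function and every set A in this small setting.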

4.2 Relation to measures of perturbation

Mutual information and average sensitivity are related as follows.

Theorem 3

For any Boolean function f, for any product distributed X,

I_{A} (f) \geq \min_{i \in A} (\frac{1}{σ_{i}^{2}}) (MI (f (X); X_{A}) - Ψ (Var (f (X))))

(9)

with

Ψ (x) ≜ {(x)}^{1 / ln (4)} - x.

(10)

Proof

See Appendix 4. □

Note that the term Ψ(Var(f(X))) is close to zero. Specifically, for any f(X) we have 0≤Ψ(Var(f(X)))<0.12, and for settings of interest, Ψ(Var(f(X))) is very close to zero, as explained in more detail in the following. Theorem 3 shows that if MI(f(X);X_A) is large (i.e., close to one), f must be sensitive to perturbations of the entries of X_A. Moreover, if I_A(f) is small (i.e., if f is tolerant to perturbations of the entries of X_A), then MI(f(X);X_A) must be small (i.e., the entries of X_A do not have determinative power). For the case that A= [n], Theorem 3 states that the average sensitivity as(f) is lower-bounded by MI(f(X);X) minus some small term.

We next discuss the special case that A={i}. Theorem 3 evaluated for A={i} yields a lower bound on the influence of a variable in terms of the mutual information of that variable, namely

I_{i} (f) \geq \frac{1}{σ_{i}^{2}} (MI (f (X); X_{i}) - Ψ (Var (f (X)))) .

(11)

Again, Ψ(Var(f(X))) is close to zero for settings of interest, as the following argument explains. Equation (11) is not of interest for small Var(f(X)), since then f(X) is close to a constant function (i.e., close to f(X)=1 or f(X)=−1), and I_i(f) and MI(f(X);X_i) must both be small (i.e., close to zero) anyway. Hence, (11) is of interest when Var(f(X)) is large, i.e., close to 1; for this case, the term Ψ(Var(f(X))) is small (e.g., for Var(f(X))>0.8, Ψ(Var(f(X)))<0.052). Observe that, according to (11), if MI(f(X);X_i) is large, then I_i(f) is also large. This confirms the intuition that if an input determines f(X) to some extent, then f must be sensitive to perturbations of this input. Conversely, as mentioned previously, an input i can have large influence and still MI(f(X);X_i)=0. E.g., for the PARITY2 function, we have I_i(f)=1 and MI(f(X);X_i)=0.
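The size of the correction term Ψ and the single-variable bound (11) can be examined numerically. The following sketch evaluates Ψ on a grid and checks (11) for all 256 Boolean functions of three variables under an assumed biased product distribution; helper names and the chosen distribution are illustrative.

```python
from itertools import product
from math import log, log2, prod

P = (0.3, 0.6, 0.8)                      # assumed P[X_i = +1]
POINTS = list(product((-1, 1), repeat=3))


def prob(x):
    return prod(P[i] if xi == 1 else 1 - P[i] for i, xi in enumerate(x))


def h(q):
    q = min(1.0, max(0.0, q))            # guard against rounding
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)


def psi(v):
    """Equation (10); v is clamped to [0, 1] to absorb rounding errors."""
    v = min(1.0, max(0.0, v))
    return v ** (1 / log(4)) - v


def influence(f, i):
    total = 0.0
    for x in POINTS:
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f[x] != f[flipped]:
            total += prob(x)
    return total


def mi_single(f, i):
    """MI(f(X); X_i) = H(f(X)) - H(f(X) | X_i)."""
    p_plus = sum(prob(x) for x in POINTS if f[x] == 1)
    cond = 0.0
    for v in (-1, 1):
        q_v = P[i] if v == 1 else 1 - P[i]
        q_plus = sum(prob(x) for x in POINTS if x[i] == v and f[x] == 1)
        cond += q_v * h(q_plus / q_v)
    return h(p_plus) - cond


def worst_violation():
    """Largest value of (right-hand side of (11)) - I_i(f) over all f and i."""
    worst = float("-inf")
    for table in product((-1, 1), repeat=len(POINTS)):
        f = dict(zip(POINTS, table))
        mean = sum(prob(x) * f[x] for x in POINTS)
        variance = 1 - mean * mean
        for i in range(3):
            sigma2 = 1 - (2 * P[i] - 1) ** 2
            bound = (mi_single(f, i) - psi(variance)) / sigma2
            worst = max(worst, bound - influence(f, i))
    return worst


psi_max = max(psi(k / 10000) for k in range(10001))
psi_at_08 = psi(0.8)
violation = worst_violation()
```

On this grid, psi_max stays just below 0.12, and violation is zero up to rounding, so the bound holds for every function and variable in this small setting.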

Interestingly, the influence also has an information theoretic interpretation. The following theorem generalizes Theorem 1 in [23].

Theorem 4

For any Boolean function f, for any product distributed X,

I_{i} (f) = \frac{H (f (X) | X_{[n] ∖ \{i\}})}{H (X_{i})} .

Proof

See Appendix 5. For uniformly distributed X, a proof appears in [23]. □

Theorem 4 shows that the influence of a variable is a measure for the uncertainty of the function’s output that remains if all variables except variable i are set.
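Theorem 4 can be illustrated on a small example by computing both sides independently. The sketch below uses a three-input majority function and a biased product distribution, both chosen here only for illustration, with hypothetical helper names and 0-based indices.

```python
from itertools import product
from math import log2, prod


def prob(x, p):
    """P[X = x] for independent entries with P[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def h(q):
    """Binary entropy function; q is clamped to guard against rounding."""
    q = min(1.0, max(0.0, q))
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)


def influence(f, p, i):
    """Definition 1: probability that flipping entry i changes f."""
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += prob(x, p)
    return total


def cond_entropy_given_rest(f, p, i):
    """H(f(X) | all entries of X except entry i)."""
    total = 0.0
    for rest in product((-1, 1), repeat=len(p) - 1):
        x_plus = rest[:i] + (1,) + rest[i:]
        x_minus = rest[:i] + (-1,) + rest[i:]
        q_rest = prob(x_plus, p) + prob(x_minus, p)
        q_out_plus = sum(prob(x, p) for x in (x_plus, x_minus) if f(x) == 1)
        total += q_rest * h(q_out_plus / q_rest)
    return total


def f_maj3(x):
    return 1 if sum(x) > 0 else -1


p = (0.3, 0.6, 0.8)
influences = [influence(f_maj3, p, i) for i in range(3)]
ratios = [cond_entropy_given_rest(f_maj3, p, i) / h(p[i]) for i in range(3)]
```

For each variable, the entropy ratio coincides with the influence up to rounding. For example, the third input is decisive exactly when the first two disagree, which happens with probability 0.54.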

4.3 Statistical independence of inputs to a Boolean function

Next, we characterize statistical independence of f(X) and a set of its arguments X_A in terms of Fourier coefficients. This result generalizes a theorem derived by Xiao and Massey [11] from uniform to product distributed X.

Theorem 5

Let A⊆[n] be fixed, f be a Boolean function, and X be product distributed. Then, f(X) and the inputs X_A={X_i:i∈A} are statistically independent if and only if

\hat{f} (S) = 0 for all S \subseteq A, S \neq ∅.

Proof

See Appendix 6. For uniformly distributed X, i.e., P [X_i=1]=1/2 for all i∈ [n], Theorem 5 has been derived by Xiao and Massey [11]. Note that the proof provided here is also conceptually different from the proof for the uniform case in [11], as it does not rely on the Xiao-Massey lemma. □

Theorem 5 shows that a function and small sets of its inputs are statistically independent if the spectrum is concentrated on the coefficients of high degree d=|S|. The most prominent example is the parity function of n variables, i.e., f_PARITYN(x)=x₁x₂...x_n: For uniformly distributed X, each subset of n−1 or fewer arguments and f_PARITYN(X) are statistically independent. Conversely, if a function is concentrated on the coefficients of low degree d=|S|, which is the case for functions that are tolerant to perturbations, then small sets of inputs and the function’s output are statistically dependent.

Theorem 5 also has an important implication for algorithms that detect functional dependencies in a BN based on estimating the mutual information from observations of the network’s states, such as the algorithm presented in [12]. Theorem 5 characterizes the classes of functions for which such an algorithm may succeed and for which it will fail. Moreover, Theorem 5 shows that in a Boolean model of a genetic regulatory network, a functional dependency between a gene and a regulator cannot be detected based on statistical dependence of a regulator X_i and a gene’s state f_j(X), unless the regulatory functions are restricted to those for which $| \hat{f} (\{i\}) | > 0$ holds for each relevant input i.
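The role of the input distribution in Theorem 5 can be seen on PARITY2. In the sketch below, which uses illustrative helper names and an assumed distribution, the first input is biased and the second is uniform; the first-order coefficient of the first input vanishes while that of the second does not, and the mutual information behaves accordingly.

```python
from itertools import combinations, product
from math import log2, prod, sqrt


def prob(x, p):
    """P[X = x] for independent entries with P[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def phi(S, x, p):
    """Basis function Phi_S(x) for the product distribution given by p."""
    value = 1.0
    for i in S:
        mu = 2 * p[i] - 1
        value *= (x[i] - mu) / sqrt(1 - mu * mu)
    return value


def fourier(f, p):
    """Fourier coefficients f_hat(S), keyed by S as a sorted tuple."""
    n = len(p)
    points = list(product((-1, 1), repeat=n))
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: sum(prob(x, p) * f(x) * phi(S, x, p) for x in points)
            for S in subsets}


def h(q):
    q = min(1.0, max(0.0, q))            # guard against rounding
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)


def mi_single(f, p, i):
    """MI(f(X); X_i) = H(f(X)) - H(f(X) | X_i)."""
    points = list(product((-1, 1), repeat=len(p)))
    p_plus = sum(prob(x, p) for x in points if f(x) == 1)
    cond = 0.0
    for v in (-1, 1):
        q_v = p[i] if v == 1 else 1 - p[i]
        q_plus = sum(prob(x, p) for x in points if x[i] == v and f(x) == 1)
        cond += q_v * h(q_plus / q_v)
    return h(p_plus) - cond


def f_parity2(x):
    return x[0] * x[1]


p = (0.3, 0.5)                            # first input biased, second uniform
coeffs = fourier(f_parity2, p)
mi_first = mutual_information_first = mi_single(f_parity2, p, 0)
mi_second = mi_single(f_parity2, p, 1)
```

Here the coefficient for the first input is zero and mi_first vanishes, whereas the coefficient for the second input equals the mean of the first input, −0.4, and mi_second is strictly positive. Under uniform inputs, both coefficients and both mutual informations would be zero.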

5 Unate functions

In this section, we discuss unate, i.e., locally monotone functions.

Definition 6

A Boolean function f is said to be unate in X_i if for each x=(x₁,...,x_n)∈{−1,+1}ⁿ and for some fixed a_i∈{−1,+1}, f(x₁,...,x_i=−a_i,...,x_n)≤f(x₁,...,x_i=a_i,...,x_n) holds. f is said to be unate if f is unate in each variable X_i, i∈ [n].

Each linear threshold function and nested canalizing function is unate. Moreover, most, if not all, regulatory interactions in a biological network are considered to be unate. That can be deduced from [13, 24], and the basic argument is the following: If an element acts either as a repressor or an activator for some gene, but never as both (which is a reasonable assumption for regulatory interactions [13, 24]), then the function determining the gene’s state is unate by definition. For unate functions, the following property holds:

Proposition 3

Let f:{−1,+1}ⁿ→{−1,+1} be unate. Then,

\hat{f} ({i}) = a_{i} σ_{i} I_{i} (f), \forall i \in [n],

(12)

where a_i∈{−1,+1} is the parameter in Definition 6.

Proof

Goes along the same lines as the proof for monotone functions in Lemma 4.5 of [17]. □

Note that conversely, if (12) holds for each X_i,i∈ [n], f is not necessarily unate. Inserting (12) into (8) yields

\begin{array}{l} MI (f (X); X_{i}) = h (\frac{1}{2} (1 + \hat{f} (\emptyset))) \\ - E [h (\frac{1}{2} (1 + \hat{f} (\emptyset) + a_{i} σ_{i} I_{i} (f) \frac{X_{i} - μ_{i}}{σ_{i}}))], \end{array}

(13)

where the expectation in (13) is over X_i. Based on (13), the discussion from Section 4.1 on MI(f(X);X_i) applies by using $\hat{f} (\{i\})$ and a_iσ_iI_i(f) synonymously. Hence, for unate functions, the mutual information MI(f(X);X_i) is increasing in the influence I_i(f). Moreover, if f is unate, and X_i is a relevant variable, i.e., a variable on which the function actually depends, then $| \hat{f} (\{i\}) | > 0$ . From this fact and the same arguments as given in Section 4.1, the following result follows:

Theorem 6

Let f:{−1,+1}ⁿ→{−1,+1} be unate. Then, X_i is a relevant variable if and only if MI(f(X);X_i)≠0.

In a Boolean model of a biological regulatory network, this implies that if the functions in the network are unate, then a regulator and the target gene must be statistically dependent.
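Relation (12) can be checked on a small unate function. The sketch below uses a hypothetical linear threshold function with one negative weight, so that it is decreasing in its second input, and a biased product distribution; the function, the distribution, and the helper names are assumptions made for illustration.

```python
from itertools import product
from math import prod, sqrt


def prob(x, p):
    """P[X = x] for independent entries with P[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def influence(f, p, i):
    """Definition 1: probability that flipping entry i changes f."""
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += prob(x, p)
    return total


def first_order_coefficient(f, p, i):
    """f_hat({i}) = E[f(X) (X_i - mu_i) / sigma_i]."""
    mu = 2 * p[i] - 1
    sigma = sqrt(1 - mu * mu)
    return sum(prob(x, p) * f(x) * (x[i] - mu) / sigma
               for x in product((-1, 1), repeat=len(p)))


def f_threshold(x):
    """Hypothetical threshold function; the weighted sum is never zero."""
    return 1 if 2 * x[0] - x[1] + x[2] - 1 > 0 else -1


p = (0.3, 0.6, 0.8)
a = (1, -1, 1)    # direction of monotonicity per variable (Definition 6)
sigmas = [sqrt(1 - (2 * q - 1) ** 2) for q in p]
lhs = [first_order_coefficient(f_threshold, p, i) for i in range(3)]
rhs = [a[i] * sigmas[i] * influence(f_threshold, p, i) for i in range(3)]
```

Both sides of (12) agree for every variable. Since all three inputs are relevant here, all first-order coefficients are nonzero, which is the property used for Theorem 6.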

6 E. coli regulatory network

In [6], the authors presented a complex computational model of the E. coli transcriptional regulatory network that controls central parts of the E. coli metabolism. The network consists of 798 nodes and 1160 edges. Of the nodes, 636 represent genes and of the remaining 162 nodes, most (103) are external metabolites. The rest are stimuli, and others are state variables such as internal metabolites. The network has a layered feed-forward structure, i.e., no feedback loops exist. The elements in the first layer can be viewed as the inputs of the system, and the elements in the following seven layers are interacting genes that represent the internal state of the system. Our experiments revealed that all functions are unate; therefore, the properties derived in Section 5 apply. Note that all functions being unate is a special property of the network, since if functions are chosen uniformly at random, it is unlikely to sample a unate function, in particular if the number of inputs n is large.

6.1 Determinative nodes in the E. coli network

We first identify the input nodes that have large determinative power (we will define what that means in a network setting shortly) and then show that a small number thereof reduces the uncertainty of the network’s state significantly. Specifically, we show that, on average, the entropy of the nodes’ states conditioned on a small set of determinative input nodes is small.

To put this result into perspective, we perform the same experiment for random networks with the same topology as the E. coli network and for random networks with a different topology. We denote by X={X₁,...,X_n},n=145 the set of inputs of the feed forward network and assume that the X_i are independent and uniformly distributed. The remaining variables are denoted by Y={Y₁,...,Y_m},m=653 and are a function of the inputs and the network’s states, i.e., Y_i=f′_i(X,Y). For our analysis, the distributions of the random variables Y₁,...,Y_m need to be computed, since some of those variables are arguments to other functions. This can be circumvented by defining a collapsed network, i.e., a network where the state of each node is given as a function of the input nodes only, i.e., Y_i=f_i(X). The collapsed network is obtained by consecutively inserting functions into each other, until each function only depends on states of nodes in the input layer, i.e., on X. The collapsed network reveals the dependencies of each node on the input variables. Interestingly, in the collapsed network, it is seen that the variables chol_xt >0, salicylate, 2ddglcn_xt >0, mnnh >0, altrh >0, and his-l_xt >0 (here, and in the following, we adopt the names from the original dataset), which appear to be inputs when considering the original E. coli network, turn out not to be inputs. Consider, for example, the node salicylate. The only node dependent on salicylate is mara = ((NOT arca OR NOT fnr) OR oxyr OR salicylate). However, arca = (fnr AND NOT oxyr), and it is easily seen that mara simplifies to mara = 1.
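The collapsing step can be sketched on a toy network. The node names, the rules, and the helpers `evaluate`, `collapse`, and `relevant_inputs` below are hypothetical and are not taken from the E. coli model; the sketch assumes a feed-forward network given as a dictionary of parent lists and Boolean rules. It tabulates each node as a function of the input layer only and then lists the inputs on which the node actually depends.

```python
from itertools import product

# Hypothetical toy network with three inputs and three internal nodes.
inputs = ["a", "b", "c"]
nodes = {
    "g1": (["a"], lambda a: not a),
    "g2": (["a", "g1", "c"], lambda a, g1, c: a or g1 or c),
    "g3": (["g1", "b"], lambda g1, b: g1 and b),
}

def evaluate(node, x, cache):
    """Value of `node` when the inputs take the values in the dict x."""
    if node in x:
        return x[node]
    if node not in cache:
        parents, rule = nodes[node]
        cache[node] = bool(rule(*(evaluate(p, x, cache) for p in parents)))
    return cache[node]

def collapse(node):
    """Truth table of `node` as a function of the input layer only."""
    table = {}
    for values in product((False, True), repeat=len(inputs)):
        table[values] = evaluate(node, dict(zip(inputs, values)), {})
    return table

def relevant_inputs(table):
    """Inputs whose flip changes the collapsed function for at least one state."""
    relevant = []
    for i, name in enumerate(inputs):
        for values in table:
            flipped = values[:i] + (not values[i],) + values[i + 1:]
            if table[values] != table[flipped]:
                relevant.append(name)
                break
    return relevant

for name in nodes:
    print(name, relevant_inputs(collapse(name)))
```

In this toy example, g2 has c among its arguments, but after substituting g1 = NOT a the rule becomes a OR NOT a OR c, which is constant, so c is not a relevant input of the collapsed network.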

Next, we identify the determinative nodes. As argued in Section 4, MI(f_i(X);X_j) is a measure of the determinative power of X_j over Y_i=f_i(X). This motivates the definition of the determinative power of input X_j over the states in the network as

D (j) ≜ \sum_{i = 1}^{m} MI (f_{i} (X); X_{j}) .

Note that a small value of D(j) implies that X_j alone does not have large determinative power over the network’s states, but X_j may have large determinative power over the network states in conjunction with other variables. In principle, $\sum_{i = 1}^{m} MI (f_{i} (X); X_{j}, X_{k})$ can be large for some j,k∈ [n], even though D(j) and D(k) are equal to zero. This is, however, not possible in the E. coli network since the functions are unate. Specifically, MI(f_i(X);X_j,X_k)≠0 implies that X_j or X_k is a relevant variable, and according to Theorem 6, MI(f_i(X);X_j)≠0 or MI(f_i(X);X_k)≠0. We computed D(j) for each input variable and found that D(j) is large for only a few inputs, such as o2_xt (37 bit), leu-l_xt (20.9 bit), glc-d_xt (19.3 bit), and glcn_xt >0 (17 bit), but is small for most other variables. Partly, this can be explained by the out-degree (i.e., the number of outgoing edges of a node) distribution of the input nodes. However, having a large out-degree does not necessarily result in large values of D(j). In fact, in the E. coli network, glc-d_xt, glcn_xt >0, and o2_xt have 99, 93, and 73 outgoing edges, respectively. Nevertheless, D(glc-d_xt) = 19.3 bit and D(glcn_xt >0) = 17 bit, whereas D(o2_xt) = 37 bit.
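The definition of D(j) and the observation about out-degree can be reproduced on a toy collapsed network. This is a sketch under the assumption of independent, uniformly distributed inputs in {0,1}; the four node functions and the helper names are invented for illustration. Input 0 feeds a single node, whereas input 1 feeds three nodes.

```python
from itertools import product
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mi_with_input(f, n, j):
    """MI(f(X); X_j) in bits for n independent, uniform inputs in {0,1}."""
    joint = {}
    for x in product((0, 1), repeat=n):
        key = (f(x), x[j])
        joint[key] = joint.get(key, 0.0) + 2.0 ** -n
    p_f = {}
    for (value, _), p in joint.items():
        p_f[value] = p_f.get(value, 0.0) + p
    # MI = H(f) + H(X_j) - H(f, X_j), and H(X_j) = 1 bit for a uniform input.
    return entropy(p_f.values()) + 1.0 - entropy(joint.values())

# Hypothetical collapsed network with 4 inputs and 4 nodes.
n_inputs = 4
functions = [
    lambda x: x[0],                        # copies input 0
    lambda x: x[1] & x[2] & x[3],
    lambda x: x[1] & x[2] & (1 - x[3]),
    lambda x: x[1] & (1 - x[2]) & x[3],
]

def determinative_power(j):
    """D(j): sum over all nodes of MI(f_i(X); X_j)."""
    return sum(mi_with_input(f, n_inputs, j) for f in functions)

for j in range(n_inputs):
    print(j, round(determinative_power(j), 3))
```

Here input 0 has out-degree 1 and D(0) = 1 bit, while input 1 has out-degree 3 but D(1) is only about 0.41 bit, because each of its target functions is highly biased.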

Let τ denote a permutation on [n] such that D(τ(1))≥D(τ(2))≥...≥D(τ(n)), i.e., τ orders the input nodes in descending order of their determinative power. We next consider H(Y|X_τ(1),...,X_τ(l)) as a function of l to see whether knowledge of a small set of input nodes reduces the entropy of the overall network state significantly. H(Y|X_τ(1),...,X_τ(l)) has an interesting interpretation, which arises as a consequence of the so-called asymptotic equipartition property [20] (as discussed in greater detail in [25]): Consider a sequence y₁,...,y_k of k samples of the random variable Y. For ε>0 and k sufficiently large, there exists a set $A_{ε}^{(k)}$ of typical sequences y₁,...,y_k, such that

| A_{ε}^{(k)} | \leq 2^{k (H (Y) + ε)}

and

P [Y \in A_{ε}^{(k)}] > 1 - ε,

where $| A_{ε}^{(k)} |$ denotes the cardinality of the set $A_{ε}^{(k)}$. This shows that the sequences obtained as samples of Y are likely to fall in a set of size determined by the uncertainty of Y. Since the output layer consists of 653 nodes, the network’s state space has maximal size 2⁶⁵³. Since Y is a function of X, H(Y)≤H(X)=145 bit, where for the last equality, we assume uniformly distributed inputs. Thus, without knowing the state of any input variable, the network’s state is likely to be in a set of size roughly 2¹⁴⁵. Given the knowledge about the states X_τ(1),...,X_τ(l), the state of the network is likely to be in a set of size roughly $2^{H (Y | X_{τ (1)}, ..., X_{τ (l)})}$. For a large network, however, H(Y|X_τ(1),...,X_τ(l)) is expensive to compute as by definition:

\begin{array}{l} H (Y | X_{A}) = - \sum_{x_{A}} P [X_{A} = x_{A}] \\ \times \sum_{y} P [Y = y | X_{A} = x_{A}] \log_{2} P [Y = y | X_{A} = x_{A}] . \end{array}

(14)

Hence, the number of terms in the sum is exponential in n and |A|. An estimate of (14) can be obtained by sampling uniformly at random over X_A and Y. Instead, we will consider the following upper bound, which is inexpensive to compute:
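One way to read the sampling remark is as a plug-in Monte Carlo estimator; the following sketch is an assumption about how such an estimate could be formed, not the procedure used in the paper. It draws realizations of X_A, samples the remaining uniform inputs to estimate the conditional distribution of Y, and averages the resulting empirical entropies. The function names, the sample sizes, and the toy network are hypothetical.

```python
import random
from collections import Counter
from math import log2

def estimate_cond_entropy(network, n, known, outer=1000, inner=200, seed=0):
    """Plug-in Monte Carlo estimate of H(Y | X_known) in bits, X uniform on {0,1}^n."""
    rng = random.Random(seed)
    free = [i for i in range(n) if i not in known]
    total = 0.0
    for _ in range(outer):
        x = [0] * n
        for i in known:                       # draw one realization x_A
            x[i] = rng.getrandbits(1)
        counts = Counter()
        for _ in range(inner):                # sample Y given X_A = x_A
            for i in free:
                x[i] = rng.getrandbits(1)
            counts[network(x)] += 1
        # Empirical entropy of Y for this x_A (slightly biased downwards).
        total += sum(c / inner * log2(inner / c) for c in counts.values())
    return total / outer

# Hypothetical toy network with two output nodes.
def toy_network(x):
    return (x[0] & x[1], x[2])

# Exact value for comparison: 1 bit if x_0 = 0 and 2 bits if x_0 = 1, i.e. 1.5 bit.
estimate = estimate_cond_entropy(toy_network, 3, known=[0])
print(round(estimate, 2))
```

The plug-in estimate underestimates the entropy for small inner sample sizes, which is one reason an inexpensive analytic bound is attractive for a network with hundreds of nodes.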

H (Y | X_{τ (1)}, ..., X_{τ (l)}) \leq A (l)

with

A (l) ≜ \sum_{i = 1}^{m} H (Y_{i} | X_{τ (1)}, ..., X_{τ (l)}) .

The bound above follows from the chain rule for entropy and the fact that conditioning does not increase entropy [20]. H(Y_i|X_τ(1),...,X_τ(l)) is inexpensive to compute, since Y_i depends on few variables only (in the E. coli network, on ≤8). For the E. coli network, A(l) is depicted in Figure 3 as a function of l. Figure 3 shows that knowledge of the states of the most determinative nodes reduces the uncertainty about the network’s states significantly. In fact, the upper bound A(l) is loose; hence, we even expect H(Y|X_τ(1),...,X_τ(l)) to lie significantly below A(l). Also, note that when A(l) is small, H(Y_i|X_τ(1),...,X_τ(l)) must be small on average; hence, P [Y_i=1|X_τ(1),...,X_τ(l)] is close to one or zero on average.
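The relation between the exact conditional entropy (14) and the bound A(l) can be checked on a network small enough to enumerate. The sketch below assumes uniform inputs in {0,1}; the three node functions, the input ordering `order`, and the helper `cond_entropy` are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import product
from math import log2

def cond_entropy(g, n, known):
    """H(g(X) | X_known) in bits for X uniform on {0,1}^n, by full enumeration."""
    joint = Counter()      # counts of (x_A, g(x))
    marginal = Counter()   # counts of x_A
    for x in product((0, 1), repeat=n):
        x_a = tuple(x[i] for i in known)
        joint[(x_a, g(x))] += 1
        marginal[x_a] += 1
    # sum over (x_A, y) of P[x_A, y] * log2(1 / P[y | x_A])
    return sum(c / 2 ** n * log2(marginal[x_a] / c)
               for (x_a, _), c in joint.items())

# Hypothetical collapsed network with three inputs and three nodes.
node_functions = [
    lambda x: x[0] & x[1],
    lambda x: x[0] | x[1],
    lambda x: x[1] ^ x[2],
]
n_inputs = 3
order = [1, 0, 2]          # assumed ordering tau of the inputs

def network_state(x):
    return tuple(f(x) for f in node_functions)

def exact_entropy(l):
    return cond_entropy(network_state, n_inputs, order[:l])

def upper_bound(l):
    """A(l): sum of the per-node conditional entropies."""
    return sum(cond_entropy(f, n_inputs, order[:l]) for f in node_functions)

for l in range(n_inputs + 1):
    print(l, round(exact_entropy(l), 3), round(upper_bound(l), 3))
```

For l = 0 the exact entropy is 2.5 bit, while A(0) is about 2.62 bit, because the first two nodes are statistically dependent; the gap illustrates why the bound is loose in general.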

To put A(l) for the E. coli network in Figure 3 into perspective, we compute A(l) for random networks. First, we took the E. coli network and exchanged each function with one chosen uniformly at random from the set of all Boolean functions of corresponding degree. We also exchanged each function with one chosen uniformly at random from all unate functions. We repeated this experiment on the original E. coli topology for 25 choices of random and random unate functions, respectively. The mean of A(l), along with one standard deviation from the mean (dashed lines), is plotted in Figure 3 for random and random unate functions. It is seen that fewer inputs determine the output of the original E. coli network, compared to its random counterparts. For example, to obtain A(l)=50, about twice as many inputs need to be known if the functions in the E. coli network are exchanged for functions chosen uniformly at random. Next, we generated random feed-forward networks with m=653 outputs and n=145 inputs, each with out-degree 8, i.e., the average out-degree of the inputs in the collapsed E. coli network. Again, we computed A(l) for 25 choices of random and random unate functions, respectively. The mean and one standard deviation from the mean are depicted in Figure 3. The results show that, as expected, for a random feed-forward network, there seems to be no small set of inputs that determines the outputs.

6.2 Tolerance to perturbations

Finally, we discuss the average sensitivity of individual functions in the E. coli network. In Section 3, we found that the average sensitivity is small if the Fourier spectrum is concentrated on the coefficients of low degree. This appears to be the case for functions that are highly biased and for functions that depend on few variables only. Figure 4 shows pairs of values (as(f),Pr[f(X)=1]) for each function in the E. coli network, again assuming that the X_i are independent and uniformly distributed. We can see from Figure 4 that the average sensitivity of all functions is close to the lower bound on the average sensitivity. Note that the functions with high in-degree K (i.e., number of relevant input variables), which could have average sensitivity up to K, also have small average sensitivity, because those functions are highly biased. We, therefore, can conclude that the functions have small average sensitivity either because they depend on few variables only or because they are highly biased. For other input distributions, i.e., other values of p=P [X_i=1],∀i∈ [n], we obtained the same results.
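The interplay between bias and average sensitivity can be made concrete with two functions of in-degree K=8. The following sketch assumes uniform inputs in {0,1} and takes the average sensitivity to be the sum over i of P[f(X)≠f(X⊕e_i)]; the helper names and the two example functions are illustrative and are not taken from the E. coli model.

```python
from itertools import product

def average_sensitivity(f, n):
    """Sum over i of P[f(X) != f(X with bit i flipped)], X uniform on {0,1}^n."""
    flips = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(y):
                flips += 1
    return flips / 2 ** n

def prob_one(f, n):
    """P[f(X) = 1] for X uniform on {0,1}^n."""
    return sum(f(x) for x in product((0, 1), repeat=n)) / 2 ** n

def and8(x):        # highly biased function of 8 inputs
    return int(all(x))

def parity8(x):     # unbiased function of 8 inputs
    return sum(x) % 2

print(average_sensitivity(and8, 8), prob_one(and8, 8))        # 0.0625 0.00390625
print(average_sensitivity(parity8, 8), prob_one(parity8, 8))  # 8.0 0.5
```

Both functions depend on all eight inputs, yet the highly biased AND has an average sensitivity far below the maximum of K=8 that the parity function attains.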

7 Conclusion

In a Boolean network, tolerance to perturbations, determinative power, and statistical dependencies between nodes are properties of single functions in a probabilistic setting. Hence, we analyzed single functions with product-distributed arguments. We used Fourier analysis of Boolean functions to study the mutual information between a function f(X) and a set of its inputs X_A, as a measure of determinative power of X_A over f(X). We related the mutual information to the Fourier spectrum and proved that the mutual information lower bounds the influence, a measure of perturbation. We also gave necessary and sufficient conditions for statistical independence of f(X) and X_A. For the class of unate functions, which are particularly interesting for biological networks, we found that mutual information and influence are directly related (not just via an inequality). We also found that MI(f(X);X_i)>0 for each relevant input i, which, as an application, implies that in a unate regulatory network, a gene and its regulator must be statistically dependent. As an application of our results, we analyzed the large-scale regulatory network of E. coli. We identified the most determinative input nodes in the network and found that it is sufficient to know only a small subset of those in order to reduce the uncertainty of the overall network state significantly. This, in turn, significantly reduces the size of the state space in which the network is likely to be found.

A possible direction for future work is to provide an analysis similar to that of the E. coli regulatory network for other Boolean models of biological networks, and to see whether conclusions similar to those in Section 6 can be reached. One of the main assumptions in our work is the independence among the input variables of the network. It would be interesting to provide methods that can be used beyond this setup. However, deriving such results is challenging because for dependent inputs, the basis functions Φ_S(x) do not factorize as in (3), and many results cited and derived in this paper make use of this particular form of the basis functions. In this paper, we focused on generic properties of information-processing networks that may help identify possible principles that underlie biological networks. Assessing our findings from a biological perspective would be an interesting next step.

Appendices

Appendix 1

For the proof of Theorems 1 and 5, we will need the following lemma:

Lemma 1

Let f be a Boolean function, let X be product distributed, and let A⊆[n] and some fixed $x_{A} \in \{- 1, + 1\}^{| A |}$ be given. Then,

E [f (X) | X_{A} = x_{A}] = \sum_{S \subseteq A} \hat{f} (S) Φ_{S} (x_{A}) .

(15)

Proof

Inserting the Fourier expansion of f(X) given by (3) in the left-hand side of (15) and utilizing the linearity of conditional expectation yields

E [f (X) | X_{A} = x_{A}] = \sum_{S \subseteq [n]} \hat{f} (S) E [Φ_{S} (X) | X_{A} = x_{A}] .

For S⊆A,

E [Φ_{S} (X) | X_{A} = x_{A}] = Φ_{S} (x_{A}) .

Conversely, for S⊈A,

E [Φ_{S} (X) | X_{A} = x_{A}] = 0 .

To see this, assume without loss of generality that S=A∪{j} and j∉A. Using the decomposition property of the basis function as given in Section 2.3,

\begin{align} E [Φ_{S} (X) | X_{A} = x_{A}] & = E [\prod_{i \in S} Φ_{\{i\}} (X) | X_{A} = x_{A}] \\ & = \prod_{i \in S} E [Φ_{\{i\}} (X) | X_{A} = x_{A}], \end{align}

which is equal to zero as

E [Φ_{\{j\}} (X) | X_{A} = x_{A}] = E [Φ_{\{j\}} (X)] = 0 .

□
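Lemma 1 can be checked numerically. The sketch below assumes that the basis functions have the product form Φ_S(x)=∏_{i∈S}(x_i−μ_i)/σ_i with μ_i=E[X_i] and σ_i the standard deviation of X_i, and that f̂(S)=E[f(X)Φ_S(X)]; the probabilities in `p`, the majority function, and the helper names are arbitrary illustrative choices.

```python
from itertools import combinations, product
from math import prod, sqrt

p = [0.3, 0.6, 0.8]                     # assumed P[X_i = +1]
n = len(p)
mu = [2 * q - 1 for q in p]             # E[X_i] for X_i in {-1,+1}
sigma = [sqrt(1 - m * m) for m in mu]   # standard deviation of X_i
cube = list(product((-1, 1), repeat=n))

def prob(x):
    return prod(p[i] if x[i] == 1 else 1 - p[i] for i in range(n))

def phi(S, x):
    """Basis function Phi_S(x); the empty product equals 1."""
    return prod((x[i] - mu[i]) / sigma[i] for i in S)

def f_hat(f, S):
    """Fourier coefficient E[f(X) Phi_S(X)]."""
    return sum(prob(x) * f(x) * phi(S, x) for x in cube)

def cond_expectation(f, A, x_A):
    """Left-hand side of (15): E[f(X) | X_A = x_A]."""
    match = [x for x in cube if all(x[i] == v for i, v in zip(A, x_A))]
    return sum(prob(x) * f(x) for x in match) / sum(prob(x) for x in match)

def lemma_rhs(f, A, x_A):
    """Right-hand side of (15): sum over S within A of f_hat(S) Phi_S(x_A)."""
    x = [1] * n                          # entries outside A are never used
    for i, v in zip(A, x_A):
        x[i] = v
    return sum(f_hat(f, S) * phi(S, x)
               for k in range(len(A) + 1) for S in combinations(A, k))

def majority(x):
    return 1 if sum(x) > 0 else -1

A = (0, 2)
max_gap = max(abs(cond_expectation(majority, A, x_A) - lemma_rhs(majority, A, x_A))
              for x_A in product((-1, 1), repeat=len(A)))
print(max_gap)   # zero up to floating-point rounding
```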

Appendix 2

Proof of Theorem 1

First,

\begin{align} P [f (X) = 1 | X_{A} = x_{A}] & = \frac{1}{2} (1 + E [f (X) | X_{A} = x_{A}]) \\ & = \underbrace{\frac{1}{2} (1 + \sum_{S \subseteq A} \hat{f} (S) Φ_{S} (x_{A}))}_{q (x_{A})}, \end{align}

(16)

where (16) follows from an application of Lemma 1. By definition of the conditional entropy,

\begin{align} H (f (X) | X_{A}) & = \sum_{x_{A} \in \{- 1, 1\}^{| A |}} P [X_{A} = x_{A}] H (f (X) | X_{A} = x_{A}) \\ & = \sum_{x_{A} \in \{- 1, 1\}^{| A |}} P [X_{A} = x_{A}] h (P [f (X) = 1 | X_{A} = x_{A}]) \\ & = \sum_{x_{A} \in \{- 1, 1\}^{| A |}} P [X_{A} = x_{A}] h (q (x_{A})) \end{align}

(17)

\begin{align} = E [h (q (X_{A}))], \end{align}

(18)

where h(·) is the binary entropy function as defined in (6). To obtain (17), we used (16). The expectation in (18) is with respect to the distribution of X_A. Inserting q(X_A) as given by (16) in (18) concludes the proof.

Appendix 3

Proof of Theorem 2

First, note that with q(·) as defined in (16), we have

\begin{align} E [4 q (X_{A}) (1 - q (X_{A}))] & = E [1 - {(\sum_{S \subseteq A} \hat{f} (S) Φ_{S} (X_{A}))}^{2}] \\ & = 1 - \sum_{S \subseteq A} \sum_{U \subseteq A} \hat{f} (S) \hat{f} (U) E [Φ_{S} (X_{A}) Φ_{U} (X_{A})] \\ & = 1 - \sum_{S \subseteq A} \hat{f} {(S)}^{2}, \end{align}

(19)

where (19) follows from the orthogonality of the basis functions.

We start with proving the lower bound in Theorem 2. Applying the lower bound on the binary entropy function h(p)≥4p(1−p), given in Theorem 1.2 of [26], on (18) yields

H (f (X) | X_{A}) = E [h (q (X_{A}))] \geq E [4 q (X_{A}) (1 - q (X_{A}))],

and the lower bound in Theorem 2 follows using (19).

Next, we prove the upper bound in Theorem 2. Applying the upper bound on the binary entropy function h(p)≤(4p(1−p))^{1/ ln(4)}, given in Theorem 1.2 of [26], on (18) yields

\begin{align} H (f (X) | X_{A}) & = E [h (q (X_{A}))] \\ & \leq E [{(\underbrace{4 q (X_{A}) (1 - q (X_{A}))}_{Y})}^{1 / \ln (4)}] . \end{align}

(20)

The term Y in (20) is a random variable, and the function (Y)^{1/ ln(4)} is concave in Y. An application of Jensen’s inequality (see e.g. [20]) yields $E [{(Y)}^{1 / \ln (4)}] \leq {(E [Y])}^{1 / \ln (4)}$; hence, the right-hand side of (20) can be upper bounded, which gives

H (f (X) | X_{A}) \leq {(E [4 q (X_{A}) (1 - q (X_{A}))])}^{1 / ln (4)} .

(21)

Finally, the upper bound in Theorem 2 follows from combining (21) and (19).
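The two bounds can be checked numerically on a small function. The sketch below assumes uniform inputs in {−1,+1}, for which Φ_S(x)=∏_{i∈S}x_i, and reads Theorem 2 as 1−Σ_{S⊆A}f̂(S)² ≤ H(f(X)|X_A) ≤ (1−Σ_{S⊆A}f̂(S)²)^{1/ln(4)}, as the proof above suggests; the function names are hypothetical.

```python
from itertools import combinations, product
from math import log, log2, prod

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def theorem2_quantities(f, n, A):
    """Return (lower bound, H(f(X) | X_A), upper bound) for uniform inputs."""
    cube = list(product((-1, 1), repeat=n))
    # Fourier weight on subsets of A, including the empty set.
    weight = 0.0
    for k in range(len(A) + 1):
        for S in combinations(A, k):
            coeff = sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)
            weight += coeff ** 2
    # Conditional entropy by enumeration over the values of X_A.
    cond_h = 0.0
    for x_A in product((-1, 1), repeat=len(A)):
        match = [x for x in cube if all(x[i] == v for i, v in zip(A, x_A))]
        q = sum(1 for x in match if f(x) == 1) / len(match)
        cond_h += len(match) / len(cube) * h(q)
    rest = max(0.0, 1 - weight)          # guard against rounding below zero
    return rest, cond_h, rest ** (1 / log(4))

def majority(x):
    return 1 if sum(x) > 0 else -1

lower, cond_h, upper = theorem2_quantities(majority, 3, (0,))
print(round(lower, 4), round(cond_h, 4), round(upper, 4))   # 0.75 0.8113 0.8126
```

For the three-input majority function and A={1} (index 0 in the code), the conditional entropy lies much closer to the upper bound than to the lower bound.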

Appendix 4

Proof of Theorem 3

According to Proposition 2,

\begin{align} I_{A} (f) & = \sum_{S \subseteq [n]} \hat{f} {(S)}^{2} \sum_{i \in S \cap A} \frac{1}{σ_{i}^{2}} \\ & \geq \sum_{S \subseteq [n] ∖ \emptyset} \hat{f} {(S)}^{2} | S \cap A | \min_{i \in A} (\frac{1}{σ_{i}^{2}}) \\ & \geq \min_{i \in A} (\frac{1}{σ_{i}^{2}}) \sum_{S \subseteq A ∖ \emptyset} \hat{f} {(S)}^{2} . \end{align}

(22)

Next, we rewrite the lower bound on H(f(X)|X_A) given by Theorem 2 as

\sum_{S \subseteq A ∖ \emptyset} \hat{f} {(S)}^{2} \geq 1 - \hat{f} {(\emptyset)}^{2} - H (f (X) | X_{A}) .

(23)

By adding H(f(X))−H(f(X)) on the right-hand side of (23) and using the definition of mutual information, (23) becomes

\sum_{S \subseteq A ∖ \emptyset} \hat{f} {(S)}^{2} \geq MI (f (X); X_{A}) - H (f (X)) + 1 - \hat{f} {(\emptyset)}^{2} .

(24)

With $Var (f (X)) = 1 - \hat{f} {(\emptyset)}^{2}$ and by using the inequality H(f(X))≤(Var(f(X)))^{1/ ln(4)}, given in Theorem 1.2 of [26], (24) becomes

\sum_{S \subseteq A ∖ \emptyset} \hat{f} {(S)}^{2} \geq MI (f (X); X_{A}) - Ψ (Var (f (X))),

(25)

with Ψ(·) as defined in (10). Finally, Theorem 3 follows by combining (22) and (25).

Appendix 5

Proof of Theorem 4

For notational convenience, let A=[n]∖{i}. By definition of the conditional entropy,

\begin{align} H (f (X) | X_{A}) & = \sum_{x_{A} \in \{- 1, 1\}^{| A |}} P [X_{A} = x_{A}] H (f (X) | X_{A} = x_{A}) \\ & = \sum_{x_{A} \in \{- 1, 1\}^{| A |}} P [X_{A} = x_{A}] h (P [f (X) = 1 | X_{A} = x_{A}]), \end{align}

(26)

where h(·) is the binary entropy function as defined in (6). Observe that

h (P [f (X) = 1 | X_{A} = x_{A}]) = h (P [X_{i} = 1])

if

\begin{align} f (X_{1} = x_{1}, ..., X_{i} = 1, ..., X_{n} = x_{n}) \\ \neq f (X_{1} = x_{1}, ..., X_{i} = - 1, ..., X_{n} = x_{n}) \end{align}

and

h (P [f (X) = 1 | X_{A} = x_{A}]) = 0

otherwise. Hence, (26) becomes

\begin{align} H (f (X) | X_{A}) = \sum_{x_{A} \in \{- 1, 1\}^{| A |}} P [X_{A} = x_{A}] h (p_{i}) 1_{\{f (x) \neq f (x \oplus e_{i})\}}, \end{align}

where p_i=P [X_i=1], x is any vector that agrees with x_A on A (the indicator does not depend on x_i), and x⊕e_i is the vector obtained from x by flipping its i th entry. Theorem 4 follows by using the definition of the influence.
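The identity behind Theorem 4 can be verified by enumeration. The sketch below assumes that the influence of variable i is P[f(X)≠f(X⊕e_i)] under the product distribution and checks that H(f(X)|X_A) with A=[n]∖{i} equals h(p_i) times this influence; the probabilities in `p`, the majority function, and the helper names are illustrative.

```python
from itertools import product
from math import log2, prod

p = [0.2, 0.5, 0.9]                     # assumed P[X_k = +1]
n = len(p)
cube = list(product((-1, 1), repeat=n))

def prob(x):
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(n))

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def flip(x, i):
    return x[:i] + (-x[i],) + x[i + 1:]

def influence(f, i):
    """P[f(X) != f(X with entry i flipped)]."""
    return sum(prob(x) for x in cube if f(x) != f(flip(x, i)))

def cond_entropy_given_rest(f, i):
    """H(f(X) | X_A) with A = [n] minus {i}, computed from the definition."""
    total = 0.0
    for x in cube:
        if x[i] != 1:
            continue                     # visit each x_A exactly once
        p_rest = prob(x) / p[i]          # P[X_A = x_A]
        # P[f(X) = 1 | X_A = x_A]: average over the two values of X_i.
        q = p[i] * (f(x) == 1) + (1 - p[i]) * (f(flip(x, i)) == 1)
        total += p_rest * h(q)
    return total

def majority(x):
    return 1 if sum(x) > 0 else -1

for i in range(n):
    print(i, cond_entropy_given_rest(majority, i), h(p[i]) * influence(majority, i))
```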

Appendix 6

Proof of Theorem 5

By definition, f(X) and X_A are statistically independent if and only if for all x_A∈{−1,+1}^|A|

P [f (X) = 1 | X_{A} = x_{A}] = P [f (X) = 1] .

(27)

With

P [f (X) = 1 | X_{A} = x_{A}] = \frac{1}{2} + \frac{1}{2} E [f (X) | X_{A} = x_{A}]

and an application of Lemma 1, given in Appendix 1, (27) becomes

\begin{align} \sum_{S \subseteq A} \hat{f} (S) Φ_{S} (x_{A}) = \hat{f} (\emptyset) \\ \Leftrightarrow \sum_{S \subseteq A ∖ \emptyset} \hat{f} (S) Φ_{S} (x_{A}) = 0 . \end{align}

(28)

It follows from the Fourier expansion (3) that (28) holds for all x_A∈{−1,+1}^|A| if and only if $\hat{f} (S) = 0$ for all S⊆A∖∅, which proves the theorem.
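The criterion of Theorem 5 can be compared with a direct check of the definition of independence. The sketch below assumes uniform inputs in {−1,+1}, so that Φ_S(x)=∏_{i∈S}x_i; the function names and the parity example are chosen here for illustration.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficient(f, n, S):
    """f_hat(S) = E[f(X) prod_{i in S} X_i] for uniform inputs."""
    return sum(f(x) * prod(x[i] for i in S)
               for x in product((-1, 1), repeat=n)) / 2 ** n

def independent_by_spectrum(f, n, A, tol=1e-12):
    """Spectral criterion: f_hat(S) = 0 for every nonempty S within A."""
    return all(abs(fourier_coefficient(f, n, S)) < tol
               for k in range(1, len(A) + 1) for S in combinations(A, k))

def independent_by_definition(f, n, A, tol=1e-12):
    """Check P[f(X) = 1 | X_A = x_A] = P[f(X) = 1] for all x_A."""
    cube = list(product((-1, 1), repeat=n))
    p_one = sum(1 for x in cube if f(x) == 1) / len(cube)
    for x_A in product((-1, 1), repeat=len(A)):
        match = [x for x in cube if all(x[i] == v for i, v in zip(A, x_A))]
        p_cond = sum(1 for x in match if f(x) == 1) / len(match)
        if abs(p_cond - p_one) > tol:
            return False
    return True

def parity2(x):
    return x[0] * x[1]

for A in [(0,), (1,), (0, 1)]:
    print(A, independent_by_spectrum(parity2, 2, A),
          independent_by_definition(parity2, 2, A))
```

For the two-input parity function, both checks report independence from each single input and dependence on the pair, since the only nonzero coefficient is the one of the full set.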

References

[1] Kauffman S: Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 1969,22(3):437-467. 10.1016/0022-5193(69)90015-0
[2] Kauffman S: Homeostasis and differentiation in random genetic control networks. Nature 1969,224(5215):177-178. 10.1038/224177a0
[3] Davidich MI, Bornholdt S: Boolean network model predicts cell cycle sequence of fission yeast. PLoS ONE 2008,3(2):e1672. 10.1371/journal.pone.0001672
[4] Saez-Rodriguez J, Simeoni L, Lindquist JA, Hemenway R, Bommhardt U, Arndt B, Haus U, Weismantel R, Gilles ED, Klamt S, Schraven B: A logical model provides insights into T cell receptor signaling. PLoS Comput Biol 2007,3(8):e163. 10.1371/journal.pcbi.0030163
[5] Li F, Long T, Lu Y, Ouyang Q, Tang C: The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. USA 2004,101(14):4781-4786. 10.1073/pnas.0305937101
[6] Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO: Integrating high-throughput and computational data elucidates bacterial networks. Nature 2004,429(6987):92-96. 10.1038/nature02456
[7] Kesseli J, Rämö P, Yli-Harja O: On spectral techniques in analysis of Boolean networks. Physica D: Nonlinear Phenomena 2005,206(1–2):49-61.
[8] Kesseli J, Rämö P, Yli-Harja O: Tracking perturbations in Boolean networks with spectral methods. Phys. Rev. E 2005,72(2):026137.
[9] Ribeiro AS, Kauffman SA, Lloyd-Price J, Samuelsson B, Socolar JES: Mutual information in random Boolean models of regulatory networks. Phys. Rev. E 2008,77:011901.
[10] Heckel R, Schober S, Bossert M: Determinative power and tolerance to perturbations in Boolean networks. Paper presented at the 9th international workshop on computational systems biology, Ulm, Germany. 4-6.
[11] Xiao G, Massey J: A spectral characterization of correlation-immune combining functions. IEEE Trans. Inf. Theory 1988,34(3):569-571. 10.1109/18.6037
[12] Liang S, Fuhrman S, Somogyi R: Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing 1998, 3: 18-29.
[13] Grefenstette J, Kim S, Kauffman S: An analysis of the class of gene regulatory functions implied by a biochemical model. Biosystems 2006,84(2):81-90.
[14] Derrida B, Pomeau Y: Random networks of automata: a simple annealed approximation. Europhysics Lett 1986,1(2):45-49. 10.1209/0295-5075/1/2/001
[15] Bahadur RR: A representation of the joint distribution of responses to n dichotomous items. In Studies in Item Analysis and Prediction. Edited by: Solomon H. Stanford: Stanford University Press; 1961:158-168.
[16] Ben-Or M, Linial N: Collective coin flipping, robust voting schemes, and minima of Banzhaf values. Paper presented at the 26th annual symposium on foundations of computer science, Portland, Oregon, USA. 21-23.
[17] Bshouty NH, Tamon C: On the Fourier spectrum of monotone functions. J. ACM 1996,43(4):747-770. 10.1145/234533.234564
[18] Schober S: About Boolean networks with noisy inputs. Paper presented at the fifth international workshop on computational systems biology, Leipzig, Germany. 11-13.
[19] Matache MT, Matache V: On the sensitivity to noise of a Boolean function. J. Math. Phys 2009,50(10):103512. 10.1063/1.3225563
[20] Cover TM, Thomas JA: Elements of Information Theory. New York: Wiley-Interscience; 2006.
[21] Forre R: Methods and instruments for designing S-boxes. J. Cryptology 1990,2(3):115-130.
[22] Rosell B, Hellerstein L, Ray S: Why skewing works: learning difficult Boolean functions with greedy tree learners. Paper presented at the 22nd international conference on machine learning, Bonn, Germany. 7-11.
[23] Diskin A, Koppel M: Voting power: an information theory approach. Soc Choice Welfare 2010, 34: 105-119. 10.1007/s00355-009-0390-8
[24] Raeymaekers L: Dynamics of Boolean networks controlled by biologically meaningful functions. J. Theor. Biol 2002,218(3):331-341. 10.1006/jtbi.2002.3081
[25] Schober S: Analysis and Identification of Boolean Networks using Harmonic Analysis. Germany: Der Andere Verlag; 2011.
[26] Topsoe F: Bounds for entropy and divergence for distributions over a two-element set. J. Inequalities Pure Appl. Math. 2001, 2: Art. 25.

Acknowledgements

We would like to thank Sara Al-Sayed and Dejan Lazich for their helpful discussions and careful reading of the manuscript.

Author information

Authors and Affiliations

Department of Information Technology and Electrical Engineering, ETH, Zürich, Zürich, Switzerland
Reinhard Heckel
Institute of Telecommunications and Applied Information Theory, University of Ulm, Ulm, Germany
Steffen Schober & Martin Bossert

Corresponding author

Correspondence to Reinhard Heckel.

Additional information

Competing interest

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Heckel, R., Schober, S. & Bossert, M. Harmonic analysis of Boolean networks: determinative power and perturbations. J Bioinform Sys Biology 2013, 6 (2013). https://doi.org/10.1186/1687-4153-2013-6

Received: 14 August 2012
Accepted: 10 April 2013
Published: 04 May 2013
DOI: https://doi.org/10.1186/1687-4153-2013-6
