A comparison study of optimal and suboptimal intervention policies for gene regulatory networks in the presence of uncertainty
- Mohammadmahdi R Yousefi^{1}Email author and
- Edward R Dougherty^{2}
https://doi.org/10.1186/1687-4153-2014-6
© Yousefi and Dougherty; licensee Springer. 2014
Received: 31 December 2013
Accepted: 18 March 2014
Published: 3 April 2014
Abstract
Perfect knowledge of the underlying state transition probabilities is necessary for designing an optimal intervention strategy for a given Markovian genetic regulatory network. However, in many practical situations, the complex nature of the network and/or identification costs limit the availability of such perfect knowledge. To address this difficulty, we propose to take a Bayesian approach and represent the system of interest as an uncertainty class of several models, each assigned some probability, which reflects our prior knowledge about the system. We define the objective function to be the expected cost relative to the probability distribution over the uncertainty class and formulate an optimal Bayesian robust intervention policy minimizing this cost function. The resulting policy may not be optimal for a fixed element within the uncertainty class, but it is optimal when averaged across the uncertainty class. Furthermore, starting from a prior probability distribution over the uncertainty class and collecting samples from the process over time, one can update the prior distribution to a posterior and find the corresponding optimal Bayesian robust policy relative to the posterior distribution. Therefore, the optimal intervention policy is essentially nonstationary and adaptive.
Keywords
Optimal intervention; Markovian gene regulatory networks; Probabilistic Boolean networks; Uncertainty; Prior knowledge; Bayesian control

Introduction
A fundamental problem of translational genomics is to develop optimal therapeutic methods in the context of genetic regulatory networks (GRNs) [1]. Most previous studies rely on perfect knowledge regarding the state transition rules of the network; however, when dealing with biological systems such as cancer cells, owing to their intrinsic complexity, little is known about how they respond to various stimuli or how they function under certain conditions. Moreover, if there exists any knowledge regarding their functioning, it is usually marginal and insufficient to provide a perfect understanding of the full system. To address uncertainty, one can construct an uncertainty class of models, each representing the system of interest to some extent, and optimize an objective function across the entire uncertainty class. In this way, success in therapeutic applications is fundamentally bound to the degree of robustness of the designed intervention method.
Markovian dynamical networks, especially probabilistic Boolean networks (PBNs) [2], have been the main framework in which to study intervention methods due to their ability to model randomness that is intrinsic to the interactions among genes or gene products. The stochastic state transition rules of any PBN can be characterized by a corresponding Markov chain with known transition probability matrix (TPM) [3]. Markov decision processes (MDPs), on the other hand, are a standard framework for characterizing optimal intervention strategies. Many GRN optimization problems have been formulated in the context of MDPs, for instance, infinite-horizon control [4], constrained intervention [5], optimal intervention in asynchronous GRNs [6], optimal intervention when there are random-length responses to drug intervention [7], and optimal intervention to achieve the maximal beneficial shift in the steady-state distribution [8]. Herein, PBNs will be our choice of reference model for GRNs.
The first efforts to address robustness in the design of intervention policies for PBNs assumed that the errors made during data extraction, discretization, gene selection and network generation introduce a mismatch between the PBN model and the actual GRN [9, 10]. Therefore, uncertainties manifest themselves in the entries of the TPM. A minimax approach was taken in which robust intervention policies were formulated by minimizing the worst-case performance across the uncertainty class [9]. Thus, the resulting policies were typically conservative. To avoid the detrimental effects of extreme, but rare, states on minimax design and motivated by the results of Bayesian robust filter design [11], the authors in [10] adopted a Bayesian approach whereby the optimal intervention policy depends on the prior probability distribution over the uncertainty class of networks. Constructing a collection of optimal policies, each being optimal for a member of the uncertainty class, the goal was to pick a single policy from this collection that minimizes the average performance relative to the prior distribution. The corresponding policy provides a model-constrained robust (MCR) policy. It was noted that this model-constrained policy may not yield the best average performance among all possible policies (we will later define the set of all possible policies for this problem). The authors also considered a class of globally robust (GR) policies, which are designed optimally only for a centrality parameter, such as the mean or median, to represent the mass of the uncertainty distribution.
Since [10] was concerned only with stationary policies, it did not consider the possibility of finding nonstationary policies under a Bayesian updating framework, where state transitions observed from the system are used directly to enrich the prior knowledge regarding the uncertainty class. The resulting nonstationary intervention policy, which we refer to as the optimal Bayesian robust (OBR) policy, is our main interest in the present paper. As our main optimization criterion, we use the expected total discounted cost in the long run. This choice is motivated by the practical implications of discounted cost in the context of medical treatment, where the discounting factor emphasizes that obtaining good treatment outcomes at an earlier stage is favored over later stages.
Since the early development of MDPs, it has been recognized that, when dealing with a real-world problem, the decision maker is seldom provided with full knowledge of the TPM, but rather with some prior information, often expressed in a probabilistic manner. Taking a Bayesian approach, an optimal control policy may exist in the expected-value sense, specifying the best choice of control action in each state. Since the decision maker's state of knowledge about the underlying true process evolves in time as the process continues, the best choice of control action at each state might also evolve. Because the observations are acquired through a controlled process (a control action is taken at every stage of the process), the optimal policy derived through the Bayesian framework may never coincide with a policy that is optimal for the true state of nature. In fact, frequently, the optimal policy is not self-optimizing [12]; rather, optimal control will provide the best trade-off between exploration rewards and immediate costs.
Bellman [13] considered a special case of this problem, the two-armed bandit problem with discounted cost, and later used the term adaptive control for control processes with incompletely known transition probabilities. He suggested transforming the problem into an equivalent dynamic program with completely known transition laws, for which the state now comprises both the physical state of the process and an information state summarizing the past history of observed state transitions [14]. This new state is referred to as the hyperstate. Along this line of research, the authors of [15–17] developed the theory of the OBR policy for Markov chains with uncertainty in their transition probabilities, where there is a clear notion of optimality defined with respect to all possible scenarios within the uncertainty class. This approach contrasts with the MCR methodology because the resulting policy may not be optimal for any member of the uncertainty class, but it yields the best performance when averaged over the entire uncertainty class.
Following the methodology proposed in [17] and assuming that the prior probability distribution of a random TPM belongs to a conjugate family of distributions which are closed under consecutive observations, one can formulate a set of functional equations, similar to those of fully known controlled Markov chains, and use a method of successive approximation to find the unique set of solutions to these equations. In this paper, we adopt this approach for the robust intervention of Markovian GRNs and provide a simulation study demonstrating the performance of OBR policies compared with several suboptimal methods, such as MCR and two variations of GR policies, when applied to synthetic PBNs with various structural properties and parameters, as well as to a mutated mammalian cell cycle network.
The paper is organized as follows. First, we give an overview of controlled PBNs and review the nominal MDP problem where the TPMs of the underlying Markov chain are completely known. We then formulate the OBR policy for PBNs with uncertainty in their TPMs and provide the dynamic programming solution to this optimization problem. We demonstrate a conjugate family of probability distributions over the uncertainty class where each row of the random TPM follows a Dirichlet distribution with certain parameters. Assuming that the rows are independent, the posterior probability distribution will again be a Dirichlet distribution with updated parameters. This provides a compact representation of the dynamic programming equation and facilitates the computations involved in the optimization problem. Several related suboptimal policies are also discussed in detail. Finally, we provide simulation results over both synthetic and real networks, comparing the performance of different design strategies discussed in this paper.
Methods
Controlled PBNs
PBNs constitute a broad class of stochastic models for transcriptional regulatory networks. Their construction takes into account several random factors, including effects of latent variables, involved in dynamical genetic regulation [3]. The backbone of every PBN is laid upon a collection of Boolean networks (BNs) [18]. A BN is composed of a set of n nodes, V={v^{1},v^{2},…,v^{ n }} (representing the expression levels of genes g^{1},g^{2},…,g^{ n } or their products), and a list of Boolean functions F={f^{1},f^{2},…,f^{ n }} describing the functional relationships between the nodes. We restrict ourselves to binary BNs, where we assume that each node takes on a value of 0, corresponding to an unexpressed (OFF) gene, or 1, corresponding to an expressed (ON) gene. This definition extends directly to any finitely discrete-valued nodes. The Boolean function ${f}^{i}:{\{0,1\}}^{{j}_{i}}\to \{0,1\}$ determines the value of node i at time k+1 given the values of its predictor nodes at time k by ${v}_{k+1}^{i}={f}^{i}({v}_{k}^{i1},{v}_{k}^{i2},\dots ,{v}_{k}^{i{j}_{i}})$, where $\{{v}^{i1},{v}^{i2},\dots ,{v}^{i{j}_{i}}\}$ is the predictor set of node v^{ i }. In a BN, all nodes are assumed to update their values synchronously according to F. The dynamics of a BN are completely determined by its state transition diagram composed of 2^{ n } states. Each state corresponds to a vector ${\mathbf{v}}_{k}=({v}_{k}^{1},{v}_{k}^{2},\dots ,{v}_{k}^{n})$ known as the gene activity profile (GAP) of the BN at time k. To make our analysis more straightforward, we will replace each GAP, v_{ k }, with its decimal equivalent denoted by ${x}_{k}=1+\sum _{i=1}^{n}{2}^{n-i}{v}_{k}^{i}$, where ${x}_{k}\in \mathcal{S}=\{1,\dots ,{2}^{n}\}$ for all k.
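As a small illustration, the synchronous update and the GAP-to-decimal encoding above can be sketched as follows; the 3-gene network and its Boolean functions are hypothetical, chosen only for the example.

```python
def gap_to_state(v):
    """Map a gene activity profile (v^1, ..., v^n) to its decimal state in {1, ..., 2^n}."""
    n = len(v)
    return 1 + sum(2 ** (n - i - 1) * v[i] for i in range(n))

def bn_step(v, functions, predictors):
    """Synchronous BN update: node i reads its predictor set and applies f^i."""
    return tuple(f(*[v[j] for j in pred]) for f, pred in zip(functions, predictors))

# Hypothetical 3-gene BN: v1' = v2 OR v3, v2' = NOT v1, v3' = v1 AND v2
functions = [lambda a, b: a | b, lambda a: 1 - a, lambda a, b: a & b]
predictors = [(1, 2), (0,), (0, 1)]

v = (1, 0, 1)
print(gap_to_state(v))                      # decimal state of GAP (1,0,1)
print(bn_step(v, functions, predictors))    # next GAP under synchronous update
```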
A PBN is fully characterized by the same set of n nodes, V; a set of m constituent BNs, F={F^{1},F^{2},…,F^{ m }}, called contexts; a selection probability vector R={r^{1},r^{2},…,r^{ m }} over F (r^{ i }≥0 for i=1,…,m and $\sum _{i=1}^{m}{r}^{i}=1$); a network switching probability q>0; and a random gene perturbation probability p≥0. At any updating epoch, depending on the value of a random variable ξ∈{0,1}, with P(ξ=1)=q, one of two mutually exclusive events will occur. If ξ=0, then the values of all nodes are updated synchronously according to the operative constituent BN; if ξ=1, then an operative BN, F^{ l }∈F, is randomly selected with probability r^{ l }, and the values of the nodes are updated accordingly. The current BN may be selected again when a switch is called for [1]. PBNs also admit random gene perturbations, where the current state of each node in the network can be randomly flipped with probability p.
A PBN is said to be context-sensitive if q<1; otherwise, the PBN is called instantaneously random. The number of states in a context-sensitive PBN is m·2^{ n }, whereas the state transition diagram of an instantaneously random PBN is composed of the same 2^{ n } states in $\mathcal{S}$. It is shown in [19] that averaging over the various contexts, relative to R, reduces the transition probabilities of a context-sensitive PBN to those of an instantaneously random PBN with identical parameters. PBNs with only one constituent BN, i.e., m=1, are called BNs with perturbation and are of particular interest in some applications [8, 20]. For the sake of simplicity and to reduce computational time, we will focus only on instantaneously random PBNs.
Since the transitions from one state to another in a PBN are stochastic and have the Markov property, we can model any PBN by an equivalent homogeneous Markov chain whose states are the members of $\mathcal{S}$; the TPM of this Markov chain can be calculated as described in [19]. We denote the TPM of an instantaneously random PBN by $\mathcal{P}$ and let $\{{Z}_{k}\in \mathcal{S},k=0,1,\dots \}$ be the stochastic process of the state transitions for this PBN. Originating from state $i\in \mathcal{S}$, the successor state $j\in \mathcal{S}$ is selected randomly according to the transition probability ${\mathcal{P}}_{\mathit{\text{ij}}}=P({Z}_{k+1}=j\mid {Z}_{k}=i)$, the (i,j) element of the TPM. For every $i\in \mathcal{S}$, the transition probability vector $({\mathcal{P}}_{i1},{\mathcal{P}}_{i2},\dots ,{\mathcal{P}}_{i\left|\mathcal{S}\right|})$ is a stochastic vector such that ${\mathcal{P}}_{\mathit{\text{ij}}}\ge 0$ and $\sum _{j\in \mathcal{S}}{\mathcal{P}}_{\mathit{\text{ij}}}=1$. Random gene perturbation guarantees the ergodicity of the equivalent Markov chain, resulting in a unique invariant measure equal to its limiting distribution.
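For a BN with perturbation (m=1), the TPM can be sketched directly from the deterministic transition map, assuming the standard model in which, with probability (1−p)^{n}, no gene is perturbed and the deterministic transition is taken, while otherwise each gene flips independently with probability p. The 2-gene transition map below is hypothetical.

```python
import numpy as np

def bn_tpm(next_state, n, p):
    """TPM of a BN with perturbation; next_state[i] in {0, ..., 2^n - 1} is the
    deterministic successor of state i (states 0-indexed here)."""
    N = 2 ** n
    P = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            d = bin(i ^ j).count("1")        # Hamming distance between the GAPs
            if d > 0:
                P[i, j] = p ** d * (1 - p) ** (n - d)   # perturbation jump
        P[i, next_state[i]] += (1 - p) ** n             # deterministic transition
    return P

# Hypothetical deterministic map for a 2-gene BN
P = bn_tpm(next_state=[1, 2, 3, 0], n=2, p=0.01)
print(np.allclose(P.sum(axis=1), 1.0))       # True: each row is a stochastic vector
```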
To model the effect of interventions, we assume that PBNs admit an external control input, A, from a set of possible input signals, $\mathcal{A}$, that determines a specific type of intervention on a set of control genes. It is common to assume that the control input is binary, i.e., $\mathcal{A}=\{0,1\}$, where A=0 indicates no intervention and A=1 indicates that the expression level of a single control gene, g^{ c } (or equivalently v^{ c }), for a given c∈{1,2,…,n}, should be flipped. For this control scheme, A=0 does not alter the TPM of the original uncontrolled PBN. However, assuming that the network is in state i, the action A=1 replaces the row corresponding to this state by the row that corresponds to the state $\tilde{i}$, where the binary representation of $\tilde{i}$ is the same as that of i except that v^{ c } is flipped. This binary control scheme can be easily generalized to more than one control gene with more than two control actions; in this paper, we only consider the binary control scheme.
${\mathcal{P}}_{\mathit{\text{ij}}}\left(a\right)=P({Z}_{k+1}=j\mid {Z}_{k}=i,{A}_{k}=a)$ is the probability of going to state $j\in \mathcal{S}$ at time k+1 from state $i\in \mathcal{S}$ while taking action $a\in \mathcal{A}$ at time k. By this construction, it is clear that the controlled TPM, $\mathcal{P}\left(a\right)$, can be calculated directly from $\mathcal{P}$.
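Under this control scheme, the controlled TPM $\mathcal{P}(1)$ is obtained by row substitution, which can be sketched as follows; states are 0-indexed here, and the control gene index c counts from the most significant bit of the GAP, matching the encoding of v^{1},…,v^{ n }.

```python
import numpy as np

def controlled_tpm(P, c, n):
    """Return P(1): row i of the controlled TPM equals row i-tilde of the
    uncontrolled TPM P, where i-tilde differs from i only in control gene c."""
    N = P.shape[0]
    P1 = np.empty_like(P)
    for i in range(N):
        i_tilde = i ^ (1 << (n - 1 - c))   # flip gene c in the binary GAP
        P1[i] = P[i_tilde]
    return P1

# Toy uncontrolled TPM for a 2-gene network; control gene c = 0 (the first gene)
P = np.eye(4)
P1 = controlled_tpm(P, c=0, n=2)
print(P1[0])   # row 0 of P(1) equals row 2 of P
```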
The nominal problem
External intervention in the context of Markovian networks refers to a class of sequential decision-making problems in which actions are taken at discrete time units to alter the dynamics of the underlying GRN. It is usually assumed that the decision maker can observe the state evolution of the network at consecutive time epochs k=0,1,…,N, where the horizon N may be finite or infinite. At each k, upon observing the state, the decision maker chooses an action from $\mathcal{A}$ that will subsequently alter the dynamics of the network. Hence, the stochastic movement of the GRN from one state to another is completely characterized by the current state and the action taken at that state via (1).
The goal is to find a policy $\mu \in \mathcal{M}$ for which the expected total discounted cost ${J}_{\mathcal{P}}^{\mu}\left(i\right)={E}_{i}^{\mu}\left[\sum _{k=0}^{\infty }{\lambda }^{k}{g}_{{Z}_{k}{Z}_{k+1}}\left({A}_{k}\right)\right]$ is minimized, i.e., ${J}_{\mathcal{P}}^{\ast}\left(i\right)=\underset{\mu \in \mathcal{M}}{min}{J}_{\mathcal{P}}^{\mu}\left(i\right)$ for all $i\in \mathcal{S}$, where λ∈(0,1) is the discount factor and ${g}_{\mathit{\text{ij}}}\left(a\right)$ is the immediate cost of moving from state i to state j under action a. In the above equation, ${E}_{i}^{\mu}$ denotes expectation relative to the probability measure ${P}_{i}^{\mu}$.
The optimal cost function J^{∗} uniquely satisfies the above functional equation, i.e., it is the fixed point of the mapping T. One can determine the optimal policy with the help of convergence, optimality, and uniqueness theorems for the solution, proven in [21]. These results furnish an iterative method for successive approximation of the optimal cost function, which in turn gives the optimal intervention policy. It can be further shown that the optimal intervention policy belongs to the class of stationary deterministic policies, meaning that μ_{ k }=μ for all k and $\mu :\mathcal{S}\to \mathcal{A}$ is a single-valued mapping from states to actions.
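The successive-approximation scheme for the nominal problem can be sketched as plain value iteration; the array shapes, the toy cost structure, and the tolerance-based stopping rule below are illustrative implementation choices, not the paper's.

```python
import numpy as np

def value_iteration(P, g, lam, tol=1e-8, max_iter=10_000):
    """Iterate J_{k+1}(i) = min_a sum_j P_ij(a) [ g_ij(a) + lam * J_k(j) ].
    P, g have shape (|A|, |S|, |S|); returns the optimal cost and a stationary
    deterministic policy."""
    n_states = P.shape[1]
    J = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[a, i]: expected immediate cost plus discounted expected future cost
        Q = np.einsum("aij,aij->ai", P, g) + lam * P @ J
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            break
        J = J_new
    policy = Q.argmin(axis=0)        # stationary deterministic policy
    return J_new, policy

# Toy example: two states; action 1 is free, action 0 costs 1 per transition
P = np.stack([np.eye(2), np.eye(2)])
g = np.stack([np.ones((2, 2)), np.zeros((2, 2))])
J, policy = value_iteration(P, g, lam=0.5)
```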
OBR intervention policy
where the expectation is taken not only with respect to the random behavior of the state-action stochastic process but also with respect to the random choice of $\mathcal{P}$ according to its prior distribution, $\pi \left(\mathcal{P}\right)$. The goal is to find an optimal policy μ^{∗} such that (5) is minimized for any $i\in \mathcal{S}$ and any prior distribution π, i.e., ${\mu}^{\ast}={\text{argmin}}_{\mu \in \mathcal{M}}{J}^{\mu}(i,\pi )$. We denote the optimal cost by J^{∗}(i,π).
Suppose that we could find optimal intervention policies for every element of Ω. Letting ${J}_{\mathcal{P}}^{\ast}\left(i\right)$ denote the optimal cost for any $\mathcal{P}\in \Omega $ and $i\in \mathcal{S}$ and assuming that the optimal cost J^{∗}(i,π) exists, we have ${E}_{\pi}\left[\phantom{\rule{0.3em}{0ex}}{J}_{\mathcal{P}}^{\ast}\right(i\left)\right]\le {J}^{\ast}(i,\pi )$ for all $i\in \mathcal{S}$ and any π. In other words, ${E}_{\pi}\left[\phantom{\rule{0.3em}{0ex}}{J}_{\mathcal{P}}^{\ast}\right(i\left)\right]$ is the best that could be achieved if we were to optimize for every element of the uncertainty class for fixed i and π.
Since at every stage of the problem an observation is made immediately after taking an action, we can utilize this additional information and update the prior distribution to a posterior distribution as the process proceeds in time. Therefore, we can treat $\pi \left(\mathcal{P}\right)$ as an additional state and call (i,π) the hyperstate of the process. From this point of view, we seek an intervention policy that minimizes the total expected discounted cost when the process starts from a hyperstate (i,π). Suppose the true, but unknown, TPM is $\widehat{\mathcal{P}}$. At time 0, the initial state z_{0} is known and $\mathcal{P}$ is distributed according to π. Based on z_{0} and π, the controller chooses an action a_{0} according to some intervention policy. Based on $({z}_{0},{a}_{0},\widehat{\mathcal{P}})$, the new state z_{1} is realized according to the probability transition rule ${\widehat{\mathcal{P}}}_{{z}_{0}{z}_{1}}\left({a}_{0}\right)$ and a cost ${g}_{{z}_{0}{z}_{1}}\left({a}_{0}\right)$ is incurred. Based on (z_{0},π,a_{0},z_{1}), the controller chooses an action a_{1} according to some (possibly different) intervention policy, and so on [12]. Although the numbers of states in $\mathcal{S}$ and actions in $\mathcal{A}$ are finite, the space of all possible hyperstates is uncountable. Therefore, finding an optimal intervention policy mapping the space of hyperstates to the space of actions, in a sense similar to the nominal case, is rather difficult. However, as we will see, it is possible to find an optimal action for a fixed initial hyperstate using an equivalent dynamic program.
Dynamic programming solution
The optimal cost function J^{∗}(i,π) satisfies this functional equation and is the fixed point of the operator T. Since the space of all possible hyperstates (i,π) is uncountable, construction of an optimal intervention policy for all (i,π), except for some special cases, may not be feasible. However, given that the process starts at (i,π), the minimization argument in the above equation yields an optimal action only for the current hyperstate.
The difficulty in solving (7), which makes it more complicated than (3), is that the total expected discounted cost under different actions now involves the difference in expected immediate costs, the expected difference in future costs due to being in different states at the next period, and the effect of the different information states resulting from these actions [22]. It should be noted that since the decision maker's knowledge regarding the uncertainty about $\mathcal{P}$ evolves with each transition, the intervention policy will also evolve over time. In a sense, the optimal policy will adapt, implying that stationary optimal policies as defined for the nominal problem do not exist. The optimal nonstationary intervention policy derived through the process discussed above is referred to as the OBR policy.
Special case: independent Dirichlet priors
Suppose that both prior and posterior distributions belong to the same family of distributions, i.e., they are conjugate distributions. Then, instead of dealing with prior and posterior at every stage of the problem, we will only need to keep track of the hyperparameters of the prior/posterior distributions. A special case of the families of distributions closed under consecutive observations is the Dirichlet distribution, which is the conjugate prior of the multinomial distribution.
where β_{ i j } denotes the number of transitions in z_{ n } from state i to state j. The right product in (8) is called the likelihood function, and the constant of proportionality can be found by normalizing the integral of ${\pi}^{\prime}\left(\mathcal{P}\right)$ over Ω to 1. Note that although the transitions made in z_{ n } result from an intervention policy, we have formulated the likelihood function only in terms of the elements of $\mathcal{P}$ (and not $\mathcal{P}\left(a\right)$). This is a consequence of our particular intervention model, where we can substitute ${\mathcal{P}}_{\tilde{i}j}$ for ${\mathcal{P}}_{\mathit{\text{ij}}}\left(a\right)$ whenever a=1, as shown in (1). To be more precise, we have ${\beta}_{\mathit{\text{ij}}}={\beta}_{\mathit{\text{ij}}}\left(0\right)+{\beta}_{\tilde{i}j}\left(1\right)$, where β_{ i j }(a) is the number of transitions in z_{ n } from state i to state j under control a.
We also have the following theorem, which is due to Martin [17].
Theorem 1. Let $\mathcal{P}$ have the probability density function given in (9) and (10) with hyperparameter matrix α, and suppose that a sample with transition count matrix β=[β_{ i j }] is observed. Then the posterior probability density function of $\mathcal{P}$ has the same form as in (9) and (10), but with hyperparameter matrix α+β.
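Theorem 1 makes the Bayesian update purely arithmetic, as the small illustration below shows; the toy 2-state prior is hypothetical.

```python
import numpy as np

def posterior_hyperparams(alpha, transitions):
    """alpha: (N, N) prior hyperparameter matrix; transitions: observed (i, j)
    pairs. Returns alpha + beta, per Theorem 1."""
    beta = np.zeros_like(alpha)
    for i, j in transitions:
        beta[i, j] += 1            # count each observed transition i -> j
    return alpha + beta

alpha = np.ones((2, 2))            # uniform Dirichlet prior over each row
alpha_post = posterior_hyperparams(alpha, [(0, 1), (0, 1), (1, 0)])
print(alpha_post)                  # hyperparameters become [[1. 3.], [2. 1.]]
# Posterior mean of row i: alpha_post[i] / alpha_post[i].sum()
```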
The recursion in (12) is initialized with {J_{0}(i,α)}, a set of bounded initial functions. Under some mild conditions, the sequence of functions {J_{ k }(i,α)} converges monotonically to the optimal solution J^{∗}(i,α) for any $i\in \mathcal{S}$ and uniformly for all valid α [17]. Faster rates of convergence can be achieved for smaller values of λ. Assuming that the method of successive approximation converges in K steps, then for a specific value of (i,α), one needs to evaluate ${\left(\left|\mathcal{A}\right|\times \left|\mathcal{S}\right|\right)}^{K}$ terminal values to compute J^{∗}(i,α). Therefore, to minimize computational time, we restrict ourselves to small values of λ and K. Once the successive approximation converges, an action a^{∗} that minimizes the RHS of (12) is optimal.
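A hedged sketch of the depth-K successive approximation over hyperstates: the predictive transition probability under action a is taken as the Dirichlet mean of the effective row (row $\tilde{i}$ when a=1, per the control scheme), and each branch recurses with the hyperparameter updated by the hypothesized transition, so the evaluation indeed branches over $(|\mathcal{A}|\times|\mathcal{S}|)^{K}$ terminal values. The cost arrays, the terminal value J_{0}=0, and the 0-indexed states are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def obr_cost(i, alpha, g, lam, K, n, c=0):
    """Depth-K approximation of J*(i, alpha); returns (cost, best_action).
    g[a] is an (N, N) immediate-cost matrix; c is the control gene (0-indexed)."""
    if K == 0:
        return 0.0, 0                                   # bounded terminal value
    N = alpha.shape[0]
    best = (np.inf, 0)
    for a in (0, 1):
        row = i ^ (1 << (n - 1 - c)) if a == 1 else i   # effective row i-tilde or i
        p = alpha[row] / alpha[row].sum()               # Dirichlet predictive mean
        total = 0.0
        for j in range(N):
            alpha_next = alpha.copy()
            alpha_next[row, j] += 1                     # posterior after seeing row -> j
            future, _ = obr_cost(j, alpha_next, g, lam, K - 1, n, c)
            total += p[j] * (g[a][i, j] + lam * future)
        if total < best[0]:
            best = (total, a)
    return best

# Toy 1-gene example: intervening costs 1 per step, waiting is free
g = [np.zeros((2, 2)), np.ones((2, 2))]
cost, action = obr_cost(0, np.ones((2, 2)), g, lam=0.9, K=2, n=1)
```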
The extreme computational complexity of finding the OBR intervention policy for MDPs with large state-space poses a major obstacle when dealing with real-world problems. It is relatively straightforward to implement the procedure described above for networks with three or four genes. However, for larger networks, one should resort either to clever ways of indexing all possible transitions, such as hash tables or a branch-and-bound algorithm, or to approximation methods, such as reinforcement learning. See [12, 22, 23] for more details. An alternative approach, as we will demonstrate, is to implement suboptimal methods that, in general, have acceptable performance. Yet another potential approach to circumvent the explosion of the space of all hyperstates is to reduce the size of the uncertainty class. For example, we can assume that some rows of the underlying TPM are perfectly known and uncertainty is only on some other rows, with the implication that the regulatory network is partially known. We will leave the analysis of such approaches to future research.
Suboptimal intervention policies
Besides the OBR policy, three suboptimal policies are of particular interest: MCR, GR, and adaptive GR (AGR). As in the previous section, let $\mathcal{P}$ be random with probability density $\pi \left(\mathcal{P}\right)$ over the set of valid TPMs, Ω, defined in (4).
where ${J}_{\mathcal{P}}^{\mu}\left({Z}_{0}\right)$ is defined in (2) for any fixed Z_{0} and $\mathcal{P}$. Since we are limiting ourselves to policies in ${\mathcal{M}}_{\text{MCR}}$, it is seldom the case that a single policy minimizes ${E}_{\pi}\left[{J}_{\mathcal{P}}^{\mu}\left({Z}_{0}\right)\right]$ for all ${Z}_{0}\in \mathcal{S}$. Hence, we take the expected value of ${J}_{\mathcal{P}}^{\mu}\left({Z}_{0}\right)$ with respect to η in (13) as a single value representing the expected cost. The resulting MCR intervention policy is therefore fixed for a given prior distribution, in the sense that it will not adapt to the observed transitions.
We define the GR policy as the minimizing argument for the optimization problem given by ${J}_{\text{GR}}\left(i\right)=\underset{\mu \in \mathcal{M}}{min}{J}_{\bar{\mathcal{P}}}^{\mu}\left(i\right)$, for all $i\in \mathcal{S}$, where $\bar{\mathcal{P}}\in \Omega $ is the mean of the uncertainty class Ω with respect to the prior distribution π. The optimization method presented for the nominal problem can be readily applied; hence, the resulting policy, μ_{GR}, is stationary and deterministic. In the case of independent Dirichlet priors, $\bar{\mathcal{P}}$ is given by Equation 11. Here we consider the mean as an estimate of the unknown $\mathcal{P}$; however, one can use any other estimate of $\mathcal{P}$ and find the optimal policy in a similar fashion. As with the MCR policy, this intervention method is fixed for a given prior distribution and will not adapt to the observed transitions.
The AGR policy is similar to the GR policy in that it is optimal for the mean of the uncertainty class Ω. However, instead of taking the mean with respect to the prior distribution π and using the same policy for the entire process, we update π to a posterior π^{′}, defined in (6), whenever a transition is made, and calculate the mean of Ω with respect to π^{′}. Since the posterior evolves as we observe more transitions, the AGR policy also evolves; hence the name adaptive. We denote the cost and the corresponding policy resulting from this procedure, for any initial hyperstate (i,π), by J_{AGR}(i,π) and μ_{AGR}, respectively. In the case of independent Dirichlet priors, we can simply replace π with α.
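One epoch of the AGR procedure can be sketched as follows; `solve_nominal` stands in for the nominal value iteration, the deterministic toy TPM is hypothetical, and states are 0-indexed.

```python
import numpy as np

def agr_step(state, alpha, solve_nominal, true_tpm, rng, n, c=0):
    """One AGR epoch: solve for the Dirichlet-mean TPM, act, observe a
    transition from the true (unknown) TPM, and update the hyperparameters."""
    P_bar = alpha / alpha.sum(axis=1, keepdims=True)   # mean of the uncertainty class
    policy = solve_nominal(P_bar)                      # policy optimal for the mean TPM
    a = policy[state]
    row = state ^ (1 << (n - 1 - c)) if a == 1 else state   # effective row under a
    next_state = rng.choice(alpha.shape[0], p=true_tpm[row])
    alpha[row, next_state] += 1                        # posterior update (Theorem 1)
    return next_state, a

# Toy 1-gene run with a deterministic true TPM and a stub nominal solver
rng = np.random.default_rng(0)
true_tpm = np.array([[1.0, 0.0], [0.0, 1.0]])
alpha = np.ones((2, 2))
solve_nominal = lambda P: np.zeros(P.shape[0], dtype=int)   # stub: always act 0
s, a = agr_step(0, alpha, solve_nominal, true_tpm, rng, n=1)
```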
Results
In this section, we provide a comparison study of the performance of optimal and suboptimal policies based on simulations on synthetically generated PBNs and a real network. Since we implement the method of successive approximation to calculate μ_{OBR}, we restrict ourselves to synthetic networks with n=3 genes. Given that, as we will show, μ_{AGR} yields performance very similar to that of the optimal policy, we can implement μ_{AGR} for networks of larger size and use it as the baseline for comparison with other suboptimal policies, keeping in mind that the optimal policy should and will outperform any suboptimal method.
Synthetic networks
The immediate cost in (14) is defined so that a cost is incurred if the future state is undesirable or there is an intervention in the network.
To design an OBR policy for a given network, we need to assign the prior probability distribution to the set Ω. As discussed earlier, independent Dirichlet priors parameterized by α constitute a natural choice for this application. Therefore, we only need to assign values to α. The choice of prior hyperparameters plays a crucial role in the design of an optimal policy: the tighter the prior around the true, but unknown, TPM $\widehat{\mathcal{P}}$, the closer the OBR cost is to that of $\widehat{\mathcal{P}}$. Since our synthetic networks are generated randomly and not according to some biologically motivated GRN, it would be difficult to assign prior probabilities for individual networks. Therefore, we use the randomly generated PBNs themselves for this purpose and perturb and scale the elements of the TPMs via the ε-contamination method.
where κ>0 controls the tightness of the prior around the true PBN and ε∈[0,1] controls the level of contamination. For networks with three genes, we assume that ε=0.1 and demonstrate the effect of κ on the performance of intervention policies.
We generate 500 random PBNs, denoted by $\left\{{\mathcal{N}}^{l}\right\}$ for l=1 to 500, for each set of parameters and calculate their TPMs, denoted by $\left\{{\widehat{\mathcal{P}}}^{l}\right\}$. These networks will serve as the ground-truth for our simulation study. For a given pair of κ and ε, we then construct hyperparameter matrices, denoted by {α^{ l }}, using (15), each corresponding to a random network. To compare the performance of different intervention policies, for each randomly generated network ${\mathcal{N}}^{l}$, we take a Monte Carlo approach and generate 500 random TPMs, denoted by $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$ for l^{′}=1 to 500, from the α^{ l }-parameterized independent Dirichlet priors. The set $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$ will essentially represent Ω and the prior distribution.
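Drawing the Monte Carlo sample $\{\widehat{\mathcal{P}}^{l,l'}\}$ from the α-parameterized independent Dirichlet priors amounts to sampling each row of each TPM independently; a minimal sketch (the uniform toy hyperparameters are illustrative):

```python
import numpy as np

def sample_tpms(alpha, n_samples, rng):
    """Return (n_samples, N, N) random TPMs with row i drawn from Dirichlet(alpha[i])."""
    N = alpha.shape[0]
    return np.stack([
        np.stack([rng.dirichlet(alpha[i]) for i in range(N)])  # one valid TPM
        for _ in range(n_samples)
    ])

rng = np.random.default_rng(42)
tpms = sample_tpms(np.ones((4, 4)), n_samples=3, rng=rng)
print(tpms.shape)                        # (3, 4, 4)
print(np.allclose(tpms.sum(axis=2), 1))  # True: every row sums to 1
```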
To design and evaluate the performance of μ_{MCR} for each random PBN ${\mathcal{N}}^{l}$, we proceed as follows: we find the optimal intervention policy for each ${\widehat{\mathcal{P}}}^{l,{l}^{\prime}}$, apply this policy to every element in the set $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$, and calculate the average over all equally likely initial states, ${Z}_{0}\in \mathcal{S}$, of the infinite-horizon expected discounted cost using (2) for that element. The expected performance of each policy optimal for ${\widehat{\mathcal{P}}}^{l,{l}^{\prime}}$, relative to the prior distribution, can be computed by taking the average of the resulting costs over all ${\widehat{\mathcal{P}}}^{l,{l}^{\prime}}$. We repeat this procedure for every element of $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$ and declare a policy MCR if it yields the minimum expected performance. We denote the expected cost function for a random PBN ${\mathcal{N}}^{l}$ obtained via an MCR policy by ${J}_{\text{MCR}}^{l}$.
Finding μ_{GR} for each PBN ${\mathcal{N}}^{l}$, on the other hand, is easier, requiring only the value of the hyperparameter α^{ l }. Once found, the performance of this policy is evaluated by applying it to all elements of $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$ and taking the average of the resulting costs. As with the MCR policy, we assume that the initial states are equally likely and calculate the average over all possible initial states. We denote the expected cost function corresponding to the GR policy derived for ${\mathcal{N}}^{l}$ by ${J}_{\text{GR}}^{l}$.
To quantify the performance of the OBR policy for each random PBN ${\mathcal{N}}^{l}$, we directly evaluate the cost function defined in (5) relative to the independent Dirichlet prior distribution, π^{ l }, parameterized by α^{ l }. This is accomplished using the sample set of 500 random TPMs, $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$. Starting from a hyperstate and a TPM ${\widehat{\mathcal{P}}}^{l,{l}^{\prime}}$, we derive an optimal action from (12) using the method of successive approximations with K=5 and some initial cost function. We then observe a transition according to ${\widehat{\mathcal{P}}}^{l,{l}^{\prime}}$ and compute the incurred discounted immediate cost according to (14), which depends on the newly observed state and the optimal action just taken. We update the prior hyperparameter, carry out the optimization again with the updated hyperparameter and the newly observed state, and accumulate the newly incurred discounted immediate cost. We iterate this for seven epochs, thus observing seven different hyperstates along a sampling path, and record the total accumulated discounted cost over this period. We then repeat this entire process for the same ${\widehat{\mathcal{P}}}^{l,{l}^{\prime}}$ for 100 iterations (although the same TPM is used, different sampling paths result due to random transitions) and take the average of all 100 total accumulated discounted cost values. This average represents the cost associated with ${\widehat{\mathcal{P}}}^{l,{l}^{\prime}}$ and the initial state. We implement a similar procedure for all initial states (assumed equally likely) and all elements of $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$ and take the average of the resulting costs, yielding the expected optimal cost, E_{ η }[J^{∗}(Z_{0},π^{ l })], with respect to the uniform probability distribution η over the initial states in $\mathcal{S}$.
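The epoch-by-epoch procedure above, with its conjugate Dirichlet update after each observed transition, can be sketched as a single sampling path. This is an illustrative sketch only: `choose_action` is a placeholder for the successive-approximation step in (12), and the array shapes, cost model, and γ=0.9 are our assumptions.

```python
import numpy as np

def simulate_adaptive_run(P_true, alpha0, choose_action, cost, gamma=0.9,
                          epochs=7, z0=0, rng=None):
    """One sampling path of the adaptive procedure: at each epoch the
    controller picks an action from the current hyperstate (z, alpha),
    a transition is drawn from the ground-truth TPM, the discounted
    immediate cost is accumulated, and the Dirichlet hyperparameters
    are updated by conjugacy (add 1 to the observed transition count).

    P_true        : (A, S, S) ground-truth TPMs per action
    alpha0        : (A, S, S) prior hyperparameters
    choose_action : callable (z, alpha) -> action index
    cost          : (A, S) immediate cost of action a in state s
    """
    rng = np.random.default_rng(rng)
    alpha = alpha0.copy()
    z, total = z0, 0.0
    for k in range(epochs):
        a = choose_action(z, alpha)
        z_next = rng.choice(P_true.shape[1], p=P_true[a, z])
        total += (gamma ** k) * cost[a, z]   # accumulate discounted cost
        alpha[a, z, z_next] += 1             # conjugate posterior update
        z = z_next
    return total, alpha
```

Averaging `total` over many such runs, over initial states, and over sampled TPMs yields the Monte Carlo estimate described in the text.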
Since we use the same hyperparameter α^{ l } in our Monte Carlo simulation for a given random PBN ${\mathcal{N}}^{l}$, we denote the expected optimal cost obtained from an OBR policy by ${J}_{\text{OBR}}^{l}$.
We take a similar approach for evaluating the performance of μ_{AGR}. Instead of using the method of successive approximations at every epoch, we use the current value of the hyperparameter to calculate the mean of Ω and use this mean to find the optimal action to take at that hyperstate. Every other step of the process is essentially the same as for the OBR policy. We denote the expected cost obtained from this policy by ${J}_{\text{AGR}}^{l}$.
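Because the mean of a Dirichlet distribution with hyperparameter vector α is the normalized vector α/Σα, the mean of Ω is simply the row-normalized hyperparameter matrix. The AGR action selection can then be sketched as value iteration on this single mean model; the function name, array shapes, γ=0.9, and iteration count below are our assumptions.

```python
import numpy as np

def agr_action(z, alpha, cost, gamma=0.9, iters=100):
    """AGR-style action: replace the uncertainty class by the mean
    TPM of the Dirichlet posterior (row-normalized hyperparameters)
    and act optimally for that single model via value iteration.

    z     : current state index
    alpha : (A, S, S) Dirichlet hyperparameters
    cost  : (A, S) immediate cost of action a in state s
    """
    P_mean = alpha / alpha.sum(axis=2, keepdims=True)  # posterior-mean TPMs
    S = P_mean.shape[1]
    J = np.zeros(S)
    for _ in range(iters):            # value iteration on the mean model
        Q = cost + gamma * P_mean @ J  # (A, S) state-action values
        J = Q.min(axis=0)
    # Greedy action at the current state under the converged values
    return int((cost[:, z] + gamma * P_mean[:, z, :] @ J).argmin())
```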
We also evaluate three other cost functions for each PBN ${\mathcal{N}}^{l}$: ${J}_{\text{LB}}^{l}:={E}_{\pi}\left[{E}_{\eta}\left[{J}_{\mathcal{P}}^{\ast}({Z}_{0})\right]\right]$, ${J}_{\mathrm{T}}^{l}:={E}_{\eta}\left[{J}_{{\widehat{\mathcal{P}}}^{l}}^{\ast}({Z}_{0})\right]$, and ${J}_{\text{ET}}^{l}:={E}_{\pi}\left[{E}_{\eta}\left[{J}_{\mathcal{P}}^{l}({Z}_{0})\right]\right]$, where ${J}_{\mathcal{P}}^{l}$ is the cost of applying the optimal intervention policy corresponding to ${\widehat{\mathcal{P}}}^{l}$ to an element of Ω. The first cost function, ${J}_{\text{LB}}^{l}$, is a lower bound on the performance of the OBR policy, ${J}_{\text{OBR}}^{l}$. The second cost function, ${J}_{\mathrm{T}}^{l}$, corresponds to the cost of applying an optimal intervention policy, designed as if we knew the true network ${\widehat{\mathcal{P}}}^{l}$, to the true network itself. The third cost function, ${J}_{\text{ET}}^{l}$, is the expected cost, relative to the prior, of applying the intervention policy that is optimal for the true network. We calculate these cost functions assuming that Ω and the prior distribution π^{ l } are represented by the set $\left\{{\widehat{\mathcal{P}}}^{l,{l}^{\prime}}\right\}$ corresponding to each PBN ${\mathcal{N}}^{l}$.
Average costs across all 500 randomly generated PBNs with n=3 genes and ε=0.1

 | $E\left[{J}_{\text{LB}}^{l}\right]$ | $E\left[{J}_{\mathrm{T}}^{l}\right]$ | $E\left[{J}_{\text{ET}}^{l}\right]$ | $E\left[{J}_{\text{MCR}}^{l}\right]$ | $E\left[{J}_{\text{GR}}^{l}\right]$ | $E\left[{J}_{\text{AGR}}^{l}\right]$ | $E\left[{J}_{\text{OBR}}^{l}\right]$
---|---|---|---|---|---|---|---
κ=0.1 | 0.7626 | 1.0803 | 1.0998 | 1.0948 | 1.0991 | 1.0816 | 1.0812 |
κ=1.0 | 0.8078 | 1.0296 | 1.0531 | 1.0520 | 1.0526 | 1.0458 | 1.0457 |
κ=5.0 | 0.9417 | 1.0209 | 1.0525 | 1.0518 | 1.0513 | 1.0502 | 1.0501 |
To compare a pair of policies, we consider the cost difference ${\Delta}_{\circ}^{l}:={J}_{\circ}^{l}-{J}_{\bullet}^{l}$, where ${J}_{\circ}^{l}$ and ${J}_{\bullet}^{l}$ denote the expected costs of two different intervention policies. Since PBNs are randomly generated, ${\Delta}_{\circ}^{l}$ is also a random variable with a probability distribution. We estimate the complementary cumulative distribution function (CCDF) of this distribution for different values of ${\Delta}_{\circ}^{l}$ using its empirical distribution function.
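The empirical CCDF estimate is simply the fraction of the 500 sampled cost differences exceeding each threshold. A minimal sketch (function name ours):

```python
import numpy as np

def empirical_ccdf(deltas, thresholds):
    """Empirical complementary CDF: for each threshold t, the
    fraction of observed cost differences strictly exceeding t."""
    deltas = np.asarray(deltas)
    return np.array([(deltas > t).mean() for t in thresholds])
```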
Average costs across all 500 randomly generated PBNs with n=4 genes
 | $E\left[{J}_{\text{LB}}^{l}\right]$ | $E\left[{J}_{\mathrm{T}}^{l}\right]$ | $E\left[{J}_{\text{ET}}^{l}\right]$ | $E\left[{J}_{\text{MCR}}^{l}\right]$ | $E\left[{J}_{\text{GR}}^{l}\right]$ | $E\left[{J}_{\text{AGR}}^{l}\right]$
---|---|---|---|---|---|---
(κ,ε)=(0.1,0.0) | 0.7559 | 1.0878 | 1.0869 | 1.0856 | 1.0869 | 1.0773 |
(κ,ε)=(1.0,0.0) | 0.8702 | 1.0888 | 1.0888 | 1.0918 | 1.0888 | 1.0854 |
(κ,ε)=(5.0,0.0) | 0.9510 | 1.0579 | 1.0578 | 1.0612 | 1.0578 | 1.0572 |
(κ,ε)=(0.1,0.1) | 0.7711 | 1.1099 | 1.1260 | 1.1248 | 1.1258 | 1.1156 |
(κ,ε)=(1.0,0.1) | 0.8722 | 1.1106 | 1.1278 | 1.1314 | 1.1276 | 1.1236 |
(κ,ε)=(5.0,0.1) | 0.9714 | 1.0826 | 1.1011 | 1.1049 | 1.1009 | 1.1002 |
(κ,ε)=(0.1,0.25) | 0.7177 | 1.0796 | 1.1289 | 1.1234 | 1.1248 | 1.1133 |
(κ,ε)=(1.0,0.25) | 0.8307 | 1.0853 | 1.1348 | 1.1325 | 1.1305 | 1.1257 |
(κ,ε)=(5.0,0.25) | 0.9729 | 1.0629 | 1.1178 | 1.1157 | 1.1137 | 1.1130 |
Real network
Boolean regulatory functions of a mutated mammalian cell cycle
Gene | Node | Predictor functions |
---|---|---|
CycD | v _{1} | Extracellular signal |
Rb | v _{2} | $(\overline{{v}_{1}}\wedge \overline{{v}_{4}}\wedge \overline{{v}_{5}}\wedge \overline{{v}_{9}})$ |
E2F | v _{3} | $(\overline{{v}_{2}}\wedge \overline{{v}_{5}}\wedge \overline{{v}_{9}})$ |
CycE | v _{4} | $({v}_{3}\wedge \overline{{v}_{2}})$ |
CycA | v _{5} | $({v}_{3}\wedge \overline{{v}_{2}}\wedge \overline{{v}_{6}}\wedge \overline{{v}_{7}\wedge {v}_{8}})\vee ({v}_{5}\wedge \overline{{v}_{2}}\wedge \overline{{v}_{6}}\wedge \overline{{v}_{7}\wedge {v}_{8}})$ |
Cdc20 | v _{6} | v _{9} |
Cdh1 | v _{7} | $(\overline{{v}_{5}}\wedge \overline{{v}_{9}})\vee {v}_{6}$ |
UbcH10 | v _{8} | $\overline{{v}_{7}}\vee ({v}_{7}\wedge {v}_{8}\wedge ({v}_{6}\vee {v}_{5}\vee {v}_{9}))$ |
CycB | v _{9} | $(\overline{{v}_{6}}\wedge \overline{{v}_{7}})$ |
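The predictor functions in the table above can be transcribed directly as one synchronous update of the nine-gene Boolean network. The sketch below is a straightforward Python transcription of the table; the function name and the tuple representation of states are our conventions.

```python
def cell_cycle_step(v):
    """One synchronous update of the 9-gene mutated mammalian
    cell-cycle Boolean network. v is a tuple of 9 booleans ordered
    (CycD, Rb, E2F, CycE, CycA, Cdc20, Cdh1, UbcH10, CycB);
    v1 (CycD) is an extracellular input held fixed."""
    v1, v2, v3, v4, v5, v6, v7, v8, v9 = v
    return (
        v1,                                                # CycD: input
        (not v1) and (not v4) and (not v5) and (not v9),   # Rb
        (not v2) and (not v5) and (not v9),                # E2F
        v3 and not v2,                                     # CycE
        ((v3 and not v2 and not v6 and not (v7 and v8)) or
         (v5 and not v2 and not v6 and not (v7 and v8))),  # CycA
        v9,                                                # Cdc20
        ((not v5) and (not v9)) or v6,                     # Cdh1
        (not v7) or (v7 and v8 and (v6 or v5 or v9)),      # UbcH10
        (not v6) and (not v7),                             # CycB
    )
```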
Boolean regulatory functions of a reduced mutated mammalian cell cycle
Gene | Node | Predictor functions |
---|---|---|
CycD | v _{1} | Extracellular signal |
Rb | v _{2} | $(\overline{{v}_{1}}\wedge {v}_{2}\wedge \overline{{v}_{3}}\wedge \overline{{v}_{5}})$ |
CycA | v _{3} | $(\overline{{v}_{2}}\wedge {v}_{3}\wedge \overline{{v}_{5}})\vee (\overline{{v}_{2}}\wedge \overline{{v}_{4}}\vee \overline{{v}_{5}})$ |
UbcH10 | v _{4} | $({v}_{4}\wedge {v}_{5})\vee ({v}_{3}\wedge \overline{{v}_{5}})$ |
CycB | v _{5} | ${v}_{3}\wedge \overline{{v}_{5}}$ |
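From such a rule table one obtains a Markov chain over the 2^n gene states. The sketch below assumes the standard Boolean-network-with-perturbation model, in which each gene flips independently with probability ε per step and, if no gene flips, the network follows its deterministic transition; this matches the role of ε in the tables, though the exact construction is given earlier in the paper.

```python
import itertools
import numpy as np

def bn_with_perturbation_tpm(step, n, eps):
    """TPM of an n-gene Boolean network with perturbation probability
    eps. With probability (1-eps)^n no gene flips and the network
    follows its deterministic update `step`; otherwise each gene
    flips independently with probability eps."""
    states = list(itertools.product([False, True], repeat=n))
    idx = {s: i for i, s in enumerate(states)}
    S = len(states)
    P = np.zeros((S, S))
    for i, x in enumerate(states):
        for j, y in enumerate(states):
            k = sum(a != b for a, b in zip(x, y))  # Hamming distance
            if k > 0:
                # Probability that exactly the differing genes flip
                P[i, j] = (eps ** k) * ((1 - eps) ** (n - k))
        # No flip: deterministic Boolean transition
        P[i, idx[tuple(step(x))]] += (1 - eps) ** n
    return P
```

Applying this to the five-gene reduced network yields the 32-state TPM used as the ground truth in the experiments below.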
Total discounted cost of different suboptimal policies for the reduced cell cycle network
 | J _{LB} | J _{T} | J _{ET} | J _{MCR} | J _{GR} | J _{AGR}
---|---|---|---|---|---|---
(κ,ε)=(0.1,0.0) | 0.7507 | 0.9685 | 0.9326 | 0.9465 | 0.9326 | 0.9316 |
(κ,ε)=(1.0,0.0) | 0.4990 | 0.9685 | 0.9675 | 0.9614 | 0.9675 | 0.9571 |
(κ,ε)=(5.0,0.0) | 0.6136 | 0.9685 | 0.9658 | 0.9774 | 0.9658 | 0.9605 |
(κ,ε)=(0.1,0.1) | 0.4501 | 0.9685 | 0.9239 | 0.9268 | 0.9239 | 0.9144 |
(κ,ε)=(1.0,0.1) | 0.5752 | 0.9685 | 0.9340 | 0.9526 | 0.9340 | 0.9294 |
(κ,ε)=(5.0,0.1) | 0.7507 | 0.9685 | 0.9326 | 0.9465 | 0.9326 | 0.9316 |
(κ,ε)=(0.1,0.25) | 0.3885 | 0.9685 | 0.8643 | 0.8674 | 0.8623 | 0.8550 |
(κ,ε)=(1.0,0.25) | 0.5140 | 0.9685 | 0.8728 | 0.8860 | 0.8730 | 0.8694 |
(κ,ε)=(5.0,0.25) | 0.7014 | 0.9685 | 0.8864 | 0.9002 | 0.8864 | 0.8861 |
Conclusions
Due to the complex nature of Markovian genetic regulatory networks, it is commonplace not to possess accurate knowledge of their parameters. Under this assumption, we have treated the system of interest as an uncertainty class of TPMs governed by a prior distribution. The goal is to find a robust intervention policy minimizing the expected infinite-horizon discounted cost relative to the prior distribution. We have taken a Bayesian approach and formulated the intervention policy optimizing this cost, thereby arriving at an intrinsically robust policy. Owing to its extreme computational complexity, the resulting OBR policy is, from a practical standpoint, infeasible. Using only a few genes, we have compared it to several suboptimal policies on synthetically generated PBNs. In this case, although there are PBNs for which the OBR policy significantly outperforms the suboptimal AGR policy, on average there is very little difference. Hence, one can feel somewhat comfortable using the AGR policy while losing only negligible performance. Unfortunately, even the AGR policy is computationally burdensome. Hence, when applying it to the mammalian cell cycle network, we are restricted to five genes.
The twin issues of uncertainty and computational complexity are inherent to translational genomics. Here we have examined the problem in the context of therapy, where the uncertainty is relative to network structure. It also occurs in the other major area of translational genomics, gene-based classification. Whereas here the prior distribution is over an uncertainty class of networks, in classification it is over an uncertainty class of feature-label distributions, and one looks for a classifier that is optimal, on average, across that prior distribution [27, 28]. There is no doubt, however, that the complexity issue is much graver in the case of dynamical intervention. Hence, much greater effort should be placed on gaining knowledge regarding biochemical pathways and thereby reducing the uncertainty when designing intervention strategies [29]. This means more attention should be paid to classical biological regulatory experiments and less reliance on blind data mining [30].
Declarations
Acknowledgements
The authors thank the High-Performance Biocomputing Center of TGen for providing the clustered computing resources used in this study; this includes the Saguaro-2 cluster supercomputer, partially funded by NIH grant 1S10RR025056-01.
References
- Dougherty ER, Pal R, Qian X, Bittner ML, Datta A: Stationary and structural control in gene regulatory networks: basic concepts. Int. J. Syst. Sci. 2010, 41(1):5-16. doi:10.1080/00207720903144560
- Shmulevich I, Dougherty ER: Genomic Signal Processing. Princeton: Princeton University Press; 2007.
- Shmulevich I, Dougherty ER, Kim S, Zhang W: Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 2002, 18(2):261-274. doi:10.1093/bioinformatics/18.2.261
- Pal R, Datta A, Dougherty ER: Optimal infinite-horizon control for probabilistic Boolean networks. IEEE Trans. Signal Process. 2006, 54(6):2375-2387.
- Faryabi B, Vahedi G, Chamberland J-F, Datta A, Dougherty ER: Optimal constrained stationary intervention in gene regulatory networks. EURASIP J. Bioinform. Syst. Biol. 2008, 2008:620767.
- Faryabi B, Chamberland J-F, Vahedi G, Datta A, Dougherty ER: Optimal intervention in asynchronous genetic regulatory networks. IEEE J. Sel. Top. Signal Process. 2008, 2(3):412-423.
- Yousefi MR, Datta A, Dougherty ER: Optimal intervention in Markovian gene regulatory networks with random-length therapeutic response to antitumor drug. IEEE Trans. Biomed. Eng. 2013, 60(12):3542-3552.
- Yousefi MR, Dougherty ER: Intervention in gene regulatory networks with maximal phenotype alteration. Bioinformatics 2013, 29(14):1758-1767. doi:10.1093/bioinformatics/btt242
- Pal R, Datta A, Dougherty ER: Robust intervention in probabilistic Boolean networks. IEEE Trans. Signal Process. 2008, 56(3):1280-1294.
- Pal R, Datta A, Dougherty ER: Bayesian robustness in the control of gene regulatory networks. IEEE Trans. Signal Process. 2009, 57(9):3667-3678.
- Grigoryan AM, Dougherty ER: Bayesian robust optimal linear filters. Signal Process. 2001, 81(12):2503-2521. doi:10.1016/S0165-1684(01)00144-X
- Kumar PR: A survey of some results in stochastic adaptive control. SIAM J. Contr. Optim. 1985, 23(3):329-380. doi:10.1137/0323023
- Bellman R: A problem in the sequential design of experiments. Sankhya: Indian J. Stat. 1956, 16(3/4):221-229.
- Bellman R, Kalaba R: Dynamic programming and adaptive processes: mathematical foundation. IRE Trans. Autom. Control 1960, AC-5(1):5-10.
- Silver EA: Markovian decision processes with uncertain transition probabilities or rewards. Technical report, DTIC document, 1963.
- Gozzolino JM, Gonzalez-Zubieta R, Miller RL: Markovian decision processes with uncertain transition probabilities. Technical report, DTIC document, 1965.
- Martin JJ: Bayesian Decision Problems and Markov Chains. New York: Wiley; 1967.
- Kauffman SA: Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 1969, 22(3):437-467. doi:10.1016/0022-5193(69)90015-0
- Faryabi B, Vahedi G, Chamberland J-F, Datta A, Dougherty ER: Intervention in context-sensitive probabilistic Boolean networks revisited. EURASIP J. Bioinform. Syst. Biol. 2009, 2009:5.
- Qian X, Dougherty ER: Effect of function perturbation on the steady-state distribution of genetic regulatory networks: optimal structural intervention. IEEE Trans. Signal Process. 2008, 56(10):4966-4976.
- Derman C: Finite State Markovian Decision Processes. Orlando: Academic Press; 1970.
- Satia JK, Lave RE: Markovian decision processes with uncertain transition probabilities. Oper. Res. 1973, 21(3):728-740. doi:10.1287/opre.21.3.728
- Duff MO: Optimal learning: computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts, Amherst, 2002.
- Yousefi MR, Datta A, Dougherty ER: Optimal intervention strategies for therapeutic methods with fixed-length duration of drug effectiveness. IEEE Trans. Signal Process. 2012, 60(9):4930-4944.
- Faure A, Naldi A, Chaouiya C, Thieffry D: Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle. Bioinformatics 2006, 22(14):e124-e131. doi:10.1093/bioinformatics/btl210
- Veliz-Cuba A: Reduction of Boolean network models. J. Theor. Biol. 2011, 289:167-172.
- Dalton LA, Dougherty ER: Optimal classifiers with minimum expected error within a Bayesian framework - Part I: discrete and Gaussian models. Pattern Recogn. 2013, 46(5):1301-1314. doi:10.1016/j.patcog.2012.10.018
- Dalton LA, Dougherty ER: Optimal classifiers with minimum expected error within a Bayesian framework - Part II: properties and performance analysis. Pattern Recogn. 2013, 46(5):1288-1300. doi:10.1016/j.patcog.2012.10.019
- Yoon B-J, Qian X, Dougherty ER: Quantifying the objective cost of uncertainty in complex dynamical systems. IEEE Trans. Signal Process. 2013, 61(9):2256-2266.
- Dougherty ER, Bittner ML: Epistemology of the Cell: A Systems Perspective on Biological Knowledge. Hoboken: Wiley; 2011.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.