Transition Dependency: A Gene-Gene Interaction Measure for Times Series Microarray Data
- Xin Gao^{1},
- Daniel Q. Pu^{1} and
- Peter X.-K. Song^{2}
https://doi.org/10.1155/2009/535869
© Xin Gao et al. 2009
Received: 1 May 2008
Accepted: 6 November 2008
Published: 5 January 2009
Abstract
Gene-gene dependency plays a very important role in systems biology, as it is crucial to understanding different biological mechanisms. Time-course microarray data provide a new platform for revealing the dynamic mechanism of gene-gene dependencies. Existing interaction measures are mostly based on association measures, such as Pearson or Spearman correlations. However, it is well known that such measures can capture only linear or monotonic dependency relationships, not nonlinear combinatorial ones. Invoking hidden Markov models, we propose a new measure of pairwise dependency based on transition probabilities. The new dynamic interaction measure checks whether or not the joint transition kernel of the bivariate state variables is the product of the two marginal transition kernels. This measure enables us not only to evaluate the strength of gene dependencies but also to infer their details. It reveals nonlinear combinatorial dependency structure in two respects: between two genes and across adjacent time points. We conduct a bootstrap-based test for the presence or absence of dependency between every pair of genes. Simulation studies and real biological data analysis demonstrate the application of the proposed method. The software package is available upon request.
1. Introduction
Biological processes in the cell, such as biochemical interactions and regulatory activities, involve complicated dependency relationships among genes. Building appropriate models for inferring such dependency relationships is one of the most fundamental aims in biology. Time series microarray data consist of trajectories of gene expression profiles at multiple time points, which provide an innovative platform for biologists to investigate the dynamic nature of gene dependencies. Such gene-gene dependencies are attributed to physical interactions among encoded proteins or between an encoded protein and genes, or to coregulation by common transcription factors. Although microarray data do not tell us directly how these physical interactions work, mathematical models still allow us to infer whether or not there is a dependency relationship between the transcriptional changes of two genes. The notion of gene-gene interaction in this article refers to such a dependency relationship in the expression levels.
Many methods have been proposed to detect gene-gene interactions using microarray data [1–3]. A traditional approach is to cluster genes using pairwise Pearson or Spearman correlations as a distance measure [4–6]. Pearson correlation captures linear dependencies and relies on a normality assumption. Spearman correlation measures the concordance in the ranks of the data and is invariant to any monotonic transformation of the data. As it does not rely on normality or linearity assumptions, it is often used as a robust statistic to identify coexpression patterns in genes. When applied to a pair of time series, both Pearson and Spearman correlations implicitly assume that all the paired measurements across different time points are independent replications. This calculation is too simplistic to adequately describe the complex relationship between two time series, in which the dependency may go beyond a linear or monotone pattern. In the literature, there are several extensions of Pearson correlation to time series data. For example, Dubin and Müller [7] introduced the notion of dynamic correlation (DC) across two time series, which, however, is not sensitive to autoregressive dependency. Another commonly used correlation measure in time series is the cross-correlation function (CCF) proposed in [8], which calculates a linear correlation across lagged time points. Nevertheless, neither DC nor CCF is designed to measure nonlinear dependencies.
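The limitation of correlation-based measures can be illustrated with a minimal Python sketch. It is not part of the original study: the data, the helper names `pearson`, `ranks`, and `spearman`, and the slight shift used to avoid tied values are all illustrative assumptions. It contrasts a monotone nonlinear relationship, which Spearman correlation captures, with a non-monotone one, which neither correlation detects.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: strength of linear association."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v):
    """Ranks 0..n-1 of the entries of v (assumes there are no ties)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Illustrative data on a symmetric grid (hypothetical, not from the paper).
x = np.linspace(-2.0, 2.0, 41)
monotone = np.exp(x)               # nonlinear but monotone in x
non_monotone = (x - 0.03) ** 2     # depends on x, but not monotonically;
                                   # the small shift only avoids tied values

print("Spearman, monotone:    ", spearman(x, monotone))
print("Pearson, non-monotone: ", pearson(x, non_monotone))
print("Spearman, non-monotone:", spearman(x, non_monotone))
```

In this toy setting the second variable is a deterministic function of the first in both cases, yet only the monotone case produces a large correlation.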
In this article, we invoke hidden Markov models (HMMs) to construct a gene-gene dependency measure. The HMM framework allows several new developments that overcome some of the key difficulties in the existing methodologies discussed above. We propose a new dependency measure based on transition probabilities across two Markovian processes, which allows us to study nonlinear relationships among genes. The intuition behind the proposed approach is that we intend to track the time-varying behavior of interactions among genes, and this dynamic relationship is naturally reflected by the transition mechanism of the HMM. Thus, the dependency between two genes can be characterized via the difference between their joint transition matrix and the product of the two corresponding marginal transition matrices. In spirit, this idea is very similar to the concept of mutual information (MI) [9], which measures the difference between the sum of the marginal entropies and the bivariate joint entropy. When the two random variables are independent, the MI is zero. Both approaches are based directly on probability arguments, and both can detect nonlinear relationships among interacting genes. However, the MI is defined only for two random variables and cannot be readily applied to time series data. In contrast, the proposed transition dependency is developed specifically to evaluate nonlinear dependencies between two time series. As shown in Section 2, this dependency measure is rich in detail, describing how a pair of genes influence each other over time. We will use this dependency measure to perform a screening analysis that selects significant pairwise dependencies among all the gene pairs at a reasonable false discovery rate. The related statistical significance is given by a bootstrap-based test.
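For readers less familiar with mutual information, the following short Python sketch computes it as the sum of the marginal entropies minus the joint entropy for a discrete joint probability table. The function names and the two example tables are hypothetical and serve only to show that the MI vanishes under independence.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array; zero cells are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(joint):
    """MI = H(X) + H(Y) - H(X, Y) for a joint probability table of (X, Y)."""
    joint = np.asarray(joint, dtype=float)
    h_x = entropy(joint.sum(axis=1))   # marginal entropy of the row variable
    h_y = entropy(joint.sum(axis=0))   # marginal entropy of the column variable
    return h_x + h_y - entropy(joint)

# Hypothetical joint distributions of two binary variables.
independent = np.outer([0.3, 0.7], [0.6, 0.4])        # product of the marginals
dependent = np.array([[0.45, 0.05], [0.05, 0.45]])    # mass on concordant states

print(mutual_information(independent))   # numerically zero
print(mutual_information(dependent))     # strictly positive
```

Note that this quantity compares two random variables at a single time point; it does not by itself describe how the states evolve from one time point to the next.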
2. Method
2.1. Definition of Transition Dependency Measure
We now introduce a new dependency measure across two Markovian processes. Consider a bivariate HMM with discrete hidden states. Let the collection of bivariate hidden states be $\{(S_{1t}, S_{2t})\}$, where $S_{jt} \in \{0, 1\}$ is the hidden state of gene $j$ ($j = 1, 2$) at time $t$ for a pair of genes. Given the hidden state $S_{jt} = 0$ or $S_{jt} = 1$, the conditional distribution of the observation $Y_{jt}$ is denoted as $f_0$ or $f_1$, respectively. Here, depending on the observation process $Y_{jt}$, the hidden state may have different interpretations. For a one-sample experiment, $Y_{jt}$ could stand for a normalized measurement of gene expression level or hybridization intensity, and the corresponding hidden states may be labelled as "upregulated" (UR = 1) and "downregulated" (DR = 0), respectively. In the context of a two-sample comparative experiment, $Y_{jt}$ could stand for a measurement of the difference in expression values across the two experimental conditions for gene $j$ at time $t$. Then, the hidden states can be regarded as "differentially expressed" (DE = 1) and "not differentially expressed" (NDE = 0), as in [10]. Many methods are available to estimate the conditional distributions $f_0$ and $f_1$, including the nonparametric empirical Bayes method in [11], the parametric empirical Bayes method in [10], and the EM method for finite mixture models [12].
The transition dependency matrix is defined as

$$D = P - P_1 \otimes P_2, \qquad (1)$$

with $P$ denoting the joint transition matrix of the bivariate hidden states, $P_1$ and $P_2$ denoting the two marginal transition matrices, and $\otimes$ denoting the Kronecker product of two matrices. This transition dependency matrix measures the deviation of the actual joint transition matrix from the joint transition matrix expected under the independence assumption. It has been proved by Sandland [13] that if the two processes are independent, then all the entries of the matrix $D$ are equal to zero. In other words, when the two processes are dependent, this cross-dependency matrix fully characterizes the strength of their dependency. The continuous analog of this dependency measure between two point processes has been proposed in [14].
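As a small numerical illustration, the sketch below computes the dependency matrix as the joint transition matrix minus the Kronecker product of the two marginal transition matrices. The marginal matrices are hypothetical, and the ordering of the joint states as (0,0), (0,1), (1,0), (1,1), with gene 1 listed first, is an assumption chosen to match the layout produced by `np.kron`.

```python
import numpy as np

def transition_dependency(P_joint, P1, P2):
    """Joint transition matrix minus the Kronecker product of the marginals.

    Joint states are assumed to be ordered (0,0), (0,1), (1,0), (1,1),
    with the state of gene 1 listed first.
    """
    return np.asarray(P_joint, dtype=float) - np.kron(P1, P2)

# Hypothetical marginal transition matrices for two binary hidden chains.
P1 = np.array([[0.8, 0.2],
               [0.3, 0.7]])
P2 = np.array([[0.6, 0.4],
               [0.5, 0.5]])

# If the two chains evolve independently, the joint kernel factorises.
P_indep = np.kron(P1, P2)
D_indep = transition_dependency(P_indep, P1, P2)
print(np.round(D_indep, 12))   # all entries are zero
```

Because both the joint matrix and the Kronecker product are stochastic matrices, every row of the dependency matrix sums to zero; positive entries are therefore always balanced by negative entries in the same row.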
To interpret the transition dependency matrix, we give two examples.
Example 1. Each entry of the dependency matrix corresponds to a dependency in a different direction and has its own biological interpretation. For instance, gene 2 has an induction effect on gene 1 when the DE state of gene 2 enhances the probability of gene 1 switching from the NDE state to the DE state. The contrary is an inhibition effect, in which the DE state of gene 2 reduces the probability of gene 1 changing from the NDE state to the DE state.
It is evident that there is a large discrepancy between the joint transition matrix (2) and the product of the marginal transitions (3). The resulting nonzero dependency matrix provides evidence of a strong dependency between the two genes. The traditional correlation measure fails to detect the dependency here because it essentially relies on concordant and discordant changes between two trajectories, which are clearly absent in this type of nonlinear dependency relationship.
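The following Python sketch builds a hypothetical nonlinear dependency of this kind; it is not the example given in the paper, and the value `q = 0.9`, the state ordering, and all function names are assumptions. Gene 1 moves to the exclusive-or of the two current states with probability `q`, while gene 2 switches at random. The two states are uncorrelated at stationarity, yet the dependency matrix is clearly nonzero.

```python
import numpy as np

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (gene 1, gene 2), assumed ordering

def stationary(P):
    """Stationary distribution: solve pi P = pi subject to sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def marginal_transitions(P, pi):
    """Marginal transition matrices of each gene under the stationary law."""
    flow = pi.reshape(2, 2)[:, :, None, None] * P.reshape(2, 2, 2, 2)  # [a, b, c, d]
    f1 = flow.sum(axis=(1, 3))   # gene 1: a -> c
    f2 = flow.sum(axis=(0, 2))   # gene 2: b -> d
    return (f1 / f1.sum(axis=1, keepdims=True),
            f2 / f2.sum(axis=1, keepdims=True))

q = 0.9   # hypothetical probability that gene 1 moves to (a XOR b)
P = np.zeros((4, 4))
for u, (a, b) in enumerate(STATES):
    for v, (c, d) in enumerate(STATES):
        P[u, v] = (q if c == (a ^ b) else 1.0 - q) * 0.5   # gene 2 is a fair coin

pi = stationary(P)
P1, P2 = marginal_transitions(P, pi)
D = P - np.kron(P1, P2)

# Covariance between the two binary states under the stationary distribution.
s1 = np.array([s[0] for s in STATES])
s2 = np.array([s[1] for s in STATES])
cov = (pi * s1 * s2).sum() - (pi * s1).sum() * (pi * s2).sum()

print("stationary distribution:", np.round(pi, 6))
print("state covariance:", round(float(cov), 6))
print("largest |D| entry:", round(float(np.abs(D).max()), 6))
```

In this construction the stationary distribution spreads its mass evenly over the four joint states, which is why a correlation computed from the joint states carries no signal even though the transition mechanism of gene 1 depends strongly on gene 2.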
2.2. Testing for Pairwise Dependency
As the hidden state vectors are unobserved, the EM algorithm is invoked to carry out the maximum likelihood estimation, which iterates the following two steps until convergence.
E step: given the observed data and the current parameter estimates, we calculate the two conditional expectations that give the expected numbers of transitions. This is achieved by using the forward-backward algorithm designed for HMMs [16].
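A minimal sketch of how expected transition counts can be obtained with a scaled forward-backward pass is given below. It is a generic textbook version for a discrete-state HMM, not the authors' software; the function name, the argument layout, and the toy inputs are assumptions. For the bivariate model, the same routine would be applied with the four joint states.

```python
import numpy as np

def expected_transition_counts(lik, P, pi0):
    """Expected numbers of hidden-state transitions given the observations.

    lik[t, k] : likelihood of the observation at time t under hidden state k
    P[j, k]   : current estimate of the transition probability j -> k
    pi0[k]    : current estimate of the initial state probability
    """
    T, K = lik.shape
    alpha = np.zeros((T, K))   # scaled forward probabilities
    beta = np.ones((T, K))     # scaled backward probabilities
    scale = np.zeros(T)

    # Forward pass, normalising at each step to avoid numerical underflow.
    alpha[0] = pi0 * lik[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * lik[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the same scaling constants.
    for t in range(T - 2, -1, -1):
        beta[t] = (P @ (lik[t + 1] * beta[t + 1])) / scale[t + 1]

    # Accumulate the posterior probability of each transition over time.
    counts = np.zeros((K, K))
    for t in range(T - 1):
        joint = alpha[t][:, None] * P * (lik[t + 1] * beta[t + 1])[None, :]
        counts += joint / scale[t + 1]
    return counts

# Toy check: when the observations identify the states exactly,
# the expected counts reduce to the actual transition counts.
P = np.array([[0.7, 0.3], [0.4, 0.6]])
pi0 = np.array([0.5, 0.5])
path = [0, 0, 1, 1, 0]
counts_known = expected_transition_counts(np.eye(2)[path], P, pi0)
print(counts_known)
```

With noisy observations the counts become fractional, but they still sum to the number of transitions in the series, which is a convenient sanity check when implementing the E step.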
As usual, multiple starting points can be used to reach the global maximum rather than a local stationary point. To test the null hypothesis of cross-independence, we can tabulate the relevant data in the form of a contingency table, in which each cell count is the total number of transitions from one joint state to another. We likewise record, for gene 1 and for gene 2 separately, the number of marginal transitions between each pair of states (0 or 1). Under the null hypothesis, the expected frequency of each joint transition follows from the marginal transition counts of the two genes. A chi-squared-type test statistic [17] can then be formed by comparing the observed and the expected transition frequencies.
Even when the transition counts are available, because of the autocorrelation between the transitions across time points, the limiting distribution of the statistic is not a chi-squared distribution with 9 degrees of freedom. Furthermore, because the counts are not observed, we have to estimate them. Upon convergence of the EM algorithm, we obtain the estimated counts of transitions between each pair of states. The resulting statistic is computed with the estimated counts in place of the true counts. Thus the estimation procedure brings extra random variation into the statistic.
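One way such a chi-squared-type statistic could be computed from a table of joint transition counts is sketched below. The exact expected-frequency formula of the paper is not reproduced here; the sketch assumes that the expected count of a joint transition equals the row total of the joint table multiplied by the product of the two estimated marginal transition probabilities. The function name, the state ordering (0,0), (0,1), (1,0), (1,1), and the example tables are likewise assumptions.

```python
import numpy as np

def chi2_type_statistic(N):
    """Pearson-type statistic for a 4 x 4 table of joint transition counts.

    N[u, v] counts transitions from joint state u to joint state v, with joint
    states ordered (0,0), (0,1), (1,0), (1,1) and gene 1 listed first.
    Expected counts assume that the two chains move independently (an
    assumption of this sketch, not the paper's exact formula).
    """
    N = np.asarray(N, dtype=float)
    N4 = N.reshape(2, 2, 2, 2)                 # indices [a, b, c, d]
    n1 = N4.sum(axis=(1, 3))                   # gene 1 marginal counts a -> c
    n2 = N4.sum(axis=(0, 2))                   # gene 2 marginal counts b -> d
    P1 = n1 / n1.sum(axis=1, keepdims=True)    # estimated marginal kernels
    P2 = n2 / n2.sum(axis=1, keepdims=True)
    E = N.sum(axis=1, keepdims=True) * np.kron(P1, P2)   # expected counts
    mask = E > 0
    return float((((N - E) ** 2)[mask] / E[mask]).sum())

# Hypothetical count tables (estimated counts from EM may be fractional).
N_indep = 100 * np.kron([[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.5, 0.5]])
N_dep = np.array([[45, 45, 5, 5],
                  [5, 5, 45, 45],
                  [5, 5, 45, 45],
                  [45, 45, 5, 5]])

print(chi2_type_statistic(N_indep))   # essentially zero
print(chi2_type_statistic(N_dep))     # large
```

As the text explains, the value of such a statistic should not be referred to a chi-squared table, because the transitions are autocorrelated and the counts are themselves estimated.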
To assess the significance of the statistic, we invoke the bootstrap method to generate its empirical null distribution. We randomly resample the bivariate hidden Markovian process under the null hypothesis (cross-independence) as follows. From the EM algorithm, we estimate the marginal transition matrices under the null hypothesis. For each run of bootstrap sampling, using the estimates obtained under the null hypothesis, including the estimated marginal transition matrices, we randomly generate bivariate Markovian processes in which the two processes of hidden states are cross-independent. Based on the sample path of the hidden states, we then randomly generate the measurement process according to the conditional distributions. Subsequently, we discard the generated hidden states, treat the generated measurements as the bootstrap data, and invoke the EM algorithm. Using the EM estimates based on the bootstrap data, we calculate a value of the statistic, which can be viewed as a random draw from its null distribution. By generating a large number of bootstrap replicates, we obtain the empirical distribution of the null statistic, which provides an accurate approximation to the null distribution of the statistic.
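The resampling scheme can be sketched in Python as follows. This is a simplified illustration rather than the authors' software: Gaussian conditional distributions with hypothetical means and standard deviation stand in for the estimated conditional distributions, and `fit_and_score` is a hypothetical user-supplied function that would rerun the EM algorithm on the bootstrap data and return the test statistic.

```python
import numpy as np

def simulate_chain(P, pi0, T, rng):
    """Draw a path of length T from a Markov chain with kernel P."""
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(pi0), p=pi0)
    for t in range(1, T):
        states[t] = rng.choice(P.shape[1], p=P[states[t - 1]])
    return states

def simulate_null_pair(P1, P2, pi1, pi2, T, means, sd, rng):
    """Two cross-independent hidden chains and their noisy measurements."""
    s1 = simulate_chain(P1, pi1, T, rng)
    s2 = simulate_chain(P2, pi2, T, rng)
    # Assumed Gaussian conditional distributions: mean depends on the state.
    y1 = rng.normal(means[s1], sd)
    y2 = rng.normal(means[s2], sd)
    return (s1, s2), (y1, y2)

def bootstrap_null_statistics(fit_and_score, B, P1, P2, pi1, pi2, T, means, sd, seed=0):
    """Null draws of the statistic; fit_and_score is a hypothetical callable
    that refits the HMM on (y1, y2) and returns the test statistic."""
    rng = np.random.default_rng(seed)
    null_stats = np.empty(B)
    for b in range(B):
        # The simulated hidden states are discarded; only measurements are kept.
        _, (y1, y2) = simulate_null_pair(P1, P2, pi1, pi2, T, means, sd, rng)
        null_stats[b] = fit_and_score(y1, y2)
    return null_stats
```

A call such as `bootstrap_null_statistics(my_em_statistic, 1000, P1_hat, P2_hat, pi1_hat, pi2_hat, T, means, sd)` would then return the draws from which the empirical null distribution is built, where `my_em_statistic` and the estimated quantities are placeholders for the output of the EM fit under the null hypothesis.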
2.3. Pairwise Analysis
In microarray data, the expression trajectories of genes can be modeled as multivariate time series data, indexed by the sample replicate, the variate (gene), and the time point. In practice, two kinds of pairwise analyses may be considered: (1) given a specific gene of interest, infer all the genes that interact with this gene; (2) test all pairs exhaustively, and select the most significant pairwise dependencies for further analysis.
In both scenarios, a list of potentially promising interactions is determined while the false discovery rate (FDR) is kept under control. The FDR is an error measure used in the context of multiple hypothesis testing. Consider a family of $m$ simultaneously tested null hypotheses, of which $m_0$ are true. Let $R$ denote the number of rejected hypotheses, and let $V$ denote the number of true hypotheses erroneously rejected. Let $Q$ denote $V/R$ when $R > 0$ and $0$ otherwise. Then the FDR is defined as $E(Q)$, the expected rate of false discovery. As shown in [18], the FDR of a multiple comparison procedure is always smaller than or equal to the familywise error rate (FWER). To control the FDR, we proceed as follows. For each pair, we construct the test statistic and also generate bootstrap-based null statistics. To deal with the issue that the test statistics are correlated, we follow Reiner et al. [19] and form the null distribution by collapsing all the null statistics together. The $p$-value of each pairwise test can then be obtained by referring to this empirical null distribution. Given the ordered $p$-values $p_{(1)} \le \cdots \le p_{(m)}$, the multiplicity-adjusted $p$-value employed by the Benjamini-Hochberg (BH) procedure [18] is $\tilde{p}_{(i)} = \min_{k \ge i} \min\{ m\,p_{(k)}/k,\ 1 \}$, where $m$ denotes the total number of tests under screening. Pairs with adjusted $p$-values less than a prespecified FDR level are declared significant and selected for further consideration. Although this screening procedure may admit some false positives, it is computationally efficient and provides a promising pool of candidate relationships for future analysis.
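The screening step can be sketched in Python as follows. The sketch assumes the standard step-up form of the Benjamini-Hochberg adjustment and a plain proportion for the empirical $p$-value against the pooled null statistics; the function names and the toy numbers are illustrative and not taken from the paper.

```python
import numpy as np

def pooled_pvalues(stats, null_stats):
    """Empirical p-values against the null statistics of all pairs pooled together."""
    pooled = np.sort(np.asarray(null_stats, dtype=float).ravel())
    stats = np.asarray(stats, dtype=float)
    # Number of pooled null statistics that are at least as large as each statistic.
    n_ge = pooled.size - np.searchsorted(pooled, stats, side="left")
    return n_ge / pooled.size

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)           # m * p_(k) / k
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # min over k >= i
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

# Toy example with four hypothetical tests.
p = np.array([0.01, 0.04, 0.03, 0.005])
adjusted = bh_adjust(p)
selected = adjusted < 0.05   # pairs declared significant at an FDR level of 0.05
print(adjusted, selected)
```

Pooling the null statistics across pairs gives a much finer grid of attainable $p$-values than the bootstrap replicates of a single pair would, which matters when many thousands of pairs are screened at once.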
3. Results on Simulated Data
In our simulation study, several scenarios were defined by combinations of different parameter values, including the deviation parameter (0, 0.05, 0.10, and 0.15), the number of replicates (2, 3, and 5), and the number of time points (7 and 10). For each pair of genes, or bootstrap samples were generated to form the null statistics, and they were then collapsed together to form the empirical null distribution [19]. The conditional distributions and were chosen to be and , respectively. To test the null hypothesis of cross-independence, our HMM approach was compared with two correlation-measure-based methods, namely, the sample dynamic correlation (DC) method and the classical cross-correlation function (CCF) method from the theory of multivariate time series analysis. Both the DC and CCF methods used their respective empirical distributions from the bootstrap samples to obtain the corresponding $p$-values under the null hypothesis.
Replicates | Time points | Deviation | BS | DC | CCF | BS | DC | CCF
---|---|---|---|---|---|---|---|---
2 | 7 | 0.00 | 0.084 | 0.057 | 0.038 | 0.070 | 0.054 | 0.039
2 | 7 | 0.05 | 0.135 | 0.092 | 0.038 | 0.120 | 0.077 | 0.045
2 | 7 | 0.10 | 0.249 | 0.198 | 0.061 | 0.260 | 0.183 | 0.073
2 | 7 | 0.15 | 0.472 | 0.388 | 0.098 | 0.448 | 0.369 | 0.092
2 | 10 | 0.00 | 0.054 | 0.053 | 0.049 | 0.044 | 0.045 | 0.033
2 | 10 | 0.05 | 0.131 | 0.101 | 0.052 | 0.118 | 0.113 | 0.045
2 | 10 | 0.10 | 0.281 | 0.256 | 0.081 | 0.298 | 0.286 | 0.100
2 | 10 | 0.15 | 0.583 | 0.561 | 0.150 | 0.577 | 0.561 | 0.157
3 | 7 | 0.00 | 0.071 | 0.055 | 0.043 | 0.058 | 0.053 | 0.051
3 | 7 | 0.05 | 0.118 | 0.120 | 0.056 | 0.127 | 0.109 | 0.058
3 | 7 | 0.10 | 0.302 | 0.284 | 0.109 | 0.313 | 0.288 | 0.110
3 | 7 | 0.15 | 0.594 | 0.586 | 0.168 | 0.564 | 0.567 | 0.144
3 | 10 | 0.00 | 0.049 | 0.058 | 0.042 | 0.060 | 0.052 | 0.059
3 | 10 | 0.05 | 0.163 | 0.131 | 0.072 | 0.133 | 0.127 | 0.061
3 | 10 | 0.10 | 0.401 | 0.388 | 0.123 | 0.396 | 0.384 | 0.135
3 | 10 | 0.15 | 0.766 | 0.735 | 0.253 | 0.754 | 0.732 | 0.256
5 | 7 | 0.00 | 0.056 | 0.051 | 0.036 | 0.060 | 0.050 | 0.037
5 | 7 | 0.05 | 0.172 | 0.141 | 0.073 | 0.165 | 0.133 | 0.066
5 | 7 | 0.10 | 0.488 | 0.478 | 0.155 | 0.468 | 0.452 | 0.153
5 | 7 | 0.15 | 0.822 | 0.823 | 0.298 | 0.843 | 0.841 | 0.294
5 | 10 | 0.00 | 0.042 | 0.070 | 0.051 | 0.054 | 0.038 | 0.052
5 | 10 | 0.05 | 0.231 | 0.196 | 0.087 | 0.218 | 0.198 | 0.083
5 | 10 | 0.10 | 0.624 | 0.648 | 0.227 | 0.638 | 0.647 | 0.260
5 | 10 | 0.15 | 0.946 | 0.938 | 0.456 | 0.949 | 0.953 | 0.463
Empirical type I error rates and power of the proposed bootstrap-based (BS) test versus the dynamic correlation (DC) and cross-correlation function (CCF) to detect pairwise dependency under the dependency Pattern II. The power refers to the probability of detecting the interaction when the interaction really exists.
Replicates | Time points | Deviation | BS | DC | CCF | BS | DC | CCF
---|---|---|---|---|---|---|---|---
2 | 7 | 0.00 | 0.084 | 0.057 | 0.038 | 0.076 | 0.038 | 0.032
2 | 7 | 0.05 | 0.077 | 0.048 | 0.036 | 0.066 | 0.051 | 0.042
2 | 7 | 0.10 | 0.078 | 0.053 | 0.045 | 0.095 | 0.059 | 0.041
2 | 7 | 0.15 | 0.125 | 0.046 | 0.038 | 0.122 | 0.044 | 0.038
2 | 10 | 0.00 | 0.059 | 0.053 | 0.047 | 0.058 | 0.033 | 0.039
2 | 10 | 0.05 | 0.087 | 0.050 | 0.040 | 0.063 | 0.039 | 0.043
2 | 10 | 0.10 | 0.086 | 0.047 | 0.032 | 0.102 | 0.043 | 0.040
2 | 10 | 0.15 | 0.152 | 0.049 | 0.039 | 0.157 | 0.048 | 0.037
3 | 7 | 0.00 | 0.059 | 0.052 | 0.028 | 0.074 | 0.045 | 0.040
3 | 7 | 0.05 | 0.085 | 0.048 | 0.045 | 0.070 | 0.049 | 0.042
3 | 7 | 0.10 | 0.106 | 0.057 | 0.052 | 0.099 | 0.052 | 0.040
3 | 7 | 0.15 | 0.137 | 0.037 | 0.031 | 0.137 | 0.053 | 0.041
3 | 10 | 0.00 | 0.049 | 0.054 | 0.041 | 0.045 | 0.041 | 0.045
3 | 10 | 0.05 | 0.081 | 0.050 | 0.042 | 0.051 | 0.047 | 0.043
3 | 10 | 0.10 | 0.116 | 0.048 | 0.037 | 0.126 | 0.040 | 0.032
3 | 10 | 0.15 | 0.222 | 0.045 | 0.036 | 0.217 | 0.043 | 0.051
5 | 7 | 0.00 | 0.065 | 0.049 | 0.035 | 0.059 | 0.056 | 0.050
5 | 7 | 0.05 | 0.073 | 0.055 | 0.044 | 0.071 | 0.057 | 0.051
5 | 7 | 0.10 | 0.131 | 0.058 | 0.044 | 0.114 | 0.050 | 0.043
5 | 7 | 0.15 | 0.203 | 0.053 | 0.052 | 0.239 | 0.049 | 0.039
5 | 10 | 0.00 | 0.042 | 0.049 | 0.048 | 0.052 | 0.049 | 0.047
5 | 10 | 0.05 | 0.094 | 0.058 | 0.058 | 0.064 | 0.045 | 0.040
5 | 10 | 0.10 | 0.186 | 0.044 | 0.054 | 0.181 | 0.041 | 0.049
5 | 10 | 0.15 | 0.475 | 0.058 | 0.042 | 0.516 | 0.060 | 0.054
Why did the two correlation-measure-based methods perform well under dependency Pattern I, but very poorly under Pattern II? This is because correlation essentially measures the discordance and concordance between the joint expression states. For example, given the transition matrix under the null distribution specified by , when the deviation increases from to the stationary distribution of on the four possible pairs and will change from to under Pattern I. Such a stationary distribution allocates more probability to the concordant pairs , namely 0.32 and 0.37, and , namely 0.37 and 0.42. This produces a high correlation that is easy to detect. In contrast, Pattern II behaves strikingly differently. When increases from to , the stationary distribution of remains almost the same, from to . The stationary distribution places almost equal probabilities on these four pairs. The evenly distributed concordant and discordant pairs lead to low correlations. This explains the poor power of the correlation-measure-based methods to detect dependency Pattern II.
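The link between the stationary distribution and the correlation can be checked with a few lines of Python. The probabilities used below are hypothetical round numbers, not the values from the simulation study, and the function name is an assumption; the point is only that mass concentrated on the concordant pairs yields a positive correlation, whereas an even spread over the four pairs yields none.

```python
import numpy as np

def state_correlation(pi):
    """Correlation of two binary states given probabilities of the joint
    states ordered (0,0), (0,1), (1,0), (1,1)."""
    p00, p01, p10, p11 = pi
    m1 = p10 + p11                      # P(state of gene 1 = 1)
    m2 = p01 + p11                      # P(state of gene 2 = 1)
    cov = p11 - m1 * m2
    return cov / np.sqrt(m1 * (1 - m1) * m2 * (1 - m2))

concordant_heavy = [0.4, 0.1, 0.1, 0.4]    # hypothetical, Pattern I-like
evenly_spread = [0.25, 0.25, 0.25, 0.25]   # hypothetical, Pattern II-like

print(state_correlation(concordant_heavy))
print(state_correlation(evenly_spread))
```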
4. Results on Biological Data
4.1. Apoptosis Data Analysis
To investigate the practical performance of the proposed method, we consider the neutrophil apoptosis microarray dataset produced by Kobayashi et al. [20]. Neutrophils are an important cellular component of the innate immune system in humans. It is essential that neutrophils undergo spontaneous apoptosis as a mechanism to maintain the stability of the immune system. To get a global view of the molecular events that regulate neutrophil survival and apoptosis, Kobayashi et al. [20] studied global expression in human neutrophils during spontaneous apoptosis, cultured with and without human GM-CSF, which is known to prolong neutrophil survival against apoptosis. Neutrophils were isolated from the venous blood of three healthy individuals and were cultured in medium with and without 100 ng/mL GM-CSF for up to 24 hours. At the time points 3 hours, 6 hours, 12 hours, 18 hours, and 24 hours, the expression levels of 12 625 genes were measured using the GeneChip hybridization technique. The time course data we analyzed contain 30 samples comparing treatment (+GM-CSF) versus control (−GM-CSF) at the corresponding 5 time points in three biological replicates.
To use this dataset to understand the gene regulatory network, as a first step, we wish to find out how genes interact with each other. We selected CD44 as our gene of interest and set out to find all the genes that interact with CD44 during neutrophil apoptosis. CD44 is an important gene that encodes a cell surface glycoprotein involved in cell-cell interactions, cell adhesion, and migration. This protein participates in a wide variety of cellular functions, including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. CD44 is therefore expected to interact with a variety of genes to carry out its various functions. Furthermore, CD44 is an important tumor marker that is released by cancerous cells and can be detected by blood tests to reveal the presence of cancer. A list of candidate genes that interact with CD44 can provide more insight into the biological mechanism underlying tumor progression.
To apply the proposed HMMs, we first took $Y_{ijt}$ to be the absolute difference between the $i$th biological replicate's expression levels under the two experimental conditions for gene $j$ evaluated at time $t$. Next we need to determine the conditional distributions $f_0$ and $f_1$ given the NDE status and the DE status. The nonparametric empirical Bayes method in [11] was employed to estimate these conditional distributions, both of which were fixed in all the subsequent hypothesis tests for computational convenience. It assumes that the underlying distribution of the statistic is a mixture distribution containing two components: $f(y) = p_1 f_1(y) + p_0 f_0(y)$, where $f_1$ and $f_0$ represent the components corresponding to the DE state and the NDE state, and $p_1$ and $p_0$ are the probabilities that an observed $y$ is sampled from $f_1$ and $f_0$, respectively. Then, based on Bayes' rule, one can make posterior inference on whether a specific observation is from the DE state or the NDE state. Unlike the classical Bayes approach, which assumes specific parametric forms for the component densities, the nonparametric empirical Bayes method uses the data to estimate the densities. First, the data are randomly permuted across the two-sample experimental conditions and the null statistic is generated. With a great number of permutations, we obtain a large random sample from $f_0$. Therefore, we can estimate the densities of both $f_0$ and $f$ using nonparametric methods, such as kernel estimation.
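The two-component mixture idea can be sketched in Python with a simple Gaussian kernel density estimator. Everything specific here is an assumption made for illustration: the simulated data, the rule-of-thumb bandwidth, the function names, and in particular the null proportion `p0`, which is simply taken as known rather than estimated. The posterior probability of the DE state is computed as one minus `p0` times the ratio of the null density to the overall density.

```python
import numpy as np

def gaussian_kde(sample, x, bandwidth=None):
    """Gaussian kernel density estimate of `sample`, evaluated at points `x`."""
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption of this sketch).
        bandwidth = 1.06 * sample.std(ddof=1) * sample.size ** (-0.2)
    z = (x[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (sample.size * bandwidth)

def posterior_de(y, observed, null_sample, p0):
    """Posterior probability of the DE state: 1 - p0 * f0(y) / f(y)."""
    f = np.maximum(gaussian_kde(observed, y), 1e-300)   # overall density
    f0 = gaussian_kde(null_sample, y)                   # null (NDE) density
    return np.clip(1.0 - p0 * f0 / f, 0.0, 1.0)

# Simulated stand-ins for the permutation null sample and the observed statistics.
rng = np.random.default_rng(0)
null_sample = np.abs(rng.normal(0.0, 1.0, 5000))
observed = np.concatenate([np.abs(rng.normal(0.0, 1.0, 1600)),
                           np.abs(rng.normal(4.0, 1.0, 400))])

post = posterior_de(np.array([0.5, 4.5]), observed, null_sample, p0=0.8)
print(post)   # small for the first value, close to one for the second
```

In an actual analysis the null sample would come from permuting the condition labels, as described above, and the estimated densities would then be held fixed across all the pairwise tests.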
The list of the 15 most significant candidate genes having interactions with CD44
Probe | | | Gene title | Literature
---|---|---|---|---
38336_at | 0.00110891 | | FERM domain containing 4B (FRMD4B) | [21]
947_at | 0.04692277 | 0.00011089 | Gene function unknown | |
39237_at | 0.51548517 | 0.00017426 | Mitogen-activated protein kinase 3 (MAPKAPK3) | [22] |
40968_at | 0.00367525 | 0.00017426 | Suppressor of cytokine signaling 3 (SOCS3) | [23] |
31491_s_at | 0.00107723 | 0.00019010 | Caspase 8 (CASP8) | [24] |
36985_at | 0.02434851 | 0.00020594 | Isopentenyl-diphosphate delta isomerase (IDI1) | |
36344_at | 0.01527129 | 0.00022178 | Coagulation factor II (thrombin) receptor-like 1 (F2RL1) | |
1441_s_at | 0.01954851 | 0.00023762 | Tumor necrosis factor receptor superfamily, member 6 (FAS) | [25] |
31792_at | 0.00267723 | 0.00023762 | Annexin A3 (ANXA3) | [26] |
33289_f_at | 0.00365941 | 0.00023762 | Zinc finger protein 263 (ZNF263) | |
953_g_at | 0.01698218 | 0.00023762 | Gene function unknown | |
35799_at | 0.02151287 | 0.00025347 | DnaJ (Hsp40) homolog, subfamily B, member 9 (DNAJB9) | |
2035_s_at | 0.00327921 | 0.00026931 | Enolase 1, (alpha) (ENO1) | [27] |
31318_at | 0.03653069 | 0.00028515 | Gene function unknown | |
296_at | 0.03504159 | 0.00030099 | Gene function unknown |
The resulting test statistic is and the $p$-value is . The strong dependency between the genes CD44 and MAPKAPK3 is revealed by the large discrepancy between the expected and the actual transition matrices. The joint state of the two genes has a much smaller probability than expected of moving to states and In comparison, the estimated Pearson correlation is only with the insignificant $p$-value of .
This dependency pattern is very similar to Pattern I considered in our simulation. It implies that caspase 8 and CD44 are involved in the same apoptosis pathway and tend to be in the same state, DE or NDE, depending on whether or not the pathway is initiated. This finding informs us only about the existence of the dependency and does not provide information about the physical mechanism. Searching through the literature, we found that this dependency arises because the CD44-encoded protein, when ligated with A3D8, acts as a transcription factor and initiates the transcription of caspase 8 [24]. This is of great biological significance in that it unveils a new apoptosis pathway and sheds light on a potential therapeutic drug, A3D8, which ligates to CD44 and initiates caspase 8 in the pathway, for treating leukemia patients who are resistant to the traditional chemotherapy agents ATRA and As2O3. Based solely on gene expression profiling, without extensive wet lab work, we rediscovered that the transcription level of caspase 8 depends on that of CD44, with stronger statistical significance than the dynamic correlation method. This demonstrates the power of the proposed method to detect biologically meaningful dependencies.
4.2. T-Cell Data Analysis
5. Discussion
Detecting gene-gene interaction is one of the most important tasks in the study of systems biology. The advent of time series microarray data challenges statisticians to develop statistical machinery to extract and summarize the dependency information embedded in the data. In this paper, we characterize the dependency relationships based on the dynamics of a hidden Markov model, so that we are able to monitor gene-gene interactions through transition probabilities. The proposed methodology is not restricted to the microarray datasets we focus on in this article. It can be viewed as a general approach to analyzing time series data with complicated dependency structure, such as brain image data and proteomics data. The method can be extended in a few directions. One limitation of the proposed method is the assumption of stationarity of the hidden process. This is imposed more by the practical limitation of the small number of replications in microarray data than by theoretical considerations. If the number of replications at each time point is greatly increased, we could relax the homogeneity assumption and model different transition kernels at different time points.
As requested by one of the referees, we compare our method with other existing methods in this concluding paragraph, highlighting the advantages and limitations of each method. Dynamic Bayesian networks (DBNs) have been proposed to infer directed graphs from time series data [32]. This method maximizes a Bayesian scoring function over alternative network models. Prior knowledge or an assumption about the hierarchical structure is needed. Furthermore, it is computationally prohibitive to go through all the possible models, as the cardinality of the model space grows exponentially with the number of genes. Therefore, the DBN method is not capable of handling large networks. A linear dynamic system has also been proposed [31] to model gene networks based on time series data. It is essentially a linear autoregressive model allowing extra hidden variables. It assumes a linear relationship between genes, which may not be tenable in practice. In contrast, our method focuses on exploring pairwise dependencies between genes. Its computational complexity is much lower than that of the DBN method, which enables us to analyze much larger datasets. Compared to the linear dynamic system method, our method can model nonlinear and combinatorial relationships among genes, which is more realistic than the linear assumption. In conclusion, the advantages of our method are the computational simplicity of the algorithm, the capability of handling large datasets, the modeling of nonlinear relationships, and the absence of prior assumptions about the network structure. Nevertheless, a limitation of our method is that it produces only an undirected graph. In practice, our method can be used as a first screening step to identify potential candidate edges. Once the list of candidate genes has been narrowed down to a small set, the DBN method can be used to study a finer structure of the network with additional details such as directions.
Declarations
Acknowledgment
This research was supported by the Natural Sciences and Engineering Research Council of Canada Grant.
Authors’ Affiliations
References
- Alm E, Arkin AP: Biological networks. Current Opinion in Structural Biology 2003, 13(2):193-202. 10.1016/S0959-440X(03)00031-9View ArticleGoogle Scholar
- Zhang J, Ji Y, Zhang L: Extracting three-way gene interactions from microarray data. Bioinformatics 2007, 23(21):2903-2909. 10.1093/bioinformatics/btm482View ArticleMathSciNetGoogle Scholar
- Nakahara H, Nishimura S-I, Inoue M, Hori G, Amari S-I: Gene interaction in DNA microarray data is decomposed by information geometric measure. Bioinformatics 2003, 19(9):1124-1131. 10.1093/bioinformatics/btg098View ArticleGoogle Scholar
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America 1998, 95(25):14863-14868. 10.1073/pnas.95.25.14863View ArticleGoogle Scholar
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nature Genetics 1999, 22(3):281-285. 10.1038/10343View ArticleGoogle Scholar
- Ji Y, Wu C, Liu P, Wang J, Coombes KR: Applications of beta-mixture models in bioinformatics. Bioinformatics 2005, 21(9):2118-2122. 10.1093/bioinformatics/bti318View ArticleGoogle Scholar
- Dubin JA, Müller H-G: Dynamical correlation for multivariate longitudinal data. Journal of the American Statistical Association 2005, 100(471):872-881. 10.1198/016214504000001989View ArticleMathSciNetMATHGoogle Scholar
- Haugh LD: Checking the independence of two covariance-stationary time series: a univariate residual cross-correlation approach. Journal of the American Statistical Association 1976, 71(354):378-385. 10.2307/2285318View ArticleMathSciNetMATHGoogle Scholar
- Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nature Genetics 2005, 37(4):382-390. 10.1038/ng1532View ArticleGoogle Scholar
- Yuan M, Kendziorski C: Hidden Markov models for microarray time course data in multiple biological conditions. Journal of the American Statistical Association 2006, 101(476):1323-1332. 10.1198/016214505000000394View ArticleMathSciNetMATHGoogle Scholar
- Efron B, Tibshirani R, Storey JD, Tusher V: Empirical bayes analysis of a microarray experiment. Journal of the American Statistical Association 2001, 96(456):1151-1160. 10.1198/016214501753382129View ArticleMathSciNetMATHGoogle Scholar
- Leisch F: FlexMix: a general framework for finite mixture models and latent class regression in R. Journal of Statistical Software 2004, 11(8):1-18.View ArticleGoogle Scholar
- Sandland RL: Application of methods of testing for independence between two Markov chains. Biometrics 1976, 32(3):629-636. 10.2307/2529751View ArticleMATHGoogle Scholar
- Allard D, Brix A, Chadoeuf J: Testing local independence between two point processes. Biometrics 2001, 57(2):508-517. 10.1111/j.0006-341X.2001.00508.xView ArticleMathSciNetMATHGoogle Scholar
- Luse DS, Samkurashvili I: The transition from initiation to elongation by RNA polymerase II. In Proceedings of the 63rd Cold Spring Harbor Symposium on Quantitative Biology (CSH '98), Cold Spring Harbor, NY, USA, June 1998. Edited by: Stillman B. CSHL Press; 289-300.
- Baum LE, Petrie T, Soules G, Weiss N: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 1970, 41(1):164-171. 10.1214/aoms/1177697196
- Agresti A: Categorical Data Analysis. John Wiley & Sons, New York, NY, USA; 2002.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B 1995, 57(1):289-300.
- Reiner A, Yekutieli D, Benjamini Y: Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 2003, 19(3):368-375. 10.1093/bioinformatics/btf877
- Kobayashi SD, Voyich JM, Whitney AR, DeLeo FR: Spontaneous neutrophil apoptosis and regulation of cell survival by granulocyte macrophage-colony stimulating factor. Journal of Leukocyte Biology 2005, 78(6):1408-1418. 10.1189/jlb.0605289
- Sun C-X, Robb VA, Gutmann DH: Protein 4.1 tumor suppressors: getting a FERM grip on growth regulation. Journal of Cell Science 2002, 115(21):3991-4000. 10.1242/jcs.00094
- Weg-Remers S, Ponta H, Herrlich P, König H: Regulation of alternative pre-mRNA splicing by the ERK MAP-kinase pathway. The EMBO Journal 2001, 20(24):4194-4203.
- Cornish AL, Chong MM, Davey GM, et al.: Suppressor of cytokine signaling-1 regulates signaling in response to interleukin-2 and other γc-dependent cytokines in peripheral T cells. Journal of Biological Chemistry 2003, 278(25):22755-22761. 10.1074/jbc.M303021200
- Maquarre E, Artus C, Gadhoum Z, Jasmin C, Smadja-Joffe F, Robert-Lézénès J: CD44 ligation induces apoptosis via caspase- and serine protease-dependent pathways in acute promyelocytic leukemia cells. Leukemia 2005, 19(12):2296-2303. 10.1038/sj.leu.2403944
- Nakano K, Saito K, Mine S, Matsushita S, Tanaka Y: Engagement of CD44 up-regulates Fas Ligand expression on T cells leading to activation-induced cell death. Apoptosis 2007, 12(1):45-54. 10.1007/s10495-006-0488-8
- Chintagari NR, Jin N, Wang P, Narasaraju TA, Chen J, Liu L: Effect of cholesterol depletion on exocytosis of alveolar type II cells. American Journal of Respiratory Cell and Molecular Biology 2006, 34(6):677-687. 10.1165/rcmb.2005-0418OC
- Iczkowski KA, Shanks JH, Allsbrook WC, et al.: Small cell carcinoma of urinary bladder is differentiated from urothelial carcinoma by chromogranin expression, absence of CD44 variant 6 expression, a unique pattern of cytokeratin expression, and more intense γ-enolase expression. Histopathology 1999, 35(2):150-156. 10.1046/j.1365-2559.1999.00715.x
- Cheng C, Yaffe MB, Sharp PA: A positive feedback loop couples Ras activation and CD44 alternative splicing. Genes & Development 2006, 20(13):1715-1720. 10.1101/gad.1430906
- Singh A, Jayaraman A, Hahn J: Effect of SHP-2, SOCS3, and PP2 on IL-6 signal transduction in hepatocytes. Proceedings of American Control Conference (ACC '06), Minneapolis, Minn, USA, June 2006 6.
- Rawlings JS, Rosler KM, Harrison DA: The JAK/STAT signaling pathway. Journal of Cell Science 2004, 117(8):1281-1283. 10.1242/jcs.00963
- Rangel C, Angus J, Ghahramani Z, et al.: Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics 2004, 20(9):1361-1372. 10.1093/bioinformatics/bth093
- Perrin B-E, Ralaivola L, Mazurie A, Bottani S, Mallet J, d'Alché-Buc F: Gene networks inference using dynamic Bayesian networks. Bioinformatics 2003, 19(supplement 2):ii138-ii148.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.