
Analysis of Gene Coexpression by B-Spline Based CoD Estimation


Gene coexpression analysis has emerged as a holistic approach to microarray data analysis. Different indices have been used to explore coexpression relationships, but each has its pitfalls. Pearson's correlation coefficient, for example, cannot uncover nonlinear patterns or the directionality of coexpression. Mutual information can detect nonlinearity but fails to show directionality. The coefficient of determination (CoD) is unique in exploring different patterns of gene coexpression, but so far it has been applied only to discrete data, and converting continuous microarray data to a discrete format can lead to information loss. Here, we propose an effective algorithm, CoexPro, for gene coexpression analysis. The algorithm is based on B-spline approximation of the coexpression between a pair of genes, followed by CoD estimation. It was validated by simulation studies and by functional semantic similarity analysis. The algorithm is capable of uncovering both linear relationships and a specific class of nonlinear relationships from continuous microarray data, and it can suggest the possible directionality of coexpression. It presents a novel model for gene coexpression and will be a valuable tool for a variety of gene expression and network studies. Its application is demonstrated by an analysis of ligand-receptor coexpression in cancerous and noncancerous cells. The software implementing the algorithm is available from the authors upon request.
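The idea of fitting a spline and then estimating a CoD can be illustrated with a minimal sketch. This is not the CoexPro implementation: it assumes a plain least-squares regression spline of one gene on the other with a clamped cubic B-spline basis, and it takes the CoD to be the relative reduction in squared error over predicting the target by its mean. The function names (`bspline_basis`, `solve`, `cod`, `pearson`), the knot placement, the small ridge term, and the synthetic data are all hypothetical choices made for illustration.

```python
import random

def bspline_basis(x, knots, degree):
    """All B-spline basis values at x (Cox-de Boor recursion)."""
    n = len(knots) - degree - 1
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    if x >= knots[-1]:              # right end point belongs to the last interval
        b = [0.0] * (len(knots) - 1)
        b[n - 1] = 1.0
    for d in range(1, degree + 1):
        nb = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            den1 = knots[i + d] - knots[i]
            den2 = knots[i + d + 1] - knots[i + 1]
            if den1 > 0:
                left = (x - knots[i]) / den1 * b[i]
            if den2 > 0:
                right = (knots[i + d + 1] - x) / den2 * b[i + 1]
            nb.append(left + right)
        b = nb
    return b

def solve(A, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - s) / M[i][i]
    return sol

def cod(pred, target, degree=3, n_interior=3):
    """CoD of target given pred: (err_mean - err_spline) / err_mean."""
    lo, hi = min(pred), max(pred)
    step = (hi - lo) / (n_interior + 1)
    knots = ([lo] * (degree + 1)
             + [lo + step * (k + 1) for k in range(n_interior)]
             + [hi] * (degree + 1))
    B = [bspline_basis(v, knots, degree) for v in pred]
    n = len(B[0])
    # Normal equations; the tiny ridge (an assumption) guards against singularity.
    BtB = [[sum(row[i] * row[j] for row in B) + (1e-8 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Bty = [sum(row[i] * t for row, t in zip(B, target)) for i in range(n)]
    coef = solve(BtB, Bty)
    fitted = [sum(c * bi for c, bi in zip(coef, row)) for row in B]
    mean_t = sum(target) / len(target)
    err_mean = sum((t - mean_t) ** 2 for t in target)
    err_fit = sum((t - f) ** 2 for t, f in zip(target, fitted))
    return (err_mean - err_fit) / err_mean

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

# Synthetic "expression" values: gene y depends on gene x through a parabola.
rng = random.Random(0)
x = [-1.0 + 2.0 * i / 199 for i in range(200)]
y = [v * v + rng.gauss(0.0, 0.05) for v in x]

r = pearson(x, y)
cod_xy = cod(x, y)   # how well x predicts y
cod_yx = cod(y, x)   # how well y predicts x
print(f"Pearson r = {r:.3f}, CoD(x->y) = {cod_xy:.3f}, CoD(y->x) = {cod_yx:.3f}")
```

On this synthetic pair, Pearson's r stays close to zero because the relationship is symmetric and nonlinear, while the CoD is high when x predicts y and low in the reverse direction. That asymmetry is the kind of signal the abstract describes as a suggestion of directionality.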




Author information



Corresponding author

Correspondence to Huai Li.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Li, H., Sun, Y. & Zhan, M. Analysis of Gene Coexpression by B-Spline Based CoD Estimation. J Bioinform Sys Biology 2007, 49478 (2007).



  • Mutual Information
  • Holistic Approach
  • Semantic Similarity
  • Nonlinear Relationship
  • Information Loss