Open Access

Motif Discovery in Tissue-Specific Regulatory Sequences Using Directed Information

  • Arvind Rao (1),
  • Alfred O Hero III (1),
  • David J States (2) and
  • James Douglas Engel (3)
EURASIP Journal on Bioinformatics and Systems Biology 2007, 2007:13853

https://doi.org/10.1155/2007/13853

Received: 1 March 2007

Accepted: 17 September 2007

Published: 24 December 2007

Abstract

Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often leads to the discovery of novel motifs (including transcription factor sites) with previously uncharacterized function in gene expression. Given the complexity underlying tissue-specific gene expression, several motifs may be putatively responsible for expression in a given cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the identification of motifs (not necessarily transcription factor sites) and examine its application to some questions in current bioinformatics research. These motifs are seen to discriminate tissue-specific gene promoter or regulatory regions from those that are not tissue-specific. There are two main contributions of this work. First, we propose the use of directed information for such classification-constrained motif discovery, and then use the selected features with a support vector machine (SVM) classifier to find the tissue specificity of any sequence of interest. Such analysis yields several novel and interesting motifs that merit further experimental characterization. Furthermore, this approach leads to a principled framework for the prospective examination of any chosen motif as a discriminatory motif for a group of coexpressed/coregulated genes, thereby integrating sequence and expression perspectives. We hypothesize that the discovery of these motifs would enable large-scale investigation of the tissue-specific regulatory role of any conserved sequence element identified from genome-wide studies.

[1-42]

Authors’ Affiliations

(1)
Departments of Electrical Engineering and Computer Science and Bioinformatics, University of Michigan
(2)
Departments of Bioinformatics and Human Genetics, University of Michigan
(3)
Department of Cell and Developmental Biology, University of Michigan

References

  1. MacIsaac KD, Fraenkel E: Practical strategies for discovering regulatory DNA sequence motifs. PLoS Computational Biology 2006, 2(4):e36. 10.1371/journal.pcbi.0020036
  2. Kreiman G: Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes. Nucleic Acids Research 2004, 32(9):2889-2900. 10.1093/nar/gkh614
  3. Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 1997, 268(1):78-94. 10.1006/jmbi.1997.0951
  4. Li Q, Barkess G, Qian H: Chromatin looping and the probability of transcription. Trends in Genetics 2006, 22(4):197-202. 10.1016/j.tig.2006.02.004
  5. Kleinjan DA, van Heyningen V: Long-range control of gene expression: emerging mechanisms and disruption in disease. The American Journal of Human Genetics 2005, 76(1):8-32. 10.1086/426833
  6. Pennacchio LA, Loots GG, Nobrega MA, Ovcharenko I: Predicting tissue-specific enhancers in the human genome. Genome Research 2007, 17(2):201-211. 10.1101/gr.5972507
  7. King DC, Taylor J, Elnitski L, Chiaromonte F, Miller W, Hardison RC: Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Research 2005, 15(8):1051-1060. 10.1101/gr.3642605
  8. Pennacchio LA, Ahituv N, Moses AM, et al.: In vivo enhancer analysis of human conserved non-coding sequences. Nature 2006, 444(7118):499-502. 10.1038/nature05295
  9. Kadota K, Ye J, Nakai Y, Terada T, Shimizu K: ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics 2006, 7: 294. 10.1186/1471-2105-7-294
  10. Schug J, Schuller W-P, Kappen C, Salbaum JM, Bucan M, Stoeckert CJ Jr: Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biology 2005, 6(4):R33. 10.1186/gb-2005-6-4-r33
  11. Werner T: Regulatory networks: linking microarray data to systems biology. Mechanisms of Ageing and Development 2007, 128(1):168-172. 10.1016/j.mad.2006.11.022
  12. Aerts S, Van Loo P, Thijs G, et al.: TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Research 2005, 33(Web Server):W393-W396. 10.1093/nar/gki354
  13. Chan BY, Kibler D: Using hexamers to predict cis-regulatory motifs in Drosophila. BMC Bioinformatics 2005, 6: 262. 10.1186/1471-2105-6-262
  14. Hutchinson GB: The prediction of vertebrate promoter regions using differential hexamer frequency analysis. Computer Applications in the Biosciences 1996, 12(5):391-398.
  15. Sumazin P, Chen G, Hata N, Smith AD, Zhang T, Zhang MQ: DWE: discriminating word enumerator. Bioinformatics 2005, 21(1):31-38. 10.1093/bioinformatics/bth471
  16. Lakshmanan G, Lieuw KH, Lim K-C, et al.: Localization of distant urogenital system-, central nervous system-, and endocardium-specific transcriptional regulatory elements in the GATA-3 locus. Molecular and Cellular Biology 1999, 19(2):1558-1568.
  17. Khandekar M, Suzuki N, Lewton J, Yamamoto M, Engel JD: Multiple, distant Gata2 enhancers specify temporally and tissue-specific patterning in the developing urogenital system. Molecular and Cellular Biology 2004, 24(23):10263-10276. 10.1128/MCB.24.23.10263-10276.2004
  18. Peng H, Long F, Ding C: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27(8):1226-1238.
  19. Proceedings of NIPS 2006 Workshop on Causality and Feature Selection. [http://research.ihost.com/cws2006/]
  20. Guyon I, Elisseeff A: An introduction to variable and feature selection. The Journal of Machine Learning Research 2003, 3: 1157-1182.
  21. Marko H: The bidirectional communication theory—a generalization of information theory. IEEE Transactions on Communications 1973, COM-21(12):1345-1351.
  22. Massey J: Causality, feedback and directed information. Proceedings of the International Symposium on Information Theory and Its Applications (ISITA '90), Waikiki, Hawaii, USA, November 1990 303-305.
  23. Venkataramanan R, Pradhan SS: Source coding with feed-forward: rate-distortion theorems and error exponents for a general source. IEEE Transactions on Information Theory 2007, 53(6):2154-2179.
  24. Cover TM, Thomas JA: Elements of Information Theory. John Wiley & Sons, New York, NY, USA; 1991.
  25. Miller EG: A new class of entropy estimators for multidimensional densities. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, April 2003 3: 297-300.
  26. Willett RM, Nowak RD: Complexity-regularized multiresolution density estimation. Proceedings of the International Symposium on Information Theory (ISIT '04), Chicago, Ill, USA, June-July 2004 303-305.
  27. Nemenman I, Shafee F, Bialek W: Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14. Edited by: Dietterich TG, Becker S, Ghahramani Z. MIT Press, Cambridge, Mass, USA; 2002.
  28. Paninski L: Estimation of entropy and mutual information. Neural Computation 2003, 15(6):1191-1253. 10.1162/089976603321780272
  29. Joe H: Relative entropy measures of multivariate dependence. Journal of the American Statistical Association 1989, 84(405):157-164. 10.2307/2289859
  30. Efron B, Tibshirani RJ: An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, Fla, USA; 1994.
  31. Ramsay JO, Silverman BW: Functional Data Analysis, Springer Series in Statistics. Springer, New York, NY, USA; 1997.
  32. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B 1995, 57(1):289-300.
  33. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. Springer, New York, NY, USA; 2001.
  34. Kendall MG: A new measure of rank correlation. Biometrika 1938, 30(1/2):81-93. 10.2307/2332226
  35. NCBI Pubmed URL [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi]
  36. Murphy AM, Thompson WR, Peng LF, Jones L II: Regulation of the rat cardiac troponin I gene by the transcription factor GATA-4. Biochemical Journal 1997, 322, part 2: 393-401.
  37. Azakie A, Fineman JR, He Y: Myocardial transcription factors are modulated during pathologic cardiac hypertrophy in vivo. The Journal of Thoracic and Cardiovascular Surgery 2006, 132(6):1262-1271.e4. 10.1016/j.jtcvs.2006.08.005
  38. Vanhoutte P, Nissen JL, Brugg B, et al.: Opposing roles of Elk-1 and its brain-specific isoform, short Elk-1, in nerve growth factor-induced PC12 differentiation. Journal of Biological Chemistry 2001, 276(7):5189-5196. 10.1074/jbc.M006678200
  39. Olson EN: Regulation of muscle transcription by the MyoD family: the heart of the matter. Circulation Research 1993, 72(1):1-6.
  40. Dressler GR, Douglass EC: Pax-2 is a DNA-binding protein expressed in embryonic kidney and Wilms tumor. Proceedings of the National Academy of Sciences of the United States of America 1992, 89(4):1179-1183. 10.1073/pnas.89.4.1179
  41. Grote D, Souabni A, Busslinger M, Bouchard M: Pax2/8-regulated Gata3 expression is necessary for morphogenesis and guidance of the nephric duct in the developing kidney. Development 2006, 133(1):53-61. 10.1242/dev.02184
  42. Rao A, Hero AO, States DJ, Engel JD: Inference of biologically relevant gene influence networks using the directed information criterion. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), Toulouse, France, May 2006 2: 1028-1031.

Copyright

© Arvind Rao et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.