Multipattern Consensus Regions in Multiple Aligned Protein Sequences and Their Segmentation

Chiu, David KY; Wang, Yan

doi:10.1155/BSB/2006/35809

Research Article
Open access
Published: 13 August 2006

EURASIP Journal on Bioinformatics and Systems Biology, volume 2006, Article number: 35809 (2006)

Abstract

Decomposing a biological sequence into its functional regions is an important prerequisite to understanding the molecule. Using multiple alignments of the sequences, we evaluate a segmentation based on the type of statistical variation pattern observed at each aligned site. To describe this more general kind of pattern, we introduce multipattern consensus regions: segmented regions based on conserved as well as interdependent patterns. The proposed consensus region thus considers patterns that are statistically significant and extends a local neighborhood. To show its relevance in protein sequence analysis, we examine the tumor suppressor gene p53. The results show significant associations between the detected regions and the tendency of mutations, location on the 3D structure, and heritable cancer factors that can be inferred from human twin studies.
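
As a rough illustration of segmenting an alignment by a per-site statistic, the sketch below scores each aligned site by its Shannon entropy and reports runs of low-entropy (conserved) sites. It is not the authors' algorithm: it covers only the conserved pattern type, not the interdependent patterns, and the function names `column_entropy` and `conserved_segments`, the `max_entropy` threshold, and the `min_length` cutoff are all assumptions introduced here for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one aligned site."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_segments(alignment, max_entropy=0.5, min_length=2):
    """Return (start, end) pairs, end exclusive, for runs of low-entropy sites.

    `alignment` is a list of equal-length strings. The threshold and the
    minimum run length are hypothetical tuning values, not from the paper.
    Gap characters, if present, are counted like any other symbol.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = [column_entropy([seq[i] for seq in alignment])
                 for i in range(length)]

    segments = []
    start = None
    # The trailing infinite value acts as a sentinel that closes a final run.
    for i, h in enumerate(entropies + [float("inf")]):
        if h <= max_entropy:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    return segments


if __name__ == "__main__":
    toy_alignment = ["ACDEF", "ACDQF", "ACWGF", "ACDYF"]  # made-up sequences
    print(conserved_segments(toy_alignment))
```

A fixed entropy threshold is the simplest possible criterion; a statistical significance test on each site's pattern, as the abstract describes, would replace the `h <= max_entropy` check.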

[1–27]

References

Chiu DKY, Kolodziejczak T: Inferring consensus structure from nucleic acid sequences. Computer Applications in the Biosciences 1991,7(3):347-352.
Chiu DKY, Harauz G: A method for inferring probabilistic consensus structure with applications to molecular sequence data. Pattern Recognition 1993,26(4):643-654. 10.1016/0031-3203(93)90117-F
Chiu DKY, Lui TWH: Integrated use of multiple interdependent patterns for biomolecular sequence analysis. International Journal of Fuzzy Systems 2002,4(3):766-775.
Chiu DKY, Wong AKC: Multiple pattern associations for interpreting structural and functional characteristics of biomolecules. Information Sciences 2004,167(1–4):23-39.
Chiu DKY, Lui TWH: A multiple-pattern biosequence analysis method for diverse source association mining. Applied Bioinformatics 2005,4(2):85-92. 10.2165/00822942-200504020-00002
Greenblatt MS, Bennett WP, Hollstein M, Harris CC: Mutations in the p53 tumor suppressor gene: clues to cancer etiology and molecular pathogenesis. Cancer Research 1994,54(18):4855-4878.
Boys RJ, Henderson DA: A Bayesian approach to DNA sequence segmentation. Biometrics 2004, 60: 573-588. 10.1111/j.0006-341X.2004.00206.x
Li W, Bernaola-Galván P, Haghighi F, Grosse I: Applications of recursive segmentation to the analysis of DNA sequences. Computers and Chemistry 2002,26(5):491-510. 10.1016/S0097-8485(02)00010-4
Chiu DKY, Rao G: The 2-level pattern analysis of genome comparisons. WSEAS Transactions on Biology and Biomedicine 2006,3(3):167-174.
Yan W: A segmentation algorithm for consensus regions in biosequences, M.S. thesis. Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada; 2003.
Zhang J: Analysis of information content for biological sequences. Journal of Computational Biology 2002,9(3):487-503. 10.1089/106652702760138583
Lichtenstein P, Holm NV, Verkasalo PK, et al.: Environmental and heritable factors in the causation of cancer: analyses of cohorts of twins from Sweden, Denmark, and Finland. New England Journal of Medicine 2000,343(2):78-85. 10.1056/NEJM200007133430201
Magnusson PKE, Sparen P, Gyllensten UB: Genetic link to cervical tumours. Nature 1999,400(6739):29-30. 10.1038/21801
Wong AKC, Liu TS, Wang CC: Statistical analysis of residue variability in cytochrome c. Journal of Molecular Biology 1976,102(2):287-295. 10.1016/S0022-2836(76)80054-X
Shannon CE: A mathematical theory of communication. Bell System Technical Journal 1948, 27: 379-423, 623-656. Reprinted in C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Ill, USA, 1949.
Gatlin LL: The information content of DNA. Journal of Theoretical Biology 1966,10(2):281-300. 10.1016/0022-5193(66)90127-5
Wong AKC, Wang Y: High-order pattern discovery from discrete-valued data. IEEE Transactions on Knowledge and Data Engineering 1997,9(6):877-893. 10.1109/69.649314
Haberman SJ: The analysis of residuals in cross-classified tables. Biometrics 1973, 29: 205-220. 10.2307/2529686
Kalbfleisch JG: Probability and Statistical Inference, Vol. 2: Statistical Inference. 2nd edition. Springer, New York, NY, USA; 1985.
Berman HM, Westbrook J, Feng Z, et al.: The protein data bank. Nucleic Acids Research 2000,28(1):235-242. 10.1093/nar/28.1.235
Hollstein M, Sidransky D, Vogelstein B, Harris CC: p53 mutations in human cancers. Science 1991,253(5015):49-53. 10.1126/science.1905840
Levine AJ, Momand J, Finlay CA: The p53 tumour suppressor gene. Nature 1991,351(6326):453-456. 10.1038/351453a0
Levine AJ: p53, the cellular gatekeeper for growth and division. Cell 1997,88(3):323-331. 10.1016/S0092-8674(00)81871-1
Boeckmann B, Bairoch A, Apweiler R, et al.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research 2003,31(1):365-370. 10.1093/nar/gkg095
Cho Y, Gorina S, Jeffrey PD, Pavletich NP: Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science 1994,265(5170):346-355. 10.1126/science.8023157
Hamroun D, Kato S, Ishioka C, Claustres M, Beroud C, Soussi T: The UMD TP53 database and website: update and revisions. Human Mutation 2005,27(1):14-20.
Chiu DKY, Chen X, Wong AKC: Association between statistical and functional patterns in biomolecules. Proceedings of the Atlantic Symposium on Computational Biology and Genome Information Systems and Technology (CBGIST '01), Durham, NC, USA, March 2001, 64-69.

Author information

Authors and Affiliations

Department of Computing and Information Science, University of Guelph, Guelph, ON, Canada, N1G 2W1
David KY Chiu & Yan Wang

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Chiu, D.K., Wang, Y. Multipattern Consensus Regions in Multiple Aligned Protein Sequences and Their Segmentation. J Bioinform Sys Biology 2006, 35809 (2006). https://doi.org/10.1155/BSB/2006/35809

Received: 22 May 2005
Revised: 23 November 2005
Accepted: 07 June 2006
Published: 13 August 2006
DOI: https://doi.org/10.1155/BSB/2006/35809
