Open Access

Splitting the BLOSUM Score into Numbers of Biological Significance

EURASIP Journal on Bioinformatics and Systems Biology 2007, 2007:31450

https://doi.org/10.1155/2007/31450

Received: 2 October 2006

Accepted: 30 March 2007

Published: 4 June 2007

Abstract

Mathematical tools developed in the context of Shannon information theory were used to analyze the meaning of the BLOSUM score, which was split into three components termed the BLOSUM spectrum (or BLOSpectrum). These relate, respectively, to the sequence convergence (the stochastic similarity of the two protein sequences), to the background frequency divergence (the typicality of the amino acid probability distribution in each sequence), and to the target frequency divergence (the compliance of the amino acid variations between the two sequences with the protein model implicit in the BLOCKS database). This treatment sharpens protein sequence comparison, provides a rationale for the biological significance of the obtained score, and helps to identify weakly related sequences. Moreover, the BLOSpectrum can guide the choice of the most appropriate scoring matrix, tailoring it to the evolutionary divergence associated with the two sequences, or indicate whether a compositionally adjusted matrix could perform better.
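The three-way split described in the abstract can be illustrated numerically. The sketch below is not taken from the article: it assumes a gapless alignment, unrounded log-odds scores in bits of the form log2(q(a,b) / (p(a) p(b))), and hypothetical names (`blospectrum`, `q`, `p`). Under those assumptions the per-position score is algebraically equal to a mutual-information term (sequence convergence), minus a divergence from the target frequencies, plus the divergences of each sequence's composition from the background frequencies. Whether this matches the article's exact normalisation and notation is an assumption.

```python
import math
from collections import Counter


def blospectrum(seq_x, seq_y, q, p):
    """Split the per-position log-odds score of a gapless alignment.

    q[(a, b)] are target (pair) frequencies and p[a] are background
    frequencies; both are assumed inputs, e.g. those underlying a
    BLOSUM matrix. All returned quantities are in bits.
    """
    n = len(seq_x)
    if n == 0 or n != len(seq_y):
        raise ValueError("sequences must be non-empty and of equal length")

    # Empirical joint and marginal frequencies observed in the alignment.
    f_xy = {ab: c / n for ab, c in Counter(zip(seq_x, seq_y)).items()}
    f_x = {a: c / n for a, c in Counter(seq_x).items()}
    f_y = {b: c / n for b, c in Counter(seq_y).items()}

    # Average log-odds score per aligned position.
    score = sum(f * math.log2(q[(a, b)] / (p[a] * p[b]))
                for (a, b), f in f_xy.items())
    # Sequence convergence: mutual information of the aligned pair.
    mutual_info = sum(f * math.log2(f / (f_x[a] * f_y[b]))
                      for (a, b), f in f_xy.items())
    # Target frequency divergence: observed pairs versus q.
    target_div = sum(f * math.log2(f / q[ab]) for ab, f in f_xy.items())
    # Background frequency divergence: each composition versus p.
    background_div = (
        sum(f * math.log2(f / p[a]) for a, f in f_x.items())
        + sum(f * math.log2(f / p[b]) for b, f in f_y.items())
    )
    return score, mutual_info, target_div, background_div


# Toy two-letter alphabet; these frequencies are made up for illustration.
p_toy = {"A": 0.5, "B": 0.5}
q_toy = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
score, mi, td, bd = blospectrum("AABB", "AABA", q_toy, p_toy)
print(score, mi - td + bd)  # the two values agree up to rounding error
```

Published BLOSUM matrices store rounded, scaled integers, so with those entries the identity would hold only approximately. The sketch also expects every observed pair to have a nonzero entry in `q`.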

[1–29]

Authors’ Affiliations

(1) Dipartimento di Matematica e Informatica, Università degli Studi di Trieste
(2) Centro di Biomedicina Molecolare, AREA Science Park
(3) Dipartimento di Biochimica, Biofisica, e Chimica delle Macromolecole, Università degli Studi di Trieste

References

  1. Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 1970, 48(3):443-453. 10.1016/0022-2836(70)90057-4
  2. McLachlan AD: Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551. Journal of Molecular Biology 1971, 61(2):409-424. 10.1016/0022-2836(71)90390-1
  3. Sankoff D: Matching sequences under deletion-insertion constraints. Proceedings of the National Academy of Sciences of the United States of America 1972, 69(1):4-6. 10.1073/pnas.69.1.4
  4. Sellers PH: On the theory and computation of evolutionary distances. SIAM Journal on Applied Mathematics 1974, 26(4):787-793. 10.1137/0126070
  5. Waterman MS, Smith TF, Beyer WA: Some biological sequence metrics. Advances in Mathematics 1976, 20(3):367-387. 10.1016/0001-8708(76)90202-4
  6. Dayhoff MO, Schwartz RM, Orcutt BC: A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure. Volume 5. Edited by: Dayhoff MO. National Biomedical Research Foundation, Washington, DC, USA; 1978:345-352.
  7. Altschul SF: Amino acid substitution matrices from an information theoretic perspective. Journal of Molecular Biology 1991, 219(3):555-565. 10.1016/0022-2836(91)90193-A
  8. Karlin S, Altschul SF: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the United States of America 1990, 87(6):2264-2268. 10.1073/pnas.87.6.2264
  9. Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America 1992, 89(22):10915-10919. 10.1073/pnas.89.22.10915
  10. Feller W: An Introduction to Probability Theory and Its Applications. John Wiley & Sons, New York, NY, USA; 1968.
  11. Yu Y-K, Wootton JC, Altschul SF: The compositional adjustment of amino acid substitution matrices. Proceedings of the National Academy of Sciences of the United States of America 2003, 100(26):15688-15693. 10.1073/pnas.2533904100
  12. Altschul SF: A protein alignment scoring system sensitive at all evolutionary distances. Journal of Molecular Evolution 1993, 36(3):290-300. 10.1007/BF00160485
  13. States DJ, Gish W, Altschul SF: Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods 1991, 3(1):66-70. 10.1016/S1046-2023(05)80165-3
  14. Sunyaev SR, Bogopolsky GA, Oleynikova NV, Vlasov PK, Finkelstein AV, Roytberg MA: From analysis of protein structural alignments toward a novel approach to align protein sequences. Proteins: Structure, Function, and Bioinformatics 2004, 54(3):569-582.
  15. Zachariah MA, Crooks GE, Holbrook SR, Brenner SE: A generalized affine gap model significantly improves protein sequence alignment accuracy. Proteins: Structure, Function, and Bioinformatics 2005, 58(2):329-338.
  16. Shannon CE: A mathematical theory of communication, part I. Bell System Technical Journal 1948, 27: 379-423.
  17. Shannon CE: A mathematical theory of communication, part II. Bell System Technical Journal 1948, 27: 623-656.
  18. Csiszár I, Körner J: Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, NY, USA; 1981.
  19. Schäffer AA, Aravind L, Madden TL, et al.: Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Research 2001, 29(14):2994-3005. 10.1093/nar/29.14.2994
  20. Frommlet F, Futschik A, Bogdan M: On the significance of sequence alignments when using multiple scoring matrices. Bioinformatics 2004, 20(6):881-887. 10.1093/bioinformatics/btg498
  21. Altschul SF, Wootton JC, Gertz EM, et al.: Protein database searches using compositionally adjusted substitution matrices. FEBS Journal 2005, 272(20):5101-5109. 10.1111/j.1742-4658.2005.04945.x
  22. Schäffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF: IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 1999, 15(12):1000-1011. 10.1093/bioinformatics/15.12.1000
  23. Rypniewski WR, Perrakis A, Vorgias CE, Wilson KS: Evolutionary divergence and conservation of trypsin. Protein Engineering 1994, 7(1):57-64. 10.1093/protein/7.1.57
  24. Hughes AL: Evolutionary diversification of the mammalian defensins. Cellular and Molecular Life Sciences 1999, 56(1-2):94-103. 10.1007/s000180050010
  25. Bauer F, Schweimer K, Klüver E, et al.: Structure determination of human and murine β-defensins reveals structural conservation in the absence of significant sequence similarity. Protein Science 2001, 10(12):2470-2479. 10.1110/ps.ps.24401
  26. Tossi A, Sandri L: Molecular diversity in gene-encoded, cationic antimicrobial polypeptides. Current Pharmaceutical Design 2002, 8(9):743-761. 10.2174/1381612023395475
  27. Gennaro R, Zanetti M, Benincasa M, Podda E, Miani M: Pro-rich antimicrobial peptides from animals: structure, biological functions and mechanism of action. Current Pharmaceutical Design 2002, 8(9):763-778. 10.2174/1381612023395394
  28. Selsted ME, Novotny MJ, Morris WL, Tang Y-Q, Smith W, Cullor JS: Indolicidin, a novel bactericidal tridecapeptide amide from neutrophils. Journal of Biological Chemistry 1992, 267(7):4292-4295.
  29. Kullback S: Information Theory and Statistics. Dover, Mineola, NY, USA; 1997.

Copyright

© Francesco Fabris et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.