Splitting the BLOSUM Score into Numbers of Biological Significance

Fabris, Francesco; Sgarro, Andrea; Tossi, Alessandro

doi:10.1155/2007/31450

Research Article
Open access
Published: 04 June 2007

Francesco Fabris^1,2, Andrea Sgarro^1,2 & Alessandro Tossi^3

EURASIP Journal on Bioinformatics and Systems Biology volume 2007, Article number: 31450 (2007)

Abstract

Mathematical tools developed in the context of Shannon information theory were used to analyze the meaning of the BLOSUM score, which was split into three components termed the BLOSUM spectrum (or BLOSpectrum). These relate, respectively, to the sequence convergence (the stochastic similarity of the two protein sequences), to the background frequency divergence (the typicality of the amino acid probability distribution in each sequence), and to the target frequency divergence (the compliance of the amino acid variations between the two sequences with the protein model implicit in the BLOCKS database). This treatment sharpens protein sequence comparison, provides a rationale for the biological significance of the obtained score, and helps to identify weakly related sequences. Moreover, the BLOSpectrum can guide the choice of the most appropriate scoring matrix, tailoring it to the evolutionary divergence associated with the two sequences, or indicate whether a compositionally adjusted matrix could perform better.
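
The abstract does not state the formula behind the three components, so the following is only a minimal sketch of one information-theoretic identity that matches its description, under stated assumptions. Write F for the empirical joint frequencies of aligned residue pairs in an ungapped alignment, f_X and f_Y for the residue frequencies of the two sequences, Q for the target frequencies and P for the background frequencies of the scoring model. The unrounded per-position log-odds score then satisfies: score = I(X;Y) - D(F || Q) + D(f_X || P) + D(f_Y || P), where I is the mutual information and D the Kullback-Leibler divergence. The three-letter alphabet, the values in Q, and the names `kl_bits` and `blospectrum` are hypothetical and chosen for illustration only. Published BLOSUM matrices are scaled and rounded to integers, so the identity would hold only approximately for them.

```python
import math
from collections import Counter


def kl_bits(f, g):
    """Kullback-Leibler divergence D(f || g) in bits, summed over the support of f."""
    return sum(v * math.log2(v / g[k]) for k, v in f.items() if v > 0)


def blospectrum(x, y, q, p):
    """Split the per-position log-odds score of an ungapped alignment.

    x, y : aligned sequences of equal length
    q    : target frequencies, keyed by ordered residue pairs (assumed > 0
           for every pair that occurs in the alignment)
    p    : background frequencies, keyed by residue
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned and of equal length")
    n = len(x)
    # Empirical joint and marginal frequencies observed in the alignment.
    f = {pair: c / n for pair, c in Counter(zip(x, y)).items()}
    fx = {a: c / n for a, c in Counter(x).items()}
    fy = {b: c / n for b, c in Counter(y).items()}
    # Product of the marginals, restricted to the observed pairs.
    independent = {(a, b): fx[a] * fy[b] for (a, b) in f}
    # Average log-odds score in bits, using unrounded log-odds values.
    score = sum(v * math.log2(q[a, b] / (p[a] * p[b])) for (a, b), v in f.items())
    return {
        "score": score,
        "convergence": kl_bits(f, independent),        # mutual information I(X;Y)
        "target_divergence": kl_bits(f, q),            # D(F || Q)
        "background_divergence": kl_bits(fx, p) + kl_bits(fy, p),
    }


# Hypothetical symmetric target frequencies over a toy alphabet {A, B, C}.
Q = {("A", "A"): 0.30, ("B", "B"): 0.25, ("C", "C"): 0.15}
for (a, b), v in {("A", "B"): 0.05, ("A", "C"): 0.06, ("B", "C"): 0.04}.items():
    Q[a, b] = v
    Q[b, a] = v

# Background frequencies taken as the marginals of Q.
P = {}
for (a, _b), v in Q.items():
    P[a] = P.get(a, 0.0) + v

X = "AABCBACCBA"
Y = "ABBCBAACCA"

spectrum = blospectrum(X, Y, Q, P)
for name, value in spectrum.items():
    print(f"{name}: {value:.4f} bits")
```

In this sketch a high score can arise either from strong convergence or from a large background divergence, which is the kind of distinction a single number would hide. The identity itself holds for any P used consistently in both the score and the background term.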

[1-29]

References

Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 1970, 48(3):443-453. 10.1016/0022-2836(70)90057-4
McLachlan AD: Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551. Journal of Molecular Biology 1971, 61(2):409-424. 10.1016/0022-2836(71)90390-1
Sankoff D: Matching sequences under deletion-insertion constraints. Proceedings of the National Academy of Sciences of the United States of America 1972, 69(1):4-6. 10.1073/pnas.69.1.4
Sellers PH: On the theory and computation of evolutionary distances. SIAM Journal on Applied Mathematics 1974, 26(4):787-793. 10.1137/0126070
Waterman MS, Smith TF, Beyer WA: Some biological sequence metrics. Advances in Mathematics 1976, 20(3):367-387. 10.1016/0001-8708(76)90202-4
Dayhoff MO, Schwartz RM, Orcutt BC: A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure. Volume 5. Edited by: Dayhoff MO. National Biomedical Research Foundation, Washington, DC, USA; 1978:345-352.
Altschul SF: Amino acid substitution matrices from an information theoretic perspective. Journal of Molecular Biology 1991, 219(3):555-565. 10.1016/0022-2836(91)90193-A
Karlin S, Altschul SF: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the United States of America 1990, 87(6):2264-2268. 10.1073/pnas.87.6.2264
Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America 1992, 89(22):10915-10919. 10.1073/pnas.89.22.10915
Feller W: An Introduction to Probability Theory and Its Applications. John Wiley & Sons, New York, NY, USA; 1968.
Yu Y-K, Wootton JC, Altschul SF: The compositional adjustment of amino acid substitution matrices. Proceedings of the National Academy of Sciences of the United States of America 2003, 100(26):15688-15693. 10.1073/pnas.2533904100
Altschul SF: A protein alignment scoring system sensitive at all evolutionary distances. Journal of Molecular Evolution 1993, 36(3):290-300. 10.1007/BF00160485
States DJ, Gish W, Altschul SF: Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods 1991, 3(1):66-70. 10.1016/S1046-2023(05)80165-3
Sunyaev SR, Bogopolsky GA, Oleynikova NV, Vlasov PK, Finkelstein AV, Roytberg MA: From analysis of protein structural alignments toward a novel approach to align protein sequences. Proteins: Structure, Function, and Bioinformatics 2004, 54(3):569-582.
Zachariah MA, Crooks GE, Holbrook SR, Brenner SE: A generalized affine gap model significantly improves protein sequence alignment accuracy. Proteins: Structure, Function, and Bioinformatics 2005, 58(2):329-338.
Shannon CE: A mathematical theory of communication—part I. Bell System Technical Journal 1948, 27: 379-423.
Shannon CE: A mathematical theory of communication—part II. Bell System Technical Journal 1948, 27: 623-656.
Csiszár I, Körner J: Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, NY, USA; 1981.
Schäffer AA, Aravind L, Madden TL, et al.: Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Research 2001, 29(14):2994-3005. 10.1093/nar/29.14.2994
Frommlet F, Futschik A, Bogdan M: On the significance of sequence alignments when using multiple scoring matrices. Bioinformatics 2004, 20(6):881-887. 10.1093/bioinformatics/btg498
Altschul SF, Wootton JC, Gertz EM, et al.: Protein database searches using compositionally adjusted substitution matrices. FEBS Journal 2005, 272(20):5101-5109. 10.1111/j.1742-4658.2005.04945.x
Schäffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF: IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 1999, 15(12):1000-1011. 10.1093/bioinformatics/15.12.1000
Rypniewski WR, Perrakis A, Vorgias CE, Wilson KS: Evolutionary divergence and conservation of trypsin. Protein Engineering 1994, 7(1):57-64. 10.1093/protein/7.1.57
Hughes AL: Evolutionary diversification of the mammalian defensins. Cellular and Molecular Life Sciences 1999, 56(1-2):94-103. 10.1007/s000180050010
Bauer F, Schweimer K, Klüver E, et al.: Structure determination of human and murine β-defensins reveals structural conservation in the absence of significant sequence similarity. Protein Science 2001, 10(12):2470-2479. 10.1110/ps.ps.24401
Tossi A, Sandri L: Molecular diversity in gene-encoded, cationic antimicrobial polypeptides. Current Pharmaceutical Design 2002, 8(9):743-761. 10.2174/1381612023395475
Gennaro R, Zanetti M, Benincasa M, Podda E, Miani M: Pro-rich antimicrobial peptides from animals: structure, biological functions and mechanism of action. Current Pharmaceutical Design 2002, 8(9):763-778. 10.2174/1381612023395394
Selsted ME, Novotny MJ, Morris WL, Tang Y-Q, Smith W, Cullor JS: Indolicidin, a novel bactericidal tridecapeptide amide from neutrophils. Journal of Biological Chemistry 1992, 267(7):4292-4295.
Kullback S: Information Theory and Statistics. Dover, Mineola, NY, USA; 1997.

Author information

Authors and Affiliations

Dipartimento di Matematica e Informatica, Università degli Studi di Trieste, via Valerio 12b, Trieste, 34127, Italy
Francesco Fabris & Andrea Sgarro
Centro di Biomedicina Molecolare, AREA Science Park, Strada Statale 14, Basovizza, Trieste, 34012, Italy
Francesco Fabris & Andrea Sgarro
Dipartimento di Biochimica, Biofisica, e Chimica delle Macromolecole, Università degli Studi di Trieste, via Licio Giorgieri 1, Trieste, 34127, Italy
Alessandro Tossi

Corresponding author

Correspondence to Francesco Fabris.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Fabris, F., Sgarro, A. & Tossi, A. Splitting the BLOSUM Score into Numbers of Biological Significance. J Bioinform Sys Biology 2007, 31450 (2007). https://doi.org/10.1155/2007/31450

Received: 02 October 2006
Accepted: 30 March 2007
Published: 04 June 2007
DOI: https://doi.org/10.1155/2007/31450
