Extraction of Protein Interaction Data: A Comparative Analysis of Methods in Use

Jose, Hena; Vadivukarasi, Thangavel; Devakumar, Jyothi

doi:10.1155/2007/53096

Research Article
Open access
Published: 09 December 2007

EURASIP Journal on Bioinformatics and Systems Biology volume 2007, Article number: 53096 (2007)

Abstract

Several natural language processing (NLP) tools, both commercial and freely available, are used to extract protein interactions from publications. The methods used by these tools range from pattern matching to dynamic programming, and each tool reports its own recall and precision rates. A methodical survey of these tools against manual analysis, keeping in mind the minimum interaction information a researcher would need, has not been carried out. We compared data generated by selected NLP tools with manually curated protein interaction data (PathArt and IMaps) to determine their recall and precision rates. The rates were found to be lower than the published scores when a normalized definition of interaction is considered. Each data point that a tool captured wrongly or failed to pick up was analyzed. Our evaluation brings forth critical failures of NLP tools and provides pointers for the development of an ideal NLP tool.

[1-18]
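
The comparison described in the abstract, tool output scored against a manually curated set, can be illustrated with a minimal sketch. It assumes an interaction is reduced to an unordered, case-insensitive pair of protein names. That normalization, the function names, and the toy data are illustrative assumptions and are not taken from the article, PathArt, or IMaps.

```python
def normalize(pair):
    """Reduce an interaction to an unordered, case-insensitive pair of names.

    This normalization is an assumption made for illustration only.
    """
    a, b = pair
    return frozenset((a.strip().upper(), b.strip().upper()))


def precision_recall(extracted, curated):
    """Score tool-extracted interactions against a manually curated set."""
    extracted_set = {normalize(p) for p in extracted}
    curated_set = {normalize(p) for p in curated}
    true_positives = extracted_set & curated_set
    # Precision: fraction of extracted interactions that are in the curated set.
    precision = len(true_positives) / len(extracted_set) if extracted_set else 0.0
    # Recall: fraction of curated interactions that the tool recovered.
    recall = len(true_positives) / len(curated_set) if curated_set else 0.0
    return precision, recall


# Hypothetical toy data, not results from the study.
curated = [("TP53", "MDM2"), ("AKT1", "GSK3B"), ("EGFR", "GRB2"), ("RAF1", "MAP2K1")]
extracted = [("mdm2", "tp53"), ("EGFR", "GRB2"), ("TP53", "EGFR")]

precision, recall = precision_recall(extracted, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A stricter definition of an interaction (for example, one that also requires the interaction type or direction to match) would shrink the set of true positives, which is one way scores can fall below published figures.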

References

Hunter L, Cohen KB: Biomedical language processing: what's beyond PubMed? Molecular Cell 2006, 21(5):589-594. 10.1016/j.molcel.2006.02.012
Fukuda K, Tamura A, Tsunoda T, Takagi T: Toward information extraction: identifying protein names from biological papers. Pacific Symposium on Biocomputing 1998, 707-718.
Stephens M, Palakal M, Mukhopadhyay S, Raje R, Mostafa J: Detecting gene relations from Medline abstracts. Pacific Symposium on Biocomputing 2001, 483-495.
Sekimizu T, Park HS, Tsujii J: Identifying the interaction between genes and gene products based on frequently seen verbs in medline abstracts. Genome informatics 1998, 9: 62-71.
Novichkova S, Egorov S, Daraselia N: MedScan, a natural language processing engine for Medline abstracts. Bioinformatics 2003, 19(13):1699-1706. 10.1093/bioinformatics/btg207
Yakushiji A, Tateisi Y, Miyao Y, Tsujii J: Event extraction from biomedical papers using a full parser. Pacific Symposium on Biocomputing 2001, 408-419.
Thomas J, Milward D, Ouzounis C, Pulman S, Carroll M: Automatic extraction of protein interactions from scientific abstracts. Pacific Symposium on Biocomputing 2000, 541-552.
Huang M, Zhu X, Hao Y, Payan DG, Qu K, Li M: Discovering patterns to extract protein-protein interactions from full texts. Bioinformatics 2004, 20(18):3604-3612. 10.1093/bioinformatics/bth451
Hu ZZ, Narayanaswamy M, Ravikumar KE, Vijay-Shanker K, Wu CH: Literature mining and database annotation of protein phosphorylation using a rule-based system. Bioinformatics 2005, 21(11):2759-2765. 10.1093/bioinformatics/bti390
Jenssen T-K, Lægreid A, Komorowski J, Hovig E: A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics 2001, 28(1):21-28.
Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A: GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 2001, 17(1):S74-S82. 10.1093/bioinformatics/17.suppl_1.S74
Corney DPA, Buxton BF, Langdon WB, Jones DT: BioRAT: extracting biological information from full-length papers. Bioinformatics 2004, 20(17):3206-3213. 10.1093/bioinformatics/bth386
Ahmed ST, Chidambaram D, Davulcu H, Baral C: IntEx: a syntactic role driven protein-protein interaction extractor for bio-medical text. Association for Computational Linguistics 2005, 54-61.
Eom J, Zhang B: PubMiner: machine learning-based text mining for biomedical information analysis. Genomics & Informatics 2004, 2(2):99-106.
Donaldson I, Martin J, de Bruijn B, et al.: PreBIND and Textomy—mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics 2003, 4(1):11-23. 10.1186/1471-2105-4-11
Daraselia N, Yuryev A, Egorov S, Novichkova S, Nikitin A, Mazo I: Extracting human protein interactions from Medline using a full-sentence parser. Bioinformatics 2004, 20(5):604-611. 10.1093/bioinformatics/btg452
Jang H, Lim J, Lim J-H, Park S-J, Lee K-C, Park S-H: Finding the evidence for protein-protein interactions from PubMed abstracts. Bioinformatics 2006, 22(14):e220-e226. 10.1093/bioinformatics/btl203
Corney DPA, Buxton BF, Langdon WB, Jones DT: BioRAT: extracting biological information from full-length papers. Bioinformatics 2004, 20(17):3206-3213. 10.1093/bioinformatics/bth386

Author information

Authors and Affiliations

Jubilant Biosys Ltd., #96, Industrial Suburb, 2nd Stage, Yeshwanthpur, Bangalore, 560 022, India
Hena Jose, Thangavel Vadivukarasi & Jyothi Devakumar

Corresponding author

Correspondence to Jyothi Devakumar.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Jose, H., Vadivukarasi, T. & Devakumar, J. Extraction of Protein Interaction Data: A Comparative Analysis of Methods in Use. J Bioinform Sys Biology 2007, 53096 (2007). https://doi.org/10.1155/2007/53096

Received: 31 March 2007
Accepted: 08 October 2007
Published: 09 December 2007
DOI: https://doi.org/10.1155/2007/53096
