
Question Processing and Clustering in INDOC: A Biomedical Question Answering System


The exponential growth in the volume of publications in the biomedical domain has made it impossible for an individual to keep pace with the advances. Even though evidence-based medicine has gained wide acceptance, physicians are unable to access relevant information in the time they have available, and most of their questions are left unanswered. This accentuates the need for fast and accurate biomedical question answering systems. In this paper we introduce INDOC, a biomedical question answering system based on novel approaches to indexing documents and extracting answers to the questions posed. INDOC displays its results in clusters to help the user arrive at the most relevant set of documents quickly. The system was evaluated against the standard OHSUMED test collection. It achieves high accuracy and minimizes user effort.
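The abstract does not state how INDOC forms its clusters. Purely as an illustration of clustering a result list, the sketch below groups a few retrieved documents with k-means (MacQueen, 1967) over bag-of-words vectors. The whitespace tokenizer, the term-frequency weighting, the farthest-point seeding, the choice of k, the example documents, and all function names (`term_vectors`, `farthest_point_seeds`, `kmeans`) are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter

# Hypothetical illustration only: NOT INDOC's documented clustering method.


def term_vectors(docs):
    """Turn each document into an L2-normalised term-frequency vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [float(counts[term]) for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against empty docs
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_seeds(vectors, k):
    """Deterministic seeding: start from the first vector, then repeatedly
    add the vector farthest from its nearest existing seed."""
    seeds = [vectors[0]]
    while len(seeds) < k:
        seeds.append(max(vectors, key=lambda v: min(squared_distance(v, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(vectors, k, iterations=20):
    """Lloyd-style k-means; returns one cluster label per vector."""
    centroids = farthest_point_seeds(vectors, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments unchanged, so centroids are stable
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "insulin dose for type 2 diabetes",
    "metformin for type 2 diabetes",
    "inhaler use for childhood asthma",
    "steroid inhaler for childhood asthma",
]
labels = kmeans(term_vectors(docs), k=2)
print(labels)  # [0, 0, 1, 1]
```

Here the two diabetes documents fall into one cluster and the two asthma documents into the other. Deterministic seeding keeps the example reproducible; a real system would more likely use randomized restarts and a weighting scheme such as TF-IDF.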




Author information


Corresponding author

Correspondence to Parikshit Sondhi.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sondhi, P., Raj, P., Kumar, V.V. et al. Question Processing and Clustering in INDOC: A Biomedical Question Answering System. J Bioinform Sys Biology 2007, 28576 (2007).


