BioSimplify is an open source Java tool that implements a novel sentence simplification model tuned for automatic discourse analysis and information extraction, rather than for improving human readability. The model takes a “shotgun” approach: it produces many different, simpler versions of the original sentence by combining variants of its constituent elements.
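
The combination step described above can be illustrated with a minimal sketch. This is not BioSimplify's actual API or algorithm: the class name `ShotgunSketch`, the method `combine`, and the hand-written constituent variants are all hypothetical, and the sketch assumes the sentence has already been split into constituents, each with a list of alternative forms (an empty string meaning the constituent is dropped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of BioSimplify.
public class ShotgunSketch {

    // Builds every sentence obtainable by picking exactly one variant
    // per constituent, in order (a Cartesian product of the variants).
    static List<String> combine(List<List<String>> constituents) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (List<String> variants : constituents) {
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String variant : variants) {
                    // trim() removes the stray space left by an empty prefix or variant
                    next.add((prefix + " " + variant).trim());
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed input: constituents with optional (droppable) elements.
        List<List<String>> constituents = Arrays.asList(
            Arrays.asList("p53"),
            Arrays.asList("", "(a tumor suppressor)"),
            Arrays.asList("binds MDM2"),
            Arrays.asList("", "in the nucleus")
        );
        for (String sentence : combine(constituents)) {
            System.out.println(sentence);
        }
    }
}
```

With two optional constituents, the sketch yields four versions, ranging from the shortest ("p53 binds MDM2") to the full original. A downstream extraction system could then be run on each version, which is the sense in which many simpler variants may improve recall.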


Siddhartha Jonnalagadda

Associated Institutions

Arizona State University

Application Domains
  • Biology
  • Clinical
  • Clinical records
  • Domain independent
  • Genomics
  • Literature
  • Metabolomics
  • Proteomics
Software Subtype
  • NLP / information extraction
  • Other
  • Text mining
Programming Languages
  • Java
Operating Systems
  • Linux
  • OS X
  • Unix
  • Windows
Included Components
  • Library of modular components
Intended User Types
  • Informatics researcher
  • NLP researcher or developer
  • Software developer

Siddhartha Jonnalagadda, Graciela Gonzalez. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction. In Annual Proceedings of AMIA 2010, Washington D.C., November 13-17, 2010

Siddhartha Jonnalagadda, Graciela Gonzalez. Sentence Simplification Aids Protein-Protein Interaction Extraction. The 3rd International Symposium on Languages in Biology and Medicine, Jeju Island, South Korea, November 8-10, 2009

Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. In Proc. of the NAACL-HLT 2009, Boulder, USA, June

Joerg Hakenberg, Robert Leaman, Nguyen Ha Vo, Siddhartha Jonnalagadda, Ryan Sullivan, Christopher Miller, Luis Tari, Chitta Baral, Graciela Gonzalez. Efficient extraction of protein-protein interactions from full-text articles. IEEE/ACM TCBB. 2010

Licensing Type
Open source