AIMed Corpora

The AImed corpus consists of 225 Medline abstracts. Of these, 200 describe interactions between human proteins and 25 do not refer to any interaction. There are 4084 protein references and around 1000 tagged interactions in this data set. In this data set th ...

www2.informatik.hu-berlin.de/~hakenber/corpora/

This corpus originated from the BioCreAtIvE task 1A data set for named entity recognition of gene/protein names. We randomly selected 1000 sentences from this set and added additional annotation for interactions between genes/proteins. 173 sentences conta ...

BioInfer: Bio Information Extraction Resource

mars.cs.utu.fi/BioInfer

Biomedical Information Extraction Resource (BioInfer) is a public resource providing a manually annotated corpus and related resources for information extraction in the biomedical domain. The corpus contains sentences from abstracts of biomedical researc ...

web.science.mq.edu.au/~diego/medicalnlp/

A collection of query-based summaries sourced from the Clinical Inquiries section of the Journal of Family Practice. The data are formatted in XML and are annotated with: The clinical question; The answer(s) to the question; The evidence grade of the ...

www.cincinnatichildrens.org/research/divisions/b/bmi/labs/pe...

See the link to the CMC resource catalog on the upper right side of the page. All resources are fully open-access, but registration is required. We simply collect user and download counts and report them back to our benefactors. No data are shared. You will need to download t ...

labda.sintonia.inf.uc3m.es/DrugDDI/DrugNerAr.html

DrugNerAr Corpus: a corpus annotated with drug anaphoras. Texts were collected from the DrugBank database. There is no corpus dedicated to the resolution of the anaphoric expressions occurring in drug interaction descriptions in pharmacological documents, ...

GeneReg Corpus

www.julielab.de/Resources/Corpora/GeneReg.html

The GeneReg corpus consists of 314 Medline abstracts dealing with the regulation of gene expression in the model organism E. coli. The regulation of gene expression can be described as the process that modulates the frequency, rate or extent of gene expre ...

GENIA Corpus

www.nactem.ac.uk/genia/genia-corpus

Corpus annotation is now a key topic for all areas of natural language processing (NLP) and information extraction (IE) that employ supervised learning. With the explosion of results in molecular biology there is an increased need for IE to extract knowl ...

LLL Corpora

genome.jouy.inra.fr/texte/LLLchallenge

The dataset was prepared for the Genic Interaction Extraction Challenge. Extracting gene interaction means extracting the agent (proteins) and the target (genes) of all couples of genic interactions from sentences. MIG-INRA has annotated hundreds of such ...

NeuroLex - A dynamic lexicon of neuroscience terms

neurolex.org

The NeuroLex project, supported by the Neuroscience Information Framework project, is a dynamic lexicon of neuroscience terms. Unlike an encyclopedia, a lexicon provides the meaning of a term, and not all there is to know about it. The NeuroLex is being ...

labda.sintonia.inf.uc3m.es/DrugDDI/

"DrugDDI: an annotated corpus for drug-drug interactions" submitted for publication. The DrugDDI corpus is part of a larger study about automatic Drug-Drug Interaction Extraction. The corpus provides data for the development and automatic evaluatio ...

dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/package-insert-DDI...

The "PK DDI" corpus is a new corpus of sections from FDA-approved drug package inserts (PIs) that have been manually annotated for pharmacokinetic drug-drug interactions by a pharmacist and a drug information expert. The two annotators reached consensu ...