Tags: Human annotated

Word sense ambiguity is a pervasive characteristic of natural language. For example, the word "cold" has several senses and may refer to a disease, a temperature sensation, or an environmental condition. The specific sense intended is determined by the te ...

Category Structured Data

The "PK DDI" corpus is a new corpus of sections from FDA-approved drug package inserts (PIs) that have been manually annotated for pharmacokinetic drug-drug interactions by a pharmacist and a drug information expert. The two annotators reached consensu ...

Category Human Annotated

"DrugDDI: an annotated corpus for drug-drug interactions" submitted for publication. The DrugDDI corpus is part of a larger study about automatic Drug-Drug Interaction Extraction. The corpus provides data for the development and automatic evaluatio ...

Category Human Annotated
The Arrowsmith Project Home Page

Arrowsmith Two Node Search tool, Anne O'Tate value-added PubMed search tool, Author-ity Author Name Disambiguation tools, ADAM abbreviation database, WETLAB prototype electronic lab notebook, Compendium of Biomedical Text Mining tools, ...

Structured abstracts contain distinct labeled sections (e.g., “RESULTS”) for key information from articles they summarize.  If English-language structured abstracts appear in journals that the US National Library of Medicine (NLM) indexes, the labels in t ...

Category Structured Data
NeuroLex - A dynamic lexicon of neuroscience terms

 The NeuroLex project, supported by the Neuroscience Information Framework project, is a dynamic lexicon of neuroscience terms. Unlike an encyclopedia, a lexicon provides the meaning of a term, and not all there is to know about it. The NeuroLex is being ...

Category Human Annotated

This package provides machine learning algorithms optimized for large text categorization tasks and is able to combine several text categorization solutions. The advantages of this package compared to existing approaches are: 1) its speed, 2) it is able t ...

The dataset was prepared for the Genic Interaction Extraction Challenge. Extracting genic interactions means extracting the agent (proteins) and the target (genes) of all pairs of genic interactions from sentences. MIG-INRA has annotated hundreds of such ...

Category Human Annotated

Corpus annotation is now a key topic for all areas of natural language processing (NLP) and information extraction (IE) that employ supervised learning. With the explosion of results in molecular biology, there is an increased need for IE to extract knowl ...

Category Human Annotated

The GeneReg corpus consists of 314 Medline abstracts dealing with the regulation of gene expression in the model organism E. coli. The regulation of gene expression can be described as the process that modulates the frequency, rate or extent of gene expre ...

Category Human Annotated

DrugNerAr Corpus: a corpus annotated with drug anaphoras. Texts were collected from the DrugBank database. There is no corpus dedicated to the resolution of the anaphoric expressions occurring in drug interaction descriptions in pharmacological documents, ...

Category Human Annotated

See the link to the CMC resource catalog on the upper right side of the page. All resources are fully open-access, but registration is needed. We simply record user and download counts and report them to our benefactors. No data are shared. You will need to download t ...

Category Human Annotated

A collection of query-based summaries sourced from the Clinical Inquiries section of the Journal of Family Practice. The data are formatted in XML and are annotated with: The clinical question; The answer(s) to the question; The evidence grade of the ...

Category Human Annotated
BioInfer: Bio Information Extraction Resource

Biomedical Information Extraction Resource (BioInfer) is a public resource providing a manually annotated corpus and related resources for information extraction in the biomedical domain. The corpus contains sentences from abstracts of biomedical researc ...

Category Human Annotated

This corpus originated from the BioCreAtIvE task 1A data set for named entity recognition of gene/protein names. We randomly selected 1000 sentences from this set and added additional annotation for interactions between genes/proteins. 173 sentences conta ...

Category Human Annotated

The AImed corpus consists of 225 Medline abstracts. Of these, 200 abstracts describe interactions between human proteins and 25 do not refer to any interaction. There are 4084 protein references and around 1000 tagged interactions in this data set. In this data set th ...

Category Human Annotated