Researchers have asked for access to MEDLINE citations as they existed at a given moment in time, without the MeSH vocabulary updates and other revisions that occur during the year. The MEDLINE/PubMed Baseline Repository (MBR) was set up to provide this capability. We have stored the end-of-year baseline of the MEDLINE/PubMed database for each year starting in 2002, along with a selection of the associated MeSH Vocabulary data files. Please note: the records included in the MEDLINE/PubMed Baseline databases represent a static view of the data at the time each baseline database was created.
The baselines are normally generated toward the middle of November each year and contain all completed citations in MEDLINE as of that date. The baselines represent MEDLINE after the year-end processing has been completed, which means that the records have been revised with the upcoming year's new MeSH vocabulary terms. The 2002 through 2011 MEDLINE/PubMed Baselines are currently available. The naming of the baselines reflects this year-end processing. For example, the 2002 MEDLINE/PubMed Baseline contains all completed citations from the mid-1960s until the date the baseline was created in late November 2001, with the year-end processing assigning the appropriate 2002 MeSH vocabulary terms; it is therefore the baseline for the 2002 year.
The baselines also contain citations that are not MEDLINE citations. All of the baselines we have stored (2002 onward) contain "Out-of-scope" citations, which were renamed to "PubMed-not-MEDLINE" starting with the 2004 MEDLINE/PubMed Baseline.
Available Resources include:
MBR Query Tool Database: Baseline databases from 2002 forward are available for searching. The database includes tables with MH, SH, MH/SH combination, Chemicals, and PMID data; searches can also be limited or filtered by Date Created, Date Completed, Date Last Revised, Publication Year, and Status.
DTD Files: We save a copy of the relevant DTD (Document Type Definition) files each year for working with the Baseline XML files.
Frequency Count Files: Basic frequency counts for the entire MEDLINE/PubMed Baseline, sorted in alphabetical and numerical order, for the MEDLINE fields listed below. For all fields except NM, we also provide a sort and count of their occurrences as starred (Index Medicus) items.
a) MH (MeSH Headings)
b) SH (MeSH Subheadings)
c) MH/SH combinations
d) NM (Chemicals)
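As a rough illustration of what such frequency counts involve, the Python sketch below tallies headings and produces an alphabetical listing, a numerical (most frequent first) listing, and a separate count of starred occurrences. The input shape, a list of (heading, is_starred) pairs, and the function name frequency_counts are assumptions made for this example; they are not the layout of the actual Frequency Count Files.

```python
from collections import Counter


def frequency_counts(headings):
    """Tally headings from (name, is_starred) pairs.

    The pair layout is a hypothetical input format chosen for this
    sketch, not the format of the distributed files.
    """
    total = Counter(name for name, _ in headings)
    starred = Counter(name for name, is_starred in headings if is_starred)

    # Alphabetical order: sort by heading name.
    alphabetical = sorted(total.items())
    # Numerical order: highest count first, ties broken by name.
    numerical = sorted(total.items(), key=lambda item: (-item[1], item[0]))
    return alphabetical, numerical, starred


# Made-up sample data, for illustration only.
sample = [
    ("Humans", False),
    ("Asthma", True),
    ("Humans", False),
    ("Asthma", False),
    ("Humans", False),
    ("Zebrafish", True),
]
alphabetical, numerical, starred = frequency_counts(sample)
```

The same function would apply to SH, MH/SH combination, or NM values; for NM the starred count would simply be ignored, since no starred count is provided for that field.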
Raw Data Files: Files containing the raw data similar to what was used to create our MBR Query Tool Database for this Baseline year. There is a README file describing the various files available and their layouts.
Histogram/Summary Files: A file showing the number of MH terms assigned to each of the MeSH Tree top-level and top-level + 1 categories during the latest year, to show how the assignment of terms varies from year to year. Also, a file showing the number of MH terms assigned to each of the UMLS Semantic Type Groupings categories during the latest year, to show how the assignment of terms varies from year to year from a different perspective.
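To make the two MeSH Tree levels concrete, here is a minimal Python sketch that counts tree numbers by category. It assumes that "top-level" means the leading letter of a MeSH tree number (for example, C) and that "top-level + 1" means the first node (for example, C08); this reading, the function name tree_histograms, and the input list are assumptions for illustration and may not match how the Histogram/Summary Files are actually produced.

```python
from collections import Counter


def tree_histograms(tree_numbers):
    """Count tree numbers by leading letter and by first node.

    Assumes tree numbers shaped like "C08.127.108"; the two-level
    interpretation is an assumption of this sketch.
    """
    top = Counter()
    top_plus_one = Counter()
    for number in tree_numbers:
        if not number:
            continue  # skip empty entries
        first_node = number.split(".")[0]  # "C08.127.108" -> "C08"
        top[first_node[0]] += 1            # "C08" -> "C"
        top_plus_one[first_node] += 1
    return top, top_plus_one


# Made-up sample input, for illustration only.
top, top_plus_one = tree_histograms(
    ["C08.127.108", "C08.381", "C04.588", "A01.236"]
)
```

Running the same function over the assignments for two different years and comparing the resulting counters is one simple way to see how assignment shifts over time.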
Related MeSH Files: We save a copy of MeSH Vocabulary data files for each year and a copy of their associated DTD (Document Type Definition) files for working with the Baseline XML files.
Unique Words from Medline Baseline: We use a very simplified idea of a word: we throw away anything consisting entirely of digits, throw away anything containing non-ASCII characters, and break at anything that is not alphanumeric. The "words" files contain single words and bigrams.