The MITRE Identification Scrubber Toolkit (MIST) is a suite of tools for identifying and redacting personally identifiable information (PII) in free-text medical records. MIST helps you replace each PII phrase either with an obscuring filler, such as [NAME], or with an artificial, synthesized, but realistic English filler.
MIST decomposes the deidentification task into two subtasks:
- an annotation subtask, where trainable, corpus-based natural language processing tools identify the PII phrases, and
- a replacement subtask, where information in the PII phrases is used to generate suitable replacements, given a chosen replacement strategy.
The first subtask is addressed by the MITRE Annotation Toolkit (MAT), a highly customizable suite of natural language processing tools on which MIST is built. The customizations for MIST itself address the second subtask.
The MIST documentation uses the terms annotation and tagging interchangeably for the task of identifying the PII phrases in your documents, whether by hand or automatically. The labels for your PII types (e.g., NAME, PHYSICIAN, AGE, DATE) are the tags that you apply to your documents.
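The two-subtask decomposition can be illustrated with a minimal Python sketch. This is not the MIST or MAT API: every name here (`Span`, `annotate`, `obscure`, `resynthesize`, `replace`, `PATTERNS`, `SURROGATES`) is hypothetical, and the annotation step uses a few hand-written regular expressions as a stand-in for the trained, corpus-based tagger that MAT provides. The sketch only shows how tagged spans produced by one step can feed a separate replacement step with a chosen strategy.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A tagged PII phrase: character offsets plus a tag label."""
    start: int
    end: int
    label: str


# ASSUMPTION: toy regex patterns standing in for a trained tagger.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "AGE": re.compile(r"\b\d{1,3}(?= years old\b)"),
    "PHYSICIAN": re.compile(r"(?<=Dr\. )[A-Z][a-z]+"),
}


def annotate(text):
    """Annotation subtask: return tagged spans sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append(Span(match.start(), match.end(), label))
    return sorted(spans, key=lambda s: s.start)


def obscure(span, original):
    """Strategy 1: replace the phrase with an obscuring filler like [DATE]."""
    return f"[{span.label}]"


# ASSUMPTION: fixed surrogate values; a real system would synthesize varied ones.
SURROGATES = {"DATE": "01/01/2000", "AGE": "40", "PHYSICIAN": "Smith"}


def resynthesize(span, original):
    """Strategy 2: replace the phrase with a realistic-looking surrogate."""
    return SURROGATES.get(span.label, f"[{span.label}]")


def replace(text, spans, strategy):
    """Replacement subtask: rebuild the text, swapping each tagged span.

    Assumes the spans are sorted and do not overlap.
    """
    pieces = []
    cursor = 0
    for span in spans:
        pieces.append(text[cursor:span.start])
        pieces.append(strategy(span, text[span.start:span.end]))
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


record = "Seen by Dr. Jones on 3/14/2009; patient is 67 years old."
spans = annotate(record)
print(replace(record, spans, obscure))
# Seen by Dr. [PHYSICIAN] on [DATE]; patient is [AGE] years old.
print(replace(record, spans, resynthesize))
# Seen by Dr. Smith on 01/01/2000; patient is 40 years old.
```

Keeping the spans as plain data between the two steps is what lets the same annotations be reused with either replacement strategy.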