November 21, 2022

Research roundup on natural language processing and machine learning


Using doctors' notes to learn about drug reactions, dementia, and cannabis use

Detecting severe allergic reactions to drugs, identifying people at risk of Alzheimer's disease, and learning about medical cannabis use may seem unrelated. But all might be advanced by applying natural language processing (NLP) and machine learning to clinicians' written notes.

"When people visit a clinic, their care team documents the visit in clinical notes," said Associate Investigator David Carrell, PhD, who leads NLP research at Kaiser Permanente Washington Health Research Institute (KPWHRI). At Kaiser Permanente Washington alone, health care teams write millions of notes annually into an electronic health record (EHR). The notes contain valuable data for improving care.

NLP algorithms analyze text — a complex task because the same idea may be expressed in many different ways. For example, one doctor might write "difficulty breathing" while another might use the clinical term "dyspnea." Spelling errors, abbreviations, and missing punctuation make the task even harder. Carrell and colleagues develop NLP algorithms to identify notes about conditions or behaviors so they can be included in computer models with applications ranging from monitoring national drug safety to improving primary care. Some models also use machine learning methods, which can better represent the complex relationships between data from clinical notes and a condition or behavior.
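To make the challenge concrete, here is a minimal Python sketch of mapping different wordings and misspellings to a single concept. It is not the KPWHRI team's method. The term list, the `DYSPNEA` label, the function names, and the similarity cutoff are all assumptions made for illustration, and the fuzzy matching uses Python's standard `difflib` module.

```python
import re
from difflib import get_close_matches

# Hypothetical lexicon: several surface forms map to one illustrative concept label.
LEXICON = {
    "difficulty breathing": "DYSPNEA",
    "dyspnea": "DYSPNEA",
    "shortness of breath": "DYSPNEA",
    "sob": "DYSPNEA",
}

def normalize(text):
    """Lowercase and replace punctuation with spaces so variants line up."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def find_concepts(note, cutoff=0.85):
    """Return the set of concept labels mentioned in a note.

    The cutoff (an assumed value) controls how much misspelling is tolerated.
    """
    tokens = normalize(note).split()
    found = set()
    for phrase, concept in LEXICON.items():
        n = len(phrase.split())
        # Slide a window with the same word count as the phrase over the note.
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            if get_close_matches(window, [phrase], n=1, cutoff=cutoff):
                found.add(concept)
                break
    return found

print(find_concepts("Pt reports diffculty breathing at night"))
print(find_concepts("c/o dyspnoea on exertion"))
```

A sketch like this ignores context such as negation ("denies dyspnea"), which is one reason production NLP models are considerably more involved.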

How NLP works

Generally, Carrell explained, the steps of NLP are:

  • Obtain anonymized EHR data on a representative sample, for example, people potentially at risk of Alzheimer's disease. Divide the data into a development set and a validation set.
  • Create lists of terms about the condition. Researchers might manually read EHR notes to find words and phrases associated with the condition, for example, "rash" for an allergic reaction. This step can sometimes be automated by using machine learning to mine key terms from the medical literature.
  • Develop the NLP model. Using the word list, researchers create algorithms that detect EHR word patterns related to the condition. They use the development dataset to refine and strengthen the model.
  • Test the performance of the model using the validation dataset.
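The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the models described in the studies: the notes and labels are made up, the term list is hypothetical, and the "model" is a simple whole-word pattern match rather than a trained algorithm. The function names are likewise invented for this example.

```python
import random
import re

# Made-up example notes with reference labels (1 = reaction documented).
# Real work would start from anonymized EHR notes reviewed by experts.
NOTES = [
    ("Developed rash and hives after first dose.", 1),
    ("Routine follow-up, feeling well.", 0),
    ("Wheezing and facial swelling after injection.", 1),
    ("Knee pain after running.", 0),
    ("Itchy rash on both arms since starting antibiotic.", 1),
    ("Annual physical, vaccines up to date.", 0),
]

def split_records(records, dev_fraction=0.5, seed=0):
    """Step 1: shuffle and divide records into development and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]

def build_detector(terms):
    """Steps 2 and 3: turn a term list into a whole-word pattern matcher."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return lambda note: int(bool(pattern.search(note)))

def evaluate(detector, records):
    """Step 4: compare predictions with the reference labels."""
    tp = sum(1 for note, label in records if detector(note) and label)
    fp = sum(1 for note, label in records if detector(note) and not label)
    fn = sum(1 for note, label in records if not detector(note) and label)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "ppv": tp / (tp + fp) if tp + fp else None,
    }

dev_set, val_set = split_records(NOTES)
detector = build_detector(["rash", "hives", "wheezing"])  # illustrative term list
print(evaluate(detector, val_set))
```

In practice the development set would be used to refine the term list and patterns before the validation set is touched, so that the reported performance reflects notes the model has not seen. A pattern match like this one would also flag "no rash" as a mention, a limitation that a real model would need to address.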

Carrell and colleagues recently published 3 studies showing the versatility of NLP and machine learning.

Anaphylaxis

In the American Journal of Epidemiology, Carrell and colleagues showed how NLP might increase the accuracy of drug safety monitoring by the U.S. Food and Drug Administration (FDA). Medications are the most common cause of fatal anaphylaxis (a serious allergic reaction). The FDA uses automated algorithms, but not yet NLP, to monitor for medication-related anaphylaxis in EHRs.

Anaphylaxis is difficult to identify in text because of varied symptoms including low blood pressure, vomiting, and rash. Anaphylaxis is also rare, so cases in EHR data are scarce.

Carrell and team used NLP and anonymized EHR data from Kaiser Permanente Washington to help create an anaphylaxis word list. They added this dictionary to anaphylaxis-predicting models. When tested using validation data from Kaiser Permanente Northwest, several models improved on the FDA anaphylaxis-identifying algorithms. The methods might also be used to better track other rare and serious conditions, such as emerging infectious diseases.

KPWHRI coauthors on the study include Kara Cushing-Haugen, Ron Johnson, Vina Graham, David Cronkite, and Jennifer Nelson.

Mild cognitive impairment

A study in BMC Medical Informatics and Decision Making led by Senior Investigator Rob Penfold, PhD, asked: Can NLP help develop a model to identify patients with mild cognitive impairment (MCI)?

MCI is a decline in memory, thinking, or behavior that is greater than expected with age. A possible sign of future Alzheimer's disease or related dementia, MCI can be detected in primary care. An NLP-based resource might assist clinicians in knowing who could benefit from MCI screening.

Carrell and colleagues used NLP and anonymized Kaiser Permanente Washington EHR data, including from the Adult Changes in Thought (ACT) study, to identify MCI-related concepts from clinical notes. They used the results to develop a machine learning MCI-prediction model. The model's ability to identify people with MCI through MCI-associated concepts was similar to the ability of screening tests to identify people with other conditions such as cancer. This shows the potential of the approach for developing a tool to help clinicians care for patients with possible future Alzheimer's disease and assist health care organizations in planning for members' needs.

KPWHRI coauthors on the study include David Cronkite, Chester Pabiniak, Tammy Dodd, Ashley Glass, Eric Johnson, and Ella Thompson.

Medical cannabis use

In Substance Abuse, the NLP team researched medical cannabis use documented in EHRs for a study led by Assistant Investigator Gwen Lapham, PhD, MPH, MSW. Knowing when and why people use medical cannabis is important for studying its safety and effectiveness for conditions including pain and anxiety.

Anonymized development and validation data came from Kaiser Permanente Washington, which since 2015 has routinely asked adult primary care patients about past-year cannabis use. The reasons for use may be in clinical notes, but challenges in identifying this information include the different terms associated with cannabis.

Despite the difficulties, the study reports that an NLP model identified more than half of the 5.6% of records with documented medical cannabis use. NLP-assisted manual review identified the remainder. The study shows NLP could help obtain data for research and assist clinicians and patients with decision-making about cannabis use.

KPWHRI coauthors on the study include David Cronkite, Mary Shea, Malia Oliver, Casey Luce, Theresa Matson, Jennifer Bobb, Clarissa Hsu, and Katharine Bradley.

The KPWHRI NLP team continues to advance NLP and machine learning in studies applying these methods to cancer, acute pancreatitis, COVID-19, mental health conditions, and substance use disorders. They also continue to develop methods to streamline FDA safety monitoring of medications and medical devices.

By Chris Tachibana
