David Carrell, PhD, is an assistant investigator who develops and applies technology for extracting rich information from unstructured clinical text, such as physician progress notes. This work uses state-of-the-art clinical natural language processing (NLP) technologies in single- and multi-site settings.
An example of this work is an NLP system to identify women who have been diagnosed with recurrent breast cancer. Despite being a common and consequential clinical diagnosis, recurrent breast cancer cannot be tracked reliably using standard medical codes found in a person’s chart. Supported by a grant from the National Cancer Institute, Dr. Carrell and his colleagues used information from clinician progress notes, radiology reports, and pathology reports to classify women by breast cancer recurrence.
Working with teams of researchers inside and outside Kaiser Permanente Washington Health Research Institute, Dr. Carrell has applied similar precision phenotyping methods to identify evidence of carotid artery stenosis, colon polyps, problem use of prescription opioids, and colonoscopy quality.
Dr. Carrell’s current research projects apply NLP and machine learning methods to improve medication safety surveillance (through the Food and Drug Administration Sentinel Initiative) and to evaluate how screening Kaiser Permanente Washington patients for unhealthy cannabis and other drug use affects the diagnosis and treatment of drug use disorders. His ongoing work also includes developing and applying automated algorithms based on electronic health record data to identify patients with particular health conditions (called “patient phenotypes”) for use in genetic and epidemiological research.
Surveillance methods for adverse events associated with medication exposure, including problem use of prescription opioids
Methods for using structured and unstructured electronic health record data to identify patients with (or without) specific clinical conditions or phenotypes for large-scale epidemiological and genomic studies
Identifying recurrent breast cancer using EHR text; Colonoscopy quality metrics
Recurrent breast cancer; Colonoscopy quality; Extracting information from clinical text; Automated de-identification of clinical text; Methods for applying NLP in multi-site research
Prevention and treatment
Penfold RB, Carrell DS, Cronkite DJ, Pabiniak C, Dodd T, Glass AM, Johnson E, Thompson E, Arrighi HM, Stang PE. Development of a machine learning model to predict mild cognitive impairment using natural language processing in the absence of screening. BMC Med Inform Decis Mak. 2022 May 12;22(1):129. doi: 10.1186/s12911-022-01864-z.
Lapham GT, Matson TE, Carrell DS, Bobb JF, Luce C, Oliver MM, Ghitza UE, Hsu C, Browne KC, Binswanger IA, Campbell CI, Saxon AJ, Vandrey R, Schauer GL, Pacula RL, Horberg MA, Bailey SR, McClure EA, Bradley KA. Comparison of medical cannabis use reported on a confidential survey vs documented in the electronic health record among primary care patients. JAMA Netw Open. 2022 May 2;5(5):e2211677. doi: 10.1001/jamanetworkopen.2022.11677.
Yu J, Pacheco JA, Ghosh AS, Luo Y, Weng C, Shang N, Benoit B, Carrell DS, Carroll RJ, Dikilitas O, Freimuth RR, Gainer VS, Hakonarson H, Hripcsak G, Kullo IJ, Mentch F, Murphy SN, Peissig PL, Ramirez AH, Walton N, Wei WQ, Rasmussen LV. Under-specification as the source of ambiguity and vagueness in narrative phenotype algorithm definitions. BMC Med Inform Decis Mak. 2022 Jan 28;22(1):23. doi: 10.1186/s12911-022-01759-z.
Carrell DS, Cronkite DJ, Shea M, Oliver M, Luce C, Matson TE, Bobb JF, Hsu C, Binswanger IA, Browne KC, Saxon AJ, McCormack J, Jelstrom E, Ghitza UE, Campbell CI, Bradley KA, Lapham GT. Clinical documentation of patient-reported medical cannabis use in primary care: toward scalable extraction using natural language processing methods. Subst Abus. 2022;43(1):917-924. doi: 10.1080/08897077.2021.1986767.
Gruber S, Carrell DS, Floyd JS, Nelson JC, Hazlehurst BL, Heagerty PJ. Letter to the editor re Beachler, et al, 2021. Pharmacoepidemiol Drug Saf. 2021 Aug 19. doi: 10.1002/pds.5342. [Epub ahead of print].
Matson TE, Carrell DS, Bobb JF, Cronkite DJ, Oliver MM, Luce C, Ghitza UE, Hsu CW, Campbell CI, Browne KC, Binswanger IA, Saxon AJ, Bradley KA, Lapham GT. Prevalence of medical cannabis use and associated health conditions documented in electronic health records among primary care patients in Washington state. JAMA Netw Open. 2021 May 3;4(5):e219375. doi: 10.1001/jamanetworkopen.2021.9375.
Shang N, Khan A, Polubriaginof F, Zanoni F, Mehl K, Fasel D, Drawz PE, Carrol RJ, Denny JC, Hathcock MA, Arruda-Olson AM, Peissig PL, Dart RA, Brilliant MH, Larson EB, Carrell DS, Pendergrass S, Verma SS, Ritchie MD, Benoit B, Gainer VS, Karlson EW, Gordon AS, Jarvik GP, Stanaway IB, Crosslin DR, Mohan S, Ionita-Laza I, Tatonetti NP, Gharavi AG, Hripcsak G, Weng C, Kiryluk K. Medical records-based chronic kidney disease phenotype for clinical care and "big data" observational and genetic studies. NPJ Digit Med. 2021;4(1):70. doi: 10.1038/s41746-021-00428-1.