Models that can successfully predict suicides in a general population sample can perform poorly in some racial or ethnic groups, according to a study by Kaiser Permanente researchers published April 28 in JAMA Psychiatry.
The new findings show that 2 suicide prediction models are less accurate for Black, American Indian, and Alaska Native people, and they underscore the need to evaluate any prediction model for such disparities before it is used. The study is believed to be the first to look at how the latest statistical methods to assess suicide risk perform when tested specifically in different ethnic and racial groups.
More than 47,500 people died from suicide in the United States in 2019, an increase of 33% since 1999. Health care leaders hope to reduce suicide by using records of mental health visits, diagnoses, and other factors to identify patients at highest risk of suicide and intervene. In recent years, the Veterans Health Administration, HealthPartners, and several Kaiser Permanente regions have started using suicide prediction models to guide care.
“With enthusiasm growing for suicide prediction modeling, we must be sure that such efforts consider health equity,” said Yates Coley, PhD, the study’s first author and an assistant investigator at Kaiser Permanente Washington Health Research Institute. “Current methods maximize predictive performance across the entire population, with predictive accuracy in the largest subgroup — white patients — eclipsing the performance for less-prevalent racial and ethnic subgroups.”
The JAMA Psychiatry article follows several studies that have uncovered similar concerns with racial/ethnic bias in algorithms in domains ranging from criminal justice and policing to health care.
In the new study, Dr. Coley and colleagues gathered electronic health records for nearly 14 million outpatient mental health visits over a 7-year period from 7 health care systems.
Using these health records, the research team developed 2 different models — a standard statistical logistic regression approach and a random forest machine learning algorithm — to predict suicide deaths within 90 days of a mental health visit. The models used demographic characteristics, comorbidities, mental health and substance use diagnoses, dispensed psychiatric medications, prior suicide attempts, prior mental health encounters, and responses to the Patient Health Questionnaire 9 (PHQ-9), which patients fill out routinely at mental health visits.
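The modeling setup described above — two risk models, one logistic regression and one random forest, each scoring the 90-day risk of suicide death for every visit — can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the feature matrix here is a random stand-in for the real predictors (demographics, diagnoses, prior attempts, PHQ-9 responses), and all parameter choices are assumptions for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-ins for the study's predictors (demographics,
# comorbidities, diagnoses, medications, prior attempts, PHQ-9).
X = rng.normal(size=(n, 6))

# Outcome: suicide death within 90 days of the visit (a rare event),
# simulated here from a logistic model so both learners have signal.
logit = -4.0 + 1.5 * X[:, 0] + 1.0 * X[:, 3]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Model 1: standard statistical logistic regression.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Model 2: random forest machine learning algorithm.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Each model outputs a predicted 90-day risk score per visit.
lr_risk = lr.predict_proba(X_test)[:, 1]
rf_risk = rf.predict_proba(X_test)[:, 1]
```

In practice the risk scores, not hard yes/no labels, are what matter: care systems rank visits by predicted risk and intervene above some cutoff, which is why the evaluation below focuses on the highest-risk slice.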
The models had a high degree of precision in identifying suicides and avoiding false positives across the entire sample, and for white, Hispanic, and Asian patients. The models fared much worse with Black, American Indian, and Alaska Native patients and patients without race/ethnicity recorded.
For example, when one of the models flagged the 5% of visits with the highest predicted suicide risk, that group captured nearly half of the eventual suicide deaths among white patients. By contrast, it captured only about 7% of the suicide deaths among American Indian, Alaska Native, and Black patients. People in these populations who die by suicide would therefore be far less likely to be identified by the model, indicating that this particular tool is not useful for guiding their care.
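The subgroup comparison described above — what fraction of each group's eventual suicide deaths lands in the top 5% of predicted risk when a single overall cutoff is used — is the kind of audit any health system can run on its own scores. A minimal sketch, with made-up data and a hypothetical helper (none of this reflects the study's actual numbers):

```python
import numpy as np

def sensitivity_in_flagged(risk, died, group, top_frac=0.05):
    """For each subgroup, the share of its deaths that fall in the
    top `top_frac` of predicted-risk visits (one overall cutoff)."""
    threshold = np.quantile(risk, 1 - top_frac)
    flagged = risk >= threshold
    result = {}
    for g in np.unique(group):
        deaths = died & (group == g)
        if deaths.sum() > 0:
            result[g] = (flagged & deaths).sum() / deaths.sum()
    return result

# Toy illustration only: random scores and outcomes, two groups.
rng = np.random.default_rng(1)
risk = rng.random(10_000)
died = rng.random(10_000) < 0.02
group = rng.choice(["A", "B"], size=10_000)
by_group = sensitivity_in_flagged(risk, died, group)
```

A large gap between groups in this per-group sensitivity — like the near-half versus roughly 7% contrast the study reports — is exactly the disparity that an overall performance number would hide.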
Researchers cited several possible reasons for the shortfall in prediction accuracy, including biases embedded in the data used to build the models.
The 2 prediction models in the study are not the same as the ones now being implemented in health systems. This study examined models that predict suicide deaths, while the models used in clinical care at Kaiser Permanente predict self-harm and suicide attempts. An audit like the one in this study found no racial or ethnic disparities in the performance of the suicide-attempt prediction model in use at Kaiser Permanente in Washington. Similar audits may be needed at other health organizations using suicide prediction models, and the study lays out the steps that should be taken when implementing a prediction model to ensure inequities are not perpetuated.
“Before we implement any prediction model in our clinics, we must test for disparities in accuracy and think about possible negative consequences,” said Gregory Simon, MD, MPH, a study co-author and Kaiser Permanente Washington Health Research Institute senior investigator. “Some prediction models pass that test and some don’t, but we only know by doing research like this.”
In addition to Drs. Coley and Simon, the other co-authors are Eric Johnson, MS; Maricela Cruz, PhD; and Susan Shortreed, PhD. Research reported in this work was supported by the Mental Health Research Network with funding from the National Institute of Mental Health.
By Jonathan Rabinovitz