TY - JOUR
T1 - Who was at risk for COVID-19 late in the US pandemic? Insights from a population health machine learning model
AU - Adeoye, Elijah A.
AU - Rozenfeld, Yelena
AU - Beam, Jennifer
AU - Boudreau, Karen
AU - Cox, Emily J.
AU - Scanlan, James M.
N1 - Publisher Copyright: © 2022, International Federation for Medical and Biological Engineering.
PY - 2022/7
Y1 - 2022/7
N2 - Notable discrepancies in vulnerability to COVID-19 infection have been identified between specific population groups and regions in the USA. The purpose of this study was to estimate the likelihood of COVID-19 infection using a machine-learning algorithm that can be updated continuously based on health care data. Patient records were extracted for all COVID-19 nasal swab PCR tests performed within the Providence St. Joseph Health system from February to October of 2020. A total of 316,599 participants were included in this study, and approximately 7.7% (n = 24,358) tested positive for COVID-19. A gradient boosting model, LightGBM (LGBM), predicted risk of initial infection with an area under the receiver operating characteristic curve of 0.819. Factors that predicted infection were cough, fever, being a member of the Hispanic or Latino community, being Spanish speaking, having a history of diabetes or dementia, and living in a neighborhood with housing insecurity. A model trained on sociodemographic, environmental, and medical history data performed well in predicting risk of a positive COVID-19 test. This model could be used to tailor education, public health policy, and resources for communities that are at the greatest risk of infection.
KW - COVID-19
KW - Infection
KW - Risk
KW - Social determinants of health
UR - http://www.scopus.com/inward/record.url?scp=85129770284&partnerID=8YFLogxK
U2 - 10.1007/s11517-022-02549-5
DO - 10.1007/s11517-022-02549-5
M3 - Article
C2 - 35538201
AN - SCOPUS:85129770284
SN - 0140-0118
VL - 60
SP - 2039
EP - 2049
JO - Medical and Biological Engineering and Computing
JF - Medical and Biological Engineering and Computing
IS - 7
ER -