Machine learning techniques can improve the accuracy of cardiovascular event prediction in asymptomatic populations and offer greater insight into risk factors, researchers report.
In a study published online August 9 in Circulation Research, the authors characterize machine learning as “an effective statistical methodology for handling biomedical data of increased volume, velocity, and variety.”
“We provide a framework for the use of machine learning in outcome prediction and risk assessment in a large population,” Dr. Bharath Ambale-Venkatesh from Johns Hopkins University, Baltimore, told Reuters Health by email. “One of the most powerful aspects of using machine learning is that it opens the possibility of discovering new relationships and new biomarkers that are not hypothesis driven and without prior assumptions. This allows us to study nature’s mechanism with less implicit human bias.”
Dr. Ambale-Venkatesh and colleagues used data from the Multi-Ethnic Study of Atherosclerosis (MESA), involving 6,814 participants (mean age, 62; 53 percent women), to compare machine learning approaches with the commonly used Cox proportional hazards regression model and with traditional cardiovascular risk scores.
The team aimed specifically to identify predictors of six clinical outcomes: all-cause death, stroke, all cardiovascular disease (CVD), coronary heart disease (CHD), atrial fibrillation, and heart failure (HF) events.
For each outcome, the machine learning approach needed fewer than 20 of the 735 variables assessed to reach a stable, high C-index, a measure of how well a model distinguishes participants who go on to have an outcome from those who do not.
Among the top 20 markers of all-cause mortality were biomarkers of inflammation and thrombosis, as well as economic status and income, the latter highlighting inequality as a risk factor for death. Across outcomes, variables from imaging tests, the ankle-brachial index, and serum biomarkers carried greater predictive importance, whereas questionnaire responses and medication exposures carried less.
Based on C-index comparisons, the 20-variable machine learning model, which included biomarkers and measures of subclinical disease, outperformed the MESA CHD risk score for predicting incident CHD and the MESA-HF risk score for predicting incident heart failure.
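The article does not say exactly how the C-index was calculated. As a rough illustration only, the Python sketch below implements Harrell's concordance index, one common form of the C-index for follow-up data in which some participants are censored (leave the study without an event). The function name, the toy follow-up data, and the two competing "models" are hypothetical and are not taken from the study. For simplicity, pairs with identical follow-up times are skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored follow-up data.

    times: follow-up time for each participant
    events: 1 if the outcome occurred at that time, 0 if censored
    risk_scores: model output, where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is usable only if the times differ and the earlier
        # time is an observed event rather than a censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event was predicted as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: five participants, two of them censored.
    times = [2, 4, 6, 8, 10]
    events = [1, 1, 0, 1, 0]
    model_a = [0.9, 0.7, 0.5, 0.3, 0.1]  # ranks risk perfectly
    model_b = [0.2, 0.8, 0.5, 0.4, 0.6]  # ranks risk poorly
    print(harrell_c_index(times, events, model_a))  # 1.0
    print(harrell_c_index(times, events, model_b))  # 0.375
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing two models, or a model against a risk score, on the same data amounts to asking which one places more of the comparable pairs in the right order.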
MESA enrolled a middle-aged population free of cardiovascular disease at baseline, so the results may not generalize to other populations, although the same machine learning methods could be applied to them.
“While we have performed this in the setting of an epidemiological study, we envision that similar techniques could be applied to other databases, such as electronic health records and insurance claims, to improve population health and screening strategies,” Dr. Ambale-Venkatesh said by email.
“The methods used here are quite flexible to different forms of data—continuous or categorical, linear or nonlinear, correlated or uncorrelated—and from a variety of domains,” he noted. “This allows us to develop precise patient-specific risk at a more granular level from all the information collected.”