Predictive Modeling of Alzheimer’s Outcomes

Could we identify individuals at risk of developing Alzheimer’s?

When deciding on a project proposal, I was looking at datasets relating to the brain and found one on Alzheimer’s. Because Alzheimer’s Disease remains a significant challenge in the modern age, the choice was obvious.

What is Alzheimer’s Disease?

Alzheimer’s Disease (AD) is a neurodegenerative disorder characterized by significant cognitive decline. It is the most common form of dementia, accounting for more than 70% of all cases worldwide, and the number of cases is expected to rise in the near future. Before developing AD, individuals first undergo mild cognitive impairment (MCI), a condition where cognitive issues emerge but are not severe enough to disrupt daily life. Not everyone with MCI will progress to AD, but currently it is difficult to identify the individuals that will. Additionally, treatment options are available for early-to-mid stage AD yet over half of people are diagnosed at a more advanced stage. Thus, I aimed to classify MCI patients from AD patients and predict who will progress to AD at 1yr, 3yr and 5y intervals.

Dataset Overview

For my project, I used the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a comprehensive longitudinal study that contains clinical, imaging, demographic, and biomarker information. Below are the features that I selected which are all relevant to AD.

Feature	Category	Description
Sex	Demographic	Participant's Sex
Race	Demographic	Participant's Race
Ethnicity	Demographic	Participant's Ethnicity
Depression	Clinical	Has Depression
Abnormal Cholesterol	Clinical	Has abnormal cholesterol levels
Hypertension	Clinical	Has hypertension
Fusiform	Biomarker	Fusiform gyrus volume
Hippocampus	Biomarker	Hippocampal volume
BMI	Biomarker	Body Mass Index
Ravlt Percent Forgetting	Neuropsychological Assessments	Indicator of memory dysfunction
Education_level	Neuropsychological Assessments	Years of education
ADAS13	Neuropsychological Assessments	13-item Alzheimer's Disease Assessment Scale
FAQ	Neuropsychological Assessments	Functional Assessment Questionnaire
Ravlt Immediate	Neuropsychological Assessments	Assesses immediate memory
Ravlt Learning	Neuropsychological Assessments	Assesses verbal learning
Age	Demographic	Participant's age
Entorhinal	Biomarker	Entorhinal volume
Midtemp	Biomarker	Midtemporal gyrus volume
APOE4 copy number	Biomarker	Number of APOE4 copies
ADAS11	Neuropsychological Assessments	11-item Alzheimer's Disease Assessment Scale
ADASQ4	Neuropsychological Assessments	Task 4 of the Alzheimer's Disease Assessment Scale

I do want to point out that I used only the features taken at baseline which is around the time a participant enrolled in the study for all the models. I also excluded any variables that were used for the inclusion criteria, which would have affected the classification results.

EDA

Early exploration centered on the practical constraints of the dataset which were primarily class imbalance and missing values. The class distribution at baseline had a modest 3:1 ratio of AD vs MCI participants while the ratio of non-converters vs converters within one year from baseline were 9:1. I won’t get into the distribution of the features here but race, ethnicity, and clinical history of depression were noticeably skewed.

MCI and AD diagnosis at baseline

MCI converters vs non-converters

The features that contained the most missing values were the MRI-derived brain measurements followed by the APOE4 copy number.

Feature Preparation and Engineering

While most features were used as-is, others were derived from the raw data. BMI was calculated using metric units after converting weight and height from pounds and inches if they weren’t already in metric units. Clinical features were extracted from the participants’ clinical history. Regional brain volumes for each subject were normalized using their respective intracranial volume. I also performed QC and found some outrageous BMI values (100+) so I removed them. Finally, I searched for the Alzheimer’s status for participants at the different time windows to identify the converters. For feature engineering, I encoded the categorical features as numeric arrays and performed feature scaling. Missing values were either dropped or imputed.

Models Used and Metrics

The models that I used were random forest, logistic regression, and support vector machine for their strengths in classification tasks. 5-fold cross-validation was used to evaluate performance. I used the F1-Score, average-precision and accuracy to compare the models. The training and testing split was 70%/30%.

Results

For baseline classification, the models performed similarly. Imputing the values did not affect performance. Using all the features, the Random Forest classifier had an accuracy of 87%, an F1-score of 74% and an average-precision score of 85%. Here, the top 10 features were shown to be brain measurements and neuropsychological assessments.

Top 10 Features

I also tested the three models using the top 10 features and then the top 3. The logistic regression model trained on three features had the best performance , with an accuracy of 0.88, F1-score of 0.76 and an average-precision of 0.87.

Predicting AD progression was much more difficult. Notice that the performance was worst when predicting conversion at year 1. The performance improved at year 3 and was even better at year 5, which led me thinking that the class imbalance was the likely cause, even when SMOTE and imputing were implemented

Prediction results

Conclusions

This project explored AD conversion using baseline classification and predictive modeling. Baseline classification highlighted differences between the MCI and AD subgroups and predictive modeling suggested which features might help identify high-risk MCI participants. Brain volume and measurements and especially neuropsychological assessments taken at baseline were found to have contributed the most to the model’s performance.

After Thoughts

This was the first time that I worked on a real-world clinical dataset. Here, I defined cohorts, created variables and applied and evaluated ML models for classification tasks. I spent most of my time reading on ADNI, the changes it had undergone for each phase and how the data was obtained and standardized.

While I didn't discover anything groundbreaking, I'm happy to report that my results were comparable to those found in the literature 🥳. Honestly, I wish I had done more given the rich data, but I had less than a semester and overall I'm pretty satisfied on how things went. I also enjoyed learning more about Alzheimer's (putting that neuroscience major to good use).

One last thing, I will not share my code as it contains private data and I used Jupyter Notebooks which shows snippets of the raw data (main reason why I decided on this write-up).

References

[1] World Health Organization. (n.d.). Dementia. World Health Organization. https://www.who.int/news-room/fact-sheets/detail/dementia [2] Centers for Disease Control and Prevention. (n.d.). About dementia. Centers for Disease Control and Prevention. https://www.cdc.gov/alzheimers-dementia/about/index.html [3] Alzheimer’s Association. (2025). 2025 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia, 21(5) https://www.alz.org/getmedia/ef8f48f9-ad36-48ea-87f9-b74034635c1e/alzheimers-facts-and-figures.pdf [4] Ward, A., Tardiff, S., Dye, C., & Arrighi, H. M. (2013). Rate of conversion from prodromal Alzheimer’s disease to Alzheimer’s dementia: a systematic review of the literature. Dementia and geriatric cognitive disorders extra, 3(1), 320-332. [5] Serrano-Pozo, A., Frosch, M. P., Masliah, E., & Hyman, B. T. (2011). Neuropathological alterations in Alzheimer disease. Cold Spring Harbor perspectives in medicine, 1(1), a006189. [6] DeTure, M. A., & Dickson, D. W. (2019). The neuropathological diagnosis of Alzheimer’s disease. Molecular neurodegeneration, 14(1), 32.