Could we identify individuals at risk of developing Alzheimer’s?

While deciding on a project proposal, I browsed brain-related datasets and found one on Alzheimer’s. Because Alzheimer’s Disease remains a significant challenge today, the choice was obvious.

What is Alzheimer’s Disease?

Alzheimer’s Disease (AD) is a neurodegenerative disorder characterized by significant cognitive decline. It is the most common form of dementia, accounting for an estimated 60 to 70% of all cases worldwide, and the number of cases is expected to rise in the near future. Before developing AD, individuals typically pass through mild cognitive impairment (MCI), a stage in which cognitive problems emerge but are not severe enough to disrupt daily life. Not everyone with MCI progresses to AD, and it is currently difficult to identify those who will. In addition, treatment options exist for early-to-mid stage AD, yet over half of people are diagnosed at a more advanced stage. I therefore set two goals: to distinguish MCI patients from AD patients, and to predict which MCI patients will progress to AD within 1, 3, and 5 years.

Dataset Overview

For my project, I used data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a comprehensive longitudinal study that contains clinical, imaging, demographic, and biomarker information. Below are the features I selected, all of which are relevant to AD.

Feature | Category | Description
--- | --- | ---
Sex | Demographic | Participant's sex
Race | Demographic | Participant's race
Ethnicity | Demographic | Participant's ethnicity
Depression | Clinical | Has depression
Abnormal Cholesterol | Clinical | Has abnormal cholesterol levels
Hypertension | Clinical | Has hypertension
Fusiform | Biomarker | Fusiform gyrus volume
Hippocampus | Biomarker | Hippocampal volume
BMI | Biomarker | Body Mass Index
Ravlt Percent Forgetting | Neuropsychological Assessments | Indicator of memory dysfunction
Education_level | Neuropsychological Assessments | Years of education
ADAS13 | Neuropsychological Assessments | 13-item Alzheimer's Disease Assessment Scale
FAQ | Neuropsychological Assessments | Functional Assessment Questionnaire
Ravlt Immediate | Neuropsychological Assessments | Assesses immediate memory
Ravlt Learning | Neuropsychological Assessments | Assesses verbal learning
Age | Demographic | Participant's age
Entorhinal | Biomarker | Entorhinal volume
Midtemp | Biomarker | Midtemporal gyrus volume
APOE4 copy number | Biomarker | Number of APOE4 copies
ADAS11 | Neuropsychological Assessments | 11-item Alzheimer's Disease Assessment Scale
ADASQ4 | Neuropsychological Assessments | Task 4 of the Alzheimer's Disease Assessment Scale

I do want to point out that all the models used only features measured at baseline, which is around the time a participant enrolled in the study. I also excluded any variables that were part of the study's inclusion criteria, since they would have biased the classification results.

EDA

Early exploration centered on the practical constraints of the dataset, primarily class imbalance and missing values. The class distribution at baseline showed a modest 3:1 ratio of AD vs. MCI participants, while the ratio of non-converters to converters within one year of baseline was 9:1. I won’t get into the distribution of the features here, but race, ethnicity, and clinical history of depression were noticeably skewed.

MCI and AD diagnosis at baseline

MCI converters vs non-converters

The features that contained the most missing values were the MRI-derived brain measurements followed by the APOE4 copy number.
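As an illustration only (this is not the project code, and the column names and values are made up rather than taken from ADNI), both checks described above can be sketched in a few lines of pandas: counting participants per class and ranking columns by their share of missing values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a baseline table; names and values are hypothetical.
baseline = pd.DataFrame({
    "diagnosis": ["MCI", "AD", "MCI", "MCI", "AD", "MCI"],
    "hippocampus": [7100.0, np.nan, 6800.0, np.nan, 5900.0, 7300.0],
    "apoe4": [0, 1, np.nan, 0, 2, 1],
    "age": [71, 78, 69, 74, 80, 72],
})

# Class balance: number of participants per diagnosis.
class_counts = baseline["diagnosis"].value_counts()

# Fraction of missing values per column, largest first.
missing_fraction = baseline.isna().mean().sort_values(ascending=False)

print(class_counts)
print(missing_fraction)
```

Looking at the missing fraction per column before modeling helps decide, feature by feature, whether dropping rows or imputing is the more reasonable option.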

Feature Preparation and Engineering

While most features were used as-is, others were derived from the raw data. BMI was calculated in metric units, after converting weight and height from pounds and inches where necessary. Clinical features were extracted from the participants’ clinical histories. Regional brain volumes were normalized by each subject's intracranial volume. During quality control I found some implausible BMI values (above 100) and removed them. Finally, I looked up each participant's Alzheimer’s status at the different time windows to identify the converters. For feature engineering, I encoded the categorical features numerically and applied feature scaling. Missing values were either dropped or imputed.
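The preparation steps above could look roughly like the following. This is a hedged sketch on made-up data, not the project code: the column names (`weight`, `weight_unit`, `height`, `height_unit`, `icv`) and the helper functions are hypothetical, and it assumes that implausible BMI values are handled by dropping the whole row.

```python
import pandas as pd

LB_TO_KG = 0.45359237  # one pound in kilograms
IN_TO_CM = 2.54        # one inch in centimetres

def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a BMI column computed from metric weight and height."""
    out = df.copy()
    # Keep values already in metric units; convert the rest.
    weight_kg = out["weight"].where(out["weight_unit"] == "kg", out["weight"] * LB_TO_KG)
    height_cm = out["height"].where(out["height_unit"] == "cm", out["height"] * IN_TO_CM)
    out["bmi"] = weight_kg / (height_cm / 100.0) ** 2
    return out

def normalize_by_icv(df: pd.DataFrame, volume_cols, icv_col: str = "icv") -> pd.DataFrame:
    """Return a copy with each regional volume divided by intracranial volume."""
    out = df.copy()
    for col in volume_cols:
        out[col] = out[col] / out[icv_col]
    return out

def drop_implausible_bmi(df: pd.DataFrame, max_bmi: float = 100.0) -> pd.DataFrame:
    """Drop rows whose BMI exceeds max_bmi (assumption: rows, not just values)."""
    return df[df["bmi"].isna() | (df["bmi"] <= max_bmi)].copy()

# Made-up example rows; the third has an obviously wrong weight.
raw = pd.DataFrame({
    "weight": [70.0, 154.0, 700.0],
    "weight_unit": ["kg", "lb", "kg"],
    "height": [175.0, 69.0, 170.0],
    "height_unit": ["cm", "in", "cm"],
    "hippocampus": [7000.0, 6500.0, 7200.0],
    "icv": [1_500_000.0, 1_400_000.0, 1_600_000.0],
})

prepared = drop_implausible_bmi(normalize_by_icv(add_bmi(raw), ["hippocampus"]))
print(prepared[["bmi", "hippocampus"]])
```

Rows with a missing BMI are kept here so that they can still be imputed later; whether that matches the original workflow is an assumption.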

Models Used and Metrics

I used random forest, logistic regression, and support vector machine models, chosen for their strengths in classification tasks. The data were split 70%/30% into training and test sets, and 5-fold cross-validation was used to evaluate performance. I compared the models using accuracy, F1-score, and average precision.
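A setup like this maps naturally onto scikit-learn. The sketch below uses synthetic data and hypothetical column names, and it makes choices the write-up does not specify: median imputation, one-hot encoding, stratified splitting, default hyperparameters, and running cross-validation on the training portion only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not ADNI); the label is loosely tied to one feature.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "adas13": rng.normal(20, 7, n),
    "hippocampus": rng.normal(0.0045, 0.0006, n),
    "sex": rng.choice(["F", "M"], n),
})
y = (X["adas13"] + rng.normal(0, 5, n) > 25).astype(int)
X.loc[rng.choice(n, 30, replace=False), "hippocampus"] = np.nan  # simulate missing MRI values

# Impute and scale numeric features; one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["adas13", "hippocampus"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(random_state=0),
}

# 70/30 split, then 5-fold cross-validation on the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "f1", "average_precision"]

results = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("clf", model)])
    scores = cross_validate(pipe, X_train, y_train, cv=cv, scoring=scoring)
    results[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

print(pd.DataFrame(results).T.round(3))
```

Putting imputation and scaling inside the pipeline means they are refit on each training fold, so no information from the validation fold leaks into preprocessing.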

Results

For baseline classification, the models performed similarly, and imputing the missing values did not affect performance. Using all the features, the random forest classifier had an accuracy of 0.87, an F1-score of 0.74, and an average precision of 0.85. The top 10 features were brain measurements and neuropsychological assessments.

Top 10 Features

I also tested the three models using only the top 10 features, and then only the top 3. The logistic regression model trained on three features performed best, with an accuracy of 0.88, an F1-score of 0.76, and an average precision of 0.87.
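The write-up does not say how the top features were ranked. One common approach, assumed here purely for illustration, is to rank by random forest feature importance and then retrain a simpler model on the top k. The data and feature names below are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: only feature_0 and feature_1 carry signal.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame(rng.normal(size=(n, 6)), columns=[f"feature_{i}" for i in range(6)])
y = (2.0 * X["feature_0"] + 1.5 * X["feature_1"] + rng.normal(0, 0.5, n) > 0).astype(int)

# Rank features by impurity-based importance from a fitted random forest.
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
importance = pd.Series(forest.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)

# Keep the k highest-ranked features and evaluate a logistic regression on them.
top_k = importance.index[:3].tolist()
score = cross_val_score(
    LogisticRegression(max_iter=1000), X[top_k], y, cv=5, scoring="f1").mean()

print(top_k, round(score, 3))
```

For brevity the ranking here uses all rows; in a real evaluation it should be computed on training data only, otherwise the cross-validated score is optimistic.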

Predicting AD progression was much more difficult. Performance was worst when predicting conversion at year 1, improved at year 3, and was better still at year 5. This led me to think that class imbalance was the likely cause, since the pattern held even after applying SMOTE and imputation.

Prediction results
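To see why class imbalance makes the one-year task hard to evaluate, consider a model that always predicts "non-converter". The sketch below uses made-up labels in the same 9:1 ratio reported for the one-year window; it is an illustration, not a result from the project.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# 900 non-converters (0) and 100 converters (1): a 9:1 ratio.
y = np.array([0] * 900 + [1] * 100)
X = np.zeros((len(y), 1))  # features are irrelevant to a majority-class baseline

# Always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)

accuracy = accuracy_score(y, pred)
f1 = f1_score(y, pred, zero_division=0)

print(accuracy, f1)  # 0.9 0.0
```

The baseline reaches 90% accuracy without identifying a single converter, which is why F1-score and average precision are more informative than accuracy for this task.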

Conclusions

This project explored AD conversion using baseline classification and predictive modeling. Baseline classification highlighted differences between the MCI and AD subgroups, and predictive modeling suggested which features might help identify high-risk MCI participants. Brain volume measurements and, especially, neuropsychological assessments taken at baseline contributed the most to the models’ performance.

Afterthoughts

This was the first time that I worked on a real-world clinical dataset. I defined cohorts, created variables, and applied and evaluated ML models for classification tasks. I spent most of my time reading about ADNI, the changes it underwent in each phase, and how the data were obtained and standardized.

While I didn't discover anything groundbreaking, I'm happy to report that my results were comparable to those found in the literature 🥳. Honestly, I wish I had done more given the rich data, but I had less than a semester, and overall I'm pretty satisfied with how things went. I also enjoyed learning more about Alzheimer's (putting that neuroscience major to good use).

One last thing: I will not share my code, as it contains private data and I used Jupyter Notebooks, which show snippets of the raw data (the main reason why I decided on this write-up).

References

[1] World Health Organization. (n.d.). Dementia. World Health Organization. https://www.who.int/news-room/fact-sheets/detail/dementia
[2] Centers for Disease Control and Prevention. (n.d.). About dementia. Centers for Disease Control and Prevention. https://www.cdc.gov/alzheimers-dementia/about/index.html
[3] Alzheimer’s Association. (2025). 2025 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia, 21(5). https://www.alz.org/getmedia/ef8f48f9-ad36-48ea-87f9-b74034635c1e/alzheimers-facts-and-figures.pdf
[4] Ward, A., Tardiff, S., Dye, C., & Arrighi, H. M. (2013). Rate of conversion from prodromal Alzheimer’s disease to Alzheimer’s dementia: a systematic review of the literature. Dementia and Geriatric Cognitive Disorders Extra, 3(1), 320-332.
[5] Serrano-Pozo, A., Frosch, M. P., Masliah, E., & Hyman, B. T. (2011). Neuropathological alterations in Alzheimer disease. Cold Spring Harbor Perspectives in Medicine, 1(1), a006189.
[6] DeTure, M. A., & Dickson, D. W. (2019). The neuropathological diagnosis of Alzheimer’s disease. Molecular Neurodegeneration, 14(1), 32.