Diabetes Risk Predictor

1 in 10 Americans has diabetes. What if we could predict it earlier?

Healthcare ML Python R scikit-learn

The Problem

Diabetes affects 37 million Americans and costs the healthcare system $327 billion annually. The tragedy is that Type 2 diabetes is often preventable — if caught early. But current screening relies on patients showing symptoms, which means many are diagnosed too late for lifestyle interventions to be effective.

The question: Can we use readily available patient data (BMI, blood pressure, age, family history) to flag at-risk individuals before symptoms appear?

The Approach

EDA (feature analysis) → Engineer (clinical features) → Model (LR + RF + XGB) → Interpret (feature importance)

Exploratory Analysis: Identified key risk correlations. Glucose level, BMI, and age emerged as the strongest predictors, consistent with the clinical literature
Feature Engineering: Created interaction terms (BMI × age, glucose × insulin) and handled missing data patterns common in clinical datasets
Model Comparison: Trained logistic regression, random forest, and gradient boosting; evaluated on accuracy, sensitivity (catching true positives), and AUC-ROC
Interpretability: Generated feature importance rankings and partial dependence plots so clinicians can understand and trust the predictions
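
The feature engineering and model comparison steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`glucose`, `insulin`, `bmi`, `age`) are assumed, median imputation and 5-fold cross-validation are assumed choices, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only scikit-learn and pandas.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the project's dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "glucose": rng.normal(120, 30, n),
    "insulin": rng.normal(80, 40, n),
    "bmi": rng.normal(32, 7, n),
    "age": rng.integers(21, 80, n).astype(float),
})
# Label loosely driven by glucose, BMI, and age so the models have signal.
score = (0.03 * (df["glucose"] - 120)
         + 0.08 * (df["bmi"] - 32)
         + 0.03 * (df["age"] - 50))
y = (score + rng.normal(0, 1, n) > 0).astype(int)

# Simulate the kind of missing readings common in clinical data.
df.loc[rng.random(n) < 0.2, "insulin"] = np.nan


def add_interactions(frame):
    """Add the two interaction terms named in the write-up."""
    out = frame.copy()
    out["bmi_x_age"] = out["bmi"] * out["age"]
    # NaN in insulin propagates here; the pipeline imputes it afterwards.
    out["glucose_x_insulin"] = out["glucose"] * out["insulin"]
    return out


X = add_interactions(df)

# Imputation lives inside each pipeline so it is fit per training fold.
models = {
    "logistic_regression": make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    ),
    "random_forest": make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
    "gradient_boosting": make_pipeline(
        SimpleImputer(strategy="median"),
        GradientBoostingClassifier(random_state=0),
    ),
}

metrics = ["accuracy", "recall", "roc_auc"]  # recall == sensitivity
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
    results[name] = {m: scores[f"test_{m}"].mean() for m in metrics}

summary = pd.DataFrame(results).T
print(summary.round(3))
```

Reporting recall alongside accuracy and AUC-ROC keeps the sensitivity trade-off visible when choosing between models, since two models with similar accuracy can miss very different numbers of true positives.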

Key Results

Accuracy: 85%
Models compared: 3 (LR, RF, XGB)
Sensitivity: High

85% classification accuracy with random forest as best performer
Optimized for high sensitivity (minimizing false negatives) because missing a diabetic patient is far worse than a false alarm
Glucose level, BMI, and age confirmed as top 3 predictors — aligning with medical research
Feature importance analysis provides actionable insights for preventive care programs
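
One common way to favor sensitivity is to lower the decision threshold applied to a model's predicted probabilities. The sketch below illustrates that idea with plain Python; it is not necessarily how this project tuned its model. The helper names (`sensitivity`, `specificity`, `pick_threshold`), the toy labels and probabilities, and the 0.95 target are all hypothetical.

```python
def sensitivity(y_true, y_prob, threshold):
    """Fraction of actual positives flagged at this threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    return tp / (tp + fn)


def specificity(y_true, y_prob, threshold):
    """Fraction of actual negatives correctly left unflagged."""
    tn = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p < threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    return tn / (tn + fp)


def pick_threshold(y_true, y_prob, target_sensitivity):
    """Return the highest threshold whose sensitivity meets the target."""
    for threshold in sorted(set(y_prob), reverse=True):
        if sensitivity(y_true, y_prob, threshold) >= target_sensitivity:
            return threshold
    return min(y_prob)


# Toy labels and predicted probabilities (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.6, 0.35, 0.4, 0.3, 0.2, 0.1, 0.55, 0.7]

default = 0.5
tuned = pick_threshold(y_true, y_prob, target_sensitivity=0.95)

print(default, sensitivity(y_true, y_prob, default), specificity(y_true, y_prob, default))
# 0.5 0.8 0.8
print(tuned, sensitivity(y_true, y_prob, tuned), specificity(y_true, y_prob, tuned))
# 0.35 1.0 0.6
```

On this toy data, lowering the threshold from 0.5 to 0.35 catches every positive case but raises the false alarm rate, which is the trade-off described above. In practice the threshold should be chosen on a validation set, not on the data used for the final evaluation.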

Business Value

Healthcare Cost Reduction: Early intervention for pre-diabetic patients saves an estimated $3,000–$5,000 per patient per year in avoided complications
Screening Scale: An automated risk model can screen thousands of patients per hour, far more than manual clinical assessment allows
Explainability: Feature importance analysis shows clinicians which factors drive each risk score, which supports trust in and adoption of the tool

Tech Stack

Python, R, scikit-learn, Pandas, Matplotlib / Seaborn, XGBoost
