Diabetes Risk Predictor
The Problem
Diabetes affects 37 million Americans and costs the healthcare system $327 billion annually. Type 2 diabetes is often preventable when caught early, but screening in practice often depends on patients presenting with symptoms, so many are diagnosed too late for lifestyle interventions to be fully effective.
The question: Can we use readily available patient data (BMI, blood pressure, age, family history) to flag at-risk individuals before symptoms appear?
The Approach
- Exploratory Analysis: Identified key risk correlations; glucose level, BMI, and age emerged as the strongest predictors, consistent with the clinical literature
- Feature Engineering: Created interaction terms (BMI × age, glucose × insulin) and handled missing data patterns common in clinical datasets
- Model Comparison: Trained logistic regression, random forest, and gradient boosting models; evaluated each on accuracy, sensitivity (the share of actual diabetic cases the model catches), and AUC-ROC
- Interpretability: Generated feature importance rankings and partial dependence plots so clinicians can understand and trust the predictions
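The feature engineering step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the field names (`bmi`, `age`, `glucose`, `insulin`), the convention that `None` or a zero reading marks a missing measurement, and the use of median imputation are all hypothetical. The write-up only states that interaction terms were created and missing data was handled.

```python
from statistics import median

# Hypothetical field names; the real dataset schema is not given here.
NUMERIC_FIELDS = ["bmi", "age", "glucose", "insulin"]
# Assumption: a zero in these clinical fields is a placeholder for "not measured".
ZERO_MEANS_MISSING = {"bmi", "glucose", "insulin"}


def is_missing(field, value):
    """Treat None, and zero in selected fields, as a missing measurement."""
    return value is None or (field in ZERO_MEANS_MISSING and value == 0)


def impute_medians(rows):
    """Return copies of rows with missing values replaced by the field median.

    Assumes every field has at least one observed (non-missing) value.
    """
    medians = {}
    for field in NUMERIC_FIELDS:
        observed = [r[field] for r in rows if not is_missing(field, r[field])]
        medians[field] = median(observed)
    return [
        {f: (medians[f] if is_missing(f, r[f]) else r[f]) for f in NUMERIC_FIELDS}
        for r in rows
    ]


def add_interactions(row):
    """Add the two interaction terms: BMI x age and glucose x insulin."""
    out = dict(row)
    out["bmi_x_age"] = row["bmi"] * row["age"]
    out["glucose_x_insulin"] = row["glucose"] * row["insulin"]
    return out


def engineer_features(rows):
    """Impute first, then derive interactions, so products never use placeholders."""
    return [add_interactions(r) for r in impute_medians(rows)]
```

Imputing before building the interaction terms matters here: multiplying by a placeholder zero would silently turn a missing insulin reading into a zero-valued glucose × insulin feature.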
Key Results
- Random forest performed best, reaching 85% classification accuracy
- Optimized for high sensitivity (minimizing false negatives) because a missed diabetic patient is far more costly than a false alarm
- Glucose level, BMI, and age were the top 3 predictors, in line with medical research
- Feature importance analysis provides actionable insights for preventive care programs
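To make the sensitivity trade-off concrete, here is a hedged sketch of the three evaluation metrics and of one way a decision threshold might be chosen to hit a sensitivity target. The function names and the idea of tuning via the threshold are illustrative assumptions; the write-up does not say how the project optimized for sensitivity.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are flagged as at-risk."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        flagged = score >= threshold
        if label == 1 and flagged:
            tp += 1
        elif label == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_score, threshold):
    """Share of all patients classified correctly at the given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_score, threshold):
    """True positive rate: share of actual positives that get flagged."""
    tp, _, _, fn = confusion_counts(y_true, y_score, threshold)
    return tp / (tp + fn)


def auc_roc(y_true, y_score):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_for_sensitivity(y_true, y_score, target):
    """Highest observed score usable as a threshold while meeting the target."""
    for t in sorted(set(y_score), reverse=True):
        if sensitivity(y_true, y_score, t) >= target:
            return t
    raise ValueError("target sensitivity must be at most 1.0")
```

Lowering the threshold flags more patients, which raises sensitivity but also increases false alarms, so accuracy and sensitivity should be read together rather than in isolation.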
Business Value
- Healthcare Cost Reduction: Early intervention for pre-diabetic patients saves an estimated $3,000–$5,000 per patient per year in avoided complications
- Screening Scale: An automated risk model can screen thousands of patients per hour, far more than manual clinical assessment allows
- Explainability: Feature importance analysis shows clinicians which factors drive each risk score, giving them a basis to trust and adopt the tool