Data Scientist & ML Engineer

Karthik Mettu

I build production ML systems that deliver measurable business outcomes — from real-time fraud scoring at PayPal to demand forecasting across 60K+ SKUs at Accenture.

LinkedIn GitHub Email

By The Numbers

Fraud Recall Lift: 12–18%

Forecast Accuracy: +23% over legacy system

SKUs Modeled: 60K+

Records / Day: 100M+

Professional Experience

Data Scientist — Fraud Detection & Generative AI

2024

PayPal

Improved fraud recall by 12–18% through feature engineering (transaction velocity, device behavior, merchant risk) with zero increase in false positives
Designed a Generative AI fraud explanation system using AWS Bedrock with structured prompt-engineering pipelines grounded in verified model outputs
Reduced analyst investigation time by 25–35% and clarification requests by 22% through human-readable LLM explanations
Built explainability data layer on Snowflake (materialized views, partitioned tables) meeting millisecond-level SLAs
Implemented hallucination-control metrics via MLflow; reduced misleading explanations by 30% through A/B testing
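
A transaction-velocity feature of the kind mentioned above can be sketched in a few lines. This is an illustrative sketch only, not the production implementation: the function name `velocity_counts`, the one-hour window, and the `(account_id, timestamp)` record shape are all assumptions made for the example.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def velocity_counts(transactions, window=timedelta(hours=1)):
    """For each transaction, count the same account's strictly earlier
    transactions that fall within `window` before it.

    `transactions` is a list of (account_id, timestamp) tuples
    (a hypothetical record shape). The result is aligned with input order.
    """
    # Collect and sort the timestamps of each account once.
    times_by_account = defaultdict(list)
    for account_id, ts in transactions:
        times_by_account[account_id].append(ts)
    for times in times_by_account.values():
        times.sort()

    counts = []
    for account_id, ts in transactions:
        times = times_by_account[account_id]
        start = bisect_left(times, ts - window)  # first timestamp inside the window
        end = bisect_left(times, ts)             # first timestamp not earlier than ts
        counts.append(end - start)
    return counts


t0 = datetime(2024, 1, 1, 12, 0)
example = [
    ("a", t0),
    ("a", t0 + timedelta(minutes=10)),
    ("b", t0 + timedelta(minutes=15)),
    ("a", t0 + timedelta(minutes=20)),
    ("a", t0 + timedelta(hours=3)),
]
print(velocity_counts(example))  # [0, 1, 0, 2, 0]
```

A sudden jump in this count for one account is the kind of signal a fraud model can pick up; a real-time system would typically maintain the counts incrementally rather than recompute them from the full history.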

Python | AWS Bedrock | Snowflake | MLflow | Lambda | Gen AI

Data Scientist — Demand Forecasting & Supply Chain

2021 – 2023

Accenture

Improved forecast accuracy by 23% over legacy system using hybrid ARIMA/SARIMA/Prophet + CatBoost models across 60K+ SKUs and 1,200+ stores
Reduced peak-season forecast errors by 27% with probabilistic boosting for high-volatility SKUs
Designed hierarchical forecasting (SKU → store → region), improving regional accuracy by 19%
Built automated ETL pipelines (Python, SQL, Airflow) processing 100M+ records/day
Implemented anomaly detection (Isolation Forest, z-score) reducing forecast failures by 34%
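
The z-score check named in the last bullet can be illustrated as follows. This is a minimal sketch under stated assumptions, not the pipeline's actual code: the function name `zscore_anomalies` and the default threshold of 3.0 are hypothetical, and the Isolation Forest half of the approach is not shown.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds `threshold`.

    Uses the population standard deviation. A constant series has no
    spread, so nothing is flagged.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


daily_units = [100, 102, 98, 101, 99, 100, 500]
# With only seven points a single spike cannot reach |z| > 3,
# so this toy example uses a lower threshold.
print(zscore_anomalies(daily_units, threshold=2.0))  # [6]
```

Because one large outlier inflates both the mean and the standard deviation, a plain z-score is best suited to long series; pairing it with a model-based detector is one way to cover that gap.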

Python | SQL | Airflow | CatBoost | Docker | Tableau

Featured Projects

Each project tells a story. Every card links to a full case study covering the problem, approach, results, and business value.

Audiobook Pipeline (Flagship)

"PDF in, emotionally narrated audiobook out."

4-stage NLP + TTS pipeline

Emotion-aware text-to-speech with sentence-level sentiment, Kokoro-82M voice blending, and Karpathy-style autoresearch optimization. Shipped with a Streamlit demo.

PyTorch | Kokoro-82M | Ollama | NLP | Streamlit

Shipped Read Case Study →

Movie Recommender System

"Binge Netflix? Build your own algorithm."

Content + Collaborative filtering

Dual-approach recommendation engine combining content-based similarity with collaborative filtering, deployed as an interactive Streamlit web app.
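
One common way to combine the two approaches is a weighted blend of two similarity scores. The sketch below is illustrative and may differ from the project's actual code: `hybrid_scores`, the `alpha` weight, and the dictionary layouts are assumptions, and the collaborative part is shown here as item-to-item cosine similarity over rating vectors.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_scores(target, item_features, item_ratings, alpha=0.5):
    """Rank every other item by a blend of content and collaborative similarity.

    item_features: item -> feature vector (e.g. one-hot genres), hypothetical layout
    item_ratings:  item -> ratings vector, one slot per user, 0 meaning unrated
    alpha:         weight on the content score; (1 - alpha) goes to collaborative
    """
    scores = {}
    for item in item_features:
        if item == target:
            continue
        content = cosine(item_features[target], item_features[item])
        collab = cosine(item_ratings[target], item_ratings[item])
        scores[item] = alpha * content + (1 - alpha) * collab
    # Highest blended score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


features = {"A": [1, 0], "B": [1, 0], "C": [0, 1]}
ratings = {"A": [5, 4, 0], "B": [5, 4, 0], "C": [0, 0, 5]}
for item, score in hybrid_scores("A", features, ratings):
    print(item, round(score, 3))
```

Setting `alpha` closer to 1 leans on content similarity, which helps for new titles with few ratings; setting it closer to 0 leans on what similar audiences rated highly.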

Python | scikit-learn | Streamlit

Live Demo Read Case Study →

Sentiment Analysis Engine

"What do millions of people actually feel?"

95% classification accuracy

Transformer-based NLP classifier achieving 95% accuracy on large-scale social media data. Fine-tuned for nuanced sentiment detection beyond positive/negative/neutral.

Transformers | TensorFlow | NLP | Python

Shipped Read Case Study →

Diabetes Risk Predictor

"1 in 10 Americans. What if we predicted it earlier?"

85% early detection accuracy

ML-powered risk assessment for diabetes using logistic regression, random forest, and clinical feature engineering for early intervention.

Python | R | scikit-learn | Healthcare

Shipped Read Case Study →

Multivariate Statistical Analysis

"50 dimensions. Where's the signal?"

PCA + Factor + Cluster analysis

Applied PCA, factor analysis, and clustering to high-dimensional datasets, reducing noise and uncovering hidden structure for downstream ML pipelines.

R | PCA | Factor Analysis | Statistics

Shipped Read Case Study →

Data Storytelling Dashboards

"A great model is useless if nobody understands it."

Interactive + Multi-platform

Interactive dashboards and data storytelling across Tableau, Power BI, and Python, translating complex analytics into stakeholder-friendly narratives.

Tableau | Power BI | Python | D3.js

Shipped Read Case Study →

Suicide Rate Forecasting

"Public health needs data, not guesswork."

R² = 0.85

Time-series forecasting with ARIMA and Prophet for public health resource allocation, delivering reliable multi-year projections for intervention planning.
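
For readers unfamiliar with the headline metric, R² compares the model's squared error with that of always predicting the mean. The helper below is a generic sketch of that standard formula (shown in Python for consistency with the rest of this page); the function name `r_squared` is illustrative, and the evaluation setup used in the project is not shown here.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # model error
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # mean-baseline error
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
print(r_squared(actual, actual))        # 1.0 (perfect predictions)
print(r_squared(actual, [2.5] * 4))     # 0.0 (no better than the mean)
```

An R² of 0.85 therefore means the forecasts leave 15% of the squared error that a constant mean prediction would leave.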

Prophet | ARIMA | Time Series | R

Shipped Read Case Study →

Technical Skills

Languages

Python | SQL | R

Machine Learning & AI

PyTorch | TensorFlow | scikit-learn | CatBoost | XGBoost | Hugging Face | LLMs / Gen AI | NLP | Prompt Engineering

Data Engineering

Snowflake | BigQuery | Apache Airflow | ETL Pipelines | FastAPI | Docker

Cloud & MLOps

AWS | Azure | MLflow | A/B Testing | Git

Visualization

Tableau | Power BI | Streamlit | Matplotlib

Statistics

ARIMA / Prophet | Hypothesis Testing | Bayesian Methods | PCA | Regression

Education

MS in Statistics & Data Science

University of Houston | GPA: 3.74 / 4.0 | Aug 2023 – Dec 2024

30 credit-hour program covering the full data science stack — from statistical theory through deep learning to production analytics.

MATH 6373: Deep Learning & Neural Networks
MATH 6386: Big Data Analytics
MATH 6350: Statistical Learning & Data Mining
MATH 6358: Probability Models & Computing
MATH 6381: Information Visualization
MATH 6357: Linear Models & Experiments
MATH 6359: Applied Statistics & Multivariate
MATH 6380: Programming for Data Analytics
Case Studies in Data Science (Elective)
MATH 6315: Internship / Masters Tutorial

Get In Touch

Open to Data Scientist, ML Engineer, and Applied Scientist roles across the US. Let's talk.

karthikrm202002@gmail.com LinkedIn GitHub