Data Scientist & ML Engineer

Karthik Mettu

I build production ML systems that deliver measurable business outcomes — from real-time fraud scoring at PayPal to demand forecasting across 60K+ SKUs at Accenture.

By The Numbers

12–18%
Fraud Recall Lift
23%
Forecast Accuracy Improvement
60K+
SKUs Modeled
100M+
Records / Day

Professional Experience

Data Scientist — Fraud Detection & Generative AI

2024
PayPal
  • Improved fraud recall by 12–18% through feature engineering (transaction velocity, device behavior, merchant risk) with zero increase in false positives
  • Designed a Generative AI fraud explanation system using AWS Bedrock with structured prompt-engineering pipelines grounded in verified model outputs
  • Reduced analyst investigation time by 25–35% and clarification requests by 22% through human-readable LLM explanations
  • Built explainability data layer on Snowflake (materialized views, partitioned tables) meeting millisecond-level SLAs
  • Implemented hallucination-control metrics via MLflow; reduced misleading explanations by 30% through A/B testing
Python | AWS Bedrock | Snowflake | MLflow | Lambda | Gen AI
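As an illustration of the transaction-velocity feature mentioned above, here is a minimal sketch. It is not the production PayPal implementation: the function name `transaction_velocity`, the one-hour window, and the sample data are all assumptions made for the example.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def transaction_velocity(transactions, window=timedelta(hours=1)):
    """For each (account_id, timestamp) pair, count how many earlier
    transactions the same account made within the trailing window.
    Hypothetical feature definition, for illustration only."""
    # Group timestamps by account and sort them for binary search.
    by_account = defaultdict(list)
    for account_id, ts in transactions:
        by_account[account_id].append(ts)
    for stamps in by_account.values():
        stamps.sort()

    velocities = []
    for account_id, ts in transactions:
        stamps = by_account[account_id]
        start = bisect_left(stamps, ts - window)  # first txn inside the window
        end = bisect_left(stamps, ts)             # txns strictly before ts
        velocities.append(end - start)
    return velocities


# Hypothetical sample: three quick transactions, then one hours later.
t0 = datetime(2024, 1, 1, 12, 0)
sample = [
    ("acct_a", t0),
    ("acct_a", t0 + timedelta(minutes=10)),
    ("acct_a", t0 + timedelta(minutes=20)),
    ("acct_b", t0 + timedelta(minutes=5)),
    ("acct_a", t0 + timedelta(hours=3)),
]
print(transaction_velocity(sample))  # [0, 1, 2, 0, 0]
```

A rising count inside a short window is the kind of signal a fraud model can consume alongside device and merchant features. A real-time system would typically maintain these counts incrementally rather than recompute them per batch.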

Data Scientist — Demand Forecasting & Supply Chain

2021 – 2023
Accenture
  • Improved forecast accuracy by 23% over legacy system using hybrid ARIMA/SARIMA/Prophet + CatBoost models across 60K+ SKUs and 1,200+ stores
  • Reduced peak-season forecast errors by 27% with probabilistic boosting for high-volatility SKUs
  • Designed hierarchical forecasting (SKU → store → region), improving regional accuracy by 19%
  • Built automated ETL pipelines (Python, SQL, Airflow) processing 100M+ records/day
  • Implemented anomaly detection (Isolation Forest, z-score) reducing forecast failures by 34%
Python | SQL | Airflow | CatBoost | Docker | Tableau
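The z-score half of the anomaly detection bullet can be sketched as follows. This is a simplified illustration using only the standard library, not the Accenture pipeline itself. The function name `zscore_anomalies`, the threshold of 3.0, and the sample series are assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of points whose absolute z-score exceeds
    the threshold. Hypothetical helper, for illustration only."""
    if len(values) < 2:
        return []  # a sample standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily unit sales for one SKU, with a single spike.
daily_units = [100, 102, 98, 101, 99, 103, 97, 100, 500, 101, 99, 102]
print(zscore_anomalies(daily_units))  # [8]
```

Because a large spike inflates the standard deviation it is measured against, a plain z-score can miss outliers in short series. That limitation is one reason to pair it with a model-based detector such as Isolation Forest, as the bullet above describes.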

Featured Projects

Each project tells a story — click any card to read the full case study with problem, approach, results, and business value.

Audiobook Pipeline (Flagship)

"PDF in, emotionally narrated audiobook out."
4-stage NLP + TTS pipeline

Emotion-aware text-to-speech with sentence-level sentiment, Kokoro-82M voice blending, and Karpathy-style autoresearch optimization. Shipped with Streamlit demo.

PyTorch | Kokoro-82M | Ollama | NLP | Streamlit

Movie Recommender System

"Binge Netflix? Build your own algorithm."
Content + Collaborative filtering

Dual-approach recommendation engine combining content-based similarity with collaborative filtering, deployed as an interactive Streamlit web app.

Python | scikit-learn | Streamlit
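One common way to combine the two approaches is a weighted blend of a content-based similarity and a rating-based similarity. The sketch below assumes that design; the project description does not say how the two signals are actually merged. The toy data, the `alpha` weight, and the function names are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical content features: [action, comedy, sci-fi] flags per movie.
genres = {
    "Alien": [1, 0, 1],
    "Aliens": [1, 0, 1],
    "Airplane!": [0, 1, 0],
}

# Hypothetical ratings: one column per user, 0 means "not rated".
ratings = {
    "Alien": [5, 4, 0, 1],
    "Aliens": [5, 5, 0, 1],
    "Airplane!": [1, 0, 5, 4],
}


def hybrid_scores(seed, alpha=0.5):
    """Blend content similarity and item-item rating similarity.
    alpha=1.0 is purely content-based, alpha=0.0 purely collaborative."""
    scores = {}
    for title in genres:
        if title == seed:
            continue
        content = cosine(genres[seed], genres[title])
        collab = cosine(ratings[seed], ratings[title])
        scores[title] = alpha * content + (1 - alpha) * collab
    return scores


def recommend(seed, k=1, alpha=0.5):
    """Return the k titles with the highest blended score."""
    scores = hybrid_scores(seed, alpha)
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("Alien"))  # ['Aliens']
```

The blend weight lets the content signal carry new titles that have few ratings, while the collaborative signal takes over as rating data accumulates.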

Sentiment Analysis Engine

"What do millions of people actually feel?"
95% classification accuracy

Transformer-based NLP classifier achieving 95% accuracy on large-scale social media data. Fine-tuned for nuanced sentiment detection beyond positive/negative/neutral.

Transformers | TensorFlow | NLP | Python

Diabetes Risk Predictor

"1 in 10 Americans. What if we predicted it earlier?"
85% early detection accuracy

ML-powered risk assessment for diabetes using logistic regression, random forest, and clinical feature engineering for early intervention.

Python | R | scikit-learn | Healthcare

Multivariate Statistical Analysis

"50 dimensions. Where's the signal?"
PCA + Factor + Cluster analysis

Applied PCA, factor analysis, and clustering to high-dimensional datasets, reducing noise and uncovering hidden structure for downstream ML pipelines.

R | PCA | Factor Analysis | Statistics

Data Storytelling Dashboards

"A great model is useless if nobody understands it."
Interactive + Multi-platform

Interactive dashboards and data storytelling across Tableau, Power BI, and Python, translating complex analytics into stakeholder-friendly narratives.

Tableau | Power BI | Python | D3.js

Suicide Rate Forecasting

"Public health needs data, not guesswork."
R² = 0.85

Time-series forecasting with ARIMA and Prophet for public health resource allocation, delivering reliable multi-year projections for intervention planning.

Prophet | ARIMA | Time Series | R

Technical Skills

Languages

Python | SQL | R

Machine Learning & AI

PyTorch | TensorFlow | scikit-learn | CatBoost | XGBoost | Hugging Face | LLMs / Gen AI | NLP | Prompt Engineering

Data Engineering

Snowflake | BigQuery | Apache Airflow | ETL Pipelines | FastAPI | Docker

Cloud & MLOps

AWS | Azure | MLflow | A/B Testing | Git

Visualization

Tableau | Power BI | Streamlit | Matplotlib

Statistics

ARIMA / Prophet | Hypothesis Testing | Bayesian Methods | PCA | Regression

Education

MS in Statistics & Data Science

University of Houston  |  GPA: 3.74 / 4.0  |  Aug 2023 – Dec 2024

30 credit-hour program covering the full data science stack — from statistical theory through deep learning to production analytics.

  • MATH 6373 — Deep Learning & Neural Networks
  • MATH 6386 — Big Data Analytics
  • MATH 6350 — Statistical Learning & Data Mining
  • MATH 6358 — Probability Models & Computing
  • MATH 6381 — Information Visualization
  • MATH 6357 — Linear Models & Experiments
  • MATH 6359 — Applied Statistics & Multivariate
  • MATH 6380 — Programming for Data Analytics
  • Case Studies in Data Science (Elective)
  • MATH 6315 — Internship / Masters Tutorial

Get In Touch

Open to Data Scientist, ML Engineer, and Applied Scientist roles across the US. Let's talk.