Audiobook Pipeline

PDF in, emotionally narrated audiobook out: a 4-stage NLP + TTS pipeline that shifts its tone and pacing with the emotion of the text, the way a human narrator does.
Flagship Project · PyTorch · Kokoro-82M · Ollama · Streamlit

The Problem

I love books but don't always have time to sit and read. Existing text-to-speech tools solve the "audio" part but miss something crucial: they sound robotic. A tense thriller passage gets the same flat monotone as a tender moment. The result is unlistenable for anything beyond a few minutes.

The question I asked: Can I build a pipeline that reads with emotion — adjusting tone, speed, and voice character based on what the text actually says?
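One way to turn "adjusting tone, speed, and voice character" into code is a lookup from a predicted emotion to narration settings. This is a hedged sketch, not the project's implementation. The page only says there are 8 classes, so Plutchik's eight basic emotions stand in as an assumed label set. The voice names, speed values, confidence threshold, and the `Prosody`/`prosody_for` names are all hypothetical placeholders, not tuned values.

```python
from dataclasses import dataclass

# ASSUMPTION: the real 8-class label set is not stated; Plutchik's eight
# basic emotions are used here purely for illustration.
EMOTIONS = ("joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation")


@dataclass(frozen=True)
class Prosody:
    voice: str    # hypothetical voice identifier handed to the TTS stage
    speed: float  # rate multiplier; 1.0 means the model's default pace


NEUTRAL = Prosody(voice="narrator_neutral", speed=1.0)

# Hypothetical tuning: tense emotions read faster, somber ones slower.
PROSODY_BY_EMOTION = {
    "joy": Prosody("narrator_bright", 1.05),
    "trust": Prosody("narrator_warm", 0.95),
    "fear": Prosody("narrator_tense", 1.10),
    "surprise": Prosody("narrator_bright", 1.10),
    "sadness": Prosody("narrator_soft", 0.85),
    "disgust": Prosody("narrator_tense", 0.95),
    "anger": Prosody("narrator_tense", 1.10),
    "anticipation": Prosody("narrator_warm", 1.00),
}


def prosody_for(emotion: str, confidence: float, threshold: float = 0.5) -> Prosody:
    """Choose voice and speed for a passage, falling back to neutral when
    the label is unknown or the classifier's confidence is below threshold."""
    if emotion not in PROSODY_BY_EMOTION or confidence < threshold:
        return NEUTRAL
    return PROSODY_BY_EMOTION[emotion]
```

Falling back to a neutral voice below the confidence threshold keeps the narration from lurching between styles when the classifier is unsure about a passage.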

The Approach

I designed a 4-stage pipeline where each stage solves one piece of the puzzle:

1. Extract: pdfplumber
2. Clean: Ollama LLM
3. Emotion: 8-class sentiment
4. Synthesize: Kokoro-82M + MPS
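The four stages compose naturally as a chain of functions. This is an assumption-laden illustration, not the project's code. `extract_pages` uses pdfplumber's `open` and `Page.extract_text` calls. `ollama_clean` posts to Ollama's `/api/generate` REST endpoint on its default local port. The model name (`llama3`), the cleaning prompt, `Segment`, `split_passages`, and `run_pipeline` are all hypothetical. The emotion classifier and the Kokoro-82M call are passed in as plain functions because the page doesn't say how they are wired.

```python
import json
import re
import urllib.request
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Segment:
    text: str
    emotion: str
    audio: bytes = b""


def extract_pages(pdf_path: str) -> list[str]:
    """Stage 1: pull raw text from every page with pdfplumber."""
    import pdfplumber  # lazy import so the rest of the sketch runs without it

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]


def ollama_clean(text: str, model: str = "llama3") -> str:
    """Stage 2: ask a local Ollama model to repair extraction noise.
    The model name and prompt are hypothetical placeholders."""
    prompt = ("Fix broken line breaks and hyphenation, drop page numbers and "
              "running headers, and return only the cleaned text.\n\n" + text)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()


def split_passages(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into passages of at most max_chars
    (a single sentence longer than that becomes its own passage)."""
    passages, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            passages.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        passages.append(current)
    return passages


def run_pipeline(pages: Iterable[str],
                 clean: Callable[[str], str],
                 classify: Callable[[str], str],
                 synthesize: Callable[[str, str], bytes],
                 max_chars: int = 400) -> list[Segment]:
    # Clean page by page so each LLM call stays small; skip blank pages.
    cleaned = " ".join(clean(page) for page in pages if page.strip())
    segments = []
    for passage in split_passages(cleaned, max_chars):
        emotion = classify(passage)                # Stage 3: emotion label
        audio = synthesize(passage, emotion)       # Stage 4: TTS audio bytes
        segments.append(Segment(passage, emotion, audio))
    return segments
```

Wiring it up would look like `run_pipeline(extract_pages("book.pdf"), ollama_clean, classify, synthesize)`, where `classify` and `synthesize` stand in for the emotion model and the Kokoro-82M call. Injecting each stage as a function keeps stages swappable and lets tests replace the LLM and TTS with cheap stubs.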

Key Results

4 pipeline stages
8 emotion classes
82M parameters in the Kokoro TTS model
MPS: GPU-accelerated inference on Apple silicon via PyTorch

Business Value

Accessibility: Turns PDF documents into narrated audio for visually impaired readers.

Content Production: Publishers and educators can generate narrated versions of documents at near-zero marginal cost, because every model runs locally.

Technical Signal: Demonstrates end-to-end ML system design, covering the data pipeline, model inference, GPU optimization, prompt engineering, and productionization via a CLI and a Streamlit web app.

Tech Stack

Python 3.10+, PyTorch (MPS), Kokoro-82M, Ollama, pdfplumber, Streamlit, Ruff, pytest, GitHub Actions CI