Voice Cloner
Production voice platform — TTS, conversations, and audiobook production on a single RTX 3080
●The Challenge
Professional voice talent costs $500+/hour, and 95% of books lack audio versions because producing a single audiobook costs $2K-$5K in human narration fees. Existing AI TTS solutions (ElevenLabs, Play.ht) handle single-voice generation but offer nothing for multi-character production workflows — no conversation builder, no manuscript parsing, no chapter management, no pronunciation dictionaries. Content creators need a complete production pipeline: from raw manuscript to distribution-ready M4B with chapter markers and consistent multi-voice narration. The core engineering challenge: running a 1.7B parameter TTS model in bfloat16 on a single RTX 3080 (10GB VRAM) with consistent sub-15-second latency while supporting three distinct production modes (single TTS, multi-speaker conversations, full audiobook chapters) through a unified inference pipeline.
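The 10GB constraint above can be sanity-checked with back-of-envelope arithmetic. Only the weight figure follows directly from the stated parameter count and precision; the overhead breakdown is an illustrative assumption, not a measured profile:

```python
# Rough VRAM budget for a 1.7B-parameter model in bfloat16 on a 10 GB card.
PARAMS = 1.7e9
BYTES_PER_PARAM = 2  # bfloat16 = 16 bits

weights_gib = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"weights alone: {weights_gib:.2f} GiB")  # ~3.17 GiB

# That leaves roughly 10 - 3.2 ≈ 6.8 GiB for activations, attention state,
# the CUDA context, and decoded audio buffers (illustrative split) -- enough
# headroom to serve requests on a single RTX 3080, but tight enough that
# fragmentation over long-running workers becomes a real concern.
```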
●The Approach
Chose Qwen3-TTS 1.7B after benchmarking against XTTS, Bark, and Tortoise — best quality-to-VRAM ratio for zero-shot cloning from 10-30 second reference samples. Built the inference pipeline on FastAPI with a 4-tier Redis priority queue (admin > enterprise > pro > free). The critical architectural decision was making each audiobook chapter a Conversation record internally, reusing the entire existing TTS pipeline, per-line effects engine, takes system, and timeline editor with zero code duplication. The Audiobook Studio layer adds manuscript parsing (DOCX via python-docx, PDF via PyMuPDF, TXT via regex), chapter management, character-to-voice casting that propagates across all chapters, and a pronunciation dictionary that applies regex substitutions before TTS generation. Implemented proactive worker recycling every 500 generations to combat PyTorch VRAM fragmentation.
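The pronunciation dictionary mentioned above amounts to a regex substitution pass over the text before it reaches the TTS model. A minimal sketch — the function name, entry format, and longest-match-first ordering here are illustrative assumptions, not the project's actual API:

```python
import re

def apply_pronunciations(text: str, entries: dict[str, str]) -> str:
    """Replace each dictionary term with its phonetic spelling before TTS.

    `entries` maps a written form to how it should be spoken, e.g.
    {"Hermione": "her-MY-oh-nee"}. Longer terms are substituted first so a
    multi-word entry wins over one of its sub-words.
    """
    for term in sorted(entries, key=len, reverse=True):
        # \b word boundaries keep "cat" from matching inside "catalog";
        # IGNORECASE catches sentence-initial capitalisation.
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, entries[term], text, flags=re.IGNORECASE)
    return text

print(apply_pronunciations(
    "Hermione ran the NYC Marathon.",
    {"Hermione": "her-MY-oh-nee", "NYC": "New York City"},
))
# → her-MY-oh-nee ran the New York City Marathon.
```

Applying the substitutions book-wide before generation (rather than per chapter) is what keeps character names consistent across every chapter's audio.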
●The Solution
Voice Cloner runs three production modes through a unified FastAPI backend: (1) Single-voice TTS for quick generation, (2) Multi-speaker Conversations with drag-and-drop line ordering, per-line effects (speed/volume/gap), stage directions, multiple takes per line, ambient audio layers, and a waveform timeline editor, (3) Audiobook Studio that parses manuscripts into chapters, detects dialogue and character names, assigns AI voices to each character, applies book-wide pronunciation dictionaries, and exports as M4B with chapter markers or MP3/WAV zip with LUFS mastering. The frontend is Next.js 15 on Cloudflare Workers with wavesurfer.js visualization. 41+ curated voices plus custom uploads with SNR quality gating. Stripe handles tiered billing, Clerk manages auth, Sentry + Amplitude provide observability. Running at 99.95% uptime with 0.03% error rate on a single server.
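M4B chapter markers are commonly muxed in via ffmpeg's FFMETADATA format; this sketch of the metadata-generation step is an assumption about how the export could work, not the project's actual code:

```python
def ffmetadata_chapters(chapters: list[tuple[str, float]]) -> str:
    """Build an ffmpeg FFMETADATA1 file from (title, duration_seconds) pairs.

    ffmpeg can then mux the chapters into an .m4b with:
      ffmpeg -i book.m4a -i chapters.txt -map_metadata 1 -codec copy book.m4b
    """
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration in chapters:
        end_ms = start_ms + int(duration * 1000)
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",   # timestamps below are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={title}",
        ]
        start_ms = end_ms  # chapters are contiguous
    return "\n".join(lines) + "\n"

meta = ffmetadata_chapters([("Chapter 1", 1325.5), ("Chapter 2", 1510.0)])
print(meta)
```

Concatenating per-chapter durations this way means the chapter boundaries stay correct no matter how many takes or regenerated lines each chapter went through.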