Synthetic Voices
Real Emotion

Advanced speech systems for real-time voice generation, enhancement, and audio intelligence.

Designing high-performance, scalable architectures for text-to-speech and neural audio restoration.

About

About SynthVox

SynthVox is an independent effort focused on building next-generation speech systems at the intersection of research and real-world application. It explores how modern architectures can improve the way machines generate, process, and enhance human speech, with a strong emphasis on real-time performance, high-quality output, and practical usability.

The work spans text-to-speech, voice cloning, audio restoration, and neural compression—bringing these areas together into unified, efficient pipelines. Rather than treating them as isolated problems, SynthVox approaches speech as a complete system, combining generation, enhancement, and optimization into a cohesive framework designed to scale beyond experimentation.

Areas of Work

What You Can Build with SynthVox

Built on cutting-edge speech architectures, designed for speed and realism.

Generate Real-Time AI Voice

Create natural, high-fidelity speech with ultra-fast inference. Built for real-time applications and production-scale voice generation.

Clone Voices Instantly

Zero-shot voice cloning with high accuracy and expressive control—delivering studio-quality output at unprecedented speed.

Enhance & Restore Audio

Recover lost detail, extend bandwidth, and transform low-quality recordings into clean, high-resolution audio.

Optimize Speech for Scale

Compress, process, and deliver audio efficiently using advanced neural codecs and optimized speech pipelines.

Build End-to-End Voice Systems

From generation to restoration and deployment, design complete speech systems tailored for real-world applications.

Our Models

Developed Models

SynthVox is built on a suite of advanced speech models designed for real-time performance, high-fidelity synthesis, and efficient audio processing.

From ultra-fast voice generation to restoration, compression, and expressive synthesis, our systems work together to power scalable, production-ready voice AI.

Generate (LuxTTS)

Ultra-fast voice cloning system delivering high-fidelity speech generation at real-time speeds with natural expressiveness.

Express (MiraTTS)

Emotion-aware speech synthesis model designed to produce richly expressive and contextually natural voice output.

Enhance (LavaSR)

High-performance speech restoration and bandwidth extension system for recovering and enhancing audio quality.

Compress/Scale (LinaCodec)

Ultra-efficient neural audio codec enabling extreme compression while preserving high-quality speech reconstruction.

GitHub

Open Source Work

A collection of research and engineering work in speech AI. Explore models, experiments, and systems built across different areas of audio and voice.

LuxTTS

LuxTTS is a lightweight, ZipVoice-based text-to-speech model designed for high-quality voice cloning and realistic generation at speeds exceeding 150x realtime.

NovaSR

NovaSR is a tiny 50 kB audio upsampling model that upscales muffled 16 kHz audio into clear, crisp 48 kHz audio at speeds over 3500x realtime.

MiraTTS

MiraTTS is a fine-tune of the excellent Spark-TTS model, aimed at enhanced realism and stability and performing on par with closed-source models.

LavaSR

LavaSR is a high-quality speech enhancement model that turns noisy, low-quality audio into clean, crisp audio, at speeds reaching roughly 5000x realtime on GPU and over 60x realtime on CPU.
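
The "Nx realtime" figures quoted for these models are easiest to read as a ratio of audio duration to processing time. The sketch below assumes that interpretation (this page does not define the term) and uses a hypothetical helper name. It only converts the quoted LavaSR speed factors into rough processing times and is not a benchmark.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate compute time, assuming realtime_factor = audio duration / processing time."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 3600.0  # seconds of audio to enhance

# Approximate speed factors quoted for LavaSR on this page.
gpu_seconds = processing_seconds(one_hour, 5000)
cpu_seconds = processing_seconds(one_hour, 60)

print(f"GPU: about {gpu_seconds:.2f} s, CPU: about {cpu_seconds:.0f} s")
```

Under this reading, an hour of audio would take under a second on GPU and about a minute on CPU. Actual timings depend on hardware and batch size.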

LinaCodec

LinaCodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48 kHz audio.
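
The two figures quoted for the codec imply a bit budget per token. The sketch below is plain arithmetic on the numbers above. The 16-bit mono PCM baseline used for the compression ratio is an assumption for illustration, not something stated here, and the function names are hypothetical.

```python
def bits_per_token(bitrate_bps: float, tokens_per_second: float) -> float:
    """Average number of bits carried by each token."""
    return bitrate_bps / tokens_per_second


def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int = 16, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio (assumed baseline: 16-bit mono)."""
    return sample_rate_hz * bit_depth * channels


TOKENS_PER_SECOND = 12.5  # quoted token rate
CODEC_BPS = 171.0         # quoted bitrate

per_token = bits_per_token(CODEC_BPS, TOKENS_PER_SECOND)
baseline_bps = pcm_bitrate_bps(48_000)
ratio = baseline_bps / CODEC_BPS

print(f"{per_token:.2f} bits per token")
print(f"about {ratio:.0f}x smaller than {baseline_bps} bps PCM")
```

That works out to roughly 13.7 bits per token, and a stream about 4,500 times smaller than the assumed uncompressed baseline.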

FlashSR

FlashSR is a tiny audio super-resolution model based on HierSpeech++ that upscales 16 kHz audio into much clearer 48 kHz audio at speeds of 200x to 400x realtime.

Recognitions

What People Are Saying

Feedback and recognition from researchers and developers working in AI and speech. Real reactions to models, experiments, and results.

Bülent Üstbaş
AI Researcher
I just saw it; according to the announcements, LuxTTS does voice cloning with 1GB VRAM, runs at 150x real-time speed, and clones from a 3-second audio sample. People are still paying monthly subscriptions to ElevenLabs and shipping their audio files to someone else's server. When the 4GB GPU in your pocket can handle this, why are you paying rent to someone? It competes with models 10 times its size, with 48kHz output quality. Running faster than real-time even on CPU is a whole other thing. This is a structure that proves once again that the size race is over.
Charly Wargnier
Openclaw
Wild. There is an incredible new open-source TTS model in town. It achieves SOTA voice cloning on par with models 10x larger. 150x real-time generation. 1GB VRAM. 48kHz quality. Meet LuxTTS: The new open-source standard for local voice cloning. If you are still paying expensive API fees for text-to-speech, you need to see this repo. Why it’s a game changer: → The Input: Requires just 3 seconds of reference audio. → The Output: Pristine 48kHz speech (double the industry standard). → The Speed: 150x real-time on GPU, and >1x real-time on a basic CPU. → The Quality: SOTA cloning that rivals massive, bloated AI models.
Hasan Toor
AI & Tech Educator
You can now clone any voice on a 4GB GPU. LuxTTS just killed the "you need ElevenLabs" excuse. It clones voices from 3 seconds of audio at 150x realtime speed. Fits in 1GB VRAM. Faster than realtime even on CPU. → 48khz output vs industry standard 24khz → Clone any voice locally with no subscription → Works on GPU and CPU 100% Opensource.
Wildminder
Physicist, Programmer, Designer
Quick, crisp, and efficient. Perfect for batch processing or live enhancement. Love seeing this kind of stuff.
Hugging Models
Best of @huggingface models.
LavaSR is a compact, high-speed model (50MB) that extends bandwidth & restores audio super fast — up to 4000× on GPU.
Mario Nawfal
Founder @ibcgroupio , http://attentioncompany.com
LuxTTS clones any voice from 3 seconds of audio on a 4GB GPU. - 150x realtime speed - 48khz output vs industry standard 24khz - Fits in 1GB VRAM - Works on CPU too No ElevenLabs subscription. No cloud. Just open source. The voice cloning barrier just hit zero.
Palanisamy Ramasamy
Founder & CEO, LuMay AI
This is impressive. The combination of speed, low compute requirements, and high-quality output is what makes this stand out. Running near real-time even on CPU could unlock a lot of practical use cases.
Jafar Najafov
Co-Founder & CEO at Nextool & Reel Agency
You can now clone any voice on a 4GB GPU. LuxTTS just killed the "you need ElevenLabs" excuse. It clones voices from 3 seconds of audio at 150x realtime speed. Fits in 1GB VRAM. Faster than realtime even on CPU. → 48khz output vs industry standard 24khz → Clone any voice locally with no subscription → Works on GPU and CPU 100% Opensource.
Mentor Dolores Florentino
Career and Leadership Mentor
Voice cloning used to require massive compute and long training pipelines. Now we are talking about cloning from a three second clip. That is an incredible leap.