text-to-speech
Terminal eBook Reader with Text-to-Speech.
Related contents:
File Converter, OCR, Transcription & TTS WebUI.
File Wizard is a self-hosted, browser-based utility for file conversion, OCR, and audio transcription. It wraps many cli and python converters aswell as fast-whisper and tesseract ocr.
an open-source, multilingual text-to-speech synthesis system written in pure java.
MaryTTS is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.
Related contents:
State-of-the-art TTS model under 25MB 😻
Kitten TTS is an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.
Related contents:
We're excited to introduce Chatterbox, our first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Free & Open-Source AI Voice Generator.
A powerful, browser-based AI voice generator that lets you create natural-sounding voices without installing anything.
Use it directly in your browser or self-host it for your own applications with OpenAI API compatibility!
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.
Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.
Related contents:
Storyteller is a self-hosted platform for creating and reading ebooks with synced narration. It's made of three components: the API server, the web interface, and the mobile apps. Together, these components allow you to take audiobooks and ebooks that you already own and automatically synchronize them, as well as read or listen to (or both!) the resulting synced books.
Related contents:
Get news from foreign RSS feeds translated, summarized, and spoken to you daily.
Are you a fan of daily news briefings, but wish you had a wider selection of sources? Say no more! Newsbridge is a simple app that allows you to add your sources as RSS feeds, and then delivers the top stories rom all sources (regardless of the source language) on demand as a 5-6 minute audio briefing.
Mix sources from French, Arabic, Korean, Russian, etc., it doesn't matter. Everything is translated to your language before audio is played.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift.
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
🔊 Text-Prompted Generative Audio Model
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.
eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.
eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.
Related contents: