Free & Open-Source AI Voice Generator.
A powerful, browser-based AI voice generator that lets you create natural-sounding voices without installing anything.
Use it directly in your browser or self-host it for your own applications with OpenAI API compatibility!
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with, or even surpassing, top TTS providers.
Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.
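As a rough illustration of the self-hosted, OpenAI-compatible usage mentioned above, the sketch below builds a request in the shape of OpenAI's speech endpoint (`POST /v1/audio/speech` with `model`, `input`, and `voice` fields). The base URL, the model name `zonos-v0.1`, the voice name `default`, the placeholder API key, and the assumption that the server exposes this exact path and returns raw audio bytes are all hypothetical; check the project's own documentation for the real values.

```python
import json
import urllib.request


def build_speech_request(base_url, text, model="zonos-v0.1", voice="default",
                         api_key="not-needed"):
    """Build (but do not send) a POST request shaped like OpenAI's speech API.

    The default model, voice, and API key are placeholders (assumptions),
    not values confirmed by the project.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(base_url, text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path.

    Assumes the server answers with raw audio; the .mp3 extension is a guess.
    """
    request = build_speech_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("http://localhost:8000", "Hello there")` would then write the response to `speech.mp3`, assuming a compatible server is listening at that (hypothetical) address. Keeping request construction separate from sending makes the payload easy to inspect without a running server.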
Storyteller is a self-hosted platform for creating and reading ebooks with
synced narration. It's made of three components: the API server, the web
interface, and the mobile apps. Together, these components allow you to take
audiobooks and ebooks that you already own and automatically synchronize them,
as well as read or listen to (or both!) the resulting synced books.
Get news from foreign RSS feeds translated, summarized, and spoken to you daily.
Are you a fan of daily news briefings, but wish you had a wider selection of sources? Say no more! Newsbridge is a simple app that allows you to add your sources as RSS feeds, and then delivers the top stories from all sources (regardless of the source language) on demand as a 5-6 minute audio briefing.
Mix sources in French, Arabic, Korean, Russian, or any other language; it doesn't matter. Everything is translated to your language before the audio is played.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift.
Text-Prompted Generative Audio Model
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.
eSpeak can read text aloud in several languages, and even though the voice is very, very ugly, it is the ideal solution if you want to add a voice dimension to one of your processing scripts, or if you are blind or visually impaired and want to build yourself small shortcuts that make life a little easier.
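To make the scripting use case concrete, here is a minimal sketch of calling the `espeak` command-line tool from a Python script. It relies only on the `-v` (voice), `-s` (speed in words per minute), and `-w` (write to a WAV file) options; the helper names `build_espeak_command` and `speak` are hypothetical, and the code assumes `espeak` is installed and on the `PATH`.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", speed_wpm=None, wav_path=None):
    """Return the argument list for an espeak invocation (nothing is run)."""
    cmd = ["espeak", "-v", voice]
    if speed_wpm is not None:
        cmd += ["-s", str(speed_wpm)]   # speaking rate in words per minute
    if wav_path is not None:
        cmd += ["-w", wav_path]         # write a WAV file instead of playing
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Speak text with espeak; return False if espeak is not installed."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(build_espeak_command(text, **options), check=True)
    return True
```

For instance, `speak("Backup finished", speed_wpm=140)` could be dropped at the end of a processing script to announce completion. Passing the arguments as a list avoids shell quoting problems with arbitrary text.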