Biapy's Bookmarks

SuperCmd

https://supercmd.sh/

AI-Native macOS Launcher.

Command your Mac at the speed of thought.

Voice typing, text-to-speech, persistent memory, AI prompt, clipboard history, and thousands of extensions — all in one launcher built natively for macOS.

SuperCmd @ GitHub.

ai launcher macos raycast software source-available speech-to-text text-to-speech

Added 3 weeks ago

Vapi

https://vapi.ai/

Build Advanced Voice AI Agents.

Related contents:

How I Built an AI Receptionist for a Luxury Mechanic Shop - Part 1 @ That Ladydev.

ai ai-agent api-server commercial text-to-speech voice

Added 1 month ago

Fish Speech

https://speech.fish.audio/

SOTA Open Source TTS.

State-of-the-art multilingual text-to-speech (TTS) system, redefining the boundaries of voice generation.

Fish Audio S2 Pro is the most advanced multimodal model developed by Fish Audio. Trained on over 10 million hours of audio data covering more than 80 languages, S2 Pro combines a Dual-Autoregressive (Dual-AR) architecture with reinforcement learning (RL) alignment to generate speech that is exceptionally natural, realistic, and emotionally rich, leading the competition among both open-source and closed-source systems.

Fish Speech @ GitHub.

machine-learning source-available text-to-speech

Added 1 month ago

Voicebox

https://voicebox.sh/

Open Source Voice Cloning Desktop App Powered by Qwen3-TTS. Create natural-sounding speech from text with near-perfect voice replication.

The open-source voice synthesis studio. Clone voices. Generate speech. Build voice-powered apps. All running locally on your machine.

Voicebox is a local-first voice cloning studio with DAW-like features for professional voice synthesis. Think of it as a local, free and open-source alternative to ElevenLabs — download models, clone voices, and generate speech entirely on your machine.

Voicebox @ GitHub.

Related contents:

Voicebox - Clonez des voix en local sans passer par le cloud @ Korben :fr:.

foss linux macos mit-licensed open-source qwen software text-to-speech voice windows

Added 1 month ago

RCLI

https://github.com/RunanywhereAI/rcli

Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG.

RCLI is an on-device voice AI for macOS. A complete STT + LLM + TTS pipeline running natively on Apple Silicon — 38 macOS actions via voice, local RAG over your documents, sub-200ms end-to-end latency. No cloud, no API keys.

foss llm macos mit-licensed open-source rag software text-to-speech

Added 1 month ago

Pocket TTS

https://github.com/kyutai-labs/pocket-tts

A TTS that fits in your CPU (and pocket).

A lightweight text-to-speech (TTS) application designed to run efficiently on CPUs. Forget about the hassle of using GPUs and web APIs serving TTS models. With Kyutai's Pocket TTS, generating audio is just a pip install and a function call away.

foss lightweight mit-licensed open-source python pytorch text-to-speech

Added 3 months ago

Lue

https://github.com/superstarryeyes/lue

Terminal eBook Reader with Text-to-Speech.

Related contents:

Lue - Lisez vos ebooks en audio dans le terminal @ Korben :fr:.

command-line ebook e-reader foss gpl3-licensed open-source text-to-speech tui

Added 6 months ago

File Wizard

https://github.com/LoredCast/filewizard

File Converter, OCR, Transcription & TTS WebUI.

File Wizard is a self-hosted, browser-based utility for file conversion, OCR, and audio transcription. It wraps many CLI and Python converters as well as fast-whisper and Tesseract OCR.

converter files foss mit-licensed ocr open-source text-to-speech transcription web-app

Added 7 months ago

MaryTTS

https://marytts.github.io/

An open-source, multilingual text-to-speech synthesis system written in pure Java.

MaryTTS is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.

MaryTTS @ GitHub.

Related contents:

Episode 615: 25.05 Reasons to NixOS @ Linux Unplugged.

foss java lgpl3-licensed open-source text-to-speech

Added 8 months ago

Kitten TTS 😻

https://github.com/KittenML/KittenTTS

State-of-the-art TTS model under 25MB 😻

Kitten TTS is an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.

apache2-licensed foss lightweight open-source python text-to-speech

Added 8 months ago

Chatterbox TTS

https://resemble-ai.github.io/chatterbox_demopage/

We're excited to introduce Chatterbox, our first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.

Chatterbox TTS @ GitHub.

foss mit-licensed open-source python text-to-speech

Added 10 months ago

Kokoro Web

https://voice-generator.pages.dev/

Free & Open-Source AI Voice Generator.

A powerful, browser-based AI voice generator that lets you create natural-sounding voices without installing anything.

Use it directly in your browser or self-host it for your own applications with OpenAI API compatibility!
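As a rough illustration of what "OpenAI API compatibility" usually means for a TTS server, the sketch below builds a request in the shape of the OpenAI speech endpoint (`POST <base>/audio/speech` with `model`, `input`, and `voice` fields). The base URL, API key, model name, and voice name are placeholders chosen for this example, not values confirmed by the project; check the instance's own documentation for the real ones.

```python
import json
import urllib.request


def build_speech_request(base_url, api_key, text, model, voice):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


# Hypothetical values: replace all of them with those of your own instance.
request = build_speech_request(
    base_url="http://localhost:3000/api/v1",  # assumed self-hosted address
    api_key="your-api-key",                   # assumed credential
    text="Hello from a self-hosted voice generator.",
    model="your-model-id",                    # assumed model identifier
    voice="your-voice-id",                    # assumed voice identifier
)
# synthesize(request, "hello.mp3")  # uncomment once a server is running
```

Keeping request construction separate from the network call makes it possible to inspect the payload without a server running.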

Kokoro Web @ GitHub.

ai foss machine-learning mit-licensed openai open-source self-hosted text-to-speech voice web-app

Added 1 year ago

Zonos

https://github.com/Zyphra/Zonos

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.

Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.

Related contents:

Zonos, l’IA ultime pour cloner une voix ? @ Choses à Savoir TECH :fr:.

foss machine-learning open-source self-hosted text-to-speech

Added 1 year ago

Storyteller

https://smoores.gitlab.io/storyteller/

Storyteller is a self-hosted platform for creating and reading ebooks with synced narration. It's made of three components: the API server, the web interface, and the mobile apps. Together, these components allow you to take audiobooks and ebooks that you already own and automatically synchronize them, as well as read or listen to (or both!) the resulting synced books.

Storyteller @ GitLab.

Related contents:

Episode 140: When Upgrades Go Wrong @ Self Hosted.

ai audiobook ebook foss llm open-source self-hosted text-to-speech web-app

Added 1 year ago

Newsbridge

https://github.com/AshkanArabim/newsbridge

Get news from foreign RSS feeds translated, summarized, and spoken to you daily.

Are you a fan of daily news briefings, but wish you had a wider selection of sources? Say no more! Newsbridge is a simple app that allows you to add your sources as RSS feeds, and then delivers the top stories from all sources (regardless of the source language) on demand as a 5-6 minute audio briefing.

Mix sources from French, Arabic, Korean, Russian, etc., it doesn't matter. Everything is translated to your language before audio is played.

feed-reader foss open-source rss self-hosted text-to-speech translation web-app

Added 1 year ago

Amphion

https://openhlt.github.io/amphion/

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Amphion @ GitHub.

ai audio foss generation machine-learning music open-source speech text-to-speech

Added 1 year ago

sherpa-onnx

https://k2-fsa.github.io/sherpa/onnx/index.html

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift.

sherpa-onnx @ GitHub.

kaldi machine-learning open-source speech-to-text text-to-speech

Added 1 year ago

MeloTTS

https://github.com/myshell-ai/MeloTTS?tab=readme-ov-file

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

library open-source python text-to-speech

Added 2 years ago

Bark

https://github.com/suno-ai/bark

🔊 Text-Prompted Generative Audio Model

Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.

machine-learning text-to-audio text-to-speech

Added 2 years ago

eSpeak

https://espeak.sourceforge.net/

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.

eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.
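To give a feel for why formant synthesis stays so small, here is a minimal sketch of the general technique: a pulse train at the voice pitch is passed through a cascade of two-pole resonators, one per formant. This is not eSpeak's code; the function names are made up for the example, and the formant frequencies and bandwidths are rough illustrative values for an "ah"-like vowel.

```python
import math
import struct
import wave


def resonator(signal, freq, bandwidth, sample_rate):
    """Two-pole digital resonator centred on freq (Hz) with the given bandwidth (Hz)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c  # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, f0=120.0, duration=0.3, sample_rate=16000):
    """Filter a pulse train at pitch f0 through one resonator per formant."""
    n = round(duration * sample_rate)
    period = max(1, round(sample_rate / f0))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to the range [-1, 1]


def write_wav(path, samples, sample_rate=16000):
    """Write mono 16-bit PCM samples (floats in [-1, 1]) to a WAV file."""
    ints = [int(s * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<%dh" % len(ints), *ints))


# Illustrative (frequency, bandwidth) pairs in Hz for an "ah"-like vowel.
AH = [(730.0, 80.0), (1090.0, 90.0), (2440.0, 120.0)]
samples = synthesize_vowel(AH)
# write_wav("ah.wav", samples)  # uncomment to listen to the result
```

A whole vowel is described by a handful of numbers rather than recorded audio, which is why many languages fit in a small footprint, and also why the result sounds more synthetic than recording-based voices.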

eSpeak @ SourceForge.

Related contents:

Donnez de la voix à votre ordinateur @ Korben :fr:.

foss gpl3-licensed linux open-source software text-to-speech windows

Added 14 years ago