Biapy's Bookmarks

Monologue

https://www.monologue.to/

Speech to text. Talk to the computer.

Related contents:

I Started Talking to My Computer Instead of Typing. It Changed How I Think. @ Working Overtime's Every.

ai macos software speech-recognition speech-to-text

Added 1 month ago

EmoBox

https://emo-box.github.io/

Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark.

EmoBox is a multilingual, multi-corpus speech emotion recognition (SER) toolkit designed to streamline research in this field. It comes with a curated benchmark tailored for both intra-corpus and cross-corpus evaluation settings.
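The difference between the two evaluation settings can be illustrated with a generic sketch. It assumes hypothetical `(corpus, utterance_id)` records. Intra-corpus evaluation trains and tests on splits of the same corpus. Cross-corpus evaluation trains on one corpus and tests on a different one. The 80/20 ratio and helper names are illustrative only, not EmoBox's actual splits (the toolkit ships its own). A real protocol would typically shuffle the data or split by speaker.

```python
from collections import defaultdict
from itertools import permutations


def group_by_corpus(records):
    """Group hypothetical (corpus, utterance_id) records by corpus name."""
    groups = defaultdict(list)
    for corpus, utt in records:
        groups[corpus].append(utt)
    return dict(groups)


def intra_corpus_splits(records, train_ratio=0.8):
    """Intra-corpus: train and test on disjoint parts of the same corpus."""
    splits = {}
    for corpus, utts in group_by_corpus(records).items():
        cut = int(len(utts) * train_ratio)
        splits[corpus] = (utts[:cut], utts[cut:])
    return splits


def cross_corpus_pairs(records):
    """Cross-corpus: train on one corpus, test on another (every ordered pair)."""
    groups = group_by_corpus(records)
    return [
        (train, test, groups[train], groups[test])
        for train, test in permutations(sorted(groups), 2)
    ]
```

With two corpora, `cross_corpus_pairs` yields two experiments: A→B and B→A. Each corpus also contributes its own intra-corpus split.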

EmoBox @ GitHub.

emotion-recognition machine-learning open-source speech-recognition

Added 9 months ago

superwhisper

https://superwhisper.com/

AI-powered voice to text.

Write 3x faster, without lifting a finger.

Related contents:

Vibe Coding and the Future of Software Engineering @ Alex P.

ai commercial ios macos software speech-recognition speech-to-text whisper

Added 7 months ago

Whispering

https://github.com/epicenter-so/epicenter/tree/main/apps/whispering

Whispering is an open-source speech-to-text application. Press a keyboard shortcut and speak; your words are transcribed, transformed, then copied and pasted at the cursor.

foss linux macos mit-licensed open-source software speech-recognition speech-to-text windows

Added 2 months ago

Kyutai STT

https://kyutai.org/next/stt

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Kyutai STT is a streaming speech-to-text model architecture that offers a strong trade-off between latency and accuracy, making it well suited to interactive applications. Its support for batching allows it to process hundreds of concurrent conversations on a single GPU.

Kyutai STT & TTS @ GitHub.

Related contents:

Transcribe speech 100x faster and 100x cheaper with open models @ Modal.

ai apache2-licensed foss mit-licensed open-source speech-recognition

Added 2 months ago

Open Voice OS

https://www.openvoiceos.org/

🌟 OpenVoiceOS is an open-source platform for smart speakers and other voice-centric devices.

OpenVoiceOS is a community-driven, open-source voice AI platform for creating custom voice-controlled interfaces across devices with NLP, a customizable UI, and a focus on privacy and security.

Open Voice OS core @ GitHub.

ai-assistant foss llm nlp open-source self-hosted speech-recognition voice-assistant

Added 8 months ago

Handy

https://handy.computer/

Speak into any text field.

A free, open source, and extensible speech-to-text application that works completely offline.

Handy is a cross-platform desktop application built with Tauri (Rust + React/TypeScript) that provides simple, privacy-focused speech transcription. Press a shortcut, speak, and have your words appear in any text field—all without sending your voice to the cloud.

Handy @ GitHub.

foss linux macos mit-licensed open-source software speech-recognition speech-to-text windows

Added 3 weeks ago

Hertz-dev

https://github.com/Standard-Intelligence/hertz-dev

Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio.

audio llm open-source python speech-recognition

Added 11 months ago

🍓 Ichigo

https://github.com/homebrewltd/ichigo

Llama3.1 learns to listen. Local real-time voice AI (formerly llama3-s).

🍓 Ichigo is an open, ongoing research experiment to extend a text-based LLM with native "listening" ability. Think of it as an open-data, open-weight, on-device Siri.

ai foss llm machine-learning open-source speech-recognition voice-assistant

Added 1 year ago

Neon Core

https://github.com/NeonGeckoCom/NeonCore

Neon Core extends Mycroft core with more modular code, extended multi-user support, and more.

Neon AI is an open-source voice assistant.

ai-assistant llm open-source speech-recognition voice-assistant

Added 8 months ago

Whisper

https://openai.com/index/whisper/

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.
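The Whisper paper describes how one model handles several tasks: special tokens at the start of the decoder prompt select the task. The prompt begins with `<|startoftranscript|>`, followed by a language token and then `<|transcribe|>` or `<|translate|>` (translation targets English). `<|notimestamps|>` is appended when timestamps are not wanted. Language identification amounts to letting the model predict the language token itself. The sketch below only builds that token sequence as strings. The function name is hypothetical, and it does not call the `openai-whisper` package, whose tokenizer handles this internally.

```python
WHISPER_TASKS = {"transcribe", "translate"}


def whisper_prompt(language: str, task: str = "transcribe", timestamps: bool = False) -> list[str]:
    """Build the special-token prefix that tells Whisper which task to perform.

    `language` is a two-letter code such as "fr"; at inference time the model
    can instead predict this token itself (language identification).
    """
    if task not in WHISPER_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens
```

For example, `whisper_prompt("fr", "translate")` returns `["<|startoftranscript|>", "<|fr|>", "<|translate|>", "<|notimestamps|>"]`. That prompt asks for an English translation of French audio without timestamps.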

Whisper @ GitHub.

machine-learning openai open-source python speech-recognition speech-to-text

Added 2 years ago

AI Transcriptions by Riverside

https://riverside.fm/transcription

Accurate AI Transcriptions in Minutes.

Web service offering AI-powered transcription of video and/or audio content.

ai machine-learning speech-recognition speech-to-text web-service

Added 2 years ago

Distil-Whisper

https://github.com/huggingface/distil-whisper

Distilled variant of Whisper for speech recognition: 6x faster and 50% smaller, while staying within 1% word error rate (WER) of the original model.
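"Within 1% WER" is a difference in word error rate. WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The minimal sketch below only splits on whitespace. Real evaluations, including the ones behind the figure above, usually normalize casing and punctuation first; that step is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words (classic Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

Take the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat". There is one substitution and one deletion, so WER = 2/6 ≈ 0.33. A "within 1%" claim compares such rates between two models on the same test sets.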

ai machine-learning open-source speech-recognition speech-to-text whisper

Added 1 year ago

nvidia/parakeet-tdt-0.6b-v2 @ Hugging Face

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

parakeet-tdt-0.6b-v2 is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, featuring support for punctuation, capitalization, and accurate timestamp prediction.
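Timestamped transcripts are often turned into subtitles next. The minimal sketch below renders segments as SubRip (SRT) cues. It assumes the timestamps have already been collected into a list of hypothetical `Segment(start, end, text)` records in seconds. It does not reproduce the model's own output structure, which is documented on its model card.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(cues)
```

For example, `srt_timestamp(3723.5)` gives `01:02:03,500`. Word-level timestamps would first need to be grouped into caption-sized segments, and this sketch leaves that step out.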

Related contents:

Transcribe speech 100x faster and 100x cheaper with open models @ Modal.

ai cc-by-4-licensed nvidia speech-recognition

Added 2 months ago

say

https://github.com/8ta4/say

say is always on, recording and transcribing your voice 24/7. Whenever inspiration strikes, just say it.

llm machine-learning macos recording speech-recognition speech-to-text

Added 1 year ago

Whisper

https://github.com/openai/whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

ai machine-learning openai open-source speech-recognition

Added 3 years ago

nvidia/canary-1b-flash @ Hugging Face

https://huggingface.co/nvidia/canary-1b-flash

canary-1b-flash supports automatic speech recognition (ASR) in four languages (English, German, French, and Spanish). It also translates from English to German, French, or Spanish, and from German, French, or Spanish to English, with or without punctuation and capitalization (PnC).
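The language coverage above reduces to a small rule. When the source and target languages are the same, the task is transcription. Translation is supported only when one side is English. The sketch below encodes that rule from the description alone. The set and function names are hypothetical, and this is not part of NeMo or the model's API.

```python
# Languages listed in the description (ISO 639-1 codes).
CANARY_LANGS = {"en", "de", "fr", "es"}


def canary_task(source: str, target: str) -> str:
    """Classify a (source, target) language pair according to the description."""
    if source not in CANARY_LANGS or target not in CANARY_LANGS:
        raise ValueError(f"unsupported language pair: {source}->{target}")
    if source == target:
        return "asr"  # same language in and out: plain transcription
    if "en" in (source, target):
        return "translation"  # English <-> German/French/Spanish
    # e.g. German -> French is not among the listed directions
    raise ValueError(f"unsupported language pair: {source}->{target}")
```

Under this rule, `("de", "fr")` is rejected: only English-pivoted directions are listed.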

Related contents:

Transcribe speech 100x faster and 100x cheaper with open models @ Modal.

ai cc-by-4-licensed speech-recognition translation

Added 2 months ago

Amberscript

https://www.amberscript.com/en/

Audio & video transcription, speech-to-text. Smarter subtitling and transcription. We combine artificial and human intelligence to bring you accurate and fast transcripts, captions, and translated subtitles with ease.

accessibility ai audio speech-recognition speech-to-text transcription video web-service

Added 2 years ago