Offline audio transcription and translation.
Transcribe and translate audio offline on your personal computer. Powered by OpenAI's Whisper.
ImageBind One Embedding Space to Bind Them All.
PyTorch implementation and pretrained models for ImageBind. For details, see the paper: ImageBind: One Embedding Space To Bind Them All.
ImageBind learns a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. It enables novel emergent applications ‘out-of-the-box’ including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation.
🔊 Text-Prompted Generative Audio Model
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.
A full-body keyboard using gestures to type through computer vision.
Semaphore uses OpenCV and MediaPipe's Pose detection to perform real-time detection of body landmarks from video input. From there, relative differences are calculated to determine specific positions and translate those into keys and commands sent via keyboard.
Play and create AI-generated adventures with infinite possibilities. Not sure where to start?
Every image is uniquely generated by artificial intelligence. Food items that add an image see 70% more orders and 65% higher sales compared to restaurants that do not.
Lama Cleaner is a free, open-source and fully self-hostable inpainting tool powered by state-of-the-art AI models. You can use it to remove any unwanted object, defect, people from your pictures or erase and replace anything on your pictures.
🎚️ Open Source Audio Matching and Mastering. Matchering 2.0 is a novel Containerized Web Application and Python Library for audio matching and mastering.
It follows a simple idea - you take TWO audio files and feed them into Matchering. Our algorithm matches both of these tracks and provides you the mastered TARGET track with the same RMS, FR, peak amplitude and stereo width as the REFERENCE track has.
A Jasper alternative open source with ChatGPT.
This project uses ChatGPT API to create almost any text based output for your need - from marketing content to blog post ideas and a lot more. It uses simple template based components to ask ChatGPT for generating results Creating new templates or tasks take about 30 mins. no more, so you can extend it for your needs or wait for new template release :)
The free AI encyclopedia.
AI tools, podcasts, prompts, newsletter, and movies.
OpenChatKit provides a powerful, open-source base to create both specialized and general purpose chatbots for various applications. The kit includes an instruction-tuned 20 billion parameter language model, a 6 billion parameter moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. It was trained on the OIG-43M training dataset, which was a collaboration between Together, LAION, and Ontocord.ai. Much more than a model release, this is the beginning of an open source project. We are releasing a set of tools and processes for ongoing improvement with community contributions.
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
Transcribe and translate any audio file.
Free, fast and accurate transcription of audio files. 100% free to use.