computer-vision
AI Vision for macOS. Fast Screen Capture & VQA.
Peekaboo is a macOS CLI & optional MCP server that enables AI agents to capture screenshots of applications, or the entire system, with optional visual question answering through local or remote AI models.
Related contents:
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning.
OpenVision 2: A Family of Generative Pretrained Visual Encoders that removes the text encoder and contrastive loss, training with caption-only supervision.
AI and inverse problems for a revolution in digital photography.
Get alerted when you slouch.
SlouchDetector uses MediaPipe face detection to learn your ideal sitting posture and reminds you to sit up when you slouch. All processing happens locally in your browser; no video data is sent to any server.
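The idea of learning an ideal posture and then flagging deviations can be illustrated with a minimal sketch. It is not SlouchDetector's actual code: the `FaceBox` type, the `calibrate` and `is_slouching` helpers, and the `drop_ratio` threshold are all assumptions made for illustration, and the face boxes are presumed to come from some face detector in normalized image coordinates.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FaceBox:
    """Hypothetical detector output, in normalized image coordinates (0..1)."""
    center_y: float  # larger values are lower in the frame
    height: float


def calibrate(samples):
    """Average a few frames captured while sitting upright into a baseline."""
    return FaceBox(
        center_y=mean(s.center_y for s in samples),
        height=mean(s.height for s in samples),
    )


def is_slouching(baseline, current, drop_ratio=0.25):
    """Flag a slouch when the face drops by more than drop_ratio face heights.

    drop_ratio is an assumed tuning value, not one taken from the project.
    """
    drop = current.center_y - baseline.center_y
    return drop > drop_ratio * baseline.height


baseline = calibrate([FaceBox(0.40, 0.20), FaceBox(0.42, 0.20)])
upright = is_slouching(baseline, FaceBox(0.43, 0.20))
slouched = is_slouching(baseline, FaceBox(0.50, 0.20))
```

Scaling the threshold by the face height keeps the check roughly independent of how far the user sits from the camera.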
Build Agents that Never Hallucinate. Deploy the most accurate RAG in the world in two lines of code.
The most accurate document search and store for building AI apps.
Computer vision and sports.
In sports, every centimeter and every second matter. That's why Roboflow decided to use sports as a testing ground to push our object detection, image segmentation, keypoint detection, and foundational models to their limits. This repository contains reusable tools that can be applied in sports and beyond.
Open-Source Browser Agent for autonomously performing complex tasks on the web
Self-hosted, local-only NVR and AI computer vision software. With features such as object detection, motion detection, and face recognition, it lets you keep an eye on your home, office, or any other place you want to monitor.
Ultralytics YOLO11 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLO11 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification and pose estimation tasks.
We write your reusable computer vision tools. Whether you need to load your dataset from your hard drive, draw detections on an image or video, or count how many detections are in a zone, you can count on us!
Supervision provides a seamless process for annotating predictions generated by various object detection and segmentation models.
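As a rough illustration of what counting detections in a zone involves, here is a minimal pure-Python sketch. It does not use Supervision's API; the function names, the `(x_min, y_min, x_max, y_max)` box layout, and the choice of the bottom-center of each box as the anchor point are assumptions for this example.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count edge crossings of a ray going right from point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(boxes, polygon):
    """Count boxes whose bottom-center anchor lies inside the polygon zone."""
    count = 0
    for x_min, y_min, x_max, y_max in boxes:
        anchor = ((x_min + x_max) / 2, y_max)
        if point_in_polygon(anchor, polygon):
            count += 1
    return count


zone = [(0, 0), (100, 0), (100, 100), (0, 100)]
boxes = [(10, 10, 30, 50), (90, 80, 130, 120), (40, 20, 60, 90)]
in_zone = count_in_zone(boxes, zone)  # second box's anchor falls outside
```

The bottom-center anchor approximates where an object touches the ground, which is usually the point of interest when a zone is drawn on a floor or a pitch.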
ImageBind: One Embedding Space to Bind Them All.
PyTorch implementation and pretrained models for ImageBind. For details, see the paper: ImageBind: One Embedding Space To Bind Them All.
ImageBind learns a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. It enables novel emergent applications ‘out-of-the-box’ including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation.
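Cross-modal retrieval and composing modalities with arithmetic both reduce to vector operations once every modality lives in one embedding space. The sketch below shows that mechanic with hand-made 3-dimensional toy vectors. The vectors, the gallery names, and the helper functions are assumptions for illustration; none of this calls the ImageBind models, whose real embeddings are much higher-dimensional.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, gallery):
    """Return the gallery key whose embedding is most similar to the query."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))


def compose(a, b):
    """Combine two embeddings by adding their unit vectors."""
    return normalize([x + y for x, y in zip(normalize(a), normalize(b))])


# Toy "image" embeddings (made up): axes loosely mean dog, beach, other.
image_gallery = {
    "dog_in_park": [0.9, 0.1, 0.2],
    "dog_on_beach": [0.9, 0.8, 0.1],
    "empty_beach": [0.1, 0.9, 0.1],
}
bark_audio = [1.0, 0.05, 0.1]   # toy "audio" embedding of barking
beach_image = [0.05, 1.0, 0.05]  # toy "image" embedding of a beach

audio_match = retrieve(bark_audio, image_gallery)
composed_match = retrieve(compose(bark_audio, beach_image), image_gallery)
```

With these toy values the barking audio alone retrieves the park image, while adding the beach embedding shifts the best match to the dog on the beach.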
A full-body keyboard using gestures to type through computer vision.
Semaphore uses OpenCV and MediaPipe's Pose detection to perform real-time detection of body landmarks from video input. From there, relative differences are calculated to determine specific positions and translate those into keys and commands sent via keyboard.
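The step from landmarks to keys can be sketched as follows, assuming normalized `(x, y)` image coordinates for each shoulder and wrist. The 8-way quantization, the `arm_direction` and `decode` helpers, and the `KEYMAP` table are hypothetical; they are not the project's actual mapping or the real flag semaphore alphabet.

```python
import math


def arm_direction(shoulder, wrist):
    """Quantize the shoulder-to-wrist direction into 8 sectors of 45 degrees.

    Returns 0 for pointing right in the image, 2 for up, 4 for left, 6 for down.
    """
    dx = wrist[0] - shoulder[0]
    dy = shoulder[1] - wrist[1]  # image y grows downward, so flip it
    angle = math.degrees(math.atan2(dy, dx)) % 360
    return int((angle + 22.5) // 45) % 8


# Hypothetical mapping from (first arm sector, second arm sector) to a key.
KEYMAP = {
    (2, 6): "a",
    (2, 2): "b",
    (0, 4): "space",
}


def decode(first_shoulder, first_wrist, second_shoulder, second_wrist):
    """Translate two arm poses into a key name, or None if unmapped."""
    pose = (
        arm_direction(first_shoulder, first_wrist),
        arm_direction(second_shoulder, second_wrist),
    )
    return KEYMAP.get(pose)


key = decode((0.4, 0.5), (0.4, 0.2), (0.6, 0.5), (0.6, 0.8))
```

Working with differences between landmarks rather than absolute positions keeps the decoding independent of where the person stands in the frame.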
GVision is a reverse image search app that uses the Google Cloud Vision API to detect landmarks and web entities in images, helping you gather valuable information quickly and easily.
YOLOv5 in PyTorch > ONNX > CoreML > TFLite. YOLOv5 is the world's most loved vision AI, representing Ultralytics open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development.
Open Computer Vision. Open source machine learning library for computer vision.
Altify automates the task of inserting alternative text attributes for image tags. Altify uses the deep learning algorithms of Microsoft's Computer Vision API to caption images in an HTML file and returns a new HTML file in which the alt attributes are filled in with the corresponding captions.
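A minimal sketch of the alt-filling step is shown below. It is not Altify's implementation: `fill_alt_attributes` is a hypothetical helper, and `caption_image` is a placeholder callback standing in for whatever captioning service is used. For brevity it matches tags with regular expressions and only handles `<img>` tags that have a quoted `src` and no `alt` attribute at all; a real tool would more likely use an HTML parser.

```python
import html
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ALT_ATTR = re.compile(r"\balt\s*=", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def fill_alt_attributes(document, caption_image):
    """Return the HTML with an alt attribute added to img tags that lack one.

    caption_image is a caller-supplied function mapping an image source to a
    caption string (a stand-in for a captioning API call).
    """
    def add_alt(match):
        tag = match.group(0)
        src = SRC_ATTR.search(tag)
        if ALT_ATTR.search(tag) or src is None:
            return tag  # leave tags that already have alt, or have no src
        alt = html.escape(caption_image(src.group(1)), quote=True)
        # tag[:4] is the literal "<img"; insert the attribute right after it.
        return f'{tag[:4]} alt="{alt}"{tag[4:]}'

    return IMG_TAG.sub(add_alt, document)


def fake_caption(src):
    """Placeholder captioner used only for this example."""
    return f"caption for {src}"


page = '<p><img src="cat.png"><img src="dog.png" alt="dog"></p>'
result = fill_alt_attributes(page, fake_caption)
```

Escaping the caption before insertion keeps quotes or angle brackets in a generated caption from breaking the surrounding markup.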