Biapy's Bookmarks

https://huggingface.co/ibm-granite/granite-4.0-3b-vision

Granite-4.0-3B-Vision is a vision-language model (VLM) designed for enterprise-grade document data extraction. It focuses on specialized, complex extraction tasks that ultracompact models often struggle with.

Related contents:

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents @ Hugging Face.

computer-vision ocr vlm

Added 2 months ago

Motion

https://motion-project.github.io/

Motion is a highly configurable program that monitors video signals from many types of cameras and, depending on how they are configured, performs actions when movement is detected.

Motion @ GitHub.

Related contents:

Motion - The Linux tool to manage all your surveillance cameras @ Korben :fr:.

automation camera computer-vision foss gpl3-licensed motion open-source video-surveillance

Added 3 months ago
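
The movement detection described in the Motion entry can be illustrated with plain frame differencing. This is only a minimal sketch and not Motion's actual algorithm: the function names, the `noise_level` and `changed_pixel_threshold` parameters, and the representation of frames as nested lists of 0-255 grayscale values are all assumptions made for illustration.

```python
def count_changed_pixels(previous, current, noise_level=32):
    """Count pixels whose grayscale value changed by more than noise_level."""
    changed = 0
    for prev_row, cur_row in zip(previous, current):
        for prev_px, cur_px in zip(prev_row, cur_row):
            if abs(cur_px - prev_px) > noise_level:
                changed += 1
    return changed


def motion_detected(previous, current, noise_level=32, changed_pixel_threshold=3):
    """Report motion when enough pixels differ between two consecutive frames."""
    changed = count_changed_pixels(previous, current, noise_level)
    return changed >= changed_pixel_threshold


if __name__ == "__main__":
    # Two hypothetical 4x4 frames: a bright 2x2 object appears in the second one.
    still = [[10] * 4 for _ in range(4)]
    moved = [row[:] for row in still]
    for y in (1, 2):
        for x in (1, 2):
            moved[y][x] = 200
    print(motion_detected(still, moved))
    print(motion_detected(still, still))
```

Separating the per-pixel noise level from the changed-pixel count keeps sensor noise and small lighting flicker from triggering an action on their own.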

Label Studio

https://labelstud.io/

Open Source Data Labeling.

The most flexible data labeling platform to fine-tune LLMs, prepare training data, or evaluate AI systems. Label Studio is an open source, multi-type data labeling and annotation tool with a standardized output format. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models.

Label Studio @ GitHub.

ai apache2-licensed computer-vision data-science foss llm machine-learning metadata ocr open-source training

Added 3 months ago

Skyvern

https://www.skyvern.com/

AI Browser Automation. Automate browser-based workflows with AI.

Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows on a large number of websites, replacing brittle or unreliable automation solutions.

Skyvern @ GitHub.

agpl3-licensed ai ai-agent automation browser-automation computer-vision foss llm open-source web workflow

Added 6 months ago

Unblink

https://github.com/tri2820/unblink

VLM app for video analytics.

Unblink is a camera monitoring application that runs AI vision models on your camera streams in real time.

agpl3-licensed camera computer-vision foss open-source video-surveillance

Added 7 months ago

🏰 Grayskull

https://github.com/zserge/grayskull

A tiny, dependency-free computer vision library in C for embedded systems, drones, and robotics.

Grayskull is a minimalist, dependency-free computer vision library designed for microcontrollers and other resource-constrained devices. It focuses on grayscale images and provides modern, practical algorithms that fit in a few kilobytes of code. Single-header design, integer-based operations, pure C99.

Related contents:

By the power of grayscale! @ zserge's blog.

computer-vision foss library lightweight minimalistic mit-licensed open-source robotics

Added 7 months ago

Image Moderator for S3 Bucket

https://github.com/lrasata/infra-s3-image-moderator/tree/v1.0.0

AWS-based automation which scans images stored in an Amazon S3 bucket for inappropriate or unsafe content using Amazon Rekognition.

Related contents:

Detect inappropriate images in S3 with AWS Rekognition + Terraform @ Liantsoa R.'s Medium.

automation aws computer-vision s3 terraform

Added 7 months ago
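
The scan described in the Image Moderator entry relies on Amazon Rekognition's moderation-label detection. The sketch below is not taken from the linked repository: the function name, the default confidence, and the fake client with its canned labels are assumptions for illustration. Only the `detect_moderation_labels` request and response shape follow the Rekognition API.

```python
def find_unsafe_labels(rekognition_client, bucket, key, min_confidence=80.0):
    """Return (label name, confidence) pairs reported for one image stored in S3."""
    response = rekognition_client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [
        (label["Name"], label["Confidence"])
        for label in response.get("ModerationLabels", [])
    ]


class FakeRekognitionClient:
    """Hypothetical offline stand-in so the sketch runs without AWS credentials."""

    def detect_moderation_labels(self, Image, MinConfidence):
        # Made-up results; a real call would analyse the referenced S3 object.
        canned = [
            {"Name": "Violence", "Confidence": 91.5, "ParentName": ""},
            {"Name": "Alcohol", "Confidence": 42.0, "ParentName": ""},
        ]
        return {
            "ModerationLabels": [
                label for label in canned if label["Confidence"] >= MinConfidence
            ]
        }


if __name__ == "__main__":
    labels = find_unsafe_labels(FakeRekognitionClient(), "my-bucket", "photo.jpg")
    print(labels)
```

With real AWS access, the same function would presumably be called with `boto3.client("rekognition")` in place of the fake client; an image with a non-empty result could then be flagged or quarantined.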

Peekaboo

https://www.peekaboo.boo/

AI Vision for macOS. Fast Screen Capture & VQA.

Peekaboo is a macOS CLI & optional MCP server that enables AI agents to capture screenshots of applications, or the entire system, with optional visual question answering through local or remote AI models.

Peekaboo @ GitHub.

Related contents:

Accelerate developer productivity with these 9 open source AI and MCP projects @ GitHub blog.

ai ai-agent computer-vision foss macos mcp mit-licensed open-source screenshot

Added 7 months ago

OpenVision 2

https://ucsc-vlaa.github.io/OpenVision2/

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning.

OpenVision 2: A Family of Generative Pretrained Visual Encoders that removes the text encoder and contrastive loss, training with caption-only supervision.

OpenVision & OpenVision 2 @ GitHub.

ai apache2-licensed computer-vision foss machine-learning open-source

Added 8 months ago

Enhance Lab :fr:

https://enhancelab.fr/

AI and inverse problems for a revolution in digital photography.

Related contents:

S5E21 - We hosted the French genius who is revolutionizing computer vision @ Underscore_ :fr:.

ai commercial computer-vision france image-manipulation machine-learning photography

Added 9 months ago

SlouchDetector

https://slouchdetector.net/

Get alerted when you slouch.

SlouchDetector uses MediaPipe face detection to learn your ideal sitting posture and reminds you to sit up when you slouch. All processing happens locally in your browser; no video data is sent to any server.

SlouchDetector @ GitHub.

Related contents:

SlouchDetector - When your webcam reminds you to sit up straight @ Korben :fr:.

computer-vision foss healthcare mit-licensed open-source web-app webcam

Added 10 months ago
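
The calibrate-then-compare idea in the SlouchDetector entry can be sketched without any camera code. Everything here is an assumption for illustration rather than the project's implementation: face boxes are taken to be normalized `(x, y, width, height)` values with `y` growing downward, and the `drop_tolerance` and `grow_tolerance` thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FaceBox:
    x: float
    y: float
    width: float
    height: float

    @property
    def center_y(self):
        return self.y + self.height / 2


@dataclass(frozen=True)
class Baseline:
    center_y: float
    height: float


def calibrate(samples):
    """Average a few face boxes captured while sitting upright."""
    return Baseline(
        center_y=mean(box.center_y for box in samples),
        height=mean(box.height for box in samples),
    )


def is_slouching(box, baseline, drop_tolerance=0.25, grow_tolerance=0.2):
    """Flag a face that sank in the frame or grew (leaning toward the screen)."""
    # Vertical drop, measured in units of the baseline face height.
    drop = (box.center_y - baseline.center_y) / baseline.height
    # Relative growth of the face box compared with the baseline.
    growth = box.height / baseline.height - 1.0
    return drop > drop_tolerance or growth > grow_tolerance


if __name__ == "__main__":
    baseline = calibrate([FaceBox(0.4, 0.20, 0.2, 0.2), FaceBox(0.4, 0.22, 0.2, 0.2)])
    print(is_slouching(FaceBox(0.4, 0.21, 0.2, 0.2), baseline))
    print(is_slouching(FaceBox(0.4, 0.35, 0.2, 0.2), baseline))
```

Expressing the drop relative to the face height makes the check independent of how far the user sits from the webcam.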

Morphik

https://www.morphik.ai/

Build Agents that Never Hallucinate. Deploy the most accurate RAG in the world in two lines of code.

The most accurate document search and store for building AI apps.

Morphik @ GitHub.

Related contents:

Don't bother parsing: Just use images for RAG @ Morphik.

ai ai-agent bsl-licensed computer-vision llm rag source-available

Added 10 months ago

sports

https://github.com/roboflow/sports

Computer vision and sports.

In sports, every centimeter and every second matter. That's why Roboflow decided to use sports as a testing ground to push our object detection, image segmentation, keypoint detection, and foundational models to their limits. This repository contains reusable tools that can be applied in sports and beyond.

computer-vision foss machine-learning mit-licensed object-detection open-source python

Added 1 year ago

Index

https://github.com/lmnr-ai/index

Open-Source Browser Agent for autonomously performing complex tasks on the web.

ai ai-agent apache2-licensed browser-automation computer-vision foss llm open-source

Added 1 year ago

Viseron

https://github.com/roflcoopter/viseron

Self-hosted, local-only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.

computer-vision face-detection foss mit-licensed nvr open-source self-hosted video-surveillance web-app

Added 1 year ago

Ultralytics YOLO

https://docs.ultralytics.com/

Ultralytics YOLO11 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLO11 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification and pose estimation tasks.

Ultralytics YOLO @ GitHub.

computer-vision machine-learning open-source yolo

Added 1 year ago

Supervision

https://supervision.roboflow.com/latest/

We write your reusable computer vision tools. Whether you need to load your dataset from your hard drive, draw detections on an image or video, or count how many detections are in a zone. You can count on us!

Supervision provides a seamless process for annotating predictions generated by various object detection and segmentation models.

Supervision @ GitHub.

computer-vision library machine-learning open-source

Added 2 years ago
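
Counting detections in a zone, as mentioned in the Supervision entry, comes down to a point-in-polygon test. The sketch below deliberately does not use the Supervision API; it is a dependency-free illustration built on ray casting. The function names, the `(x_min, y_min, x_max, y_max)` box format, and the choice of the bottom-center of each box as its anchor point are assumptions.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray going right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(boxes, polygon):
    """Count boxes whose bottom-center anchor lies inside the polygon zone."""
    count = 0
    for x_min, y_min, x_max, y_max in boxes:
        anchor = ((x_min + x_max) / 2, y_max)
        if point_in_polygon(anchor, polygon):
            count += 1
    return count


if __name__ == "__main__":
    zone = [(0, 0), (100, 0), (100, 100), (0, 100)]
    boxes = [(10, 10, 30, 50), (90, 60, 130, 90), (40, 80, 60, 120), (50, 20, 70, 99)]
    print(count_in_zone(boxes, zone))
```

Using the bottom-center of a box approximates where a person or vehicle touches the ground, which is usually the point that matters for a floor-level zone.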

ImageBind

https://github.com/facebookresearch/ImageBind

ImageBind: One Embedding Space to Bind Them All.

PyTorch implementation and pretrained models for ImageBind. For details, see the paper: ImageBind: One Embedding Space To Bind Them All.

ImageBind learns a joint embedding across six different modalities: images, text, audio, depth, thermal, and IMU data. It enables novel emergent applications out of the box, including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation.

computer-vision machine-learning meta open-source python pytorch

Added 3 years ago

Semaphore

https://github.com/everythingishacked/Semaphore

A full-body keyboard using gestures to type through computer vision.

Semaphore uses OpenCV and MediaPipe's Pose detection to perform real-time detection of body landmarks from video input. From there, relative differences are calculated to determine specific positions and translate those into keys and commands sent via keyboard.

computer-vision keyboard machine-learning mediapipe opencv open-source

Added 3 years ago
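
The "relative differences" step in the Semaphore entry can be sketched as turning shoulder and wrist landmarks into arm directions and looking the pair up in a table. This is an illustration under stated assumptions, not the project's code: landmarks are plain normalized `(x, y)` tuples with `y` growing downward, and `HYPOTHETICAL_KEYMAP` is a made-up table rather than the real flag semaphore alphabet. No OpenCV or MediaPipe calls are involved.

```python
import math


def arm_direction(shoulder, wrist):
    """Return the arm angle snapped to 45-degree steps (0 = right, 90 = up)."""
    dx = wrist[0] - shoulder[0]
    dy = shoulder[1] - wrist[1]  # flip the sign because image y grows downward
    angle = math.degrees(math.atan2(dy, dx))
    return round(angle / 45) * 45 % 360


# Made-up mapping from (left arm, right arm) directions to keys.
HYPOTHETICAL_KEYMAP = {
    (180, 270): "a",
    (135, 270): "b",
    (90, 90): "enter",
}


def pose_to_key(landmarks):
    """Translate one frame of landmarks into a key, or None if unrecognized."""
    left = arm_direction(landmarks["left_shoulder"], landmarks["left_wrist"])
    right = arm_direction(landmarks["right_shoulder"], landmarks["right_wrist"])
    return HYPOTHETICAL_KEYMAP.get((left, right))


if __name__ == "__main__":
    frame = {
        "left_shoulder": (0.4, 0.5),
        "left_wrist": (0.2, 0.5),   # arm stretched horizontally
        "right_shoulder": (0.6, 0.5),
        "right_wrist": (0.6, 0.8),  # arm hanging straight down
    }
    print(pose_to_key(frame))
```

Working with directions relative to the shoulder, rather than absolute wrist positions, keeps the mapping stable when the person moves around in the frame.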

GVision

https://github.com/GONZOsint/gvision

GVision is a reverse image search app that uses the Google Cloud Vision API to detect landmarks and web entities from images, helping you gather valuable information quickly and easily.

computer-vision geoint geolocation google osint

Added 3 years ago

YOLOv5

https://github.com/ultralytics/yolov5

YOLOv5 in PyTorch > ONNX > CoreML > TFLite. YOLOv5 is the world's most loved vision AI, representing Ultralytics' open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development.

computer-vision data-science machine-learning python pytorch

Added 3 years ago

OpenCV

https://opencv.org/

Open Computer Vision. Open source machine learning library for computer vision.

Related contents:

OpenCV Course - Full Tutorial with Python @ freeCodeCamp.org's YouTube.

ai apache2-licensed c computer-vision foss library machine-learning open-source python

Added 3 years ago

Altify

https://github.com/ParhamP/altify

Altify automates the task of inserting alternative text attributes for image tags. Altify uses the Microsoft Computer Vision API's deep learning algorithms to caption images in an HTML file and returns a new HTML file in which alt attributes are filled out with their corresponding captions.

computer-vision développement deep-learning

Added 9 years ago
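
The HTML rewriting step described in the Altify entry can be sketched with the standard library alone. This is not Altify's code and it makes no call to any captioning service: `caption_for` is a hypothetical callable that maps an image `src` to a caption, and treating an empty `alt` the same as a missing one is an assumption.

```python
from html import escape
from html.parser import HTMLParser


class AltFiller(HTMLParser):
    """Re-emit HTML largely as written, adding alt text to <img> tags lacking it."""

    def __init__(self, caption_for):
        super().__init__(convert_charrefs=False)
        self.caption_for = caption_for  # hypothetical callable: src -> caption
        self.parts = []

    def _emit_start(self, tag, attrs):
        raw = self.get_starttag_text()
        attr_map = dict(attrs)
        if tag != "img" or attr_map.get("alt"):
            self.parts.append(raw)  # keep every other start tag exactly as written
            return
        caption = self.caption_for(attr_map.get("src") or "")
        kept = [(name, value) for name, value in attrs if name != "alt"]
        kept.append(("alt", caption))
        rendered = " ".join(
            name if value is None else f'{name}="{escape(value, quote=True)}"'
            for name, value in kept
        )
        closing = " />" if raw.rstrip().endswith("/>") else ">"
        self.parts.append(f"<img {rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def fill_alt_attributes(html_text, caption_for):
    """Return a copy of html_text in which uncaptioned images get an alt attribute."""
    parser = AltFiller(caption_for)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

For example, `fill_alt_attributes('<p><img src="cat.png"></p>', lambda src: "a cat")` yields the same markup with `alt="a cat"` added to the image tag. In a real pipeline, `caption_for` would wrap whatever captioning service is used.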