Biapy's Bookmarks

mac-ocr

https://github.com/privatenumber/mac-ocr

macOS CLI for OCR and searchable PDFs using Apple's Vision framework.

A macOS command-line tool that reads text from images and PDFs, and creates searchable PDFs. Runs entirely on your Mac with Apple's Vision framework; nothing is uploaded.

command-line computer-vision foss macos mit-licensed ocr open-source pdf

Added 1 month ago

PaddleOCR

https://paddleocr.com/

The Ultimate Document Solution.

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

PaddleOCR @ GitHub.

apache2-licensed foss ocr open-source pdf self-hosted web-app

Added 1 month ago

Stirling Image

https://stirling-image.github.io/stirling-image/

Self-hosted image processing

Resize, compress, convert, remove backgrounds, and more. All on your own server, no data leaves your machine.

Stirling-PDF but for images. 30+ tools and local AI in a single Docker container - resize, compress, remove backgrounds, upscale, OCR, and more. No cloud, no telemetry. Your images never leave your machine.

Stirling Image @ GitHub.

Related contents:

Veille #51 — L'actu de la semaine @ Camille Roux :fr:.

agpl3-licensed background-removal image-editor ocr open-source self-hosted upscaler web-app

Added 3 months ago

Granite 4.0 3B Vision

https://huggingface.co/ibm-granite/granite-4.0-3b-vision

Granite-4.0-3B-Vision is a vision-language model (VLM) designed for enterprise-grade document data extraction. It focuses on specialized, complex extraction tasks that ultracompact models often struggle with.

Related contents:

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents @ Hugging Face.

computer-vision ocr vlm

Added 3 months ago

Paperwise

https://paperwise.dev/

Documents, structured and queryable.

Paperwise helps you OCR, extract, organize, and query documents on your own infrastructure. Run it locally or self-host it, and keep full control of your data.

Paperwise @ GitHub.

dms ocr open-source rag self-hosted web-app

Added 4 months ago

LiteParse

https://developers.llamaindex.ai/liteparse/

A fast, helpful, and open-source document parser.

LiteParse is an open-source document parsing library that parses text with spatial layout information and bounding boxes. It runs entirely on your machine, with no cloud dependencies, no LLMs, no API keys.

LiteParse is designed specifically for use cases that require fast, accurate text parsing: real-time applications, coding agents, and local workflows. It provides a simple CLI and library API for parsing PDFs, Office documents, and images, with built-in OCR support.

LiteParse @ GitHub.

apache2-licensed api-server command-line docx foss ocr open-source pdf self-hosted web-app

Added 4 months ago

Papermerge DMS

https://www.papermerge.com/

Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full-text, tag and metadata-based search.

Papermerge DMS core @ GitHub.

apache2-licensed dms foss ocr open-source paperless self-hosted web-app

Added 4 months ago

Readur 📄

https://github.com/readur/readur/

Quick, painless, intuitive OCR platform written in Rust and TypeScript. Modern UI with modern API, with an emphasis on intuitive user experience.

Readur is a powerful and modern document management system designed to help individuals and teams efficiently organize, process, and access their digital documents. It combines a high-performance backend with a sleek and intuitive web interface to deliver a smooth and reliable user experience.

Related contents:

Readur - Gestion documentaire OCR pour ranger votre bazar @ Korben :fr:.

dms foss mit-licensed ocr open-source paperless self-hosted web-app

Added 4 months ago

Label Studio

https://labelstud.io/

Open Source Data Labeling.

The most flexible data labeling platform to fine-tune LLMs, prepare training data, or evaluate AI systems. Label Studio is an open source, multi-type data labeling and annotation tool with a standardized output format. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models.

Label Studio @ GitHub.

ai apache2-licensed computer-vision data-science foss llm machine-learning metadata ocr open-source training

Added 4 months ago

PaddleOCR :cn:

https://www.paddleocr.ai/latest/en/index.html

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

PaddleOCR is an industry-leading, production-ready OCR and document AI engine, offering end-to-end solutions from text extraction to intelligent document understanding.

PaddleOCR @ GitHub.

Related contents:

Episode #125: The state of homelab tech (2026) @ Changelog & Friends.

ai apache2-licensed china foss micro-service ocr open-source rag self-hosted

Added 6 months ago

Scribe OCR

https://scribeocr.com/

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

Scribe OCR is a free (libre) web application for recognizing text from images, proofreading OCR data, and creating fully-digitized documents.

Scribe OCR @ GitHub.

Related contents:

ScribeOCR - Corrigez vos erreurs d'OCR directement dans le navigateur (en local) @ Korben :fr:.

agpl3-licensed foss ocr open-source pdf self-hosted web-app

Added 8 months ago

Tesseract OCR

https://tesseract-ocr.github.io/

Tesseract is an open source text recognition (OCR) Engine, available under the Apache 2.0 license.

Tesseract can be used directly via command line, or (for programmers) by using an API to extract printed text from images. It supports a wide variety of languages. Tesseract doesn’t have a built-in GUI, but there are several available from the 3rdParty page. External tools, wrappers and training projects for Tesseract are listed under AddOns.
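
As a rough illustration of the command-line route mentioned above, the sketch below wraps the `tesseract` binary from Python. It assumes Tesseract is installed and on the PATH; the helper names and the `scan.png` file name are hypothetical, chosen only for this example. Passing `stdout` as the output base makes Tesseract print the recognized text instead of writing a `.txt` file, and `-l` selects the language.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image: Path, lang: str = "eng") -> list[str]:
    """Build the argv for a basic Tesseract run (hypothetical helper)."""
    # "stdout" as the output base prints the text instead of creating a file.
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_image(image: Path, lang: str = "eng") -> str | None:
    """Return the recognized text, or None if the tesseract binary is missing."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout
```

Calling `ocr_image(Path("scan.png"), lang="fra")` would run French recognition, provided the matching language data is installed.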

Tesseract OCR @ GitHub.

apache2-licensed command-line foss ocr open-source

Added 9 months ago

File Wizard

https://github.com/LoredCast/filewizard

File Converter, OCR, Transcription & TTS WebUI.

File Wizard is a self-hosted, browser-based utility for file conversion, OCR, and audio transcription. It wraps many CLI and Python converters as well as fast-whisper and Tesseract OCR.

converter files foss mit-licensed ocr open-source text-to-speech transcription web-app

Added 10 months ago

NormCap

https://dynobo.github.io/normcap/

OCR-powered screenshot tool to capture text instead of images.

NormCap @ GitHub.

Related contents:

NormCap - Un OCR gratuit pour capturer directement le texte @ Korben :fr:.

foss gpl3-licensed linux macos ocr open-source screenshot software windows

Added 1 year ago

Auntie PDF

https://auntiepdf.com/

Your all-knowing guide that unpacks every PDF into clear, actionable insights.

Auntie PDF is a web application that helps users extract information and insights from PDF documents. With a sassy, helpful personality, Auntie PDF makes understanding complex documents easier and more engaging.

ai foss llm mistral mit-licensed ocr open-source pdf web-app

Added 1 year ago

Kreuzberg

https://github.com/Goldziher/kreuzberg

A text extraction library supporting PDFs, images, office documents and more.

Kreuzberg is a Python library for text extraction from documents. It provides a unified async interface for extracting text from PDFs, images, office documents, and more.

csv docx foss library microsoft-office ocr open-source pdf python xlsx

Added 1 year ago

Sparrow

https://sparrow.katanaml.io/

Data processing with ML, LLM and Vision LLM.

Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, bank statements, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance.

Sparrow @ GitHub.

Related contents:

Sparrow - Pour extraire des données avec l'IA @ Korben :fr:.

ai data-pipeline foss llm ocr open-source parser self-hosted web-app

Added 1 year ago

🖼️ Image Toolbox

https://github.com/T8RIN/ImageToolbox

ImageToolbox is a versatile image editing tool designed for efficient photo manipulation. It allows users to crop, apply filters, edit EXIF data, erase backgrounds, and even convert images to PDFs. Ideal for both photographers and developers, the tool offers a simple interface with powerful capabilities.

android apache2-licensed foss image image-editor image-manipulation ocr open-source photography

Added 1 year ago

paperless-gpt

https://github.com/icereed/paperless-gpt

Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI.

paperless-gpt seamlessly pairs with paperless-ngx to generate AI-powered document titles and tags, saving you hours of manual sorting. While other tools may offer AI chat features, paperless-gpt stands out by supercharging OCR with LLMs—ensuring high accuracy, even with tricky scans. If you’re craving next-level text extraction and effortless document organization, this is your solution.

ai automation foss llm llm-vision mit-licensed ocr open-source paperless self-hosted

Added 1 year ago

MarkItDown

https://github.com/microsoft/markitdown

Python tool for converting files and office documents to Markdown. MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc.).

MarkItDown-MCP @ GitHub.

command-line exif foss llm mcp microsoft-office ocr open-source parser pdf python rag

Added 1 year ago

mPLUG-DocOwl

https://github.com/X-PLUG/mPLUG-DocOwl

The Powerful Multi-modal LLM Family for OCR-free Document Understanding. Modularized Multimodal Large Language Model for Document Understanding.

foss llm machine-learning ocr open-source pdf

Added 1 year ago

Zerox OCR

https://github.com/getomni-ai/zerox

Zero-shot PDF OCR with gpt-4o-mini.

A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all, with weird layouts, tables, charts, etc. The vision models just make sense!

ai foss llm machine-learning ocr open-source pdf python

Added 1 year ago

Frog

https://getfrog.app/

Extract text from any image, video, QR Code, etc.

Quickly extract text from almost any source: YouTube, screencasts, PDFs, webpages, photos, etc. Grab the image and get the text.

flatpak foss linux ocr open-source qrcode screenshot software

Added 1 year ago

unpaper

https://github.com/unpaper/unpaper

A post-processing tool for scanned sheets of paper.

unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. The main purpose is to make scanned book pages more readable on screen after conversion to PDF. Additionally, unpaper might be useful to enhance the quality of scanned pages before performing optical character recognition (OCR).

command-line foss ocr open-source pdf scan

Added 1 year ago

OCRmyPDF

https://ocrmypdf.readthedocs.io/en/latest/index.html

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
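
A minimal sketch of how the text layer might be added from a script, assuming the `ocrmypdf` command is installed and on the PATH. The helper names and the file names are hypothetical; `-l` (language) and `--deskew` are options of the OCRmyPDF command line.

```python
from __future__ import annotations

import shutil
import subprocess


def build_ocrmypdf_command(
    src: str, dst: str, lang: str = "eng", deskew: bool = False
) -> list[str]:
    """Build the argv for an OCRmyPDF run (hypothetical helper)."""
    cmd = ["ocrmypdf", "-l", lang]
    if deskew:
        cmd.append("--deskew")  # straighten crooked scans before OCR
    # Input and output paths come last.
    cmd += [src, dst]
    return cmd


def add_text_layer(src: str, dst: str, lang: str = "eng") -> bool:
    """Write a searchable copy of src to dst; return False if ocrmypdf is missing."""
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(build_ocrmypdf_command(src, dst, lang), check=True)
    return True
```

For example, `add_text_layer("scan.pdf", "scan-searchable.pdf")` would leave the original file untouched and write the searchable copy alongside it.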

OCRmyPDF @ GitHub.

command-line foss ocr open-source pdf python scan

Added 1 year ago

marker

https://www.datalab.to/marker

Convert PDF to Markdown quickly with high accuracy.

marker @ GitHub.

command-line converter markdown ocr open-source pdf python

Added 2 years ago

Open-Capture

https://github.com/edissyum/opencapture

Open-Capture is the one and only 100% Open Source intelligent capture management.

ai archive maarch machine-learning ocr open-source scan

Added 3 years ago

Mayan EDMS

http://www.mayan-edms.com/#

Mayan EDMS is an electronic vault for your documents. With Mayan EDMS you will never lose another document to floods, fire, theft, sabotage, fungus or decomposition. Its advanced search and categorization capabilities will help you reduce the time to find the information you need. It is free, open source, and integrates with your existing equipment, which means low to no initial investment and an even lower total cost of ownership; reducing operational costs has never been this easy. Because it is open source, its code is freely available, allowing you to see how it handles your documents if you ever need to; you will be glad you chose Mayan EDMS at your next audit. Initially released in 2011 and with thousands of installations worldwide, Mayan EDMS is mature, time-tested software you can rely on.

dms ocr open-source scan web-app

Added 10 years ago