ocr
OCR-powered screenshot tool to capture text instead of images.
Your all-knowing guide that unpacks every PDF into clear, actionable insights.
Auntie PDF is a web application that helps users extract information and insights from PDF documents. With a sassy, helpful personality, Auntie PDF makes understanding complex documents easier and more engaging.
A text extraction library supporting PDFs, images, office documents and more.
Kreuzberg is a Python library for text extraction from documents. It provides a unified async interface for extracting text from PDFs, images, office documents, and more.
Data processing with ML, LLM and Vision LLM.
Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, bank statements, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance.
ImageToolbox is a versatile image editing tool designed for efficient photo manipulation. It allows users to crop, apply filters, edit EXIF data, erase backgrounds, and even convert images to PDFs. Ideal for both photographers and developers, the tool offers a simple interface with powerful capabilities.
Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI.
paperless-gpt seamlessly pairs with paperless-ngx to generate AI-powered document titles and tags, saving you hours of manual sorting. While other tools may offer AI chat features, paperless-gpt stands out by supercharging OCR with LLMs—ensuring high accuracy, even with tricky scans. If you’re craving next-level text extraction and effortless document organization, this is your solution.
Python tool for converting files and office documents to Markdown. MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).
The Powerful Multi-modal LLM Family for OCR-free Document Understanding. Modularized Multimodal Large Language Model for Document Understanding.
Zero-shot PDF OCR with gpt-4o-mini.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all, with weird layouts, tables, charts, etc. Vision models just make sense!
Extract text from any image, video, QR Code, etc.
Quickly extract text from almost any source: YouTube, screencasts, PDFs, webpages, photos, etc. Grab the image and get the text.
A post-processing tool for scanned sheets of paper.
unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. Its main purpose is to make scanned book pages more readable on screen after conversion to PDF. Additionally, unpaper can be useful for enhancing the quality of scanned pages before performing optical character recognition (OCR).
OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
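As a rough illustration of how OCRmyPDF is typically driven, the sketch below assembles an `ocrmypdf` command line from Python and runs it only if the tool is installed. The helper names `build_ocrmypdf_command` and `add_text_layer` and the file names are hypothetical, chosen for this example; the `--language` and `--deskew` options are assumed to match the installed OCRmyPDF version.

```python
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, language="eng", deskew=False):
    """Return the argv list for an ocrmypdf run (nothing is executed here).

    Hypothetical helper: it only assembles arguments for the ocrmypdf CLI.
    """
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        # Straighten crooked scans before recognition.
        cmd.append("--deskew")
    # Input PDF first, then the searchable output PDF.
    cmd += [str(src), str(dst)]
    return cmd


def add_text_layer(src, dst, **options):
    """Run ocrmypdf on src, writing a searchable PDF to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)


if __name__ == "__main__":
    # Placeholder file names; shows the command without running it.
    print(build_ocrmypdf_command("scan.pdf", "searchable.pdf", deskew=True))
```

Keeping command construction separate from execution makes it possible to inspect the arguments before any file is processed.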
Open-Capture is the one and only 100% Open Source intelligent capture management.
Mayan EDMS is an electronic vault for your documents. With Mayan EDMS you will never lose another document to floods, fire, theft, sabotage, fungus or decomposition. Its advanced search and categorization capabilities will help you reduce the time to find the information you need. It is free, open source, and integrates with your existing equipment, which means low to no initial investment and an even lower total cost of ownership. Reducing operational costs has never been this easy. Because it is open source, its code is freely available, allowing you to see how it handles your documents if you ever need to; you will be glad you chose Mayan EDMS at your next audit. Initially released in 2011 and with thousands of installations worldwide, Mayan EDMS is mature, time-tested software you can rely on.