Extract structured data from PDFs.
Stop wasting time extracting PDFs.
Transform your PDF documents into structured data with Documind. Simple, powerful and open-source.
Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an issue on issue and attach the relevant PDF.
Docling parses documents and exports them to the desired format with ease and speed.
🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.
The Data Processor for Agents.
Marly allows your agents to extract tables & text from your PDFs, Powerpoints, etc in a structured format making it easy for them to take subsequent actions (database call, API call, creating a chart etc).
OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
A tiny, permissive CSS selector parser.
Langium is an open source language engineering tool with first-class support for the Language Server Protocol, written in TypeScript and running in Node.js.
Parser Building Toolkit for JavaScript.
Chevrotain is a blazing fast and feature rich Parser Building Toolkit for JavaScript with built-in support for LL(K). Grammars and 3rd party plugin for LL(*) grammars. It can be used to build parsers/compilers/interpreters for various use cases ranging from simple configuration files, to full-fledged programing languages.