Your all-knowing guide that unpacks every PDF into clear, actionable insights.
Auntie PDF is a web application that helps users extract information and insights from PDF documents. With a sassy, helpful personality, Auntie PDF makes understanding complex documents easier and more engaging.
Python tool for converting files and office documents to Markdown.
MarkItDown is a utility for converting various files to Markdown, for example for indexing or text analysis.
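A minimal sketch of how such a conversion might look from Python, assuming the `markitdown` package is installed and exposes the `MarkItDown().convert(...)` call with a `text_content` result attribute. The helper names `markdown_path_for` and `convert_to_markdown` are hypothetical and used here only for illustration.

```python
from pathlib import Path


def markdown_path_for(source: Path) -> Path:
    """Return the sibling .md path for a source document (hypothetical helper)."""
    return source.with_suffix(".md")


def convert_to_markdown(source: Path) -> Path:
    """Convert one file with MarkItDown and write the Markdown next to it."""
    # Imported lazily so the path helper above works without the package installed.
    # Assumption: the markitdown package provides MarkItDown().convert(path).
    from markitdown import MarkItDown

    result = MarkItDown().convert(str(source))
    target = markdown_path_for(source)
    target.write_text(result.text_content, encoding="utf-8")
    return target
```

Calling `convert_to_markdown(Path("report.pdf"))` would then write `report.md` alongside the input, which can be handy when preparing a folder of documents for indexing.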
zathura is a highly customizable and functional document viewer based on the girara user interface library and several document libraries.
It provides a minimalistic, space-saving interface and is easy to use, with a focus on keyboard interaction.
Sioyek is a PDF viewer with a focus on technical books and research papers.
A Symfony Bundle for interacting with Gotenberg. Integrates natively with Twig, the router, PhpStorm and more!
A PHP tool that helps you write eBooks in markdown and convert to PDF, EPUB and HTML.
Ibis Next is an open-source tool developed for ebook creators who want to focus on content creation. Ibis Next supports writing in Markdown and can generate ebooks in PDF, EPUB, or HTML format. The tool aims to simplify the ebook creation process, allowing writers to concentrate on their content while providing functionality for converting it into polished ebooks efficiently.
Extract structured data from PDFs.
Stop wasting time extracting PDFs.
Transform your PDF documents into structured data with Documind. Simple, powerful and open-source.
Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an issue and attach the relevant PDF.
Docling parses documents and exports them to the desired format with ease and speed.
Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.
I, Librarian is an online service that will organize your collection of PDF papers and office documents. It provides a lot of extra features for students and research groups both in industry and academia. It is a reference manager, PDF manager and organizer focused on private group collaboration.
The Data Processor for Agents.
Marly allows your agents to extract tables and text from your PDFs, PowerPoint files, and similar documents in a structured format, making it easy for them to take subsequent actions (database call, API call, creating a chart, etc.).
Create Interactive Flipbooks on our Digital Publishing Platform.
Issuu turns PDFs and other file types into digital Flipbooks and shareable content types. Upload a document, watch it transform, and enhance it with interactive features like Videos and Links. Easily share the URL, Embed it onto your website, and sell content with Digital Sales. Promote your work across all channels with Social Posts, Articles, and GIFs.
Open Source Document Signing. Open source DocuSign alternative. Create, fill, and sign digital documents.
DocuSeal is an open source platform that provides secure and efficient digital document signing and processing. Create PDF forms to have them filled and signed online on any device with an easy-to-use, mobile-optimized web tool.
PDF processor API & CLI.
pdfcpu is a PDF processing library written in Go that supports encryption and offers both an API and a command-line interface (CLI). It is compatible with all PDF versions, with basic support and ongoing improvement for PDF 2.0 (ISO-32000-2).
A post-processing tool for scanned sheets of paper.
unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. Its main purpose is to make scanned book pages more readable on screen after conversion to PDF. Additionally, unpaper might be useful to enhance the quality of scanned pages before performing optical character recognition (OCR).
OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
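As a rough illustration of adding a text layer to a scan, the sketch below wraps the `ocrmypdf` command-line tool from Python. It assumes `ocrmypdf` is installed and on the `PATH`, and uses only its `-l` (language) and `--deskew` options. The function names `build_ocrmypdf_command` and `add_text_layer` are hypothetical wrappers, not part of OCRmyPDF.

```python
import subprocess


def build_ocrmypdf_command(src, dst, language="eng", deskew=False):
    """Build the argument list for an ocrmypdf run (hypothetical helper)."""
    cmd = ["ocrmypdf", "-l", language]
    if deskew:
        # Straighten crooked scans before OCR.
        cmd.append("--deskew")
    cmd += [str(src), str(dst)]
    return cmd


def add_text_layer(src, dst, **options):
    """Run ocrmypdf; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)
```

For example, `add_text_layer("scan.pdf", "searchable.pdf", deskew=True)` would invoke the tool once and leave the original file untouched.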