Make clean PDF and EPUB docs from web pages
Percollate lets you turn web pages into readable PDF, EPUB, HTML, or Markdown files from the command line.
The Privacy First PDF Toolkit.
BentoPDF is a powerful, privacy-first, client-side PDF toolkit that allows you to manipulate, edit, merge, and process PDF files directly in your browser. No server-side processing is required, ensuring your files remain secure and private.
Read and extract text and other content from PDFs in C# (port of PDFBox).
PdfPig supports reading text and content from PDF files. It also supports basic PDF file creation.
Use AI technology to parse EPUB and PDF eBooks by chapters and generate intelligent summaries
Where You Read, Digest and Get Insight.
Readest is a modern, open-source ebook reader for immersive reading. Seamlessly sync your progress, notes, highlights, and library across macOS, Windows, Linux, Android, iOS, and the Web.
A Java PDF Library.
The Apache PDFBox® library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.
Edit, Sign, Merge & Secure. Edit PDFs Freely & Privately, Right in Your Browser.
Breeze PDF is a powerful, free PDF editor that works entirely offline in your browser. No uploads, 100% privacy guaranteed.
Your all-knowing guide that unpacks every PDF into clear, actionable insights.
Auntie PDF is a web application that helps users extract information and insights from PDF documents. With a sassy, helpful personality, Auntie PDF makes understanding complex documents easier and more engaging.
A text extraction library supporting PDFs, images, office documents and more.
Kreuzberg is a Python library for text extraction from documents. It provides a unified async interface for extracting text from PDFs, images, office documents, and more.
Python tool for converting files and office documents to Markdown. MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).
zathura is a highly customizable and functional document viewer based on the girara user interface library and several document libraries. It provides a minimalistic, space-saving interface and is easy to use, with a focus on keyboard interaction.
Sioyek is a PDF viewer with a focus on technical books and research papers.
The Powerful Multi-modal LLM Family for OCR-free Document Understanding. Modularized Multimodal Large Language Model for Document Understanding.
A Symfony Bundle for interacting with Gotenberg. Integrates natively with Twig, the router, PhpStorm and more!
A PHP tool that helps you write eBooks in markdown and convert to PDF, EPUB and HTML.
Ibis Next is an open-source tool developed for ebook creators who want to focus on content creation. Ibis Next supports writing in Markdown and can generate ebooks in PDF, EPUB, or HTML format. The tool aims to simplify the ebook creation process, allowing writers to concentrate on their content while providing functionality for converting it into polished ebooks efficiently.
Extract structured data from PDFs. Stop wasting time extracting PDFs. Transform your PDF documents into structured data with Documind. Simple, powerful and open-source.
Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an issue and attach the relevant PDF.
Docling parses documents and exports them to the desired format with ease and speed. 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.
I, Librarian is an online service that will organize your collection of PDF papers and office documents. It provides a lot of extra features for students and research groups both in industry and academia. It is a reference manager, PDF manager and organizer focused on private group collaboration.
Zero-shot PDF OCR with gpt-4o-mini.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all, with weird layouts, tables, charts, etc., so vision models just make sense!
The Data Processor for Agents.
Marly allows your agents to extract tables and text from your PDFs, PowerPoints, etc. in a structured format, making it easy for them to take subsequent actions (database call, API call, creating a chart, etc.).
Create Interactive Flipbooks on our Digital Publishing Platform.
Issuu turns PDFs and other file types into digital Flipbooks and shareable content types. Upload a document, watch it transform, and enhance it with interactive features like Videos and Links. Easily share the URL, Embed it onto your website, and sell content with Digital Sales. Promote your work across all channels with Social Posts, Articles, and GIFs.
Detect and extract tables to markdown and csv.
Tabled is a small library for detecting and extracting tables. It uses surya to find all the tables in a PDF, identifies the rows/columns, and formats cells into markdown, csv, or html.
Open Source Document Signing. Open source DocuSign alternative. Create, fill, and sign digital documents ✍️
DocuSeal is an open source platform that provides secure and efficient digital document signing and processing. Create PDF forms to have them filled and signed online on any device with an easy-to-use, mobile-optimized web tool.
PDF processor API & CLI.
pdfcpu is a PDF processing library written in Go that supports encryption and offers both an API and a command-line interface (CLI). It is compatible with all PDF versions, with basic support and ongoing improvement for PDF 2.0 (ISO-32000-2).
A post-processing tool for scanned sheets of paper.
unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. The main purpose is to make scanned book pages more readable on screen after conversion to PDF. Additionally, unpaper might be useful to enhance the quality of scanned pages before performing optical character recognition (OCR).
OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
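As a rough illustration of how OCRmyPDF is typically driven, the sketch below assembles a command line and hands it to the `ocrmypdf` executable. It assumes `ocrmypdf` is installed and on the PATH; the helper names `ocrmypdf_command` and `add_text_layer` are made up for this example and are not part of OCRmyPDF.

```python
import subprocess
from typing import List


def ocrmypdf_command(src: str, dst: str, language: str = "eng", deskew: bool = False) -> List[str]:
    """Assemble an ocrmypdf invocation (hypothetical helper, not part of OCRmyPDF)."""
    # --language selects the Tesseract language pack used for recognition.
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        # --deskew straightens crooked scans before OCR.
        cmd.append("--deskew")
    # Input and output paths come last.
    cmd += [src, dst]
    return cmd


def add_text_layer(src: str, dst: str) -> None:
    """Run OCR on src and write a searchable PDF to dst (requires ocrmypdf on PATH)."""
    subprocess.run(ocrmypdf_command(src, dst), check=True)
```

Calling `add_text_layer("scan.pdf", "scan_ocr.pdf")` would run the tool on a local file; both file names are placeholders.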
PdfDing is a PDF manager and viewer that you can host yourself. It offers a seamless user experience on multiple devices. It's designed to be minimal, fast, and easy to set up using Docker. As all data stays on your server, you have full control over your data and privacy.
With its simple, intuitive and adjustable UI, PdfDing makes it easy for users to keep track of their PDFs and access them whenever they need to. With a dark mode and colored themes, users can style the app to their liking. As PdfDing offers SSO support via OIDC, it can be easily integrated into existing setups.
The Free & OpenSource Alternative to Docusign.
Seal the Deal, Openly. Your ultimate open source PDF E-Signature Solution. Transform the Way You Sign, Store, and Secure Your Documents. All in One Place - All for Free.
Free web software for signing, organizing, editing metadata, or compressing PDFs.
The leading HTML5 client solution for generating PDFs. Transform your PDF generation process for your event tickets, reports, certificates, and more.
Client-side JavaScript PDF generation for everyone.
mPDF is a PHP library which generates PDF files from UTF-8 encoded HTML.
It is based on FPDF and HTML2FPDF with a number of enhancements.
Symfony bundle to generate PDFs with headless Chrome using chrome-php/chrome.
The ChromePdfBundle is a Symfony bundle that leverages the chrome-php/chrome project to render HTML and save the output as a PDF file.
dompdf is an HTML to PDF converter.
At its heart, dompdf is (mostly) a CSS 2.1 compliant HTML layout and rendering engine written in PHP. It is a style-driven renderer: it will download and read external stylesheets, inline style tags, and the style attributes of individual HTML elements. It also supports most presentational HTML attributes.
PrintCSS CSS Paged Media tutorial and information.
This tutorial shows how to generate PDF documents from XML/HTML using the "CSS Paged Media" approach, whereby the complete styling and layout information is kept in cascading stylesheets (CSS). It will also show the results produced by different tools with identical data, providing an impression of functionality and output quality.
Translate and communicate with ease
Spend less time translating and more time on the task at hand. No matter what or where you're translating, DeepL Pro ensures it's accurate, secure, and tailored to your needs.
Locally hosted web application that allows you to perform various operations on PDF files.
This is a powerful, locally hosted, web-based PDF manipulation tool using Docker that allows you to perform various operations on PDF files, such as splitting, merging, converting, reorganizing, adding images, rotating, compressing, and more. This locally hosted web application started as a 100% ChatGPT-made application and has evolved to include a wide range of features to handle all your PDF needs.
Compose theses faster. Focus on your text and let Typst take care of layout and formatting. A new markup-based typesetting system that is powerful and easy to learn. Typst is a new markup-based typesetting system that is designed to be as powerful as LaTeX while being much easier to learn and use.
Paperless-ngx is a document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
Over 3200 pixel-perfect icons for web design. Free and open source icons designed to make your website or app attractive, visually consistent and simply beautiful.
Online PDF tools for PDF lovers. Every tool you need to work with PDFs in one place. Every tool you need to use PDFs, at your fingertips. All are 100% FREE and easy to use! Merge, split, compress, convert, rotate, unlock and watermark PDFs with just a few clicks.
Free PDF, Video, Image & Other Online Tools. We offer PDF, video, image and other online tools to make your life easier.
The Awesome Document Factory.
WeasyPrint is a smart solution helping web developers to create PDF documents. It's free and open source software that can be easily plugged into your applications and websites, and it turns simple HTML pages into reports, invoices or tickets.
KOReader is a document viewer for E Ink devices. Supported file formats include EPUB, PDF, DjVu, XPS, CBT, CBZ, FB2, PDB, TXT, HTML, RTF, CHM, DOC, MOBI and ZIP files. It's available for Kindle, Kobo, PocketBook, Android and desktop Linux.
A Docker-powered stateless API for PDF files. Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice for converting numerous document formats (HTML, Markdown, Word, Excel, etc.) into PDF files, and more.
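To make the API description concrete, here is a minimal sketch of an HTML-to-PDF request using only the Python standard library. It assumes a Gotenberg container listening on `http://localhost:3000` and the Chromium route `/forms/chromium/convert/html`, which expects the HTML uploaded as a file named `index.html`; check both against the Gotenberg version you run. The function names are hypothetical and chosen for this example.

```python
import urllib.request
import uuid


def build_gotenberg_request(html: str, base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (but do not send) a multipart POST for the Chromium HTML route."""
    boundary = uuid.uuid4().hex
    # The HTML entry point is uploaded as a form file named index.html (assumed convention).
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n"
        "\r\n"
        f"{html}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/forms/chromium/convert/html",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def html_to_pdf(html: str, out_path: str, base_url: str = "http://localhost:3000") -> None:
    """Send the request to a running Gotenberg instance and save the returned PDF."""
    with urllib.request.urlopen(build_gotenberg_request(html, base_url)) as resp:
        with open(out_path, "wb") as fh:
            fh.write(resp.read())
```

Building the request separately from sending it keeps the payload easy to inspect without a running container; `html_to_pdf("<h1>Hello</h1>", "hello.pdf")` would perform the actual conversion.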
Speaker Deck is the best way to share presentations online. Simply upload your slides as a PDF, and we’ll turn them into a beautiful online experience. View them on SpeakerDeck.com, or share them on any website with an embed code.
View and convert your PDFs into interactive web publications that work on any device (formerly FlexPaper).