Biapy's Bookmarks

WorkKit

https://github.com/6over3/WorkKit

A Swift package for parsing iWork Keynote, Pages, and Numbers documents.

A Swift package for parsing and extracting content from Apple iWork documents (Pages, Numbers, and Keynote). WorkKit provides a straightforward API to open iWork documents and traverse their content.

Related contents:

Reverse Engineering iWork @ ./make.

agpl3-licensed apple development foss open-source parser swift

Added 1 week ago

jsonriver

https://rictic.github.io/jsonriver/

A streaming JSON parser.

jsonriver is a simple JS library that will parse JSON incrementally as it streams in, e.g. from a network request or a language model. It gives you a sequence of increasingly complete values.
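The idea of "a sequence of increasingly complete values" can be sketched with the Python standard library. This is not jsonriver's API or algorithm: the function names `close_partial` and `parse_incrementally` are hypothetical, and the sketch simply re-parses the whole buffer after each chunk, closing any open strings, arrays, and objects first. A real streaming parser would keep parser state between chunks instead.

```python
import json
from typing import Any, Iterable, Iterator


def close_partial(text: str) -> str:
    """Append the quotes and brackets needed to balance a truncated JSON prefix."""
    closers = []        # closing characters still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    if in_string:
        if escaped:
            text = text[:-1]    # drop a dangling backslash
        text += '"'
    return text + "".join(reversed(closers))


def parse_incrementally(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield a value after every chunk whose balanced prefix is valid JSON."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            yield json.loads(close_partial(buffer))
        except json.JSONDecodeError:
            # The prefix ends mid-token (e.g. a half-written key); wait for more.
            continue
```

Feeding the chunks `'{"name": "Al'` and `'ice"}'` yields `{"name": "Al"}` and then `{"name": "Alice"}`. Prefixes that cannot be balanced into valid JSON, such as one ending inside a key, are skipped until more input arrives.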

jsonriver @ GitHub.

bsd3-licensed data-stream development foss javascript json library open-source parser

Added 1 week ago

Kong

https://github.com/alecthomas/kong

Kong is a command-line parser for Go.

Kong aims to support arbitrarily complex command-line structures with as little developer effort as possible.

To achieve that, command-lines are expressed as Go types, with the structure and tags directing how the command line is mapped onto the struct.
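Kong itself is a Go library driven by struct tags; the sketch below is only a Python analogy of the same idea, not Kong's API. It assumes a hypothetical `Options` dataclass whose fields and `metadata` entries play the role of struct fields and tags, and it derives an `argparse` parser from them.

```python
import argparse
from dataclasses import MISSING, dataclass, field, fields
from typing import List


@dataclass
class Options:
    # A field without a default becomes a required positional argument.
    path: str = field(metadata={"help": "File to process."})
    # Boolean defaults become on/off flags.
    force: bool = field(default=False, metadata={"help": "Overwrite existing output."})
    # Other defaults become options converted with the default's type.
    jobs: int = field(default=1, metadata={"help": "Number of parallel jobs."})


def build_parser(cls) -> argparse.ArgumentParser:
    """Derive an argparse parser from the fields of a dataclass."""
    parser = argparse.ArgumentParser(prog=cls.__name__.lower())
    for f in fields(cls):
        help_text = f.metadata.get("help", "")
        if f.default is MISSING:
            parser.add_argument(f.name, help=help_text)
        elif isinstance(f.default, bool):
            parser.add_argument(f"--{f.name}", action="store_true", help=help_text)
        else:
            parser.add_argument(
                f"--{f.name}", type=type(f.default), default=f.default, help=help_text
            )
    return parser


def parse(cls, argv: List[str]):
    """Parse argv and return a populated instance of the dataclass."""
    namespace = build_parser(cls).parse_args(argv)
    return cls(**vars(namespace))
```

With this sketch, `parse(Options, ["in.txt", "--force", "--jobs", "4"])` returns an `Options` instance, so the type definition is the single description of the command line.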

command-line development foss golang library mit-licensed open-source parser

Added 2 weeks ago

Feedsmith

https://feedsmith.dev/

Fast, all-in-one JavaScript parser and generator for RSS, Atom, RDF, and JSON Feed, with support for popular namespaces (Podcast, iTunes, Dublin Core) and OPML files.

Feedsmith offers universal and format‑specific parsers that maintain the original feed structure in a clean, object-oriented format while intelligently normalizing legacy elements. Access all feed data without compromising simplicity.

Feedsmith @ GitHub.

Related contents:

#119: Les news sur le développement web et l'IA pour septembre 2025 RC2 @ Double Slash :fr:.

atom development dublin-core foss generator itunes javascript json library mit-licensed open-source opml parser rdf rss typescript

Added 3 weeks ago

SQLGlot

https://sqlglot.com/sqlglot.html

Python SQL Parser and Transpiler.

SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. It can be used to format SQL or translate between 31 different dialects like DuckDB, Presto / Trino, Spark / Databricks, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.

SQLGlot @ GitHub.

bigquery duckdb foss mit-licensed open-source parser prestodb python snowflake spark sql transpiler trino

Added 3 weeks ago

sj.h

https://github.com/rxi/sj.h?utm_source=tldrwebdev

A tiny little JSON parsing library.

c foss json library open-source parser unlicense-licensed

Added 1 month ago

Swift Syntax

https://github.com/swiftlang/swift-syntax

A set of Swift libraries for parsing, inspecting, generating, and transforming Swift source code.

The swift-syntax package is a set of libraries that work on a source-accurate tree representation of Swift source code, called the SwiftSyntax tree. The SwiftSyntax tree forms the backbone of Swift’s macro system – the macro expansion nodes are represented as SwiftSyntax nodes and a macro generates a SwiftSyntax tree to be inserted into the source file.

Related contents:

Bitrig’s Swift Interpreter: Building an interpreter for Swift in Swift @ Bitrig.

apache2-licensed foss library open-source parser swift

Added 1 month ago

hyperpb

https://github.com/bufbuild/hyperpb-go

10x faster dynamic Protobuf parsing in Go that’s even 3x faster than generated code.

hyperpb is a highly optimized dynamic message library for read-only Protobuf workloads. It is designed to be a drop-in replacement for dynamicpb, protobuf-go's canonical solution for working with completely dynamic messages.

apache2-licensed foss golang open-source optimization parser protobuf

Added 3 months ago

OSV

https://github.com/njaremko/osv

OSV is a high-performance CSV parser for Ruby, implemented in Rust. It wraps BurntSushi's excellent csv-rs crate.

It provides a simple interface for reading CSV files with support for both hash-based and array-based row formats.

The array-based mode is faster than the hash-based mode, so if you don't need the hash keys, use the array-based mode.
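The two row shapes can be illustrated with Python's standard `csv` module. This is an analogy only and does not use OSV's Ruby API; the function names and the sample `DATA` are hypothetical. Hash-style rows build one mapping per row with the header repeated as keys, while array-style rows are plain lists read by position, which is the extra work the speed remark above refers to.

```python
import csv
import io
from typing import Dict, List, Tuple

DATA = "id,name\n1,Ada\n2,Grace\n"  # made-up sample input


def rows_as_dicts(text: str) -> List[Dict[str, str]]:
    """Hash-style rows: each row is a mapping from column name to value."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_as_lists(text: str) -> Tuple[List[str], List[List[str]]]:
    """Array-style rows: the header is read once, rows are positional lists."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)
```

With array-style rows the caller looks up a column index once from the header and reuses it for every row, instead of hashing a key per access.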

csv development foss library mit-licensed open-source parser ruby

Added 4 months ago

Herb

https://herb-tools.dev/

Powerful and seamless HTML-aware ERB parsing and tooling.

Next-generation HTML+ERB parsing for smarter developer tooling and more. Herb is an HTML-aware Embedded Ruby parsing tool built on Prism, Ruby's official parser.

Herb @ GitHub.

development embedded erb foss html library mit-licensed open-source parser ruby

Added 6 months ago

MD4C

https://github.com/mity/md4c

C Markdown parser. Fast. SAX-like interface. Compliant with the CommonMark specification.

Related contents:

Why Is This Site Built With C @ Marcelo Fernandes.

c development foss library markdown mit-licensed open-source parser

Added 6 months ago

The simdjson library

https://simdjson.org/

Parsing gigabytes of JSON per second. Used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, and StarRocks.

JSON is everywhere on the Internet. Servers spend a lot of time parsing it. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to break speed records.

simdjson @ GitHub.

apache2-licensed dual-license json mit-licensed parser performance

Added 7 months ago

jsonparser

https://github.com/Krish120003/jsonparser/

JSON parser creating Rust objects in-memory.

Related contents:

Parsing JSON in 500 lines of Rust @ Krish's Blog.

development foss json library open-source parser rust

Added 8 months ago

Sparrow

https://sparrow.katanaml.io/

Data processing with ML, LLM and Vision LLM.

Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, bank statements, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance.

Sparrow @ GitHub.

Related contents:

Sparrow - Pour extraire des données avec l'IA @ Korben :fr:.

ai data-pipeline foss llm ocr open-source parser self-hosted web-app

Added 8 months ago

MarkItDown

https://github.com/microsoft/markitdown

Python tool for converting files and office documents to Markdown. MarkItDown is a utility for converting various files to Markdown (e.g., for indexing or text analysis).

command-line exif foss llm microsoft-office ocr open-source parser pdf python rag

Added 9 months ago

OmniParse

https://omniparse.cognitivelab.in/

Convert Anything into Structured Actionable Data.

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks.

OmniParse is a platform that ingests and parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether you are working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured, and ready for AI applications such as RAG, fine-tuning, and more.

OmniParse @ GitHub.

data-science foss genai llm open-source parser rag

Added 9 months ago

Documind

https://www.documind.xyz/

Extract structured data from PDFs. Stop wasting time extracting PDFs. Transform your PDF documents into structured data with Documind. Simple, powerful and open-source.

Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.

Documind @ GitHub.

foss llm machine-learning open-source parser pdf rag

Added 11 months ago

MinerU

https://mineru.readthedocs.io/en/latest/

MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an issue and attach the relevant PDF.

MinerU @ GitHub.

converter json llm markdown open-source parser pdf rag

Added 11 months ago

JSON Parsing Test Suite

https://github.com/nst/JSONTestSuite/

A comprehensive test suite for RFC 8259 compliant JSON parsers.

foss json open-source parser unit-testing

Added 11 months ago

json.cpp

https://github.com/jart/json.cpp

JSON for Classic C++.

json.cpp is a baroque JSON parsing / serialization library for C++.

c++ development foss json library open-source parser

Added 11 months ago

Docling

https://ds4sd.github.io/docling/

Docling parses documents and exports them to the desired format with ease and speed. 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.

Docling @ GitHub.

Related contents:

Docling - Pour convertir vos documents sans prise de tête @ Korben :fr:.

asciidoc data-mining data-science docx foss html llm markdown open-source parser pdf pptx python rag

Added 11 months ago

Marly AI

https://www.marly.ai/

The Data Processor for Agents.

Marly allows your agents to extract tables and text from your PDFs, PowerPoint files, etc., in a structured format, making it easy for them to take subsequent actions (database call, API call, creating a chart, etc.).

Marly @ GitHub.

data-science foss llm open-source parser pdf python

Added 11 months ago

OmniParser

https://microsoft.github.io/OmniParser/

OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.

OmniParser @ GitHub.

ai microsoft open-source parser python screenshot

Added 1 year ago

Tabled

https://github.com/VikParuchuri/tabled

Detect and extract tables to markdown and csv.

Tabled is a small library for detecting and extracting tables. It uses surya to find all the tables in a PDF, identifies the rows/columns, and formats cells into markdown, csv, or html.

data-science foss library machine-learning open-source parser pdf python

Added 1 year ago

Ohm

https://ohmjs.org/

A library and language for building parsers, interpreters, compilers, etc.

Ohm is a parsing toolkit consisting of a library and a domain-specific language. You can use it to parse custom file formats or quickly build parsers, interpreters, and compilers for programming languages.

Ohm @ GitHub.

development javascript library open-source parser typescript

Added 1 year ago

Parsel

https://parsel.verou.me/

A tiny, permissive CSS selector parser.

Parsel @ GitHub.

css development javascript open-source parser typescript

Added 1 year ago

Langium

https://langium.org/

Langium is an open source language engineering tool with first-class support for the Language Server Protocol, written in TypeScript and running in Node.js.

Langium @ GitHub.

development grammar javascript language language-server-protocol lsp open-source parser

Added 2 years ago

Chevrotain

https://chevrotain.io/docs/

Parser Building Toolkit for JavaScript.

Chevrotain is a blazing fast and feature-rich Parser Building Toolkit for JavaScript with built-in support for LL(K) grammars and a 3rd-party plugin for LL(*) grammars. It can be used to build parsers, compilers, and interpreters for various use cases ranging from simple configuration files to full-fledged programming languages.

Chevrotain @ GitHub.

development grammar javascript language open-source parser toolkit

Added 2 years ago

snarkdown

https://github.com/developit/snarkdown

A snarky 1kb Markdown parser written in JavaScript. Snarkdown is a dead simple 1kb Markdown parser.

It's designed to be as minimal as possible, for constrained use-cases where a full Markdown parser would be inappropriate.

development javascript library markdown open-source parser web-design

Added 2 years ago