format
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high-performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.
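To make "column-oriented" concrete, here is a minimal pure-Python sketch that pivots row records into columns and run-length encodes each one. It is a toy illustration of the idea, not Parquet's on-disk layout or any Parquet library's API; the sample records and the helper names `to_columns` and `run_length_encode` are assumptions made up for this example.

```python
from itertools import groupby

# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"city": "Oslo", "temp_c": 3},
    {"city": "Oslo", "temp_c": 4},
    {"city": "Oslo", "temp_c": 4},
    {"city": "Lima", "temp_c": 19},
    {"city": "Lima", "temp_c": 19},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [r[name] for r in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded = {name: run_length_encode(vals) for name, vals in columns.items()}

# Reading a single column touches only that column's values.
print(columns["temp_c"])
# Values of the same column sit next to each other, so repeats collapse well.
print(encoded["city"])
```

Storing values of the same type and meaning side by side is what lets a columnar file skip unneeded columns on read and apply per-column encodings.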
A high-performance, DirectStorage-native container format for comics and manga.
Bound Book Format (.bbf) is a high-performance binary container designed specifically for digital comic books and manga. Unlike CBR/CBZ, BBF is built for DirectStorage/mmap, easy integrity checks, and mixed-codec containerization.
Tree Root Object Notation.
A JSON-Compatible Zero-Copy Serialization Format.
File Format for Internet of Things.
TsFile is a columnar storage file format designed for time series data, which supports efficient compression, high throughput of read and write, and compatibility with various frameworks, such as Spark and Flink. It is easy to integrate TsFile into IoT big data processing frameworks.
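One reason time series data compresses well is that timestamps tend to arrive at near-regular intervals. The sketch below shows generic delta encoding in pure Python. It is an illustration of the general technique only and makes no claim about how TsFile encodes data internally; the function names and sample timestamps are assumptions for this example.

```python
def delta_encode(timestamps):
    """Keep the first timestamp, then store only the gap to the previous one."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original timestamps with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical millisecond timestamps from a sensor sampled about once a second.
ts = [1700000000000, 1700000001000, 1700000002000, 1700000003005]
deltas = delta_encode(ts)

print(deltas)
assert delta_decode(deltas) == ts
```

After the first value, every entry is a small number, which needs far fewer bits than a full timestamp.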
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
Lance is a modern columnar data format optimized for machine learning and AI applications. It efficiently handles diverse multimodal data types while providing high-performance querying and versioning capabilities.
A simple, open format for guiding coding agents, used by over 20k open-source projects.
Think of AGENTS.md as a README for agents: a dedicated, predictable place to provide the context and instructions to help AI coding agents work on your project.
Related contents:
- Improve your AI code output with AGENTS.md (+ my best tips) @ builder.io.
- #118 - Les news sur le développement web et l'IA pour septembre 2025 RC1 @ Double Slash :fr:.
- Optimizing repos for AI @ Tom Bedor's Blog.
- How to write a great agents.md: Lessons from over 2,500 repositories @ GitHub Blog.
- Writing a good CLAUDE.md @ humanlayer.
- Streamlining my user-level CLAUDE.md @ chris dzombak.
- The Complete Guide to CLAUDE.md @ builder.io.
- How to build self-improving coding agents - Part 1 @ Eric J Ma's Website.
- AGENTS.md outperforms skills in our agent evals @ Vercel.
- Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? @ arXiv.
- Knowledge Priming @ martinFowler.com.
- J'ai mis deux IA en code review l'une contre l'autre — voici ce que ça donne @ Maxence Maireaux :fr:.
- Filesystems are having a moment @ madalitso.me.
- Your Docs Directory Is Doomed @ Jim Yagmin's Blog.
- Stop Wasting Hours Writing Unit Tests: Use GitHub Copilot to Explode Code Coverage Fast @ Build5Nines.
DuckLake is an integrated data lake and catalog format.
DuckLake delivers advanced data lake features without traditional lakehouse complexity by using Parquet files and your SQL database. It's an open, standalone format from the DuckDB team.
DuckLake is an open Lakehouse format that is built on SQL and Parquet. DuckLake stores metadata in a catalog database, and stores data in Parquet files. The DuckLake extension allows DuckDB to directly read and write data from DuckLake.
The universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics.
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
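A zero-copy read means a consumer gets a view onto existing memory instead of a duplicate. The following sketch shows the idea with Python's built-in `memoryview` over a contiguous integer buffer. It does not use Arrow's API or memory layout; the buffer standing in for a column is an assumption for illustration.

```python
from array import array

# A contiguous buffer of 64-bit signed integers, standing in for one column.
column = array("q", [10, 20, 30, 40, 50])
buffer = memoryview(column)

# Slicing a memoryview yields a view onto the same memory, not a copy.
window = buffer[1:4]
before = window.tolist()

# A write through the original buffer is visible through the view,
# which shows that no data was duplicated.
column[2] = 99
after = window.tolist()

print(before)
print(after)
```

Because the slice shares memory with the source, handing it to another consumer costs the same regardless of how large the column is.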
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
FLIF is a novel lossless image format which outperforms PNG, lossless WebP, lossless BPG, lossless JPEG2000, and lossless JPEG XR in terms of compression ratio.