columnar
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.
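To make "column-oriented" concrete, here is a minimal pure-Python sketch of why grouping values by column helps compression. It is an illustration only, not Parquet's on-disk layout or API: the sample records and the helper names `to_columns` and `run_length_encode` are hypothetical, and run-length encoding stands in for the family of encodings a columnar format can apply.

```python
from itertools import groupby

# Hypothetical sample records, stored row by row.
rows = [
    {"device": "a", "status": "ok", "temp": 21},
    {"device": "a", "status": "ok", "temp": 22},
    {"device": "b", "status": "ok", "temp": 22},
    {"device": "b", "status": "fail", "temp": 35},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [r[name] for r in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded_status = run_length_encode(columns["status"])

print(columns["temp"])   # a query on one column touches only that column
print(encoded_status)    # repeated neighbours collapse into short runs
```

Because values of the same column sit next to each other, repeated or similar values become adjacent, which is what makes schemes such as run-length or dictionary encoding effective.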
File Format for Internet of Things
TsFile is a columnar storage file format designed for time series data. It supports efficient compression and high read and write throughput, and it is compatible with frameworks such as Spark and Flink, which makes it easy to integrate into IoT big data processing pipelines.
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
Lance is a modern columnar data format optimized for machine learning and AI applications. It efficiently handles diverse multimodal data types while providing high-performance querying and versioning capabilities.
1000x Faster Analytics in Postgres. Postgres-native Data Warehouse.
pg_mooncake is a Postgres extension that adds columnar storage and vectorized execution (DuckDB) for fast analytics within Postgres. Postgres + pg_mooncake ranks among the top 10 fastest in ClickBench.
The universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics.
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
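The idea behind zero-copy reads can be sketched with Python's standard library alone. This is an analogy, not Arrow's API: `array` and `memoryview` stand in for a contiguous column buffer and a reader's view of it, and the variable names are hypothetical.

```python
from array import array

# A contiguous buffer of 64-bit integers, standing in for one column.
values = array("q", [10, 20, 30, 40, 50])

# A memoryview exposes the same bytes without copying them.
view = memoryview(values)

# Slicing a memoryview yields another view over the same buffer.
window = view[1:4]

# A write to the underlying buffer is visible through the view,
# which shows that no copy was made.
values[2] = 99
print(window.tolist())
```

A reader that works on a view like `window` pays no serialization or copy cost, which is the property a shared in-memory columnar format aims to provide across processes and languages.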
Zero-ETL data analytics with Postgres.
Simple and cost-effective cloud analytics platform automatically synced with your data sources.
BemiDB is a Postgres read replica optimized for analytics. It consists of a single binary that seamlessly connects to a Postgres database, replicates the data in a compressed columnar format, and allows you to run complex queries using its Postgres-compatible analytical query engine.
Fast Open-Source OLAP DBMS.
ClickHouse® is an open-source column-oriented database management system that generates analytical data reports in real time.
Related contents:
- Altinity Kubernetes Operator for ClickHouse @ GitHub.
- ClickHouse on Kubernetes @ Sr. Data Engineer.
- Inside ClickHouse full-text search: fast, native, and columnar @ ClickHouse.
- How we made ClickHouse log queries 99.5% faster with resource fingerprinting @ SigNoz.
- From Millions to Billions @ geocodio.
- The KFC Architecture Blueprint: Kafka, Flink, and ClickHouse @ Big Data Boutique.
- How we give every user SQL access to a shared ClickHouse cluster @ Trigger.dev.