Tag: data-lake - Biapy's Bookmarks

data-lake

Bauplan

https://www.bauplanlabs.com/

Your data lakehouse, built like software.

Bauplan is a cloud-native lakehouse platform for engineering teams who treat data like software. Ship pipelines without managing infrastructure, using a specialized Python runtime, Git-for-Data built on Apache Iceberg, and just a few simple APIs.

Related contents:

Bauplan: Operate your lakehouse with zero infrastructure @ Data Engineer Things.

apache-iceberg commercial data-lake lakehouse web-service

Added 1 month ago

DuckLake

https://ducklake.select/

DuckLake is an integrated data lake and catalog format.

DuckLake delivers advanced data lake features without traditional lakehouse complexity by using Parquet files and your SQL database. It's an open, standalone format from the DuckDB team.

DuckLake is an open Lakehouse format that is built on SQL and Parquet. DuckLake stores metadata in a catalog database, and stores data in Parquet files. The DuckLake extension allows DuckDB to directly read and write data from DuckLake.
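The split described above (metadata in a catalog database, data in files) can be illustrated with a small sketch. This is not the DuckLake schema or the DuckDB extension API: it assumes SQLite as the catalog and JSON-lines files standing in for Parquet so that it runs on the Python standard library alone, and the table and function names (`data_file`, `append_rows`, `read_table`) are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def open_catalog(path):
    """Open (or create) the catalog database that tracks data files."""
    con = sqlite3.connect(path)
    # Hypothetical catalog table: one row per immutable data file.
    con.execute(
        "CREATE TABLE IF NOT EXISTS data_file ("
        " table_name TEXT, snapshot_id INTEGER,"
        " file_path TEXT, row_count INTEGER)"
    )
    return con


def append_rows(con, data_dir, table_name, rows):
    """Write rows to a new data file, then register it in the catalog."""
    (last,) = con.execute(
        "SELECT COALESCE(MAX(snapshot_id), 0) FROM data_file"
    ).fetchone()
    snapshot_id = last + 1
    # JSON lines stand in for Parquet here (assumption, stdlib only).
    file_path = Path(data_dir) / f"{table_name}-{snapshot_id}.jsonl"
    file_path.write_text("\n".join(json.dumps(r) for r in rows))
    with con:  # the metadata change commits as one SQL transaction
        con.execute(
            "INSERT INTO data_file VALUES (?, ?, ?, ?)",
            (table_name, snapshot_id, str(file_path), len(rows)),
        )
    return snapshot_id


def read_table(con, table_name, as_of=None):
    """Return rows visible at snapshot `as_of` (latest when None)."""
    query = "SELECT file_path FROM data_file WHERE table_name = ?"
    params = [table_name]
    if as_of is not None:
        query += " AND snapshot_id <= ?"
        params.append(as_of)
    query += " ORDER BY snapshot_id"
    rows = []
    for (file_path,) in con.execute(query, params):
        for line in Path(file_path).read_text().splitlines():
            rows.append(json.loads(line))
    return rows


def demo():
    """Append two batches, then read the latest and an older snapshot."""
    with tempfile.TemporaryDirectory() as tmp:
        con = open_catalog(str(Path(tmp) / "catalog.sqlite"))
        first = append_rows(con, tmp, "events", [{"id": 1}, {"id": 2}])
        append_rows(con, tmp, "events", [{"id": 3}])
        latest = read_table(con, "events")
        old = read_table(con, "events", as_of=first)
        con.close()
        return latest, old
```

Because the catalog is an ordinary SQL database, reading an older snapshot in this sketch is just a filter on `snapshot_id`; the data files themselves are never rewritten.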

DuckLake @ GitHub.

data-lake duckdb format foss lakehouse mit-licensed open-source parquet sql

Added 4 months ago

BigQuery

http://BigQuery

AI data platform.

From data warehouse to autonomous data and AI platform

BigQuery is the autonomous data-to-AI platform, automating the entire data life cycle, from ingestion to AI-driven insights, so you can go from data to AI to action faster.

Gemini in BigQuery features are now included in BigQuery pricing models.

Related contents:

BigQuery’s Ridiculous Pricing Model Cost Us $10,000 in Just 22 Seconds!!! @ Data Engineer Things.

big-data bigquery cloud commercial data-lake gcp lakehouse

Added 5 months ago

OLake

https://olake.io/

The fastest way to replicate your database data to a data lake. OLake makes data replication faster by parallelizing full loads, leveraging change streams for real-time sync, and pulling data in a database-native format for efficient ingestion.

The fastest open-source tool for replicating databases to Apache Iceberg or a data lakehouse. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, and MySQL.
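The idea of parallelizing a full load, mentioned in the description above, can be sketched as splitting the primary-key range into chunks and reading the chunks concurrently. This is not OLake's code, and the source does not say how OLake chunks tables: the key-range strategy, the SQLite source table `users`, and every function name below are assumptions made for illustration.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_source(path, n_rows):
    """Create a hypothetical source table with ids 1..n_rows."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    con.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(i, f"user-{i}") for i in range(1, n_rows + 1)],
    )
    con.commit()
    con.close()


def key_ranges(lo, hi, chunk_size):
    """Split the inclusive key range [lo, hi] into contiguous chunks."""
    return [
        (start, min(start + chunk_size - 1, hi))
        for start in range(lo, hi + 1, chunk_size)
    ]


def load_chunk(path, bounds):
    """Read one chunk; each worker opens its own connection."""
    lo, hi = bounds
    con = sqlite3.connect(path)
    try:
        return con.execute(
            "SELECT id, name FROM users WHERE id BETWEEN ? AND ? ORDER BY id",
            (lo, hi),
        ).fetchall()
    finally:
        con.close()


def parallel_full_load(path, chunk_size=250, workers=4):
    """Read the whole table by loading key-range chunks in parallel."""
    con = sqlite3.connect(path)
    lo, hi = con.execute("SELECT MIN(id), MAX(id) FROM users").fetchone()
    con.close()
    if lo is None:  # empty table, nothing to load
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so rows stay sorted by id.
        chunks = pool.map(
            lambda b: load_chunk(path, b), key_ranges(lo, hi, chunk_size)
        )
        return [row for chunk in chunks for row in chunk]


def demo():
    """Build a 1000-row source and load it in four parallel chunks."""
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "source.sqlite")
        make_source(path, 1000)
        return parallel_full_load(path)
```

Once such a full load finishes, a replication tool would typically switch to following the database's change stream for ongoing sync; that part is not sketched here.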

OLake @ GitHub.

Related contents:

Change Data Capture Tools @ Dev Genius' Medium.

apache2-licensed apache-iceberg cdc database data-lake foss lakehouse mongodb mysql open-source postgresql replication

Added 7 months ago

Apache Gravitino

https://gravitino.apache.org/

A unified metadata lake across all your sources, formats, cloud providers, and regions in a federated architecture. World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.

Apache Gravitino @ GitHub.

big-data data-lake foss metadata open-source self-hosted

Added 7 months ago

Apache Hudi

https://hudi.apache.org/

An Open Source Data Lake Platform.

Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow, old-school batch data processing with a powerful new incremental processing framework for low-latency, minute-level analytics.

big-data data-lake foss open-source

Added 1 year ago