Tag: dataflow - Biapy's Bookmarks

dataflow

DLT-META

https://databrickslabs.github.io/dlt-meta/

Metadata-driven Spark Declarative Pipelines framework for bronze/silver pipelines.

DLT-META is a metadata-driven framework designed to work with Lakeflow Declarative Pipelines. This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.
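As a rough illustration of the metadata-driven idea, the sketch below reads a small onboarding document and derives a bronze step and a silver step from each entry. It is plain Python and does not use DLT-META. The field names (`flow_id`, `source_path`, `bronze_table`, `silver_table`) and the `plan_steps` helper are hypothetical and are not the actual Dataflowspec schema.

```python
import json

# Hypothetical onboarding metadata. Field names are illustrative only
# and do not reflect the real DLT-META Dataflowspec schema.
ONBOARDING_JSON = """
[
  {"flow_id": "100", "source_format": "json", "source_path": "/landing/customers",
   "bronze_table": "bronze.customers", "silver_table": "silver.customers"},
  {"flow_id": "101", "source_format": "csv", "source_path": "/landing/orders",
   "bronze_table": "bronze.orders", "silver_table": "silver.orders"}
]
"""


def plan_steps(onboarding_json):
    """Turn each metadata entry into ordered (layer, source, target) steps."""
    steps = []
    for spec in json.loads(onboarding_json):
        # Bronze: ingest the raw source into the bronze table.
        steps.append(("bronze", spec["source_path"], spec["bronze_table"]))
        # Silver: read the bronze table and write the silver table.
        steps.append(("silver", spec["bronze_table"], spec["silver_table"]))
    return steps


steps = plan_steps(ONBOARDING_JSON)
for layer, source, target in steps:
    print(f"{layer}: {source} -> {target}")
```

Adding a new source then means adding one metadata entry rather than writing new pipeline code, which is the point of the approach.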

DLT-META @ GitHub.

Related content:

From Chaos to Scale: Templatizing Spark Declarative Pipelines with DLT-META @ databricks.

apache-spark dataflow data-lake data-pipeline declarative metadata source-available

Added 5 months ago

Apache Beam®

https://beam.apache.org/

The Unified Apache Beam Model. The easiest way to do batch and streaming data processing. Write once, run anywhere data processing for mission-critical production workloads.

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines. It also provides a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow, and Hazelcast Jet.
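The separation between defining a pipeline and choosing where it runs can be sketched in plain Python, without the Beam SDK. Everything below (`pipeline`, `apply_all`, `run_sequential`, `run_threaded`) is a hypothetical stand-in for the concept and is not a Beam API.

```python
from concurrent.futures import ThreadPoolExecutor

# The "pipeline" is only a description: an ordered list of element-wise transforms.
pipeline = [str.strip, str.lower]


def apply_all(element, transforms):
    """Apply every transform to one element, in order."""
    for fn in transforms:
        element = fn(element)
    return element


def run_sequential(transforms, data):
    """A toy runner that processes elements one at a time."""
    return [apply_all(x, transforms) for x in data]


def run_threaded(transforms, data):
    """A toy runner that spreads elements over a small thread pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # pool.map yields results in input order.
        return list(pool.map(lambda x: apply_all(x, transforms), data))


data = ["  Flink ", "SPARK", " Dataflow"]
sequential_result = run_sequential(pipeline, data)
threaded_result = run_threaded(pipeline, data)
print(sequential_result)
print(threaded_result)
```

The same pipeline description yields the same result under either toy runner, which mirrors the idea of one Beam pipeline executing on different backends.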

Beam @ GitHub.

apache beam dataflow data-science flink processing self-hosted spark workflow

Added 2 years ago