etl
Concurrent Python made simple.
Pyper is a flexible framework for concurrent and parallel data processing, based on functional programming patterns. Used for ETL Systems, Data Microservices, and Data Collection.
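To illustrate the functional pipeline pattern Pyper is built around, here is a minimal stdlib-only sketch (this is not Pyper's own API; it just shows stages composed as plain functions, with the transform stage fanned out over a thread pool):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy illustration of a functional, concurrent ETL pipeline
# (a sketch with made-up stage names, not Pyper's API).

def extract():
    """Source stage: yield raw records."""
    yield from range(5)

def transform(x):
    """Worker stage: runs concurrently across the pool."""
    return x * x

def load(results):
    """Sink stage: collect transformed records."""
    return list(results)

def run_pipeline(workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even with concurrent workers
        return load(pool.map(transform, extract()))

print(run_pipeline())  # -> [0, 1, 4, 9, 16]
```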
Power Your AI with Live Data.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
SQLFluff is an open source, dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with Jinja templating and dbt. SQLFluff will auto-fix most linting errors, allowing you to focus your time on what matters.
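The lint-then-auto-fix workflow can be sketched with a toy keyword-case rule (a stand-in illustration only; SQLFluff's real engine parses SQL per dialect and understands Jinja/dbt templating rather than using regexes):

```python
import re

# Toy "keyword case" linter: report lower-case SQL keywords, then
# auto-fix them. Illustrative only, not SQLFluff's implementation.
KEYWORDS = {"select", "from", "where", "and", "or"}

def lint(sql):
    """Return (position, word) for every lower-case keyword found."""
    return [(m.start(), m.group())
            for m in re.finditer(r"[A-Za-z_]+", sql)
            if m.group() in KEYWORDS]

def fix(sql):
    """Auto-fix: upper-case each offending keyword, leave the rest."""
    return re.sub(r"[A-Za-z_]+",
                  lambda m: m.group().upper() if m.group() in KEYWORDS
                  else m.group(),
                  sql)

sql = "select id from users where active"
print(lint(sql))  # three findings: select, from, where
print(fix(sql))   # -> "SELECT id FROM users WHERE active"
```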
Extract & Load with joy.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Your last data platform. Reliable data. 10x faster, 90% less complexity.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Bruin is a data pipeline tool that brings together data ingestion, data transformation with SQL & Python, and data quality into a single framework. It works with all the major data platforms and runs on your local machine, an EC2 instance, or GitHub Actions.
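The "transform plus quality check in one framework" idea can be sketched in plain Python (hypothetical function names; Bruin defines transformations and checks in its own asset format, not through this API):

```python
# Sketch of a transform step followed by a data-quality gate
# (illustrative names only, not Bruin's actual interface).

def transform(rows):
    """Derive a revenue column from qty * price."""
    return [dict(r, revenue=r["qty"] * r["price"]) for r in rows]

def check_not_null(rows, column):
    """Quality check: fail the pipeline if any value is missing."""
    bad = [r for r in rows if r.get(column) is None]
    if bad:
        raise ValueError(f"{len(bad)} rows have NULL {column}")
    return rows

rows = transform([{"qty": 2, "price": 5.0}, {"qty": 1, "price": 3.0}])
rows = check_not_null(rows, "revenue")
print([r["revenue"] for r in rows])  # -> [10.0, 3.0]
```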
Fast, Simple and a cost effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage.
PeerDB is an ETL/ELT tool built for PostgreSQL. We implement multiple Postgres native and infrastructural optimizations to provide a fast, reliable and a feature-rich experience for moving data in/out of PostgreSQL.
Simple, Composable, Open Source ETL
Singer powers data extraction and consolidation for all of your organization's tools.
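Per the Singer spec, a tap is just a program that writes JSON messages (SCHEMA, RECORD, STATE) one per line to stdout, so any target can consume them. A minimal tap for a tiny `users` stream (simplified; real taps also handle catalogs and incremental state):

```python
import json

def tap_users():
    """Yield Singer messages for a tiny 'users' stream."""
    # Describe the stream before emitting records
    yield {"type": "SCHEMA", "stream": "users",
           "schema": {"properties": {"id": {"type": "integer"}}},
           "key_properties": ["id"]}
    # One RECORD message per row
    for user_id in (1, 2):
        yield {"type": "RECORD", "stream": "users",
               "record": {"id": user_id}}
    # STATE lets the next run resume where this one stopped
    yield {"type": "STATE", "value": {"users": {"max_id": 2}}}

for message in tap_users():
    print(json.dumps(message))
```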
Open-source data movement for LLMs and AI platforms: a data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
Related contents:
- Apache Airflow Configuration and Tuning @ DZone.
- Exploring Apache Airflow for Batch Processing Scenario @ DZone.
- What Is Apache Airflow @ Seattle Data Guy.
- Alternatives to Talend – How To Migrate Away From Talend For Your Data Pipelines @ Seattle Data Guy.
- Improving workflow orchestration with Apache Airflow 3.1 in Cloud Composer @ Google Cloud Blog.
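The core Airflow idea, tasks arranged in a DAG that run only after their upstream dependencies succeed, can be sketched with the stdlib alone (a toy runner, not Airflow's API; real DAGs use operators, a scheduler, and a metadata database):

```python
from graphlib import TopologicalSorter

# Toy DAG runner: execute tasks in dependency order.
# Illustrative only, not how Airflow schedules work.

def run_dag(tasks, deps):
    """tasks: name -> callable; deps: name -> set of upstream names."""
    order, results = [], {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name]()  # upstream tasks already ran
        order.append(name)
    return order, results

tasks = {
    "extract": lambda: [1, 2, 3],
    "transform": lambda: None,
    "load": lambda: None,
}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
order, _ = run_dag(tasks, deps)
print(order)  # -> ['extract', 'transform', 'load']
```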
Magical Data Engineering Workflows.
Build, run, and manage data pipelines for integrating and transforming data.
Mage is a hybrid framework for transforming and integrating data. It combines the best of both worlds: the flexibility of notebooks with the rigor of modular code.
Related contents: