etl
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
SQLFluff is an open source, dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with Jinja templating and dbt. SQLFluff will auto-fix most linting errors, allowing you to focus your time on what matters.
Simple, Composable, Open Source ETL
Singer powers data extraction and consolidation for all of your organization’s tools.
Extract & Load with joy.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Your last data platform. Reliable data. 10x faster, 90% less complexity.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Bruin is a data pipeline tool that brings together data ingestion, data transformation with SQL & Python, and data quality into a single framework. It works with all the major data platforms and runs on your local machine, an EC2 instance, or GitHub Actions.
Magical Data Engineering Workflows.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Mage is a hybrid framework for transforming and integrating data. It combines the best of both worlds: the flexibility of notebooks with the rigor of modular code.
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.
PeerDB is an ETL/ELT tool built for PostgreSQL. We implement multiple Postgres-native and infrastructural optimizations to provide a fast, reliable, and feature-rich experience for moving data in and out of PostgreSQL.
Concurrent Python made simple.
Pyper is a flexible framework for concurrent and parallel data processing, based on functional programming patterns. Used for 🔀 ETL Systems, ⚙️ Data Microservices, and 🌐 Data Collection.
Open-Source Data Movement for LLMs. AI Platform. Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Power Your AI with Live Data.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
Related contents:
- Apache Airflow Configuration and Tuning @ DZone.
- Exploring Apache Airflow for Batch Processing Scenario @ DZone.
- What Is Apache Airflow @ Seattle Data Guy.
- Alternatives to Talend – How To Migrate Away From Talend For Your Data Pipelines @ Seattle Data Guy.
- Improving workflow orchestration with Apache Airflow 3.1 in Cloud Composer @ Google Cloud Blog.
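The Airflow entry above describes authoring workflows programmatically. As a rough illustration of that idea only, the sketch below models a tiny extract/transform/load workflow as a dependency graph and runs it in topological order using the Python standard library. This is not Airflow's API: the task functions, the `TASKS` mapping, and `run_pipeline` are all hypothetical names invented for this example, and the sample rows are made up.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract(_):
    # Hypothetical source data; a real pipeline would read from an API, file, or database.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.5"}]


def transform(rows):
    # Cast the string amounts to floats.
    return [{**row, "amount": float(row["amount"])} for row in rows]


def load(rows):
    # Stand-in for writing to a warehouse: return the total amount instead.
    return sum(row["amount"] for row in rows)


# Each task name maps to (function, name of the single upstream task or None).
TASKS = {
    "extract": (extract, None),
    "transform": (transform, "extract"),
    "load": (load, "transform"),
}


def run_pipeline(tasks):
    # Build a graph of task -> set of upstream tasks, then run in dependency order.
    graph = {
        name: ({upstream} if upstream else set())
        for name, (_, upstream) in tasks.items()
    }
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        func, upstream = tasks[name]
        # Pass the upstream task's result (or None for a root task) to the function.
        results[name] = func(results.get(upstream))
    return order, results


if __name__ == "__main__":
    order, results = run_pipeline(TASKS)
    print(order)
    print(results["load"])
```

An orchestrator such as Airflow layers scheduling, retries, and monitoring on top of this kind of dependency graph; the sketch only covers ordering and execution.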