etl
Concurrent Python made simple.
Pyper is a flexible framework for concurrent and parallel data processing, based on functional programming patterns. Used for ETL Systems, Data Microservices, and Data Collection.
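To illustrate the functional pipeline pattern Pyper is built around, here is a minimal stdlib-only sketch (this is not Pyper's own API; it just shows stages composed as plain functions, with the transform stage fanned out over a thread pool):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy illustration of a functional, concurrent ETL pipeline
# (a sketch with made-up stage names, not Pyper's API).

def extract():
    """Source stage: yield raw records."""
    yield from range(5)

def transform(x):
    """Worker stage: runs concurrently across the pool."""
    return x * x

def load(results):
    """Sink stage: collect transformed records."""
    return list(results)

def run_pipeline(workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even with concurrent workers
        return load(pool.map(transform, extract()))

print(run_pipeline())  # -> [0, 1, 4, 9, 16]
```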
Power Your AI with Live Data.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
SQLFluff is an open source, dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with Jinja templating and dbt. SQLFluff will auto-fix most linting errors, allowing you to focus your time on what matters.
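The lint-then-auto-fix workflow can be sketched with a toy keyword-case rule (a stand-in illustration only; SQLFluff's real engine parses SQL per dialect and understands Jinja/dbt templating rather than using regexes):

```python
import re

# Toy "keyword case" linter: report lower-case SQL keywords, then
# auto-fix them. Illustrative only, not SQLFluff's implementation.
KEYWORDS = {"select", "from", "where", "and", "or"}

def lint(sql):
    """Return (position, word) for every lower-case keyword found."""
    return [(m.start(), m.group())
            for m in re.finditer(r"[A-Za-z_]+", sql)
            if m.group() in KEYWORDS]

def fix(sql):
    """Auto-fix: upper-case each offending keyword, leave the rest."""
    return re.sub(r"[A-Za-z_]+",
                  lambda m: m.group().upper() if m.group() in KEYWORDS
                  else m.group(),
                  sql)

sql = "select id from users where active"
print(lint(sql))  # three findings: select, from, where
print(fix(sql))   # -> "SELECT id FROM users WHERE active"
```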
Extract & Load with joy.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Your last data platform. Reliable data. 10x faster, 90% less complexity.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Bruin is a data pipeline tool that brings together data ingestion, data transformation with SQL & Python, and data quality into a single framework. It works with all the major data platforms and runs on your local machine, an EC2 instance, or GitHub Actions.
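The "transform plus quality check in one framework" idea can be sketched in plain Python (hypothetical function names; Bruin defines transformations and checks in its own asset format, not through this API):

```python
# Sketch of a transform step followed by a data-quality gate
# (illustrative names only, not Bruin's actual interface).

def transform(rows):
    """Derive a revenue column from qty * price."""
    return [dict(r, revenue=r["qty"] * r["price"]) for r in rows]

def check_not_null(rows, column):
    """Quality check: fail the pipeline if any value is missing."""
    bad = [r for r in rows if r.get(column) is None]
    if bad:
        raise ValueError(f"{len(bad)} rows have NULL {column}")
    return rows

rows = transform([{"qty": 2, "price": 5.0}, {"qty": 1, "price": 3.0}])
rows = check_not_null(rows, "revenue")
print([r["revenue"] for r in rows])  # -> [10.0, 3.0]
```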
Fast, Simple and a cost effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage.
PeerDB is an ETL/ELT tool built for PostgreSQL. We implement multiple Postgres native and infrastructural optimizations to provide a fast, reliable and a feature-rich experience for moving data in/out of PostgreSQL.
Simple, Composable, Open Source ETL
Singer powers data extraction and consolidation for all of your organization's tools.
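Per the Singer spec, a tap is just a program that writes JSON messages (SCHEMA, RECORD, STATE) one per line to stdout, so any target can consume them. A minimal tap for a tiny `users` stream (simplified; real taps also handle catalogs and incremental state):

```python
import json

def tap_users():
    """Yield Singer messages for a tiny 'users' stream."""
    # Describe the stream before emitting records
    yield {"type": "SCHEMA", "stream": "users",
           "schema": {"properties": {"id": {"type": "integer"}}},
           "key_properties": ["id"]}
    # One RECORD message per row
    for user_id in (1, 2):
        yield {"type": "RECORD", "stream": "users",
               "record": {"id": user_id}}
    # STATE lets the next run resume where this one stopped
    yield {"type": "STATE", "value": {"users": {"max_id": 2}}}

for message in tap_users():
    print(json.dumps(message))
```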
Open-source data movement for LLMs and AI platforms: a data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
Related contents:
- Apache Airflow Configuration and Tuning @ DZone.
- Exploring Apache Airflow for Batch Processing Scenario @ DZone.
- What Is Apache Airflow @ Seattle Data Guy.
- Alternatives to Talend – How To Migrate Away From Talend For Your Data Pipelines @ Seattle Data Guy.
- Improving workflow orchestration with Apache Airflow 3.1 in Cloud Composer @ Google Cloud Blog.
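The core Airflow idea, tasks arranged in a DAG that run only after their upstream dependencies succeed, can be sketched with the stdlib alone (a toy runner, not Airflow's API; real DAGs use operators, a scheduler, and a metadata database):

```python
from graphlib import TopologicalSorter

# Toy DAG runner: execute tasks in dependency order.
# Illustrative only, not how Airflow schedules work.

def run_dag(tasks, deps):
    """tasks: name -> callable; deps: name -> set of upstream names."""
    order, results = [], {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name]()  # upstream tasks already ran
        order.append(name)
    return order, results

tasks = {
    "extract": lambda: [1, 2, 3],
    "transform": lambda: None,
    "load": lambda: None,
}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
order, _ = run_dag(tasks, deps)
print(order)  # -> ['extract', 'transform', 'load']
```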
Magical Data Engineering Workflows.
Build, run, and manage data pipelines for integrating and transforming data.
Mage is a hybrid framework for transforming and integrating data. It combines the best of both worlds: the flexibility of notebooks with the rigor of modular code.
Related contents: