Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Related contents:
superglue is an open-source server that sits as a layer between complex APIs and your application. With superglue, you always get the data that you want in the format that you expect. Fetch data from JSON and XML APIs, as well as CSV and Excel files in seconds.
Kubernetes-native platform to run massively parallel data/streaming jobs.
A Kubernetes-native, serverless platform for running scalable and reliable event-driven applications. Numaflow decouples event sources and sinks from the processing logic, allowing each component to independently auto-scale based on demand. With out-of-the-box sources and sinks, and built-in observability, developers can focus on their processing logic without worrying about event consumption, writing boilerplate code, or operational complexities. Each step of the pipeline can be written in any programming language, offering unparalleled flexibility in using the best programming language for each step and ease of using the languages you are most familiar with.
Data processing with ML, LLM and Vision LLM.
Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, bank statements, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance.
Related contents:
Efficient data transformation and modeling framework that is backwards compatible with dbt.
SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can efficiently run and deploy data transformations written in SQL or Python with visibility and control at any size.
Related contents:
Generate video, images, audio with AI.
The most powerful open source node-based application for creating images, videos, and audio with GenAI.
This ui will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface. For some workflow examples and see what ComfyUI can do you can check out:
Open-Source Data Movement for LLMs. AI Platform.
Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Cloud-native orchestration of data pipelines. Ship data pipelines with extraordinary velocity.
An orchestration platform for the development, production, and observation of data assets.
The cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
Dagster is a cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
It is designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
Department of Education (DOE) for New South Wales (AUS) data stack in a box.
With the push of one button you can have your own data stack up and running in 5 mins! .
Open-source developer platform and workflow engine. Turn scripts into auto-generated UIs, APIs and cron jobs. Compose them as workflows or data pipelines. Build complex, data-intensive apps with ease.
Write and deploy software 10x faster, and run it with the highest reliability and observability on the fastest self-hostable job orchestrator.
Open-source developer platform to power your entire infra and turn scripts into webhooks, workflows and UIs. Fastest workflow engine (13x vs Airflow). Open-source alternative to Retool and Temporal.
Your Data Pipeline, Simplified. GlareDB: An analytics DBMS for distributed data.
Data exists everywhere: your laptop, Postgres, Snowflake and as files in S3. It exists in various formats such as Parquet, CSV and JSON. Regardless, there will always be multiple steps spanning several destinations to get the insights you need.
GlareDB is designed to query your data wherever it lives using SQL that you already know.