Stateful Computations over Data Streams.
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed at any scale.
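To give a feel for the model, here is a minimal PyFlink sketch of a keyed, stateful computation (a running per-key sum). The data and job name are made up, and a real job would read from sources like Kafka rather than a collection:

```python
# Minimal PyFlink sketch: a keyed, stateful running sum per key.
# Assumes `pip install apache-flink`; data and names are illustrative.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# A small bounded source for illustration; real jobs read Kafka, files, etc.
events = env.from_collection([("user-a", 1), ("user-b", 2), ("user-a", 3)])

# key_by partitions the stream; reduce keeps per-key state (the running sum).
sums = events.key_by(lambda e: e[0]).reduce(lambda a, b: (a[0], a[1] + b[1]))

sums.print()
env.execute("running_sum_example")
```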
The fastest Postgres change data capture.
Stream data from Postgres directly to Kafka, Redis, and more. Replace complex tools like Debezium and consolidate workflows.
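Products in this space are generally built on PostgreSQL logical replication. As a hedged sketch of that underlying mechanism (not this product's API), here is how a logical replication slot can be consumed with psycopg2 and the wal2json output plugin:

```python
# Sketch of consuming a Postgres logical replication slot with psycopg2.
# Assumes wal2json is installed on the server and wal_level = logical;
# the DSN and slot name are illustrative.
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "dbname=app user=replicator",
    connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
cur = conn.cursor()

# Create the slot once; it persists and tracks our position in the WAL.
cur.create_replication_slot("cdc_demo", output_plugin="wal2json")
cur.start_replication(slot_name="cdc_demo", decode=True)

def on_change(msg):
    print(msg.payload)  # JSON describing inserts, updates, and deletes
    # Acknowledge so the server can recycle WAL behind this point.
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(on_change)  # blocks, invoking on_change per message
```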
This is Maxwell's daemon, a change data capture application that reads MySQL binlogs and writes data changes as JSON to Kafka, Kinesis, and other streaming platforms.
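Because Maxwell writes plain JSON, consuming its change feed is just a Kafka consumer plus json.loads. A minimal sketch with kafka-python; the broker address is a placeholder, and "maxwell" is the daemon's default topic:

```python
# Sketch: consume Maxwell's JSON change events from Kafka (kafka-python).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "maxwell",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for record in consumer:  # blocks, yielding one change event at a time
    change = record.value
    # Maxwell events carry the database, table, change type, and row data.
    print(change["database"], change["table"], change["type"], change["data"])
```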
Migrate Databases and Stream Data Between Database Systems.
DBConvert Streams (DBS) is a cutting-edge distributed platform designed for data migration between heterogeneous databases and real-time data replication. It simplifies the process of transferring data between on-premises or cloud databases, including relational databases and data warehouses.
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
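For flavor, a minimal produce/consume round trip using the kafka-python client; broker address and topic are placeholders:

```python
# Minimal Kafka produce/consume sketch using kafka-python.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", key=b"user-a", value=b'{"action": "click"}')
producer.flush()  # block until the broker acknowledges

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
)
for record in consumer:
    print(record.key, record.value)
    break
```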
Cloud-native stream processing. Distributed stream processing engine in Rust.
Transform, filter, aggregate, and join data streams by writing SQL, with sub-second results.
Scale from zero to millions of events per second.
Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
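A sketch of what such a pipeline can look like, held in a Python string to keep one language across these examples. The connector options and the tumble() window follow Arroyo's documentation as best recalled, so treat the details as assumptions to verify:

```python
# Illustrative Arroyo-style pipeline SQL (an assumption sketched from Arroyo's
# docs; verify connector options before use). Arroyo runs SQL like this
# continuously, emitting a row per key per one-minute window.
PIPELINE_SQL = """
CREATE TABLE events (
    user_id TEXT,
    value   INT
) WITH (
    connector = 'kafka',
    bootstrap_servers = 'localhost:9092',
    topic = 'events',
    type = 'source',
    format = 'json'
);

SELECT user_id, count(*) AS events
FROM events
GROUP BY user_id, tumble(interval '1 minute');
"""
```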
A Reliable Stream Storage System. Streaming as a new software-defined storage primitive.
Pravega is an open source distributed storage service implementing Streams. It offers Stream as the main primitive for the foundation of reliable storage systems: a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency.
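To make the primitive concrete, here is a toy in-memory model of an append-only, strictly ordered byte stream. This is a conceptual sketch only, not Pravega's client API (real streams are durable, distributed, and elastic):

```python
# Toy model of the append-only byte-stream primitive Pravega provides.
class AppendOnlyStream:
    def __init__(self):
        self._events: list[bytes] = []

    def append(self, data: bytes) -> int:
        """Append bytes; returns the offset (order is strict and immutable)."""
        self._events.append(data)
        return len(self._events) - 1

    def read_from(self, offset: int):
        """Replay events in the exact order they were appended."""
        yield from self._events[offset:]

s = AppendOnlyStream()
s.append(b"event-1")
s.append(b"event-2")
print(list(s.read_from(0)))  # [b'event-1', b'event-2']
```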
Kubernetes-native platform to run massively parallel data/streaming jobs.
A Kubernetes-native, serverless platform for running scalable and reliable event-driven applications. Numaflow decouples event sources and sinks from the processing logic, allowing each component to auto-scale independently based on demand. With out-of-the-box sources and sinks and built-in observability, developers can focus on their processing logic instead of event consumption, boilerplate code, or operational complexity. Each step of a pipeline can be written in any programming language, so you can choose the best language for each step and stick with the languages you know well.
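As an illustration of a single Python step, a hedged sketch against the pynumaflow SDK; the mapper interface shown is an assumption that varies across SDK versions, so check it against the docs:

```python
# Sketch of a Numaflow map vertex in Python via the pynumaflow SDK.
# Assumption: this mapper interface matches your SDK version; verify it.
from pynumaflow.mapper import Datum, Message, Messages, MapServer

def handler(keys: list[str], datum: Datum) -> Messages:
    # Transform one event; sources and sinks are wired up by the platform.
    upper = datum.value.decode("utf-8").upper()
    messages = Messages()
    messages.append(Message(upper.encode("utf-8"), keys=keys))
    return messages

if __name__ == "__main__":
    MapServer(handler).start()  # serves this vertex over gRPC for Numaflow
```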
Fancy stream processing made operationally mundane.
Bento is a high-performance, resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and to perform hydration, enrichment, transformation, and filtering on payloads.
It comes with a powerful mapping language, is easy to deploy and monitor, and is ready to drop into your pipeline as a static binary, Docker image, or serverless function, making it cloud native as heck.
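A taste of the mapping language in a small config, written out and launched from Python to keep these examples in one language. The kafka fields, the mapping processor, and the bento -c invocation are assumptions based on Bento's Benthos lineage; verify against the docs:

```python
# Sketch: a Bento config using its Bloblang-style mapping language.
import subprocess

CONFIG = """
input:
  kafka:
    addresses: ["localhost:9092"]
    topics: ["raw_events"]

pipeline:
  processors:
    - mapping: |
        root.user  = this.user_id
        root.email = this.email.lowercase()

output:
  kafka:
    addresses: ["localhost:9092"]
    topic: "clean_events"
"""

with open("bento.yaml", "w") as f:
    f.write(CONFIG)

subprocess.run(["bento", "-c", "bento.yaml"])  # assumed CLI invocation
```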
Incremental Data Processing in PostgreSQL.
pg_incremental is a simple extension that helps you do fast, reliable, incremental batch processing in PostgreSQL.
When storing an append-only stream of event data in PostgreSQL (e.g. IoT, time series), a common challenge is processing only the new data. For instance, you might want to maintain one or more summary tables of pre-aggregated data, inserting or updating aggregates as new data arrives. However, you cannot easily tell which rows are still being inserted by concurrent transactions, and aggregating immediately on insert (e.g. via triggers) is certain to create a concurrency bottleneck. You also want every new event to be processed successfully exactly once, even when queries fail.
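pg_incremental addresses this by invoking a parameterized command once per safely processable range. A sketch from Python via psycopg2; the create_time_interval_pipeline signature is recalled from the project's docs and should be treated as an assumption to verify:

```python
# Sketch: define a pg_incremental pipeline that rolls up new events hourly.
# Assumption: incremental.create_time_interval_pipeline(name, interval, command),
# with $1/$2 bound to the start and end of each safely processable range.
import psycopg2

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT incremental.create_time_interval_pipeline(
            'event_rollup',
            '1 hour',
            $$
            INSERT INTO event_counts_hourly (hour, events)
            SELECT date_trunc('hour', event_time), count(*)
            FROM events
            WHERE event_time >= $1 AND event_time < $2
            GROUP BY 1
            $$
        );
    """)
```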
Debezium is an open source distributed platform for change data capture. Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates, and deletes that other apps commit to your databases. Debezium is durable and fast, so your apps can respond quickly and never miss an event, even when things go wrong.
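Debezium connectors are typically registered through the Kafka Connect REST API. A minimal sketch for a Postgres connector; host, credentials, and names are placeholders:

```python
# Sketch: register a Debezium Postgres connector with Kafka Connect's REST API.
import requests

connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "inventory",
        "topic.prefix": "inventory",  # prefixes the per-table change topics
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
```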
The best way of working with Protocol Buffers. Elastic, self-hosted Kafka with advanced semantic intelligence.
Guarantee streaming data quality and slash cloud costs 10x with Bufstream, a drop-in replacement for Apache Kafka.
Bufstream is a Kafka-compatible streaming system which stores records directly in an object storage service like S3.
Cloud-native orchestration of data pipelines. Ship data pipelines with extraordinary velocity.
An orchestration platform for the development, production, and observation of data assets.
Dagster is a cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
It is designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
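In Dagster's declarative model, assets are decorated Python functions, and dependencies are inferred from parameter names. A minimal sketch with illustrative names:

```python
# Minimal Dagster sketch: two assets, with the dependency inferred
# from the function parameter. Names and logic are illustrative.
from dagster import Definitions, asset

@asset
def raw_events() -> list[dict]:
    # In practice this would load from a source system.
    return [{"user": "a", "value": 1}, {"user": "b", "value": 2}]

@asset
def event_count(raw_events: list[dict]) -> int:
    # Dagster wires raw_events in because the parameter matches the asset name.
    return len(raw_events)

defs = Definitions(assets=[raw_events, event_count])
```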
Department of Education (DOE) for New South Wales (AUS) data stack in a box.
With the push of one button you can have your own data stack up and running in five minutes!
Stream, transform, and route PostgreSQL data in real-time.
The easiest way to move and transform data between PostgreSQL databases using Logical Replication.
pg_flo leverages PostgreSQL's logical replication system to capture and stream data changes. It uses NATS as a message broker to decouple WAL reading (the replicator process) from downstream processing (the worker processes), providing flexibility and scalability. Transformations and filters are applied before the data reaches the destination.
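The decoupling pattern itself is easy to picture: one process publishes decoded WAL changes to a NATS subject, and workers subscribe, transform, and apply them. A conceptual sketch with nats-py that illustrates the pattern, not pg_flo's actual internals:

```python
# Conceptual sketch of the replicator/worker split over NATS (nats-py).
import asyncio
import json
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")

    async def worker(msg):
        change = json.loads(msg.data)
        # Apply transformations/filters before writing to the destination.
        if change.get("table") != "audit_log":
            print("apply to destination:", change)

    await nc.subscribe("wal.changes", cb=worker)

    # The "replicator" side: publish a decoded WAL change as JSON.
    await nc.publish("wal.changes", json.dumps(
        {"table": "users", "op": "insert", "row": {"id": 1}}
    ).encode())

    await asyncio.sleep(1)
    await nc.drain()

asyncio.run(main())
```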
Open-source framework for building asynchronous web services that interact with event streams.
FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.
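Its core pattern is a decorated handler that consumes from one stream and publishes to another. A minimal Kafka sketch; the broker address and topic names are placeholders:

```python
# Minimal FastStream sketch: consume from one Kafka topic, publish to another.
# Run with: faststream run app:app
from faststream import FastStream
from faststream.kafka import KafkaBroker

broker = KafkaBroker("localhost:9092")
app = FastStream(broker)

@broker.subscriber("in-events")
@broker.publisher("out-events")
async def handle(msg: str) -> str:
    # The decoded message comes in; the return value is published downstream.
    return msg.upper()
```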
Open source analytics infrastructure. Fast and scalable. No bloat. GDPR compliant.
A single production-ready Docker image built on ClickHouse, Kafka, and Node.js for tracking events, users, page views, and interactions.