distributed
Redis/Valkey Compatible Distributed Transactional Key-Value Store.
EloqKV is a high-performance distributed database with a Redis/Valkey-compatible API. It offers ACID transactions, full elasticity and scalability, tiered storage, and session-style transaction syntax, all while preserving Redis' simplicity and usability. EloqKV is engineered for developers who need a modern, no-compromise database to power the next generation of demanding applications in the AI era.
Dragonfly is an open source P2P-based file distribution and image acceleration system. It is hosted by the Cloud Native Computing Foundation (CNCF) as an Incubating Level Project.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
The Fire-Flyer File System (3FS) is a high-performance distributed file system designed to address the challenges of AI training and inference workloads. It leverages modern SSDs and RDMA networks to provide a shared storage layer that simplifies development of distributed applications.
Production-Grade Container Scheduling and Management.
Kubernetes, also known as K8s, is an open source system for automating deployment, scaling, and management of containerized applications.
Related contents:
- How Kubernetes Works Internally? @ System Design Codex.
- The Bare Minimum to Survive a Kubernetes Project @ Téotime Pacreau (in French).
- Kubernetes Training: Admins & Developers @ DevSecOps (in French).
- How To Run Kubernetes Commands in Go: Steps and Best Practices @ The New Stack.
- Kubernetes Is Powerful, But Not Secure (at least not by default) @ Tigera.
Instant Distributed Tracing. Enterprise-Grade OpenTelemetry. Distributed tracing without code changes. 🚀 Instantly monitor any application using OpenTelemetry and eBPF.
Accelerate your OpenTelemetry implementation with Odigos, an eBPF-based solution that provides deeper tracing with zero code changes and zero performance overhead.
Odigos is an open-source distributed tracing solution that simplifies and improves observability for Kubernetes environments and Virtual Machines. It provides instant tracing capabilities without requiring any code changes to your applications.
Stateful Computations over Data Streams.
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
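To make "stateful computations over data streams" concrete, here is a minimal Python sketch, not Flink's API: each parallel task keeps its own per-key state, and every event for a given key is routed to the same task, which is the idea behind keyed state in a distributed stream processor. The CRC32-based routing and the name `keyed_running_counts` are illustrative assumptions; Flink's real key partitioning, checkpointing, and recovery are not modeled.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def keyed_running_counts(events: Iterable[str], parallelism: int) -> List[Tuple[int, str, int]]:
    """Emit (task, key, running count) for each event, with state partitioned by key."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # One private state table per parallel task (hypothetical stand-in for keyed state).
    task_state: List[Dict[str, int]] = [defaultdict(int) for _ in range(parallelism)]
    out: List[Tuple[int, str, int]] = []
    for key in events:
        # Stand-in partitioner: a key always lands on the same task.
        task = zlib.crc32(key.encode()) % parallelism
        state = task_state[task]
        state[key] += 1
        out.append((task, key, state[key]))
    return out


if __name__ == "__main__":
    for record in keyed_running_counts(["a", "b", "a", "c", "a"], parallelism=2):
        print(record)
```

Because a key never moves between tasks, each task can update its state locally without coordinating with the others.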
A Datacenter Scale Distributed Inference Serving Framework.
NVIDIA Dynamo is a high-throughput low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLang or others) and captures LLM-specific capabilities.
A Friendly Federated AI Framework.
A unified approach to federated learning, analytics, and evaluation. Federate any workload, any ML framework, and any programming language.
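Federated learning typically shares model updates rather than raw data. The sketch below shows FedAvg-style aggregation, a weighted mean of client weights where each client counts in proportion to its number of training examples. It is a generic illustration, not Flower's strategy API; `federated_average` and the flat-list weight format are assumptions.

```python
from typing import List, Tuple

# Each client reports (flattened model weights, number of local training examples).
ClientUpdate = Tuple[List[float], int]


def federated_average(updates: List[ClientUpdate]) -> List[float]:
    """Weighted average of client weights, weighted by local dataset size."""
    if not updates:
        raise ValueError("no client updates")
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(updates[0][0])
    avg = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all clients must report weights of the same shape")
        for i, w in enumerate(weights):
            avg[i] += w * n / total  # larger local datasets pull the average harder
    return avg


if __name__ == "__main__":
    updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
    print(federated_average(updates))  # → [2.5, 3.5]
```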
PostgreSQL Operator for Kubernetes. Run PostgreSQL. The Kubernetes way.
CloudNativePG is the Kubernetes operator that covers the full lifecycle of a highly available PostgreSQL database cluster with a primary/standby architecture, using native streaming replication.
A Reliable Stream Storage System. Streaming as a new software defined storage primitive.
Pravega is an open source distributed storage service implementing Streams. It offers Stream as the main primitive for the foundation of reliable storage systems: a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency.
Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster performance, leveraging tensor parallelism and high-speed synchronization over Ethernet.
Supports Linux, macOS, and Windows. Optimized for ARM and x86_64 AVX2 CPUs.
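Tensor parallelism splits a single layer's weight matrix across devices so that each device computes part of the same layer's output. A minimal sketch of a column-split matrix-vector product, with "devices" simulated as loop iterations; the function names are hypothetical, and the project's actual partitioning scheme and Ethernet synchronization are not shown.

```python
from typing import List

Matrix = List[List[float]]  # row-major: w[i][j] connects input i to output j


def matvec(x: List[float], w: Matrix) -> List[float]:
    """Compute y = x @ w for a single input vector."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Give each simulated device a contiguous slice of the output columns."""
    if parts < 1:
        raise ValueError("need at least one device")
    cols = len(w[0])
    size = -(-cols // parts)  # ceiling division
    return [[row[k:k + size] for row in w] for k in range(0, cols, size)]


def tensor_parallel_matvec(x: List[float], w: Matrix, devices: int) -> List[float]:
    # Each shard's product would run on a different machine in a real cluster.
    partials = [matvec(x, shard) for shard in split_columns(w, devices)]
    # Gather step: concatenating the partial outputs restores the full result.
    return [v for part in partials for v in part]


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0]]
    print(tensor_parallel_matvec(x, w, devices=2))  # → [11.0, 14.0, 17.0, 20.0]
```

Each device only needs its own slice of the weights, which is what lets several small machines hold a model that would not fit on one.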
Distributed SQLite.
LiteFS is a distributed file system that transparently replicates SQLite databases. You can run your application like it’s running against a local on-disk SQLite database but behind the scenes the database is replicated to all the nodes in your cluster. With LiteFS, you can run your database right next to your application on the edge. You can run LiteFS anywhere!
A distributed tracing system.
Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in service architectures. Features include both the collection and lookup of this data.
If you have a trace ID in a log file, you can jump directly to it. Otherwise, you can query based on attributes such as service, operation name, tags and duration. Some interesting data will be summarized for you, such as the percentage of time spent in a service, and whether or not operations failed.
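The "percentage of time spent in a service" summary can be approximated from a trace's spans. This sketch assumes a simplified span record (span id, parent id, service name, duration in microseconds, loosely mirroring fields Zipkin spans carry) and that child spans run one after another inside their parent, so a span's self time is its duration minus its children's. It is an illustration, not Zipkin's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    duration_us: int          # wall-clock duration in microseconds


def time_share_by_service(spans: List[Span]) -> Dict[str, float]:
    """Percentage of the root span's duration spent in each service (self time)."""
    roots = [s for s in spans if s.parent_id is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root span")
    total = roots[0].duration_us
    if total <= 0:
        raise ValueError("root span must have a positive duration")

    # Time each span hands off to its direct children.
    child_time: Dict[str, int] = defaultdict(int)
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] += s.duration_us

    # Self time = own duration minus delegated time, clamped at zero
    # (parallel children can sum to more than their parent).
    per_service: Dict[str, int] = defaultdict(int)
    for s in spans:
        per_service[s.service] += max(s.duration_us - child_time[s.span_id], 0)

    return {svc: 100.0 * t / total for svc, t in per_service.items()}
```

For a 100 µs `frontend` root span with a 20 µs `auth` child and a 50 µs `db` child, this attributes 30%, 20%, and 50% respectively.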
A Cloud Native Distributed Storage System.
CubeFS is a new-generation, cloud-native, open source storage system that supports access protocols such as S3, HDFS, and POSIX. It is widely applicable in various scenarios such as big data, AI/LLMs, container platforms, separation of storage and computing for databases and middleware, data sharing and protection, etc.
Raft is a consensus algorithm that is designed to be easy to understand. It's equivalent to Paxos in fault-tolerance and performance. The difference is that it's decomposed into relatively independent subproblems, and it cleanly addresses all major pieces needed for practical systems. We hope Raft will make consensus available to a wider audience, and that this wider audience will be able to develop a variety of higher quality consensus-based systems than are available today.
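As an example of one of Raft's subproblems, leader election hinges on a small voting rule: a server grants at most one vote per term, and only to a candidate whose log is at least as up to date as its own (a later last-log term wins; with equal terms, the longer log wins). A candidate becomes leader once it holds votes from a strict majority. The class and function names below are hypothetical, and a real server must persist its term and vote before replying.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoterState:
    current_term: int
    voted_for: Optional[str]  # candidate voted for in current_term, if any
    last_log_index: int
    last_log_term: int


def candidate_log_ok(state: VoterState, cand_last_index: int, cand_last_term: int) -> bool:
    """Raft's 'at least as up to date' check: later last term wins, then longer log."""
    if cand_last_term != state.last_log_term:
        return cand_last_term > state.last_log_term
    return cand_last_index >= state.last_log_index


def handle_request_vote(state: VoterState, term: int, candidate_id: str,
                        cand_last_index: int, cand_last_term: int) -> bool:
    """Return True if this server grants its vote to the candidate."""
    if term < state.current_term:
        return False                   # stale candidate from an old term
    if term > state.current_term:
        state.current_term = term      # newer term: any earlier vote no longer counts
        state.voted_for = None
    if state.voted_for not in (None, candidate_id):
        return False                   # already voted for someone else this term
    if not candidate_log_ok(state, cand_last_index, cand_last_term):
        return False                   # candidate might be missing committed entries
    state.voted_for = candidate_id
    return True


def wins_election(votes: int, cluster_size: int) -> bool:
    """A candidate needs a strict majority, counting its own vote."""
    return votes > cluster_size // 2
```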
Tahoe-LAFS is a Free and Open decentralized cloud storage system. It distributes your data across multiple servers. Even if some of the servers fail or are taken over by an attacker, the entire file store continues to function correctly, preserving your privacy and security.
Distributed SQL Databases
Fastest serverless distributed SQL database for always-available applications.
Program against your datacenter like it’s a single pool of resources.
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other frameworks on a dynamically shared pool of nodes.
Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
A Coherent Software Configuration Management System.
Freenet is a distributed, decentralized alternative to the centralized World Wide Web, designed to unleash a new era of innovation and competition, while protecting freedom of speech and privacy.
The heart of Freenet is the Core, which runs on users' computers, smartphones, or other devices. The Core is tiny (less than 5 MB), so it can be installed in a matter of seconds, and it is compatible with a wide range of hardware.
Remote Local Environments to build distributed applications.
Development environment as a service. Building distributed applications isn’t complex anymore! With Kloudlite’s unified remote local environments, integrate the comfort of local coding with the power of remote environments.
Kloudlite is an open-source platform designed to provide seamless and secure development environments for building distributed applications. It connects local workspaces with remote Kubernetes environments via a WireGuard network, allowing developers to access services and resources with production-level parity. With Kloudlite, there’s no need for build or deploy steps during development: with service intercepts, your changes are reflected in real time, improving productivity and shortening the development loop.
SaunaFS is a distributed file system.
Welcome to SaunaFS, a robust distributed POSIX file system meticulously designed to revolutionize your storage solutions by offering unmatched efficiency, security, and redundancy. At its core, SaunaFS is a distributed file system primarily written in C++, inspired by the pioneering concepts introduced by Google File System.
An open source distributed SQL database.
YDB is a versatile open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.
Scalable. Reliable. MySQL-compatible. Cloud-native. Database.
Vitess is a database clustering system for horizontal scaling of MySQL.
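Horizontal scaling in Vitess comes from sharding: each row maps to a keyspace ID, and each shard owns a contiguous range of keyspace IDs, named by its hex bounds such as `-80` and `80-`. The sketch below routes a key to a shard under that naming convention. The SHA-256-based `keyspace_id` is a stand-in, not one of Vitess's own vindexes, and all function names are hypothetical.

```python
import hashlib
from typing import List


def keyspace_id(user_id: int) -> bytes:
    """Stand-in hash (first 8 bytes of SHA-256), NOT a real Vitess vindex."""
    return hashlib.sha256(str(user_id).encode()).digest()[:8]


def shard_covers(shard: str, kid: bytes) -> bool:
    """Shard names are 'start-end' in hex; an empty side is unbounded; range is [start, end)."""
    start_hex, end_hex = shard.split("-")
    start, end = bytes.fromhex(start_hex), bytes.fromhex(end_hex)
    # Python compares bytes lexicographically, matching keyspace-ID ordering.
    return kid >= start and (not end or kid < end)


def route(user_id: int, shards: List[str]) -> str:
    kid = keyspace_id(user_id)
    for shard in shards:
        if shard_covers(shard, kid):
            return shard
    raise LookupError(f"no shard covers keyspace id {kid.hex()}")


if __name__ == "__main__":
    shards = ["-40", "40-80", "80-c0", "c0-"]
    for uid in (1, 2, 3):
        print(uid, keyspace_id(uid).hex(), route(uid, shards))
```

Splitting a shard (for example `-80` into `-40` and `40-80`) only narrows ranges, so every keyspace ID still maps to exactly one shard.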
Distributed Async Await. A dead simple programming model for modern applications.
Resonate's Distributed Async Await is a new programming model that simplifies coding for the cloud. It ensures code completion even if hardware or software failures occur during execution. The programming model does this with just functions and promises, making it trivial to build coordinated and reliable distributed applications.
The open source, distributed, transactional key-value store.
FoundationDB is a distributed database designed to handle large volumes of structured data across clusters of commodity servers. It organizes data as an ordered key-value store and employs ACID transactions for all operations. It is especially well-suited for read/write workloads but also has excellent performance for write-intensive workloads. Users interact with the database through API language bindings.
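A toy, single-process sketch of the two ideas in this description: keys kept in sorted order so range reads return them in order, and transactions that buffer writes and apply them all at once on commit. Conflict detection here is optimistic (a commit fails if a key it read was written after the transaction started). All class names are hypothetical; this is not the FoundationDB client API and has no durability or distribution.

```python
import bisect
from typing import Dict, List, Optional, Set, Tuple


class ConflictError(Exception):
    """Raised when a transaction read a key that another transaction has since written."""


class OrderedStore:
    """Toy in-memory ordered key-value store."""

    def __init__(self) -> None:
        self.keys: List[bytes] = []             # kept sorted for range reads
        self.data: Dict[bytes, bytes] = {}
        self.written_at: Dict[bytes, int] = {}  # key -> version of its last commit
        self.version = 0

    def begin(self) -> "Transaction":
        return Transaction(self)

    def get_range(self, begin: bytes, end: bytes) -> List[Tuple[bytes, bytes]]:
        """Committed pairs with begin <= key < end, in key order."""
        lo = bisect.bisect_left(self.keys, begin)
        hi = bisect.bisect_left(self.keys, end)
        return [(k, self.data[k]) for k in self.keys[lo:hi]]


class Transaction:
    def __init__(self, store: OrderedStore) -> None:
        self.store = store
        self.read_version = store.version
        self.reads: Set[bytes] = set()
        self.writes: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.writes:                  # read your own uncommitted write
            return self.writes[key]
        self.reads.add(key)
        return self.store.data.get(key)

    def set(self, key: bytes, value: bytes) -> None:
        self.writes[key] = value                # buffered until commit

    def commit(self) -> None:
        s = self.store
        # Optimistic check: abort if anything we read changed after we started.
        for key in self.reads:
            if s.written_at.get(key, 0) > self.read_version:
                raise ConflictError(key)
        # Apply every buffered write under one new version (all or nothing).
        s.version += 1
        for key, value in self.writes.items():
            if key not in s.data:
                bisect.insort(s.keys, key)
            s.data[key] = value
            s.written_at[key] = s.version
```

Two transactions that read and then write the same key cannot both commit: the second one sees that the key changed after its read version and aborts, leaving the caller to retry.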
The Distributed Task Queue for More Resilient Web Applications
Hatchet is a distributed, fault-tolerant task queue which replaces traditional message brokers and pub/sub systems - built to solve problems like concurrency, fairness, and durability.
Hatchet replaces difficult-to-manage legacy queues or pub/sub systems so you can design durable workloads that recover from failure and solve problems like concurrency, fairness, and rate limiting. Instead of managing your own task queue or pub/sub system, you can use Hatchet to distribute your functions across a set of workers with minimal configuration or infrastructure.
CockroachDB is a cloud-native distributed PostgreSQL-compatible SQL database designed to build, scale, and manage modern, data-intensive applications.
A file manager from the future. One Explorer. All Your Files.
Unify files from all your devices and clouds into a single, easy-to-use explorer. Designed for creators, hoarders and the painfully disorganized.
Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
corrosion is a distributed system for propagating SQLite state across a cluster of nodes.
Serverless, Fault-Tolerant, Branchable Postgres.
The fully managed multi-cloud Postgres with a generous free tier. We separated storage and compute to offer autoscaling, branching, and bottomless storage.
Your Data Pipeline, Simplified. GlareDB: An analytics DBMS for distributed data.
Data exists everywhere: your laptop, Postgres, Snowflake and as files in S3. It exists in various formats such as Parquet, CSV and JSON. Regardless, there will always be multiple steps spanning several destinations to get the insights you need.
GlareDB is designed to query your data wherever it lives using SQL that you already know.
Veilid is an open-source, peer-to-peer, mobile-first, networked application framework.
Veilid allows anyone to build a distributed, private app. Veilid gives users the privacy to opt out of data collection and online tracking. Veilid is being built with user experience, privacy, and safety as our top priorities. It is open source and available to everyone to use and build upon.
Veilid goes above and beyond existing privacy technologies and has the potential to completely change the way people use the Internet. Veilid has no profit motive, which puts us in a unique position to promote ideals without the compromise of capitalism.
The fully transactional, cloud-ready, distributed database.
Build flexible, distributed systems that can leverage the entire history of your critical data, not just the most current state. Build them on your existing infrastructure or jump straight to the cloud.
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
Open-Source, Cloud-Native Storage for Kubernetes. Production-ready management for file, block and object storage.
Rook is an open source cloud-native storage orchestrator for Kubernetes, providing the platform, framework, and support for Ceph storage to natively integrate with Kubernetes.
Ceph is a distributed storage system that provides file, block and object storage and is deployed in large scale production clusters.
A distributed, reliable key-value store for the most critical data of a distributed system.
etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.
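How many machine failures a cluster can tolerate follows from majority-quorum arithmetic: a cluster of n voting members needs floor(n/2) + 1 of them available to make progress, so it survives the loss of the rest. A short sketch (function names are hypothetical):

```python
def quorum(members: int) -> int:
    """Smallest strict majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority remains available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

The numbers show why odd cluster sizes are usually preferred: going from 3 to 4 members raises the quorum from 2 to 3 without raising the failure tolerance above 1.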
Ceph is a distributed object, block, and file storage platform.
Reliable and scalable storage designed for any organization. Use Ceph to transform your storage infrastructure. Ceph provides a unified storage service with object, block, and file interfaces from a single cluster built from commodity hardware components.
A collection of Salt files for deploying, managing and automating Ceph. The goal is to manage multiple Ceph clusters with a single Salt master. At this time, only a single Ceph cluster can be managed.
CLI & Library for Sequential, Distributed, POSIX-style job queue processing
Filespooler lets you request the remote execution of programs, including stdin and environment. It can use tools such as S3, Dropbox, Syncthing, NNCP, ssh, UUCP, USB drives, CDs, etc. as transport; basically, a filesystem is the network for Filespooler. Filespooler is particularly suited to distributed and asynchronous communication.
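Transports like these can deliver jobs late, twice, or out of order, yet the queue must still run them sequentially. A minimal sketch of that ordering logic, assuming each job carries a sequence number; `SequentialQueue` is hypothetical and does not reflect Filespooler's actual packet format or commands.

```python
from typing import Callable, Dict


class SequentialQueue:
    """Runs jobs strictly in sequence-number order, whatever order they arrive in."""

    def __init__(self, run: Callable[[int, str], None], next_seq: int = 1) -> None:
        self.run = run              # callback that executes one job
        self.next_seq = next_seq    # the only sequence number allowed to run next
        self.pending: Dict[int, str] = {}

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate delivery: already run or already waiting
        self.pending[seq] = payload
        # Drain every job that is now contiguous with what has already run.
        while self.next_seq in self.pending:
            self.run(self.next_seq, self.pending.pop(self.next_seq))
            self.next_seq += 1


if __name__ == "__main__":
    q = SequentialQueue(lambda seq, job: print("running", seq, job))
    for seq, job in [(2, "b"), (1, "a"), (1, "a"), (3, "c")]:
        q.receive(seq, job)
```

Job 2 waits in `pending` until job 1 arrives, and the duplicate copy of job 1 is ignored.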
Cloud native distributed block storage for Kubernetes. Longhorn is a distributed block storage system for Kubernetes. Longhorn is cloud native storage built using Kubernetes and container primitives.
MooseFS is a fault-tolerant, highly available, high-performance, scale-out, network distributed file system. It spreads data over several physical commodity servers, which are visible to the user as one virtual disk. It is POSIX compliant and acts like any other Unix-like file system.
GlusterFS is a scale-out network-attached storage file system. It has found applications including cloud computing, streaming media services, and content delivery networks. GlusterFS was developed originally by Gluster, Inc. and then by Red Hat, Inc., as a result of Red Hat acquiring Gluster in 2011.
TiDB is a distributed NewSQL database compatible with the MySQL protocol. TiDB (pronounced /'taɪdiːbi:/, "tai-D-B"; etymology: titanium) is a distributed SQL database. Inspired by the design of Google F1, TiDB supports the best features of both traditional RDBMS and NoSQL.
Akaros® is an open source, GPL-licensed operating system for manycore architectures. Our goal is to provide support for parallel and high-performance applications and to scale to a large number of cores.
XtreemFS is a general purpose storage system and covers most storage needs in a single deployment. It is open-source, requires no special hardware or kernel modules, and can be mounted on Linux, Windows and OS X.
LizardFS is a highly reliable, scalable and efficient distributed file system. It spreads data over a number of physical servers, making it visible to an end user as a single file system.
Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively. Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments.
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
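"Versioned" here means each cell is addressed by row, column, and timestamp, so older values remain readable until they are trimmed. A toy sketch of that Bigtable-style data model, assuming HBase-like `family:qualifier` column names; `VersionedTable` and its `max_versions` parameter are hypothetical and not the HBase client API. Cells that were never written take no space, which is what makes very wide, sparse tables practical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str]  # (row key, "family:qualifier")


class VersionedTable:
    """Toy map: (row, column, timestamp) -> value, keeping a bounded number of versions."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self.cells: Dict[Cell, List[Tuple[int, bytes]]] = defaultdict(list)

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                   # drop the oldest extras

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[bytes]:
        """Newest value at or before `as_of` (or the newest overall if as_of is None)."""
        for ts, value in self.cells.get((row, column), []):
            if as_of is None or ts <= as_of:
                return value
        return None
```

With `max_versions=2`, writing versions at timestamps 1, 2, and 3 keeps only the last two, so a read "as of" timestamp 1 finds nothing.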