observability
Tools and a framework for data collection and system inspection on Kubernetes clusters and Linux hosts using eBPF.
Inspektor Gadget is a set of tools and a framework for data collection and system inspection on Kubernetes clusters and Linux hosts using eBPF.
It manages the packaging, deployment and execution of Gadgets (eBPF programs encapsulated in OCI images) and provides mechanisms to customize and extend Gadget functionality.
AI-Powered Debugging & Development Monitoring.
Captures your web app's complete development timeline - server logs, browser events, console messages, network requests, and automatic screenshots - in a unified, timestamped feed for AI debugging.
Logs, Traces, Metrics, Session Replay, Exceptions. The only tool you need to know what is happening and how to fix it.
Traceway is an OpenTelemetry-native observability platform that combines logs, traces, metrics, session replay/RUM, exceptions, and AI tracing together. Point an OTLP exporter at it and you're in business. No Collector, no glue code, no per-language vendor SDK.
Zinalog gives teams the essential pieces of a logging platform without the complexity of a full observability suite.
A lightweight, self-hosted logging server with a web dashboard without the complexity of ELK or Grafana.
Real-time observability of Claude Code sessions and multi-agent runs. Includes powerful filtering, searching, and visualization of multi-agent sessions.
AI-Native Infrastructure Monitoring. Argus uses AI to detect anomalies, investigate incidents, and resolve issues — before your users notice.
AI-native observability for production systems. Automatically understands your application from runtime data and logs. No dashboards. No manual configuration.
Argus is an open-source observability platform with a built-in AI agent that monitors your infrastructure, investigates anomalies autonomously, and proposes fixes — all through a chat interface. Think Datadog + ChatGPT, self-hosted and under your control.
Get Better at Getting Better.
DORA is the largest and longest-running research program of its kind that seeks to understand the capabilities that drive software delivery and operations performance. DORA helps teams apply those capabilities, leading to better organizational performance.
Kubernetes usage analytics for CPU, Memory, and GPU — track costs and optimize cluster resources.
kube-opex-analytics is a Kubernetes usage accounting and analytics tool that helps organizations track CPU, Memory, and GPU resources consumed by their clusters over time (hourly, daily, monthly).
🛡️ Privacy-first, Agnostic Telemetry for Self-Hosted Software.
Collect usage stats, verify active instances, and understand your user base without spying on them.
Easily collect and report PostgreSQL metrics for scripting, automation and troubleshooting.
pgmetrics is an open-source, zero-dependency, single-binary tool that can collect 350+ metrics from a running PostgreSQL server and display them in an easy-to-read text format or export them as JSON and CSV for scripting.
Laravel Pulse is a real-time application performance monitoring tool and dashboard for your Laravel application.
CISA’s LME provides a free, easy-to-deploy log management solution. It includes real-time threat alerts, customizable dashboards, and community collaboration on GitHub, helping small to medium-sized organizations improve their cybersecurity.
CISA's Logging Made Easy (LME) is a no cost, open source platform that centralizes log collection, enhances threat detection, and enables real-time alerting, helping small to medium-sized organizations secure their infrastructure. Whether you're upgrading from a previous version or deploying for the first time, LME offers a scalable, efficient solution for logging and endpoint security.
This repository provides eBPF instrumentation based on the OpenTelemetry standard. It provides a lightweight and efficient way to collect telemetry data using eBPF for user-space applications.
OpenTelemetry eBPF Instrumentation is commonly referred to as OBI.
A Python utility for receiving GitLab webhook events and sending them as log events to Grafana Loki/Cloud. Ideally suited to running in serverless environments such as Lambda, Cloud Functions, etc.
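To illustrate the webhook-to-log-event idea, here is a minimal sketch (not this utility's actual code) that wraps a GitLab webhook event in the JSON body shape accepted by Loki's push endpoint. The function name and the label names `source` and `event_type` are assumptions made for the example; `object_kind` is the field GitLab uses to identify the event type.

```python
import json
import time
from typing import Optional


def gitlab_event_to_loki_payload(event: dict, now_ns: Optional[int] = None) -> dict:
    """Wrap one GitLab webhook event in a Loki push-style JSON body."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Label names are hypothetical; choose ones that suit your own queries.
    labels = {
        "source": "gitlab",
        "event_type": str(event.get("object_kind", "unknown")),
    }
    # Loki expects each entry as [timestamp in nanoseconds as a string, log line].
    line = json.dumps(event, sort_keys=True)
    return {"streams": [{"stream": labels, "values": [[str(now_ns), line]]}]}
```

In a serverless handler, the returned dictionary would be serialized and sent with an HTTP POST to the Loki push URL configured for the deployment.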
A scalable, fault-tolerant, and low-latency storage service optimized for real-time append-only workloads.
Telemetry Harbor OSS is the open-source ingestion and visualization stack behind Telemetry Harbor. Self-host your own telemetry backend with full control over your data and infrastructure.
Open Source Continuous Profiling Platform. Debug performance issues down to a single line of code.
Grafana Pyroscope is a continuous profiling platform designed to surface performance insights from your applications, helping you optimize resource usage such as CPU, memory, and I/O operations. With Pyroscope, you can both proactively and reactively address performance bottlenecks across your system.
High Performance, Resource Efficient OpenTelemetry Collection.
Rotel provides an efficient, high-performance solution for collecting, processing, and exporting telemetry data. Rotel is ideal for resource-constrained environments and applications where minimizing overhead is critical.
All-in-One Observability Platform.
Coroot is an open-source APM & Observability tool, a Datadog and New Relic alternative. Metrics, logs, traces, continuous profiling, and SLO-based alerting, supercharged with predefined dashboards and inspections.
Cloud native networking and network security.
Calico is a single platform for networking, network security, and observability for any Kubernetes distribution in the cloud, on-premises, or at the edge. Whether you're just starting with Kubernetes or operating at scale, Calico's open source, enterprise, and cloud editions provide the networking, security, and observability you need.
A Prometheus exporter for PHP-FPM.
The exporter connects directly to PHP-FPM and exports the metrics via HTTP.
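As a rough illustration of what "exports the metrics via HTTP" means for a Prometheus exporter, the sketch below renders a dictionary of values in the Prometheus text exposition format. It is not the exporter's code: the `phpfpm_` prefix, the function name, and the metric name in the test are assumptions, and a real exporter would serve this text from an HTTP endpoint that Prometheus scrapes.

```python
def render_prometheus_text(metrics: dict, prefix: str = "phpfpm_") -> str:
    """Render name/value pairs as gauges in Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        full_name = prefix + name  # hypothetical naming scheme
        lines.append(f"# TYPE {full_name} gauge")
        lines.append(f"{full_name} {value}")
    # The format is line-oriented; end the document with a newline.
    return "\n".join(lines) + "\n"
```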
eBPF-based Security Observability and Runtime Enforcement.
Tetragon is a flexible Kubernetes-aware security observability and runtime enforcement tool that applies policy and filtering directly with eBPF, allowing for reduced observation overhead, tracking of any process, and real-time enforcement of policies.
The Single Database for Big Observability. Fast, Efficient, Single Database for Real-Time Observability. The real-time, cloud-native observability database for metrics, logs, and traces, providing sub-second insights from edge to cloud—at any scale.
Scalable, Open Source, Logs DB & Logging Solution.
Transform AI Prototypes into Enterprise-Grade Products.
Langtrace is an Open Source Observability and Evaluations Platform for AI Agents.
Get your app Performance score. Flashlight is a Lighthouse-like tool for mobile apps. No installation required.
📱⚡️ Lighthouse for Mobile - audits your app and gives a performance score to your Android apps (native, React Native, Flutter..). Measure performance on CLI, E2E tests, CI...
Dynamic Tracing for Linux.
bpftrace is a high-level tracing language for Linux and provides a quick and easy way for people to write observability-based eBPF programs, especially those unfamiliar with the complexities of eBPF.
Dynamically program the kernel for efficient networking, observability, tracing, and security.
eBPF is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without requiring changes to kernel source code or loading kernel modules.
VictoriaLogs is an open source, user-friendly database for logs from VictoriaMetrics.
Scraparr is a Prometheus Exporter for various components of the *arr Suite.
Self-hosted Error Tracking.
Bugsink offers real-time error tracking for your applications with full control through self-hosting.
Parseable is a diskless, cloud native database for logs, observability, security, and compliance. Parseable is built with a focus on simplicity & resource efficiency.
Dashboards for DevOps.
Visualize cloud configurations. Assess security posture against a massive library of benchmarks. Build custom dashboards with code.
OpenTelemetry-native GenAI and LLM Application Observability.
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers, VectorDBs, Agent Frameworks and GPUs.
Open-source observability for your LLM application, based on OpenTelemetry.
OpenLLMetry is a set of extensions built on top of OpenTelemetry that gives you complete observability over your LLM application. Because it uses OpenTelemetry under the hood, it can be connected to your existing observability solutions - Datadog, Honeycomb, and others.
A distributed tracing system.
Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in service architectures. Features include both the collection and lookup of this data.
If you have a trace ID in a log file, you can jump directly to it. Otherwise, you can query based on attributes such as service, operation name, tags and duration. Some interesting data will be summarized for you, such as the percentage of time spent in a service, and whether or not operations failed.
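The "percentage of time spent in a service" summary can be approximated with a few lines of arithmetic. The sketch below is a naive illustration, not Zipkin's implementation: it uses simplified span dictionaries with hypothetical keys `service` and `duration_us`, and it sums raw span durations, so nested spans are counted in both parent and child.

```python
from collections import defaultdict


def service_time_share(spans: list) -> dict:
    """Return each service's share (in percent) of the summed span durations."""
    totals = defaultdict(int)
    for span in spans:
        totals[span["service"]] += span["duration_us"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {service: 100.0 * total / grand_total for service, total in totals.items()}
```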
🔥 Airbroke: Lightweight, Airbrake-compatible, PostgreSQL-based Open Source Error Catcher. Self-hosted, Cost-effective, Open Source Error Tracking for a Sustainable Startup Journey.
Open Source Metrics Engine. Distributed TSDB and Query Engine, Prometheus Sidecar, Metrics Aggregator, and more such as Graphite storage and query engine.
M3 is a Prometheus compatible, easy to adopt metrics engine that provides visibility for some of the world’s largest brands.
Empower your testing with AI & usage insights.
Gravity monitors real-world user behaviors and usage patterns in live production and test environments to generate quality analytics, identify test coverage gaps, and assist in prioritizing and generating test cases.
App Monitoring, Error Tracking & Real User Monitoring. Application insights your developers need without the noise. Data means nothing without context. Get the full picture with secure, scalable error tracking and performance monitoring.
This is an OpenTelemetry auto-instrumentation package for Symfony framework applications.
OpenTelemetry Tail Sampling Configuration UI.
OTail is a user-friendly web interface for creating and managing OpenTelemetry tail sampling processor configurations. It provides a visual way to configure complex sampling policies without having to write YAML directly.
Tools to measure and visualize energy use on desktop computers.
Kubernetes Monitoring, Application Debug Platform. Instant Kubernetes-Native Application Observability.
Pixie is an open-source observability tool for Kubernetes applications. Use Pixie to view the high-level state of your cluster (service maps, cluster resources, application traffic) and also drill down into more detailed views (pod state, flame graphs, individual full-body application requests).
Batteries included UI to monitor your Messenger workers, transports, schedules, and messages.
Prometheus exporter for AWS CloudWatch - Discovers services through AWS tags, gets CloudWatch metrics data and provides them as Prometheus metrics with AWS tags as labels.
PoWA is a PostgreSQL Workload Analyzer that gathers performance stats and provides real-time charts and graphs to help monitor and tune your PostgreSQL servers.
pg_activity is a top-like application for PostgreSQL server activity monitoring.
PostgreSQL Remote Control.
temBoard is a powerful management tool for PostgreSQL. It lets you observe, optimize, or configure PostgreSQL instances.
eks-node-viewer is a tool for visualizing dynamic node usage within a cluster. It was originally developed as an internal tool at AWS for demonstrating consolidation with Karpenter. It displays the scheduled pod resource requests vs the allocatable capacity on the node. It does not look at the actual pod resource usage.
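The comparison described above, scheduled pod requests versus a node's allocatable capacity, reduces to a simple ratio. The sketch below shows that arithmetic for CPU in millicores; the function and parameter names are assumptions for illustration and are not taken from eks-node-viewer.

```python
def requested_fraction(pod_cpu_requests_m: list, allocatable_m: int) -> float:
    """Fraction of a node's allocatable CPU claimed by pod requests (millicores)."""
    if allocatable_m <= 0:
        raise ValueError("allocatable capacity must be positive")
    # Only requests are summed; actual usage is deliberately not considered.
    return sum(pod_cpu_requests_m) / allocatable_m
```

A node with 2000m allocatable and pods requesting 500m, 250m, and 250m would show as half requested, regardless of how much CPU those pods actually consume.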
Data and AI reliability. Delivered.
Data breaks. Monte Carlo ensures your team is the first to know and solve with end-to-end data observability.
Status Page On Demand. ⛑ Automated developer-oriented status page. The automated status page that you deserve.
If your infrastructure went down right now, how long would it take for you to know?
Gatus is a developer-oriented health dashboard that gives you the ability to monitor your services using HTTP, ICMP, TCP, and even DNS queries as well as evaluate the result of said queries by using a list of conditions on values like the status code, the response time, the certificate expiration, the body and many others. The icing on top is that each of these health checks can be paired with alerting via Slack, Teams, PagerDuty, Discord, Twilio and many more.
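The idea of evaluating a query result against a list of conditions can be sketched as follows. This is an illustration of the concept only and does not reproduce Gatus's configuration syntax; the result keys `status` and `response_time_ms` and the tuple form of a condition are assumptions.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def failed_conditions(result: dict, conditions: list) -> list:
    """Return a description of every condition the result does not satisfy."""
    failures = []
    for field, op, expected in conditions:
        if not OPS[op](result[field], expected):
            failures.append(f"{field} {op} {expected}")
    return failures
```

An empty return value means the check is healthy; a non-empty one is what an alerting integration would act on.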
Cloud-native orchestration of data pipelines. Ship data pipelines with extraordinary velocity. An orchestration platform for the development, production, and observation of data assets.
The cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
Dagster is a cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
It is designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
An 'Observe and Report Buddy' for your SRE toolbox.
Green Orb is a lightweight monitoring tool that enhances your application's reliability by observing its console output for specific patterns and executing predefined actions in response. Designed to integrate seamlessly, it's deployed as a single executable binary that runs your application as a subprocess, where it can monitor all console output, making it particularly useful in containerized environments. Green Orb acts as a proactive assistant, handling essential monitoring tasks and enabling SREs to automate responses to critical system events effectively.
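The pattern described above (run the application as a subprocess, watch its console output, and trigger an action on a match) can be sketched in a few lines. This is not Green Orb's code or configuration format; the `watch` function and the `(pattern, action)` rule shape are assumptions made for the example.

```python
import re
import subprocess
import sys


def watch(command: list, rules: list) -> int:
    """Run command, echo its output, and call an action for each matching line."""
    compiled = [(re.compile(pattern), action) for pattern, action in rules]
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so all console output is observed
        text=True,
    )
    for line in proc.stdout:
        sys.stdout.write(line)  # pass the output through unchanged
        for pattern, action in compiled:
            match = pattern.search(line)
            if match:
                action(match)
    return proc.wait()  # exit code of the wrapped application
```

An action here is any callable that receives the regex match, for example one that sends a notification or restarts a dependency.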
Manage your Observability Systems. Command Line utility for managing Grafana Resources.
Software engineers know how to version and deploy their resources. Tools like Git or CI enable reliable workflows that track changes, with meaningful review processes giving confidence in the expected outcomes. Now, with Grizzly, you can have all this with Grafana resources, dashboards, datasources and more.
Network Analysis & Packet Capture. It's amazing what you discover when you start looking.
Arkime is an open source, large scale, full packet capturing, indexing, and database system.
Open-Source ML Monitoring and LLM Observability.
Open-source evaluation and observability for ML and LLM systems. Evaluate, test, and monitor AI-powered systems. From tabular data to LLMs. Built for data scientists, AI, and ML engineers.
An open source, agentless, real-time monitoring tool with custom monitoring.
Apache HertzBeat is a real-time monitoring system with agentless, performance cluster, Prometheus-compatible, custom monitoring, and status page building capabilities.
Intelligent Prompt Gateway.
Arch is an intelligent prompt gateway. Engineered with (fast) LLMs for the secure handling, robust observability, and seamless integration of prompts with APIs - all outside business logic. Built by the core contributors of Envoy proxy, on Envoy.
Arch is an intelligent Layer 7 gateway designed to protect, observe, and personalize LLM applications (agents, assistants, co-pilots) with your APIs.