LLMOps
Provider-agnostic operations for agentic resources. ARK extends Kubernetes with custom resources that make agents, teams, MCP tools, and workflows first-class citizens in your cluster, codifying patterns and practices developed across dozens of agentic application projects.
🏓🦙 Paddler is an open-source LLMOps platform for hosting and scaling AI in your own infrastructure: it lets teams run inference and deploy LLMs on hardware they control.
Build compliant AI chat agents in minutes: LLM agents built for control and designed for real-world use. Parlant gives you all the structure you need to build customer-facing agents that behave exactly as your business requires.
Cut Code Review Time & Bugs in Half. Instantly.
Supercharge your team to ship faster with the most advanced AI code reviews.
Related contents:
Exploring LLM-powered automation in platform-based software collaboration.
Simplify and secure MCP servers. ToolHive makes deploying MCP servers easy, secure and fun.
Run any Model Context Protocol (MCP) server — securely, instantly, anywhere.
ToolHive is the easiest way to discover, deploy, and manage MCP servers. Launch any MCP server in a locked-down container with a single command. No manual setup, no security headaches, no runtime hassles.
Standardized, serverless ML inference platform on Kubernetes: a highly scalable, standards-based model inference platform for trusted AI.
KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative machine learning (ML) models. It aims to solve production model serving use cases by providing high-abstraction interfaces for TensorFlow, XGBoost, scikit-learn, PyTorch, and Hugging Face Transformer/LLM models using standardized data plane protocols.
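The "standardized data plane protocols" refer to the v2 (Open Inference Protocol) REST/gRPC APIs. As a sketch, a v2 REST inference request can be assembled like this (the model name `sklearn-iris` and tensor name `input-0` are illustrative):

```python
import json

def v2_infer_request(model_name: str, data: list[list[float]]) -> tuple[str, str]:
    """Build an Open Inference Protocol (v2) request for a model endpoint.

    Returns the URL path and the JSON body; tensor data is flattened
    row-major, as the v2 data plane expects.
    """
    rows, cols = len(data), len(data[0])
    body = {
        "inputs": [{
            "name": "input-0",              # tensor name; model-specific
            "shape": [rows, cols],
            "datatype": "FP32",
            "data": [x for row in data for x in row],  # flattened row-major
        }]
    }
    return f"/v2/models/{model_name}/infer", json.dumps(body)

path, body = v2_infer_request("sklearn-iris", [[6.8, 2.8, 4.8, 1.4]])
```

Because the protocol is shared across serving runtimes, the same client code works whether the backend is scikit-learn, XGBoost, or a Hugging Face model.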
Open-source LLM infrastructure.
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
A self-contained, lightweight, out-of-the-box research platform for modern ML.
Boson is a lightweight, fully containerized, and feature-rich machine learning research platform. It centralizes essential tools to help teams keep projects lean, organized, and reproducible—while reducing overhead and boosting productivity. Think Databricks/SageMaker, but local and free.
Boson enables engineers and researchers to iterate faster without getting bogged down by infrastructure or tooling complexity.
Principles for building reliable LLM applications.
What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
An open protocol enabling communication and interoperability between opaque agentic applications.
One of the biggest challenges in enterprise AI adoption is getting agents built on different frameworks and vendors to work together. That’s why we created an open Agent2Agent (A2A) protocol, a collaborative way to help agents across different ecosystems communicate with each other.
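A2A messages travel as JSON-RPC 2.0 envelopes. A minimal sketch of one, assuming an illustrative method name and message structure (field names here are not normative; consult the A2A spec for the exact schema):

```python
import json
import uuid

def a2a_send_message(text: str) -> str:
    """Sketch of a JSON-RPC 2.0 envelope of the kind A2A agents exchange.

    The method name and the role/parts message shape below are
    illustrative stand-ins for the spec's actual schema.
    """
    request = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),           # correlates request and response
        "method": "message/send",          # illustrative A2A method name
        "params": {
            "message": {
                "role": "user",
                "parts": [{"kind": "text", "text": text}],
            }
        },
    }
    return json.dumps(request)

envelope = json.loads(a2a_send_message("What is the order status?"))
```

Because agents only see these opaque envelopes, neither side needs to know which framework or vendor the other is built on.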
The first and the best multi-agent framework. Finding the scaling laws of agents.
Building Multi-Agent Systems for Task Automation.
🐫 CAMEL is an open-source community dedicated to finding the scaling laws of agents. CAMEL emerged as the earliest LLM-based multi-agent framework, and is now a generic framework for building and using LLM-based agents for real-world task solving. We believe that studying these agents on a large scale offers valuable insights into their behaviors, capabilities, and potential risks. To facilitate research in this field, we implement and support various types of agents, tasks, prompts, models, and simulated environments.
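CAMEL's signature pattern is role-playing: a "user" agent refines instructions and an "assistant" agent executes them, turn by turn. A toy sketch of that loop, with plain callables standing in for LLM-backed agents (not the CAMEL API):

```python
def role_play(task: str, assistant, user, max_turns: int = 3) -> list[tuple[str, str]]:
    """Toy version of a role-playing loop: the user agent drives each
    turn with a refined instruction, the assistant agent responds."""
    transcript = []
    instruction = task
    for _ in range(max_turns):
        reply = assistant(instruction)
        transcript.append((instruction, reply))
        instruction = user(reply)           # user agent decides the next turn
        if instruction == "DONE":           # termination token ends the dialog
            break
    return transcript

# Stub agents standing in for LLM calls.
assistant = lambda msg: f"step for: {msg}"
turns = iter(["refine once", "DONE"])
user = lambda reply: next(turns)

transcript = role_play("plan a data pipeline", assistant, user)
```

Scaling this loop up to many agent pairs and tasks is what makes the behavior studies the project describes possible.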
Transform AI prototypes into enterprise-grade products.
Langtrace is an open-source observability and evaluations platform for AI agents.
Balance agent control with agency. Build resilient language agents as graphs.
Gain control with LangGraph to design agents that reliably handle complex tasks. Build and scale agentic applications with LangGraph Platform.
LangGraph — used by Replit, Uber, LinkedIn, GitLab and more — is a low-level orchestration framework for building controllable agents. While LangChain provides integrations and composable components to streamline LLM application development, the LangGraph library enables agent orchestration — offering customizable architectures, long-term memory, and human-in-the-loop capabilities to reliably handle complex tasks.
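The core idea behind graph-based orchestration can be sketched in a few lines of plain Python (this illustrates the concept, not the LangGraph API): nodes are functions over a shared state, and each node names the edge to follow next.

```python
# Minimal sketch of graph-style agent orchestration: nodes mutate a
# shared state dict and return the name of the next node to run.

def plan(state: dict) -> str:
    state["steps"] = ["search", "summarize"]
    return "act"

def act(state: dict) -> str:
    state["done"] = bool(state["steps"])
    return "end"                      # sentinel node name that stops the loop

NODES = {"plan": plan, "act": act}

def run_graph(entry: str, state: dict) -> dict:
    node = entry
    while node != "end":
        node = NODES[node](state)     # follow the edge the node chose
    return state

result = run_graph("plan", {})
```

Making control flow explicit as a graph is what allows cycles, checkpointing, and pausing for human input, which is harder to bolt onto a linear chain.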
JobSet: a Kubernetes-native API for distributed ML training and HPC workloads.
JobSet is a Kubernetes-native API for managing a group of Kubernetes Jobs as a unit. It aims to offer a unified API for deploying HPC (e.g., MPI) and AI/ML training workloads (PyTorch, JAX, TensorFlow, etc.) on Kubernetes.
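As a sketch, a JobSet that manages four identical training workers as one unit might look like the manifest below (the name, replica count, and image are illustrative placeholders):

```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: pytorch-train            # illustrative name
spec:
  replicatedJobs:
    - name: worker
      replicas: 4                # four identical Jobs managed as one unit
      template:
        spec:
          completions: 1
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: trainer
                  image: registry.example.com/trainer:latest  # placeholder
```

The point of the abstraction is that success, failure, and restarts are handled for the whole group, rather than per individual Job.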
Bringing Agentic AI to cloud native.
An open-source framework for DevOps and platform engineers to run AI agents in Kubernetes, automating complex operations and troubleshooting tasks.
An in-depth book and reference on building agentic systems like Claude Code. A deep-dive guide into architecture patterns for building responsive, reliable AI coding agents.
There have been many questions about how Claude Code works under the hood. People usually see the prompts, but not how it all comes together. This is that book: all of the systems, tools, and commands that go into building one of these agents.
A practical deep dive and code review into how to build a self-driving coding agent, execution engine, tools and commands. Rather than the prompts and AI engineering, this is the systems and design decisions that go into making agents that are real-time, self-corrective, and useful for productive work.
Go beyond nascent AI demos. The intelligent AI-native gateway for prompts and agentic apps.
Effortlessly build AI apps that can answer questions and help users get things done. Arch is an AI-native (edge and LLM) proxy for agents, built by contributors to the Envoy proxy. It handles the pesky heavy lifting of building agentic apps -- ⚡️ query understanding and routing, seamless integration of prompts with tools, and unified access and observability for LLMs -- so that you can move faster, prevent harmful outcomes, and rapidly incorporate the latest models.
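"Query understanding and routing" means the gateway classifies an incoming prompt and dispatches it to the right backend before any LLM call. A toy sketch of that idea (a real gateway like Arch uses purpose-built models, not keyword rules; route names here are made up):

```python
# Toy prompt router: map a classified intent to a downstream target.
ROUTES = {
    "weather": "weather_tool",    # illustrative route names
    "refund": "refund_agent",
}

def route(prompt: str, default: str = "general_llm") -> str:
    """Pick a downstream target for a prompt; fall back to a default."""
    text = prompt.lower()
    for keyword, target in ROUTES.items():
        if keyword in text:
            return target
    return default

target = route("What's the weather in Lisbon?")
```

Centralizing this in a proxy means every agent behind it gets the same routing, guardrails, and observability without per-app code.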
A curated list of resources for AI Engineering.
The Open-Source LLM Evaluation Framework.
DeepEval is a simple-to-use, open-source framework for evaluating and testing large language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs on metrics such as G-Eval, hallucination, answer relevancy, and RAGAS, using LLMs and various other NLP models that run locally on your machine.
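To make the "Pytest for LLM outputs" flow concrete, here is a toy stand-in for an answer-relevancy metric. Real DeepEval metrics use LLM and NLP models; this sketch just scores token overlap so the assert-on-a-score pattern is visible end to end (function names are mine, not DeepEval's):

```python
def answer_relevancy(question: str, answer: str) -> float:
    """Toy relevancy score: fraction of question tokens echoed in the answer."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    return len(q & a) / len(q) if q else 0.0

def assert_relevant(question: str, answer: str, threshold: float = 0.3) -> None:
    """Fail the test if the answer scores below the threshold."""
    score = answer_relevancy(question, answer)
    assert score >= threshold, f"relevancy {score:.2f} below {threshold}"

assert_relevant("what is the capital of france",
                "the capital of france is paris")
```

The real framework follows the same shape: compute a metric per test case, then pass or fail against a threshold in CI.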
Simple, secure, and reproducible packaging for AI/ML projects.
KitOps is an open source DevOps tool that packages and versions your AI/ML model, datasets, code, and configuration into a reproducible artifact called a ModelKit. ModelKits are built on existing standards, ensuring compatibility with the tools your data scientists and developers already use.
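A ModelKit is described by a Kitfile, a YAML manifest that points at the model, data, and code to package together. A minimal sketch (all names and paths are illustrative, and the fields shown follow the Kitfile format as commonly documented):

```yaml
manifestVersion: "1.0"
package:
  name: sentiment-model        # illustrative package name
  version: 1.0.0
model:
  path: ./models/sentiment.onnx
datasets:
  - name: training-data
    path: ./data/train.csv
code:
  - path: ./src
```

Because the ModelKit is a single versioned artifact, the same bundle can be pulled by data scientists, CI, and deployment tooling without drift between its parts.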
Secure & reliable LLMs. Test & secure your LLM apps. Open-source LLM testing used by 51,000+ developers.
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
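The declarative configs follow a prompts/providers/tests shape. A minimal sketch (model identifiers and the assertion are illustrative examples, not recommendations):

```yaml
prompts:
  - "Answer concisely: {{question}}"
providers:
  - openai:gpt-4o-mini           # illustrative model ids
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains
        value: "Paris"
```

Each test case is run against every prompt-provider pair, which is what makes side-by-side model comparison and CI regression checks cheap to set up.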
llamafile lets you distribute and run LLMs with a single file.
Our goal is to make open LLMs much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
Enable AI to control your browser. Make websites accessible for AI agents.
We make websites accessible for AI agents by extracting all interactive elements, so agents can focus on what makes their beer taste better.
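The "extract all interactive elements" step can be sketched with the standard library alone (this illustrates the idea, not the browser-use implementation, which works against a live browser DOM): walk the page and keep only the elements an agent can act on.

```python
from html.parser import HTMLParser

class InteractiveElements(HTMLParser):
    """Collect elements an agent could click, type into, or select."""
    ACTIONABLE = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.found: list[tuple[str, dict]] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ACTIONABLE:
            self.found.append((tag, dict(attrs)))

parser = InteractiveElements()
parser.feed('<div><p>hi</p><a href="/buy">Buy</a><input name="qty"></div>')
```

Collapsing a page to this short action list is what lets an LLM reason about "click Buy, fill qty" instead of parsing raw HTML.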
Open Source LLM Engineering Platform. Traces, evals, prompt management and metrics to debug and improve your LLM application.
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
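The traces such platforms collect are nested spans with timings and metadata. A toy recorder showing the shape of that data (this is the span/trace concept only, not the Langfuse SDK, which ships decorators and framework integrations):

```python
import time
from contextlib import contextmanager

TRACE: list[dict] = []            # completed spans, innermost first

@contextmanager
def span(name: str, **metadata):
    """Record a named, timed span; nesting works via the with-statement."""
    start = time.monotonic()
    try:
        yield
    finally:
        TRACE.append({
            "name": name,
            "duration_s": time.monotonic() - start,
            **metadata,
        })

with span("rag-query", user_id="u1"):
    with span("retrieve", k=3):
        pass                      # vector search would happen here
    with span("generate", model="example-model"):
        pass                      # LLM call would happen here
```

Attaching metadata such as user, model, and retrieval parameters to each span is what makes it possible to debug a single slow or wrong answer after the fact.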
Prompt Engineering, Evaluation, and Observability for LLM apps.
Your collaborative, open-source, end-to-end LLM engineering platform. Agenta provides integrated tools for prompt engineering, versioning, evaluation, and observability—all in one place.
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place.
Ship AI features in minutes. Pezzo enables you to build, test, monitor and instantly ship AI all in one platform, while constantly optimizing for cost and performance.
🕹️ Open-source, developer-first LLMOps platform designed to streamline prompt design, version management, instant delivery, collaboration, troubleshooting, observability and more.