llm-d: a Kubernetes-native high-performance distributed LLM inference framework.
llm-d is a Kubernetes-native distributed inference serving stack, providing well-lit paths for anyone to serve large generative AI models at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.
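To a client, an llm-d deployment looks like an ordinary model endpoint. A minimal sketch, assuming the stack exposes an OpenAI-compatible API behind its gateway and that a model has already been deployed; the gateway address and model id below are placeholders, not values from the llm-d docs:

```python
# A minimal client sketch against an llm-d-served endpoint. The gateway address
# and model id are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://llm-d-gateway.example.com/v1",  # hypothetical gateway address
    api_key="EMPTY",  # self-hosted endpoints typically ignore the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical deployed model
    messages=[{"role": "user", "content": "Summarize llm-d in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The Kubernetes-side details (routing, autoscaling, accelerator placement) stay behind that endpoint, which is the point of the stack.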
The future of on-device AI. Deploy high-performance AI directly in your app, with no network latency, full data privacy, and no cloud inference costs.
Uzu is a high-performance inference engine for AI models on Apple Silicon.
A Datacenter Scale Distributed Inference Serving Framework.
NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models in multi-node distributed environments. It is inference-engine agnostic (supporting TRT-LLM, vLLM, SGLang, and others) and captures LLM-specific capabilities such as disaggregated prefill and decode and KV-cache-aware routing.
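Because the engine sits behind Dynamo's frontend, a client sees the same interface regardless of backend. A streaming sketch, assuming the frontend exposes an OpenAI-compatible HTTP API on localhost:8000; the address and model id are placeholders:

```python
# Streaming sketch against a Dynamo frontend. Address and model id are
# placeholders; the backend engine (TRT-LLM, vLLM, SGLang) is interchangeable.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Explain disaggregated prefill and decode briefly."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```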
Cerebras Inference: the world's fastest inference, 70x faster than GPU clouds, with 128K context and 16-bit precision.
On Cerebras Inference, Llama 3.3 70B runs at 2,200 tokens/s and Llama 3.1 405B at 969 tokens/s, over 70x faster than GPU clouds. Get instant responses for code generation, summarization, and agentic tasks.
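Those throughput figures can be sanity-checked from a client. A rough sketch, assuming the Cerebras Inference API is OpenAI-compatible at https://api.cerebras.ai/v1 and that the Llama 3.3 70B model id is "llama-3.3-70b"; verify both against the current Cerebras docs before running:

```python
# Rough throughput check against Cerebras Inference. The base URL and model id
# are assumptions; confirm them in the Cerebras documentation.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.3-70b",  # assumed model id for Llama 3.3 70B
    messages=[{"role": "user", "content": "Write a 300-word overview of wafer-scale chips."}],
    max_tokens=1024,
)
elapsed = time.perf_counter() - start

generated = response.usage.completion_tokens
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.0f} tokens/s")
```

The measured rate includes network round-trip and time-to-first-token, so it is a lower bound on the quoted generation speed.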