Cerebras Inference
The world’s fastest inference: 70x faster than GPU clouds, 128K context, 16-bit precision.
Cerebras Inference runs Llama 3.3 70B at 2,200 tokens/s and Llama 3.1 405B at 969 tokens/s, over 70x faster than GPU clouds. Get instant responses for code generation, summarization, and agentic tasks.
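Cerebras Inference exposes an OpenAI-compatible chat-completions API. The sketch below shows one way to call it from Python using only the standard library; the endpoint URL `https://api.cerebras.ai/v1/chat/completions`, the model name `llama-3.3-70b`, and the `CEREBRAS_API_KEY` environment variable are assumptions here — check the official docs for current values.

```python
import os
import json
import urllib.request

# Assumed endpoint and model identifier for the OpenAI-compatible API;
# verify both against the official Cerebras documentation.
API_URL = "https://api.cerebras.ai/v1/chat/completions"
MODEL = "llama-3.3-70b"


def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def complete(prompt: str, api_key: str) -> str:
    """POST the payload and return the first choice's text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("CEREBRAS_API_KEY")
    if key:
        print(complete("Summarize quicksort in one sentence.", key))
    else:
        # No key set: just show the payload that would be sent.
        print(json.dumps(build_request("hello"), indent=2))
```

Because the wire format follows the OpenAI convention, existing OpenAI-client code can typically be pointed at the Cerebras endpoint by swapping the base URL and API key.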