llama
Hundreds of models & providers. One command to find what runs on your hardware.
A terminal tool that right-sizes LLMs to your system's RAM, CPU, and GPU. It detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will actually run well on your machine.
Ships with an interactive TUI (default) and a classic CLI mode. Supports multi-GPU setups, MoE architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).
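The right-sizing idea can be sketched as a weighted blend of per-model sub-scores. This is purely an illustrative heuristic, not the tool's actual algorithm; the function names, weights, and headroom factor below are all made up for the example:

```python
# Illustrative sketch (NOT the tool's real scoring code): combine quality,
# speed, fit, and context sub-scores, each in [0, 1], into one suitability score.
def suitability(quality, speed, fit, context, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted blend of the four sub-scores; weights sum to 1.0."""
    return sum(w * s for w, s in zip(weights, (quality, speed, fit, context)))

def fit(model_gb, system_gb, headroom=1.2):
    """1.0 if the model loads with comfortable headroom, 0.0 if it cannot
    load at all, and a linear ramp in between."""
    if model_gb * headroom <= system_gb:
        return 1.0
    if model_gb > system_gb:
        return 0.0
    return (system_gb - model_gb) / (model_gb * (headroom - 1.0))
```

For example, a 20 GB model on a 16 GB machine gets a fit of 0.0 and is ruled out regardless of its quality score, while an 8 GB model scores a full 1.0 on fit.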
Related contents:
Open-source LLMOps platform for hosting and scaling AI in your own infrastructure. 🏓🦙 Paddler lets teams run inference and deploy LLMs on their own infrastructure.
A CLI utility and Python library for interacting with Large Language Models.
A CLI tool and Python library for interacting with OpenAI, Anthropic’s Claude, Google’s Gemini, Meta’s Llama and dozens of other Large Language Models, both via remote APIs and with models that can be installed and run on your own machine.
Related contents:
Collection of awesome LLM apps with RAG using OpenAI, Anthropic, Gemini, and open-source models.
A curated collection of awesome LLM apps built with RAG and AI agents. This repository features LLM apps that use models from OpenAI, Anthropic, Google, and even open-source models like LLaMA that you can run locally on your computer.
Distribute and run LLMs with a single file.
Our goal is to make open LLMs much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
Related contents:
Finetune AI & LLMs faster.
Unslow AI training & finetuning: get 30x faster with Unsloth. 5x faster QLoRA finetuning with 60% less memory. Finetune Mistral, Gemma, and Llama 2-5x faster with 70% less memory!
Related contents:
Get up and running with large language models, locally.
Run Llama 2, Code Llama, Mistral, Gemma, and other models. Customize and create your own.
Related contents:
- Local RAG with Ollama, Mistral, and Turso @ Turso's blog.
- S4E10 - What future for the Apple Vision Pro? @ Underscore_'s Acast :fr:.
- Ollama Course – Build AI Apps Locally @ freeCodeCamp.org's YouTube.
- Detecting Exposed LLM Servers: A Shodan Case Study on Ollama @ Cisco Blogs.
- Ollama - 14,000 AI servers left freely accessible on the Internet @ Korben :fr:.
- Running an LLM locally on your computer @ Quoi de neuf les devs ? :fr:.
- The Ultimate Beginner's Guide to Self-Hosting Your Own AI @ Arsturn.
- How to Run and Customize LLMs Locally with Ollama @ freeCodeCamp.
- LLMs on Kubernetes Part 1: Understanding the threat model @ CNCF.
- Running an AI model at home with Ollama @ DomoPi :fr:.
- Using AI for Terraform: running locally with Langflow, OpenSearch, & Ollama @ Rosemary Wang's dev.to.
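Beyond the CLI, Ollama serves a local REST API (by default on port 11434). A minimal sketch of calling its documented `/api/generate` endpoint follows; the model name and prompt are only examples, and the request is left commented out since it needs a running Ollama server:

```python
import json
from urllib import request

def build_request(prompt, model="llama2", host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint.
    stream=False asks for one complete JSON response instead of chunks."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return request.Request(f"{host}/api/generate", data=payload,
                           headers={"Content-Type": "application/json"})

# req = build_request("Why is the sky blue?")
# with request.urlopen(req) as resp:   # requires a running Ollama server
#     print(json.loads(resp.read())["response"])
```

Note that, as the Shodan and Korben links above illustrate, binding this API to anything other than localhost without authentication leaves the server open to the whole Internet.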
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), Llama models.
Get up and running with large language models, locally. Run Llama 2 and other models on macOS. Customize and create your own.
The RedPajama-Data repository contains code for preparing large datasets for training large language models. RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset.
An ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue.
Demo, data, and code to train an open-source assistant-style large language model based on GPT-J and LLaMA.