Distribute and run LLMs with a single file.
Our goal is to make open LLMs much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
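Once a llamafile is downloaded and marked executable, it can serve the bundled model over a local OpenAI-compatible HTTP API. Below is a minimal sketch of querying such a server from Python, assuming a llamafile is already running in server mode on the default port 8080 (the filename in the comment and the port are illustrative; check your llamafile's own help output):

```python
# Query a running llamafile's OpenAI-compatible chat endpoint.
# Assumes a server was started with something like:
#   ./mistral-7b-instruct.llamafile --server
# (filename and port are illustrative assumptions)
import json
import urllib.request

payload = {
    "model": "local",  # the llamafile serves whatever model is baked in
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```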
Finetune AI & LLMs faster.
Unslow your AI training and finetuning: Unsloth speeds up QLoRA finetuning of Mistral, Gemma, and Llama by 2-5x (up to 30x in its fastest configurations) while using 60-70% less memory.
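In practice that means loading a 4-bit quantized base model through Unsloth's FastLanguageModel and attaching LoRA adapters before training. A minimal sketch, assuming the unsloth package is installed and a CUDA GPU is available (the model name and hyperparameters are illustrative):

```python
# Load a 4-bit quantized base model and attach LoRA adapters with Unsloth.
# Model name and LoRA hyperparameters are illustrative, not prescribed.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # pre-quantized 4-bit weights
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA: 4-bit base weights cut memory use sharply
)

# Only the small LoRA adapter matrices are trained; base weights stay frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,          # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# `model` can now be handed to a standard Hugging Face / TRL trainer.
```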
Get up and running with large language models, locally.
Run Llama 2, Code Llama, Mistral, Gemma, and other models with Ollama. Customize and create your own.
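Once the Ollama daemon is running and a model has been pulled (e.g. with `ollama pull llama2`), it exposes a local REST API on port 11434. A minimal sketch of generating text through that API (the model name is illustrative):

```python
# Generate text from a locally running Ollama server via its REST API.
# Assumes the daemon is running and the model has already been pulled.
import json
import urllib.request

payload = {
    "model": "llama2",   # any model you have pulled locally
    "prompt": "Why is the sky blue?",
    "stream": False,     # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```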