Llama-Cpp

Running a language model locally on a Raspberry Pi 5 is practical in 2026, provided you pick the right model. The Pi 5 (8 GB) runs 1–3B parameter models fast enough for interactive use, with no cloud connection or dedicated AI hardware. Inference is CPU-only, which sets a hard ceiling: expect 5–15 tokens per second for 1.5B models and 2–5 tokens per second for 3B models. This guide covers setup with both Ollama and llama.cpp, real benchmark data, and which models are worth running on Pi hardware. ...
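To get a feel for what those throughput figures mean in practice, here is a rough sketch that converts them into wait times for a single reply. It uses only the tokens-per-second ranges quoted above. The 200-token reply length and the names `THROUGHPUT_TOK_PER_S` and `reply_time_range` are illustrative assumptions, and the estimate ignores prompt processing time, so real waits will be somewhat longer.

```python
# Throughput ranges (tokens/second) quoted in the text for a Pi 5 (8 GB), CPU-only.
# Keys and structure are illustrative, not taken from any tool's output.
THROUGHPUT_TOK_PER_S = {
    "1.5B": (5.0, 15.0),
    "3B": (2.0, 5.0),
}


def reply_time_range(model_size: str, reply_tokens: int) -> tuple[float, float]:
    """Return (best_case_seconds, worst_case_seconds) to generate reply_tokens.

    Assumes generation speed is constant and ignores prompt processing time.
    """
    slow, fast = THROUGHPUT_TOK_PER_S[model_size]
    return reply_tokens / fast, reply_tokens / slow


if __name__ == "__main__":
    # 200 tokens is an assumed length for a short paragraph-sized answer.
    for size in THROUGHPUT_TOK_PER_S:
        best, worst = reply_time_range(size, 200)
        print(f"{size}: {best:.0f}-{worst:.0f} s for a 200-token reply")
```

Under these assumptions a 1.5B model finishes a 200-token reply in roughly 13 to 40 seconds, while a 3B model takes roughly 40 to 100 seconds, which is why model size matters so much on this hardware.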