How to Run Phi-4-mini Locally: Microsoft's 3.8B Model with 128K Context
Phi-4-mini is Microsoft’s 3.8B-parameter model and one of the most practical models to run locally right now. The Q4_K_M GGUF download is 2.49 GB, the model supports a 128,000-token context window (rare at this scale), and it scores 88.6% on the GSM8K math benchmark, about 30 points ahead of some of the models it competes with. This guide covers how to run Phi-4-mini locally with Ollama, llama.cpp, and Python’s Transformers library, with verified RAM requirements and benchmark context throughout. ...
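As a rough sanity check on the size figures quoted above, the short sketch below works out how many bits per weight a 2.49 GB file implies for a 3.8B-parameter model, and how large the same weights would be at 16-bit precision. It assumes 1 GB means 10^9 bytes and that the parameter count is exactly 3.8 billion; both are simplifications, and the helper names are illustrative only. A GGUF file also carries metadata and keeps some tensors at higher precision, so the result is an average, not the nominal width of the quantization format.

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter, treating 1 GB as 10**9 bytes (assumption)."""
    total_bits = file_size_gb * 1e9 * 8
    return total_bits / (params_billion * 1e9)


def fp16_size_gb(params_billion: float) -> float:
    """Approximate size of the same weights at 16 bits (2 bytes) per parameter."""
    return params_billion * 1e9 * 2 / 1e9


# Figures taken from the text: 2.49 GB at Q4_K_M, 3.8B parameters.
q4_bpw = bits_per_weight(2.49, 3.8)
fp16_gb = fp16_size_gb(3.8)

print(f"{q4_bpw:.2f} bits per weight on average")
print(f"{fp16_gb:.1f} GB for the same weights at FP16")
```

On these assumptions the quantized file averages a little over 5 bits per weight, roughly a third of the 16-bit footprint, which is why the download fits comfortably on machines with modest RAM. Actual memory use at runtime is higher than the file size, because the context cache grows with the number of tokens in the window.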