I am launching into a new project to implement AI ‘on premises’. The two major reasons for this are:
A. As the frontier cloud models become more expensive and small models become better, I foresee much more demand for on-premises AI solutions,
and
B. I want a local AI to integrate with my IoT projects.
In the long run I will probably use Azure AI Foundry Local; however, that is going to require new hardware. To avoid that initially and get up to speed with cheaper options, I am going to start with Ollama.
What Ollama Does
Think of Ollama as a model manager and runtime. It:
- Downloads AI models to your PC
- Runs them locally
- Provides a simple command-line interface
- Exposes a local API that applications can connect to
- Manages model updates and storage
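To make the local API point concrete, here is a minimal Python sketch that sends one prompt to the /api/generate endpoint on Ollama’s default port. It assumes the server is running and that a model named llama3.2 has already been pulled; the helper names and the prompt are mine, for illustration only.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    try:
        # Model name and prompt are placeholders; use whatever you have pulled.
        print(generate("llama3.2", "In one sentence, what is an IoT gateway?"))
    except OSError as exc:  # URLError and HTTPError are subclasses of OSError
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
```

Setting stream to False keeps the example simple: the server returns one JSON object rather than a stream of partial responses.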
Prerequisites: More Than “Windows 10 or Later”
The official download page says Windows 10 or later. That’s technically correct and practically incomplete.
Your OS needs to be Windows 10 22H2 or newer — older builds turn progress indicators into rows of blank squares in the terminal. For NVIDIA cards, driver 531 or newer is required. Cards from the GTX 900 or 1000 era (compute capability 5.0–6.2) need driver 570 specifically — confirm your card’s compute level at nvidia.com/cuda-gpus before assuming you’re covered.
AMD is where people get caught. On Windows, Ollama supports Radeon RX 6000 and 7000 series only, via the ROCm v7 driver stack. If your card isn’t in that range, Ollama will silently fall back to CPU — no warning, no error. Check the supported hardware list before committing to the setup.
RAM: 16 GB is a working floor for 7B models; plan for 32 GB if you want breathing room. Storage tends to catch people off guard — the binary install is roughly 4 GB, but models range from 4 GB for smaller ones to well over 20 GB in the 14B range. NVMe for model storage is worth it. The installer runs under your user account — no admin rights needed.
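For a rough feel of why model sizes vary so much, the usual rule of thumb is parameters times bits per weight, divided by eight. The sketch below applies only that arithmetic. It ignores runtime overhead and the context cache, and the actual download size depends on the quantization of the model tag you pull, so treat the output as an estimate and not as Ollama’s real file sizes.

```python
def estimate_model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rule-of-thumb weight size in GB: parameters x bits per weight / 8."""
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    # Compare common quantization levels for the two model sizes discussed here.
    for params in (7, 14):
        for bits in (4, 8, 16):
            size = estimate_model_size_gb(params, bits)
            print(f"{params}B at {bits}-bit: ~{size:.1f} GB")
```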
Install Checklist
- Download OllamaSetup.exe from ollama.com and run it. Approve the Windows Defender prompt. (Personally, I’d use Winget via: winget install --id Ollama.Ollama -e)
- Open PowerShell and run ollama --version to confirm it’s on your PATH.
- Check the service is running: curl http://localhost:11434/api/tags should return JSON.
- Pull a starting model: ollama pull llama3.2
- Run it: ollama run llama3.2
- Open a second terminal and run ollama ps. Check the PROCESSOR column.
That last step matters more than the rest combined.
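If you would rather script the service check than eyeball the curl output, something like the following Python sketch works. It assumes the default port and that /api/tags returns a JSON object with a models list whose entries carry a name field; the function names are my own.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of locally installed models from the Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        names = model_names(fetch_tags())
        print("Service is up. Installed models:", names or "none yet")
    except OSError as exc:  # connection refused, timeout, HTTP errors
        print(f"Service not reachable: {exc}")
```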
Gotchas Worth Knowing
The silent CPU fallback is the one that costs the most time. Ollama will run on CPU if it can’t access your GPU — no error, just responses ten to fifteen times slower than they should be. ollama ps shows what’s actually handling inference. Adding --verbose to your run command surfaces device information at startup as well.
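The same check can be automated. The sketch below classifies a loaded model as CPU, GPU, or split, based on how much of it sits in video memory. It assumes the /api/ps endpoint reports size and size_vram fields in bytes for each loaded model, which is my understanding of the API and worth confirming against the current documentation; the output wording imitates the PROCESSOR column but is my own formatting.

```python
import json
import urllib.request


def processor_split(entry: dict) -> str:
    """Describe where a loaded model lives, from its total size and VRAM size."""
    size = entry.get("size", 0)        # assumed field: total bytes loaded
    vram = entry.get("size_vram", 0)   # assumed field: bytes held in GPU memory
    if size <= 0:
        return "unknown"
    if vram <= 0:
        return "100% CPU"
    if vram >= size:
        return "100% GPU"
    gpu_pct = round(100 * vram / size)
    return f"{100 - gpu_pct}%/{gpu_pct}% CPU/GPU"


def loaded_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> list:
    """Fetch the models currently loaded in memory."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp).get("models", [])


if __name__ == "__main__":
    try:
        for entry in loaded_models():
            print(entry.get("name", "?"), "->", processor_split(entry))
    except OSError as exc:
        print(f"Could not query Ollama: {exc}")
```

A model has to be loaded for anything to show up, so run this while an ollama run session is active in another terminal.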
Model storage defaults to your home directory. On a machine with a small system drive, this bites quickly. Set the OLLAMA_MODELS environment variable in your user account settings before pulling any models. The catch: if Ollama is already running as a background service, setting the variable in your current terminal won’t reach it — the service has the old environment. Quit from the system tray, save the variable, then relaunch.
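Before pulling anything large, it can help to confirm where models will land and how much room that drive has. This Python sketch resolves the folder from OLLAMA_MODELS, falling back to .ollama\models under your home directory, and reports free space. Note that it reads the environment of the Python process, not of the running Ollama service, so the same restart caveat applies; the function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def models_dir(env=None) -> Path:
    """Return the model folder: OLLAMA_MODELS if set, else ~/.ollama/models."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"


def free_gb(path: Path) -> float:
    """Free space (GB) on the drive holding path, using the nearest existing folder."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1e9


if __name__ == "__main__":
    target = models_dir()
    print(f"Models will be stored in: {target}")
    print(f"Free space on that drive: {free_gb(target):.1f} GB")
```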
When something breaks, start with the logs. Ollama writes to %LOCALAPPDATA%\Ollama\server.log — it’s significantly more informative than anything that surfaces on screen.
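The log can get long, so a small filter helps. The sketch below prints the most recent lines that mention a few keywords. The keyword list is my own guess at what is useful when chasing GPU problems, not anything Ollama defines, and the path is simply the one given above.

```python
import os
from collections import deque
from pathlib import Path

# Hypothetical keyword list; adjust to whatever you are hunting for.
KEYWORDS = ("error", "warn", "gpu", "cuda", "rocm")


def tail_matching(lines, keywords=KEYWORDS, limit=20):
    """Return the last `limit` lines containing any keyword (case-insensitive)."""
    hits = deque(maxlen=limit)  # keeps only the most recent matches
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(line.rstrip("\n"))
    return list(hits)


if __name__ == "__main__":
    log_path = Path(os.environ.get("LOCALAPPDATA", "")) / "Ollama" / "server.log"
    if log_path.is_file():
        with log_path.open(encoding="utf-8", errors="replace") as handle:
            for hit in tail_matching(handle):
                print(hit)
    else:
        print(f"No log found at {log_path}")
```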
The Bigger Picture
I’ll soon share my various builds with local AI here.
