Local LLM inference and private RAG pipelines. Everything runs on the local network; nothing leaves it.
Deploy Ollama on a Proxmox Ubuntu VM, expose the API to the local network, stand up Open WebUI and Chroma, and build a full RAG ingestion pipeline that indexes your files locally. Phase 1 is CPU-only; GPU passthrough follows in Phase 2. A sketch of the ingestion loop follows below.
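A minimal sketch of that ingestion loop under Phase 1 assumptions: Ollama serving on the VM, the `nomic-embed-text` embedding model pulled, and Chroma persisting to local disk. The VM address, source directory, and naive chunking scheme here are illustrative placeholders, not the exact pipeline from the write-up.

```python
# Hypothetical sketch: index local text files into Chroma using Ollama embeddings.
# Assumes Ollama is reachable on the LAN and `ollama pull nomic-embed-text` was run.
import pathlib
import requests
import chromadb

OLLAMA_URL = "http://192.168.1.50:11434"   # assumed VM address; adjust for your network
EMBED_MODEL = "nomic-embed-text"           # assumed embedding model

def embed(text: str) -> list[float]:
    """Call Ollama's /api/embeddings endpoint for a single chunk."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": EMBED_MODEL, "prompt": text},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def chunk(text: str, size: int = 1000) -> list[str]:
    """Naive fixed-size chunking; real pipelines usually overlap chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

client = chromadb.PersistentClient(path="./chroma-data")  # local on-disk store
coll = client.get_or_create_collection("docs")

for path in pathlib.Path("./notes").rglob("*.md"):        # assumed source directory
    for n, piece in enumerate(chunk(path.read_text(errors="ignore"))):
        coll.add(
            ids=[f"{path}:{n}"],                          # stable per-chunk ID
            embeddings=[embed(piece)],
            documents=[piece],
            metadatas=[{"source": str(path)}],
        )

print("indexed", coll.count(), "chunks")
```

Retrieval is the same call in reverse: embed the question with the same model, then pass it to `coll.query(query_embeddings=[...], n_results=4)` and feed the returned chunks to the chat model as context.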
Curated reference for models available through Ollama: installed stack, RAM requirements, use cases, and privacy notes. Covers chat, code, reasoning, embedding, and vision models.
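A reference like this can be generated rather than hand-maintained: Ollama's `/api/tags` endpoint lists every installed model with its size on disk. A sketch, assuming a default local install and the field names in the current API response (treat both as assumptions):

```python
# Sketch: enumerate installed Ollama models as the skeleton of a reference table.
# Assumes a default local Ollama install; JSON fields match the /api/tags
# response as currently documented and may change across releases.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    name = model.get("name", "?")
    size_gb = model.get("size", 0) / 1e9                  # bytes -> GB
    family = model.get("details", {}).get("family", "unknown")
    print(f"{name:<32} {size_gb:>6.1f} GB  {family}")
```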
AMD Vega 56 GPU passthrough via Thunderbolt 4 eGPU. ROCm deprecation workaround, Vulkan backend for Ollama, and the full vendor-reset story. Documented in the Proxmox lab.
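One quick way to verify that passthrough actually took: Ollama's `/api/ps` endpoint reports loaded models along with how much of each sits in VRAM. A sketch, assuming the `size` and `size_vram` fields present in current Ollama releases:

```python
# Sketch: confirm GPU offload after passthrough by checking VRAM residency.
# Assumes a model is currently loaded (e.g. after a chat request) and that
# /api/ps exposes size / size_vram as in current Ollama releases.
import requests

resp = requests.get("http://localhost:11434/api/ps", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    total = model.get("size", 0)
    vram = model.get("size_vram", 0)
    pct = 100 * vram / total if total else 0
    status = "GPU offload working" if vram else "CPU-only"
    print(f"{model.get('name')}: {pct:.0f}% in VRAM ({status})")
```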