Skip to main content

3.12 Local LLMs

Local LLMs live under AI Capabilities → Local LLMs. Install and run Ollama on the PC so Moss can call a fast on-device model for lightweight Q&A, summarization, and similar tasks.

Local LLMs: manage the Ollama runtime, download models, run smoke tests, and wire fast mode

Suggested workflow

StepWhat to do
1Open AI Capabilities → Local LLMs
2Install or detect the Ollama runtime
3Start the local model service
4Pull a chat model and run a test
5After tests pass, assign it as Moss’s fast model

What this page covers

CapabilityDescription
Detect runtimeSee whether Ollama is installed and responding
Install / update runtimeDownload RDK Studio–managed Ollama in the full desktop client
Start / stop serviceToggle the local inference daemon
Storage managementInspect model weights, runtime paths, cache, and free disk
Download modelsEnter a model tag and track progress
Test modelRun a minimal chat to confirm responses
Fast modePromote a model to Moss fast-mode usage

First-time walkthrough

  1. Open Local LLMs.
  2. If prompted that the runtime is missing, choose Install and start.
  3. After the service is up, type a model name—often a team-recommended small chat model.
  4. Download and wait for completion.
  5. Run Test and confirm you get an answer.
  6. Choose Apply as fast model config so Moss fast mode routes here.

Common states

StateMeaningNext step
Not installedNo usable Ollama runtimeInstall via button, or install manually and refresh
Installed but stoppedBinaries present, daemon downStart the service
Service failed to startAnother Ollama instance or port conflictClose duplicate processes or reboot
Empty model listDaemon healthy, no weights yetPull a model by name
Test failedWrong tag, non-chat bundle, or runtime errorSwitch models or reinstall weights

Relationship to AI engine settings

Local LLMs focuses on runtime + weights; Settings → AI engine holds model registry + defaults.

Choosing Apply as fast model config writes or updates an Ollama entry there.

To promote the same weights as a “thinking” model, adjust Settings → AI engine manually.

Relationship to OpenClaw

PC-only models default to localhost. Board OpenClaw generally cannot reach that URL.

For standalone OpenClaw inference, pick a remote endpoint the board can route to, or host a model service on the board and point configuration at a board-local address.