# Architecture overview
LocoPilot is an open-core monorepo. The MIT-licensed pieces live in `locopilot-public-cli/` and ship to every customer. The proprietary cloud control plane (`locopilot-core-backend/`) is hosted by us and never bundled with the CLI.
## Component map
```
        ┌────────────────────────┐
        │ locopilot-public-cli   │  (this repo, MIT)
        │                        │
  ┌─────┤ src/cli/   (Commander) │
  │     │ src/api/   (Fastify)   │
  │     │ src/worker/ (in-proc)  │
  │     │ src/training/          │
  │     │ src/cloud/client.ts    │
  │     └─────────┬──────────────┘
  │               │
  │               │ HTTPS (only when Pro token present)
  │               ▼
  │     ┌────────────────────────┐
  │     │ LocoPilot Cloud        │  (proprietary, hosted)
  │     │  /api/inference        │
  │     │  /api/train            │
  │     │  /api/auth/*           │
  │     │  /api/usage            │
  │     │  /api/models           │
  │     │  /api/health           │
  │     └─────────┬──────────────┘
  │               │
  ▼               ▼
┌──────────────┐  ┌────────────────────────┐
│ Ollama       │  │ RunPod Serverless GPU  │
│ (localhost)  │  │ (90-second SLA)        │
└──────────────┘  └────────────────────────┘
```
## Public CLI source layout
| Folder | Responsibility |
|---|---|
| `src/cli/` | Commander.js entry — every subcommand is a file under `src/cli/commands/` |
| `src/api/` | Fastify 5 HTTP gateway. Routes: `/v1/chat/completions`, `/v1/completions`, `/v1/models`, `/v1/locopilot/health`, `/v1/locopilot/training/*` |
| `src/worker/` | In-process training worker (Free tier). Single process, no queue. |
| `src/training/` | `TrainingAdapter` interface, `TrainingConfig`, dataset validator, Unsloth / Axolotl / MLX adapters and Python runners |
| `src/cloud/client.ts` | The only place that talks to the cloud — `callCloudInference`, `callCloudTrain`, `callCloudAuthVerify`, `callCloudAuthMe`, `callCloudUsage`, `callCloudModels`, `callCloudHealth`, `callCloudTunnel`, plus `isProUser()` and `getCloudToken()` |
| `src/shared/` | DB pool (SQLite), Ollama runtime client, shared types, `AppError`, and constants (`TRAINING_DEFAULTS`, `MIN_DATASET_EXAMPLES`, `PROVIDERS`, …) |
## Tier detection
```ts
// src/cloud/client.ts
export function isProUser(): boolean {
  return getCloudToken() !== null;
}
```
`getCloudToken()` returns the value at `token` in `~/.locopilot/config.json` only if it starts with `qs_`. That's the entire test. Being logged in is not the same as having an active subscription — the cloud may still return `403 pro_subscription_required`, and the cloud client surfaces that as a typed `ProSubscriptionRequiredError`.
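Under the hood the whole check reduces to a prefix test on one config field. A minimal sketch, assuming the config file is plain JSON with a top-level `token` string (the actual file may hold more fields):

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Pure helper: extract a valid cloud token from raw config JSON, or null.
// Only qs_-prefixed string tokens count; anything else means anonymous.
export function tokenFromConfig(raw: string | null): string | null {
  if (raw === null) return null;
  try {
    const token = JSON.parse(raw)?.token;
    return typeof token === "string" && token.startsWith("qs_") ? token : null;
  } catch {
    return null; // malformed JSON → treat as logged out
  }
}

// Sketch of getCloudToken: read ~/.locopilot/config.json if present.
export function getCloudToken(): string | null {
  const path = join(homedir(), ".locopilot", "config.json");
  let raw: string | null = null;
  try {
    raw = readFileSync(path, "utf8");
  } catch {
    // no config file → anonymous
  }
  return tokenFromConfig(raw);
}

export function isProUser(): boolean {
  return getCloudToken() !== null;
}
```

Note that this only proves a token exists locally; the subscription check still happens cloud-side, as described above.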
## Request pipeline
The Fastify pipeline at the top of `src/api/index.ts`:

1. `onRequest` — validates `Authorization`. Skipped for `/v1/models` and `/v1/locopilot/health`. Non-`qs_` tokens become anonymous (`req.apiKey = undefined`).
2. `preHandler` — in-memory token-bucket rate limiter. Free tier short-circuits with no limit; Pro tier gets a `9999/min` local ceiling (the real limit is enforced cloud-side).
3. Route handler — runs `localRouter.resolve()` on `/v1/chat/completions` and `/v1/completions`, then streams from the resolved provider.
4. `setErrorHandler` — serialises `AppError` (e.g. `ValidationError`, `NotFoundError`, `RateLimitError`) into JSON.
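The `preHandler` limiter is a standard in-memory token bucket. A minimal sketch of the mechanism — the capacity and refill values below are illustrative, not LocoPilot's real constants:

```typescript
// Token bucket: a counter that refills continuously over time and is
// decremented by one per request; a request passes only if a token is left.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,      // burst size, e.g. 9999 for Pro
    private refillPerMs: number,   // tokens regained per millisecond
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.last = now;
  }

  // Returns true if the request is allowed, false if rate-limited.
  tryTake(now: number = Date.now()): boolean {
    const elapsed = now - this.last;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

In the real pipeline a bucket like this would be kept per API key and consulted inside the Fastify `preHandler` hook; a `false` result maps to the `RateLimitError` mentioned above.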
## Chat / completions
`localRouter.resolve(model, isProUser)` returns one of three providers:
| Provider | When |
|---|---|
| `local` | Ollama lists a model whose name equals `model` or starts with `model.split(':')[0]` |
| `remote` | Pro user and no local match (or Ollama unreachable) |
| `not_found` | Free user and no local match (or Ollama unreachable) |
`local` streams from `<OLLAMA_HOST>/api/chat`. `remote` proxies to `POST /api/inference` on LocoPilot Cloud (which handles RunPod). `not_found` returns `404 model_not_found`.
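The routing table above can be sketched as a pure function. Here `localModels` stands in for the result of asking Ollama which models are installed, with `null` meaning Ollama was unreachable:

```typescript
type Provider = "local" | "remote" | "not_found";

// Decision logic from the table: prefer a local match; otherwise Pro users
// fall through to the cloud and Free users get not_found.
function resolve(model: string, isPro: boolean, localModels: string[] | null): Provider {
  const family = model.split(":")[0]; // e.g. "llama3" from "llama3:8b"
  const hasLocal =
    localModels?.some((name) => name === model || name.startsWith(family)) ?? false;
  if (hasLocal) return "local";
  return isPro ? "remote" : "not_found";
}
```

The prefix match is what lets a request for `llama3:8b` hit a locally pulled `llama3:latest`, per the first table row.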
A `403 pro_subscription_required` from the cloud is surfaced with the upgrade URL — there is no silent fallback to local for an explicit Pro request. A generic provider error returns `503 Provider unavailable`.
## Training
The local API's `/v1/locopilot/training/*` handlers inspect the `Authorization` header and act in one of two modes:
| Header | Behaviour |
|---|---|
| `Bearer qs_…` | Proxies the request verbatim to LocoPilot Cloud's `/api/train*`. Surfaces upstream `403 pro_subscription_required` as-is. |
| anything else | Validates the body, runs the dataset validator (`MIN_DATASET_EXAMPLES = 10`, first 5 lines must be uniformly Alpaca or ShareGPT), inserts a row into `training_jobs`, and fires the executor in-process. |
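The validator's gate can be sketched as follows. Only `MIN_DATASET_EXAMPLES = 10` and the first-5-lines uniformity rule come from the source; the exact field checks used to detect Alpaca (`instruction`/`output`) and ShareGPT (`conversations`) rows are assumptions based on the common shapes of those formats:

```typescript
const MIN_DATASET_EXAMPLES = 10;

type DatasetFormat = "alpaca" | "sharegpt";

// Assumed detection heuristic: Alpaca rows carry instruction/output strings,
// ShareGPT rows carry a conversations array.
function detectFormat(row: any): DatasetFormat | null {
  if (typeof row?.instruction === "string" && typeof row?.output === "string") return "alpaca";
  if (Array.isArray(row?.conversations)) return "sharegpt";
  return null;
}

function validateDataset(jsonl: string): { ok: boolean; error?: string } {
  const lines = jsonl.split("\n").filter((l) => l.trim().length > 0);
  if (lines.length < MIN_DATASET_EXAMPLES) {
    return { ok: false, error: `need at least ${MIN_DATASET_EXAMPLES} examples` };
  }
  // Sample the first 5 lines and require a single, consistent format.
  const formats = lines.slice(0, 5).map((l) => {
    try {
      return detectFormat(JSON.parse(l));
    } catch {
      return null;
    }
  });
  if (formats.some((f) => f === null)) return { ok: false, error: "unrecognised example format" };
  if (new Set(formats).size > 1) return { ok: false, error: "mixed Alpaca/ShareGPT examples" };
  return { ok: true };
}
```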
The `locopilot train --cloud` flag bypasses the local API entirely and calls the cloud directly. The local API's auto-proxy path exists separately for direct API consumers — HTTP clients other than the CLI.
## Local data model (SQLite)
Created by `locopilot init` at `~/.locopilot/db.sqlite`:
```
inference_logs (id, api_key_id, model, provider, tokens_in, tokens_out, latency_ms, ttfb_ms, status, created_at)
training_jobs  (id, framework, status, base_model, dataset_path, output_path, config_json, error, started_at, completed_at, created_at)
```
`training_jobs` is written by the worker. `inference_logs` is currently unused in the public CLI v1.0 — the chat handler computes per-request usage, but the local `tracker.record()` is a no-op stub (see `src/api/services/localStubs.ts`). Pro-tier inference is metered server-side.
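The two tables translate naturally into row types. The field types below are inferred from the column names and surrounding text (e.g. `provider` and `framework` unions from the provider table and adapter list); the real schema may differ:

```typescript
// Inferred row shape for inference_logs (unused in CLI v1.0, per the note above).
type InferenceLog = {
  id: string;
  api_key_id: string | null;   // null for anonymous requests
  model: string;
  provider: "local" | "remote";
  tokens_in: number;
  tokens_out: number;
  latency_ms: number;
  ttfb_ms: number;             // time to first byte
  status: string;
  created_at: string;
};

// Inferred row shape for training_jobs, written by the in-process worker.
type TrainingJob = {
  id: string;
  framework: "unsloth" | "axolotl" | "mlx";
  status: string;
  base_model: string;
  dataset_path: string;
  output_path: string | null;
  config_json: string;         // serialized TrainingConfig
  error: string | null;
  started_at: string | null;
  completed_at: string | null;
  created_at: string;
};
```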
## Failure handling
| Failure | Behaviour |
|---|---|
| Ollama unreachable + Free tier | `503 Provider unavailable` |
| Ollama unreachable + Pro tier | Falls through to RunPod via the cloud |
| Cloud `403 pro_subscription_required` | Surfaced as 403 with `upgrade_url` — no silent local fallback |
| Cloud generic failure (5xx, network) | `503 Provider unavailable` |
| Local API down | Fastify never started — CLI prints `Could not connect to API: …` |
| Pro token expired | Cloud returns 401; CLI surfaces `Token rejected by cloud — run locopilot login` |
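The cloud-side rows of this table could be centralised in one mapping function. A sketch, with response body shapes assumed from the behaviours above:

```typescript
// Typed error for the 403 case, carrying the upgrade URL to surface to the user.
class ProSubscriptionRequiredError extends Error {
  constructor(public upgradeUrl: string) {
    super("pro_subscription_required");
  }
}

// Map a cloud HTTP response to a local error, or null if the call succeeded.
function mapCloudResponse(
  status: number,
  body: { error?: string; upgrade_url?: string },
): Error | null {
  if (status === 403 && body.error === "pro_subscription_required") {
    return new ProSubscriptionRequiredError(body.upgrade_url ?? "");
  }
  if (status === 401) {
    return new Error("Token rejected by cloud — run locopilot login");
  }
  if (status >= 500) {
    return new Error("Provider unavailable"); // surfaced locally as 503
  }
  return null;
}
```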
## Security boundaries
- The local API binds to `0.0.0.0` by default. Use a reverse proxy if you need TLS or auth in front of it.
- The Pro token in `~/.locopilot/config.json` is mode `0600` — only the current user can read it.
- Model names everywhere are validated against `^[a-zA-Z0-9:._-]+$` to prevent shell injection in the `ollama pull` / `ollama rm` paths.
- `cloudflared` is spawned with `shell: false` and an explicit args array — no command-line construction.
- Chat / completions request bodies are validated by Fastify schema before they reach the handler (`messages` ≤ 500 items, content ≤ 128 KB, `temperature` 0-2, `max_tokens` 1-65536, role ∈ `system | user | assistant`).
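The model-name guard described above fits in a few lines. The regex comes from the source; the function name and error message are illustrative:

```typescript
// Allow-list of characters safe to pass to `ollama pull` / `ollama rm`
// as a spawn argument: letters, digits, colon, dot, underscore, hyphen.
const MODEL_NAME = /^[a-zA-Z0-9:._-]+$/;

function assertSafeModelName(name: string): string {
  if (!MODEL_NAME.test(name)) {
    throw new Error(`invalid model name: ${JSON.stringify(name)}`);
  }
  return name;
}
```

Because the name is also passed as an element of an args array (never interpolated into a shell string), the regex is defence in depth rather than the only barrier.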