Run AI locally. Scale globally.
LocoPilot is an OpenAI-compatible inference platform that runs models locally via Ollama, falls back to remote GPU automatically, and fine-tunes on your data — all from one CLI.
$ npm install -g @infrarix/locopilot && locopilot init
Everything you need to run AI in production
No more wrestling with fragmented tools. LocoPilot is a unified runtime for local inference, remote GPU, and fine-tuning.
Local-first inference
Run any Ollama-compatible model on your hardware. No network latency, zero cost per token, complete data privacy.
Smart GPU fallback
When a model isn't local, requests automatically route to RunPod Serverless GPUs with a 90-second SLA.
Fine-tune on your data
Submit Unsloth, Axolotl, or MLX training jobs via the CLI. Alpaca and ShareGPT datasets supported out of the box.
OpenAI-compatible API
Drop-in replacement for OpenAI. Point any SDK at LocoPilot and it just works — no code changes needed.
Automatic failover
RunPod timeout? LocoPilot retries locally. Both down? It returns a clean 503. No hanging requests ever.
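The failover order above can be sketched as a small wrapper. This is an illustrative sketch only — the `Result` type and helper signatures are ours, not LocoPilot internals:

```typescript
// Illustrative failover sketch: try the remote GPU first, retry locally on
// failure, and return a clean 503 if both backends are down.
type Result = { status: number; body: string };

async function withFailover(
  tryRemote: () => Promise<Result>, // e.g. RunPod Serverless
  tryLocal: () => Promise<Result>,  // e.g. local Ollama
): Promise<Result> {
  try {
    return await tryRemote();
  } catch {
    try {
      return await tryLocal();
    } catch {
      // Both backends down: fail fast instead of hanging the request.
      return { status: 503, body: "All backends unavailable" };
    }
  }
}
```

The key property is that every path resolves: the caller always gets a response or a 503, never a hung promise.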
Developer CLI
`locopilot init`, `doctor`, `start`, `train`, `logs`, `expose` — everything you need from one command.
Up and running in three commands
From zero to production-grade inference in under 5 minutes.
Install & init
Install the CLI and run `locopilot init`. It detects your Ollama setup, initialises a local SQLite database, and writes a default `.env`.
$ npm install -g @infrarix/locopilot
$ locopilot init
🐌 LocoPilot Doctor
✔ Ollama
✔ SQLite
✔ Ready. Free tier activated.
Call the API
Use the OpenAI SDK, curl, or any HTTP client. No API key needed for local free-tier use — LocoPilot routes to Ollama whenever the model is available locally.
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:8080/v1',
apiKey: 'not-needed',
});
const stream = await client.chat.completions.create({
model: 'llama3:8b',
messages: [{ role: 'user', content: 'Hello!' }],
stream: true,
});
Fine-tune
Submit a training job with your JSONL dataset. LocoPilot validates the format, runs it in-process via the local worker, and streams logs live.
$ locopilot train \
    --config train.json
Submitting job...
Job ID: job_7f3a2b
Status: running
[unsloth] Loading llama3:8b ...
[unsloth] Step 50/300 — loss: 1.42
[unsloth] Step 100/300 — loss: 0.98
[unsloth] Training complete ✔
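The dataset validation step can be approximated in a few lines. Here is a sketch of an Alpaca-style JSONL check — illustrative only, not LocoPilot's actual validator:

```typescript
// Sketch of an Alpaca-format JSONL check: one JSON object per line with
// string "instruction" and "output" fields ("input" is optional).
// Illustrative only — not LocoPilot's actual validation code.
function validateAlpacaJsonl(text: string): { line: number; error: string }[] {
  const errors: { line: number; error: string }[] = [];
  text.split("\n").forEach((raw, i) => {
    if (raw.trim() === "") return; // skip blank lines
    try {
      const row = JSON.parse(raw);
      for (const field of ["instruction", "output"]) {
        if (typeof row[field] !== "string") {
          errors.push({ line: i + 1, error: `missing string field "${field}"` });
        }
      }
    } catch {
      errors.push({ line: i + 1, error: "invalid JSON" });
    }
  });
  return errors;
}
```

Reporting line numbers rather than failing on the first bad row lets you fix a whole dataset in one pass.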
See it in action
Watch a real init → inference flow. Output streams in real time, just like the actual CLI.
Explore LocoPilot
Every surface is designed for developers. Pick your entry point.
Command-line interface
Manage your whole stack from the terminal. Init, start, fine-tune, expose — all in one binary.
OpenAI-compatible REST API
POST /v1/chat/completions, GET /v1/models — identical surface to OpenAI so any existing SDK works instantly.
Fine-tuning pipeline
Submit Unsloth, Axolotl, or MLX jobs via the training API. Live log streaming via SSE while your model trains.
Works with any OpenAI SDK
Python, Node.js, Go, Rust — if it supports a custom baseURL, it works with LocoPilot out of the box.
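Because the surface matches OpenAI's, even plain fetch works — no SDK required. A minimal sketch for GET /v1/models, assuming the standard OpenAI list response shape ({ data: [{ id }] }); the helper name is ours, not part of LocoPilot:

```typescript
// Minimal /v1/models client using plain fetch. Assumes the standard
// OpenAI list response shape: { "data": [{ "id": "llama3:8b" }, ...] }.
// fetchFn is injectable so the helper can be exercised without a server.
async function listModels(
  baseURL: string,
  fetchFn: typeof fetch = fetch,
): Promise<string[]> {
  const res = await fetchFn(`${baseURL}/v1/models`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const body = (await res.json()) as { data: { id: string }[] };
  return body.data.map((m) => m.id);
}
```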
Ready to run AI at turbo speed?
Get started in under 5 minutes. No credit card required.