Run AI locally. Scale globally.
LocoPilot is an OpenAI-compatible inference platform that runs models locally via Ollama, falls back to remote GPU automatically, and fine-tunes on your data — all from one CLI.
$ npm install -g @infrarix/locopilot && locopilot init
Everything you need to run AI in production
No more wrestling with fragmented tools. LocoPilot is a unified runtime for local inference, remote GPU, and fine-tuning.
Local-first inference
Run any Ollama-compatible model on your hardware. No network latency, zero cost per token, complete data privacy.
Smart GPU fallback
When a model isn't local, requests automatically route to RunPod Serverless GPUs with a 90-second SLA.
Fine-tune on your data
Submit Unsloth, Axolotl, or MLX training jobs via the CLI. Alpaca and ShareGPT datasets supported out of the box.
OpenAI-compatible API
Drop-in replacement for OpenAI. Point any SDK at LocoPilot and it just works — no code changes needed.
Automatic failover
RunPod timeout? LocoPilot retries locally. Both down? It returns a clean 503. No hanging requests ever.
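The failover order above can be sketched as a small wrapper. This is an illustrative sketch only — the `Result` type and helper signatures are ours, not LocoPilot internals:

```typescript
// Illustrative failover sketch: try the remote GPU first, retry locally on
// failure, and return a clean 503 if both backends are down.
type Result = { status: number; body: string };

async function withFailover(
  tryRemote: () => Promise<Result>, // e.g. RunPod Serverless
  tryLocal: () => Promise<Result>,  // e.g. local Ollama
): Promise<Result> {
  try {
    return await tryRemote();
  } catch {
    try {
      return await tryLocal();
    } catch {
      // Both backends down: fail fast instead of hanging the request.
      return { status: 503, body: "All backends unavailable" };
    }
  }
}
```

The key property is that every path resolves: the caller always gets a response or a 503, never a hung promise.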
Developer CLI
`locopilot init`, `doctor`, `start`, `train`, `logs`, `expose` — everything you need from one command.
Up and running in three commands
From zero to production-grade inference in under 5 minutes.
Install & init
Install the CLI and run `locopilot init`. It detects your Ollama setup, initialises a local SQLite database, and writes a default `.env`.
$ npm install -g @infrarix/locopilot
$ locopilot init
🐌 LocoPilot Doctor
✔ Ollama
✔ SQLite
✔ Ready. Free tier activated.
Call the API
Use the OpenAI SDK, curl, or any HTTP client. No API key needed for local free-tier use — LocoPilot routes to Ollama whenever the model is available locally.
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:8080/v1',
apiKey: 'not-needed',
});
const stream = await client.chat.completions.create({
model: 'llama3:8b',
messages: [{ role: 'user', content: 'Hello!' }],
stream: true,
});
Fine-tune
Submit a training job with your JSONL dataset. LocoPilot validates the format, runs it in-process via the local worker, and streams logs live.
$ locopilot train \
    --config train.json
Submitting job...
Job ID: job_7f3a2b
Status: running
[unsloth] Loading llama3:8b ...
[unsloth] Step 50/300 — loss: 1.42
[unsloth] Step 100/300 — loss: 0.98
[unsloth] Training complete ✔
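The dataset validation step can be approximated in a few lines. Here is a sketch of an Alpaca-style JSONL check — illustrative only, not LocoPilot's actual validator:

```typescript
// Sketch of an Alpaca-format JSONL check: one JSON object per line with
// string "instruction" and "output" fields ("input" is optional).
// Illustrative only — not LocoPilot's actual validation code.
function validateAlpacaJsonl(text: string): { line: number; error: string }[] {
  const errors: { line: number; error: string }[] = [];
  text.split("\n").forEach((raw, i) => {
    if (raw.trim() === "") return; // skip blank lines
    try {
      const row = JSON.parse(raw);
      for (const field of ["instruction", "output"]) {
        if (typeof row[field] !== "string") {
          errors.push({ line: i + 1, error: `missing string field "${field}"` });
        }
      }
    } catch {
      errors.push({ line: i + 1, error: "invalid JSON" });
    }
  });
  return errors;
}
```

Reporting line numbers rather than failing on the first bad row lets you fix a whole dataset in one pass.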
See it in action
Watch a real init → inference flow. Output streams in real time, just like the actual CLI.
Explore LocoPilot
Every surface is designed for developers. Pick your entry point.
Command-line interface
Manage your whole stack from the terminal. Init, start, fine-tune, expose — all in one binary.
OpenAI-compatible REST API
POST /v1/chat/completions, GET /v1/models — identical surface to OpenAI so any existing SDK works instantly.
Fine-tuning pipeline
Submit Unsloth, Axolotl, or MLX jobs via the training API. Live log streaming via SSE while your model trains.
Works with any OpenAI SDK
Python, Node.js, Go, Rust — if it supports a custom baseURL, it works with LocoPilot out of the box.
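Because the surface matches OpenAI's, even plain fetch works — no SDK required. A minimal sketch for GET /v1/models, assuming the standard OpenAI list response shape ({ data: [{ id }] }); the helper name is ours, not part of LocoPilot:

```typescript
// Minimal /v1/models client using plain fetch. Assumes the standard
// OpenAI list response shape: { "data": [{ "id": "llama3:8b" }, ...] }.
// fetchFn is injectable so the helper can be exercised without a server.
async function listModels(
  baseURL: string,
  fetchFn: typeof fetch = fetch,
): Promise<string[]> {
  const res = await fetchFn(`${baseURL}/v1/models`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const body = (await res.json()) as { data: { id: string }[] };
  return body.data.map((m) => m.id);
}
```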
Ready to run AI at turbo speed?
Get started in under 5 minutes. No credit card required.