Public API
Lattis serves one local HTTP API that speaks both the OpenAI and Anthropic
dialects. It listens on 127.0.0.1:1234 by default. The port is configurable, and
the router child always uses the next port up (1235 with the default settings).
Point any OpenAI- or Anthropic-compatible client at it. Use a local or connected
cloud model by passing its id as the model field; Lattis routes and translates
formats as needed.
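As a minimal sketch of pointing a client at Lattis, the Python below uses only the standard library (no OpenAI or Anthropic SDK) and builds the same chat request in either dialect against the default address. The `build_request` and `send` helpers are hypothetical names used for illustration. The paths, the Content-Type header and the bodies mirror the curl examples on this page, and max_tokens is fixed at 256 as in the Anthropic example.

```python
import json
import urllib.request

# Default Lattis address from this page; change it if you configured another port.
BASE_URL = "http://127.0.0.1:1234"


def build_request(dialect: str, model: str, prompt: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat request in either dialect."""
    messages = [{"role": "user", "content": prompt}]
    if dialect == "openai":
        path = "/v1/chat/completions"
        body = {"model": model, "messages": messages}
    elif dialect == "anthropic":
        path = "/v1/messages"
        # Anthropic Messages requires max_tokens; 256 mirrors the curl example.
        body = {"model": model, "max_tokens": 256, "messages": messages}
    else:
        raise ValueError(f"unknown dialect: {dialect!r}")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request to a running Lattis instance and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

With Lattis running and the model available, `send(build_request("openai", "qwen3-4b-instruct-2507", "Hello!"))` should return an OpenAI-style completion; passing "anthropic" instead should return an Anthropic Messages reply from the same server.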
Endpoints
| Method & path | Purpose |
|---|---|
| GET /health | Liveness check. |
| GET /v1/models | Local + connected-remote models, merged. |
| POST /v1/chat/completions | OpenAI Chat Completions (streaming or non-streaming). |
| POST /v1/responses | OpenAI Responses. |
| GET /v1/responses | OpenAI Responses over WebSocket (Codex). |
| POST /v1/messages | Anthropic Messages. |
| POST /v1/messages/count_tokens | Anthropic token counting. |
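GET /health is the natural endpoint for a script to poll while Lattis starts up. The sketch below is a hypothetical `wait_for_health` helper that uses only the Python standard library. This page does not describe the /health response body, so the sketch treats an HTTP 200 as healthy and checks nothing else.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str = "http://127.0.0.1:1234",
                    timeout_s: float = 10.0, interval_s: float = 0.5) -> bool:
    """Poll GET /health until it answers 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses:
            # the server is not ready yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```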
Examples
OpenAI chat completions:
```sh
curl http://127.0.0.1:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-4b-instruct-2507","messages":[{"role":"user","content":"Hello!"}]}'
```

Anthropic messages:
```sh
curl http://127.0.0.1:1234/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-4b-instruct-2507","max_tokens":256,"messages":[{"role":"user","content":"Hello!"}]}'
```

List models (local + connected cloud):
```sh
curl http://127.0.0.1:1234/v1/models
```

Each entry includes the model's context window (meta.n_ctx) when Lattis knows
it: the value is read from GGUF/MLX metadata for local models and from a
built-in table for cloud models.
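A minimal sketch of reading those context windows from Python follows. Only `meta.n_ctx` comes from this page. The `{"data": [...]}` wrapper and the per-entry `id` field are assumptions based on the OpenAI list-models format, and both helper names are hypothetical. Entries without a known window map to None.

```python
import json
import urllib.request


def context_windows(models_json: str) -> dict:
    """Map each model id to its meta.n_ctx, or None when Lattis does not know it."""
    payload = json.loads(models_json)
    windows = {}
    # Assumption: OpenAI list shape, {"object": "list", "data": [{"id": ...}, ...]}.
    for entry in payload.get("data", []):
        meta = entry.get("meta") or {}
        windows[entry["id"]] = meta.get("n_ctx")
    return windows


def fetch_context_windows(base_url: str = "http://127.0.0.1:1234") -> dict:
    """GET /v1/models from a running Lattis and extract the context windows."""
    with urllib.request.urlopen(base_url + "/v1/models", timeout=10) as resp:
        return context_windows(resp.read().decode("utf-8"))
```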
Streaming
POST /v1/chat/completions and POST /v1/messages support streaming responses.
The GET /v1/responses WebSocket transport is used by Codex clients on the
OpenAI subscription path.
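The sketch below reads such a stream. It assumes that streaming is requested as in the upstream APIs (`"stream": true` in the request body) and that Lattis emits the usual server-sent events for each dialect: OpenAI `chat.completion.chunk` objects ending in `data: [DONE]`, or Anthropic `content_block_delta` events carrying text deltas. `iter_text` and `stream_chat` are hypothetical helpers, not part of Lattis.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield generated text from OpenAI or Anthropic server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and `event:` lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI end-of-stream sentinel
            return
        event = json.loads(data)
        if event.get("choices"):  # OpenAI chunk; assumes a single choice (n=1)
            text = event["choices"][0].get("delta", {}).get("content")
        elif event.get("type") == "content_block_delta":  # Anthropic delta
            text = event.get("delta", {}).get("text")
        else:
            text = None  # role-only chunks, message_start, message_stop, ...
        if text:
            yield text


def stream_chat(prompt: str, model: str,
                base_url: str = "http://127.0.0.1:1234") -> Iterator[str]:
    """POST a streaming OpenAI chat request and yield text as it arrives."""
    body = {"model": model, "stream": True,
            "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        # The response iterates over raw byte lines; decode before parsing.
        yield from iter_text(line.decode("utf-8") for line in resp)
```

Against a running instance, `for piece in stream_chat("Hello!", "qwen3-4b-instruct-2507"): print(piece, end="", flush=True)` would print the reply incrementally.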
- The API is unauthenticated and assumes a loopback bind. Don’t expose it on a non-loopback interface without putting your own auth in front of it.
- For local administration (downloads, loading models, connecting providers), see the Control API.
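Because the API has no authentication, a launcher script might refuse to start Lattis on any bind address that is not loopback. This hypothetical `is_loopback_host` check (the setting that controls Lattis's bind address is not named on this page) accepts `localhost` and any IPv4 or IPv6 loopback literal such as 127.0.0.1 or ::1. It rejects wildcard addresses like 0.0.0.0 and every other hostname.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True only if `host` refers to the loopback interface."""
    if host.lower() == "localhost":
        return True
    try:
        # Strip brackets so URL-style IPv6 literals like "[::1]" parse too.
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a LAN hostname): do not assume it is safe.
        return False
```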