Quick Start

Once the daemon is running, Lattis listens on 127.0.0.1:1234 by default. Point any OpenAI- or Anthropic-compatible client at it.

1. Confirm it’s up

curl http://127.0.0.1:1234/health
# {"status":"ok"}
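If a script starts the daemon and then immediately talks to it, polling the health endpoint first avoids a race. The following is a minimal sketch using only the Python standard library; the helper names (`fetch_health`, `is_healthy`, `wait_until_healthy`) and the retry settings are illustrative and not part of Lattis.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://127.0.0.1:1234/health"  # default address from this page


def fetch_health(url=HEALTH_URL, timeout=2.0):
    """Return the raw body of GET /health, or None if the daemon is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        return None


def is_healthy(body):
    """True when the body is JSON of the form {"status": "ok"}."""
    if body is None:
        return False
    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def wait_until_healthy(fetch=fetch_health, attempts=10, delay=0.5):
    """Poll until the health check passes; return False after `attempts` tries."""
    for _ in range(attempts):
        if is_healthy(fetch()):
            return True
        time.sleep(delay)
    return False
```

Call `wait_until_healthy()` before sending the first request. The `fetch` parameter exists only so the polling logic can be exercised without a running daemon.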

2. Add a model

Open the app and either:

- Download a local model from the Library (a GGUF from Hugging Face), or
- Connect a cloud provider (Anthropic or OpenAI / Codex); see Cloud Providers.

List what is currently available:

curl http://127.0.0.1:1234/v1/models

The response merges your local models with any connected cloud models.
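To pick a model id programmatically, the list can be read from Python. This sketch assumes the response follows the OpenAI list shape, `{"data": [{"id": ...}, ...]}`; that shape is not shown on this page, so check the actual output of the curl command above before relying on it. The function names are illustrative.

```python
import json
import urllib.request

MODELS_URL = "http://127.0.0.1:1234/v1/models"  # default address from this page


def model_ids(payload):
    """Extract model ids, assuming the OpenAI-style {"data": [{"id": ...}]} shape."""
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def list_models(url=MODELS_URL):
    """Fetch /v1/models from the running daemon and return the ids it reports."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return model_ids(json.load(resp))
```

With the daemon running, `list_models()` returns the ids you can pass as `model` in the requests that follow.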

3. Send a request

OpenAI-compatible:

curl http://127.0.0.1:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3-4b-instruct-2507",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Anthropic-compatible:

curl http://127.0.0.1:1234/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3-4b-instruct-2507",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
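The two endpoints return differently shaped JSON, so the reply text sits in a different place in each. The helpers below are a sketch that assumes Lattis returns the standard OpenAI Chat Completions and Anthropic Messages response bodies; the function names are illustrative.

```python
def openai_text(response):
    """Reply text from a /v1/chat/completions response body (parsed JSON)."""
    return response["choices"][0]["message"]["content"]


def anthropic_text(response):
    """Reply text from a /v1/messages response body (parsed JSON).

    Anthropic-style responses carry a list of content blocks; only the
    blocks of type "text" are joined here.
    """
    return "".join(
        block["text"]
        for block in response["content"]
        if block.get("type") == "text"
    )
```

Pass either function the parsed JSON of the matching curl response, for example `openai_text(json.loads(body))`.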

4. Switch to a cloud model

To use a connected cloud model, pass its id in the "model" field. Lattis routes the request to the provider and translates between request formats as needed:

curl http://127.0.0.1:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4-8",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
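Because only the `model` value changes between a local and a cloud request, one request builder can serve both. This is a standard-library sketch; `chat_request` and `send` are hypothetical helper names, and the two model ids are the ones used in the examples on this page.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234"  # default address from this page


def chat_request(model, prompt, base_url=BASE_URL):
    """Build a POST to /v1/chat/completions for the given model id."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request to the running daemon and return the parsed JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send(chat_request("qwen3-4b-instruct-2507", "Hello!"))` and `send(chat_request("claude-opus-4-8", "Hello!"))` hits the same URL; the routing decision is left entirely to Lattis.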

Point your SDK at it

Any OpenAI or Anthropic SDK works — just set the base URL:

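from openai import OpenAI

# Point the client at the local daemon instead of api.openai.com.
# The SDK requires some api_key value; "local" is a placeholder here.
client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="local")
resp = client.chat.completions.create(
    model="qwen3-4b-instruct-2507",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Print the assistant's reply text from the first choice.
print(resp.choices[0].message.content)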

Next: the full Public API, or learn about Local Models and Usage & Cost.