Installation

Prebuilt binaries are not published yet. For now, build from source: the project is a single Cargo workspace, and the bundled inference engine is fetched automatically during the build.

Requirements

A recent stable Rust toolchain (1.95+).
curl and tar (PowerShell on Windows) — used at build time to fetch a pinned, prebuilt llama-server for your platform.
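A quick preflight check can confirm the tools above are on PATH before you build. This is a minimal sketch for macOS and Linux shells: missing_tools is a hypothetical helper, not part of the repository, and cargo stands in for the Rust toolchain. It does not check the toolchain version.

```shell
# Print the name of each given tool that is not found on PATH, one per line.
# Prints nothing when every tool is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools cargo curl tar)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing build tools:" $missing >&2
else
    echo "all build tools found"
fi
```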

Build

git clone https://github.com/ZetaMinusOne/lattis.git
cd lattis/app

make build      # builds the workspace and vendors llama-server
make run        # runs the desktop app (it can start the daemon for you)

The GUI looks for lattisd next to its own binary (the normal cargo / install layout). A plain build puts both in app/target/debug/, so the app finds the daemon without extra configuration.
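If the app cannot start the daemon, it can help to confirm that both binaries ended up in the same directory. The sketch below assumes the GUI binary is named lattis, which the text above does not state. has_sibling_daemon is a hypothetical helper, and on Windows the file names would carry an .exe suffix.

```shell
# Succeed when the GUI binary and lattisd are both executable in directory $1.
# $2 is the GUI binary name; "lattis" is an assumed default.
has_sibling_daemon() {
    dir=$1
    gui=${2:-lattis}
    [ -x "$dir/$gui" ] && [ -x "$dir/lattisd" ]
}

# Path is relative to the repository root, as in the text above.
if has_sibling_daemon app/target/debug; then
    echo "GUI and daemon are side by side"
else
    echo "lattisd (or the GUI binary) is missing from app/target/debug"
fi
```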

To run the daemon directly without the app:

make rund       # cargo run -p lattisd

The bundled inference engine

Lattis drives llama.cpp’s llama-server in router mode as a child process. Rather than compiling it, build.rs downloads a pinned prebuilt release for your platform at build time and stages it next to the daemon binary, so the built app needs no first-run download. On Apple Silicon the preset offloads all layers to the Metal GPU backend.

Escape hatches:

LATTIS_LLAMA_SERVER=/path/to/llama-server — use an existing binary.
LATTIS_SKIP_VENDOR=1 — skip vendoring; fall back to a llama-server on PATH at runtime. (The test, lint, and fmt make targets set this so they run fast and offline.)
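To see which llama-server a given environment would select, the two escape hatches can be modeled as a small decision function. This is only a sketch. The precedence shown (an explicit LATTIS_LLAMA_SERVER wins over LATTIS_SKIP_VENDOR) is an assumption, since the text above does not say what happens when both are set, and llama_server_source is a hypothetical name.

```shell
# Describe where llama-server would come from for the current environment.
# Assumed precedence: explicit binary, then skip-vendor, then the default.
llama_server_source() {
    if [ -n "${LATTIS_LLAMA_SERVER:-}" ]; then
        echo "existing binary: $LATTIS_LLAMA_SERVER"
    elif [ "${LATTIS_SKIP_VENDOR:-}" = "1" ]; then
        echo "llama-server found on PATH at runtime"
    else
        echo "pinned release vendored by build.rs"
    fi
}

echo "llama-server source: $(llama_server_source)"
```

In practice either variable is set in the environment of the make command, for example by prefixing it: LATTIS_SKIP_VENDOR=1 make build.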

Optional: MLX on Apple Silicon

Lattis can also serve MLX models through Apple’s mlx_lm, run as an mlx_lm.server child process. MLX is not bundled — the daemon detects it at startup and only offers MLX models when it is installed.

# Apple Silicon (arm64) only
pip install mlx-lm
# or, isolated:
#   uv tool install mlx-lm
#   pipx install mlx-lm

Lattis finds an interpreter that can import mlx_lm by trying, in order: $LATTIS_MLX_PYTHON, then python3, then python. To pin a specific environment:

export LATTIS_MLX_PYTHON=/path/to/venv/bin/python
"$LATTIS_MLX_PYTHON" -c "import mlx_lm.server"   # must exit 0
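The lookup order described above can be reproduced in the shell to see which interpreter would be picked. This is a sketch of the documented order, not the daemon's actual code. find_mlx_python is a hypothetical helper, and the import of mlx_lm.server as the probe is an assumption.

```shell
# Print the first interpreter that can import mlx_lm.server, trying
# $LATTIS_MLX_PYTHON, then python3, then python. Returns 1 if none can.
find_mlx_python() {
    for py in "${LATTIS_MLX_PYTHON:-}" python3 python; do
        # Skip the first candidate when LATTIS_MLX_PYTHON is unset or empty.
        [ -n "$py" ] || continue
        if "$py" -c "import mlx_lm.server" >/dev/null 2>&1; then
            printf '%s\n' "$py"
            return 0
        fi
    done
    return 1
}

if py=$(find_mlx_python); then
    echo "MLX interpreter: $py"
else
    echo "no interpreter can import mlx_lm"
fi
```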

Restart the daemon and MLX models appear in the Library and in GET /v1/models. Without mlx_lm, the daemon runs normally and MLX models stay hidden.

See Local Models for downloading and serving models, and Launch on Login to run the daemon in the background.