Installation
Prebuilt binaries are not published yet. For now, build from source — it is a single Cargo workspace and the bundled inference engine is fetched automatically.
Requirements
- A recent stable Rust toolchain (1.95+).
- curl and tar (PowerShell on Windows), used at build time to fetch a pinned, prebuilt llama-server for your platform.

    git clone https://github.com/ZetaMinusOne/lattis.git
    cd lattis/app
    make build   # builds the workspace and vendors llama-server
    make run     # runs the desktop app (it can start the daemon for you)

The GUI looks for lattisd next to its own binary (the normal cargo / install
layout), so a plain build puts both in app/target/debug/.
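
The sibling lookup described above can be sketched in Rust with the standard library alone. This is a minimal illustration, not the GUI's actual code: the helper name `sibling_binary` is hypothetical, and the real lookup may check additional locations.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the path a binary called `name` would have if it
/// sat in the same directory as `exe`. The platform suffix (".exe" on
/// Windows, empty elsewhere) is appended.
fn sibling_binary(exe: &Path, name: &str) -> Option<PathBuf> {
    let dir = exe.parent()?;
    Some(dir.join(format!("{name}{}", env::consts::EXE_SUFFIX)))
}

fn main() {
    // current_exe() can fail on some platforms, so report the error
    // instead of unwrapping.
    match env::current_exe() {
        Ok(exe) => match sibling_binary(&exe, "lattisd") {
            Some(p) if p.is_file() => println!("found daemon at {}", p.display()),
            Some(p) => println!("no daemon at {}", p.display()),
            None => println!("executable has no parent directory"),
        },
        Err(e) => eprintln!("cannot determine own path: {e}"),
    }
}
```

With both binaries built into app/target/debug/, a lookup of this shape resolves without any configuration, which is why a plain build is enough.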
To run the daemon directly without the app:

    make rund   # cargo run -p lattisd

The bundled inference engine
Lattis drives llama.cpp’s llama-server in router mode as a child process.
Rather than compile it, build.rs downloads a pinned prebuilt release for
your platform at build time and stages it next to the daemon binary, so it ships
with no first-run download. On Apple Silicon the preset offloads all layers to
the Metal GPU backend.
Escape hatches:
- LATTIS_LLAMA_SERVER=/path/to/llama-server: use an existing binary.
- LATTIS_SKIP_VENDOR=1: skip vendoring; fall back to a llama-server on PATH at runtime. (The test, lint, and fmt make targets set this so they run fast and offline.)
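
One way to picture how the two variables interact is a small decision function. The sketch below is an assumption-heavy illustration rather than the project's build.rs: the `EngineSource` enum and `engine_source` function are hypothetical, and so are two choices the text above does not specify, namely that LATTIS_LLAMA_SERVER takes precedence when both variables are set and that an empty value counts as unset.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical summary of where llama-server comes from.
#[derive(Debug, PartialEq)]
enum EngineSource {
    /// LATTIS_LLAMA_SERVER points at an existing binary.
    Existing(PathBuf),
    /// LATTIS_SKIP_VENDOR=1: nothing is staged; llama-server is looked up
    /// on PATH at runtime.
    PathAtRuntime,
    /// Default: fetch the pinned prebuilt release.
    Vendored,
}

/// Assumption: an explicit binary path wins over the skip flag, and an
/// empty path is treated as unset.
fn engine_source(llama_server: Option<&str>, skip_vendor: Option<&str>) -> EngineSource {
    match (llama_server, skip_vendor) {
        (Some(path), _) if !path.is_empty() => EngineSource::Existing(PathBuf::from(path)),
        (_, Some("1")) => EngineSource::PathAtRuntime,
        _ => EngineSource::Vendored,
    }
}

fn main() {
    let server = env::var("LATTIS_LLAMA_SERVER").ok();
    let skip = env::var("LATTIS_SKIP_VENDOR").ok();
    let source = engine_source(server.as_deref(), skip.as_deref());
    println!("{source:?}");
}
```

Keeping the decision in a pure function that takes the variable values as arguments makes the three outcomes easy to check without touching the network or the process environment.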
Optional: MLX on Apple Silicon
Lattis can also serve MLX models through Apple’s
mlx_lm, run as an mlx_lm.server child
process. MLX is not bundled — the daemon detects it at startup and only
offers MLX models when it is installed.

    # Apple Silicon (arm64) only
    pip install mlx-lm
    # or, isolated:
    #   uv tool install mlx-lm
    #   pipx install mlx-lm

Lattis finds an interpreter that can import mlx_lm by trying, in order:
$LATTIS_MLX_PYTHON, then python3, then python. To pin a specific
environment:

    export LATTIS_MLX_PYTHON=/path/to/venv/bin/python
    python3 -c "import mlx_lm.server"   # must exit 0

Restart the daemon and MLX models appear in the Library and in GET /v1/models.
Without mlx_lm, the daemon behaves exactly as before (MLX models are hidden).
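
The interpreter probe can be approximated as follows. This is a sketch of the documented order ($LATTIS_MLX_PYTHON, then python3, then python), not the daemon's implementation: the function names `candidates` and `can_import_mlx` are hypothetical, treating an empty LATTIS_MLX_PYTHON as unset is an assumption, and the import check reuses the `import mlx_lm.server` command shown above, which may differ from what the daemon actually runs.

```rust
use std::env;
use std::process::{Command, Stdio};

/// Candidate interpreters in the documented order. An empty pinned value
/// is treated as unset (assumption).
fn candidates(pinned: Option<String>) -> Vec<String> {
    pinned
        .filter(|p| !p.is_empty())
        .into_iter()
        .chain(["python3".to_string(), "python".to_string()])
        .collect()
}

/// True if `<python> -c "import mlx_lm.server"` exits 0.
fn can_import_mlx(python: &str) -> bool {
    Command::new(python)
        .args(["-c", "import mlx_lm.server"])
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|s| s.success())
        .unwrap_or(false) // interpreter missing or not executable
}

fn main() {
    let pinned = env::var("LATTIS_MLX_PYTHON").ok();
    // Take the first candidate that can import the module.
    match candidates(pinned).into_iter().find(|p| can_import_mlx(p)) {
        Some(p) => println!("MLX available via {p}"),
        None => println!("mlx_lm not found; MLX models stay hidden"),
    }
}
```

Because a missing interpreter and a failed import both count as "not available", a probe like this degrades to the no-MLX behavior instead of raising an error.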
See Local Models for downloading and serving models, and Launch on Login to run the daemon in the background.