v0.2.0 · macOS 14+ · Apple Silicon

Local AI, built natively for Apple.

LocalEngine is a pure-Swift, App Store-ready local AI runtime for macOS. It loads GGUF models through a real llama.cpp runtime, accelerates them with Metal, manages models on-device, and powers Privy apps and browser extensions. Inference runs entirely on your Mac: prompts and outputs never leave the machine.

Swift · End to end
Metal · GPU backend
0 · Cloud calls
LocalEngine — EngineCoordinator.swift
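import ProviderKit
import LlamaKit

// Real llama.cpp + Metal, no cloud.
let runtime = AutoLlamaRuntime()
try runtime.setModelPath("qwen.gguf")   // path to a local GGUF file

let prompt = "Summarize this page in one sentence."
var transcript = ""

// Tokens arrive as they are generated.
for await token in runtime.stream(prompt) {
    transcript.append(token)   // live UI
}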

 backend  = metal
 model    = local-gguf
 sandbox  = on-device
 ready    = true
Swift · SwiftUI · llama.cpp · Apple Metal · GGUF · MLX · App Sandbox
Capabilities

Everything the engine does

A genuine local inference stack written in Swift — not a thin wrapper. Built to feel native on Apple Silicon and to drop straight into the Privy product family.

Pure Swift, natively Apple

Built end-to-end as a real SwiftUI app — no Electron, no Python sidecar, no embedded browser. Just a signed, App-Sandboxed macOS app that feels at home on the platform.

Real llama.cpp on Metal

LlamaKit links the genuine llama.cpp runtime and loads GGUF models through its Metal backend, so generation runs on the Apple Silicon GPU — fast, cool, and battery-aware.

True token streaming

ProviderKit's LlamaProvider streams tokens the moment they're produced. The Chat surface renders them live, and you can cancel an in-flight generation at any time.
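
The stream-then-cancel pattern described above can be illustrated outside Swift. The sketch below is a JavaScript analogy, not LocalEngine's code: `fakeTokenStream` and `collect` are hypothetical names, the token source is a stand-in for a real model runtime, and cancellation uses the standard `AbortController`.

```javascript
// Hypothetical token source: yields tokens one at a time and stops
// producing as soon as the caller aborts.
async function* fakeTokenStream(tokens, signal) {
  for (const token of tokens) {
    if (signal.aborted) return;   // cancelled: stop generating
    await Promise.resolve();      // yield to the event loop between tokens
    yield token;
  }
}

// Consume the stream, appending each token as it arrives.
// `shouldStop` lets the caller cancel an in-flight generation.
async function collect(tokens, shouldStop) {
  const controller = new AbortController();
  let transcript = "";
  for await (const token of fakeTokenStream(tokens, controller.signal)) {
    transcript += token;          // a UI would render this immediately
    if (shouldStop(transcript)) controller.abort();
  }
  return transcript;
}
```

In a UI, the `shouldStop` callback would be replaced by a Stop button calling `controller.abort()`; the point is that the producer checks the signal before each token, so cancellation takes effect before the next token is generated.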

On-device model management

ModelKit scans, validates, and activates local GGUF models, and reconciles a remote manifest into a downloadable catalog — all from a native Models tab.
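
As a rough idea of what validating a GGUF file can involve, the sketch below checks the file header. It relies only on the public GGUF layout (a 4-byte ASCII magic `GGUF` followed by a little-endian uint32 version); `readGgufHeader` is a hypothetical helper, and ModelKit's actual checks are not documented here and may go further.

```javascript
// Hypothetical helper: inspect the first bytes of a candidate model file.
// Returns { version } for a plausible GGUF header, or null otherwise.
function readGgufHeader(bytes) {
  if (bytes.length < 8) return null;   // too short to hold magic + version
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "GGUF") return null;   // not a GGUF file
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { version: view.getUint32(4, true) }; // little-endian uint32
}
```

A header check like this is cheap enough to run on every file during a scan, which is why it makes a reasonable first filter before any heavier validation.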

Resumable download jobs

Model downloads run as persistent, SQLite-backed jobs via JobQueueKit — with live progress, cancellation, and resume across launches. No half-downloaded weights.
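
Resuming a download across launches is commonly built on HTTP range requests. The sketch below shows that general technique, not JobQueueKit's implementation: `resumeRequestHeaders` and `nextWriteOffset` are hypothetical helpers, and the persisted byte count and ETag are assumed to come from the job record.

```javascript
// Build request headers for resuming a partial download.
// `bytesOnDisk` and `etag` are assumed to be persisted with the job.
function resumeRequestHeaders(bytesOnDisk, etag) {
  if (bytesOnDisk <= 0) return {};                 // nothing saved: plain GET
  const headers = { Range: `bytes=${bytesOnDisk}-` };
  if (etag) headers["If-Range"] = etag;            // only resume if the file is unchanged
  return headers;
}

// Decide where to write the response body based on the status code.
function nextWriteOffset(status, bytesOnDisk) {
  if (status === 206) return bytesOnDisk;          // partial content: append
  if (status === 200) return 0;                    // full body: start over
  throw new Error(`unexpected status ${status}`);
}
```

Pairing `Range` with `If-Range` means a changed file on the server produces a full `200` response rather than a mismatched tail, so the job restarts cleanly instead of stitching together incompatible halves.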

Private & offline by design

Inference and your model files never leave the Mac. No telemetry, no cloud round-trips. API tokens live in the Keychain; the engine runs entirely in the App Sandbox.

Under the hood

A clean, modular Swift runtime

From the SwiftUI control plane down to the Metal kernels, every kit has one job — and all of them run on your machine.

App: LocalEngine.app
  SwiftUI control plane · Dashboard · Chat · Models · Logs
coordinates ↓
Core: EngineCoordinator · ModelRegistry · ChatService
  Runtime selection · active model · transcript & streaming
drives ↓
Kits: LlamaKit → ProviderKit → llama.cpp C API
  GGUF loading · token generation · JobQueueKit downloads
runs on ↓
Hardware: Apple Silicon GPU · Metal
  Vendored Metal-enabled build · zero cloud
LlamaKit · ProviderKitLlama · ModelKit · JobQueueKit · MLXKit · TranslateKit · SpeechKit · ChatKit
Bridge for Privy & extensions

A fast local API for the things you ship

LocalEngine exposes an opt-in HTTP API on loopback so a Chrome extension or a Privy app can ask the on-device model to translate a selection, a page, or a whole batch — with no API key leaving the Mac.

  • GET  /health               Engine readiness probe
  • GET  /v1/status            Runtime & active model
  • POST /v1/translate         Single-string translation
  • POST /v1/translate/batch   Batched translation
Read the API reference →
translate.js extension
// Chrome extension → LocalEngine, on-device
const res = await fetch(
  "http://127.0.0.1:8765/v1/translate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "hello world",
      source: "auto",
      target: "zh-CN",
      mode: "selection",
    }),
  });

// Surface HTTP errors instead of parsing an error body as a translation.
if (!res.ok) throw new Error(`translate failed: HTTP ${res.status}`);

const { translation } = await res.json();
console.log(translation); // 你好,世界
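
Because the API is opt-in, a caller may want to probe the engine before translating. The sketch below is one way to do that under stated assumptions: it assumes `GET /health` answers with a 2xx status when the engine is ready (the response body is not specified here and is not read), and `translateIfEngineUp` and its `fetchImpl` parameter are hypothetical names introduced for illustration.

```javascript
const BASE_URL = "http://127.0.0.1:8765"; // loopback port from the example above

// Returns the translation, or null when the local engine is not reachable.
// `fetchImpl` is injectable so the function can be exercised without a server.
async function translateIfEngineUp(text, target, fetchImpl = fetch) {
  let health;
  try {
    health = await fetchImpl(`${BASE_URL}/health`);
  } catch {
    return null;                       // connection refused: engine not running
  }
  if (!health.ok) return null;         // assumed: non-2xx means not ready

  const res = await fetchImpl(`${BASE_URL}/v1/translate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, source: "auto", target, mode: "selection" }),
  });
  if (!res.ok) throw new Error(`translate failed: HTTP ${res.status}`);

  const { translation } = await res.json();
  return translation;
}
```

Returning `null` for "engine not available" lets an extension fall back quietly (for example, by hiding its translate button), while a failed translation on a running engine still throws so real errors are not swallowed.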
Native experience

A real Mac app, not a config file

The SwiftUI control plane gives the runtime a face — run it, feed it models, talk to it, and watch its logs from one window.

01 · Dashboard

Start and stop the engine and watch live runtime status — including LlamaKit backend and Metal availability — without touching a terminal.

02 · Chat

A native multi-turn chat surface wired straight to the active local model. Streaming bubbles, send, stop, and clear — the fastest way to confirm it works.

03 · Models

Scan your local GGUF inventory, import new weights, browse the remote catalog, and watch download jobs progress live. Pick the active model in one tap.

04 · Logs

Inspect the engine end-to-end — switch between local API request logs and the full runtime log to see exactly what generation and model jobs are doing.

Get going

Running in three steps

1. Open the project in Xcode

open LocalEngine.xcodeproj
# target: LocalEngine · macOS 14.0+

2. Add a GGUF model

# Models → Import…  or drop a file into
~/Library/Application Support/LocalEngine/Models

3. Select it and chat

# Pick the active model, hit Run,
# then talk to it in the Chat tab,
# fully offline.

No GGUF model is bundled — you stay in full control of which weights run. Browse the in-app catalog to download one as a resumable job, or bring your own.

Ready when you are

Bring intelligence on-device.

LocalEngine runs on macOS 14 or later on Apple Silicon. Download the app and bring a signed, sandboxed local AI runtime to your Mac today.