Pure Swift, natively Apple
Built end-to-end as a real SwiftUI app — no Electron, no Python sidecar, no embedded browser. Just a signed, App-Sandboxed macOS app that feels at home on the platform.
LocalEngine is a pure-Swift, App Store-ready local AI runtime for macOS. It loads GGUF models through a real llama.cpp runtime, accelerates them with Metal, manages models on-device, and powers Privy apps and browser extensions — all without a single byte leaving your Mac.
import ProviderKit
import LlamaKit
// Real llama.cpp + Metal, no cloud.
let runtime = AutoLlamaRuntime()
try runtime.setModelPath("qwen.gguf")
for await token in runtime.stream(prompt) {
    transcript.append(token) // live UI
}
✓ backend = metal
✓ model = local-gguf
✓ sandbox = on-device
A genuine local inference stack written in Swift, not a thin wrapper. Built to feel native on Apple Silicon and to drop straight into the Privy product family.
LlamaKit links the genuine llama.cpp runtime and loads GGUF models through its Metal backend, so generation runs on the Apple Silicon GPU — fast, cool, and battery-aware.
ProviderKit's LlamaProvider streams tokens the moment they're produced. The Chat surface renders them live, and you can cancel an in-flight generation at any time.
ModelKit scans, validates, and activates local GGUF models, and reconciles a remote manifest into a downloadable catalog — all from a native Models tab.
Model downloads run as persistent, SQLite-backed jobs via JobQueueKit — with live progress, cancellation, and resume across launches. No half-downloaded weights.
Inference and your model files never leave the Mac. No telemetry, no cloud round-trips. API tokens live in the Keychain; the engine runs entirely in the App Sandbox.
From the SwiftUI control plane down to the Metal kernels, every kit has one job — and all of them run on your machine.
LocalEngine exposes an opt-in HTTP API on loopback so a Chrome extension or a Privy app can ask the on-device model to translate a selection, a page, or a whole batch — with no API key leaving the Mac.
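Because the API is opt-in, a client cannot assume the engine is listening. The sketch below shows one way a caller might probe `/health` before translating. It is an illustration under stated assumptions, not LocalEngine's documented client: the response body of `/health` is not described here, so readiness is inferred from the HTTP status alone, and the names `isEngineReady`, `translateSelection`, `fetchImpl`, and `timeoutMs` are hypothetical. The port and the request fields are taken from the example in this section.

```javascript
// Hypothetical client helpers; only the URL, port, and request fields
// come from this page. Everything else is an assumption for illustration.
const ENGINE_URL = "http://127.0.0.1:8765";

// Returns true if /health answers with a 2xx status within timeoutMs.
// fetchImpl is injectable so the helper can be exercised without a running engine.
async function isEngineReady(fetchImpl = fetch, timeoutMs = 1000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${ENGINE_URL}/health`, {
      signal: controller.signal,
    });
    return res.ok;
  } catch {
    // Connection refused, timeout, or the local API is not enabled.
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Translates a selection, or returns null when the engine is unavailable
// or the request fails.
async function translateSelection(text, target, fetchImpl = fetch) {
  if (!(await isEngineReady(fetchImpl))) return null;
  const res = await fetchImpl(`${ENGINE_URL}/v1/translate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, source: "auto", target, mode: "selection" }),
  });
  if (!res.ok) return null;
  const { translation } = await res.json();
  return translation ?? null;
}
```

Returning `null` instead of throwing lets an extension fall back quietly, for example by hiding its translate button while the engine is stopped.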
/health               Engine readiness probe
/v1/status            Runtime & active model
/v1/translate         Single-string translation
/v1/translate/batch   Batched translation
// Chrome extension → LocalEngine, on-device
const res = await fetch("http://127.0.0.1:8765/v1/translate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "hello world",
    source: "auto",
    target: "zh-CN",
    mode: "selection",
  }),
});
const { translation } = await res.json();
console.log(translation); // 你好,世界
The SwiftUI control plane gives the runtime a face: run it, feed it models, talk to it, and watch its logs from one window.
Start and stop the engine and watch live runtime status — including LlamaKit backend and Metal availability — without touching a terminal.
A native multi-turn chat surface wired straight to the active local model. Streaming bubbles, send, stop, and clear — the fastest way to confirm it works.
Scan your local GGUF inventory, import new weights, browse the remote catalog, and watch download jobs progress live. Pick the active model in one tap.
Inspect the engine end-to-end — switch between local API request logs and the full runtime log to see exactly what generation and model jobs are doing.
open LocalEngine.xcodeproj
# target: LocalEngine · macOS 14.0+
# Models → Import… or drop a file into
#   ~/Library/Application Support/LocalEngine/Models
# Pick the active model, hit Run,
# then talk to it in the Chat tab,
# fully offline.
No GGUF model is bundled, so you stay in full control of which weights run. Browse the in-app catalog to download one as a resumable job, or bring your own.
LocalEngine runs on macOS 14+ on Apple Silicon. Download the app and bring a signed, sandboxed local AI runtime to your Mac today.