Inside the engine.

How George processes a message, stage by stage.

The pipeline, stage by stage:

Message In
Raw text arrives.
The Gate
Deterministic, local. 14 questions in <5ms. No API call. No cost. Real names replaced before anything reaches the cloud.
Classify
14 local + 8 deferred rules; the routing decision happens here.
Sort
Classify + plan.
Recall
Memory retrieval.
Fetch
External data from the internet, pulled before the model thinks, not after.
Think
Generates the response. Skipped on DEEP_LITE: simple messages skip the expensive reasoning call.
Shape
Voice + format.
Response Out
Pseudonyms rehydrated back to real values.

The gate.

A message comes in. Before anything reaches the cloud, it hits a local machine. The gate asks 22 yes-or-no questions — is this a greeting? Does it contain a name? Is it asking for something complex? Fourteen get answered by pattern matching in under 5 milliseconds. No AI involved. The other eight need language understanding, so those get deferred to the first cloud call.

A greeting never leaves the building. A real question has already been classified, scored for complexity, and routed — before the cloud has seen a word.

TIER 1 — Local
14 binary rules. Regex and pattern matching. <5ms. Zero API cost.
TIER 2 — Cloud
8 complex rules. Requires language understanding. Deferred to first cloud call.
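The two tiers can be sketched as a small rule table: tier-1 questions are plain regexes answered locally in microseconds, tier-2 questions are only flagged for the first cloud call. The rule names and patterns below are illustrative, not George's actual 22 rules.

```python
import re

# Tier 1: deterministic regex rules, answered locally. (Illustrative
# subset -- the real gate has 14 such rules.)
TIER1_RULES = {
    "is_greeting":      re.compile(r"^\s*(hi|hey|hello|good (morning|evening))\b", re.I),
    "has_phone_number": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "has_email":        re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "is_question":      re.compile(r"\?\s*$"),
}

# Tier 2: rules that need language understanding. Flagged here,
# answered by the first cloud call. (Names assumed.)
TIER2_RULES = ["is_complex_request", "is_emotionally_charged", "needs_external_data"]

def run_gate(text: str) -> dict:
    """Answer tier-1 questions locally; mark tier-2 as deferred."""
    answers = {name: bool(rx.search(text)) for name, rx in TIER1_RULES.items()}
    answers.update({name: "deferred" for name in TIER2_RULES})
    return answers

print(run_gate("hey, can you call 403-555-0192?"))
```

Because tier 1 is pure pattern matching, it costs nothing and cannot stall: a greeting is recognized before any network request exists.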

Privacy.

While the gate classifies, a scanner looks for anything personal. Names, phone numbers, emails, addresses. When it finds them, it doesn't redact — it replaces. Real names become realistic pseudonyms. Real phone numbers become different real-looking numbers. The cloud gets a message that reads like natural language, just with someone else's identity. After the cloud responds, the local machine swaps everything back. The real data never left.

LOCAL (Raspberry Pi) ─────────────────── CLOUD (sanitized only)

Real data stays here: ─── sanitized text ───▸ Cloud sees only:
Phil Henderson ──────────────────── Michael Chen
403-555-0192 ───────────────────── 403-555-0147
[email protected] ──────────────────── [EMAIL_1]

◄──────────────── RESTORE ────────────────
rehydrate pseudonyms back to real values
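The swap-and-restore flow above can be sketched in a few lines. The pseudonym pool, function names, and simple substring matching are assumptions for illustration; a real scrubber would also detect unknown names, phone numbers, and emails on its own.

```python
# Illustrative pseudonym pool -- not the actual implementation.
PSEUDONYMS = ["Michael Chen", "Sarah Patel", "David Kim"]

def sanitize(text: str, known_names: list) -> tuple:
    """Swap real names for realistic pseudonyms; return the reverse map."""
    mapping = {}
    for i, name in enumerate(known_names):
        if name in text:
            alias = PSEUDONYMS[i % len(PSEUDONYMS)]
            mapping[alias] = name          # remember how to undo the swap
            text = text.replace(name, alias)
    return text, mapping

def rehydrate(text: str, mapping: dict) -> str:
    """Swap pseudonyms in the cloud's reply back to real values, locally."""
    for alias, real in mapping.items():
        text = text.replace(alias, real)
    return text

clean, m = sanitize("Call Phil Henderson tomorrow", ["Phil Henderson"])
print(clean)                               # the cloud only ever sees this
print(rehydrate("Reminder set to call Michael Chen.", m))
```

The key property is that `mapping` never leaves the local machine: the cloud reads and writes natural-sounding text about "Michael Chen" while the real identity exists only in local memory.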

Memory.

Most AI remembers by keeping a log of everything and hoping the model finds the relevant part. This system doesn't do that. Every fact gets ten numerical scores: how important it is, how recent, what domain it belongs to, how sensitive it is, how confident the source was. A question triggers math against those scores. Only what's relevant gets pulled. Nothing more.
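Score-based recall can be sketched as a weighted ranking. The score names, weights, and three-score `Fact` are assumptions for illustration; the source only says each fact carries ten numerical scores.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    text: str
    importance: float   # 0..1, illustrative subset of the ten scores
    recency: float      # 0..1, higher = more recent
    domain: str

def recall(facts: list, query_domain: str, limit: int = 2) -> list:
    """Rank facts against the query's domain; return only the top few."""
    def score(f: Fact) -> float:
        domain_match = 1.0 if f.domain == query_domain else 0.0
        # Assumed weights -- the point is math over scores, not a log scan.
        return 0.5 * domain_match + 0.3 * f.importance + 0.2 * f.recency
    return sorted(facts, key=score, reverse=True)[:limit]

facts = [
    Fact("Phil prefers morning meetings", 0.9, 0.2, "work"),
    Fact("Phil's sister visits in June", 0.6, 0.9, "family"),
    Fact("The garage code changed last week", 0.4, 0.8, "home"),
]
print(recall(facts, query_domain="work", limit=1))
```

Retrieval cost stays flat as memory grows: the model never reads the whole log, only the handful of facts that win the ranking.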

Four paths.

Not every question needs the same amount of work. The system picks a path from the gate scores, not from how long the message is.

DEEP
Full pipeline. Classification, memory retrieval, external data, reasoning, and formatting across multiple models that check each other's work.
DEEP_LITE
Same pipeline, skips the expensive reasoning call. The first call classifies and drafts the response in one pass.
QUICK
One fast cloud call. Casual questions, low-stakes lookups.
SNAP
Zero API calls. Pattern-matched. Greetings, confirmations. Instant. Free.
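The four-way routing can be sketched as a cascade over gate outputs. Only the path names come from the source; the score names and thresholds here are assumptions.

```python
def choose_path(gate: dict) -> str:
    """Pick a processing path from gate scores, not message length."""
    if gate.get("is_greeting") or gate.get("is_confirmation"):
        return "SNAP"        # zero API calls; pattern-matched reply
    complexity = gate.get("complexity", 0.0)   # assumed 0..1 gate score
    if complexity >= 0.7:
        return "DEEP"        # full multi-model pipeline
    if complexity >= 0.4 or gate.get("needs_memory"):
        return "DEEP_LITE"   # same pipeline, reasoning call skipped
    return "QUICK"           # one fast cloud call

print(choose_path({"is_greeting": True}))   # SNAP
print(choose_path({"complexity": 0.9}))     # DEEP
```

Routing on scores rather than length means a short but loaded question ("should I take the job?") still gets the full pipeline, while a long pasted greeting stays free.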

The modules.

Each piece does one thing. Green border means local execution — no API cost. Amber means an external model call.