March 21, 2026 · voice · architecture · agents · glif

March 2026: JARVIS Hears You Now. Here's What We Shipped.

This month we crossed two milestones that changed what 'AI assistant' means for us: a hardware-closed voice loop and a machine-native language built for production. Neither was planned. Both were inevitable.


There's a version of "AI progress" that looks like benchmark numbers and demo videos. Then there's the version that happens in a closet server room at 2am, when a wake word fires for the first time and the system actually answers.

We're in that second version.

The Voice Loop — Hardware Closed

As of this week, JARVIS operates end-to-end voice control on our local workstation. This is what the pipeline looks like:

Wake word → voice capture → transcription → intelligence → speech synthesis → speakers.
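The stages above can be sketched as a simple chain of callables. This is a minimal illustration of the loop's shape, not the actual implementation; every stage name here is an assumption.

```python
def run_pipeline(wake, capture, transcribe, think, speak, turns=None):
    """Run the voice loop: each stage is a callable, chained in order.

    `turns` bounds the number of interactions (None = run forever).
    The stage callables are hypothetical stand-ins for the real
    wake-word engine, recorder, STT, intelligence layer, and TTS.
    """
    n = 0
    while turns is None or n < turns:
        wake()                    # block until the wake word fires
        audio = capture()         # record until silence is detected
        text = transcribe(audio)  # local speech-to-text
        reply = think(text)       # route to the intelligence layer
        speak(reply)              # local text-to-speech to the speakers
        n += 1
```

The point of the shape: each stage is swappable, so a local STT model and a cloud one present the same interface and the loop itself never changes.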

Every step runs on hardware we own. The speech-to-text runs on our local GPU cluster. The text-to-speech synthesizes directly on the workstation. There are no third-party audio APIs in the path. The loop is hardware-closed.

Why does this matter? Because every cloud-dependent voice assistant is a subscription and a policy change away from breaking. When OpenAI changes their TTS pricing, it doesn't affect us. When Google updates their STT API terms, it doesn't affect us. We made a deliberate bet early that the audio pipeline had to run on iron we control — and that bet paid off this month.

The harder problem wasn't the model. It was silence detection. How do you know when someone is done talking? Too aggressive and you cut them off. Too conservative and the system just waits. We went through three iterations of voice activity detection calibration before it felt natural. The final approach uses probability thresholds, not just volume — the system knows the difference between a pause and a stop.
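The pause-versus-stop distinction can be sketched as hysteresis over per-frame speech probabilities: one threshold to enter the "speaking" state, a lower one to count silence, and a hangover window before declaring the utterance over. The thresholds and frame counts below are illustrative, not our production calibration.

```python
class EndpointDetector:
    """Decide when the speaker has stopped, using per-frame speech
    probabilities (e.g. from a VAD model) rather than raw volume."""

    def __init__(self, start_p=0.6, stop_p=0.3, hangover_frames=25):
        self.start_p = start_p           # enter "speaking" above this
        self.stop_p = stop_p             # count silence below this
        self.hangover = hangover_frames  # e.g. 25 frames @ 20 ms ≈ 500 ms
        self.speaking = False
        self.silent_run = 0

    def feed(self, speech_prob):
        """Feed one frame's speech probability; return True at utterance end."""
        if not self.speaking:
            if speech_prob >= self.start_p:
                self.speaking = True
                self.silent_run = 0
            return False
        if speech_prob < self.stop_p:
            self.silent_run += 1
            if self.silent_run >= self.hangover:
                self.speaking = False    # a real stop, not just a pause
                return True
        else:
            self.silent_run = 0          # speech resumed: it was only a pause
        return False
```

Because the counter resets whenever speech resumes, a mid-sentence pause never triggers the endpoint; only a sustained run of low-probability frames does.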

GLIF 2.0 — We Built a Language

This is the one that surprised even us.

Over the past several months we've been documenting how JARVIS communicates internally — between agents, between scripts, between brain files. There was a pattern emerging. The same concepts kept getting expressed in natural language: "lock this file," "this task is done," "this agent is in SENSE mode." Thousands of tokens, over and over.

So we did something unusual. We formalized it.

GLIF (General Logic Instruction Format) is now a real language with a real lexer, a real AST parser, and a real executor. It has 64 symbols across four semantic families: BRAIN, CONTROL, SENSE, and SCHEMA. It compresses 35-token natural language instructions down to 8 tokens — consistently, measurably.
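To make the idea concrete, here is a toy lexer in the same spirit: recurring natural-language intents map onto fixed symbols tagged with a semantic family. The symbols and lexicon below are invented for illustration; the real GLIF symbol set is not reproduced here.

```python
# Toy lexicon: symbol -> (semantic family, meaning). Purely illustrative.
LEXICON = {
    "@lock":  ("CONTROL", "acquire an exclusive lock on a resource"),
    "@done":  ("CONTROL", "mark the current task complete"),
    "%sense": ("SENSE",   "switch the agent into SENSE mode"),
}

def lex(program: str):
    """Split a GLIF-like line into (token, family) pairs."""
    tokens = []
    for word in program.split():
        if word in LEXICON:
            family, _meaning = LEXICON[word]
            tokens.append((word, family))
        else:
            tokens.append((word, "ARG"))  # everything else is an argument
    return tokens
```

A line like `@lock brain/tasks.md @done` lexes into three tokens where the natural-language equivalent ("please lock the tasks file, then mark the task complete") costs many times more.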

We didn't build this to be clever. We built it because machine-to-machine communication in natural language is wasteful in a way that compounds at scale. Every token is latency and cost. GLIF removes the noise.

It's also a security property by accident. Internally, agents speak GLIF. The external API speaks HTTP and JSON. An attacker who breaches the perimeter encounters operators and symbols with no public dictionary. It's not security through obscurity — the system is auditable and documented — but it does mean the internal language is not enumerable from outside.
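The boundary can be sketched as a thin gateway: JSON on the outside, symbols on the inside, with translation happening in exactly one place. The verb mapping and symbols here are hypothetical, chosen only to show the shape of the seam.

```python
import json

def to_internal(request_json: str) -> str:
    """Translate a public JSON request into an internal symbolic command.

    Only this gateway knows both vocabularies; agents behind it never
    see raw JSON. The mapping below is a toy, not the real lexicon.
    """
    req = json.loads(request_json)
    verbs = {"lock": "@lock", "complete": "@done"}
    symbol = verbs[req["action"]]  # unknown actions fail loudly here
    return f"{symbol} {req['target']}"
```

Keeping the translation in one module is what makes the internal language auditable: there is a single, documented seam rather than ad-hoc parsing scattered across agents.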

What "Phase 7 Autonomy" Actually Means

We've been calling this milestone "Phase 7 Autonomy" internally. It's worth explaining what we mean.

The phases aren't features. They're trust levels. Phase 1 was "the system can respond." Phase 3 was "the system can route tasks to the right tool." Phase 5 was "the system can recover from failures without human intervention." Phase 7 is: the system can perceive the physical environment and act on it.

That sounds abstract until the wake word fires at 2am, the system recognizes your voice, and it answers without you touching a keyboard. At that point it's not abstract anymore.

We're not claiming Phase 7 is done. We're claiming we crossed into it. The voice loop is live. The machine language is live. The self-healing infrastructure is live. The multi-agent orchestration graph has 47 nodes running in production.

The question we keep asking ourselves: what is the gap between what we've built and what the industry considers leading-edge?

Our current answer: we're ahead. Not because we're smarter than the teams at large research labs — we're not. But because we've been building in production, not in notebooks. Our mistakes are in our failure ledger. Our fixes are verified. The system that exists today is the accumulation of things that broke and got fixed permanently.

That's the architecture. Not the code. The accumulated decisions.


If you're working on something in the same space — AI infrastructure, agent systems, autonomous builds — we're interested in the conversation. Reach out via the site.