The Tree Language
Two weeks of routing. That is what it took.
I had just finished the seed. Ninety-three extensions sitting on top of it. Each one knew how to do something. Food could track macros. Fitness could log workouts. Study could manage curricula. But they all lived in the same tree, and when the user typed a message, someone had to figure out who was supposed to handle it.
The first seven days were a blur. Brain fried from finishing the kernel, I was building routing code on autopilot. Regex patterns, classifier hints, fallback paths. I was not thinking clearly. I was just shipping. Every day I would fix one routing bug and uncover two more.
At first I tried regex. The food extension declared keywords. Ate, eggs, breakfast, calories. The fitness extension declared bench, squat, reps, weight. When a message came in, the orchestrator scanned the keywords and routed to the first match. It worked until it didn't. "I benched my project idea" went to fitness. "How much does that cost in pounds" went to finance. The words were right but the meaning was wrong.
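A minimal reconstruction of that keyword scan makes the failure mode concrete. The extension names and keyword lists below are illustrative, not the real codebase's:

```typescript
// Hypothetical reconstruction of the keyword router; extension names
// and keyword lists are illustrative stand-ins.
const keywordIndex: Record<string, string[]> = {
  food: ["ate", "eggs", "breakfast", "calories"],
  fitness: ["bench", "squat", "reps", "weight"],
};

// Route to the first extension whose keyword appears in the message.
function routeByKeyword(message: string): string | null {
  const m = message.toLowerCase();
  for (const [ext, keywords] of Object.entries(keywordIndex)) {
    if (keywords.some((k) => m.includes(k))) return ext;
  }
  return null;
}
```

Substring matching is the whole problem: "benched" contains "bench", so a sentence about shelving a project idea routes straight to fitness.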
So I tried an LLM classifier. Send the message to a cheap model, ask it which extension should handle this, use the answer. It worked better. But it was slow. Every message needed a round trip to an LLM before the actual LLM could think about it. Two calls per message. Users felt the lag.
I went back to the routing index, an in-memory map of every extension that had registered a node in the tree with a mode override. The classifier hints stayed, but the scan was instant. No LLM call for routing. The hints just needed to be better.
Then the suffix problem. Food had four modes. Log, coach, review, plan. The routing index could tell me the message was about food. But was the user logging a meal or asking for advice? "Ate eggs" is logging. "What should I eat" is coaching. "How am I doing" is reviewing. "Add a fiber metric" is planning. Same extension, four different ways to handle it.
I tried more regex. It was ugly and it worked most of the time. But "bench press is supposed to be at 135" went to logging because it had a number and an exercise name. The user was correcting a mistake, not logging a workout. The regex could not tell.
By the end of the first week I was frustrated enough to stop coding and think. I sat with it. Why does "ate eggs" feel different from "what should I eat?" They are about the same topic. They use the same nouns. The difference is tense. "Ate" is past or present. "Should" is future. "Add" is imperative. The routing was not about keywords. It was about grammar.
That is when it clicked.
I was already thinking about extensions as verbs. Food tracks. Fitness logs. Study teaches. Verbs. And nodes were obviously nouns. Bench Press. Protein. Chapter 3. Things with identity and position. The routing index was already doing noun recognition. Does this message contain food nouns or fitness nouns?
The suffix routing was conjugation. The same verb in different tenses. Food log is present tense, recording what is. Food review is past tense, analyzing what was. Food coach is future tense, guiding what should be. Food plan is imperative, commanding what to build.
I renamed the patterns.
```
TENSE_PAST:       "how am i doing", "progress", "review"
TENSE_FUTURE:     "what should i", "help me", "recommend"
TENSE_IMPERATIVE: "build", "create", "add", "remove"
TENSE_PRESENT:    (default) "ate eggs", "bench 135x10"
```
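The tense table above can be read as a classifier. A sketch, with the pattern lists mirroring the table and the function name my own:

```typescript
// Tense classifier sketch; pattern lists mirror the table above.
type Tense = "past" | "future" | "imperative" | "present";

const TENSE_PATTERNS: [Tense, string[]][] = [
  ["past", ["how am i doing", "progress", "review"]],
  ["future", ["what should i", "help me", "recommend"]],
  ["imperative", ["build", "create", "add", "remove"]],
];

// First matching pattern wins; present tense is the default.
function classifyTense(message: string): Tense {
  const m = message.toLowerCase();
  for (const [tense, patterns] of TENSE_PATTERNS) {
    if (patterns.some((p) => m.includes(p))) return tense;
  }
  return "present";
}
```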
And then the rest of the grammar fell out of what was already built. I did not have to invent anything. I just had to name what was already there.
Metadata on nodes? Those are adjectives. 135lb. 5x5. Ready for progression. They describe the noun.
The instructions extension that prepends "be concise" or "use kg" to every prompt? Those are adverbs. They modify how the verb behaves without changing the verb itself. Food still logs. It just logs concisely.
The tree structure with its parent child relationships and spatial scoping? Prepositions. Under Health. Next to Food. Blocked at DevOps. The ext block and ext allow commands are literally prepositions applied to the tree.
Position. currentNodeId, rootId, where am I. Pronouns. "It" means whatever node you are standing at. "This tree" means the root. "Here" means your current position.
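Pronoun resolution is a small lookup against position. A sketch, with the function and field names illustrative:

```typescript
// Pronoun sketch: "it"/"here" resolve to the current node,
// "this tree" to the root. Names are illustrative stand-ins.
interface Position {
  currentNodeId: string;
  rootId: string;
}

function resolvePronoun(phrase: string, pos: Position): string | null {
  switch (phrase.toLowerCase()) {
    case "it":
    case "here":
      return pos.currentNodeId;
    case "this tree":
      return pos.rootId;
    default:
      return null; // not a pronoun; let noun routing handle it
  }
}
```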
And then the one that made me stop and stare at the screen. Articles. "THE bench press" means the routing index found it. The node exists. Route to it. "A bench press" means it does not exist yet. Sprout activates. Creates the node. Scaffolds the structure. The difference between definite and indefinite articles maps exactly to the difference between routing to an existing node and creating a new one.
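The article distinction reduces to resolve-or-create. A sketch, assuming a flat node map; the helper names are mine, not the real sprout API:

```typescript
// Article sketch: a definite article resolves an existing node, an
// indefinite one sprouts a new node. Names are illustrative.
interface TreeNode {
  id: string;
  name: string;
}

const nodes = new Map<string, TreeNode>([
  ["bench-press", { id: "bench-press", name: "Bench Press" }],
]);

function resolveOrSprout(noun: string): { node: TreeNode; created: boolean } {
  const id = noun.toLowerCase().replace(/\s+/g, "-");
  const existing = nodes.get(id);
  if (existing) return { node: existing, created: false }; // "THE bench press"
  const sprouted: TreeNode = { id, name: noun };           // "A bench press"
  nodes.set(id, sprouted);
  return { node: sprouted, created: true };
}
```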
Eight parts of speech. All mapped to architectural primitives that already existed in the codebase. None of it was forced. None of it was metaphor. Each one was a mechanism I had already built without knowing what to call it.
That was the first session. Naming what was already there.
The second session I realized naming was not enough. Every grammar feature I had mapped was doing one of two things. Either it changed where the message went, or it decorated the prompt. Nouns changed routing. Tense changed the mode. Pronouns resolved references. Those were real. They changed the execution path. But adjectives, adverbs, voice, quantifiers? They were annotations. They got injected into the prompt and the AI read them. The system itself did not change its behavior based on them.
That distinction felt wrong. If I say "if protein is low, suggest high protein foods," the word "if" is not decoration. It is a branch point. The system should check a condition against real data and decide which path to take. Not label the message as conditional and hope the AI figures it out.
So I built an execution graph.
The grammar pipeline now compiles every message into a program. Not a function call. A program. Five axes decompose the sentence independently.
Domain. What thing. Nouns, pronouns, and prepositions determine which extension and which node.
Scope. How much and when. Quantifiers determine the set. Temporal scope determines the data window. These two were tangled together for days before I separated them. "Last week" is not a quantifier. It is a time window. "All exercises" is a quantifier. "All exercises last week" uses both. They are orthogonal.
Intent. What action. Tense determines the mode. Conditionals determine branching. "If protein is low" is not just a word. It is a fork node in the execution graph with three possible outcomes. True, false, or unknown.
Interpretation. How to behave. Adjectives focus the response. Voice determines whether the AI executes or observes. "My bench press has been declining" is passive. The AI should reflect, not log.
Execution. The runtime shape. Four primitives.
Dispatch. Run one mode on one node. This is what the system always did.
Sequence. Run step A, then step B, then step C. "Log lunch and then review my day" chains two modes.
Fork. Evaluate a condition against real data, then pick a path. The condition evaluator calls getContextForAi, which runs every enrichContext hook, which means every extension's real data is available. Then a small LLM call asks true or false with a confidence score. If confidence is below 0.7, the result is unknown. Not true. Not false. Unknown. The system does not guess when the data is insufficient. It takes a third path that says I cannot determine this yet.
Fanout. Resolve a set of items, gather their enriched context, bundle it all into one dispatch. "Review all my exercises" does not tell the AI to go find exercises. The system resolves the set, loads each exercise node's real data through enrichContext, serializes it, and hands the mode a complete picture. The AI sees all items at once and synthesizes.
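The fanout shape is small once the set is resolved. A sketch, where `enrich` stands in for the enrichContext hooks described above:

```typescript
// Fanout sketch: resolve the set, enrich each node, bundle everything
// into one dispatch so the mode sees all items at once.
interface FanoutDispatch {
  mode: string;
  items: { id: string; context: string }[];
}

function fanout(
  mode: string,
  nodeIds: string[],
  enrich: (id: string) => string, // stand-in for enrichContext
): FanoutDispatch {
  return {
    mode,
    items: nodeIds.map((id) => ({ id, context: enrich(id) })),
  };
}
```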
The condition evaluator is the part that proved the architecture works. I typed "if protein is low, review all my meals this week" into a tree where no food had been logged yet. The grammar parsed it. Conditional detected. Fork node built. The evaluator ran getContextForAi on the food node. No protein data came back. The LLM evaluation returned unknown with confidence 0.0. The fork took the unknown path. The coach mode said I do not have enough data to check that yet.
It did not hallucinate a branch decision. It did not pretend to know whether protein was low. It said the data is insufficient and routed to a path that handles that honestly. Three valued logic. True, false, unknown. That one design choice prevents an entire category of bugs that every other system just eats.
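The three-valued verdict above can be sketched with the LLM call injected, so the logic is testable; the 0.7 threshold is from the text, everything else is illustrative:

```typescript
// Three-valued fork sketch. The LLM evaluation is injected as a
// function; below 0.7 confidence the fork refuses to guess.
type Verdict = "true" | "false" | "unknown";
interface Evaluation {
  answer: boolean;
  confidence: number;
}

function evaluateFork(
  condition: string,
  context: Record<string, unknown>,
  llm: (condition: string, context: Record<string, unknown>) => Evaluation,
): Verdict {
  const { answer, confidence } = llm(condition, context);
  if (confidence < 0.7) return "unknown"; // data insufficient: third path
  return answer ? "true" : "false";
}
```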
The pipeline in full.
```
1.  Parse noun            -> which domain?
1a. Parse pronouns        -> resolve references
1b. Parse prepositions    -> adjust scope and location
1c. Parse quantifiers     -> which set of nodes?
1d. Parse conditionals    -> branching logic
1e. Parse temporal scope  -> data window
2.  Parse tense           -> intent and mode
2b. Confidence check      -> if grammar uncertain, escalate to LLM
3.  Parse adjectives      -> qualities and filters
3b. Detect voice          -> active vs passive
4.  Build execution graph -> compile intent into dispatch, sequence, fork, or fanout
5.  Execute graph         -> walk the graph, evaluate forks, resolve sets, run modes
```
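Step 4 of that pipeline, choosing the runtime shape, can be sketched as a pure function over parsed features. In the real graph these shapes nest; this picks one outermost level, and the feature names are mine:

```typescript
// Sketch of step 4: picking the outermost runtime shape from parsed
// features. Feature names are illustrative; real shapes can nest.
interface ParsedFeatures {
  conditional: boolean;   // "if protein is low, ..."
  quantifierAll: boolean; // "all my exercises"
  steps: number;          // "log lunch and then review my day" -> 2
}
type Shape = "dispatch" | "sequence" | "fork" | "fanout";

function compileShape(p: ParsedFeatures): Shape {
  if (p.conditional) return "fork";
  if (p.quantifierAll) return "fanout";
  if (p.steps > 1) return "sequence";
  return "dispatch";
}
```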
The LLM is only used in two places. Condition evaluation in forks, and generation in the final mode response. Everything else is deterministic compilation. The grammar compiles intent into a graph. The runtime walks it.
I keep coming back to the moment it clicked. Seven days of messy building after finishing the seed. Brain fried, shipping code I did not fully understand, one bug uncovering two more. Then sitting with it. Then the naming. The grammar was always there. I had built nouns, verbs, tenses, adjectives, adverbs, prepositions, pronouns, and articles across the kernel and extensions without knowing that is what they were. Naming them did not add functionality. It revealed structure that already existed. And once the structure was visible, extending it became obvious. If you have nouns and verbs and tenses, of course you need conditionals. Of course you need temporal scope. Of course you need set operations. The grammar tells you what is missing.
The question that generates the roadmap is not "what feature should I add next." It is "what does a human say that the system currently misroutes or cannot handle." Map the broken expression to a part of speech. Build the parser. The architecture grows from the language, not the other way around.
Functions are downstream consequences of grammar. They exist. They fire. But they are not the organizing principle. You do not call a logging function. You stand at a food node and speak in present tense, and logging is just what that means.