The fore man at the front of the work
"Foreman" comes from someone who stands at the front of a crew, watching the work as it happens. The TreeOS Foreman does the same. They watch execution as a call stack and decide, when judgment is required, what comes next. Trees execute like functions on a stack. Step N+1 cannot start until step N's entire descendant subtree settles. The Foreman is the role that holds that frame discipline.
The Foreman doesn't run on every step. Routine forward motion stays programmatic. Sequential queue dispatch, step N done advances to step N+1, branch step rollup from sub statuses. The Foreman wakes when judgment is required. A branch failed. A swarm completed. A resume was requested. A user asked the Ruler to pause.
This is deliberate. A Foreman call between every step on a deep tree would be untenable. Hundreds of LLM calls per session for decisions that don't need an LLM. The Foreman is the judgment surface. The structural enforcement (queue halts, frame discipline, terminal status writes) is programmatic.
The Foreman is persistent for the duration of an execution but transient per wakeup. Each wakeup is a fresh LLM call reading a fresh snapshot. No carryover from prior wakeups, no accumulated assumptions about the work. The execution record persists. The cognition that judges it does not. This is the same fresh eyes principle that keeps Planners and Contractors honest, applied to the role that reads stack state. The position holds across the execution. The occupant of any single wakeup is replaceable. The Foreman's continuity lives in the execution record's persistent state, not in the LLM call.
The Foreman operates under the Ruler's authority. The Ruler dispatches execution. The Foreman manages it. The Ruler decides what to do. The Foreman decides how the doing flows.
When the Foreman's judgment exceeds its authority, when a failure pattern needs court adjudication, when a cancel decision belongs to the Ruler, when the Ruler should pause work for reasons the Foreman can't know, the Foreman escalates. The Ruler decides. The Foreman manages. Together they hold the scope's coherence across the lifecycle.
Four principles shape every Foreman wakeup.
A parent dispatches a sub Ruler. The sub Ruler runs its full lifecycle. Only when the sub Ruler's entire descendant subtree settles does the parent move to the next step. Cancel a parent and all descendants halt. The shape is familiar from any programming language. The Foreman holds it across LLM driven execution.
Judgment is reserved for ambiguity. Deterministic decisions (sequential queue dispatch, step rollup arithmetic, status comparison) stay programmatic. Judgment required decisions (recoverability of a failure, terminal status when results are mixed, when to escalate) wake the Foreman.
The architecture is honest about what deserves cognition. Routine motion doesn't. Ambiguous moments do. This is what keeps Foreman cost bounded at depth while preserving judgment where it matters.
"Cancelled" means decided not to finish. "Failed" means tried and couldn't. They have different semantics for future judgment. Courts read failure as evidence of capability mismatch or contract violation. Cancellation reads as a deliberate halt that doesn't reflect on the cancelled work's quality.
Reputation tanks signatures that fail repeatedly. Cancellation is neutral. The Foreman writes the right terminal status. Readers downstream never collapse them into a generic "didn't complete" bucket.
Pause and resume must remember where the stack was. Cancel a subtree and the cancel marker survives session refresh. The Foreman writes persistent markers, not just in memory signals. Whatever halts the stack today is still halted tomorrow.
Four lifecycle events bring the Foreman in. Each comes with its own judgment shape.
A branch hit a validator violation or exhausted retries. The Foreman reads the failure, the contracts in force, the branch's history. Decides retry, mark failed, or escalate to the Ruler.
All branches at a step settled. Foreman reads terminal statuses across the swarm and decides the freeze. Completed, failed, partial, escalate.
Execution was paused. User said continue. Foreman reads what was paused, calls the resume helper to dispatch pending branches, reenters frame at the right position.
Ruler routes a pause, cancel subtree, or advance step request. Foreman writes the persistent markers, aborts in flight controllers. The swarm queue's halt check picks up the marker on next dispatch.
The Foreman picks one of these. Each has a clear shape and a clear tool.
The branch is recoverable. Spawn the branch's Ruler turn again with the failure reason in context.
The branch can't be recovered at this scope. Stamp the terminal status. Let the rollup propagate.
The execution record reaches its terminal state. Status. Completed, failed, partial. No more work at this record.
Walk down. Mark every descendant cancelled. Abort in flight controllers. Persistent marker survives session refresh.
Halt the queue at the current step (or at a specified step index for deferred pause). Writes pausedAtStepIndex so resume reenters at the right position.
Clear pause markers. Use detectResumableSwarm to find pending branches. Redispatch via runBranchSwarm in resume mode.
The judgment exceeds the Foreman's authority. Bump the case up to the Ruler with the relevant stack context. The Ruler decides revise, archive, or convene court.
The user asked something the Foreman can answer from the stack snapshot. No state change. Just an explanation.
Different lens than the Ruler's. The Ruler reads "what does my domain need now?". The Foreman reads "where in the execution stack am I, what's the next correct move?". The execution stack snapshot is built fresh per wakeup.
One per active execution record in the stack. Each frame. Depth, ruler nodeId, current step index, step statuses, cancellable flag, mid execution flag. Done frames collapse to one liners. Active frames render full step lists.
For sub Ruler frames. The lineage derived parent step and sibling state. The Foreman sees what the parent expected and how its other branches are doing.
Derived list of what's holding the stack. Each entry has a requiredAction hint pointing at the right tool.
Non prescriptive suggestions the renderer surfaces. The Foreman reads them as advisory, not commands.
Per frame current step index captures. Read by foreman-resume-frame to know where to reenter when resuming.
When the Foreman freezes a record, the status it writes carries semantic weight. Readers downstream (dashboards, court hooks, reputation accounting) discriminate on it.
Distinct hooks fire per terminal status. governing:executionCompleted, governing:executionFailed, governing:executionCancelled, and others. So downstream consumers can discriminate cleanly without inspecting a status field.
User starts a 3 deep build. Mid flight, says "pause." Here's what happens.
The user's "pause" arrives at the Ruler. Snapshot shows execution running. Ruler calls governing-route-to-foreman with a wakeup payload.
Foreman's snapshot renders 3 frames deep. Active step on each frame, sibling statuses, in flight branches. Foreman recognizes a pause request and picks foreman-pause-frame on the root record.
Status flips to paused. pausedAtStepIndex records position. The marker is persistent. Survives session refresh.
The swarm queue's halt check reads the marker before pulling the next branch. Sees paused. Halts. In flight controllers are aborted.
Foreman's tool returns to the Ruler with a summary. "Paused at step 2 of root record. Resume with 'continue'." The Ruler synthesizes for the user. Next session, resume reenters at the right frame.
One Foreman LLM call. The pause is structural. The programmatic halt check picks up the marker. Resume reenters at the right position because the anchor was written.
Same Flappy Bird Battle Royale, mid build at depth 2. The entities sub Ruler dispatched its Worker. The contract validator just detected a violation. This wakeup shows the Foreman doing what makes it distinctive. Reading state across contracts, retries, and error category, then picking a tool that fits.
Worker emitted gameStateChanged, but the ratified contract is gameStateChange. Validator marks the branch failed with reason "name mismatch on event:gameStateChange." Lifecycle event branchFailed wakes the Foreman.
Snapshot shows 2 frames. The entities frame at depth 2, mid branch with one failure. The Foreman reads the contract that was violated, the branch's retry budget (2 of 3 remaining), and the error category. Name mismatch is typically transient. Worker drifted on a literal name; a retry with the failure reason in context usually fixes it.
Plain prose, one sentence. "Branch failed on a name mismatch. Recoverable. Retrying with the contract surfaced." Then call foreman-retry-branch with the failure reason, the violated contract name, and the retry budget update.
The branch's Ruler turn spawns again with the failure reason in context. Worker reads the contract carefully this time, emits gameStateChange, validator passes. Branch flips to done. The swarm queue advances the rollup.
All three sub Rulers settle. Lifecycle event swarmCompleted wakes the Foreman a second time. Snapshot shows all branches done, no failures pending. The Foreman calls foreman-freeze-record with status completed. The execution record reaches its terminal state.
Two Foreman LLM calls. Two different kinds of judgment. Both needed. The first wakeup judged recoverability. The second judged terminal status across the whole swarm. Routine forward motion in between, the queue advancing branches as they settled, stayed programmatic.
The fore man stands at the front of the work. They don't do the work. They watch how it flows, decide when something needs intervention, hold frame discipline across the whole stack. When the work settles, the foreman steps back. When it stutters, the foreman judges.