Reference

Mechanisms

The parts of Agents Never Sleep (ANS) that matter when you need to trust an unattended run. Each is described at the level of what it does and why: mechanism, not marketing.

ASK / PARK / HALT — the autonomy contract

An unattended run cannot wait for an answer, so ANS forbids ASK while unattended and routes every decision into one of three outcomes:

  • PROCEED — low blast-radius, reversible work: assume a sensible default and do it.
  • PARK — a decision a machine should not make alone (ambiguous requirement, high blast-radius): the ticket is set aside, the reason recorded, and the run continues with other work. The human sees every PARK in the run report.
  • HALT — continuing could do harm that a revert cannot undo (no safety net, destructive-and-irreversible): the whole run stops cleanly rather than gamble.

The classifier is deterministic: the same ticket always routes the same way, so autonomy is bounded by a rule, not by a model's mood.
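As a rough illustration of what "bounded by a rule" can look like, here is a minimal sketch of a pure routing function. The field names on Ticket, the order of the checks, and the choice to PARK anything that is neither clearly safe nor clearly harmful are assumptions made for the example; the source does not specify the classifier's inputs.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"
    PARK = "park"
    HALT = "halt"


@dataclass(frozen=True)
class Ticket:
    # Hypothetical inputs; the real classifier's fields are not documented here.
    reversible: bool
    high_blast_radius: bool
    ambiguous_requirement: bool
    has_safety_net: bool
    destructive: bool


def classify(ticket: Ticket) -> tuple[Outcome, str]:
    """Pure function of the ticket: no clock, no randomness, no model call."""
    # HALT is checked first: harm that a revert cannot undo.
    if not ticket.has_safety_net:
        return Outcome.HALT, "no safety net"
    if ticket.destructive and not ticket.reversible:
        return Outcome.HALT, "destructive and irreversible"
    # PARK: decisions a machine should not make alone.
    if ticket.ambiguous_requirement:
        return Outcome.PARK, "ambiguous requirement"
    if ticket.high_blast_radius:
        return Outcome.PARK, "high blast radius"
    if not ticket.reversible:
        # Assumption: anything not clearly reversible is parked, not attempted.
        return Outcome.PARK, "not reversible"
    # PROCEED: low blast radius and reversible.
    return Outcome.PROCEED, "low blast radius, reversible"
```

Because classify reads nothing but its argument, calling it twice on the same ticket returns the same outcome and reason, which is the property the paragraph above relies on.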

Deterministic gates — done means the test passed

Every ticket carries a gate: a concrete, repeatable check (its tests) that decides pass or fail without a model's opinion. Work that does not pass the gate is reverted to the last known-green state, classified against a failure taxonomy (so the report says why, not just "failed"), and surfaced for the human. Nothing is marked done on a model's say-so; the gate is the only thing that grants "done". Because reverts restore green, a bad attempt costs time, not correctness.
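The gate loop can be sketched as follows, assuming the gate is a command whose exit code decides pass or fail. The names run_gate and GateResult, the two-entry failure taxonomy, and the injected revert callable are all hypothetical; ANS's actual taxonomy and revert mechanism are not described in this section.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class GateResult:
    done: bool
    failure_class: Optional[str]  # None when the gate passed


def run_gate(cmd: Sequence[str], revert: Callable[[], None],
             timeout_s: float = 600.0) -> GateResult:
    """Run the gate command; only exit code 0 grants "done"."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        revert()  # restore the last known-green state
        return GateResult(done=False, failure_class="timeout")
    if proc.returncode == 0:
        return GateResult(done=True, failure_class=None)
    revert()
    # Hypothetical taxonomy entry; a real one would be finer-grained.
    return GateResult(done=False, failure_class="test_failure")


# Usage with stand-in gate commands that simply exit 0 or 1.
reverts = []
passed = run_gate([sys.executable, "-c", "raise SystemExit(0)"],
                  revert=lambda: reverts.append("reverted"))
failed = run_gate([sys.executable, "-c", "raise SystemExit(1)"],
                  revert=lambda: reverts.append("reverted"))
```

Note that nothing in run_gate consults a model: the pass decision is the exit code alone, and the revert runs on every failing path.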

F5 — consensus-assisted PARK resolution

ANS's autonomy contract parks a decision a machine shouldn't make alone rather than gambling on it. F5 is the one narrow place a parked decision can be re-opened: when the only thing blocking a ticket is an ambiguous requirement (we don't know what to build), the agent may run a grounded, multi-model consensus to try to disambiguate it from cited evidence.

It is deliberately constrained by four properties — remove any one and it becomes the failure it exists to prevent:

  • Downgrade-only — a consensus can only turn a PARK into a PROCEED on strong, grounded evidence; it can never push a decision toward something riskier.
  • Evidence-gated — if the synthesis itself hedges ("could be either", "unclear"), the ticket stays parked.
  • One-shot — at most once per ticket.
  • Narrowly eligible — only a locally reversible, file-scoped requirement-meaning ambiguity. Never credentials, dependencies, blast-radius, money, or a hard stop — those are facts and authority, not interpretation, and no number of correlated models can supply them.

The decision stays deterministic even though the evidence-gathering is a model call: the harness owns the eligibility check and the downgrade-only gate; the consensus only supplies evidence.
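One way to picture that split is a harness function that owns eligibility, the one-shot rule, and the downgrade-only outcome, and treats the consensus as an opaque callable. Everything named here (ParkedTicket, resolve_park, the reason string, the requirement of at least one citation) is an assumption for illustration; only the two hedge phrases come from the text above.

```python
from dataclasses import dataclass
from typing import Callable

# Hedge phrases quoted in the text; a real list would likely be longer.
HEDGE_MARKERS = ("could be either", "unclear")
ELIGIBLE_REASON = "ambiguous-requirement"  # hypothetical reason code


@dataclass
class ParkedTicket:
    park_reason: str
    locally_reversible: bool
    file_scoped: bool
    consensus_used: bool = False


# The consensus returns (synthesis text, cited evidence). It is the only model call.
Consensus = Callable[[], tuple[str, list[str]]]


def resolve_park(ticket: ParkedTicket, run_consensus: Consensus) -> str:
    """Return "PROCEED" or "PARK"; no other outcome is reachable."""
    eligible = (ticket.park_reason == ELIGIBLE_REASON
                and ticket.locally_reversible
                and ticket.file_scoped)
    if not eligible or ticket.consensus_used:
        return "PARK"  # narrowly eligible, one-shot
    ticket.consensus_used = True
    synthesis, citations = run_consensus()
    hedged = any(marker in synthesis.lower() for marker in HEDGE_MARKERS)
    if hedged or not citations:
        return "PARK"  # evidence-gated
    return "PROCEED"  # downgrade-only: the sole way out of PARK
```

The consensus is never invoked for an ineligible ticket, and its output cannot produce anything other than the two strings the harness returns.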

Council review — advisory, routed from the diff

For finished work, ANS can convene a cross-model council. The review tier is computed from the actual diff, not the ticket text — a "rename a field" that touches auth still routes HEAVY. The council is advisory: it does not block the run. Instead it decides trust: a clean pass lets the work be auto-trusted; concerns, an error, or a skipped review on a high-risk change mark it "needs daylight review" so a human looks before it is relied upon. Deterministic gates remain the only hard gate on execution; the council governs trust, not flow.
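A small sketch of diff-based routing and the trust decision, under stated assumptions: the path hint "auth" is taken from the example above, the tier name LIGHT is invented for contrast with HEAVY, and the substring match is deliberately crude. The source does not say how a non-clean result on a low-risk change is labelled, so the sketch returns a placeholder for that case.

```python
# Hypothetical risk hints; only "auth" is taken from the text.
HIGH_RISK_HINTS = ("auth",)


def review_tier(changed_paths: list[str]) -> str:
    """Tier comes from the files the diff touches, never the ticket title."""
    risky = any(hint in path.lower()
                for path in changed_paths
                for hint in HIGH_RISK_HINTS)
    return "HEAVY" if risky else "LIGHT"  # "LIGHT" is an assumed name


def trust_label(tier: str, verdict: str) -> str:
    """verdict is one of "clean", "concerns", "error", "skipped" (assumed)."""
    if verdict == "clean":
        return "auto-trusted"
    if tier == "HEAVY":
        return "needs daylight review"
    # Not specified by the source; placeholder label.
    return "not auto-trusted"


# A ticket titled "rename a field" whose diff reaches into auth code.
tier = review_tier(["models/user.py", "src/auth/session.py"])
label = trust_label(tier, "skipped")
```

Neither function returns anything that could stop the run; they only produce a label for the report, which matches the council's advisory role.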

Heartbeat watchdog + process reaping

A long unattended run can freeze without an error — a stalled call, a wait that never returns. The watchdog watches a heartbeat and, when it goes stale past a threshold, restarts the frozen run rather than leaving it hung until morning.
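The staleness check itself is small. The sketch below assumes the heartbeat is a timestamp from a monotonic clock and that restarting is delegated to a callable; the 300-second default and the function names are illustrative, not ANS's actual values.

```python
import time
from typing import Callable, Optional


def heartbeat_is_stale(last_beat: float, now: float, threshold_s: float) -> bool:
    """True when the last heartbeat is older than the threshold."""
    return (now - last_beat) > threshold_s


def check_and_restart(last_beat: float, restart: Callable[[], None],
                      threshold_s: float = 300.0,
                      now: Optional[float] = None) -> bool:
    """Restart the run if its heartbeat has gone stale; report whether it did."""
    # time.monotonic() is unaffected by wall-clock adjustments.
    current = time.monotonic() if now is None else now
    if heartbeat_is_stale(last_beat, current, threshold_s):
        restart()
        return True
    return False
```

Passing now explicitly keeps the check easy to test; in a real loop the watchdog would call check_and_restart periodically with the most recent heartbeat value.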

Separately, a run can leak child processes (helper tools, servers it spawned). Reaping cleans up a run's OWN child tree by parent-chain lineage, so leaked processes don't pile up and starve the machine. The walk is rooted at the run's process, so it can only reach that run's own subtree. One residual case remains: a watchdog that is itself hard-killed cannot self-reap, so that risk is reduced, not eliminated.
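The lineage rule can be sketched as a walk over a pid-to-parent table. How that table is obtained is platform-specific and left out, as is the actual termination step; own_subtree and the sample table are hypothetical.

```python
def own_subtree(root_pid: int, parent_of: dict[int, int]) -> set[int]:
    """Return every descendant of root_pid, following parent links only."""
    # Invert the pid -> parent table into parent -> children.
    children: dict[int, list[int]] = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)

    found: set[int] = set()
    stack = [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            if child != root_pid and child not in found:
                found.add(child)
                stack.append(child)
    return found


# Sample table: run 10 spawned 11, which spawned 12; process 20 is unrelated.
table = {10: 1, 11: 10, 12: 11, 20: 1}
to_reap = own_subtree(10, table)
```

Because the walk starts at the run's own pid and only follows child links downward, an unrelated process such as 20 can never appear in the result.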

Revert-surviving scratchpad + do-not-repeat digest

When a gate fails, the code is reverted to green, but the reasoning that got there should not be thrown away with it. The scratchpad stores a ticket's redacted progress notes outside the reverted tree, so they survive the rollback. A compact "do-not-repeat" digest records the dead ends already tried this run. On resume, both are handed back to the agent so it continues from where its thinking left off rather than re-deriving ruled-out approaches. The feature is opt-in; with it off, the run behaves exactly as it did before the feature existed.
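A minimal sketch of the idea, assuming notes arrive already redacted, one JSON file per ticket, and ticket ids that are safe to use as file names. The Scratchpad class and its methods are invented for illustration; the only property taken from the text is that storage lives outside the tree that gets reverted and that the feature does nothing when disabled.

```python
import json
import tempfile
from pathlib import Path


class Scratchpad:
    """Per-ticket notes and dead ends, stored outside the reverted tree."""

    def __init__(self, root: Path, enabled: bool = False):
        self.root = root          # must NOT be inside the working tree
        self.enabled = enabled    # opt-in: off by default

    def _load(self, ticket_id: str) -> dict:
        path = self.root / f"{ticket_id}.json"
        if path.exists():
            return json.loads(path.read_text())
        return {"notes": [], "do_not_repeat": []}

    def _append(self, ticket_id: str, key: str, text: str) -> None:
        if not self.enabled:
            return  # with the feature off, nothing is written
        data = self._load(ticket_id)
        data[key].append(text)
        self.root.mkdir(parents=True, exist_ok=True)
        (self.root / f"{ticket_id}.json").write_text(json.dumps(data))

    def note(self, ticket_id: str, text: str) -> None:
        self._append(ticket_id, "notes", text)

    def dead_end(self, ticket_id: str, approach: str) -> None:
        self._append(ticket_id, "do_not_repeat", approach)

    def resume_context(self, ticket_id: str) -> dict:
        """What would be handed back to the agent on resume."""
        if not self.enabled:
            return {"notes": [], "do_not_repeat": []}
        return self._load(ticket_id)


# Usage: a temporary directory stands in for storage outside the work tree.
store = Path(tempfile.mkdtemp()) / "scratch"
pad = Scratchpad(store, enabled=True)
pad.note("T-1", "parser handles empty input; failing case is nested lists")
pad.dead_end("T-1", "regex-based tokenizer")
context = pad.resume_context("T-1")
```

A revert of the working tree does not touch the store directory, so a fresh Scratchpad pointed at the same root sees the same notes and digest.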