# Research-Gated Mechanisms

Use this standard when a chunk changes a common, complex mechanism for which a
small, explicit design checkpoint prevents symptom-by-symptom patching.

## Applies To

- connection/reconnect
- state machines
- service recovery
- queues
- scheduling/parallelism
- retries/backoff/circuit breakers
- validation/test runners
- watchdogs
- journal consumers
- auth/session/device registration
- UI live-state coordination
- notification/toast root-cause state

## Required Checkpoint

Before implementation, record a short `## Research/Design Checkpoint` with:

- existing repository architecture
- common pattern or relevant resource
- chosen simple state model or control model
- why not a simpler implementation
- test oracle
- failure modes
- explicit non-goals
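
The list above can be checked mechanically before implementation starts. The following is a minimal sketch, not the Runtime's actual validator: the function name `missingCheckpointItems`, the keyword per item, and the case-insensitive substring match are all assumptions made for illustration.

```typescript
// Hypothetical helper. Item names mirror the list above; the keyword used to
// detect each item is an assumption, not a rule defined by this standard.
interface RequiredItem {
  name: string;
  keyword: string;
}

const REQUIRED_ITEMS: readonly RequiredItem[] = [
  { name: "existing repository architecture", keyword: "architecture" },
  { name: "common pattern or relevant resource", keyword: "pattern" },
  { name: "state model or control model", keyword: "model" },
  { name: "why not a simpler implementation", keyword: "simpler" },
  { name: "test oracle", keyword: "oracle" },
  { name: "failure modes", keyword: "failure mode" },
  { name: "explicit non-goals", keyword: "non-goal" },
];

const CHECKPOINT_HEADING = "## Research/Design Checkpoint";

// Returns the names of required items the checkpoint section never mentions.
function missingCheckpointItems(markdown: string): string[] {
  const start = markdown.indexOf(CHECKPOINT_HEADING);
  if (start === -1) {
    // No checkpoint section at all: every item is missing.
    return REQUIRED_ITEMS.map((item) => item.name);
  }
  // The section runs until the next level-2 heading or the end of the text.
  const rest = markdown.slice(start + CHECKPOINT_HEADING.length);
  const next = rest.search(/^## /m);
  const section = (next === -1 ? rest : rest.slice(0, next)).toLowerCase();
  return REQUIRED_ITEMS
    .filter((item) => section.indexOf(item.keyword) === -1)
    .map((item) => item.name);
}
```

A keyword match only shows that an item was mentioned, not that it was answered well, so a check like this complements review rather than replacing it.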

Use primary or official sources when current external behavior matters. For
internal Runtime mechanisms, repository/runtime state is the primary source.
For well-known mechanism classes, research must cover the mechanism pattern, not
only the local library API. For example, connection/recovery work should review
state-machine and reconnect/recovery patterns, root-cause precedence,
independent health domains, canonical recovery probes, timeout/fallback rules,
and how to avoid mixing unrelated signals such as socket transport health,
backend API health, frontend server reachability, and Runtime/service status.
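
As an illustration of keeping health domains independent, the sketch below tracks the four signals named above separately and derives a single root cause only at read time. All type and function names are hypothetical, and the precedence order is an assumption; the checkpoint for a real chunk must state its own.

```typescript
// Hypothetical names throughout; none of these come from the repository.
type Domain =
  | "socketTransport"
  | "backendApi"
  | "frontendServer"
  | "runtimeService";

type DomainStatus = "healthy" | "down" | "unknown";

type HealthState = Record<Domain, DomainStatus>;

// Assumed precedence, highest first: a failure higher in this list is
// reported as the root cause even when lower domains are also down.
const ROOT_CAUSE_PRECEDENCE: readonly Domain[] = [
  "runtimeService",
  "backendApi",
  "frontendServer",
  "socketTransport",
];

function initialHealth(): HealthState {
  return {
    socketTransport: "unknown",
    backendApi: "unknown",
    frontendServer: "unknown",
    runtimeService: "unknown",
  };
}

// Updates exactly one domain; every other domain keeps its status.
function withStatus(
  state: HealthState,
  domain: Domain,
  status: DomainStatus,
): HealthState {
  const next: HealthState = { ...state };
  next[domain] = status;
  return next;
}

// Returns the highest-precedence domain that is down, or null if none is.
function rootCause(state: HealthState): Domain | null {
  for (const domain of ROOT_CAUSE_PRECEDENCE) {
    if (state[domain] === "down") {
      return domain;
    }
  }
  return null;
}
```

Because each domain is stored separately, a backend API recovery cannot silently clear a socket transport failure; the socket stays down until its own signal says otherwise.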

## Fast-Learn / Fast-Fail

Do not shrink the scope of a gap fix merely because the fix is larger than the
first patch when later chunks depend on the affected mechanism working correctly.

When a critical-path gap is found:

- research or inspect enough to know whether a clear feasible success path
  exists
- close the gap now when it affects a bounded mechanism, state model, service,
  interface, or subsystem and does not require broad architecture replacement
- stop only when research/inspection cannot determine a clear path forward,
  no plausible solution exists, success criteria remain ambiguous, authority
  policy conflicts, unsafe global settings are required, product semantics are
  unclear, or a needed dependency cannot be verified locally
- if stopped, record the exact stop condition and the smallest next decision

Localized redesign is allowed in autonomous mode. Changing a state mechanism,
redesigning one subsystem, or changing interfaces with bounded impact is not by
itself a stop condition.
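
The decision rule in this section can be sketched as a small function. This is an illustrative reading of the bullets above, not Runtime code: the type names, field names, and condition identifiers are hypothetical. The text does not say what to do with an unbounded gap that has no stop condition, so the sketch reports that case as undetermined instead of guessing.

```typescript
// Hypothetical identifiers, one per stop condition listed above.
type StopCondition =
  | "no_clear_path"
  | "no_plausible_solution"
  | "ambiguous_success_criteria"
  | "authority_policy_conflict"
  | "unsafe_global_settings"
  | "unclear_product_semantics"
  | "unverifiable_dependency";

interface GapAssessment {
  // Affects a bounded mechanism, state model, service, interface, or subsystem.
  bounded: boolean;
  needsBroadArchitectureReplacement: boolean;
  // Stop conditions found during research or inspection (may be empty).
  stopConditions: StopCondition[];
  // Recorded only when stopping.
  smallestNextDecision: string;
}

type GapDecision =
  | { action: "close_now" }
  | { action: "stop"; conditions: StopCondition[]; nextDecision: string }
  | { action: "undetermined" }; // not covered by the rules above

function decideGapAction(gap: GapAssessment): GapDecision {
  // Any explicit stop condition wins, and is recorded with the next decision.
  if (gap.stopConditions.length > 0) {
    return {
      action: "stop",
      conditions: gap.stopConditions,
      nextDecision: gap.smallestNextDecision,
    };
  }
  // Bounded, localized redesign is closed now rather than deferred.
  if (gap.bounded && !gap.needsBroadArchitectureReplacement) {
    return { action: "close_now" };
  }
  return { action: "undetermined" };
}
```

Note that the size of the fix is deliberately not an input: a large but bounded change still resolves to `close_now`.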

## State And Recovery Mechanisms

When a chunk touches live-state, reconnect, restart, recovery, or
toast/root-cause behavior, the checkpoint must explicitly answer:

- what independent state dimensions exist
- which signal is authoritative for each dimension
- which events can only invalidate or hint
- which probe verifies recovery for each dimension
- which root cause wins when multiple symptoms appear
- which states are allowed to clear each other
- which stale snapshots or delayed events must be ignored

This prevents treating symptoms from different layers as one interchangeable
failure state.
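
For a single state dimension, the questions above about authoritative signals, hints, and stale events might be answered with a reducer like the following. It is a minimal sketch under stated assumptions: probe results carry a monotonically increasing sequence number, probes are the only authoritative signal, and all names are hypothetical.

```typescript
// Hypothetical model of one state dimension.
type DimensionStatus = "healthy" | "suspect" | "down";

interface DimensionState {
  status: DimensionStatus;
  // Sequence number of the newest probe result already applied.
  lastProbeSeq: number;
}

type Signal =
  // Authoritative: result of the canonical recovery probe for this dimension.
  | { kind: "probe"; seq: number; ok: boolean }
  // Non-authoritative: an event that can only invalidate or hint.
  | { kind: "hint" };

function applySignal(state: DimensionState, signal: Signal): DimensionState {
  if (signal.kind === "hint") {
    // A hint may cast doubt on a healthy state, but it never confirms a
    // failure and never clears one.
    if (state.status === "healthy") {
      return { status: "suspect", lastProbeSeq: state.lastProbeSeq };
    }
    return state;
  }
  // Delayed or replayed probe results are ignored.
  if (signal.seq <= state.lastProbeSeq) {
    return state;
  }
  return { status: signal.ok ? "healthy" : "down", lastProbeSeq: signal.seq };
}
```

Only a newer probe result can move the dimension out of `down`, which is one way to encode "which probe verifies recovery" and "which stale snapshots must be ignored" in the state model itself.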

## QA Gate

QA must query:

```sh
node ai/runtime/dist/cli.js qa required-evidence --chunk <chunk> --json
node ai/runtime/dist/cli.js qa evidence-check --chunk <chunk> --json
```

If the resolver requires `research_checkpoint`, QA must block when the
checkpoint is absent unless policy explicitly allows an operator override with
a risk statement.
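
The blocking rule can be expressed as a pure function over the evidence lists. The JSON shape returned by the CLI commands is not documented here, so the `GateInput` fields below are hypothetical stand-ins, and the function is a sketch of the rule, not the resolver's implementation.

```typescript
// Hypothetical input shape; the real CLI output format may differ.
interface GateInput {
  // Evidence kinds the resolver requires for the chunk.
  requiredEvidence: string[];
  // Evidence kinds actually recorded for the chunk.
  presentEvidence: string[];
  // Operator override, if one was supplied.
  override?: { allowedByPolicy: boolean; riskStatement: string };
}

type GateResult = "pass" | "block";

const RESEARCH_CHECKPOINT = "research_checkpoint";

function researchCheckpointGate(input: GateInput): GateResult {
  // The gate only applies when the resolver requires the checkpoint.
  if (input.requiredEvidence.indexOf(RESEARCH_CHECKPOINT) === -1) {
    return "pass";
  }
  if (input.presentEvidence.indexOf(RESEARCH_CHECKPOINT) !== -1) {
    return "pass";
  }
  // An absent checkpoint passes only with a policy-allowed override that
  // carries a non-empty risk statement.
  const override = input.override;
  if (
    override !== undefined &&
    override.allowedByPolicy &&
    override.riskStatement.trim().length > 0
  ) {
    return "pass";
  }
  return "block";
}
```

An override without a risk statement, or one that policy does not allow, still results in `block`.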
