# AI Workflow Architecture Audit

## Summary

The AI workflow system has moved from ad hoc chunk prompts into a credible file-backed engineering workflow. It now has persistent chunk lifecycle folders, Developer and QA roles, a Definition of Done, QA gates, pass history, orchestration guidance, requirements intake/review/chunk-planning roles, and a Telegram bridge that can report workflow state and hand prompts to a configured Codex tmux session.

The system is ready for continued hardening, but not for substantially more autonomy yet. The main gap is that the workflow is documented as conventions and shell helpers, not as a single explicit state machine with enforceable transitions. Telegram reports and prompt generation are useful, but they derive state from markdown sections and can still diverge from the intended Orchestrator model when sections are stale, multiple active chunks exist, or runtime validation requires human permission.

The next work should focus on state consistency, prompt synthesis rules, requirements quality gates, and read-only analysis roles before adding product features or stronger automation.

## Current Strengths

- The active app architecture is clearly documented in `AGENTS.md`, including NestJS + Prisma + GraphQL code-first and Angular + Apollo + GraphQL Code Generator.
- Chunk lifecycle folders and helper scripts make chunk creation, activation, completion, and validation repeatable.
- `ai/standards/done.md`, `ai/standards/qa-gates.md`, and `ai/standards/iteration-policy.md` keep a passing validation run from being treated as the only completion signal.
- Developer and QA roles now separate implementation from approval.
- Pass history gives repeated Developer/QA cycles a chronological audit trail.
- Requirements Intake, Requirements Review, and Chunk Planner roles establish a pre-implementation workflow for rough or broad ideas.
- Telegram tooling has matured from terminal mirroring into a workflow notification, report, decision, prompt generation, and tmux handoff layer.
- Telegram commands are mostly allowlisted and state-derived, which limits the risk of arbitrary shell execution.
- Full validation runs through a single standard command, `ai/commands/validate.sh`.

## Current Weaknesses

- Workflow state is still inferred from markdown text sections rather than a normalized state file or strict metadata model.
- `## QA Review` and `## Pass History` can disagree; current guidance says how to interpret disagreement, but no helper enforces consistency.
- The Orchestrator role owns completion decisions, but there is no command that checks all Definition of Done conditions before allowing completion.
- Requirements quality gates exist in prose, but there is no checklist or helper equivalent to chunk QA gates.
- Prompt generation logic is embedded inside Telegram shell code instead of a reusable prompt synthesis layer.
- Several roles repeat related guidance about validation, cleanup, scope, and pass history. Duplication increases drift risk.
- Telegram and terminal workflows can diverge because Telegram state depends on local `.tmp`, tmux availability, active chunk count, and the runtime location of the bridge.
- The system lacks a dedicated read-only repo-analysis role for discovering current architecture and risk before solution design.
- The system lacks a solution-architect role for translating approved requirements into architecture decisions before chunk planning.
- Manual intervention gates are documented, but not represented as explicit workflow states that Telegram and Orchestrator helpers can report uniformly.
- Prompt handoff can submit to Codex via tmux, but prompt generation has no central policy for context size, source priority, stale-review handling, or redaction. Its only safeguard today is the fixed set of inputs it reads.

## Role Ownership Assessment

Current role ownership is mostly coherent:

- Requirements Intake owns turning rough ideas into user-centered requirements drafts.
- Requirements Review owns deciding whether requirements are ready for chunk planning.
- Chunk Planner owns converting approved requirements into ordered implementation chunks.
- Orchestrator owns planning, iteration, manual intervention, and completion decisions.
- Developer owns scoped implementation and current Execution Notes.
- QA owns review, validation, current QA Review, and QA pass history.

Ownership gaps:

- Repo discovery is currently performed by whichever role needs it. This is acceptable for small chunks, but risky for larger product features because Developer or Orchestrator may blend analysis with design or implementation.
- Architecture decisions are split between Requirements Review, Chunk Planner, and Orchestrator. A lightweight Solution Architect role would help when requirements imply data model, auth, integration, or cross-layer design decisions.
- Prompt generation is owned by Telegram implementation, not by a reusable role or standard. A Prompt Synthesizer role would let Telegram, Orchestrator, and manual workflows use the same prompt construction rules.

Roles should remain narrow. Additional roles should not replace Orchestrator, Developer, or QA. They should prepare better inputs for those roles.

## Requirements Workflow Assessment

The requirements workflow now supports rough idea intake, user-perspective-first clarification, functional and non-functional requirement refinement, review with PASS/BLOCKED outcomes, and chunk planning from approved requirements.

The current format is strong because it requires:

- Raw idea.
- User perspective.
- User workflows.
- Functional and non-functional requirements.
- Data/model implications.
- Permissions/auth implications.
- UI/UX implications.
- Out-of-scope boundaries.
- Assumptions and open questions.
- Acceptance criteria.
- Runtime smoke expectations.
- Risks.
- Requirements review.
- Chunk plan.
- Pass history.

Gaps:

- There is no requirements helper script for creating, activating, approving, or completing requirement files.
- There is no explicit requirements review checklist comparable to `ai/standards/qa-gates.md`.
- Requirements PASS is not mechanically tied to movement into `ai/requirements/approved`.
- Approved requirements are not yet linked to generated chunks in a structured way beyond prose guidance.
- Requirements pass history is compatible with chunk pass history, but no helper validates numbering, latest verdict, or stale review state.
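
One way to tie a requirements PASS mechanically to the move into `ai/requirements/approved` is a helper that refuses to move the file unless the latest recorded verdict is PASS. The sketch below is illustrative only. It is written in Python for readability, although the repository's helpers are shell. The `Verdict: PASS` / `Verdict: BLOCKED` line format and the names `latest_verdict` and `approve_requirement` are assumptions, not the current file format or an existing command.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

# Assumed verdict line format; the real requirement files may record verdicts differently.
VERDICT = re.compile(r"^Verdict:\s*(PASS|BLOCKED)\b", re.MULTILINE)


def latest_verdict(text: str) -> Optional[str]:
    """Return the last verdict that appears in the file, or None if there is none."""
    found = VERDICT.findall(text)
    return found[-1] if found else None


def approve_requirement(path: Path, approved_dir: Path) -> Path:
    """Move a requirement file into approved_dir only when its latest verdict is PASS."""
    verdict = latest_verdict(path.read_text(encoding="utf-8"))
    if verdict != "PASS":
        raise ValueError(f"{path.name}: latest verdict is {verdict or 'missing'}, not PASS")
    approved_dir.mkdir(parents=True, exist_ok=True)
    target = approved_dir / path.name
    if target.exists():
        # Never overwrite an already approved requirement.
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(path), str(target))
    return target
```

A caller would pass the requirement file and `Path("ai/requirements/approved")`. Because the helper raises instead of moving on any non-PASS state, a BLOCKED or unreviewed file stays where it is.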

## Chunk Workflow Assessment

The chunk workflow is the most mature part of the system. It has naming conventions, lifecycle folders, metadata, helper scripts, role files, Definition of Done, QA gates, pass history, and archive behavior.

Gaps:

- The active folder policy says exactly one chunk per active implementation thread, but the helpers do not enforce it: they still run when old active files linger or when multiple active chunks exist.
- `complete-chunk.sh` moves files safely, but it does not validate QA PASS, current notes, pass history consistency, or Definition of Done compliance before completion.
- `orchestrator-next.sh` gives broad recommendations, but does not parse QA verdict, pass history, or iteration count as deeply as Telegram does.
- There is no preflight helper for "is this chunk ready for QA?" or "is this chunk ready to complete?"
- Completed chunk immutability is documented but not enforced beyond convention.
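
A preflight for "is this chunk ready to complete?" could take already-gathered facts and return a list of blockers rather than mutating anything. The following is a minimal sketch under assumptions: the `ChunkFacts` fields and the `completion_blockers` function are hypothetical, and the real Definition of Done may contain more conditions than the ones shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkFacts:
    """Facts a preflight helper would gather before completion (hypothetical fields)."""
    active_chunk_count: int
    qa_verdict: Optional[str]   # "PASS", "BLOCKED", or None when no verdict was found
    review_is_stale: bool       # True when Developer changed things after the last QA pass
    validation_passed: bool     # result of the standard validation command
    working_tree_clean: bool    # no leftover untracked or unstaged files


def completion_blockers(facts: ChunkFacts) -> List[str]:
    """Return human-readable blockers; an empty list means ready to complete."""
    blockers = []
    if facts.active_chunk_count != 1:
        blockers.append(f"expected exactly 1 active chunk, found {facts.active_chunk_count}")
    if facts.qa_verdict != "PASS":
        blockers.append(f"QA verdict is {facts.qa_verdict or 'missing'}, not PASS")
    elif facts.review_is_stale:
        blockers.append("QA PASS is older than the latest Developer pass")
    if not facts.validation_passed:
        blockers.append("validation has not passed")
    if not facts.working_tree_clean:
        blockers.append("working tree has leftover changes")
    return blockers
```

Keeping the check read-only means `complete-chunk.sh` could call something like it first and refuse to move files whenever the list is non-empty.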

## Developer / QA Pass History Assessment

Pass history solves the earlier stale-review problem by separating current summaries from chronological audit history.

Current source-of-truth model:

- `## Execution Notes`: current Developer summary.
- `## QA Review`: current QA verdict summary.
- `## Pass History`: chronological Developer/QA record.

This is workable, but still fragile because it depends on markdown discipline. The main risks are:

- Developer may update Execution Notes but forget the matching Developer Pass entry.
- QA may append a QA pass but leave a stale current QA Review.
- A previous PASS can remain visible after new Developer changes unless it is clearly superseded.
- Telegram and Orchestrator helpers may derive different next actions if one parser reads QA Review and another reads latest pass history.

Recommended direction: add a read-only state-check helper that reports inconsistencies and a completion gate helper that refuses to call a chunk ready when current QA Review and latest pass state disagree.

## Orchestrator Assessment

The Orchestrator role correctly owns planning, iteration, manual intervention, and completion decisions. It also now routes larger or unclear work through requirements intake, requirements review, and chunk planning.

Gaps:

- Orchestrator instructions are still largely prose. There is no canonical state transition table for Draft -> Requirements Active -> Requirements Approved -> Chunk Draft -> Chunk Active -> Developer Pass -> QA Pass -> Complete -> Commit Ready.
- Manual intervention conditions are documented, but not attached to machine-readable state.
- The retry limit is documented, but helper scripts do not enforce it.
- The Orchestrator has no dedicated "completion gate" command to check DoD, QA PASS, validation, cleanup, pass history, and git status before archiving.
- The Orchestrator can generate focused Developer prompts manually, and Telegram can generate them, but no central Prompt Synthesizer policy governs both paths.
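
A canonical transition table could be as small as a mapping from each state to the states it may move to. The sketch below uses the state names listed above. The `MANUAL_INTERVENTION` state and the loop from QA Pass back to Developer Pass (modelling a BLOCKED verdict) are assumptions added for illustration and are not part of the documented workflow.

```python
from enum import Enum


class State(Enum):
    DRAFT = "Draft"
    REQUIREMENTS_ACTIVE = "Requirements Active"
    REQUIREMENTS_APPROVED = "Requirements Approved"
    CHUNK_DRAFT = "Chunk Draft"
    CHUNK_ACTIVE = "Chunk Active"
    DEVELOPER_PASS = "Developer Pass"
    QA_PASS = "QA Pass"
    COMPLETE = "Complete"
    COMMIT_READY = "Commit Ready"
    MANUAL_INTERVENTION = "Manual Intervention"  # assumed state, not in the current docs


# Allowed transitions; anything not listed here is rejected.
ALLOWED = {
    State.DRAFT: {State.REQUIREMENTS_ACTIVE},
    State.REQUIREMENTS_ACTIVE: {State.REQUIREMENTS_APPROVED},
    State.REQUIREMENTS_APPROVED: {State.CHUNK_DRAFT},
    State.CHUNK_DRAFT: {State.CHUNK_ACTIVE},
    State.CHUNK_ACTIVE: {State.DEVELOPER_PASS},
    State.DEVELOPER_PASS: {State.QA_PASS, State.MANUAL_INTERVENTION},
    # Assumption: a BLOCKED QA verdict sends the chunk back for another Developer pass.
    State.QA_PASS: {State.COMPLETE, State.DEVELOPER_PASS, State.MANUAL_INTERVENTION},
    State.MANUAL_INTERVENTION: {State.DEVELOPER_PASS, State.QA_PASS},
    State.COMPLETE: {State.COMMIT_READY},
    State.COMMIT_READY: set(),
}


def can_transition(current: State, target: State) -> bool:
    """Return True when the table allows moving from current to target."""
    return target in ALLOWED.get(current, set())


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not in the table."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

With a table like this, a helper can reject a jump such as Developer Pass -> Complete outright, which prose guidance alone cannot do.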

## Telegram Workflow Assessment

Telegram is useful as a mobile workflow layer. It provides:

- Status and diff reports.
- Active/backlog chunk listing.
- Workflow status, last report, and next action.
- Execution Notes and QA Review retrieval.
- QA and Developer prompt generation.
- Stored prompt inspection.
- Confirmation-based prompt handoff to tmux/Codex.
- Confirmation-based mutating commands.
- Tap-friendly commands for mobile.

Key risks:

- Telegram state can diverge from terminal state if `TELEGRAM_STATE_DIR`, `TELEGRAM_REPO_ROOT`, or the tmux target differs between the macOS host, the devcontainer, and Codex sessions.
- Prompt handoff depends on tmux availability and permissions; this is known to fail in some containers.
- Telegram report logic parses markdown with shell tools, so section headings and formatting must remain stable.
- Telegram does not own orchestration decisions, but its commands can appear authoritative. This could confuse users unless every message states whether the next step is a recommendation or an approved action.
- Confirmation tokens are local state. A bridge restart, a state-dir mismatch, or a copied old token can make a confirmation fail even though the user expects it to succeed.

Telegram should remain an intervention and prompt handoff layer until the underlying workflow state model is more explicit.
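
To make environment divergence visible, a read-only diagnostic could snapshot the bridge configuration in each environment and compare the results. This is a sketch under assumptions: `TELEGRAM_STATE_DIR` and `TELEGRAM_REPO_ROOT` are named in this audit, while `TELEGRAM_TMUX_TARGET` and the function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Mapping

# TELEGRAM_TMUX_TARGET is a hypothetical variable name used only for this sketch.
KEYS = ("TELEGRAM_STATE_DIR", "TELEGRAM_REPO_ROOT", "TELEGRAM_TMUX_TARGET")


def snapshot(env: Mapping[str, str]) -> Dict[str, str]:
    """Collect the settings that decide which state the bridge reads. Read-only."""
    report = {key: env.get(key, "<unset>") for key in KEYS}
    for key in ("TELEGRAM_STATE_DIR", "TELEGRAM_REPO_ROOT"):
        value = env.get(key)
        report[key + "_exists"] = str(bool(value) and Path(value).is_dir())
    report["tmux_on_path"] = str(shutil.which("tmux") is not None)
    return report


def differences(a: Mapping[str, str], b: Mapping[str, str]) -> List[str]:
    """Return the sorted keys whose values differ between two snapshots."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))
```

Running `snapshot(os.environ)` on the host and inside the devcontainer, then passing both results to `differences`, would show which setting explains a mismatch between the two.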

## Prompt Generation / Prompt Handoff Assessment

Generated QA and Developer prompts are built from a fixed set of repository state sources and never read arbitrary files. This is a strong safety boundary.

Current prompt inputs:

- Active chunk path.
- Definition of Done.
- QA gates.
- Execution Notes.
- QA Review.
- Latest Pass History entry.
- Git status.
- Diff stat.
- Current QA blockers for Developer prompts.
- Workflow next action.

Gaps:

- Prompt synthesis rules live in `ai/tools/telegram/lib.sh`, so non-Telegram workflows cannot reuse the same behavior.
- There is no prompt-size budget or source-priority policy.
- There is no stale-review policy beyond including current and latest sections.
- There is no redaction policy for future prompts that may include logs, environment names, database URLs, or operational traces.
- There is no role that owns prompt correctness independent of Telegram transport.

Recommended direction: add `ai/roles/prompt-synthesizer.md`, `ai/standards/prompt-synthesis.md`, and a read-only prompt-generation command that Telegram can call.
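
A prompt synthesis standard would need, at minimum, a source-priority order, a size budget, and a redaction pass. The sketch below shows one possible shape in Python. It is not the logic in `ai/tools/telegram/lib.sh`, and the function names, the character-based budget, and the single credential pattern are all assumptions.

```python
import re
from typing import List, Tuple

# Assumed redaction rule: hide user:password credentials embedded in URLs.
CREDENTIALS = re.compile(r"(\b[a-z][a-z0-9+.-]*://)[^\s/@:]+:[^\s/@]+@", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace URL credentials such as postgres://user:pass@host with a marker."""
    return CREDENTIALS.sub(r"\1[REDACTED]@", text)


def build_prompt(sources: List[Tuple[str, str]], budget: int) -> str:
    """Assemble sections in priority order without exceeding a character budget.

    sources is a list of (title, text) pairs, highest priority first. A section
    that does not fit is skipped whole and named in a trailing note, so the
    reader knows it was left out; lower-priority sections that still fit are kept.
    The budget covers section text only, not the trailing note.
    """
    parts, omitted, used = [], [], 0
    for title, text in sources:
        block = f"## {title}\n{redact(text.strip())}\n"
        if used + len(block) <= budget:
            parts.append(block)
            used += len(block)
        else:
            omitted.append(title)
    if omitted:
        parts.append("Omitted for size: " + ", ".join(omitted) + "\n")
    return "\n".join(parts)
```

Skipping a section whole rather than truncating it is a deliberate choice in this sketch: a half-included QA Review could read as a different verdict than the full one.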

## Safety And Manual Intervention Assessment

The workflow has good safety intent:

- Developer does not self-approve.
- QA gates require runtime smoke decisions.
- Orchestrator owns completion.
- Retry limits and stop conditions are documented.
- Telegram mutating commands require confirmation.
- Arbitrary Telegram shell execution is not allowed.
- `.env` and `.tmp` should not be staged.

Remaining safety gaps:

- No helper enforces Definition of Done before completion.
- No helper validates stale QA Review after Developer changes.
- No helper validates requirements approval before chunk planning.
- No helper blocks Orchestrator from continuing beyond max iteration count.
- Runtime smoke inability is documented as a manual intervention condition, but not represented as a first-class state.
- The current system trusts role prompts and human discipline more than state checks.

The system should add lightweight state validation before adding more autonomous orchestration.
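
As an illustration of treating stop conditions as checked state rather than prose, the sketch below decides the next step from the iteration count, the QA verdict, and whether runtime smoke testing is possible. The step names and the `next_step` function are hypothetical, and the retry limit is a parameter because this audit does not state its value.

```python
from enum import Enum
from typing import Optional


class NextStep(Enum):
    QA_REVIEW = "qa_review"
    DEVELOPER_FIX = "developer_fix"
    COMPLETION_GATE = "completion_gate"
    MANUAL_INTERVENTION = "manual_intervention"


def next_step(iteration: int, max_iterations: int,
              qa_verdict: Optional[str], smoke_possible: bool) -> NextStep:
    """Pick the next workflow step, stopping for a human when a stop condition holds."""
    if not smoke_possible:
        # Assumption: inability to run runtime smoke checks always needs a human decision.
        return NextStep.MANUAL_INTERVENTION
    if qa_verdict == "PASS":
        return NextStep.COMPLETION_GATE
    if iteration >= max_iterations:
        # Retry limit reached without a PASS: stop instead of looping again.
        return NextStep.MANUAL_INTERVENTION
    if qa_verdict == "BLOCKED":
        return NextStep.DEVELOPER_FIX
    return NextStep.QA_REVIEW
```

Because the function only returns a value, both the Orchestrator helpers and Telegram could report the same manual-intervention state without either of them acting on it.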

## Recommended Role Architecture

Keep the core roles:

- Requirements Intake
  - Owns rough idea normalization and user-centered requirements drafts.
  - Must not plan implementation chunks before requirements are reviewable.
  - Needs `ai/roles/requirements-intake.md`, `ai/tasks/requirements-intake-template.md`, and `ai/standards/requirements.md`.
- Requirements Review
  - Owns PASS/BLOCKED review for requirements.
  - Must not implement, and must not hide unresolved product decisions in assumptions.
  - Needs `ai/roles/requirements-review.md`, `ai/tasks/requirements-review-template.md`, and requirements quality gates.
- Chunk Planner
  - Owns converting approved requirements into ordered chunk drafts.
  - Must not implement code.
  - Needs `ai/roles/chunk-planner.md`, `ai/tasks/chunk-plan-template.md`, and chunk conventions.
- Orchestrator
  - Owns workflow routing, iteration, manual intervention, and completion decisions.
  - Must not implement by default and must not skip QA approval.
  - Needs `ai/roles/orchestrator.md`, `ai/standards/orchestration-workflow.md`, and state-check helpers.
- Developer
  - Owns scoped implementation and current Execution Notes.
  - Must not self-approve DONE or overwrite QA history.
  - Needs `ai/roles/developer.md`, conventions, and feature chunk templates.
- QA
  - Owns validation, risk review, QA Review, and QA pass history.
  - Must not implement fixes unless explicitly asked.
  - Needs `ai/roles/qa.md`, QA review templates, DoD, and QA gates.

Add only these minimal supporting roles:

- Repo Analysis
  - Owns read-only repository discovery for larger work.
  - Must not design the solution or edit files.
  - Needs a role file and a repo-analysis report template.
- Solution Architect
  - Owns architecture options and recommended approach for approved requirements when cross-layer design is non-trivial.
  - Must not implement chunks or approve QA.
  - Needs a role file and architecture decision template.
- Prompt Synthesizer
  - Owns safe prompt construction from approved state sources.
  - Must not execute prompts, mutate files, or approve results.
  - Needs a role file, prompt synthesis standard, and reusable prompt templates.

Do not add separate roles for tasks already covered by Developer or QA unless a future workflow proves the boundary is necessary.

## Recommended Next Chunks

1. `chunk-000028-workflow-state-checks`
   - Add read-only helpers that inspect active chunk state, QA verdict, latest pass, iteration count, stale review risk, and DoD readiness.
   - Validation: shell syntax checks plus sample state-report commands.

2. `chunk-000029-requirements-lifecycle-helpers`
   - Add safe Bash helpers for creating, activating, approving, and completing requirements files.
   - Include checks for PASS before moving to approved.

3. `chunk-000030-prompt-synthesis-standard`
   - Add Prompt Synthesizer role, prompt synthesis standard, and reusable prompt templates.
   - Keep Telegram behavior unchanged or only retarget it to the new reusable command if explicitly scoped.

4. `chunk-000031-repo-analysis-role`
   - Add a read-only Repo Analysis role and report template for larger feature preparation.
   - Define what it may inspect and what it must not decide.

5. `chunk-000032-solution-architect-role`
   - Add a Solution Architect role and architecture decision template for approved requirements that affect multiple layers.
   - Define handoff from Requirements Review to Chunk Planner.

6. `chunk-000033-orchestrator-completion-gate`
   - Add a safe helper that checks Definition of Done readiness before `complete-chunk.sh`.
   - It should report blockers rather than mutating state.

7. `chunk-000034-telegram-state-consistency-report`
   - Add Telegram-facing read-only diagnostics for repo root, state dir, tmux target, active chunk count, latest pass, and completion readiness.
   - Keep it informational and non-mutating.

8. `chunk-000035-requirements-quality-gates`
   - Add a requirements gates standard similar to QA gates.
   - Update Requirements Review templates to use explicit gates and PASS/BLOCKED criteria.

## Risks If Not Fixed

- Automation may proceed from stale QA PASS after new Developer changes.
- Orchestrator may complete chunks based on incomplete notes or inconsistent pass history.
- Telegram may show a different next action than terminal helpers.
- Larger product features may start implementation before requirements are complete.
- Prompt generation may become duplicated across Telegram, Orchestrator, and manual workflows.
- Differences between the macOS host, the devcontainer, and tmux sessions may cause confusing confirmation or prompt handoff failures.
- Manual intervention may happen too late because stop conditions are documented but not checked.

## Suggested Immediate Next Step

Create `chunk-000028-workflow-state-checks` to add read-only workflow state validation before adding more autonomy. This should give Orchestrator, Telegram, Developer, and QA one shared answer for active chunk state, stale review risk, iteration count, and completion readiness.