One planned. One executed entirely autonomously as the lead AI agent coordinator (managing 3-4 parallel agents). The human watched both and steered. Here's what happened.
The Problem Nobody Warns You About: Your Fast-Built MVP
You build an AI feature on a shoestring budget, leveraging free Azure services: in this case a Static Web App (which I later migrated to a standalone service; that's a future blog post). It works in dev. It works in staging. Then it hits production, and the platform underneath it starts fighting you.
Our AI chat feature — a financial intelligence analyst chatbot built on Claude — was deployed inside Azure Functions behind a Static Web App proxy. On paper, it worked. In practice:
- 45-second hard timeout from the SWA proxy killed any complex analysis
- HTTP 500 errors appeared randomly under load
- No reconnection — if your browser tab hiccupped, your 30-second AI response was gone
- Request-bound execution — the AI's "life" was the HTTP request. When the request died, so did the AI
The feature was unreliable enough that users had learned to retry. That's a product failure.
I decided to rip it out of Azure Functions entirely and move it to a standalone always-on service. Not a rewrite — a port. Same AI logic, new runtime.
The question was: could I plan this migration in a way that an autonomous AI agent could execute it end-to-end?
The Cast
This project involved three actors:
Me — the architect and orchestrator. I wrote the runbook, designed the execution plan, reviewed every decision, and verified the final result. A taste of the future role of the SDM (Software Development Manager).
Claude (Opus) — my planning partner and shadow observer. Claude reviewed the runbook across three iterations, helped me design the parallel execution strategy, drafted the system prompt for the executor, and then spent the entire execution day "snooping" on progress — reading the codebase in real-time and reporting back.
Codex — the autonomous executor and master agent coordinator. Codex received the runbook and a carefully crafted system prompt, then worked for ~9.5 hours straight, spawning multiple agents on demand, producing 21 PRs and ~19,000 lines of code, provisioning Azure infrastructure, and deploying to production. Codex effectively did the job of a senior engineer working with 3-4 engineers.
┌─────────────┐ reviews/plans ┌─────────────┐
│ │◄───────────────────────►│ │
│ Human │ system prompt │ Claude │
│ Architect │────────────────────────►│ (Opus) │
│ │ snoop reports │ Reviewer │
│ │◄────────────────────────│ Planner │
└──────┬──────┘ └─────┬───────┘
│ │
│ runbook + prompt │ reads codebase
│ │ (read-only)
▼ ▼
┌─────────────────────────────────────────────────────┐
│ │
│ Codex │
│ Autonomous Executor │
│ │
│ Thread A ─── Thread B ─── Thread C ─── Thread D │
│ (critical) (infra) (proxy) (diagnostics)│
│ │
│ 21 PRs │ ~19K lines │ 9.5 hours │
└─────────────────────────────────────────────────────┘
Act 1: The Runbook
Why a Runbook, Not a Prompt
Most people using AI coding agents write prompts. I wrote a runbook — a 2,100-line execution plan that reads more like an engineering specification than a chat message.
Why? Because autonomous agents fail in predictable ways:
- Scope drift — they start "improving" things you didn't ask for
- Missing context — they make reasonable-sounding decisions based on wrong assumptions
- No gates — they charge ahead past failure points without stopping
- Sequential thinking — they do everything in order even when work can be parallelized
The runbook addressed all four:
- Explicit scope fences: "Port, do not rewrite. This is a runtime extraction, not a feature redesign."
- Complete architecture context: every Azure resource name, every file path, every dependency version, every API contract
- Stop gates at every phase: "Stop and escalate if any of the following is true..."
- Parallel thread definitions: five named threads with explicit entry gates and exit artifacts
The runbook went through three review rounds with Claude before I was satisfied. Each round caught real issues:
Round 1 identified that the parallel thread definitions were missing concrete entry/exit gates — they said what work to do, but not what must be true before starting or what artifact proves completion.
Round 2 caught that the integration-branch merge strategy was underspecified. Without explicit merge ordering, parallel agents would create conflicting merges.
Round 3 hardened the execution contract: what the agent can do without permission, what requires stopping and reporting, and how to handle the managed-identity-vs-connection-string decision for Cosmos DB.
The Migration Architecture
The target was clean:
Before:
Browser → SWA Proxy → Azure Functions (aiChat.js monolith)
↳ 3,356 lines, one file
↳ 45-second timeout
↳ request-bound execution
After:
Browser → App Service (Express proxy)
├── /api/ai/* → Standalone AI Service (Fastify)
│ ↳ Durable runs (survive disconnects)
│ ↳ True SSE streaming
│ ↳ Reconnect by runId
│ ↳ No hard timeouts
└── /api/* → Azure Functions (unchanged)
The AI logic itself — prompts, tools, model selection, orchestration — would be extracted into a shared package (../ai-core) and reused by the new runtime. Port, don't rewrite.
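The proxy split in the "After" diagram boils down to a prefix-match routing decision. Here's a minimal, dependency-free sketch of that decision (the function and return labels are illustrative, not the actual server.js):

```javascript
// Illustrative routing decision made by the Express proxy layer.
// In real Express, mounting order matters: the more specific
// /api/ai prefix must be registered before the catch-all /api.
function routeTarget(path) {
  if (path.startsWith('/api/ai/')) return 'ai-service';    // standalone Fastify runtime
  if (path.startsWith('/api/')) return 'azure-functions';  // non-AI APIs, unchanged
  return 'static-assets';                                  // everything else
}
```

Because the AI routes are carved out first, the rest of the API surface keeps flowing to Azure Functions untouched, which is what made the migration incremental rather than a big-bang backend swap.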
The Architecture: Before and After
To appreciate what changed, you need to see what we were working with — and what we built to replace it.
Before: The Monolith
The entire AI feature lived in one Azure Functions HTTP handler. One file. 3,356 lines. Everything from authentication to prompt construction to tool execution to streaming — all coupled to the Azure Functions request lifecycle.
┌─────────────────────────────────────────────────────────────────────┐
│ BROWSER │
│ AIChatPanel.jsx ──POST /api/ai/chat──► │
│ │ │
│ ◄──── raw SSE text stream ────────────┘ │
│ (no reconnect, no runId, no resume) │
└─────────────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ SWA / App Service Proxy │
│ ┌──────────────────┐ │
│ │ 45-second hard │ │
│ │ timeout on ALL │ │
│ │ proxied requests│ │
│ └──────────────────┘ │
│ /api/* ──► Azure Functions │
└─────────────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Azure Functions (app-name-backend-api) │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ aiChat.js (3,356 lines) │ │
│ │ │ │
│ │ ┌──────────┐ ┌───────────────┐ ┌─────────────────────────┐ │ │
│ │ │ Auth │ │ System │ │ Anthropic SDK │ │ │
│ │ │ Bootstrap│ │ Prompt │ │ (buffered call) │ │ │
│ │ │ Validate │ │ Builder │ │ │ │ │
│ │ └────┬─────┘ └───────┬───────┘ │ ┌───────────────────┐ │ │ │
│ │ │ │ │ │ Tool Dispatch │ │ │ │
│ │ ▼ ▼ │ │ Loop │ │ │ │
│ │ ┌──────────┐ ┌───────────────┐ │ │ │ │ │ │
│ │ │ Feature │ │ Question │ │ │ Claude response │ │ │ │
│ │ │ Toggle │ │ Classifier │ │ │ ──► tool_use? │ │ │ │
│ │ │ Check │ │ Telemetry │ │ │ ──► execute │ │ │ │
│ │ └──────────┘ └───────────────┘ │ │ ──► feed back │ │ │ │
│ │ │ │ ──► loop │ │ │ │
│ │ ┌──────────┐ ┌───────────────┐ │ └───────────────────┘ │ │ │
│ │ │ Model │ │ Tool Profile │ │ │ │ │
│ │ │ Selection│ │ Resolution │ │ request dies = AI dies │ │ │
│ │ │ Policy │ │ Routing │ │ no run state persisted │ │ │
│ │ └──────────┘ └───────────────┘ └─────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Tool Executor (22 tools) │ │ │
│ │ │ query_products │ query_customers │ query_actuals │ │ │
│ │ │ run_financial_model │ run_services_model │ │ │
│ │ │ compare_fiscal_years │ get_margin_analysis │ │ │
│ │ │ get_customer_concentration │ generate_chart │ │ │
│ │ │ run_what_if_simulation │ ...12 more │ │ │
│ │ └─────────────────────────┬───────────────────────────────┘ │ │
│ └────────────────────────────┼───────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Cosmos DB │ │
│ │ (products, budgets,│ │
│ │ actuals, configs, │ │
│ │ users, groups) │ │
│ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Problems:
✗ Request-bound: browser disconnect kills the AI mid-thought
✗ No run persistence: nothing survives a dropped connection
✗ Buffered streaming: text arrives in chunks after provider finishes
✗ 45s proxy timeout: complex multi-tool analyses get killed
✗ No reconnect: lose your tab, lose your answer
✗ Monolith coupling: auth, prompt, tools, streaming all in one file
After: The Decomposed Architecture
The new architecture separates concerns across three layers: a proxy that routes traffic, a standalone AI runtime that manages durable runs, and an extracted core package that holds all the AI logic.
┌─────────────────────────────────────────────────────────────────────┐
│ BROWSER │
│ │
│ AIChatPanel.jsx │
│ │ │
│ ├── POST /api/ai/chat ──────────────► start run, get runId │
│ │ ◄── SSE stream (attach immediately) │
│ │ │
│ ├── GET /api/ai/runs/:runId/stream ─► reconnect to live run │
│ │ ◄── SSE replay (snapshot) + live deltas │
│ │ │
│ ├── GET /api/ai/runs/:runId ────────► poll status; final result │
│ │ │
│ ├── POST /api/ai/runs/:runId/cancel ► user-initiated cancel │
│ │ │
│ └── POST /api/ai/chat/feedback ─────► thumbs up/down on run │
│ │
│ Client persists runId in React state │
│ Disconnect ≠ cancel (run continues server-side) │
└─────────────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Web App Service (Express proxy — server.js) │
│ │
│ ┌─────────────────┐ ┌──────────────────────┐ │
│ │ /api/ai/* │ │ /api/* │ │
│ │ ──► AI Service │ │ ──► Azure Functions │ │
│ │ (dedicated) │ │ (unchanged) │ │
│ └────────┬────────┘ └──────────┬───────────┘ │
│ │ │ │
│ x-internal-proxy-secret x-internal-proxy-secret │
│ x-ms-client-principal x-ms-client-principal │
└────────────┬────────────────────────────┬───────────────────────────┘
│ │
▼ ▼
┌────────────────────────────┐ ┌─────────────────────────┐
│ AI App Service (Fastify) │ │ Azure Functions │
│ │ │ (non-AI APIs) │
│ │ │ /api/me │
│ Dedicated B1 plan │ │ /api/budgets │
│ Always-on process │ │ /api/products │
│ No hard timeouts │ │ /api/definitions │
│ │ │ etc. │
│ ┌──────────────────────┐ │ └─────────────────────────┘
│ │ Route Layer │ │
│ │ /api/ai/chat │ │
│ │ /api/ai/runs/:id │ │
│ │ /api/ai/runs/:id/ │ │
│ │ stream │ │
│ │ /api/ai/runs/:id/ │ │
│ │ cancel │ │
│ │ /api/ai/chat/ │ │
│ │ feedback │ │
│ │ /healthz /readyz │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Runtime Manager │ │
│ │ │ │
│ │ Admission Control: │ │
│ │ 2 active/user │ │
│ │ 2 queued/user │ │
│ │ 4 active global │ │
│ │ 8 queued global │ │
│ │ │ │
│ │ Run State Machine: │ │
│ │ queued ──► running │ │
│ │ │ │ │ │ │
│ │ │ ▼ ▼ │ │
│ │ │ completed │ │
│ │ │ failed │ │
│ │ ▼ cancelled │ │
│ │ cancelled_by_admin │ │
│ │ │ │
│ │ Watchdog: 10 min │ │
│ │ Shutdown: 15s grace │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ SSE Event Stream │ │
│ │ │ │
│ │ Contract (Phase 5): │ │
│ │ ready ──► status │ │
│ │ ──► text(delta) │ │
│ │ ──► tool_start │ │
│ │ ──► tool_end │ │
│ │ ──► text(delta) │ │
│ │ ──► chart │ │
│ │ ──► follow_ups │ │
│ │ ──► done │ │
│ │ │ │
│ │ Reconnect mode: │ │
│ │ text(snapshot) │ │
│ │ + live deltas │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Legacy AI Bridge │ │ ┌──────────────────────────┐
│ │ (adapter pattern) │──┼────►│ /ai-core │
│ │ │ │ │ (workspace package) │
│ │ Wraps existing AI │ │ │ │
│ │ chat logic with │ │ │ contracts/ │
│ │ new runtime hooks │ │ │ sseEvents.js │
│ └──────────────────────┘ │ │ interfaces.js │
│ │ │ uiContext.js │
│ │ │ chatFeedback.js │
│ │ │ │
│ │ │ prompts/ │
│ │ │ systemPrompt.js │
│ │ │ │
│ │ │ tools/ │
│ │ │ definitions.js (22) │
│ │ │ anthropic.js │
│ │ │ │
│ │ │ orchestration/ │
│ │ │ questionTelemetry.js │
│ │ │ modelSelection.js │
│ │ │ toolProfiles.js │
│ │ │ │
│ │ │ domain/ │
│ │ │ financialModel.js │
│ │ │ servicesModel.js │
│ │ └──────────────────────────┘
│ │
│ ┌──────────────────────┐ │
│ │ Data Access Layer │ │
│ │ (injected adapters) │ │
│ │ │ │
│ │ getGroupDoc() │ │
│ │ hasAiPermission() │ │
│ │ getUserProfile() │ │
│ │ canViewBu() │ │
│ │ probeReadiness() │ │
│ └──────────┬───────────┘ │
│ │ │
└─────────────┼──────────────┘
│
┌───────┴────────┐
│ │
▼ ▼
┌───────────┐ ┌──────────────────────────────────┐
│ Cosmos DB │ │ Azure Storage │
│ │ │ │
│ products │ │ │
│ customers │ │ Table: AiRuns (run metadata) │
│ actuals │ │ Queue: ai-run-dispatch │
│ configs │ │ Blob: ai-run-events │
│ users │ │ Blob: ai-run-snapshots │
│ groups │ │ Blob: ai-run-transcripts │
│ rbac │ │ │
└───────────┘ └──────────────────────────────────┘
Improvements:
✓ Durable runs: AI continues even if browser disconnects
✓ Reconnect by runId: pick up where you left off
✓ True SSE streaming: tokens arrive as they're generated
✓ No proxy timeout: dedicated service, no 45s wall
✓ Admission control: queuing, per-user limits, watchdog
✓ Graceful shutdown: in-flight runs drain before restart
✓ Separated concerns: proxy / runtime / core / storage
✓ Dependency injection: testable without Azure
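The run state machine in the Runtime Manager box can be captured in a few lines. A hedged sketch (the real Phase 3 runtime manager is ~916 lines; this models only the legal transitions, with the exact edge set being my reading of the diagram):

```javascript
// Legal state transitions for a run, per the Runtime Manager diagram.
// Admission control, the watchdog, and persistence are deliberately omitted.
const TRANSITIONS = {
  queued: ['running', 'cancelled', 'cancelled_by_admin'],
  running: ['completed', 'failed', 'cancelled', 'cancelled_by_admin'],
  completed: [], failed: [], cancelled: [], cancelled_by_admin: [], // terminal
};

function transition(run, next) {
  if (!(TRANSITIONS[run.state] || []).includes(next)) {
    throw new Error(`illegal transition: ${run.state} -> ${next}`);
  }
  return { ...run, state: next };
}
```

Making illegal transitions throw, rather than silently no-op, is what keeps a durable-run system debuggable: a run that "un-completes" is a bug you want to see immediately.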
The Run Lifecycle
This is the part that makes the new architecture fundamentally different. A "run" is now a first-class entity that exists independently of any HTTP connection.
Client AI Service Storage
│ │ │
│ POST /api/ai/chat │ │
│─────────────────────────►│ │
│ │ create run (queued) │
│ │─────────────────────────────►│
│ │ │
│ ◄─ SSE: ready{runId} ── │ │
│ ◄─ SSE: status{...} ── │ transition: running │
│ │─────────────────────────────►│
│ │ │
│ │ ┌─────────────────────┐ │
│ ◄─ SSE: text{delta} ─── │ │ Claude streaming │ │
│ ◄─ SSE: text{delta} ─── │ │ response arrives │ │
│ ◄─ SSE: text{delta} ─── │ │ token by token │ │
│ │ └─────────────────────┘ │
│ │ │
│ ◄─ SSE: tool_start ──── │ Claude requests tool_use │
│ │ execute tool (Cosmos query) │
│ ◄─ SSE: tool_end ────── │ feed result back to Claude │
│ │ │
│ ◄─ SSE: text{delta} ─── │ Claude continues response │
│ ◄─ SSE: text{delta} ─── │ │
│ │ │
│ ╔═══════════════╗ │ │
│ ║ DISCONNECT! ║ │ │
│ ║ tab closed ║ │ run keeps going... │
│ ╚═══════════════╝ │ events persisted to blob │
│ │─────────────────────────────►│
│ │ │
│ (reconnect) │ │
│ GET /runs/:id/stream │ │
│─────────────────────────►│ replay from storage │
│ │◄─────────────────────────────│
│ ◄─ SSE: text{snapshot}─ │ full text so far │
│ ◄─ SSE: text{delta} ─── │ live deltas resume │
│ │ │
│ ◄─ SSE: follow_ups ──── │ │
│ ◄─ SSE: done ────────── │ transition: completed │
│ │─────────────────────────────►│
│ │ archive transcript │
│ │─────────────────────────────►│
│ │ │
The key insight: the run outlives the connection. When you disconnect, the AI doesn't stop thinking. When you reconnect, you get a snapshot of everything that happened while you were away, followed by live deltas. This is what makes AI chat feel like a real product instead of a fragile demo.
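The snapshot-plus-deltas replay reduces to a tiny piece of client logic. A sketch, assuming events shaped like the Phase 5 contract (`text` events carrying a `mode` field); the reducer itself is an illustration, not the actual frontend code:

```javascript
// On reconnect, the server sends text(mode=snapshot) containing everything
// generated so far, then resumes text(mode=delta) for live tokens.
function reduceText(buffer, event) {
  if (event.type !== 'text') return buffer;          // tool_start, done, etc.
  if (event.mode === 'snapshot') return event.text;  // replace buffer on replay
  return buffer + event.text;                        // append live delta
}
```

The nice property: the client never needs to know whether it missed anything. Snapshot-then-deltas converges to the same buffer regardless of when the reconnect happened.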
Act 2: Designing the Parallel Execution Plan
This is where Claude earned its keep as a planning partner.
A naive sequential execution would have looked like: baseline → scaffold → extract → build runtime → add streaming → update proxy → provision infra → deploy → diagnostics → cutover. That's a 12+ hour critical path with no parallelism.
Instead, we designed five concurrent threads:
| Thread | Scope | Can Start When |
|---|---|---|
| A (Critical Path) | Baseline → Scaffold → Extract → Runtime → Streaming | Immediately |
| B (Infrastructure) | Azure provisioning, deploy workflow | Immediately (parallel with A) |
| C (Integration) | Proxy routing, frontend run-client | After A delivers runtime + SSE contract |
| D (Diagnostics) | Cost ledger, health monitoring | After B provisions resources |
| E (Cutover) | Dark deploy, brownout test, go-live | After A+B+C converge |
The key insight: infrastructure work (Thread B) has zero code dependencies on the AI logic (Thread A). You can provision Azure resources, set up managed identity, create storage primitives, and write deploy workflows while the extraction work is still happening. This saved hours.
Time ────────────────────────────────────────────────────────────────────────►
Thread A: [Phase 0][Phase 1][Phase 2a][Phase 2b][Phase 3][ Phase 5 ]
Thread B: [Phase 7-infra ][Phase 7-deploy]
Thread C: [P6-back][P6-front]
Thread D: [Phase 8 ]
Thread E: [Phase 9]
│
All threads
converge
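The entry gates from the thread table reduce to a small dependency check. A simplified sketch — it treats each gate as "thread X fully done," whereas the real gates were finer-grained (Thread C, for example, only needed A's SSE contract freeze, not all of A):

```javascript
// Thread dependency gates, simplified from the table above.
const GATES = {
  A: [],               // critical path, starts immediately
  B: [],               // infra, runs in parallel with A
  C: ['A'],            // needs A's runtime + SSE contract
  D: ['B'],            // needs B's provisioned resources
  E: ['A', 'B', 'C'],  // cutover only after convergence
};

function canStart(thread, completed) {
  return GATES[thread].every(dep => completed.includes(dep));
}
```

Writing the gates down this explicitly is what lets an orchestrating agent decide, at any moment, which threads are launchable instead of defaulting to sequential execution.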
The Integration Branch Model
With multiple threads writing code simultaneously, merge strategy matters. We used:
- Integration branch (`codex/ai-migration-main`) as the merge target
- Named feature branches per thread (`codex/ai-migration-a-phase3`, `codex/ai-migration-b-deploy`, etc.)
- Explicit merge order: Phase 1 first → Thread B infra → Thread A checkpoints in order → Thread C after SSE freeze → Thread E last
- Master updated only after verified integration checkpoints
This kept master stable throughout the entire migration.
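The explicit merge order can itself be mechanically checked before any merge lands. A sketch (the group labels are my shorthand for the ordering described above, not real branch names):

```javascript
// Merges into the integration branch must land strictly in this sequence.
const MERGE_ORDER = [
  'phase-1',
  'thread-b-infra',
  'thread-a-checkpoints',
  'thread-c-post-sse-freeze',
  'thread-e-cutover',
];

function mergeAllowed(group, alreadyMerged) {
  // A group may merge only once every earlier group has landed.
  const idx = MERGE_ORDER.indexOf(group);
  return MERGE_ORDER.slice(0, idx).every(g => alreadyMerged.includes(g));
}
```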
Act 3: The System Prompt
The runbook was the what. The system prompt was the how.
Here's the actual prompt I gave Codex (sensitive values redacted):
## Mission
Execute the AI service migration runbook at
docs/TODO_MigrateAIFeatureToStandaloneAIServince.md.
That document is the single source of truth for scope,
architecture, sequencing, gates, and success criteria.
## Execution Model - Parallel Threads
Spawn parallel agent threads with explicit sequencing:
- Thread A (critical path): Phase 0 → 1 → 2a → 2b → 3 → 5
- Thread B (infra/deploy): Phase 7 infra starts immediately.
Phase 7 deploy-workflow waits for Phase 1.
- Thread C: Phase 6 backend after Phase 3.
Phase 6 frontend after Phase 5 SSE contract freeze.
- Thread D: Phase 8 after Phase 7 + runtime telemetry stable.
- Thread E: Phase 9 after A+B+C converge.
Start Thread A and Thread B in parallel immediately.
## First Actions
1. Run required preflight commands (Section 6)
2. Begin Thread A Phase 0: baseline + characterization tests
3. Begin Thread B Phase 7 infra: Azure provisioning
4. Use integration branch codex/ai-migration-main
## Operating Rules
- Follow the runbook literally
- Update Migration Status table at every checkpoint
- Stop on stop gates
- Merge order matters
- One PR per bounded checkpoint
- No rewrites — this is a port
- Never commit or log secrets
## Do NOT Need Permission For
- Creating branches, PRs, running CLI commands
- Provisioning Azure resources, setting GitHub variables/secrets
- Making the connection-string fallback decision if managed
identity fails
## MUST Stop and Report If
- Any stop gate failure
- Any az or gh command fails due to auth/RBAC
- Any ambiguity where the runbook doesn't specify a clear path
- Any temptation to redesign rather than port
Design Principles Behind the Prompt
Three deliberate choices:
Point at the runbook, don't repeat it. The prompt says "that document is the single source of truth." Duplicating architecture context in the prompt creates drift — two sources that can disagree.
Explicit permission boundaries. The "do NOT need permission" section prevents the agent from stalling on confirmations for actions the runbook already authorizes. The "MUST stop" section catches the failure modes where human judgment is actually needed.
Parallel execution is called out explicitly. Without this, Codex defaults to sequential execution. You have to tell it to spawn concurrent work.
Act 4: The Miss — Worktree Isolation
Here's where I have to be honest about a gap.
The runbook specified parallel threads. The prompt told Codex to spawn them. But neither document specified how to physically isolate the parallel work. We described the logical dependency graph but not the physical isolation model.
When I checked on Codex's early progress, I found it was working sequentially in a single directory, mixing Phase 0 and Phase 7 changes in the same checkout. The parallel execution model existed on paper but not in practice.
This matters because git doesn't handle concurrent modifications to the same working directory gracefully. You need separate worktrees — independent checkouts of the repository where each thread can work without stepping on the others.
I had to inject a mid-session correction:
"Each parallel agent thread should work in its own git worktree with a named branch convention (e.g., codex/ai-migration-a-phase3). The orchestrating agent uses the primary checkout for coordination only."
Codex received this, stashed its in-progress work, split changes into proper branches, created worktrees, and adopted the model. It recovered well, but the gap shouldn't have existed.
Lesson learned: the DAG (dependency graph) is necessary but not sufficient. You need to specify the physical isolation model — worktrees, branch naming, merge-back protocol — not just the logical thread dependencies.
I've since added this to my standard checklist for any multi-agent execution plan.
Act 5: The Shadow — 12 Snoop Reports
While Codex worked, I periodically asked Claude to observe the codebase and report on progress. Claude had read-only access — it could inspect files, read git history, check PR statuses, and review code quality, but couldn't modify anything.
This created a powerful feedback loop: one AI executes, another AI reviews, and a human makes decisions based on both perspectives.
Here's the timeline of observations:
| Report | Time | Finding |
|---|---|---|
| #1-#5 | Early | Phase 0 + Phase 7 infra done. Worktree issue identified. |
| #6 | Mid | Phase 2b merged. 1,863 lines extracted into ai-core with parity tests. |
| #7 | Mid | Phase 3 worktree created but no commits yet (Codex working locally). |
| #8 | Mid | Phase 3 merged. 916-line runtime manager with full run state machine. |
| #9 | Late | Phase 5 + Phase 6 backend both merged. True streaming + proxy split. |
| #10 | Late | Phase 6 frontend merged. Durable run client with reconnect. |
| #11 | Final | All phases complete. 20 PRs merged. Integration promoted to master. |
| #12 | Post | Hotfix for missing AI_CHAT_BOOTSTRAP_SECRET on AI service. |
What the Shadow Caught
The snoop reports weren't just progress tracking. They caught real issues:
Lost artifact: The Phase 2a SDK migration audit was written to a gitignored directory. When the worktree was removed, the artifact disappeared. Flagged in report #6.
Session continuity gap: Codex sessions have quota limits. If a session expires mid-work, the next session needs to know where to pick up. We drafted a `HANDOVER.md` protocol — a structured file that captures current phase, what's done, what's blocked, branch state, and next action.

Code quality validation: Report #8 confirmed the Phase 3 runtime manager had proper admission control, watchdog timers, graceful shutdown, and queue drain — not just placeholder stubs.

Contract compliance: Report #10 verified the frontend was consuming the Phase 5 SSE contract correctly — `text(mode=delta)` for live tokens, `text(mode=snapshot)` for reconnect reconstruction.
Act 6: The Numbers
Timeline (March 31, 2026)
| Time | Event |
|---|---|
| 08:52 | First runbook commit |
| 09:31 | Codex starts — PR #4 (Phase 7 infra) |
| 09:41 | PR #5 (Phase 0 baseline) |
| 10:16 | PR #6 (Phase 1 workspace scaffold) |
| 12:41 | PR #12 (Phase 2a contracts) |
| 13:27 | PR #13 (Phase 2b extraction — 1,863 lines) |
| 14:29 | PR #14 (Phase 3 runtime — 916-line manager) |
| 15:08 | PR #15 (Phase 6 backend proxy split) |
| 15:28 | PR #16 (Phase 5 provider streaming + SSE freeze) |
| 15:59 | PR #17 (Phase 6 frontend run-client) |
| 17:30 | PR #18 (Phase 8 diagnostics) |
| 19:07 | PR #19 (Phase 9 cutover) |
| 19:10 | PR #20 (Master promotion) |
| Total | ~9.5 hours, 21 PRs, ~19,000 net lines |
Code Produced
| Component | Lines | Purpose |
|---|---|---|
| `ai-service/` | ~4,500 | Fastify runtime, routes, run manager, storage |
| `packages/ai-core/` | ~3,800 | Extracted AI logic, contracts, domain models |
| Infrastructure scripts | ~1,700 | Provisioning, verification, packaging |
| Deploy workflow | ~250 | CI/CD pipeline |
| Frontend changes | ~1,100 | Run-aware chat panel, API client |
| Tests | ~1,500 | Characterization, parity, integration, contract |
| Remaining | ~6,000+ | Internalized API libs, domain model moves, etc. |
What Got Deployed
- Standalone Fastify AI service on `app-name`
- Dedicated App Service plan, storage account, managed identity
- Express proxy splitting `/api/ai` to the AI service, `/api` to Functions
- GitHub Actions deploy workflow with pre/post-deploy health probes
- Run-aware frontend with reconnect, cancel, and status polling
Act 7: The Hotfix
Twenty PRs merged. Three deploy workflows green. Dark deploy verified. Brownout rollback proven.
Then I opened the chat panel and got "Access Denied."
The AI_CHAT_BOOTSTRAP_SECRET — a shared secret used to verify frontend bootstrap tokens — was configured on the Azure Functions app but had never been provisioned to the new AI service. The dark deploy health probes (which test /healthz and /readyz) didn't catch it because they don't send authenticated chat requests.
Codex fixed it in one commit: added the secret to the provisioning script, the verification checklist, and the readiness probe. PR #21 merged and deployed.
Lesson: Dark deploys verify infrastructure health, not end-to-end user flows. You need both.
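That lesson translates into two different kinds of checks. A hedged sketch — endpoint paths come from this article, but the request shape and helper names are assumptions, and `fetchFn` is injected so the logic is testable without a network:

```javascript
// A health probe proves the process is up. It never exercises auth,
// which is exactly why it missed the unprovisioned secret.
async function healthProbe(base, fetchFn = fetch) {
  const res = await fetchFn(`${base}/healthz`);
  return res.ok;
}

// An end-to-end smoke test exercises the real user path. This is the
// kind of check that would have caught the missing
// AI_CHAT_BOOTSTRAP_SECRET before a human opened the chat panel.
async function smokeTestChat(base, token, fetchFn = fetch) {
  const res = await fetchFn(`${base}/api/ai/chat`, {
    method: 'POST',
    headers: { authorization: `Bearer ${token}` },
    body: JSON.stringify({ message: 'ping' }),
  });
  return res.status !== 401 && res.status !== 403; // auth actually worked
}
```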
What I Learned About Multi-Model Development
The Reviewer/Executor Pattern
The most powerful pattern I discovered is using two AI models in complementary roles:
Model A (Claude Opus): Planner, reviewer, and observer. It reviews the plan before execution, monitors progress during execution, and validates quality after execution. It never touches the code directly.
Model B (Codex): Autonomous executor. It receives a plan and system prompt, then works independently — creating branches, writing code, provisioning infrastructure, creating PRs, and deploying.
This separation of concerns mirrors how engineering teams work: architects review, developers execute, and neither role is diminished by the other.
How to Write Plans for Autonomous Agents
Be exhaustive about context. Every file path, resource name, dependency version, and API contract. Agents can't ask clarifying questions mid-execution the way humans can.
Define stop gates, not just tasks. "Do X" is a task. "Do X, but stop if Y is true" is a gate. Gates prevent agents from charging past failure points.
Specify physical isolation, not just logical dependencies. Parallel threads need worktrees, branch naming conventions, and merge protocols — not just a DAG.
Include permission boundaries. "You can do X without asking" prevents stalling. "You must stop for Y" prevents runaway execution.
Plan for session turnover. Long-running agents hit quota limits. Build in handover protocols — structured files that capture state for the next session.
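A handover file can be as simple as a serialized state object. An illustrative shape — the field names here are my assumptions, not the actual HANDOVER.md protocol from the project:

```javascript
// Minimal session-handover payload: enough for a fresh agent session
// to resume without re-deriving state from git archaeology.
function renderHandover(h) {
  return [
    '# HANDOVER',
    `Current phase: ${h.phase}`,
    `Branch: ${h.branch}`,
    `Done: ${h.done.join('; ')}`,
    `Blocked: ${h.blocked.length ? h.blocked.join('; ') : 'nothing'}`,
    `Next action: ${h.next}`,
  ].join('\n');
}
```

The point isn't the format; it's that the file is written by the outgoing session and read first by the incoming one, making turnover an explicit protocol instead of an accident.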
How to Monitor Autonomous Agents
The "snoop" pattern — having a second model periodically observe the codebase — turned out to be valuable for three reasons:
Progress tracking without interruption. The observer reads git history and file state without disturbing the executor.
Quality validation in real-time. The observer can assess whether extracted code maintains behavioral parity, whether contracts are properly frozen, and whether test coverage is meaningful.
Issue detection before merge. The observer caught the lost SDK audit artifact and the session continuity gap before they became blocking problems.
The Human's Role Changes, But Doesn't Disappear
I didn't write 19,000 lines of code. But I:
- Designed the migration architecture
- Wrote a 2,100-line runbook that eliminated ambiguity
- Reviewed the plan across three iterations with an AI partner
- Identified the worktree isolation gap and injected a correction mid-flight
- Monitored execution through 12 observation cycles
- Made the "big bang, fix forward" cutover decision
- Performed the first manual end-to-end test that found the bootstrap secret issue
The human role shifts from writing code to writing specifications precise enough that code writes itself. That's a different skill, but it's still a skill — and it's the skill that determined whether this project took 9 hours or 9 days.
Try This Yourself
If you want to experiment with multi-model development:
Start with a runbook, not a prompt. Write down everything the agent needs to know before it starts. If you find yourself thinking "it'll figure that out," write it down instead.
Use one model to review the plan before another executes it. The reviewer will catch ambiguities, missing gates, and underspecified contracts.
Set up a shadow observer. Ask a model to periodically read your codebase and report on the executor's progress. You'll catch issues faster than waiting for CI failures.
Plan for failure. Include stop gates, rollback strategies, and session handover protocols. Autonomous agents don't get tired, but they do get stuck — and they need structured ways to communicate that.
Test the real user path. Health probes are necessary but not sufficient. Someone needs to open the browser and click the button.
The future of software development isn't AI replacing developers. It's developers who can orchestrate AI systems effectively building things that would have been impractical before.
The runbook and system prompt used in this project are shared below for anyone who wants to adapt this approach.
Appendix: The Full Runbook
Below is the complete 2,100-line migration runbook I wrote to drive this project — the single document that governed every phase, gate, and decision. All Azure resource names and company identifiers have been redacted. The structure, sequencing, execution model, and stop gates are authentic.
This is the actual artifact that Codex followed autonomously for 9.5 hours. If you want to adapt this approach for your own projects, this is your template.
The original is approximately 2,100 lines covering 22 sections.
Mo Khan is just an old-timer engineer-turned-manager who left coding a long time ago and is now having so much fun learning again and building with AI tools, with a special interest in AI-augmented development workflows, cloud architecture, and autonomous agent orchestration.
