One runbook. One AI agent. Zero portal clicks. A full SWA-to-App-Service migration from the US to South Africa.
The Problem: Your Frontend Is on the Wrong Continent
Our internal financial business intelligence tool — a React SPA backed by Azure Functions and Cosmos DB — had a geography problem. When I rapidly developed the MVP, I leaned on free Azure services: they were enough to prove the concept, and since the tool would serve only a small group of internal users, I figured I could get away with them. But as the MVP evolved into a real release, latency, cross-region calls, data sovereignty, and the inherent limitations of free tiers all came due. A migration was no longer optional.
The frontend was hosted on Azure Static Web Apps in the US — Azure doesn't offer SWA in South Africa, and my original MVP was built as a static web app with local storage. The database and all backend services lived in South Africa North. Every API call crossed the Atlantic and back.
- Cross-region latency on every Cosmos DB query — requests from South African users detoured through the US before reaching data hosted back in South Africa
- Data sovereignty concerns — even static HTML was served from US infrastructure
- Architectural complexity — a free-tier SWA in the US proxying to paid Functions in South Africa made cost attribution and debugging harder than it needed to be
- Auth coupling — SWA's built-in auth model injected identity in a platform-specific format that wouldn't survive a hosting change
The decision was made: move everything to South Africa. Same region as the data. Same region as the users.
But this wasn't just a redeploy. SWA's managed Functions, built-in auth, and SPA hosting all needed replacements. The target was a Linux App Service running Express, a standalone Azure Functions app, EasyAuth with a dedicated Entra app registration, and a completely new CI/CD pipeline — all while keeping the existing SWA running as a live fallback. Frugality was the driving constraint throughout: the lowest-cost option that met the requirements won every time.
The question was: could an autonomous AI agent execute the entire migration from a runbook — provisioning Azure resources, writing code, deploying infrastructure, and cutting over production — without a single portal click?
The Cast
This project used the same three-actor model I described in my previous post about the AI service migration:
Me — architect and orchestrator. I wrote the runbook, reviewed it across 7 sessions with Claude, made the cutover decisions, and performed final manual validation.
Claude (Opus) — planning partner. Claude reviewed the runbook across 7 dedicated sessions between March 6 and 26, catching missing auth flows, underspecified identity migration paths, and gaps in the rollback strategy.
Codex — autonomous executor. Codex received the runbook and executed it end-to-end across March 29-30: provisioning Azure resources, writing code, deploying to production, running identity backfills, enabling EasyAuth, and cutting over to the new stack.
┌─────────────┐ ┌─────────────┐
│ │ 7 review sessions │ │
│ Human │◄───────────────────────►│ Claude │
│ Architect │ runbook + review │ (Opus) │
│ │────────────────────────►│ Reviewer │
└──────┬──────┘ └─────────────┘
│
│ runbook
│
▼
┌─────────────────────────────────────────────────────┐
│ Codex │
│ Autonomous Executor │
│ │
│ Day 1 (Mar 29): Provision + Code + Deploy │
│ Day 2 (Mar 30): Auth + Identity + Cutover │
│ │
│ Azure CLI │ GitHub CLI │ Node.js │ PowerShell │
│ 14 files created │ 18 files modified │
│ 537 tests passing │ 12 user identities migrated │
└─────────────────────────────────────────────────────┘
The Architecture: Before and After
Before: Cross-Region SWA
The existing architecture had the frontend and its managed Functions in the US, making cross-Atlantic calls to Cosmos DB in South Africa on every API request.
┌─────────────────────────────────────────────────────────────────┐
│ BROWSER (South Africa) │
│ React SPA ──── fetch('/api/*') ────► │
└─────────────────────┬───────────────────────────────────────────┘
│
🔻 Atlantic crossing (~180ms RTT)
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Azure Static Web Apps (US Region) │
│ │
│ ┌─────────────────┐ ┌────────────────────────────┐ │
│ │ SWA Built-in │ │ SWA-Managed Functions │ │
│ │ Auth (EasyAuth)│ │ (co-located in US) │ │
│ │ SWA headers │ │ │ │
│ │ Platform- │ │ /api/me │ │
│ │ specific format│ │ /api/data │ │
│ └─────────────────┘ │ /api/ai/chat │ │
│ │ /api/etl/upload │ │
│ Serves React SPA │ ... 40+ API endpoints │ │
│ (static files US) └──────┬─────────────────────┘ │
└──────────────────────────────│──────────────────────────────────┘
│
🔻 Another Atlantic crossing
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ South Africa North │
│ │
│ ┌────────────────┐ ┌────────────┐ ┌───────────────┐ │
│ │ Cosmos DB │ │ ETL Extract│ │ Blob Storage │ │
│ │ (all data) │ │ (Python) │ │ (SAP exports)│ │
│ └────────────────┘ │ ETL Sync │ └───────────────┘ │
│ │ (Node.js) │ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Problems:
✗ Every API call makes two Atlantic round-trips (browser → US → SA → US → browser)
✗ Static files served from US for South African users
✗ Auth format is SWA-specific (platform lock-in)
✗ SWA-managed Functions can't be independently scaled or monitored
✗ Cost attribution across regions is opaque
After: Single-Region App Service
Everything co-located in South Africa North. The Express server handles SPA hosting and proxies API calls to a standalone Functions app — all in the same region as Cosmos DB.
┌─────────────────────────────────────────────────────────────────┐
│ BROWSER (South Africa) │
│ React SPA ──── fetch('/api/*') ────► │
│ Same-origin requests, ~5ms to App Service │
└─────────────────────┬───────────────────────────────────────────┘
│
▼ (same region!)
┌─────────────────────────────────────────────────────────────────┐
│ All South Africa North │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ App Service B1 Linux (Express server) │ │
│ │ │ │
│ │ ┌───────────────┐ ┌───────────────────────────────────┐│ │
│ │ │ EasyAuth │ │ Express Web Host ││ │
│ │ │ (Entra ID) │ │ ││ │
│ │ │ Dedicated app│ │ /healthz → direct 200 ││ │
│ │ │ registration │ │ /api/* → proxy to Functions ││ │
│ │ │ Claims-array │ │ /* → serve dist/index.html ││ │
│ │ │ format │ │ dist/assets/* → immutable cache ││ │
│ │ └───────────────┘ └────────┬──────────────────────────┘│ │
│ └───────────────────────────────│───────────────────────────┘ │
│ │ │
│ │ x-internal-proxy-secret │
│ │ x-ms-client-principal │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Standalone Functions App (Consumption plan) │ │
│ │ │ │
│ │ AUTH_MODE=appservice │ │
│ │ Validates proxy secret → parses claims-array │ │
│ │ IP restrictions: App Service outbound IPs only │ │
│ │ │ │
│ │ /api/me /api/data /api/ai/chat /api/etl/upload │ │
│ │ ... 40+ endpoints (same business logic, new auth mode) │ │
│ └─────────────┬─────────────────────────────────────────────┘ │
│ │ │
│ ▼ (same region, ~1ms) │
│ ┌────────────────┐ ┌────────────┐ ┌───────────────┐ │
│ │ Cosmos DB │ │ETL Extract │ │ Blob Storage │ │
│ │ (same region!)│ │(unchanged) │ │ (unchanged) │ │
│ └────────────────┘ │ETL Sync │ └───────────────┘ │
│ │(unchanged) │ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Improvements:
✓ All traffic stays in South Africa — no cross-region hops
✓ Express serves SPA + proxies to Functions in same region
✓ Dedicated EasyAuth with claims-array auth (no SWA lock-in)
✓ Functions independently scalable and monitorable
✓ IP-restricted: Functions only accept traffic from App Service
✓ Shared-secret trust boundary on every proxied request
✓ SWA kept as parked standby for emergency failover
✓ Cost: +$13/month for the App Service plan
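For the curious, here's a minimal sketch of that Express host — illustrative, not the production code. The env var names, the dist/ layout, and the choice of http-proxy-middleware are my assumptions; the routing rules (direct health probe, proxied /api, SPA fallback, immutable assets) mirror the diagram above.

```javascript
// server.js — minimal sketch of the Express web host (names assumed)
const path = require('path');
const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();

// Health probe answered directly by the web host — never proxied.
app.get('/healthz', (req, res) => res.sendStatus(200));

// Proxy /api/* to the standalone Functions app, stamping the shared
// secret on every hop so Functions can verify the caller.
// NOTE: no express.json() before this mount — body parsing would
// consume the stream and break multipart ETL uploads.
app.use('/api', createProxyMiddleware({
  target: process.env.FUNCTIONS_BASE_URL, // https://<functions-app>.azurewebsites.net
  changeOrigin: true,
  proxyTimeout: 180_000, // the runbook's 180s proxy timeout
  headers: { 'x-internal-proxy-secret': process.env.PROXY_SECRET },
}));

// Hashed build assets are immutable — cache them aggressively.
app.use('/assets', express.static(path.join(__dirname, 'dist/assets'), {
  immutable: true,
  maxAge: '1y',
}));

// Everything else falls through to the SPA entry point.
app.get('*', (req, res) => res.sendFile(path.join(__dirname, 'dist/index.html')));

app.listen(process.env.PORT || 8080);
```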
The Auth Migration
This deserves its own diagram because it was the hardest part of the migration. SWA and App Service EasyAuth present identity differently. The backend had to understand both.
SWA Auth (before): App Service Auth (after):
────────────────── ────────────────────────
x-ms-client-principal x-ms-client-principal
│ │
▼ ▼
Base64 → JSON Base64 → JSON
{ {
userId: "abc", claims: [
userRoles: ["admin"], { typ: "oid", val: "abc" },
identityProvider: "aad" { typ: "email", val: "..." },
} { typ: "roles", val: "admin" }
]
Top-level fields }
(SWA-specific)
Claims array
(standard Entra format)
AUTH_MODE=swa AUTH_MODE=appservice
+ no proxy secret needed + x-internal-proxy-secret required
+ SWA manages the hop + timing-safe secret validation
+ claims-array parsing
+ identity canonicalization
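In code, supporting both worlds looks roughly like the sketch below. The function names, the normalized return shape, and the claim-type fallbacks are my illustration — the real parser handles more cases — but the header names, payload formats, and timing-safe comparison match what the diagram describes.

```javascript
// auth-parser.js — illustrative sketch of dual-format principal parsing
const crypto = require('crypto');

// Reject proxied requests unless the shared secret matches. The
// timing-safe comparison doesn't leak how many leading bytes matched.
function verifyProxySecret(req, expected) {
  const got = Buffer.from(req.headers['x-internal-proxy-secret'] || '');
  const want = Buffer.from(expected);
  return got.length === want.length && crypto.timingSafeEqual(got, want);
}

// Both platforms send x-ms-client-principal as base64-encoded JSON,
// but the payloads differ: SWA uses top-level fields, App Service
// EasyAuth uses an Entra claims array.
function parsePrincipal(req) {
  const raw = req.headers['x-ms-client-principal'];
  if (!raw) return null;
  const decoded = JSON.parse(Buffer.from(raw, 'base64').toString('utf8'));

  if (process.env.AUTH_MODE === 'appservice') {
    const claims = decoded.claims || [];
    const vals = (typ) => claims.filter((c) => c.typ === typ).map((c) => c.val);
    return {
      userId: vals('oid')[0] ??
        vals('http://schemas.microsoft.com/identity/claims/objectidentifier')[0],
      email: vals('email')[0],
      roles: vals('roles'),
    };
  }

  // SWA format: top-level userId / userDetails / userRoles.
  return {
    userId: decoded.userId,
    email: decoded.userDetails,
    roles: decoded.userRoles || [],
  };
}

module.exports = { verifyProxySecret, parsePrincipal };
```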
Act 1: The Runbook
Why This Migration Needed a Runbook
This wasn't a "lift and shift." Moving from SWA to App Service touched:
- 4 Azure resources to provision (App Service plan, web app, Functions app, storage account)
- 56 app settings to migrate from SWA to the standalone Functions app
- 12 user identities to canonicalize from SWA format to Entra format
- A new auth mode (App Service EasyAuth with claims-array parsing)
- A new web host (Express server replacing SWA's built-in hosting)
- 2 CI/CD pipelines running in parallel during validation
- An ETL pipeline that needed seamless ownership transfer between workflows
- A parked standby mode for the old SWA (not decommission — failover readiness)
The runbook grew to 21 sections with 2,300+ lines. Every Azure CLI command. Every app setting category. Every auth claim extraction rule. Every verification checkpoint.
Seven Review Sessions
Before Codex touched anything, Claude reviewed the runbook across seven dedicated sessions between March 6 and 26, 2026:
| Session | Date | Focus |
|---|---|---|
| 1 | Mar 6 | Initial architecture validation and scope framing |
| 2 | Mar 25 (18:55) | Plan validity: are the phases correctly sequenced? |
| 3 | Mar 25 (19:34) | Autonomous execution review: can an AI agent run this without portal clicks? |
| 4 | Mar 25 (20:03) | Architecture and design review: is the auth migration sound? |
| 5 | Mar 25 (20:36) | Implementation plan: critical path validation |
| 6 | Mar 25 (20:59) | Expert engineer review: what would a senior engineer push back on? |
| 7 | Mar 25 (21:24) | Deep review: identity migration, rollback, and edge cases |
Key issues caught during review:
- Identity continuity gap: The initial runbook assumed user IDs would carry over. Claude caught that SWA uses platform-managed service principals while App Service EasyAuth uses Entra object IDs — a completely different identity format. This led to adding the userIdentity.js canonicalization layer and the one-time backfill script (sketched after this list).
- Auth lightweight path: The verifyTokenLightweight function used by AI chat was SWA-only. Without an App Service equivalent, AI chat would break silently after migration.
- ETL upload streaming: If Express body-parsing middleware was added before the /api proxy, multipart ETL uploads would break. The runbook was updated to explicitly forbid express.json() ahead of the proxy mount.
- Rollback strategy: The original plan assumed SWA decommission. I later changed my mind and pushed for a parked-standby model instead — keep SWA deployable as emergency failover, not delete it.
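The backfill itself followed a classic dry-run-then-apply pattern (visible in the Day 1 timeline below). Here's a hypothetical sketch of its shape — the container names, document fields, partition key, and mapping source are all assumptions, not the project's actual script:

```javascript
// backfill-identities.js — hypothetical sketch of the one-time backfill
const { CosmosClient } = require('@azure/cosmos');

// `mapping` is a Map from legacy SWA userId to canonical Entra object ID,
// built ahead of time from the Entra tenant.
async function backfill({ connectionString, mapping, apply = false }) {
  const client = new CosmosClient(connectionString);
  const users = client.database('app').container('users'); // names assumed

  const { resources } = await users.items
    .query('SELECT * FROM c WHERE IS_DEFINED(c.userId)')
    .fetchAll();

  let migrated = 0;
  for (const doc of resources) {
    const canonicalId = mapping.get(doc.userId);
    if (!canonicalId || canonicalId === doc.userId) continue; // already canonical

    console.log(`${apply ? 'MIGRATE' : 'DRY-RUN'}: ${doc.userId} -> ${canonicalId}`);
    if (apply) {
      // Keep the old ID for traceability, swap in the Entra oid.
      // (Assumes /id is the partition key, so the swap is a plain replace.)
      await users.item(doc.id, doc.id).replace({
        ...doc,
        userId: canonicalId,
        legacySwaId: doc.userId,
      });
    }
    migrated += 1;
  }
  console.log(`${migrated} user(s) ${apply ? 'migrated' : 'would migrate'}`);
}

module.exports = { backfill };
```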
Act 2: The Execution
Codex received the runbook and started working on March 29, 2026 at 14:56 SAST.
Day 1: Infrastructure and Code (March 29)
| Time | Event |
|---|---|
| 14:56 | Inventory capture + code implementation (auth, identity, server, tests) |
| 15:22 | Azure provisioning: App Service plan, web app, Functions app, storage account |
| 15:41 | Identity backfill dry-run: 12 users scanned, 12 canonical migrations found |
| 15:42 | Identity backfill applied: 12 users migrated, zero conflicts |
| 16:43 | Web deploy (first attempt — Windows zip failed, rebuilt with POSIX paths) |
| 17:00 | API deploy: standalone Functions packaging fixed, proxy verified |
| 17:06 | Auth blocker: Entra app registration failed (insufficient tenant privileges) |
| 18:05 | Full verification: 56/56 config parity, health green, smoke blocked only by auth |
| 19:18 | Workflow + deploy path hardening committed |
| 22:10 | API deploy recovery: Functions-action produced 503; switched to source-only Kudu |
| 23:33 | Kudu false-negative analysis: rsync symlink errors masked a healthy deploy |
| 23:56 | Both GitHub Actions workflows green. SA web + API deployed successfully. |
Day 2: Auth, Validation, and Cutover (March 30)
| Time | Event |
|---|---|
| 14:32 | Entra auth unblocked: dedicated app registration created with new privileges |
| 14:40 | EasyAuth enabled: login redirect verified working |
| 15:01 | Identity re-audit: 9 canonical, 2 clean migrations, 1 overlap detected |
| 15:15 | Overlap identity fix + verification hardening deployed |
| 15:48 | Auth flow correction: enableIdTokenIssuance was false, fixed live |
| 16:05 | Workflow smoke alignment: accept EasyAuth-protected probe responses |
| 17:18 | ETL admin regression: EtlPipelineView used wrong role authority, fixed |
| 18:24 | Documentation strategy rewrite: park SWA, don't decommission |
| 21:03 | ETL ownership switched to SA workflow |
| 21:22 | Final cutover: SWA parked, SA primary, both workflows green |
| 21:34 | Failover drill fix: workflow_dispatch jobs were gated to push-only |
| 21:49 | Failover drill complete: SWA restored, re-parked, verified end-to-end |
Act 3: The Battles
Autonomous doesn't mean smooth. Codex hit real obstacles and worked through them.
Battle 1: The Windows Zip
The first web deploy failed because the zip archive built on Windows contained backslash paths. Azure's OneDeploy rejected them. Codex rebuilt the package with POSIX-style paths and redeployed successfully.
Battle 2: The Functions 503
The standard Azure/functions-action@v1 with pre-built node_modules produced a deployed Functions app that returned 503. Codex diagnosed it, switched to source-only Kudu zipdeploy with SCM_DO_BUILD_DURING_DEPLOYMENT=true (matching the pattern already proven by the ETL sync app), and restored the API to 200.
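For reference, Kudu's zipdeploy endpoint can be driven directly over HTTPS. The sketch below assumes publishing credentials and omits retries; with SCM_DO_BUILD_DURING_DEPLOYMENT=true set on the app, the zip carries source only and Kudu runs the build server-side:

```javascript
// kudu-zipdeploy.js — sketch of a source-only zipdeploy (Node 18+)
const fs = require('fs');

async function zipDeploy({ appName, user, pass, zipPath }) {
  const res = await fetch(`https://${appName}.scm.azurewebsites.net/api/zipdeploy`, {
    method: 'POST',
    headers: {
      Authorization: 'Basic ' + Buffer.from(`${user}:${pass}`).toString('base64'),
      'Content-Type': 'application/zip',
    },
    body: fs.readFileSync(zipPath), // source-only zip, no node_modules
  });
  if (!res.ok) throw new Error(`zipdeploy failed: ${res.status}`);
}

module.exports = { zipDeploy };
```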
Battle 3: The Kudu False Negative
After fixing the deploy shape, Kudu still reported "failed" — because rsync couldn't create node_modules/.bin/* symlinks. But the app was actually healthy. Codex analyzed the log pattern, hardened the Kudu helper script to recognize this specific false negative, and added a health-gated fallback.
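The hardened check looks roughly like this — the log pattern is hypothetical (I'm paraphrasing the failure signature), but the principle holds: only a known-benign signature downgrades a Kudu failure, and a live health probe gets the final say.

```javascript
// Sketch of the health-gated false-negative check (pattern assumed)
const SYMLINK_FALSE_NEGATIVE = /rsync: .*symlink.*node_modules\/\.bin/;

async function confirmDeployHealthy({ kuduStatus, deployLog, healthUrl }) {
  if (kuduStatus === 'success') return true;

  // Anything other than the known-benign rsync symlink failure is real.
  if (!SYMLINK_FALSE_NEGATIVE.test(deployLog)) return false;

  // Health-gated fallback: the running app is the source of truth.
  const res = await fetch(healthUrl); // e.g. https://<web-app>.azurewebsites.net/healthz
  return res.ok;
}
```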
Battle 4: The Tenant Privilege Blocker
Creating the Entra app registration required Application Administrator privileges that Codex didn't have on Day 1. This blocked EasyAuth completely. I resolved the privilege gap overnight, and Codex resumed on Day 2.
Battle 5: The ID Token Gap
After enabling EasyAuth, browser logins failed silently. The Entra app registration had enableIdTokenIssuance=false, but App Service EasyAuth requests response_type=code id_token. Codex found this, set the flag to true via CLI, and updated both the provisioning and verification scripts to treat it as required state.
Battle 6: The ETL Role Regression
The ETL admin page broke for the admin user on the new stack. Root cause: EtlPipelineView preferred raw tokenRoles (which under App Service auth is just ["authenticated"]) over the database-backed profileRoles. Codex fixed the precedence and added a regression test.
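The corrected precedence, sketched with assumed names (the real fix lives in EtlPipelineView): database-backed roles win, and token roles only count when they carry real information.

```javascript
// Sketch of the role-precedence fix (names assumed).
// Under App Service EasyAuth, tokenRoles collapses to ["authenticated"],
// so the database-backed profileRoles must take precedence.
function effectiveRoles({ profileRoles, tokenRoles }) {
  if (profileRoles && profileRoles.length > 0) return profileRoles;
  return (tokenRoles || []).filter(
    (r) => r !== 'authenticated' && r !== 'anonymous',
  );
}
```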
Act 4: The Numbers
| Metric | Value |
|---|---|
| Total execution time | ~28 hours across 2 days |
| Files created | 14 |
| Files modified | 18 |
| Tests passing | 537 across 149 test files |
| User identities migrated | 12 |
| App settings migrated | 56 (verified parity) |
| Azure resources provisioned | 4 (plan, web app, Functions app, storage) |
| GitHub Actions workflows | 2 running in parallel, both green |
| Execution ledger entries | 30+ timestamped operations |
| Portal clicks | 0 |
| Incremental monthly cost | +$13 (one B1 Linux plan) |
What Got Deployed
- Express web host serving the React SPA with immutable asset caching
- API proxy with shared-secret trust boundary and 180s timeout
- Standalone Functions app with AUTH_MODE=appservice and IP restrictions
- Dedicated Entra app registration with EasyAuth
- App Service auth parser with claims-array extraction and timing-safe secret validation
- Identity canonicalization layer with SWA-to-Entra migration
- Kudu zipdeploy helper with false-negative resilience
- Curated deploy artifact with dependency pruning
- ETL workflow parity with ownership switch variable
- SWA parked standby with verified failover drill
What I Learned
The Execution Ledger Pattern
The most valuable artifact wasn't the code — it was the execution ledger. Every action Codex took was recorded with timestamp, phase, command, sanitized result, and next action. This append-only log became the working memory across sessions and the audit trail for the entire migration.
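A minimal version of the pattern fits in a few lines. The field names mirror the ones above; the file name and helpers are my own sketch:

```javascript
// ledger.js — minimal append-only execution ledger (sketch)
const fs = require('fs');

const LEDGER_PATH = 'execution-ledger.jsonl'; // name assumed

// One JSON line per action. Append-only: history is never rewritten.
function record({ phase, command, result, next }) {
  const entry = {
    timestamp: new Date().toISOString(),
    phase,
    command,
    result, // sanitize before logging — no secrets in the ledger
    next,
  };
  fs.appendFileSync(LEDGER_PATH, JSON.stringify(entry) + '\n');
}

// On session resume, the last entry tells the agent where to pick up.
function lastEntry() {
  const lines = fs.readFileSync(LEDGER_PATH, 'utf8').trim().split('\n');
  return JSON.parse(lines[lines.length - 1]);
}

module.exports = { record, lastEntry };
```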
When Codex hit the tenant privilege blocker on Day 1 and had to resume on Day 2, the ledger told it exactly where to pick up. When the Kudu deploy shape needed three iterations, the ledger captured each failure and its resolution.
If you're planning autonomous multi-session work, build the ledger into the runbook from the start.
CLI-First Changes Everything
The runbook's execution rule — "no Azure Portal or GitHub UI dependency; all setup must be executable by az, gh, PowerShell, or GitHub Actions" — was the single most important constraint. It made the entire migration automatable.
Every resource provisioned with az — az appservice plan create, az webapp create, and friends. Every secret set by gh secret set. Every EasyAuth configuration by az webapp auth update. Every verification by scripted probes. Zero portal clicks meant zero human bottlenecks.
Park, Don't Decommission
My push toward a parked-standby model instead of immediate SWA decommission was the right call. On Day 2, after cutover, Codex ran a full failover drill: unparked SWA, verified the full app was serving, then re-parked it. The whole cycle took 15 minutes and proved the rollback path works.
For any production migration: keep the old thing alive in standby until you're confident you'll never need it. The cost of maintaining a parked SWA ($0) is much less than the cost of recreating one in an emergency.
Auth Migrations Are Never Simple
We hit five distinct auth-related issues across two days: tenant privileges, ID token issuance flags, claims-array format differences, role authority precedence, and identity canonicalization. Any one of them could have broken the migration silently.
The runbook's detailed auth specification — with pseudocode for claim extraction, validation ordering, and normalized return shapes — was essential. Without it, the agent would have guessed at the auth format and produced subtly wrong behavior.
The Human's Role
I didn't write the Express server, the auth parser, the identity backfill script, the deploy workflows, or the provisioning scripts. But I:
- Designed the target architecture
- Wrote a 2,300-line runbook that left nothing ambiguous
- Reviewed it across 7 sessions with an AI planning partner
- Resolved the tenant privilege blocker that no CLI command could fix
- Made the cutover decision based on the verification evidence after confirming with pilot users
- Performed manual browser validation that proved the stack worked end-to-end
The pattern is the same as before: the human's job is to write specifications precise enough that code writes itself. The better the runbook, the more the agent can do autonomously.
Try This Yourself
Compared to the AI service migration (which was a code extraction and new service build), this SWA migration was a different kind of challenge: less code, more infrastructure, more auth complexity, more operational choreography.
If you're planning a similar hosting migration:
- Audit your auth surface before you start. SWA, App Service, and B2C/Entra all present identity differently. Map the claim shapes explicitly.
- Build the execution ledger into the plan. Autonomous agents that work across sessions need persistent working memory.
- Require CLI-only execution. If the plan needs portal clicks, the agent can't run it.
- Run both stacks in parallel. Shared data (same Cosmos, same Blob) means zero data migration. Two active frontends during validation costs almost nothing.
- Park, don't delete. Your rollback is only useful if it still exists.
- Test the real user path manually. Health probes pass, workflows are green, config parity is 56/56 — and then a human opens a browser and auth fails because of a flag nobody thought to check.
The runbook used in this project is shared in the appendix below.
Appendix: The Complete Migration Runbook (Redacted)
Below is the full runbook that guided this migration, exactly as Codex executed it. Sensitive identifiers — Azure resource names, GitHub references, email addresses, Entra IDs, connection strings, and deployment credentials — have been replaced with <placeholder> tokens. The architecture decisions, execution patterns, CLI commands, and verification checklists are preserved verbatim.
This is the document that Claude reviewed across seven sessions and Codex executed autonomously over two days. Scroll inside the frame to read it in full: 21 sections covering architecture, auth design, CLI automation, deployment workflows, and verification checklists — the level of detail that makes autonomous agent execution possible.
Muhammad Khan is a GM moonlighting as a software engineer in his spare time, learning about AI-augmented development workflows, cloud architecture, and autonomous agent orchestration.
