
Wednesday, 31 December 2025

The SMT Chronicles - MVP Version 2 - NotebookLM story


In my previous post, I shared how NotebookLM synthesised the evolution of my experimental app, SMT (Software Management Tools), from my Gemini chat transcripts up to June 2025. I then went further and added just the latest README file as an additional source for NotebookLM, to see if it could connect the dots from June 2025 to December 2025. I also created a simple dashboard using Antigravity for my repo's storyline here.

NotebookLM generated this infographic that was spot on! 

By June 2025, SMT's main feature was the Yearly Planning page - the inspiration behind SMT's planning feature came from my Amazon AWS experience of their yearly planning mechanism, the Operational Planning (OP1 & OP2) cycles. The lack of tooling within the company made the process quite time-consuming, error-prone and not fun at all! We used spreadsheets in either Excel or Quip (Quip is a pain for spreadsheeting!). So SMT was going to make OP planning much more fun, and more accurate as well - especially when calculating net engineering capacity. SMT is a proof-of-concept, but if anyone from AWS is reading this post, feel free to play with the app, grab the codebase and have a go at using it for your team's planning. The app works fine IMHO, but there are still a lot more features to add.

Since the June 2025 MVP Version 1, I woke up again in November and added a ton of features. My productivity skyrocketed with the release of new LLMs and coding assistants. I added a powerful AI feature which I think is a game changer for software management. I clobbered technical debt, rebuilt the app UX, added quality controls and overall rearchitected the codebase into something more respectable than a toy app coded by AI...

NotebookLM generated this audio overview, mostly correct!


NotebookLM generated this slide deck...

NotebookLM generated this video...

The SMT Chronicles - MVP Version 1 - Oct 24 to June 25, building with AI co-pilots

Over the last year, I've been building an experimental app, SMT (Software Management Tools), which started out of curiosity to test LLMs' code-generation abilities, before the proliferation of AI coding agents, integrated development environments and vibe-coding platforms. I worked on this project in my spare time over weekends and evenings, testing the waters - often getting quite frustrated in the early days, sometimes having streaks of productivity and other times just going in circles and breaking things - I often found myself cursing the AI :-) I didn't give up. Despite breaks with long periods of inactivity, I kept monitoring the latest AI news and releases and continued to test the waters. Each time I learned something new, saw the progress of LLMs, and witnessed the potential this technology has not only to disrupt the software industry, but also to empower people to translate ideas and concepts into prototypes without depending on outsourced developers. The journey of learning continues. I stuck with Gemini because it has, since the beginning, been enormously generous with quotas and large context windows, unlike Claude and ChatGPT at the time. Even today, I prefer to stick with learning just one tool like Antigravity rather than context-switch between others - although my workflow includes the Antigravity Agent Manager, and a separate VSCode with Codex to audit changes and pick up where Gemini or Claude Opus fails to complete their tasks.

Here's the activity story from GitHub:

I also created a simple dashboard using Antigravity for my repo's storyline here

In this post, I'm sharing some history of SMT. How did it all begin? Interestingly enough, I went through a phase of saving the major chat sessions with Gemini that led to the first MVP of SMT. I saved all my prompts in Google Docs. I stopped tracking my prompts in June because it got quite tedious! With this chat record, I wanted to see what Google's NotebookLM would make of the doc's contents, and here is what it produced - the power of NotebookLM is super fascinating!

NotebookLM generated this infographic that was spot on! 


By June 2025, SMT's main feature was the Yearly Planning page - the inspiration behind SMT's planning feature came from my Amazon AWS experience of their yearly planning mechanism, the Operational Planning (OP1 & OP2) cycles. The lack of tooling within the company made the process quite time-consuming, error-prone and not fun at all! We used spreadsheets in either Excel or Quip (Quip is a pain for spreadsheeting!). So SMT was going to make OP planning much more fun, and more accurate as well - especially when calculating net engineering capacity. SMT is a proof-of-concept, but if anyone from AWS is reading this post, feel free to play with the app, grab the codebase and have a go at using it for your team's planning. The app works fine IMHO, but there are still a lot more features to add.

NotebookLM generated this audio overview, mostly correct!


NotebookLM generated this slide deck...

NotebookLM generated this video...


My chat transcripts with Gemini from April 25 to June 25 - 50+ pages!

Sunday, 28 December 2025

How I use visual annotations with Antigravity to build UI / UX

Sharing some of my workflow experience in building my SMT (Software Management Tools) application using AI as my coding assistant. With the launch of Google's Antigravity platform in November 2025, my curiosity got the better of me! I re-engaged with my codebase after about 5 months of dormancy. In under 2 months I was able to migrate the look and feel to a new UX paradigm, introduce a coding contract and constitutional framework, introduce dark/light mode theming, fundamentally refactor the codebase to best-practice software design patterns, integrate AI features into the app, and clobber technical debt. Thanks to Antigravity... what this video shares is how powerful visual annotations can be for changing UX. As I experienced this journey, I became more aware that the future of UX and Frontend Engineers is going to be disrupted quite radically!


Play with the SMT app here. Codebase on GitHub.

Codex 5.2 vs the Long Game: Building a Quality Ladder in a Real Product Codebase

If you’ve ever tried to institutionalize code quality (not just talk about it), you’ll know the trap: quality work is rarely one big change. It’s a hundred small, unglamorous decisions—done consistently—until the codebase starts behaving like a system.

That’s what this post is about.

I’ve been evolving a real planning product codebase (“altsoftwareplanning”)—workflows for planning, roadmaps, org views, capacity tuning, and a growing set of UI/interaction patterns. It shipped features. I think my app will solve real problems. Yes, it's still a concept, and I wanted to ensure it's being built the right way. But the safety net was thin.

At some point, “it works” stops being good enough.

I wanted a codebase where I can say, with a straight face:

We can prove it works.

Codex 5.2 turned out to be the right partner for the long-running version of this work: contract enforcement, lint hardening, unit testing, E2E testing, coverage instrumentation, CI visibility, and documentation—all in a compounding sequence that actually sticks.

Codex 5.2 CLI helped transform an active product codebase into a quality-first system by enforcing contracts, hardening lint discipline, building broad unit + Cypress E2E coverage, and surfacing coverage artifacts in CI—over a sequence of compounding iterations.


The Landscape: What We Were Working With

You can’t talk quality without acknowledging the stack. The strategy must fit the reality.

  • App layer: vanilla JavaScript, HTML, CSS (no framework), service-oriented front-end modules under /js

  • Visualization tooling: D3, Mermaid, Frappe Gantt

  • Background logic: a feedback worker (smt-feedback-worker) and AI-adjacent utilities

  • Quality/tooling: ESLint (flat config), stylelint, depcheck, plus a custom contract scanner

  • Testing: Vitest + jsdom for unit tests, Cypress for E2E flows

  • CI: GitHub Actions, publishing coverage artifacts on every push/PR

This is no longer a toy repo! It demonstrates patterns for a real product surface area with enough moving parts to punish sloppy changes.


The Problem: Velocity Without Proof

We had signals, but they weren’t a coherent system:

  • some rules existed (informally)

  • linting existed (but was noisy and inconsistent)

  • testing existed (but not at a level that lets you refactor with confidence)

  • E2E coverage didn’t reflect how users actually flow through the product

  • coverage existed locally, not visibly

In other words: tribal knowledge plus hope.

The goal wasn’t “add tests.” The goal was a quality ladder—each rung enabling the next.


The Quality Ladder (The Sequence Matters)

Here’s the order of operations that worked, and why:

  1. Codify a contract (design rules, data flow rules, UI rules)

  2. Make lint clean and trustworthy (so warnings mean something)

  3. Cover domain logic with unit tests (turn behavior into executable specs)

  4. Add E2E tests for real workflows (where regressions actually happen)

  5. Instrument coverage (otherwise you’re guessing)

  6. Publish coverage in CI (visibility changes behavior)

  7. Keep docs current (quality must be repeatable)

Codex 5.2 helped me climb this ladder without losing the thread.


Phase 0: Compliance Cleanup (The Real Beginning)

Every long-running quality push has a starting “wake up” moment.

For this repo, that moment looked like compliance cleanup:

  • removing risky innerHTML usage and window-bound patterns

  • tightening DOM checks and wiring

  • simplifying defensive guards that hid intent

This wasn’t glamorous. It was foundational. You can’t build a quality system on top of brittle glue.


Phase 1: Turn Ideas Into Contracts

Before I was willing to scale testing, I needed the codebase to have enforceable rules—especially in the UI layer.

We already had an early foundation:

  • a contract-scanning script

  • a “coding contract” that enforces rules like:

    • no inline styles (even for dynamic values)

    • DOM-only toolbars (explicit, deterministic wiring)

    • centralized storage patterns (no random settings scattered across views)

Then we reinforced it.

What changed (in practical terms)

  • Inline style mutations were removed or centralized.

  • View code was pushed toward CSS classes / CSS variables.

  • Global header/toolbars were wired consistently.

  • Storage moved behind the repository boundary (e.g., SystemRepository).

This is the crucial insight:

Quality starts with clarity. If you can’t describe the rules, you can’t test them.

Codex 5.2 was useful here because it didn’t just “fix a file.” It was willing to chase contract violations across dozens of small edits until the rule was actually enforced.
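
To make the idea concrete, here is a minimal sketch of the kind of check a contract scanner can run. This is an illustrative, hypothetical script, not the repo's actual scanner, and the two rules shown (inline style mutations, window-bound globals) are just a sample of the contract.

// scan-contract.js - illustrative sketch of a contract check (hypothetical; the real scanner is more thorough)
const fs = require('fs');
const path = require('path');

const violations = [];

function scan(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) { scan(full); continue; }
    if (!full.endsWith('.js')) continue;
    const src = fs.readFileSync(full, 'utf8');
    // Rule: no inline style mutations - push styling into CSS classes / variables
    if (/\.style\.[A-Za-z]+\s*=/.test(src)) violations.push(`${full}: inline style mutation`);
    // Rule: no window-bound globals - state lives behind services/repositories
    if (/window\.[A-Za-z_$][\w$]*\s*=/.test(src)) violations.push(`${full}: window-bound global`);
  }
}

scan('js');
if (violations.length) {
  console.error(violations.join('\n'));
  process.exit(1); // non-zero exit fails the quality gate
}

Wired into a verify script or CI gate, a non-empty violations list fails the build, which is what turns a written rule into an enforced one.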


Phase 2: Make Linting a Trusted Signal

Linting is only useful if “clean” is achievable.

So we did the boring work:

  • cleared ESLint warnings (unused locals, scoping issues, assignment-in-conditional traps)

  • standardized hasOwnProperty usage

  • tightened property access and scoping

Only then did we upgrade:

  • ESLint v9

  • flat config

  • reduced deprecated dependencies

Once lint was noise-free, a new lint warning started meaning something again.

This is one of those hidden tipping points: when lint becomes a real gate, behavior shifts.
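
For reference, a flat config that targets exactly the classes of issues cleared above might look roughly like this - a hedged sketch, not the project's actual eslint.config.js:

// eslint.config.js - minimal flat-config sketch (illustrative, not the repo's exact config)
import globals from 'globals'; // npm package providing standard global definitions

export default [
  {
    files: ['js/**/*.js'],
    languageOptions: {
      ecmaVersion: 'latest',
      sourceType: 'module',
      globals: globals.browser,
    },
    rules: {
      'no-unused-vars': 'error',        // unused locals
      'no-cond-assign': 'error',        // assignment-in-conditional traps
      'no-prototype-builtins': 'error', // standardise hasOwnProperty usage
      'no-undef': 'error',              // catch scoping slips
    },
  },
];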


Phase 3: Unit Tests That Match the Domain

This was not a token unit-test pass. The suite mapped to the product’s domain logic.

We built unit tests across:

  • Planning

  • Roadmap

  • Org

  • System

  • Initiative

  • Forecast

  • WorkPackage

  • App state helpers and supporting utilities

Tooling:

  • Vitest + jsdom (DOM simulation where needed)

  • later: Vitest v4 upgrade (refreshed the Vite/esbuild chain and cleared CI audit issues)

The real win wasn’t “coverage percentage.”

The real win was this:

Domain behavior became executable.

Codex 5.2 helped by translating real product flows into testable units—without scattering random micro-tests that don’t align to how the app behaves.
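
As a flavour of what these specs look like, here is a hedged sketch of a Vitest test for capacity math. The function shown is a hypothetical stand-in for the real planning logic under /js, not the actual implementation.

// planningCapacity.test.js - illustrative Vitest spec (names are hypothetical stand-ins)
import { describe, it, expect } from 'vitest';

// Stand-in for the real planning service logic
function calculateNetCapacity({ engineers, weeks, leaveDays, opsTaxPct }) {
  const grossPersonDays = engineers * weeks * 5;
  const afterLeave = grossPersonDays - leaveDays;
  return Math.round(afterLeave * (1 - opsTaxPct)); // subtract operational overhead
}

describe('net engineering capacity', () => {
  it('subtracts leave and operational tax from gross capacity', () => {
    const net = calculateNetCapacity({ engineers: 4, weeks: 10, leaveDays: 20, opsTaxPct: 0.2 });
    expect(net).toBe(144); // (4 * 10 * 5 - 20) * 0.8
  });
});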


Phase 4: Cypress E2E Tests for Real Flows

Unit tests prove logic.

E2E tests prove the product.

We started with smoke tests and selector stabilization, then expanded into specs that mirror how users actually move:

  • Core UI and workflow validation

  • Planning and detailed planning flows

  • Product management flows

  • System creation/editing and org management

  • Settings and AI feedback flows

  • Smoke tests and selector hardening

Two deliberate choices here:

  1. Readable and commented tests. These become “living docs.”

  2. Selector hardening before scale. Flaky selectors are how E2E suites die.

Codex 5.2 showed long-running strength here: iterating until the suite is stable, not just “present.”
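
For illustration, a hardened smoke spec might look like the sketch below. The route and data-cy selectors are hypothetical; the point is stable selectors plus comments that read like documentation.

// cypress/e2e/planning-smoke.cy.js - illustrative sketch (selectors and routes are hypothetical)
describe('Planning smoke test', () => {
  beforeEach(() => {
    cy.visit('/'); // the app is served locally during the E2E run
  });

  it('opens the yearly planning view and shows capacity', () => {
    // Stable data-cy hooks instead of brittle CSS/class selectors
    cy.get('[data-cy="nav-planning"]').click();
    cy.get('[data-cy="planning-view"]').should('be.visible');
    cy.get('[data-cy="net-capacity"]').should('not.be.empty');
  });
});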


Phase 5: Coverage You Can See (Not Just Locally)

Testing without visibility is guesswork.

So we instrumented coverage for both layers:

  • Unit coverage: Istanbul

  • E2E coverage: @cypress/code-coverage

We intentionally split reports:


  • coverage/unit

  • coverage/e2e

And then made it visible:

  • GitHub Actions uploads coverage artifacts on every push and PR

  • no special tooling required to inspect results—just download the artifact

CI now runs (as part of the quality pipeline):

  • npm run test:coverage

  • npm run test:e2e:coverage

We also fixed dependency checks so coverage tooling is first-class and green in CI (depcheck included).

This matters because it changes the social contract:

Coverage stops being “something one dev ran.” It becomes a team-visible signal.
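
Wiring-wise, the split looks roughly like the sketch below, assuming a Vitest + @cypress/code-coverage setup; paths and options are illustrative rather than the repo's exact configuration.

// vitest.config.js - unit coverage lands in coverage/unit (requires @vitest/coverage-istanbul)
import { defineConfig } from 'vitest/config';
export default defineConfig({
  test: {
    environment: 'jsdom',
    coverage: { provider: 'istanbul', reportsDirectory: 'coverage/unit' },
  },
});

// cypress.config.js - E2E coverage collected via @cypress/code-coverage
// (the coverage/e2e output directory is typically set through nyc's report-dir option)
const { defineConfig: defineCypressConfig } = require('cypress');
module.exports = defineCypressConfig({
  e2e: {
    setupNodeEvents(on, config) {
      require('@cypress/code-coverage/task')(on, config);
      return config;
    },
  },
});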


What Codex 5.2 Actually Enabled

This wasn’t “AI wrote tests.” That’s the shallow version of the story.

Codex 5.2 behaved more like an expert software-engineer-in-test (SWET) partner that doesn't lose the thread, and it worked for hours at a stretch (I exhausted my five-hour quota on more than one occasion):

  • it read the codebase and translated workflows into test cases

  • it stayed consistent with constraints across many commits

  • it revisited flaky E2E specs and hardened selectors

  • it fixed CI failures immediately instead of letting them linger

  • it updated docs alongside implementation changes

Short tasks are easy.

The long game—contracts + lint + unit + E2E + coverage + CI + docs—requires continuity.

That’s where Codex 5.2 shined.

Hint: If you've been delaying a long-term technical-debt task such as introducing unit and integration tests (or migrating them to other frameworks) and wiring them into your CI/CD workflows, you no longer have to do it all yourself. Codex 5.2 is a very capable model that can do the heavy lifting for you, and it can help transform your codebase in a matter of hours, taking its quality to the next level.


The Working Loop I Used (Practical, Not Magical)

If you want to replicate this kind of quality transformation, here’s the loop that worked:

  1. Start with an audit task

    • “Scan the repo for contract violations and list them by severity.”

    • “List lint rule violations that prevent a clean run.”

  2. Fix in small diffs

    • insist on small PR-sized changes

    • prevent “helpful refactors” that mix concerns

  3. Lock in with tooling

    • don’t accept “we fixed it once”

    • enforce it through npm run verify / CI gates

  4. Convert workflows into tests

    • unit tests for domain rules

    • E2E specs for user flows

  5. Make coverage visible

    • separate unit vs E2E

    • publish artifacts

  6. Document the happy path

    • how to run tests locally

    • where coverage lands

    • what “green” means

Codex 5.2 helped keep this loop tight, especially the follow-through steps that humans tend to procrastinate on.


Before / After (The Shape of the System)


Outcomes (What Changed, Concretely)

By the end of this effort, the repo gained:

  • a stronger contract + lint foundation

  • ESLint v9 flat config with warning-free lint

  • stylelint + depcheck integrated into the quality gates

  • a comprehensive unit test suite across core services

  • a multi-spec Cypress E2E suite covering real user journeys

  • separate unit and E2E coverage reports

  • CI workflows that upload coverage artifacts on every commit/PR

  • updated docs that make the whole thing discoverable and repeatable

This is the difference between:

  • “we have tests”

  • and “we have quality.”

Stats

Counts (current HEAD)

  •   Unit tests: 68 test cases across 27 test files
  •   Cypress: 7 specs in cypress/e2e, 39 E2E test cases

  Files Touched

  •   94 files changed since linting was introduced 
  •   37 files changed since unit tests were added 
  •   37 files changed since Cypress tests were added 

  E2E Coverage Delta (baseline = first coverage workflow commit 68418c)

  •   Statements: 54.60% → 54.61% (+0.01pp)
  •   Branches: 33.34% → 33.34% (+0.00pp)
  •   Functions: 52.90% → 52.93% (+0.03pp)
  •   Lines: 56.90% → 56.91% (+0.01pp)

  CI Runtime Before/After (quality.yml, success runs only)

  •   Before coverage/tests in CI (pre‑68418c): avg 29.2s, median 32.0s, n=10.
  •   After coverage/tests in CI (post‑68418c): avg 303.7s (~5:04), median 306.0s, n=6.

  tests.yml (unit coverage workflow)

  •   all runs: n=22 avg=32.0s median=34.0s min=18.0s max=42.0s
  •   success runs: n=22 avg=32.0s median=34.0s min=18.0s max=42.0s

  e2e.yml (Cypress coverage workflow)

  •   all runs: n=19 avg=229.7s median=284.0s min=38.0s max=310.0s
  •   success runs: n=18 avg=239.8s median=285.0s min=38.0s max=310.0s


Why This Matters (Beyond Preventing Regressions)

A real quality system does more than reduce bugs:

  • Onboarding improves because expectations are written down (and executable).

  • Refactors get cheaper because the safety net is layered.

  • Velocity improves because you stop paying the “regression tax.”

  • Confidence increases because you can prove correctness instead of arguing.

The whole point of quality is not to slow down shipping.

It’s to preserve shipping speed as the product grows.


The Reusable Playbook

If you’re trying to do the same in your own codebase, don’t start with a thousand tests. Start with the ladder.

  1. Define and enforce a contract (rules of the road)

  2. Make lint clean and trustworthy

  3. Build unit tests that map to domain logic

  4. Start with smoke E2E tests, then expand into full flows

  5. Instrument coverage for both layers

  6. Publish coverage artifacts in CI

  7. Treat docs as first-class artifacts

Quality is not a tool.

Quality is a system.

I've experienced the power of Codex as my AI coding agent and quality partner - this stuff is real. My app codebase might still be in its infancy, but I know it comes close to what software engineering teams face every day - especially if you're building enterprise tooling apps, like the ones my ex-teams at AWS build.
Integrating AI coding agents to handle refactoring, migrations or, as in my case, introducing end-to-end automated quality testing is now a no-brainer with powerful assistants like Codex, and a significant productivity enhancer!

How AI helped me create a Git Repo Storyline

I've been building my SMT App with the codebase hosted on GitHub since October 2024. It has been an interesting journey of learning indeed! What started out as an experiment to test an idea of mine (basically, create a set of practical tools for software managers to manage their teams and delivery plans better, pivoted on a clearly mapped-out system architecture), whilst learning and upskilling for the new world of AI-assisted coding, has ended up with me refining and polishing the code to get it to a respectable level, worthy of being used by other people :-)

I'm also a fan of data and visualizations - I thought it would be really cool to have a tool to help visualize the evolution of my codebase. Here's what I created in a few hours:

Saturday, 27 December 2025

Animating Git repo evolution of SMT codebase

I thought I'd have some fun today with Google Antigravity using Gemini and Claude Opus - mining git repository data and building timeline visualisations. Since I'm a big fan of "Race" timeline graph animations, I thought I'd have a go at building one myself. This video shows how my codebase evolved over time, from a single monolithic index.html/js file to a production-grade, well-architected system. It also tells the story of coding with AI assistants, from the early models to the current day. My productivity increased dramatically from November, when newer models were released - especially Google's Antigravity platform. And in the last week, Codex 5.2 Max is proving to be quite powerful too - it would be great if Antigravity expands support for Codex!! #gemini #antigravity #claude #codex #ai #software #github


Play with the app here. Codebase on GitHub!

Monday, 15 December 2025

From "Vibe Coding" to Engineering: How I Tamed an AI-Generated Monolith

A survival guide for building software in the age of AI Agents.


If you've played with the latest AI coding tools, you know the feeling. It's intoxicating. You type "Build me a dashboard," and poof—code appears. You feel like a wizard. You are "Vibe Coding"—surfing on a wave of generated features, moving faster than you ever thought possible.

I know this feeling because I lived it. I spent Oct 2024 to June 2025 vibe-coding my way to a feature-rich management app called SMT (Software Management Tools). But then, the wave crashed. I decided the AI wasn't mature enough yet and took a break from vibe coding. Then I woke up again in November '25 and, in December '25, plunged right back in with the release of newer models and new tooling from Google - Antigravity in particular got me hooked solid! Wow, the landscape is changing dramatically!

This is the story of how I went from a chaotic, AI-generated monolith to a nearly professional, engineered codebase. And more importantly, it's a guide on how you can avoid the trap of "Vibe Coding" and start using AI more effectively to build serious software, not just demos.

The Visual Journey: The "Model War" Timeline

The Industry Context: Riding the Wave

To understand why the code changed, you have to understand how the tools changed. The explosion of antigravity features coincided perfectly with the "Agentic Arms Race" of 2025.

Phase 1: The Trap of "Vibe Coding" & The Quota Shuffle (Winter 2025)

After a long summer working flat-out at Amazon, I found myself with a 3-month sabbatical in May 2025. It was a golden age for AI releases. Google had just dropped "Jules" (in Public Beta). Anthropic and OpenAI were trading blows with massive context windows.

I went fast. Recklessly fast.

I treated the AI models like a tag-team of interns.

  • Morning: I'd burn through my Gemini quota building the Gantt chart.
  • Afternoon: I'd switch to Claude for major technical debt refactoring, while using Gemini for UX refactoring and plumbing.
  • Evening: I'd use OpenAI to debug the logic and provide a third-party review.

The "Tower of Babel" Problem:

Each model had a different "personality." Because I wasn't providing governance, the codebase became a schizophrenic mess. main.js was a mix of 3 different coding styles fighting for dominance.

The Agent Blame Game (Whack-a-Mole)

My frustration peaked just yesterday (Dec 14) during the Gantt refactor. I entered a cycle of "Agent Whack-a-Mole":

  1. I was working with Claude Opus to stylize the complex SVG dependency arrows. It was working perfectly.
  2. CRASH: "Quota Exceeded."
  3. I swapped to Gemini 2.0 to keep the momentum going. "Gemini, continue styling these arrows."
  4. Gemini, eager to help, rewrote the GanttRenderer class. It fixed the styling but deleted the logic for dependency calculation, breaking the core feature.
  5. My quota reset. I went back to Claude: "Claude, Gemini deleted the logic. Fix what it broke."

I spent hours just mediating disputes between AI models that couldn't see each other's work. It was clear: without a shared "Constitution" (The Contract), they were just tagging over each other's graffiti.

The Structural Evolution (Visualized)

The transformation wasn't just in the code; it was in the very shape of the project.

Phase 1: Inception (Oct 2024)

(Simple, clean, but incapable)


/root
├── index.html
├── LICENSE
└── README.md


Phase 2: The Sprawl (June 2025)

(Features added by "Vibe Coding" with no plan)


/root
├── index.html
├── js/
│   ├── main.js (3000+ lines)
│   ├── utils.js (Everything basket)
│   ├── data.js (Hardcoded state)
│   ├── visualizations.js (D3 spaghetti)
│   └── ... (20+ flat files)
└── css/
    ├── style.css (Global conflict zone)
    └── ...


Phase 3: The Engineered State (Dec 2025)

(Governed by Contract)


/root
├── ai/                 (Agentic Controllers)
├── docs/               (The Contracts)
├── css/
│   ├── components/     (Scoped styles)
│   └── themes/         (Variables)
└── js/
    ├── services/       (Logic Layer)
    ├── components/     (View Layer)
    ├── managers/       (State Layer)
    └── main.js         (Bootstrapper only)


Phase 2: The Pivot to Engineering (Summer 2025)

I realized that if I wanted this project to survive, I had to stop acting like a "prompter" and start acting like a "Principal Engineer."

The shift happened on December 3rd. I didn't ask the AI to write code. I asked it to read a Contract.

The Agent Contracts: A Rule of Law

We didn't just write one prompt; we established a constitutional framework for the AI. This came in the form of two critical documents that every agent was required to ingest before writing a single line of code.

1. The Coding Contract (coding-agent-contract.md)

This document outlawed the bad habits the AI had picked up during the "Vibe Coding" era. It established zero-tolerance policies:

  • "No Defensive Coding for Internal Code": Stop asking if (typeof SystemService !== 'undefined'). Trust the architecture.
  • "The Singleton Rule": All logic resides in Services. No ad-hoc functions in files.
  • "The Window Ban": Writing to window.myVariable was strictly prohibited. State must be encapsulated.

2. The Workspace Canvas Contract (workspace-canvas-contract.md)

This was the game-changer for UI. It stopped the AI from inventing new layouts for every page.

  • "The Semantic Color Rule": Hardcoded hex values like #fff or #000 were banned. The AI had to use semantic variables like var(--theme-bg-primary). This single rule made implementing Dark Mode instant.
  • "The Shell Protocol": Every view had to plug into a standard WorkspaceShell. No more rogue sidebars or inconsistent headers.

Why this matters:

These contracts turned the AI from a chaotic creative into a disciplined engineer. When the AI encountered a problem, it didn't just "fix it"; it checked the Contract to see how it was allowed to fix it.
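
To show what those rules look like in code, here is a hedged before/after sketch; the names (SystemRepository, currentSystem) are illustrative, not the exact ones in the repo.

// Before (vibe-coding era): window-bound state plus defensive guards
//   window.currentSystem = loadSystem();
//   if (typeof SystemService !== 'undefined') {
//     SystemService.save(window.currentSystem);
//   }

// After (under the contract): state encapsulated behind a singleton service -
// no globals on window, no typeof checks for internal code
class SystemRepository {
  #systems = new Map();
  save(system) { this.#systems.set(system.id, system); }
  get(id) { return this.#systems.get(id); }
}
export const systemRepository = new SystemRepository();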

Pro Tip: The Self-Audit

I didn't just trust the AI to follow the rules. On Dec 11, I ran a "Compliance Report" task. I asked the AI: "Scan the entire codebase. List every file that violates the workspace-canvas-contract.md."

It found 15 violations I had missed. The AI became its own auditor.

Phase 3: The Payoff (The "Agentic Spike")

With the contract in place, I unleashed the new Antigravity agents in December.

In one weekend (Dec 7-8), we processed 47 commits that didn't add a single feature but completely re-wired the application.

  • The "Junk Drawer" must die: We deleted utils.js. In the "Vibe Coding" days, this file became a 2,000-line dumping ground for everything from date formatting to API calls. We exploded it into dedicated Services.
  • 117 spaghetti files became 169 modular classes.
  • Global variables vanished.
  • Theming became a single-line change.

The Discipline of Deferral

The hardest part wasn't the code; it was the patience.

I had a major feature ready to go: Year Planning. In the old "Vibe Coding" days, I would have just asked Gemini to "add it."

Instead, I deferred it.

I spent a week refactoring the Gantt view and the Org view to meet the new Contract first. Only when the architecture was solid did I allow the AI to touch YearPlanning.

  • Result: When I finally asked the AI to refactor YearPlanning (Commit 392ffcd), it got it 90% right on the first shot. Because it had "learned" from the codebase and I had enforced the patterns elsewhere, the most complex feature in the app became the smoothest refactor.

The "Audit Loop" Struggle

It wasn't magic. It was a fight.

Enforcing the contract was a recursive nightmare at first. I would ask the AI to "Fix the compliance issues in GanttPlanning.js."

  1. Run 1: AI removes global variables. (Good)
  2. Run 1 Side Effect: AI introduces a new inline HTML template. (Violation!)
  3. Run 2: I scold the AI: "No inline HTML! Check the Contract!"
  4. Run 2: AI fixes HTML.
  5. Run 2 Side Effect: AI adds a defensive check if (typeof SystemService...). (Violation!)

I often had to prompt the agent 5 or 6 times just to get one file clean. I had to explicitly remind it: "You are an expert engineer. Do not regress on the Contract. Check your own work against coding-agent-contract.md before submitting."

The 5 Deadly Blind Spots

Even with the contract, I learned that Agents have recurring "blind spots" you must watch for:

  1. Zombie Code: They love to write the new class GanttMvc.js but forget to delete the old ganttPlanning.js, leaving you with two conflicting sources of truth.
  2. The "Partial Success" Hallucination: An agent will refactor 3 out of 5 methods and happily announce, "I have completed the refactor!" You have to check the bottom of the file.
  3. The Compatibility Trap: When refactoring, agents often try to be "helpful" by keeping the old broken methods "just in case," creating a hybrid mess instead of a clean break.
  4. Phantom Functions: They assume utils.formatDate exists because it feels right, even if you deleted utils.js yesterday.
  5. The Rename Reflex: Instead of finding the existing method calculateDuration(), they will invent a new one called getTaskLength(), duplicating logic because they didn't index the context.

The "Sync": When It Just Clicks

To be fair, it wasn't all struggle. There were moments of absolute magic.

During the "Agentic Spike" (Dec 7-8), when the Contract was fresh and the Context was clean, I hit a Flow State with the AI that felt telepathic.

  • I'd say: "Refactor the OrgView to use the new Service pattern."
  • The Agent would output 4 perfect files.
  • I'd say: "Excellent. Now do the same for the Gantt view."
  • It would interpret "the same" correctly, applying complex patterns without needing them restated.
  • "You Rock", "Awesome Job", "Perfect" — my chat logs from that weekend are full of these praises.

When you treat the AI as a partner with a shared mental model (the Contract), the friction disappears. You aren't prompting; you're just... building.

The New Metric: CW/H (Curse Words per Hour)

Waleed Kadous recently proposed a new metric for AI quality: Curse Words per Hour (CW/H).

My project logs are the perfect validation of this theory.

  • High CW/H: During the "Blind Spots" and "Whack-a-Mole" phases, my prompts were full of CAPS and desperation: "STOP. YOU DELETED THE LOGIC AGAIN. REVERT. WHAT ARE YOU DOING??" This wasn't just anger; it was a signal of Context Drift. The model had lost the thread.
  • Negative CW/H: During "The Sync," my CW/H went negative (praise acts as -1). "Perfect," "You nail it," "Go ahead."

The Lesson: Your emotional reaction is a debugging tool. If you find yourself swearing at the AI, stop. Do not prompt harder. The Context has drifted. Reset the chat, paste the Contract, and start fresh.

But once it learned? The velocity was unstoppable.

The "Gantt Odyssey": A Case Study in Persistence

If you want to know what "refactoring hell" looks like, look at the git history for the Gantt chart.

  • Nov 23 (The False Start): I tried to "refactor" the monolithic ganttPlanning.js while keeping the "vibe" intact. It failed. The code was too entangled.
  • Nov 25 (The Feature Trap): Instead of fixing the foundation, I asked the AI to add more features (Frappe renderer, toggles). This was a mistake. The main.js file ballooned to 3,000 lines.
  • Dec 9 (The Realization): Commit f4c4845 "WIP - re-architect gantt to mvc." This was the moment I realized the old code had to die.
  • Dec 14 (The Victory): Commit 6c5b6f7 "Complete refactor of gantt and delete legacy ganttPlanning.js."

It took 4 distinct attempts and over 20 commits just to clean up this one feature. The lesson? You cannot refactor spaghetti into lasagna one noodle at a time. Sometimes, you have to throw the bowl away and start fresh with a new recipe (the Contract).

The "Git Worktree" Trap: Parallel Agents & The Danger Zone "Git Merge Hell"

As I got comfortable with Antigravity, I got greedy. I thought: "Why wait for one agent to finish? I'll run TWO."

I used Git Worktrees to run multiple agents in parallel on different branches (feature/gantt and refactor/theming). Agents can make mistakes, some quite serious - like losing track of which branch they are supposed to be working on!

  • The Dream: Double the velocity.
  • The Reality: "Merge Hell" and Corruption.

On Dec 8th, I nearly lost the entire Theming refactor. One agent overwrote the css/variables.css file while another was trying to read it. I spent 4 hours manually piecing together "lost" commits (see commit 2ac3f34: "fix: merge ALL missing theming changes from worktree").

The Warning: Antigravity is powerful, but it does not sandbox your agents. If you run them in parallel without strict discipline, they will step on each other's toes. Until we have automatic sandboxing, treat parallel execution like handling uranium: powerful, but deadly if mishandled.

Lessons for the AI-Augmented Developer

If you really want to move from "Vibe Coding" to "AI Engineering," you need to fundamentally shift your mental model. Here's what I've learnt so far:

1. Shift from "Writer" to "Architect"

The era of the "10x Developer" writing code alone in a basement is dead. We are now in the era of the "10x Architect."

AI generates code at near-zero cost. This means code is no longer an asset; it is a liability. Your job is not to produce more of it; your job is to curate it.

  • The Trap: Asking "Write me a function that does X."
  • The Fix: Asking "Given this architecture pattern, implement interface X for component Y."

Insight: You must have the entire system map in your head because the AI only has the context window.

2. Context Engineering > Prompt Engineering

Stop trying to find the "perfect prompt." It doesn't exist. Instead, focus on Context Engineering.

An AI agent is only as good as the files you feed it. If you feed it spaghetti utils.js, it will write more spaghetti.

  • The Strategy: Create "Context Files" (like our Contracts) that exist solely to guide the AI.
  • The Tactic: Before asking for a feature, pause. Ask yourself: "Does the AI have the current definition of our state management?" If not, paste the Contract first.

3. The Next Frontier: TDD as the Ultimate Spec

I'll be honest: SMT doesn't have a test suite yet. That is my next goal, but the refactoring was time-consuming, laborious and very frustrating - so it's a choice between introducing TDD or building some new features!

Why? Because looking back, I realize that Test-Driven Development (TDD) is the ultimate way to prompt an agent. A natural language prompt is ambiguous ("make it fast"). A failing test case is binary (Pass/Fail).

  • The Plan: We are going to implement a unit testing framework with the help of AI agents.
  • The Workflow: Write the test. Run it (Red). Feed the failure to the AI. Let it fix (Green). A minimal sketch of this loop follows below.
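
For illustration, a failing spec can serve as the prompt itself. This is a hedged sketch assuming a Vitest-style runner; the module and function names are hypothetical.

// roadmapMilestones.test.js - Red/Green sketch (hypothetical names)
import { describe, it, expect } from 'vitest';
// This import fails today - the test IS the spec we hand to the agent
import { mergeOverlappingMilestones } from '../js/services/roadmapService.js';

describe('mergeOverlappingMilestones', () => {
  it('collapses milestones whose date ranges overlap', () => {
    const merged = mergeOverlappingMilestones([
      { name: 'Alpha', start: '2025-01-01', end: '2025-02-01' },
      { name: 'Beta', start: '2025-01-15', end: '2025-03-01' },
    ]);
    expect(merged).toHaveLength(1); // Red now; feed the failure to the AI until it goes Green
  });
});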

4. Code Durability: Know What to Throw Away

Not all code deserves love. In an AI world, we must distinguish between Foundation Code and Disposable Code.

  • Foundation Code: Core business logic, data models, contracts. This must be reviewed by a human, typed strictly, and protected.
  • Disposable Code: UI prototypes, scripts, experimental features. Let the AI "vibe code" these. If they work, great. If not, delete them and regenerate. Do not fall in love with the prototype.

Final Words: The End of "Magic", The Beginning of Engineering

The journey of building SMT in 2025 taught me that AI is not a replacement for engineering; it is an amplifier of it. If you amplify chaos, you get "Vibe Coding"—a fast track to a tangled monolith. If you amplify structure, you get velocity.

We are entering a golden age of software development. Code is cheaper, faster, and more accessible than ever before. But let's not kid ourselves: this is not an easy journey. The "magic" of a one-shot prompt wears off the moment you need to maintain that code in production. The real work begins when the browser tab closes and the git commit hook runs.

Until AI agents evolve to possess the intuition of a Principal Engineer—knowing instinctively when to create a service vs. a utility, or when to refactor before building—human oversight remains critical. We are not just "prompters"; we are the guardians of the architecture. We provide the constraints (the Contracts) that allow the AI to be creative without being destructive.

My project SMT survived the "Model Wars" of 2025 not because I wrote better prompts, but because I stopped prioritizing speed and started prioritizing structure.

Don't just vibe. Build.

Sunday, 7 December 2025

A weekend with Antigravity: 17,000 Lines of Code in < 48 Hours

This weekend, I didn't just code. I accelerated.

I spent the last 48 hours pair-programming with Antigravity, an advanced AI coding assistant. We took a legacy JavaScript codebase—riddled with global variables, monolithic files, and "spaghetti" dependencies—and transformed it into a modern, service-oriented architecture.

The results go beyond just "helper" status. The metrics suggest a level of productivity that warps the traditional time-cost equation of software development.

This app started out in 2024 as a scrappy concept MVP, rapidly coded as a single index.html file with inline vanilla JavaScript & CSS - messy spaghetti code that grew and grew, until this weekend, when I decided to just stop, refactor and clobber all the technical debt. The goal? To experience the full journey of coding with AI and improve the code over time, to eventually get a "modern" app deployed. This code was touched by many AI hands: GPT-o1, Gemini, Claude, Codex 5.1, and also my manual tweaks here and there.


The Analytics: Friday to Sunday

Let's look at the raw data from git. This represents activity from Friday, December 5th, to Sunday evening, December 7th.

  • Commits: 60 (non-merges)
  • Files Changed: 89
  • Lines Added: 10,468
  • Lines Deleted: 6,755
  • Total Code Churn: ~17,223 lines
17,000 lines of code touched in one weekend.

The Narrative: Deconstructing the Monolith

When we started on Friday, the application was held together by window objects. Functions were exposed globally, state was scattered across main.js and utils.js, and components were tightly coupled to the DOM via inline onclick handlers.
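
To give a feel for that starting point, here is a hedged before/after sketch; the function, element and service names are illustrative, not the exact ones in the repo.

// Before: a global exposed on window, wired via an inline onclick="savePlan()" attribute
//   window.savePlan = function () { /* reads state scattered across main.js and utils.js */ };

// After: explicit wiring inside a class-based view that delegates to a service
const planningService = { save(plan) { /* persist the plan */ } }; // stand-in for the real service

class PlanningView {
  constructor() { this.currentPlan = null; }
  mount(root) {
    // deterministic DOM wiring - no inline handlers, no window-bound globals
    root.querySelector('#save-plan-btn')
        .addEventListener('click', () => planningService.save(this.currentPlan));
  }
}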

Here is the story of our weekend, told through the commit log:

  1. The Purge of Globals: We began by hunting down window. dependencies. One by one, we replaced fragile global state with rigid, testable Singletons and Services.
  2. Breaking main.js: We took the 1,000+ line main.js file and chopped it up. We extracted logic into dedicated domains: PlanningService, AIService, SystemService, and NotificationManager.
  3. Refactoring the UI: We implemented a VIEW_REGISTRY pattern, moving away from ad-hoc function calls to structured Class-based component management.
  4. Safety Check-ins: You see this pattern in the logs constantly: "Safety checkin - refactoring...". We moved fast, but we didn't break things. The AI allowed me to make sweeping architectural changes with the confidence that I wasn't leaving behind broken references.

The "Antigravity" Effect: AI vs. Human Effort

How long would this have taken a human developer working alone?

The Human Estimate:

To touch 89 files and rewrite the core architecture of an application requires immense cognitive load.

  • Day 1-3: Reading code, mapping dependencies, planning the refactor safe-zones.
  • Day 4-8: Executing the refactor file-by-file. Manually checking imports. Wrangling with loads of ReferenceErrors.
  • Day 9-10: Regression testing. Realizing you broke the "Save" button because it relied on a global variable you deleted three days ago.

Conservative Estimate: 2 Weeks (1 Sprint) of full-time, focused work.

The Antigravity Reality:

We did it in 14.7 hours.

I analyzed the timestamps of every commit. Our total active coding time—the actual time spent typing, refactoring, and verifying—was less than 15 hours.

  • Saturday: ~5 hours of strategic refactoring.
  • Sunday: ~9.5 hours of high-intensity execution.

The AI didn't just "autocomplete" lines. It understood the architectural intent. When I said "Refactor this to use a Singleton pattern", it didn't just write a class; it found every usage of the old global variable across 20 files and updated them instantly. It remembered the context I would have forgotten.

This wasn't just faster coding. It was higher leverage coding. I spent my weekend making high-level architectural decisions, while Antigravity handled the thousands of lines of execution required to make those decisions reality.


Conclusion

If this weekend proved anything, it's that the metric for "developer productivity" is changing. We aren't limited by typing speed or syntax recall anymore. We are limited only by our ability to describe the architecture we want.

My small experiment suggests a software engineer could have compressed two weeks of work into 15 hours!

Did I waste my weekend coding with AI? Maybe...but I'm learning a ton and having fun coding after a very long time...it's like I'm back to my early teenage days of losing myself inside the machine...


Tuesday, 2 December 2025

A week with Antigravity animated by Google Gemini

Codebase Evolution: Time-Lapse

(Animated time-lapse chart of the repo's CSS, JS, HTML and other code over time.)

Fun animations with Google Gemini - my repo stats

Daily Throughput & Key Milestones