Started in 2011 as my outlet for sharing experiences on personal life and work topics. My handle is khanmjk. My friends & colleagues call me "Mo".
This blog is an experiment, a work in progress.
Life: Self-awareness, personal development, growth hacking (quantified self).
Work: Software and Systems Engineering, Leadership, Management, New Ideas and OpEds.
In my previous post, I shared how NotebookLM synthesised the evolution of my experimental app, SMT (Software Management Tools), from my Gemini chat transcripts up to June 2025. I then went further and added just the latest README file as an additional source, to see if NotebookLM could connect the dots from June 2025 to December 2025. I also created a simple dashboard for my repo's storyline using Antigravity, here.
NotebookLM generated this infographic that was spot on!
By June 2025, SMT's main feature was the Yearly Planning page. The inspiration behind the planning feature came from my Amazon AWS experience of the yearly planning mechanism called Operational Planning (OP1 & OP2) cycles. The lack of tooling within the company made the process quite time-consuming, error-prone and not fun at all! We used spreadsheets in either Excel or Quip (Quip is a pain for spreadsheeting!). So SMT was going to make OP planning much more fun, and more accurate as well - especially when calculating net engineering capacity. SMT is a proof-of-concept, but if anyone from AWS is reading this post, feel free to play with the app, grab the codebase and have a go at using it for your team's planning. The app works fine IMHO, but there are still a lot more features to add.
Since the June 2025 MVP (Version 1), I woke up again in November and added a ton of features. My productivity skyrocketed with the release of new LLMs and coding assistants. I added a powerful AI feature which I think is a game changer for software management. I clobbered technical debt, rebuilt the app UX, added quality controls and overall rearchitected the codebase into something more respectable than a toy app coded by AI...
NotebookLM generated this audio overview, mostly correct!
Over the last year, I've been building an experimental app, SMT (Software Management Tools), that started out of curiosity to test LLMs' code-generation abilities, before the proliferation of AI coding agents, integrated development environments and vibe-coding platforms. I worked on this project in my spare time over weekends and evenings, testing the waters - often getting quite frustrated in the early days, sometimes having streaks of productivity and other times just going in circles and breaking things - I often found myself cursing the AI :-) I didn't give up. Despite breaks with long periods of inactivity, I kept monitoring the latest AI news and releases, and continued to test the waters. Each time I learned something new, saw the progress of LLMs, and witnessed the potential this technology has not only to disrupt the software industry, but also to empower people to translate ideas and concepts into prototypes without depending on outsourced developers. The journey of learning continues. I stuck with Gemini because it has, since the beginning, been enormously generous with quotas and large context windows, unlike Claude and ChatGPT at the time. Even today, I prefer to stick with learning just one tool like Antigravity rather than context-switch between others - although my workflow now includes the Antigravity Agent Manager, plus a separate VS Code with Codex to audit changes and pick up where Gemini or Claude Opus fails to complete its tasks.
In this post, I'm sharing some history of SMT. How did it all begin? Interestingly enough, I went through a phase of saving the major chat sessions with Gemini that led to the first MVP of SMT. I saved all my prompts in Google Docs, though I stopped tracking them in June because it got quite tedious! With this chat record, I wanted to see what Google's NotebookLM would make of the doc's contents. Here is what it produced - the power of NotebookLM is super fascinating!
NotebookLM generated this infographic that was spot on!
NotebookLM generated this audio overview, mostly correct!
NotebookLM generated this slide deck...
NotebookLM generated this video...
My chat transcripts with Gemini from April 2025 to June 2025 - 50+ pages!
Sharing some of my workflow experience in building my SMT (Software Management Tools) application using AI as my coding assistant. With the launch of Google's Antigravity platform in November 2025, my curiosity got the better of me! I re-engaged with my codebase after about 5 months of dormancy. In under 2 months I was able to migrate the look and feel to a new UX paradigm, introduce a coding contract and constitutional framework, introduce dark/light mode theming, fundamentally refactor the codebase to best-practice software design patterns, integrate AI features into the app, and clobber technical debt. Thanks to Antigravity... what this video shares is how powerful visual annotations can be for changing UX... As I experienced this journey, I became more aware that the future for UX and Frontend Engineers is going to be disrupted quite radically!
If you’ve ever tried to institutionalize code quality (not just talk about it), you’ll know the trap: quality work is rarely one big change. It’s a hundred small, unglamorous decisions—done consistently—until the codebase starts behaving like a system.
That’s what this post is about.
I’ve been evolving a real planning product codebase (“altsoftwareplanning”)—workflows for planning, roadmaps, org views, capacity tuning, and a growing set of UI/interaction patterns. It shipped features, and I think my app will solve real problems. Yes, it's still a concept, but I wanted to ensure it's being built the right way. And the safety net was thin.
At some point, “it works” stops being good enough.
I wanted a codebase where I could say, with a straight face:
We can prove it works.
Codex 5.2 turned out to be the right partner for the long-running version of this work: contract enforcement, lint hardening, unit testing, E2E testing, coverage instrumentation, CI visibility, and documentation—all in a compounding sequence that actually sticks.
Codex 5.2 CLI helped transform an active product codebase into a quality-first system by enforcing contracts, hardening lint discipline, building broad unit + Cypress E2E coverage, and surfacing coverage artifacts in CI—over a sequence of compounding iterations.
The Landscape: What We Were Working With
You can’t talk quality without acknowledging the stack. The strategy must fit the reality.
CI: GitHub Actions, publishing coverage artifacts on every push/PR
This is no longer a toy repo! It demonstrates patterns for a real product surface area with enough moving parts to punish sloppy changes.
The Problem: Velocity Without Proof
We had signals, but they weren’t a coherent system:
some rules existed (informally)
linting existed (but was noisy and inconsistent)
testing existed (but not at a level that lets you refactor with confidence)
E2E coverage didn’t reflect how users actually flow through the product
coverage existed locally, not visibly
In other words: tribal knowledge plus hope.
The goal wasn’t “add tests.” The goal was a quality ladder—each rung enabling the next.
The Quality Ladder (The Sequence Matters)
Here’s the order of operations that worked, and why:
Codify a contract (design rules, data flow rules, UI rules)
Make lint clean and trustworthy (so warnings mean something)
Cover domain logic with unit tests (turn behavior into executable specs)
Add E2E tests for real workflows (where regressions actually happen)
Instrument coverage (otherwise you’re guessing)
Publish coverage in CI (visibility changes behavior)
Keep docs current (quality must be repeatable)
Codex 5.2 helped me climb this ladder without losing the thread.
Phase 0: Compliance Cleanup (The Real Beginning)
Every long-running quality push has a starting “wake up” moment.
For this repo, that moment looked like compliance cleanup:
removing risky innerHTML usage and window-bound patterns
tightening DOM checks and wiring
simplifying defensive guards that hid intent
This wasn’t glamorous. It was foundational. You can’t build a quality system on top of brittle glue.
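To give a feel for the kind of change involved, here's an illustrative before/after (renderTeamCard and the team shape are hypothetical, not code from the repo):

```js
// Before: risky. Interpolating data into markup invites XSS
// and hides the DOM structure from lint rules and the contract scanner.
function renderTeamCard(team) {
  const card = document.createElement('div');
  card.innerHTML = '<h3>' + team.name + '</h3><p>' + team.summary + '</p>';
  return card;
}

// After: explicit DOM construction. No markup injection, and every
// node is visible to static analysis and tests.
function renderTeamCardSafe(team) {
  const card = document.createElement('div');
  const title = document.createElement('h3');
  title.textContent = team.name;
  const summary = document.createElement('p');
  summary.textContent = team.summary;
  card.append(title, summary);
  return card;
}
```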
Phase 1: Turn Ideas Into Contracts
Before I was willing to scale testing, I needed the codebase to have enforceable rules—especially in the UI layer.
We already had an early foundation:
a contract-scanning script
a “coding contract” that enforces rules like:
no inline styles (even for dynamic values)
DOM-only toolbars (explicit, deterministic wiring)
centralized storage patterns (no random settings scattered across views)
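The contract-scanning script is the piece that turns rules like these from prose into enforcement, and it doesn't need to be clever to be useful. Here's a minimal sketch of the idea in Node.js (the rules, paths and output format are illustrative; the repo's actual script is more thorough):

```js
// scan-contract.js - walk source files and flag contract violations.
const fs = require('fs');
const path = require('path');

// Each rule pairs a name with a pattern that should never appear.
const RULES = [
  { name: 'no-innerHTML', pattern: /\.innerHTML\s*=/, severity: 'high' },
  { name: 'no-inline-styles', pattern: /\.style\.[a-zA-Z]+\s*=/, severity: 'medium' },
  { name: 'no-direct-localStorage', pattern: /localStorage\.(get|set)Item/, severity: 'medium' },
];

function scanDir(dir, violations = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory() && entry.name !== 'node_modules') {
      scanDir(full, violations);
    } else if (entry.isFile() && full.endsWith('.js')) {
      const lines = fs.readFileSync(full, 'utf8').split('\n');
      lines.forEach((line, i) => {
        for (const rule of RULES) {
          if (rule.pattern.test(line)) {
            violations.push({ file: full, line: i + 1, rule: rule.name, severity: rule.severity });
          }
        }
      });
    }
  }
  return violations;
}

const violations = scanDir('./src'); // './src' is an assumed source root
violations.forEach((v) => console.log(`[${v.severity}] ${v.file}:${v.line} ${v.rule}`));
process.exit(violations.length > 0 ? 1 : 0); // non-zero exit fails the CI gate
```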
Then we reinforced it.
What changed (in practical terms)
Inline style mutations were removed or centralized.
View code was pushed toward CSS classes / CSS variables.
Global header/toolbars were wired consistently.
Storage moved behind the repository boundary (e.g., SystemRepository; sketched below).
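To make that concrete, here's roughly what the style and storage changes look like in code (an illustrative sketch: SystemRepository is named in the post, but its API and the key prefix here are assumed):

```js
// Before (contract violations): views mutate styles and storage directly.
//   bar.style.width = pct + '%';
//   localStorage.setItem('theme', 'dark');

// After: persistence flows through one repository boundary.
class SystemRepository {
  constructor(storage = window.localStorage) {
    this.storage = storage;
  }
  getSetting(key, fallback = null) {
    const raw = this.storage.getItem('smt:' + key); // 'smt:' prefix is hypothetical
    return raw === null ? fallback : JSON.parse(raw);
  }
  setSetting(key, value) {
    this.storage.setItem('smt:' + key, JSON.stringify(value));
  }
}

// Views only flip CSS classes; the stylesheet owns the visuals.
function applyTheme(theme, repo) {
  document.body.classList.toggle('theme-dark', theme === 'dark');
  repo.setSetting('theme', theme);
}
```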
This is the crucial insight:
Quality starts with clarity. If you can’t describe the rules, you can’t test them.
Codex 5.2 was useful here because it didn’t just “fix a file.” It was willing to chase contract violations across dozens of small edits until the rule was actually enforced.
We also fixed dependency checks so coverage tooling is first-class and green in CI (depcheck included).
This matters because it changes the social contract:
Coverage stops being “something one dev ran.” It becomes a team-visible signal.
What Codex 5.2 Actually Enabled
This wasn’t “AI wrote tests.” That’s the shallow version of the story.
Codex 5.2 behaved more like a quality software-engineer-in-test (SWET) expert partner who doesn’t lose the thread, working for hours at a stretch (I exhausted my five-hour quota on more than one occasion):
it read the codebase and translated workflows into test cases
it stayed consistent with constraints across many commits
it revisited flaky E2E specs and hardened selectors
it fixed CI failures immediately instead of letting them linger
it updated docs alongside implementation changes
Short tasks are easy.
The long game—contracts + lint + unit + E2E + coverage + CI + docs—requires continuity.
That’s where Codex 5.2 shined.
Hint: If you've been delaying a long-term technical-debt task to introduce unit and integration tests (or migrate to other frameworks), integrated with your CI/CD workflows, you no longer have to do this yourself. Codex 5.2 is a very capable model that can do literally all the heavy lifting for you. Codex can help transform your codebase in a matter of hours, taking its quality to the next level.
The Working Loop I Used (Practical, Not Magical)
If you want to replicate this kind of quality transformation, here’s the loop that worked:
Start with an audit task
“Scan the repo for contract violations and list them by severity.”
“List lint rule violations that prevent a clean run.”
Fix in small diffs
insist on small PR-sized changes
prevent “helpful refactors” that mix concerns
Lock in with tooling
don’t accept “we fixed it once”
enforce it through npm run verify / CI gates
Convert workflows into tests
unit tests for domain rules
E2E specs for user flows
Make coverage visible
separate unit vs E2E
publish artifacts
Document the happy path
how to run tests locally
where coverage lands
what “green” means
Codex 5.2 helped keep this loop tight—especially the follow-through steps that humans tend to procrastinate on.
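To ground step 4 of the loop, here's what the two layers of tests can look like. First, a unit test that turns a domain rule into an executable spec (assuming a Jest-style runner; calculateNetCapacity and its input shape are hypothetical stand-ins for SMT's real capacity logic):

```js
// capacity.test.js - behavior as an executable spec.
const { calculateNetCapacity } = require('../src/services/capacity'); // assumed path

describe('calculateNetCapacity', () => {
  it('deducts leave, on-call and training from gross capacity', () => {
    const result = calculateNetCapacity({
      engineers: 10,
      workingDaysPerYear: 230,
      deductions: { leave: 250, oncall: 120, training: 50 },
    });
    // 10 * 230 = 2300 gross person-days, minus 420 deducted = 1880
    expect(result.netPersonDays).toBe(1880);
  });

  it('never returns negative capacity', () => {
    const result = calculateNetCapacity({
      engineers: 1,
      workingDaysPerYear: 230,
      deductions: { leave: 500 },
    });
    expect(result.netPersonDays).toBe(0);
  });
});
```

And an E2E spec that follows a user journey rather than a single function (the route, labels and data-testid selectors are illustrative, not the repo's actual spec):

```js
// cypress/e2e/yearly-planning.cy.js - a workflow, end to end.
describe('Yearly planning flow', () => {
  it('lets a manager open a plan and see net capacity', () => {
    cy.visit('/planning');
    cy.contains('Yearly Planning').should('be.visible');
    cy.get('[data-testid="team-selector"]').select('Team Phoenix');
    cy.get('[data-testid="net-capacity"]').should('not.be.empty');
  });
});
```

The unit layer makes refactors cheap; the E2E layer catches the wiring regressions that unit tests can't see.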
Before / After (The Shape of the System)
Outcomes (What Changed, Concretely)
By the end of this effort, the repo gained:
a stronger contract + lint foundation
ESLint v9 flat config with warning-free lint (see the sketch after this list)
stylelint + depcheck integrated into the quality gates
a comprehensive unit test suite across core services
a multi-spec Cypress E2E suite covering real user journeys
separate unit and E2E coverage reports
CI workflows that upload coverage artifacts on every commit/PR
updated docs that make the whole thing discoverable and repeatable
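For reference, an ESLint v9 flat config has roughly this shape (a minimal sketch; the repo's actual config and rule set will be stricter):

```js
// eslint.config.js - ESLint v9 "flat" configuration.
import js from '@eslint/js';

export default [
  js.configs.recommended,
  {
    files: ['**/*.js'],
    languageOptions: {
      ecmaVersion: 2024,
      sourceType: 'module',
      globals: { window: 'readonly', document: 'readonly' },
    },
    rules: {
      'no-unused-vars': 'error',
      'no-console': 'warn',
    },
  },
];
```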
This is the difference between:
“we have tests”
and “we have quality.”
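The E2E coverage plumbing is worth sketching too. Assuming the widely used @cypress/code-coverage plugin (the post doesn't name the repo's exact tooling), the wiring looks roughly like this:

```js
// cypress.config.js - registers the coverage tasks that merge
// per-spec counters into .nyc_output for reporting.
const { defineConfig } = require('cypress');

module.exports = defineConfig({
  e2e: {
    baseUrl: 'http://localhost:8080', // assumed dev-server port
    setupNodeEvents(on, config) {
      require('@cypress/code-coverage/task')(on, config);
      return config;
    },
  },
});

// cypress/support/e2e.js would also need:
//   import '@cypress/code-coverage/support';
```

Note that the app bundle itself has to be instrumented (typically via babel-plugin-istanbul or nyc) for the counters to exist at runtime; CI then uploads the generated reports as artifacts.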
Stats
Counts (current HEAD)
Unit tests: 68 test cases across 27 test files
Cypress: 7 specs in cypress/e2e, 39 E2E test cases
Files Touched
94 files changed since linting was introduced
37 files changed since unit tests were added
37 files changed since Cypress tests were added
E2E Coverage Delta (baseline = first coverage workflow commit 68418c)
Statements: 54.60% → 54.61% (+0.01pp)
Branches: 33.34% → 33.34% (+0.00pp)
Functions: 52.90% → 52.93% (+0.03pp)
Lines: 56.90% → 56.91% (+0.01pp)
CI Runtime Before/After (quality.yml, success runs only)
Before coverage/tests in CI (pre‑68418c): avg 29.2s, median 32.0s, n=10.
After coverage/tests in CI (post‑68418c): avg 303.7s (~5:04), median 306.0s, n=6.
tests.yml (unit coverage workflow)
all runs: n=22, avg=32.0s, median=34.0s, min=18.0s, max=42.0s
Onboarding improves because expectations are written down (and executable).
Refactors get cheaper because the safety net is layered.
Velocity improves because you stop paying the “regression tax.”
Confidence increases because you can prove correctness instead of arguing.
The whole point of quality is not to slow down shipping.
It’s to preserve shipping speed as the product grows.
The Reusable Playbook
If you’re trying to do the same in your own codebase, don’t start with a thousand tests. Start with the ladder.
Define and enforce a contract (rules of the road)
Make lint clean and trustworthy
Build unit tests that map to domain logic
Start with smoke E2E tests, then expand into full flows
Instrument coverage for both layers
Publish coverage artifacts in CI
Treat docs as first-class artifacts
Quality is not a tool.
Quality is a system.
I've experienced the power of Codex as my AI coding-agent quality partner - this stuff is real. My app codebase might still be in its infancy, but I know it comes close to what software engineering teams face every day - especially if you're building enterprise tooling apps, as my ex-teams at AWS do. Integrating AI coding agents to handle refactoring, migrations or, as in my case, introducing end-to-end automated quality testing is now a no-brainer with powerful assistants like Codex, and a significant productivity enhancer!
I've been building my SMT App, with the codebase hosted on GitHub, since October 2024. It has been an interesting journey of learning indeed! What started out as an experiment to test an idea of mine (basically, create a set of practical tools for software managers to manage their teams and delivery plans better, pivoted on a clearly mapped-out system architecture), whilst learning and upskilling for the new world of AI-assisted coding, has ended up with me refining and polishing the code to get it to a respectable level, worthy of being used by other people :-)
I'm also a fan of data and visualizations - I thought it would be really cool to have a tool to help visualize the evolution of my codebase. Here's what I created in a few hours:
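If you want to try something similar, the core of such a tool can be surprisingly small: mine git history, aggregate churn, render it. A rough sketch of that idea (hypothetical; not the actual dashboard I built):

```js
// churn.js - aggregate lines changed per month from git history
// and print a crude terminal bar chart.
const { execSync } = require('child_process');

// One header line per commit ("YYYY-MM-DD|hash"), followed by numstat rows.
const log = execSync(
  'git log --reverse --date=short --pretty=format:"%ad|%h" --numstat',
  { encoding: 'utf8' }
);

const churnByMonth = {};
let month = null;
for (const line of log.split('\n')) {
  const header = line.match(/^(\d{4}-\d{2})-\d{2}\|/);
  if (header) {
    month = header[1];
  } else {
    const stat = line.match(/^(\d+)\t(\d+)\t/); // added, deleted, path
    if (stat && month) {
      churnByMonth[month] = (churnByMonth[month] || 0) + Number(stat[1]) + Number(stat[2]);
    }
  }
}

const max = Math.max(...Object.values(churnByMonth));
for (const [m, churn] of Object.entries(churnByMonth)) {
  console.log(`${m} ${'#'.repeat(Math.ceil((churn / max) * 50))} ${churn}`);
}
```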