Showing posts with label Creativity. Show all posts

Friday, 17 April 2026

How Claude Code helped me migrate ETL workflows from serverless functions in one evening

April 15, 2026. South Africa.

Two days. That's how long my ERP ETL workflow had been broken in production. The Function App deploy pipeline — the one that was supposed to be "serverless and simple" — had drifted so far from reality that a hotfix deployed outside CI/CD was the only thing keeping the lights on.

I was staring at a system held together with surgical tape and a prayer. The app had supposedly been stable for months, yet it was brittle in odd places. How does a simple, deterministic ETL process become this unreliable?

The sync Function App was serving a stale runtime that didn't match what was in git. The CI/CD workflow couldn't deploy without breaking things further. The Kudu zipdeploy mechanism — Azure's "just push a zip and we'll figure it out" deployment model — had proven itself fundamentally untrustworthy for my production workloads. And Event Grid, the invisible message bus connecting my extract and sync stages, added complexity I couldn't observe or debug without spelunking through Azure portal logs.

I had a choice: keep patching, or rethink the architecture.

I chose to rethink it. In one session. With an AI coding agent. On a Tuesday night. In under five hours.


Act 1: The Architecture That Looked Good on Paper

When I first built the ETL pipeline, the architecture felt elegant:

Upload -> Blob Storage -> Event Grid -> Python Extract (Function App)
                                            |
                                     Blob Storage (artifacts)
                                            |
                                    Event Grid -> Node Sync (Function App)
                                            |
                                       Cosmos DB

Three Azure Function Apps. Two Event Grid subscriptions. Two separate Kudu zipdeploy pipelines. Two different runtimes (Python for extraction, Node.js for sync). Each with its own host.json, its own publish profile secrets, its own health check probes.

It worked. For a while.

Then the drift started. A deploy would succeed in CI but the runtime would serve stale code. Kudu would report success but the Function App would 404. A "quick fix" deployed through the portal would work, but the next CI deploy would overwrite it with the wrong version. The Python Function App and the Node.js Function App had different deploy mechanisms, different failure modes, and different ways of being quietly broken.

The serverless promise — "just write functions, we handle the rest" — had become: "just debug Azure's deployment infrastructure while your financial planning app serves stale data to executives."

Act 2: The Question That Changed Everything

I asked my AI agent a question:

"Since both services will eventually run from the App Service itself, why do we need Event Grid?"

That's when it clicked. My App Service was already running Express.js, serving the React frontend, proxying API calls. It was a Linux host with Node.js. It was deployed via a single zip push. It just worked.

Why wasn't the ETL pipeline running there too?

The original reason was separation of concerns — extractors in Python, sync in Node.js, each scaling independently. But the reality was: my ETL workload processed one file at a time, took 20 seconds end-to-end, and ran maybe 5 times a day. It didn't need independent scaling. It needed reliable deployment.

Act 3: The Plan (And Why We Threw Half of It Away)

The initial plan was careful and staged:

Stage 1: Move the sync service to the App Service, keep extract on the Function App, use Event Grid to connect them.

Stage 2: Move extract to the App Service, eliminate Event Grid.

We started with Stage 1. Moved all 12 sync/verify scripts. Created an Express router. Wired up audit logging with distinct source tags (etl-sync-appservice vs the Function App's etl-sync) so we could see exactly which code path processed each job.

The clever part: we created re-export stubs that proxied imports to the shared API libraries. This meant the 12 sync scripts could be copied byte-for-byte — zero modifications. The stubs resolved their ../lib/ imports to the shared libraries. One-line files doing the heavy lifting:

// etl/lib/containerConfig.js
export * from '../../api/lib/containerConfig.js';

Then we hit the Event Grid wall.

Act 4: The Event Grid Wall

To register an Event Grid webhook subscription, Azure sends a validation handshake to your endpoint. Your endpoint must respond with a validation code. Simple, right?

Except our App Service was behind Azure AD authentication. Anonymous requests got a 302 redirect to the Microsoft login page. Event Grid's validation request isn't a browser — it doesn't follow redirects or authenticate with Azure AD.

We tried excludedPaths in the auth config. We tried switching to Return401 mode. The excluded paths worked — but Return401 mode broke the entire login flow for the SPA. Users couldn't log in. On production. At night.

We reverted in 30 seconds. Crisis contained. But Event Grid was blocked.

We tried the AAD-authenticated delivery approach. Event Grid can send a bearer token if you configure it with a tenant ID and app registration. But creating the required AzureEventGridSecureWebhookSubscriber app role needed Entra admin permissions. It was 9 PM. The IT admin was asleep.

That's when my own question came back: why do we need Event Grid at all?

Act 5: The Browser Console Trick

We had the sync running on the App Service. We had the Event Grid subscription blocked. But we needed to prove it worked.

The user is already authenticated in the browser. The browser has the App Service auth cookie. What if we just... called the endpoint from the browser console?

fetch("/etl/webhook/sync", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify([{
    eventType: "Microsoft.Storage.BlobCreated",
    subject: "/blobServices/default/containers/etl-artifacts/blobs/...",
    data: { api: "PutBlob" }
  }])
}).then(r => r.json()).then(d => console.log(d));

Response:

{
  "accepted": 1,
  "processedBy": "etl-sync-appservice",
  "timestamp": "2026-04-15T18:28:22.051Z"
}

The latest/* blobs in Azure Storage updated 19 seconds later. Audit logs showed ETL Sync AppService as the actor. Verification passed. Cache manifest bumped.

It worked. The new code path was live.

Act 6: Scrap the Plan, Go All In

The staged plan said "move extract in Stage 2, later." But we'd just proven the architecture worked. The Event Grid approach was blocked anyway. And I realized: if both extract and sync run on the App Service, the upload button can trigger the entire pipeline directly. No Event Grid. No Function Apps. No intermediate blob-event dance.

Upload -> POST /etl/pipeline/run
           -> Python extract (subprocess)
           -> Node sync (in-process)
           -> Cosmos DB updated
           -> Done. 21 seconds.

The Python challenge was real though. The App Service runs Node.js. The extractors use openpyxl to parse Excel files. Our first attempt — pip install openpyxl && npm start as the startup command — killed the app. The Node.js Linux image doesn't reliably support pip in the startup command.

The fix: vendor openpyxl into the deploy package during CI build. The GitHub Actions runner (Ubuntu) has pip. Install there, ship the result:

python3 -m pip install \
  --target .deploy/webapp/etl/extract/.pylibs \
  --quiet \
  "openpyxl>=3.1.0"

The Node.js extract handler adds .pylibs to PYTHONPATH when spawning the subprocess. The Python extractors run unchanged — they don't know they're on an App Service instead of a Function App.

Act 7: The Cutover

Here's the full cutover sequence:

# Stop Function Apps (restartable for rollback)
az functionapp stop --name my-etl-sync --resource-group my-rg
az functionapp stop --name my-etl-extract --resource-group my-rg

# Deploy lands via normal CI/CD -- same workflow that deploys frontend
git push origin master

# Test: upload ERP export, pipeline runs automatically
# Audit logs: "ETL Extract AppService" -> "ETL Sync AppService"
# All SUCCESS. 21 seconds end-to-end.

Rollback:

az functionapp start --name my-etl-sync --resource-group my-rg
az functionapp start --name my-etl-extract --resource-group my-rg

Two commands. No code changes needed. The Function App code was never modified.

Act 8: What We Actually Built

The final architecture:

Admin Console
  -> POST /api/etl/upload (file to blob storage)
  -> POST /etl/pipeline/run {buId, connector, sourceBlobName}
  -> Express router
      -> Download source file from blob
      -> Spawn python3 extractor subprocess
      -> Upload artifacts to blob
      -> Run Node.js sync script
      -> Run Node.js verify script
      -> Promote artifacts to latest/
      -> Update Cosmos DB
      -> Bump cache manifest
      -> Audit log with user identity + source tags
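The steps above boil down to an in-process chain of async stages. A minimal sketch — all stage and field names here are illustrative, not the app's real code:

```javascript
// Run each stage to completion before starting the next; any throw aborts
// the pipeline and surfaces as a failed job.
async function runPipeline(job, stages) {
  const results = [];
  for (const { name, run } of stages) {
    await run(job);
    results.push({ stage: name, status: "SUCCESS" });
  }
  return results;
}

// Example: a job flowing extract -> sync -> verify.
const stages = [
  { name: "extract", run: async (job) => { job.artifacts = ["artifact-1"]; } },
  { name: "sync",    run: async (job) => { job.synced = true; } },
  { name: "verify",  run: async (job) => { if (!job.synced) throw new Error("not synced"); } },
];
```

The point of the sketch: with everything in one process, failure handling is a try/catch, not a distributed-tracing exercise across Event Grid subscriptions.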

No Event Grid subscriptions. No Function App deploys. No Kudu zipdeploy. No publish profile secrets. No runtime drift.

One Express server. One deploy workflow. One zip. One host.

Files added:

File                            Purpose
etl/etlRouter.js                Express router — pipeline, webhook, status, provision, healthz
etl/etlExtractHandler.js        Node.js wrapper spawning Python extractors
etl/sync/*.js                   12 sync/verify scripts (copied unchanged)
etl/extract/extractors/*.py     6 Python extractors (copied unchanged)
etl/lib/*.js                    Re-export stubs (1 line each)
api/lib/etlBlobHelpers.js       Blob operations for ETL
api/lib/etlOperationalLog.js    Operational log adapter

Files untouched: All 6 Python extractors. All 12 sync/verify scripts. The Function App code. The frontend upload handler. Database schemas. Blob storage structure.

Act 9: The Lessons

1. Serverless isn't free

The "no servers to manage" promise is real until deployment breaks. Then you're managing Azure's deployment infrastructure, which is harder to debug than your own server because you can't SSH in, can't see the filesystem, and can't reproduce the environment locally.

2. Event Grid adds invisible coupling

Every Event Grid subscription is an invisible dependency. When it works, it's magic. When it breaks, you're reading Azure portal logs trying to understand why a blob write didn't trigger a function invocation. Moving to direct HTTP calls made the pipeline debuggable, observable, and fast.

3. The best abstraction is sometimes no abstraction

Three Azure Function Apps, two Event Grid subscriptions, and a Kudu zipdeploy pipeline — or one Express route handler. The "simpler" architecture has more moving parts. The "complex" monolith is actually simpler to operate.

4. Source tagging is non-negotiable in migrations

Tagging every audit event and operational log with source: 'etl-sync-appservice' vs source: 'etl-sync' meant we could prove, in production, which code path processed each job. When leadership asks "are we on the new system?" you can point to the audit log and say: yes, see the tag.
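Stamping the tag at the audit-event boundary is cheap. A sketch, with field names invented for illustration rather than taken from the app's schema:

```javascript
// Every event produced by the new code path carries its source tag,
// so the audit log itself proves which system processed each job.
const SOURCE_TAG = "etl-sync-appservice";

function auditEvent(action, details = {}) {
  return {
    ...details,
    action,
    source: SOURCE_TAG,
    at: new Date().toISOString(),
  };
}
```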

5. Copy, don't rewrite

The 12 sync scripts and 6 Python extractors were copied byte-for-byte. Zero modifications. The re-export stubs handled the import path difference. This meant: if anything breaks, it's the new wrapper code, not the battle-tested extraction and sync logic.

6. AI agents are force multipliers, not autopilots

The AI agent wrote the ETL router, the extract handler, the blob helpers, the operational log adapter, the deploy workflow changes, and the Azure CLI provisioning commands. It also broke the production login flow by changing the auth config, tried to pip install in a Node.js startup command (which killed the app), and ran up my GitHub Actions credits.

The human judgment calls that mattered: choosing to disable-and-replace (not dual-write), deciding to skip Event Grid entirely, and saying "let's just test from the browser console." The agent executed brilliantly once pointed in the right direction. But pointing it in the right direction was the hard part.


The Numbers

Metric                         Before                                          After
Azure resources for ETL        2 Function Apps + 3 Event Grid subscriptions    0 (runs on existing App Service)
Deploy mechanisms              3 (webapp + 2 Kudu zipdeploys)                  1 (webapp only)
CI/CD jobs                     5 (often 2 failing)                             3 (all green)
Pipeline latency               45-90s (Event Grid hops + cold starts)          21s (in-process)
Runtime drift risk             High (Function App VFS cache)                   Zero (deploy is atomic zip)
Rollback time                  Unknown (redeploy + pray)                       60 seconds (restart Function App)
Deploy time for ETL changes    7+ min (separate Kudu job, often fails)         0 (included in webapp deploy)

The Takeaway

I spent two days debugging a broken deploy pipeline, burning through both Claude and Codex tokens. Then I pivoted, and spent one evening replacing it entirely.

The old architecture was designed for a future that never came — independent scaling, multi-region ETL, connector-level isolation. The new architecture is designed for the reality I have — a single-region financial planning app that processes a handful of ERP exports a day and needs to be reliable.

Sometimes the bravest engineering decision is to make things simpler.


Both Function Apps remain stopped but restartable. The Event Grid subscriptions exist but deliver to stopped endpoints. The full ETL pipeline runs on a single App Service, deployed by the same workflow that deploys the React frontend.

Total session time: ~5 hours. Total production downtime: ~90 seconds (the auth config incident). Cups of coffee: lost count.

Thursday, 2 April 2026

AI School Fees: The $0 Database That Wasn't: How AI Agents Silently Burned Through My Azure Budget Twice

I told the agent "zero cost." It provisioned 8,000 RU/s of dedicated throughput. I fixed it. It did it again. Here's the full forensic timeline.




The Problem

When I started building this internal enterprise app on Azure, the constraints were clear: free tier only. Azure Cosmos DB gives you 1,000 RU/s free. The app had ~10 containers. The math was simple — shared throughput across the database, stay under 1,000 RU/s, pay nothing.

I documented this everywhere. The agent contract said "RU-frugal." The app rules said "any throughput or retention change must be documented." The SAP feature brief said "Free Tier Guardrails — non-negotiable." The AI feature design explicitly rejected a Cosmos-backed chat history because it "violates the zero Azure cost constraint."

Despite all of this, the AI agent provisioned expensive dedicated throughput — not once, but twice. Both times I had to manually intervene, audit the damage, and harden the codebase to prevent it from happening again.

This is the forensic timeline of what happened, reconstructed from git history.


The Architecture Context

The app is a React + Azure Functions stack backed by Cosmos DB NoSQL. All containers use partition key /pk. The intended cost model was:

  Cosmos DB Free Tier
  ───────────────────
  1 Database  →  shared throughput (400-600 RU/s)
  10 Containers  →  no dedicated throughput
  ───────────────────
  Total: $0/month (within 1,000 RU/s free allowance)

Simple. Except the AI agent had a different idea.


Act 1: The Silent Provisioning (Feb 7, 2026)

What The Agent Did

I asked the AI agent to set up CI/CD scaffolding and infrastructure automation. Commit <sha-1> created scripts/setup-cosmos.sh — a script to provision Cosmos databases and containers. Sounds reasonable. Here's what it actually created:

THROUGHPUT=400

az cosmosdb sql container create \
    --partition-key-path "$PARTITION_KEY" \
    --throughput "$THROUGHPUT"     ← 400 RU/s PER CONTAINER

That --throughput flag on the container create command is the problem. It provisions dedicated throughput per container, not shared throughput at the database level.

The script also created two databases: a production DB and a dev DB. Both got the same treatment.

The Math

  What I asked for:          What the agent provisioned:
  ──────────────────         ──────────────────────────────
  1 DB, shared 400 RU/s     2 DBs, dedicated per-container

  Production:                Production:
    400 RU/s shared            10 containers × 400 RU/s = 4,000 RU/s
    $0 (free tier)             $0.008/hr × 10 = billable

  Dev:                       Dev:
    Emulator (local)           10 containers × 400 RU/s = 4,000 RU/s
    $0                         $0.008/hr × 10 = billable

  Total: ≤ 1,000 RU/s       Total: ~8,000 RU/s dedicated
  Cost: $0/month             Cost: Azure billing surprise

The agent created 8x the intended throughput across two databases, all with dedicated provisioning that can't be scaled below 400 RU/s per container. The Cosmos free tier's 1,000 RU/s allowance was instantly overwhelmed.
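A back-of-envelope check makes the blowup concrete. This treats the per-container hourly rate quoted above as illustrative rather than exact Azure pricing:

```javascript
// Rough monthly cost of dedicated per-container throughput, using the
// post's quoted ~$0.008/hr per 400 RU/s container as an illustrative rate.
function monthlyDedicatedCost(containers, ratePerContainerHour = 0.008) {
  const HOURS_PER_MONTH = 730; // ~365.25 days * 24 hours / 12 months
  return containers * ratePerContainerHour * HOURS_PER_MONTH;
}

// Two databases x 10 containers = 20 billable containers,
// versus the intended $0/month on shared free-tier throughput.
monthlyDedicatedCost(20);
```

Even at a small hourly rate, twenty always-on billable containers add up to a triple-digit monthly surprise.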

Why It Happened

The agent treated database provisioning as a standard infrastructure task. It knew Cosmos needs throughput. It picked the per-container model (which is the more common pattern in documentation and tutorials) without considering that:

  1. Shared throughput exists and is the correct model for cost-sensitive workloads
  2. A dev database in the cloud is unnecessary when the Cosmos emulator exists
  3. 400 RU/s is a floor, not a ceiling — you can't go lower with dedicated provisioning
  4. The cost rules in the project docs explicitly prohibited this

Act 2: The First Cleanup (Feb 22, 2026)

I discovered the cost spike through Azure billing alerts and immediately performed a forensic audit. Commit <sha-2> documents the full cleanup in a cost plan document that reads like an incident post-mortem.

The Damage Assessment

From the cost plan doc I wrote at the time:

"Legacy dedicated-throughput DBs still exist and still bill baseline RU: <app-db> → 10 containers × 400 RU/s dedicated. <app-db>-dev → 10 containers × 400 RU/s dedicated."

The Fix: V2 Databases with Shared Throughput

I created new databases with V2costsaver in the name (yes, I literally named them to remind future agents about cost) and rewrote the setup script:

  Before (agent's version):              After (my fix):
  ─────────────────────────              ──────────────────────────
  THROUGHPUT=400                         DB_THROUGHPUT="${DB_THROUGHPUT:-400}"

  az cosmosdb sql container create \     az cosmosdb sql database create \
    --throughput "$THROUGHPUT"              --throughput "$DB_THROUGHPUT"
                                           ← shared at DB level
  (per container = expensive)
                                         az cosmosdb sql container create \
                                           ← NO --throughput flag
                                           (inherits from database)

Then I ran the decommission:

  1. Created V2 databases with shared throughput
  2. Migrated all production data
  3. Added rollback support (--rollbackToV1Cosmos flag)
  4. Verified all 6 cutover gates passed
  5. Deleted both V1 databases
  6. Applied Azure budget alerts: $300/month cap with alerts at 50%, 80%, 100%
  7. Added Cosmos daily RU spike alert (> 2M RU in 24h)

The Emulator Decision

Six days later (Feb 28, commit <sha-3>), I made a harder decision: eliminate the cloud dev database entirely. The local Cosmos emulator would serve as the dev environment. This meant:

  • Zero cloud cost for development
  • Dev database routing consolidated into an emulator-first mode in dbResolver.js
  • A new mirror-to-emulator.mjs script for refreshing local dev data
  • The cloud dev DB (<app-db>-dev-V2costsaver) was decommissioned

Final state: one production database at 600 RU/s shared throughput — well within the 1,000 RU/s free tier allowance. Cost: $0/month.


Act 3: The Regression (Mar 3, 2026)

Five days later, the AI agent struck again.

Commit <sha-4> — a large feature commit (30 files, 3,425 insertions) implementing fiscal-year structural changes — quietly re-introduced the cloud dev database code path that I had just removed.

What The Agent Changed

In api/lib/dbResolver.js, the agent rewrote the database mode resolver. My Feb 28 version had consolidated all non-production paths to route to the emulator. The agent's version re-expanded them:

  My version (Feb 28):                   Agent's version (Mar 3):
  ────────────────────                   ────────────────────────
  if (shouldUseEmulator())               if (hasArg(EMULATOR_FLAG))
    return 'emulator';                     return 'emulator';
  if (shouldUseSupportDevDb())           if (hasArg(SUPPORT_DEV_FLAG))
    return 'support_dev_db';               return 'support_dev_db';   ← RE-ADDED
                                         if (isTruthy(COSMOS_USE_EMULATOR))
                                           return 'emulator';
                                         if (isTruthy(COSMOS_USE_SUPPORT_DEV_DB))
                                           return 'support_dev_db';   ← RE-ADDED

The 'support_dev_db' return path was back. The DEFAULT_DB_NAMES object still had supportDev: '<app-db>-dev-V2costsaver'. Combined with the init-cosmos.js script's createIfNotExists calls, this meant any script invocation with the dev flag would recreate the cloud dev database.

Why It Happened Again

The agent was working on a large feature (fiscal-year scoping) that touched the database layer. It needed to understand how database names were resolved across environments. Rather than preserving my carefully consolidated emulator-first logic, it re-derived the resolution function from first principles — and landed on the same multi-path pattern I had specifically eliminated.

The agent didn't know why those paths had been removed. It saw the pattern as "incomplete" and "helpfully" restored it. The commit message says nothing about database mode changes — they were buried in a 3,400-line feature diff.


Act 4: The Permanent Fix (Mar 29, 2026)

I'd had enough. Commit <sha-5> — 59 files changed, 277 insertions, 273 deletions — was a comprehensive retirement of all cloud dev database targeting across the entire codebase.

The Hard Guards

This time I didn't just remove the code paths. I made them impossible to restore:

1. Setup script errors on dev:

  # scripts/setup-cosmos.sh
  dev|--useSupportDevDB)
      echo "Cloud dev Cosmos setup is retired."
      echo "Use the local Cosmos emulator for development."
      exit 1

2. Runtime assertion in dbResolver.js:

  assertNoCloudNonProdDatabaseTarget()
  ────────────────────────────────────
  IF target DB ≠ production DB
  AND endpoint host ≠ localhost / 127.0.0.1 / emulator
  THEN → throw Error (hard crash)
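The assertion above can be sketched in a few lines. The function name mirrors the post; the endpoint parsing and host allowlist are simplified for illustration:

```javascript
// Hosts treated as the local Cosmos emulator.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1"]);

// Hard crash if any non-production database is targeted on a cloud endpoint.
function assertNoCloudNonProdDatabaseTarget({ dbName, prodDbName, endpoint }) {
  const host = new URL(endpoint).hostname;
  if (dbName !== prodDbName && !LOCAL_HOSTS.has(host)) {
    throw new Error(
      `Refusing non-production database "${dbName}" on cloud endpoint ${host}. ` +
      "Use the local Cosmos emulator for development."
    );
  }
}
```

Called at startup, this turns a silent cost regression into an immediate, loud crash.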

3. Dev flags redirected to emulator: Any code passing --useSupportDevDB or setting COSMOS_USE_SUPPORT_DEV_DB=true now silently routes to the emulator instead of a cloud database.

4. Seed scripts refuse cloud non-prod targets: If the connection string points to Azure (not localhost), the seed scripts refuse to operate on non-production databases.

5. Default throughput documented at 600 RU/s: The setup script now defaults to 600 RU/s shared — within free tier — with the value explicitly visible in the script header.

The 59-File Sweep

The retirement touched every layer:

  Layer                     Files Changed    What Changed
  ─────────────────────     ─────────────    ─────────────────────────────────
  Database resolver         1                Hard assertion + emulator redirect
  Setup/provisioning        1                Dev path → error exit
  API scripts (20+)         23               All routed through new guards
  ETL scripts (JS+Python)   4                cosmosDbNames updated
  CI workflow               1                Dev DB references removed
  Documentation             8                Updated to emulator-first model
  Dev tooling               2                Local settings + dev script

The Numbers

Metric                   Wave 1 (Feb 7)            After Fix 1 (Feb 22-28)   Wave 2 (Mar 3)               After Fix 2 (Mar 29)
Cloud databases          2 (prod + dev)            1 (prod only)             1 + code path for 2nd        1 (prod only, hardened)
Throughput model         Dedicated per-container   Shared per-database       Shared (but dev path live)   Shared, dev path blocked
Provisioned RU/s         ~8,000                    600                       600 (risk of +400)           600
Free tier compliant      No                        Yes                       Fragile                      Yes (enforced)
Guard rails              Docs only                 Docs + script rewrite     Regressed                    Runtime assertion + error exits
Files with dev DB refs   Growing                   Consolidating             Re-expanded                  0 (retired across 59 files)

What I Learned

1. Documentation Is Necessary But Not Sufficient

I had cost rules in agent-contract.md, app-rules.md, feature design docs, and the SAP brief. The rules said "RU-frugal," "zero Azure cost constraint," "Free Tier Guardrails — non-negotiable." The agent read them. The agent still provisioned dedicated throughput. Rules written in prose are suggestions. Rules written in code are enforcement.

2. AI Agents Optimize Locally, Not Globally

When the agent created the setup script, it was solving a local problem: "provision Cosmos containers." It picked the pattern most common in Azure documentation (dedicated throughput per container) without reasoning about the global cost constraint. When it re-introduced the dev DB path in Wave 2, it was solving another local problem: "make the database resolver more explicit." Both times, the agent's local optimization violated a global invariant.

3. Large Commits Hide Regressions

The Wave 2 regression was buried in a 3,425-line feature commit. The commit message mentioned fiscal-year changes, not database mode changes. If I'd reviewed only the commit message and stat, I'd have missed the dbResolver.js rewrite entirely. AI agents that make large commits need automated invariant checks, not just human code review.

4. "Remove" Is Not "Prevent"

My Feb 28 fix removed the cloud dev DB code path. My Mar 29 fix prevented it from being restored. The difference: runtime assertions that crash the process, script entry points that error on dev arguments, and seed scripts that refuse non-production targets on cloud endpoints. If you remove something from an AI-maintained codebase, you must also add a guard that prevents its resurrection.

5. Name Your Databases After Your Constraints

I named the V2 database *-V2costsaver. It's ugly. It's also the only thing in the codebase that survived every agent refactor without being renamed. Sometimes the best documentation is a name that makes the constraint impossible to ignore.


Try This Yourself

  1. Audit your IaC scripts for throughput flags. Search for --throughput in any Cosmos provisioning script. If it's on a container create (not a database create), you're paying per-container minimums.
  2. Add runtime guards, not just documentation. If your app must never target a cloud dev database, add an assertion that crashes on startup if it detects a non-production database on a cloud endpoint.
  3. Review large AI commits file-by-file. Don't trust commit messages for scope. A "fiscal-year feature" commit can silently regress your cost model.
  4. Set Azure budget alerts immediately. I should have done this on day one. A $300/month cap with 50%/80%/100% alerts would have caught Wave 1 within days instead of weeks.
  5. Use the emulator for dev. The Cosmos emulator is free, runs locally, and eliminates an entire category of cloud cost risk. If you're paying for a cloud dev database, ask yourself why.

The agent contract, the app rules, the feature design docs — none of them stopped this. What stopped it was a throw new Error() in the database resolver. Trust but verify. Then add a guard.


Mo Khan is just an old-timer engineer-turned-manager who forgot how fun it is to build things — and who learned the hard way that AI agents read your cost rules but don't always follow them.

How Codex Autonomously Migrated Our Production App Across Continents in 28 Hours

One runbook. One AI agent. Zero portal clicks. A full SWA-to-App-Service migration from the US to South Africa.



The Problem: Your Frontend Is on the Wrong Continent

Our internal financial business intelligence tool — a React SPA backed by Azure Functions and Cosmos DB — had a geography problem. When I rapidly built the MVP, I leaned on free cloud services: they were enough to prove the concept, and since the tool would only be used internally by a small group of users, I figured I could get away with free Azure tiers indefinitely. As the MVP evolved into a real release, it became clear I had to do something about latency, cross-region calls, data sovereignty, and the inherent limitations of free cloud services. A migration was unavoidable.

The frontend was hosted on Azure Static Web Apps in the US (since Azure does not provide this capability in South Africa and my original POC MVP was built as a static web app with local storage). The database and all backend services lived in South Africa North. Every API call crossed the Atlantic and back.

  • Cross-region latency on every Cosmos DB query — users in South Africa waited for round-trips to the US and back to South Africa
  • Data sovereignty concerns — even static HTML was served from US infrastructure
  • Architectural complexity — a free-tier SWA in the US proxying to paid Functions in South Africa made cost attribution and debugging harder than it needed to be
  • Auth coupling — SWA's built-in auth model injected identity in a platform-specific format that wouldn't survive a hosting change

The decision was made: move everything to South Africa. Same region as the data. Same region as the users.

But this wasn't just a redeploy. SWA's managed Functions, built-in auth, and SPA hosting all needed replacements. The target was a Linux App Service running Express, a standalone Azure Functions app, EasyAuth with a dedicated Entra app registration, and a completely new CI/CD pipeline — all while keeping the existing SWA running as a live fallback, and with frugality as the driving constraint, aiming for the lowest-cost options throughout.

The question was: could an autonomous AI agent execute the entire migration from a runbook — provisioning Azure resources, writing code, deploying infrastructure, and cutting over production — without a single portal click?


The Cast

This project used the same three-actor model I described in my previous post about the AI service migration:

Me — architect and orchestrator. I wrote the runbook, reviewed it across 7 sessions with Claude, made the cutover decisions, and performed final manual validation.

Claude (Opus) — planning partner. Claude reviewed the runbook across 7 dedicated sessions between March 6-26, catching missing auth flows, underspecified identity migration paths, and gaps in the rollback strategy.

Codex — autonomous executor. Codex received the runbook and executed it end-to-end across March 29-30: provisioning Azure resources, writing code, deploying to production, running identity backfills, enabling EasyAuth, and cutting over to the new stack.

┌─────────────┐                         ┌─────────────┐
│             │   7 review sessions     │             │
│   Human     │◄───────────────────────►│   Claude    │
│  Architect  │   runbook + review      │  (Opus)     │
│             │────────────────────────►│  Reviewer   │
└──────┬──────┘                         └─────────────┘
       │
       │  runbook
       │
       ▼
┌─────────────────────────────────────────────────────┐
│                     Codex                           │
│                Autonomous Executor                  │
│                                                     │
│  Day 1 (Mar 29): Provision + Code + Deploy          │
│  Day 2 (Mar 30): Auth + Identity + Cutover          │
│                                                     │
│  Azure CLI │ GitHub CLI │ Node.js │ PowerShell      │
│  14 files created │ 18 files modified               │
│  537 tests passing │ 12 user identities migrated    │
└─────────────────────────────────────────────────────┘

The Architecture: Before and After

Before: Cross-Region SWA

The existing architecture had the frontend and its managed Functions in the US, making cross-Atlantic calls to Cosmos DB in South Africa on every API request.

┌─────────────────────────────────────────────────────────────────┐
│                        BROWSER (South Africa)                   │
│   React SPA ──── fetch('/api/*') ────►                          │
└─────────────────────┬───────────────────────────────────────────┘
                      │
    🔻 Atlantic crossing (~180ms RTT)
                      │
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│         Azure Static Web Apps  (US Region)                      │
│                                                                 │
│  ┌─────────────────┐  ┌────────────────────────────┐            │
│  │  SWA Built-in   │  │  SWA-Managed Functions     │            │
│  │  Auth (EasyAuth)│  │  (co-located in US)        │            │
│  │  SWA headers    │  │                            │            │
│  │  Platform-      │  │  /api/me                   │            │
│  │  specific format│  │  /api/data                 │            │
│  └─────────────────┘  │  /api/ai/chat              │            │
│                       │  /api/etl/upload           │            │
│  Serves React SPA     │  ... 40+ API endpoints     │            │
│  (static files US)    └──────┬─────────────────────┘            │
└──────────────────────────────│──────────────────────────────────┘
                               │
              🔻 Another Atlantic crossing
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                   South Africa North                            │
│                                                                 │
│  ┌────────────────┐  ┌────────────┐  ┌───────────────┐          │
│  │  Cosmos DB     │  │ ETL Extract│  │  Blob Storage │          │
│  │  (all data)    │  │  (Python)  │  │  (SAP exports)│          │
│  └────────────────┘  │  ETL Sync  │  └───────────────┘          │
│                      │  (Node.js) │                             │
│                      └────────────┘                             │
└─────────────────────────────────────────────────────────────────┘

Problems:
  ✗ Every API call crosses the Atlantic twice (browser → US → SA → US → browser)
  ✗ Static files served from US for South African users
  ✗ Auth format is SWA-specific (platform lock-in)
  ✗ SWA-managed Functions can't be independently scaled or monitored
  ✗ Cost attribution across regions is opaque

After: Single-Region App Service

Everything co-located in South Africa North. The Express server handles SPA hosting and proxies API calls to a standalone Functions app — all in the same region as Cosmos DB.

┌─────────────────────────────────────────────────────────────────┐
│                        BROWSER (South Africa)                   │
│   React SPA ──── fetch('/api/*') ────►                          │
│   Same-origin requests, ~5ms to App Service                     │
└─────────────────────┬───────────────────────────────────────────┘
                      │
                      ▼  (same region!)
┌─────────────────────────────────────────────────────────────────┐
│               All South Africa North                            │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │   App Service B1 Linux  (Express server)                  │  │
│  │                                                           │  │
│  │   ┌───────────────┐  ┌───────────────────────────────────┐│  │
│  │   │  EasyAuth     │  │  Express Web Host                 ││  │
│  │   │  (Entra ID)   │  │                                   ││  │
│  │   │  Dedicated app│  │  /healthz → direct 200            ││  │
│  │   │  registration │  │  /api/*   → proxy to Functions    ││  │
│  │   │  Claims-array │  │  /*       → serve dist/index.html ││  │
│  │   │  format       │  │  dist/assets/* → immutable cache  ││  │
│  │   └───────────────┘  └────────┬──────────────────────────┘│  │
│  └───────────────────────────────│───────────────────────────┘  │
│                                  │                              │
│                                  │  x-internal-proxy-secret     │
│                                  │  x-ms-client-principal       │
│                                  ▼                              │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Standalone Functions App  (Consumption plan)             │  │
│  │                                                           │  │
│  │  AUTH_MODE=appservice                                     │  │
│  │  Validates proxy secret → parses claims-array             │  │
│  │  IP restrictions: App Service outbound IPs only           │  │
│  │                                                           │  │
│  │  /api/me  /api/data  /api/ai/chat  /api/etl/upload        │  │
│  │  ... 40+ endpoints (same business logic, new auth mode)   │  │
│  └─────────────┬─────────────────────────────────────────────┘  │
│                │                                                │
│                ▼  (same region, ~1ms)                           │
│  ┌────────────────┐  ┌────────────┐  ┌───────────────┐          │
│  │  Cosmos DB     │  │ETL Extract │  │  Blob Storage │          │
│  │  (same region!)│  │(unchanged) │  │  (unchanged)  │          │
│  └────────────────┘  │ETL Sync    │  └───────────────┘          │
│                      │(unchanged) │                             │
│                      └────────────┘                             │
└─────────────────────────────────────────────────────────────────┘

Improvements:
  ✓ All traffic stays in South Africa — no cross-region hops
  ✓ Express serves SPA + proxies to Functions in same region
  ✓ Dedicated EasyAuth with claims-array auth (no SWA lock-in)
  ✓ Functions independently scalable and monitorable
  ✓ IP-restricted: Functions only accept traffic from App Service
  ✓ Shared-secret trust boundary on every proxied request
  ✓ SWA kept as parked standby for emergency failover
  ✓ Cost: +$13/month for the App Service plan

The Auth Migration

This deserves its own diagram because it was the hardest part of the migration. SWA and App Service EasyAuth present identity differently. The backend had to understand both.

  SWA Auth (before):                    App Service Auth (after):
  ──────────────────                    ────────────────────────
  x-ms-client-principal                 x-ms-client-principal
  │                                     │
  ▼                                     ▼
  Base64 → JSON                         Base64 → JSON
  {                                     {
    userId: "abc",                        claims: [
    userRoles: ["admin"],                   { typ: "oid", val: "abc" },
    identityProvider: "aad"                 { typ: "email", val: "..." },
  }                                         { typ: "roles", val: "admin" }
                                          ]
  Top-level fields                      }
  (SWA-specific)
                                        Claims array
                                        (standard Entra format)

  AUTH_MODE=swa                         AUTH_MODE=appservice
  + no proxy secret needed              + x-internal-proxy-secret required
  + SWA manages the hop                 + timing-safe secret validation
                                        + claims-array parsing
                                        + identity canonicalization
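
A minimal normalizer that accepts both shapes of the header might look like the sketch below. Function and field names are illustrative, and the assumption that SWA's `userDetails` carries the email is mine; only the two payload shapes are taken from the diagram above.

```javascript
// Decode the x-ms-client-principal header and return one normalized identity,
// regardless of whether it came from SWA or App Service EasyAuth.
function normalizePrincipal(headerValue) {
  const decoded = JSON.parse(
    Buffer.from(headerValue, "base64").toString("utf8")
  );

  if (Array.isArray(decoded.claims)) {
    // App Service EasyAuth: standard Entra claims array.
    const claim = (typ) =>
      decoded.claims.filter((c) => c.typ === typ).map((c) => c.val);
    return {
      userId: claim("oid")[0] ?? null,
      email: claim("email")[0] ?? null,
      roles: claim("roles"),
    };
  }

  // SWA: platform-specific top-level fields.
  return {
    userId: decoded.userId ?? null,
    email: decoded.userDetails ?? null, // assumption: userDetails holds the email
    roles: decoded.userRoles ?? [],
  };
}
```

Keeping both branches in one function is what lets the backend run under `AUTH_MODE=swa` and `AUTH_MODE=appservice` during the parallel-validation window without duplicating route logic.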

Act 1: The Runbook

Why This Migration Needed a Runbook

This wasn't a "lift and shift." Moving from SWA to App Service touched:

  • 4 Azure resources to provision (App Service plan, web app, Functions app, storage account)
  • 56 app settings to migrate from SWA to the standalone Functions app
  • 12 user identities to canonicalize from SWA format to Entra format
  • A new auth mode (App Service EasyAuth with claims-array parsing)
  • A new web host (Express server replacing SWA's built-in hosting)
  • 2 CI/CD pipelines running in parallel during validation
  • An ETL pipeline that needed seamless ownership transfer between workflows
  • A parked standby mode for the old SWA (not decommission — failover readiness)

The runbook grew to 21 sections with 2,300+ lines. Every Azure CLI command. Every app setting category. Every auth claim extraction rule. Every verification checkpoint.

Seven Review Sessions

Before Codex touched anything, Claude reviewed the runbook across seven dedicated sessions between March 6 and 26, 2026:

  Session  Date             Focus
  1        Mar 6            Initial architecture validation and scope framing
  2        Mar 25 (18:55)   Plan validity: are the phases correctly sequenced?
  3        Mar 25 (19:34)   Autonomous execution review: can an AI agent run this without portal clicks?
  4        Mar 25 (20:03)   Architecture and design review: is the auth migration sound?
  5        Mar 25 (20:36)   Implementation plan: critical path validation
  6        Mar 25 (20:59)   Expert engineer review: what would a senior engineer push back on?
  7        Mar 25 (21:24)   Deep review: identity migration, rollback, and edge cases

Key issues caught during review:

  • Identity continuity gap: The initial runbook assumed user IDs would carry over. Claude caught that SWA uses platform-managed service principals while App Service EasyAuth uses Entra object IDs — a completely different identity format. This led to adding the userIdentity.js canonicalization layer and the one-time backfill script.
  • Auth lightweight path: The verifyTokenLightweight function used by AI chat was SWA-only. Without an App Service equivalent, AI chat would break silently after migration.
  • ETL upload streaming: If Express body-parsing middleware was added before the /api proxy, multipart ETL uploads would break. The runbook was updated to explicitly forbid express.json() ahead of the proxy mount.
  • Rollback strategy: The original plan assumed SWA decommission. I later changed my mind and pushed for a parked-standby model instead — keep SWA deployable as emergency failover, not delete it.

Act 2: The Execution

Codex received the runbook and started working on March 29, 2026 at 14:56 SAST.

Day 1: Infrastructure and Code (March 29)

  Time    Event
  14:56   Inventory capture + code implementation (auth, identity, server, tests)
  15:22   Azure provisioning: App Service plan, web app, Functions app, storage account
  15:41   Identity backfill dry-run: 12 users scanned, 12 canonical migrations found
  15:42   Identity backfill applied: 12 users migrated, zero conflicts
  16:43   Web deploy (first attempt — Windows zip failed, rebuilt with POSIX paths)
  17:00   API deploy: standalone Functions packaging fixed, proxy verified
  17:06   Auth blocker: Entra app registration failed (insufficient tenant privileges)
  18:05   Full verification: 56/56 config parity, health green, smoke blocked only by auth
  19:18   Workflow + deploy path hardening committed
  22:10   API deploy recovery: Functions-action produced 503; switched to source-only Kudu
  23:33   Kudu false-negative analysis: rsync symlink errors masked a healthy deploy
  23:56   Both GitHub Actions workflows green. SA web + API deployed successfully.

Day 2: Auth, Validation, and Cutover (March 30)

  Time    Event
  14:32   Entra auth unblocked: dedicated app registration created with new privileges
  14:40   EasyAuth enabled: login redirect verified working
  15:01   Identity re-audit: 9 canonical, 2 clean migrations, 1 overlap detected
  15:15   Overlap identity fix + verification hardening deployed
  15:48   Auth flow correction: enableIdTokenIssuance was false, fixed live
  16:05   Workflow smoke alignment: accept EasyAuth-protected probe responses
  17:18   ETL admin regression: EtlPipelineView used wrong role authority, fixed
  18:24   Documentation strategy rewrite: park SWA, don't decommission
  21:03   ETL ownership switched to SA workflow
  21:22   Final cutover: SWA parked, SA primary, both workflows green
  21:34   Failover drill fix: workflow_dispatch jobs were gated to push-only
  21:49   Failover drill complete: SWA restored, re-parked, verified end-to-end

Act 3: The Battles

Autonomous doesn't mean smooth. Codex hit real obstacles and worked through them.

Battle 1: The Windows Zip

The first web deploy failed because the zip archive built on Windows contained backslash paths. Azure's OneDeploy rejected them. Codex rebuilt the package with POSIX-style paths and redeployed successfully.

Battle 2: The Functions 503

The standard Azure/functions-action@v1 with pre-built node_modules produced a deployed Functions app that returned 503. Codex diagnosed it, switched to source-only Kudu zipdeploy with SCM_DO_BUILD_DURING_DEPLOYMENT=true (matching the pattern already proven by the ETL sync app), and restored the API to 200.
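
The recovery pattern reduces to two CLI moves, sketched below. Resource names are placeholders, not the project's actual values, and the exact packaging rules in the real workflow are more involved than this.

```shell
# Enable the remote Oryx build so Kudu installs dependencies server-side,
# against the Linux runtime, instead of trusting node_modules built elsewhere.
az functionapp config appsettings set \
  --resource-group <rg> --name <functions-app> \
  --settings SCM_DO_BUILD_DURING_DEPLOYMENT=true

# Package source only (no node_modules) and push it via Kudu zipdeploy.
zip -r api-src.zip . -x "node_modules/*"
az functionapp deployment source config-zip \
  --resource-group <rg> --name <functions-app> --src api-src.zip
```

The key design choice is "source-only": shipping pre-built dependencies couples the package to the build machine's OS, which is exactly what caused both the 503 here and the Windows-zip failure in Battle 1.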

Battle 3: The Kudu False Negative

After fixing the deploy shape, Kudu still reported "failed" — because rsync couldn't create node_modules/.bin/* symlinks. But the app was actually healthy. Codex analyzed the log pattern, hardened the Kudu helper script to recognize this specific false negative, and added a health-gated fallback.
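
That hardening can be sketched as a small classifier over the deploy log. The log patterns and function name below are illustrative; the real helper script is not shown in this post.

```javascript
// Decide what a Kudu deploy result actually means. A reported failure caused
// only by rsync being unable to create node_modules/.bin symlinks, combined
// with a passing health probe, is treated as a false negative.
function classifyKuduResult(reportedStatus, logText, healthProbeOk) {
  if (reportedStatus === "success") return "deployed";
  const symlinkOnlyFailure =
    /rsync/.test(logText) &&
    /node_modules\/\.bin/.test(logText) &&
    /symlink/i.test(logText);
  if (symlinkOnlyFailure && healthProbeOk) return "deployed-false-negative";
  return "failed";
}

console.log(
  classifyKuduResult(
    "failed",
    "rsync: failed to create symlink node_modules/.bin/tsc",
    true
  )
); // "deployed-false-negative"
```

The health probe is the safety net here: the log pattern alone only nominates a candidate false negative, and a real outage with a similar log would still be reported as failed.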

Battle 4: The Tenant Privilege Blocker

Creating the Entra app registration required Application Administrator privileges that Codex didn't have on Day 1. This blocked EasyAuth completely. I resolved the privilege overnight, and Codex resumed on Day 2.

Battle 5: The ID Token Gap

After enabling EasyAuth, browser logins failed silently. The Entra app registration had enableIdTokenIssuance=false, but App Service EasyAuth requests response_type=code id_token. Codex found this, set the flag to true via CLI, and updated both the provisioning and verification scripts to treat it as required state.
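
The fix itself is a one-liner. The app ID below is a placeholder, and the flag name is the current Azure CLI parameter corresponding to the manifest's enableIdTokenIssuance setting (verify against your CLI version; it requires the Microsoft Graph-based `az ad` commands):

```shell
# App Service EasyAuth requests response_type=code id_token, so the app
# registration must be allowed to issue ID tokens.
az ad app update --id <app-registration-id> --enable-id-token-issuance true
```

Encoding this as required state in the provisioning and verification scripts, as Codex did, is what keeps the fix from regressing on the next re-provision.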

Battle 6: The ETL Role Regression

The ETL admin page broke for the admin user on the new stack. Root cause: EtlPipelineView preferred raw tokenRoles (which under App Service auth is just ["authenticated"]) over the database-backed profileRoles. Codex fixed the precedence and added a regression test.
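
The corrected precedence can be sketched as follows (the function name is illustrative, not the component's actual code):

```javascript
// Under App Service auth, token roles collapse to ["authenticated"], which
// carries no authorization signal. Only prefer token roles when they contain
// something meaningful; otherwise fall back to database-backed profile roles.
function effectiveRoles(tokenRoles, profileRoles) {
  const meaningful = (tokenRoles || []).filter(
    (r) => r !== "authenticated" && r !== "anonymous"
  );
  if (meaningful.length > 0) return meaningful; // SWA-style real roles
  return profileRoles || [];
}

console.log(effectiveRoles(["authenticated"], ["admin"])); // [ 'admin' ]
console.log(effectiveRoles(["admin"], ["viewer"]));        // [ 'admin' ]
```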


Act 4: The Numbers

  Metric                          Value
  Total execution time            ~28 hours across 2 days
  Files created                   14
  Files modified                  18
  Tests passing                   537 across 149 test files
  User identities migrated        12
  App settings migrated           56 (verified parity)
  Azure resources provisioned     4 (plan, web app, Functions app, storage)
  GitHub Actions workflows        2 running in parallel, both green
  Execution ledger entries        30+ timestamped operations
  Portal clicks                   0
  Incremental monthly cost        +$13 (one B1 Linux plan)

What Got Deployed

  • Express web host serving the React SPA with immutable asset caching
  • API proxy with shared-secret trust boundary and 180s timeout
  • Standalone Functions app with AUTH_MODE=appservice and IP restrictions
  • Dedicated Entra app registration with EasyAuth
  • App Service auth parser with claims-array extraction and timing-safe secret validation
  • Identity canonicalization layer with SWA-to-Entra migration
  • Kudu zipdeploy helper with false-negative resilience
  • Curated deploy artifact with dependency pruning
  • ETL workflow parity with ownership switch variable
  • SWA parked standby with verified failover drill

What I Learned

The Execution Ledger Pattern

The most valuable artifact wasn't the code — it was the execution ledger. Every action Codex took was recorded with timestamp, phase, command, sanitized result, and next action. This append-only log became the working memory across sessions and the audit trail for the entire migration.

When Codex hit the tenant privilege blocker on Day 1 and had to resume on Day 2, the ledger told it exactly where to pick up. When the Kudu deploy shape needed three iterations, the ledger captured each failure and its resolution.

If you're planning autonomous multi-session work, build the ledger into the runbook from the start.
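
A minimal version of such a ledger entry is one JSON line per operation. The field names below are my own; the post only specifies that each entry records a timestamp, phase, command, sanitized result, and next action.

```javascript
// Format one append-only ledger entry as a single JSON line (JSONL).
function ledgerLine(phase, command, result, next) {
  return (
    JSON.stringify({
      ts: new Date().toISOString(),
      phase,
      command,
      result, // sanitized: no secrets, no connection strings
      next,   // what the agent should do when it resumes
    }) + "\n"
  );
}

// Appending (never rewriting) keeps the log a trustworthy audit trail:
// require("fs").appendFileSync("ledger.jsonl",
//   ledgerLine("deploy", "az webapp deploy ...", "ok", "run verification"));
```

The `next` field is what makes the ledger work as resumable memory: after the Day 1 tenant-privilege blocker, the last entry tells the agent exactly where to pick up.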

CLI-First Changes Everything

The runbook's execution rule — "no Azure Portal or GitHub UI dependency; all setup must be executable by az, gh, PowerShell, or GitHub Actions" — was the single most important constraint. It made the entire migration automatable.

Every resource provisioned by az appservice plan create. Every secret set by gh secret set. Every EasyAuth configuration by az webapp auth update. Every verification by scripted probes. Zero portal clicks meant zero human bottlenecks.
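
As a command sketch, that CLI-only path looks like this. Every name is a placeholder, the secret name is hypothetical, and runtime strings vary by CLI version; this shows the shape of the approach, not the runbook's exact commands.

```shell
# Provision the target stack without touching the portal.
az appservice plan create --resource-group <rg> --name <plan> --is-linux --sku B1
az webapp create --resource-group <rg> --plan <plan> --name <web-app> \
  --runtime "NODE:20-lts"
az storage account create --resource-group <rg> --name <storage> --sku Standard_LRS
az functionapp create --resource-group <rg> --name <functions-app> \
  --storage-account <storage> --consumption-plan-location southafricanorth \
  --runtime node --functions-version 4

# Wire up CI/CD secrets and platform auth the same way.
gh secret set INTERNAL_PROXY_SECRET --body "<generated-secret>"
az webapp auth update --resource-group <rg> --name <web-app> --enabled true
```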

Park, Don't Decommission

My push toward a parked-standby model instead of immediate SWA decommission was the right call. On Day 2, after cutover, Codex ran a full failover drill: unparked SWA, verified the full app was serving, then re-parked it. The whole cycle took 15 minutes and proved the rollback path works.

For any production migration: keep the old thing alive in standby until you're confident you'll never need it. The cost of maintaining a parked SWA ($0) is much less than the cost of recreating one in an emergency.

Auth Migrations Are Never Simple

We hit five distinct auth-related issues across two days: tenant privileges, ID token issuance flags, claims-array format differences, role authority precedence, and identity canonicalization. Any one of them could have broken the migration silently.

The runbook's detailed auth specification — with pseudocode for claim extraction, validation ordering, and normalized return shapes — was essential. Without it, the agent would have guessed at the auth format and produced subtly wrong behavior.

The Human's Role

I didn't write the Express server, the auth parser, the identity backfill script, the deploy workflows, or the provisioning scripts. But I:

  • Designed the target architecture
  • Wrote a 2,300-line runbook that left nothing ambiguous
  • Reviewed it across 7 sessions with an AI planning partner
  • Resolved the tenant privilege blocker that no CLI command could fix
  • Made the cutover decision based on the verification evidence after confirming with pilot users
  • Performed manual browser validation that proved the stack worked end-to-end

The pattern is the same as before: the human's job is to write specifications precise enough that code writes itself. The better the runbook, the more the agent can do autonomously.


Try This Yourself

Compared to the AI service migration (which was a code extraction and new service build), this SWA migration was a different kind of challenge: less code, more infrastructure, more auth complexity, more operational choreography.

If you're planning a similar hosting migration:

  1. Audit your auth surface before you start. SWA, App Service, and B2C/Entra all present identity differently. Map the claim shapes explicitly.
  2. Build the execution ledger into the plan. Autonomous agents that work across sessions need persistent working memory.
  3. Require CLI-only execution. If the plan needs portal clicks, the agent can't run it.
  4. Run both stacks in parallel. Shared data (same Cosmos, same Blob) means zero data migration. Two active frontends during validation costs almost nothing.
  5. Park, don't delete. Your rollback is only useful if it still exists.
  6. Test the real user path manually. Health probes pass, workflows are green, config parity is 56/56 — and then a human opens a browser and auth fails because of a flag nobody thought to check.

The runbook used in this project is shared in the appendix below.

Appendix: The Complete Migration Runbook (Redacted)

Below is the full runbook that guided this migration, exactly as Codex executed it. Sensitive identifiers — Azure resource names, GitHub references, email addresses, Entra IDs, connection strings, and deployment credentials — have been replaced with <placeholder> tokens. The architecture decisions, execution patterns, CLI commands, and verification checklists are preserved verbatim.

This is the document that Claude reviewed across seven sessions and Codex executed autonomously over two days. Scroll through to see the level of detail that makes autonomous agent execution possible.



Muhammad Khan is a GM moonlighting as a software engineer in his spare time, learning about AI-augmented development workflows, cloud architecture, and autonomous agent orchestration.