Sunday, 4 January 2026

Personametry V1.0 Worklog with Antigravity

Personametry Development Journey

Personametry - a decade's journey of time tracking now enhanced with AI

A summary of 2025 performance and intro to personametry.com

It's that time of year when I share my performance metrics for the previous year. This card pretty much sums it up: 8,700 hours logged across 3,006 time entries, tracking close to 24 hours per day:

Background

For the last ten years I've been running an experiment in logging the time I spend on activities like Work, Family, Me Time, Sleep, Spirituality, etc. In 2015 I developed a model for personal development called RAGE (Reality, Aspiration, Goals, Expectations). In 2016 I got more serious, inspecting my time across all areas of my life against my RAGE model, which triggered deeper reflection on my aspirations versus reality. For the first three years I maintained a rhythm of personal monthly performance reviews (PMPRs), then transitioned to quarterly, mid-year and final-year reviews. At the start of each new year, I would dive deep into the previous year's data - building analytics and dashboards, which I shared on this blog.

Context about my workflow - the early days

In the early days, my process for insights was quite manual. Logging my time was easy using the Harvest app, which I'd been introduced to by a good friend, Farid, around the time I switched to professional consulting, servicing contracts with Crossbolt that expected Harvest timesheets for billing. Incidentally, Farid was also the inspiration for me to think critically about Reality vs Aspiration, which led to me creating my RAGE model.

Generating reports initially meant exporting from Harvest, importing into Excel, and running pivot tables and charts, using the content for my blog posts. I needed a way to transform the Harvest data into higher-level constructs, so I transitioned to Amazon QuickSight (now Quick Suite) on an AWS Free Tier account. QuickSight acted as my yearly dataset store, replaced the analysis I would have done in Excel, and produced the dashboards, which I'd then copy and share on this blog. A downside of QuickSight is that it's a closed system: there was no way to publish dashboards to public sites (like Google Docs' embedded-pages mechanism). The free tier also prevented me from using its built-in insights features, and more recently QuickSight's AI analysis. I added Google Slides to my workflow, sharing my deep dives as in this post. As AI tooling emerged, I transitioned to AI analysis as described here.

Introducing my latest workflow - finally, the Personametry Dashboard is born - ZERO Workflows

I spent just under 5 days building my Personametry app with Google's Antigravity as my coding partner. What a journey (look out for a future post). Since November 2025, I've been learning how to build apps with Antigravity - first my SMT app, then tools for work - and I'd gathered enough insights to get the Personametry app built. What's my new workflow then? Everything is now automated, apart from my manual time logging. I've built a dashboard that syncs daily with Harvest, through an automated GitHub Actions workflow that pulls time entries via the Harvest API. Harvest is so cool that they give even free users full access to their APIs. An automated data-transformation job then cleans up the data and applies the meta-level transforms I used to do in QuickSight - so no more QuickSight. All the dashboards refresh automatically, and I no longer need to create Google Slides. At the start of each year, I'd usually spend about a week analysing, reflecting and creating dashboards. Now my analysis can happen anytime, with zero manual work - giving me a week back! And yes, anyone has access to my data and dashboard. I don't mind sharing, because I believe other folks could benefit from my experiment, decide to start their own tracking journey, or build an app for themselves. The codebase is on GitHub.
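The daily sync boils down to an authenticated GET against Harvest's v2 API. Here's a minimal Node sketch (function names and the User-Agent string are illustrative; the real workflow runs from GitHub Actions with the token and account id held as secrets):

```javascript
// Build the request for Harvest's v2 time-entries endpoint.
// Harvest requires a personal access token plus the account id header.
function buildHarvestRequest(token, accountId, from) {
  const url = new URL("https://api.harvestapp.com/v2/time_entries");
  if (from) url.searchParams.set("from", from); // filter entries by date (YYYY-MM-DD)
  return {
    url: url.toString(),
    headers: {
      "Authorization": `Bearer ${token}`,
      "Harvest-Account-Id": String(accountId),
      "User-Agent": "Personametry Sync (example)",
    },
  };
}

// Fetch one page of time entries (pagination omitted for brevity).
async function fetchTimeEntries(token, accountId, from) {
  const { url, headers } = buildHarvestRequest(token, accountId, from);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`Harvest API error: ${res.status}`);
  const body = await res.json();
  return body.time_entries; // array of { spent_date, hours, project, ... }
}
```

The transformation job then maps these raw entries onto the higher-level persona constructs before the dashboards pick them up.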


Personametry.com is more than just a dashboard - introducing Machine Learning

With my rich dataset, there are opportunities for applying machine-learning forecasting techniques and instrumenting goals. Check out the Machine Learning page. I can now tune my personas and see the effects in real time. For example: if I reduce my sleep hours, where would the gains go? If I reduce my work hours, subject to constraints, what can I do? If I invest in health and fitness, what's the impact on Family time? For me, this is a game changer. The app will evolve and learn as the dataset is updated, without my having to change code or do manual imports! I might have to tweak the code just a little to cater for special years like sabbatical breaks, though.
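A toy sketch of the persona-tuning idea (my own illustrative names and logic, not the app's real model): free up hours from one persona and redistribute them across the others in proportion to their current share:

```javascript
// Shift `hoursFreed` away from persona `from` and spread the freed time
// across the remaining personas proportionally to their current hours.
function reallocate(personas, from, hoursFreed) {
  const others = Object.keys(personas).filter((p) => p !== from);
  const totalOthers = others.reduce((sum, p) => sum + personas[p], 0);
  const result = { ...personas, [from]: personas[from] - hoursFreed };
  for (const p of others) {
    result[p] = personas[p] + hoursFreed * (personas[p] / totalOthers);
  }
  return result;
}

// Example: free one hour of Sleep per day over a week (7 hours total).
const week = { Sleep: 56, Work: 45, Family: 30, "Me Time": 15 };
const tuned = reallocate(week, "Sleep", 7);
```

The real forecasting layer is more sophisticated, but the "where do the gains go?" question is essentially this reallocation under constraints.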

What's next - where am I going with this?

Version 1.0.0 is now live! Depending on how much time I have in 2026, I will look at embedding AI data-driven analysis into Personametry.com and leveraging conversational analysis. Ultimately I'm still striving to build the perfect personal assistant that just "knows" me. I will look at bringing in additional data sources like Strava, Netflix and YouTube, and even integrating the Islamic and Gregorian calendars. And finally I'll hook in a RAGE scorecard to match my time against the RAGE model! I could also turn this into a paid platform service, letting anyone sign up, build their own RAGE-model personas and track them with Personametry.com!

Wednesday, 31 December 2025

The SMT Chronicles - MVP Version 2 - NotebookLM story


In my previous post, I shared how NotebookLM synthesised the evolution of my experimental app, SMT (Software Management Tools), from my Gemini chat transcripts up to June 2025. I then went further and added just the latest README file as an additional source for NotebookLM, to see if it could connect the dots from June 2025 to December 2025. I also created a simple dashboard using Antigravity for my repo's storyline here.

NotebookLM generated this infographic that was spot on! 

By June 2025, SMT's main feature was the Yearly Planning page. The inspiration behind SMT's planning feature came from my Amazon AWS experience of their yearly planning mechanism, the Operational Planning (OP1 & OP2) cycles. The lack of tooling within the company made the process quite time-consuming, error-prone and not fun at all! We used spreadsheets in either Excel or Quip (Quip is a pain for spreadsheeting!). So SMT was going to make OP planning much more fun - and accurate as well, especially when calculating net engineering capacity. SMT is a proof-of-concept, but if anyone from AWS is reading this post, feel free to play with the app, grab the codebase and have a go at using it for your team's planning. The app works fine IMHO, but still has a lot more features to add.
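The net-capacity calculation can be sketched roughly like this (my own illustrative formula and field names, not SMT's actual implementation): gross engineer-weeks minus holidays, leave, on-call and overhead.

```javascript
// Rough net engineering capacity for a planning cycle (illustrative only).
function netCapacityWeeks(team) {
  const gross = team.engineers * team.weeksInCycle;
  const deductions =
    team.engineers * (team.holidayWeeks + team.avgLeaveWeeks) + // per-engineer time off
    team.oncallWeeks +                                          // weeks lost to on-call
    gross * team.overheadFraction;                              // meetings, interviews, ops noise
  return gross - deductions;
}

const example = netCapacityWeeks({
  engineers: 8,
  weeksInCycle: 52,
  holidayWeeks: 2,
  avgLeaveWeeks: 4,
  oncallWeeks: 52, // one engineer on call every week of the year
  overheadFraction: 0.1,
});
```

Even this toy version shows why spreadsheets get painful: every deduction is a separate column someone has to keep honest.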

Since the June 2025 MVP Version 1, I woke up again in November and added a ton of features. My productivity skyrocketed with the release of new LLMs and coding assistants. I added a powerful AI feature which I think is a game changer for software management. I clobbered technical debt, rebuilt the app UX, added quality controls, and overall rearchitected the codebase into something more respectable than a toy app coded by AI...

NotebookLM generated this audio overview, mostly correct!


NotebookLM generated this slide deck...

NotebookLM generated this video...

The SMT Chronicles - MVP Version 1 - Oct 24 to June 25, building with AI co-pilots

Over the last year, I've been building an experimental app, SMT (Software Management Tools), that started out of curiosity to test LLMs' code-generation abilities, before the proliferation of AI coding agents, integrated development environments and vibe-coding platforms. I worked on this project in my spare time over weekends and evenings, testing the waters - often getting quite frustrated in the early days, sometimes having streaks of productivity and other times just going in circles and breaking things - I often found myself cursing the AI :-) I didn't give up. Despite breaks with long periods of inactivity, I kept monitoring the latest AI news and releases and continued to test the waters. Each time I learned something new, saw the progress of LLMs, and witnessed the potential this technology has not only to disrupt the software industry, but also to empower people to translate ideas and concepts into prototypes without depending on outsourced developers. The journey of learning continues. I stuck with Gemini because it has, since the beginning, been enormously generous with quotas and large context windows, unlike Claude and ChatGPT at the time. Even today, I prefer to stick with learning just one tool like Antigravity rather than context-switch between others - although my workflow includes the Antigravity Agent Manager, plus a separate VS Code with Codex to audit changes and pick up where Gemini or Claude Opus fails to complete its tasks.

Here's the activity story from GitHub:

I also created a simple dashboard using Antigravity for my repo's storyline here.

In this post, I'm sharing some history of SMT. How did it all begin? Interestingly enough, I went through a phase of saving the major chat sessions with Gemini that led to the first MVP of SMT. I saved all my prompts in Google Docs. I stopped tracking my prompts in June because it got quite tedious! With this chat record, I wanted to see what Google's NotebookLM would make of the doc's contents. Here's what it produced - the power of NotebookLM is super fascinating!

NotebookLM generated this infographic that was spot on! 


By June 2025, SMT's main feature was the Yearly Planning page. The inspiration behind SMT's planning feature came from my Amazon AWS experience of their yearly planning mechanism, the Operational Planning (OP1 & OP2) cycles. The lack of tooling within the company made the process quite time-consuming, error-prone and not fun at all! We used spreadsheets in either Excel or Quip (Quip is a pain for spreadsheeting!). So SMT was going to make OP planning much more fun - and accurate as well, especially when calculating net engineering capacity. SMT is a proof-of-concept, but if anyone from AWS is reading this post, feel free to play with the app, grab the codebase and have a go at using it for your team's planning. The app works fine IMHO, but still has a lot more features to add.

NotebookLM generated this audio overview, mostly correct!


NotebookLM generated this slide deck...

NotebookLM generated this video...


My chat transcripts with Gemini from April 25 to June 25 - 50+ pages!

Sunday, 28 December 2025

How I use visual annotations with Antigravity to build UI / UX

Sharing some of my workflow experience in building my SMT (Software Management Tools) application using AI as my coding assistant. With the launch of Google's Antigravity platform in November 2025, my curiosity got the better of me! I re-engaged with my codebase after about 5 months of dormancy. In under 2 months I was able to migrate the look and feel to a new UX paradigm, introduce a coding contract and constitutional framework, introduce dark/light-mode theming, fundamentally refactor the codebase towards best-practice software design patterns, integrate AI features into the app, and clobber technical debt. Thanks to Antigravity... what this video shares is how powerful visual annotations can be for changing UX. As I went through this journey, I became more aware that the future of UX and frontend engineering is going to be disrupted quite radically!


Play with the SMT app here. Codebase on GitHub.

Codex 5.2 vs the Long Game: Building a Quality Ladder in a Real Product Codebase

If you’ve ever tried to institutionalize code quality (not just talk about it), you’ll know the trap: quality work is rarely one big change. It’s a hundred small, unglamorous decisions—done consistently—until the codebase starts behaving like a system.

That’s what this post is about.

I’ve been evolving a real planning product codebase (“altsoftwareplanning”) - workflows for planning, roadmaps, org views, capacity tuning, and a growing set of UI/interaction patterns. It shipped features, and I think the app will solve real problems. Yes, it's still a concept, but I wanted to ensure it's being built the right way - and the safety net was thin.

At some point, “it works” stops being good enough.

I wanted a codebase where I can say, with a straight face:

We can prove it works.

Codex 5.2 turned out to be the right partner for the long-running version of this work: contract enforcement, lint hardening, unit testing, E2E testing, coverage instrumentation, CI visibility, and documentation—all in a compounding sequence that actually sticks.

Codex 5.2 CLI helped transform an active product codebase into a quality-first system by enforcing contracts, hardening lint discipline, building broad unit + Cypress E2E coverage, and surfacing coverage artifacts in CI—over a sequence of compounding iterations.


The Landscape: What We Were Working With

You can’t talk quality without acknowledging the stack. The strategy must fit the reality.

  • App layer: vanilla JavaScript, HTML, CSS (no framework), service-oriented front-end modules under /js

  • Visualization tooling: D3, Mermaid, Frappe Gantt

  • Background logic: a feedback worker (smt-feedback-worker) and AI-adjacent utilities

  • Quality/tooling: ESLint (flat config), stylelint, depcheck, plus a custom contract scanner

  • Testing: Vitest + jsdom for unit tests; Cypress (with @cypress/code-coverage) for E2E

  • CI: GitHub Actions, publishing coverage artifacts on every push/PR

This is no longer a toy repo! It demonstrates patterns for a real product surface area with enough moving parts to punish sloppy changes.


The Problem: Velocity Without Proof

We had signals, but they weren’t a coherent system:

  • some rules existed (informally)

  • linting existed (but was noisy and inconsistent)

  • testing existed (but not at a level that lets you refactor with confidence)

  • E2E coverage didn’t reflect how users actually flow through the product

  • coverage existed locally, not visibly

In other words: tribal knowledge plus hope.

The goal wasn’t “add tests.” The goal was a quality ladder—each rung enabling the next.


The Quality Ladder (The Sequence Matters)

Here’s the order of operations that worked, and why:

  1. Codify a contract (design rules, data flow rules, UI rules)

  2. Make lint clean and trustworthy (so warnings mean something)

  3. Cover domain logic with unit tests (turn behavior into executable specs)

  4. Add E2E tests for real workflows (where regressions actually happen)

  5. Instrument coverage (otherwise you’re guessing)

  6. Publish coverage in CI (visibility changes behavior)

  7. Keep docs current (quality must be repeatable)

Codex 5.2 helped me climb this ladder without losing the thread.


Phase 0: Compliance Cleanup (The Real Beginning)

Every long-running quality push has a starting “wake up” moment.

For this repo, that moment looked like compliance cleanup:

  • removing risky innerHTML usage and window-bound patterns

  • tightening DOM checks and wiring

  • simplifying defensive guards that hid intent

This wasn’t glamorous. It was foundational. You can’t build a quality system on top of brittle glue.


Phase 1: Turn Ideas Into Contracts

Before I was willing to scale testing, I needed the codebase to have enforceable rules—especially in the UI layer.

We already had an early foundation:

  • a contract-scanning script

  • a “coding contract” that enforces rules like:

    • no inline styles (even for dynamic values)

    • DOM-only toolbars (explicit, deterministic wiring)

    • centralized storage patterns (no random settings scattered across views)

Then we reinforced it.

What changed (in practical terms)

  • Inline style mutations were removed or centralized.

  • View code was pushed toward CSS classes / CSS variables.

  • Global header/toolbars were wired consistently.

  • Storage moved behind the repository boundary (e.g., SystemRepository).

This is the crucial insight:

Quality starts with clarity. If you can’t describe the rules, you can’t test them.

Codex 5.2 was useful here because it didn’t just “fix a file.” It was willing to chase contract violations across dozens of small edits until the rule was actually enforced.
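A contract scanner in this spirit can be tiny. This sketch (my own illustration, not the repo's actual scanner) flags inline-style violations in source text:

```javascript
// Scan source text for violations of the "no inline styles" contract rule.
// Returns one record per offending line, tagged with the rule name.
function findInlineStyleViolations(source) {
  const rules = [
    { name: "no-inline-style-attr", pattern: /style\s*=\s*["']/g }, // style="..." in markup
    { name: "no-style-mutation", pattern: /\.style\.[a-zA-Z]+\s*=/g }, // el.style.x = ... in JS
  ];
  const violations = [];
  source.split("\n").forEach((line, i) => {
    for (const rule of rules) {
      if (line.match(rule.pattern)) violations.push({ line: i + 1, rule: rule.name });
    }
  });
  return violations;
}
```

The point isn't the regexes; it's that once a rule is mechanical, an agent (or CI) can chase every violation to zero instead of relying on review-time memory.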


Phase 2: Make Linting a Trusted Signal

Linting is only useful if “clean” is achievable.

So we did the boring work:

  • cleared ESLint warnings (unused locals, scoping issues, assignment-in-conditional traps)

  • standardized hasOwnProperty usage

  • tightened property access and scoping

Only then did we upgrade:

  • ESLint v9

  • flat config

  • reduced deprecated dependencies

Once lint was noise-free, a new lint warning started meaning something again.

This is one of those hidden tipping points: when lint becomes a real gate, behavior shifts.
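For reference, a flat config in the spirit of the rules above might look like this - a minimal sketch only, the repo's real eslint.config.js is more extensive:

```javascript
// eslint.config.js - ESLint v9 flat config (illustrative sketch)
import js from "@eslint/js";

export default [
  js.configs.recommended,
  {
    files: ["js/**/*.js"],
    languageOptions: {
      ecmaVersion: "latest",
      sourceType: "module",
    },
    rules: {
      "no-unused-vars": "error",        // the unused locals cleared in this phase
      "no-cond-assign": "error",        // assignment-in-conditional traps
      "no-prototype-builtins": "error", // forces Object.prototype.hasOwnProperty.call(...)
    },
  },
];
```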


Phase 3: Unit Tests That Match the Domain

This was not a token unit-test pass. The suite mapped to the product’s domain logic.

We built unit tests across:

  • Planning

  • Roadmap

  • Org

  • System

  • Initiative

  • Forecast

  • WorkPackage

  • App state helpers and supporting utilities

Tooling:

  • Vitest + jsdom (DOM simulation where needed)

  • later: Vitest v4 upgrade (refreshed the Vite/esbuild chain and cleared CI audit issues)

The real win wasn’t “coverage percentage.”

The real win was this:

Domain behavior became executable.

Codex 5.2 helped by translating real product flows into testable units—without scattering random micro-tests that don’t align to how the app behaves.
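The repo's suite uses Vitest, but the core idea - domain behavior as an executable spec - can be shown in a self-contained sketch with plain assertions (the function and fields here are illustrative, not SMT's real API):

```javascript
// Hypothetical domain rule: fund initiatives in priority order
// until the available capacity runs out.
function allocate(capacityWeeks, initiatives) {
  const funded = [];
  let remaining = capacityWeeks;
  for (const init of [...initiatives].sort((a, b) => a.priority - b.priority)) {
    if (init.costWeeks <= remaining) {
      funded.push(init.name);
      remaining -= init.costWeeks;
    }
  }
  return { funded, remaining };
}

// The "executable spec": behavior stated as assertions, not prose.
const { funded, remaining } = allocate(10, [
  { name: "A", priority: 2, costWeeks: 6 },
  { name: "B", priority: 1, costWeeks: 5 },
  { name: "C", priority: 3, costWeeks: 4 },
]);
```

In Vitest the same assertions live in `describe`/`it` blocks, but the value is identical: the rule can no longer silently drift.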


Phase 4: Cypress E2E Tests for Real Flows

Unit tests prove logic.

E2E tests prove the product.

We started with smoke tests and selector stabilization, then expanded into specs that mirror how users actually move:

  • Core UI and workflow validation

  • Planning and detailed planning flows

  • Product management flows

  • System creation/editing and org management

  • Settings and AI feedback flows

  • Smoke tests and selector hardening

Two deliberate choices here:

  1. Readable and commented tests. These become “living docs.”

  2. Selector hardening before scale. Flaky selectors are how E2E suites die.

Codex 5.2 showed long-running strength here: iterating until the suite is stable, not just “present.”
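Selector hardening mostly means giving specs stable hooks instead of brittle CSS paths. A sketch of the convention (the helper name is hypothetical):

```javascript
// Build a selector against a stable data-testid hook, so markup and
// styling can change without breaking the E2E suite.
const byTestId = (id) => `[data-testid="${id}"]`;

// In a Cypress spec this reads as:
//   cy.get(byTestId("planning-toolbar")).should("be.visible");
```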


Phase 5: Coverage You Can See (Not Just Locally)

Testing without visibility is guesswork.

So we instrumented coverage for both layers:

  • Unit coverage: Istanbul

  • E2E coverage: @cypress/code-coverage

We intentionally split reports:


  • coverage/unit

  • coverage/e2e

And then made it visible:

  • GitHub Actions uploads coverage artifacts on every push and PR

  • no special tooling required to inspect results—just download the artifact

CI now runs (as part of the quality pipeline):

  • npm run test:coverage

  • npm run test:e2e:coverage

We also fixed dependency checks so coverage tooling is first-class and green in CI (depcheck included).

This matters because it changes the social contract:

Coverage stops being “something one dev ran.” It becomes a team-visible signal.


What Codex 5.2 Actually Enabled

This wasn’t “AI wrote tests.” That’s the shallow version of the story.

Codex 5.2 behaved more like a quality software-engineer-in-test (SWET) expert partner who doesn’t lose the thread, working for hours (I exhausted my five-hour quota on more than one occasion):

  • it read the codebase and translated workflows into test cases

  • it stayed consistent with constraints across many commits

  • it revisited flaky E2E specs and hardened selectors

  • it fixed CI failures immediately instead of letting them linger

  • it updated docs alongside implementation changes

Short tasks are easy.

The long game—contracts + lint + unit + E2E + coverage + CI + docs—requires continuity.

That’s where Codex 5.2 shined.

Hint: if you've been delaying a long-term technical-debt task to introduce unit and integration tests (or migrate to other frameworks), integrated with your CI/CD workflows, you no longer have to do it yourself. Codex 5.2 is a very capable model that can do literally all the heavy lifting for you. Codex can transform your codebase in a matter of hours, taking its quality to the next level.


The Working Loop I Used (Practical, Not Magical)

If you want to replicate this kind of quality transformation, here’s the loop that worked:

  1. Start with an audit task

    • “Scan the repo for contract violations and list them by severity.”

    • “List lint rule violations that prevent a clean run.”

  2. Fix in small diffs

    • insist on small PR-sized changes

    • prevent “helpful refactors” that mix concerns

  3. Lock in with tooling

    • don’t accept “we fixed it once”

    • enforce it through npm run verify / CI gates

  4. Convert workflows into tests

    • unit tests for domain rules

    • E2E specs for user flows

  5. Make coverage visible

    • separate unit vs E2E

    • publish artifacts

  6. Document the happy path

    • how to run tests locally

    • where coverage lands

    • what “green” means

Codex 5.2 helped keep this loop tight - especially the follow-through steps that humans tend to procrastinate on.


Before / After (The Shape of the System)


Outcomes (What Changed, Concretely)

By the end of this effort, the repo gained:

  • a stronger contract + lint foundation

  • ESLint v9 flat config with warning-free lint

  • stylelint + depcheck integrated into the quality gates

  • a comprehensive unit test suite across core services

  • a multi-spec Cypress E2E suite covering real user journeys

  • separate unit and E2E coverage reports

  • CI workflows that upload coverage artifacts on every commit/PR

  • updated docs that make the whole thing discoverable and repeatable

This is the difference between:

  • “we have tests”

  • and “we have quality.”

Stats

Counts (current HEAD)

  • Unit tests: 68 test cases across 27 test files
  • Cypress: 7 specs in cypress/e2e, 39 E2E test cases

Files Touched

  • 94 files changed since linting was introduced
  • 37 files changed since unit tests were added
  • 37 files changed since Cypress tests were added

E2E Coverage Delta (baseline = first coverage workflow commit 68418c)

  • Statements: 54.60% → 54.61% (+0.01pp)
  • Branches: 33.34% → 33.34% (+0.00pp)
  • Functions: 52.90% → 52.93% (+0.03pp)
  • Lines: 56.90% → 56.91% (+0.01pp)

CI Runtime Before/After (quality.yml, success runs only)

  • Before coverage/tests in CI (pre-68418c): avg 29.2s, median 32.0s, n=10
  • After coverage/tests in CI (post-68418c): avg 303.7s (~5:04), median 306.0s, n=6

tests.yml (unit coverage workflow)

  • all runs: n=22, avg=32.0s, median=34.0s, min=18.0s, max=42.0s
  • success runs: n=22, avg=32.0s, median=34.0s, min=18.0s, max=42.0s

e2e.yml (Cypress coverage workflow)

  • all runs: n=19, avg=229.7s, median=284.0s, min=38.0s, max=310.0s
  • success runs: n=18, avg=239.8s, median=285.0s, min=38.0s, max=310.0s


Why This Matters (Beyond Preventing Regressions)

A real quality system does more than reduce bugs:

  • Onboarding improves because expectations are written down (and executable).

  • Refactors get cheaper because the safety net is layered.

  • Velocity improves because you stop paying the “regression tax.”

  • Confidence increases because you can prove correctness instead of arguing.

The whole point of quality is not to slow down shipping.

It’s to preserve shipping speed as the product grows.


The Reusable Playbook

If you’re trying to do the same in your own codebase, don’t start with a thousand tests. Start with the ladder.

  1. Define and enforce a contract (rules of the road)

  2. Make lint clean and trustworthy

  3. Build unit tests that map to domain logic

  4. Start with smoke E2E tests, then expand into full flows

  5. Instrument coverage for both layers

  6. Publish coverage artifacts in CI

  7. Treat docs as first-class artifacts

Quality is not a tool.

Quality is a system.

I've experienced the power of Codex as my AI coding-agent quality partner - this stuff is real. My app's codebase might still be in its infancy, but I know it comes close to what software engineering teams face every day - especially if you're building enterprise tooling apps, as my former teams at AWS do.
Integrating AI coding agents to handle refactoring, migrations, or - as in my case - introducing end-to-end automated quality testing is now a no-brainer with powerful assistants like Codex, and a significant productivity enhancer!