Monday, 1 December 2025

Living in Antigravity: A Weekend with Gemini 3, Claude Opus 4.5, Codex 5.1 Max and the Death of "Clean Code"

I spent my entire weekend inside Google’s Antigravity IDE, juggling the three titans of late 2025: Gemini 3 Pro, Claude Opus 4.5, and OpenAI Codex 5.1 Max.

My test subject? A "Franken-app" I started building back in the stone age of September 2024. Back then, I was using GPT-4o and the early, limited versions of Claude Sonnet. At the time, those models felt like over-eager interns: they generated code fast, without thinking, and prioritized "showing it works" over best practices.

Fifteen months later, the landscape has shifted. The "Vibe Coding" hype suggested that software engineering was dead. I’m here to tell you that it’s not dead—but it has mutated into something entirely different.

Here is a video showing the new app refresh versus the old style side-by-side:


Here is what I learned re-wiring a fifteen-month-old messy codebase into an enterprise-grade web app in 48 hours.

The Antigravity Experience: 13 Days In

As I write this, Google Antigravity is less than two weeks old. I installed it on Day 2, and honestly? It was rough. It froze, crashed, and felt heavy. But after the stability patch earlier this week, something clicked.

The "Killer App" isn't just the code generation; it’s the Conversational Context. With Gemini 3 Pro integrated directly into the IDE, I found myself managing a large-scale refactor not by typing code, but by debating architecture. I could propose ideas, ask Gemini to critique them, and even generate UI mockups right there in the console—no context switching to Figma or Photoshop.

However, it’s not perfect.

  • The "Human Bottleneck": Antigravity encourages you to run parallel workspaces with multiple agents. But I found that while the machine is multi-threaded, I am not. Context switching between three different agents causing chaos in three different branches is mentally exhausting.

  • The "Browser Agent": Promising, but slow. It’s fascinating to watch an AI navigate your app to test it, essentially replacing the QA role, but right now, it struggles to "drive" complex applications without crashing.

The Model Showdown: December 2025 Edition

Since Antigravity lets you swap models, I played them against each other. Here is the reality check:

1. Claude Opus 4.5 (The "Academic"): Supposedly the best coding model in the world. It provides the best analysis and writes detailed, "Senior Engineer" style impact assessments. However, it still makes junior mistakes. It would change a CSS class to fix one page, completely forgetting the cascading impact on the rest of the site. High IQ, low spatial awareness.

2. Gemini 3 Pro (The "Collaborator"): The best conversationalist. It holds the "big picture" context better than the others, but it struggles with long-running threads. Eventually, the context gets so heavy that it stalls or starts hallucinating old patterns.

3. OpenAI Codex 5.1 Max (The "Brute Force"): When I hit a wall, I called in Codex. It lacks foresight and nuance, but it is 100% confident and usually finds a way to "just make it work."


The Big Question: Is Spaghetti Code Actually Bad?

This brings me to my most controversial observation.

We spend so much time prompting these models to respect encapsulation, avoid global namespaces, and write "Clean Code." But why? We are forcing AI to mimic human limitations.

I successfully transformed a 90s-style interface into a Jira-level Enterprise UX in one weekend. The code under the hood? It’s a mess. It is clearly "vibe coded." A human engineer would weep looking at the dependency graph.
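
To make "mess" concrete, here is a contrived TypeScript sketch of the pattern I mean (not my actual code): shared global state, side effects buried in data loaders, everything order-dependent.

```typescript
// Contrived illustration of "vibe coded" output, not my real codebase:
// it works, but every function reaches into shared global state.
const state: Record<string, any> = {};

function loadUser(id: string) {
  state.userId = id;
  state.theme = id.startsWith("admin") ? "dark" : "light";
  renderSidebar(); // side effect: a UI re-render buried in a data loader
}

function renderSidebar() {
  // Depends on globals set elsewhere; breaks if the call order changes.
  document.body.dataset.theme = state.theme ?? "light";
}
```

It runs, and the machine follows it effortlessly. A human does not.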

But I ask you: So What?

If the AI writes the code, maintains the code, and fixes the bugs, why does a human need to understand it?

  • Machines are better than us at reading large datasets and understanding complex, spaghetti-like relationships.

  • If the application is secure and performant, the "readability" of the code is purely a vanity metric for human ego.

We are moving toward a future where we trust the machine to do the job. The role of the software engineer is shifting from "Author" to "Architect" and "Reviewer." You still need a technical background to know what to ask for—you need to know what a "race condition" is to spot when the agent creates one—but you no longer need to care about how the syntax is structured.
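
For example, here is the classic shape of a race condition an agent can slip into async code, a hypothetical booking function with a stand-in database write:

```typescript
// Hypothetical illustration of a check-then-act race in async code.
// saveBooking stands in for a real database write.
let seatsLeft = 1;

async function saveBooking(user: string): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 50)); // simulated I/O
}

async function bookSeat(user: string): Promise<boolean> {
  if (seatsLeft > 0) {
    // The await yields control here; a second caller can pass the
    // same check before this one decrements the counter.
    await saveBooking(user);
    seatsLeft -= 1;
    return true;
  }
  return false;
}

// Two concurrent requests: both can succeed for the single remaining seat.
Promise.all([bookSeat("alice"), bookSeat("bob")]).then((results) => {
  console.log(results, "seatsLeft =", seatsLeft); // [ true, true ] seatsLeft = -1
});
```

Both callers can pass the seatsLeft check before either decrements it. You only catch that in review if you know what you are looking at.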

The Missing Pieces: How Antigravity Can Win

While Antigravity is impressive, my weekend of deep work exposed three critical gaps where the "Agent" promise falls short of the reality. If Google wants this to replace VS Code, here is what needs to happen next.

1. We Need a "Time Machine" for Code

Right now, Antigravity operates in a linear flow. If an Agent goes down a rabbit hole for 20 minutes and ruins the codebase, your only option is to manually git revert or undo endlessly. The Fix: We need Conversation Forking and System Restore Points.

  • Imagine a "DVR" slider for your development session. I want to scrub back to 2:00 PM, right before I asked Gemini to "refactor the auth module," and fork the reality there to try a different prompt.

  • We need "Save States" for the Agent’s brain, not just the code. I want to revert the context, not just the files.

2. The Context "Amnesia" Problem

I found that even with massive context windows, I am still the only thread tying everything together.

  • The Problem: If I start a new workspace for a "UI refresh," that agent has no idea I just spent three hours discussing database schema changes in a different thread.

  • The Fix: Cross-Conversation Memory. Agents need a shared "hive mind" or a persistent Project Knowledge Graph. If Agent A makes a decision in Thread 1, Agent B in Thread 2 should instantly know about it without me having to copy-paste the summary.
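
Even a simple shared decision log would go a long way. A rough, entirely hypothetical sketch of what I mean:

```typescript
// Hypothetical shared "hive mind": a project-level decision log that
// every agent reads before acting and writes to after deciding.
interface Decision {
  thread: string;    // e.g. the "db-schema-changes" conversation
  agent: string;     // which agent recorded it
  summary: string;   // "renamed users.email to users.primary_email"
  affects: string[]; // files or modules other agents should re-check
}

class ProjectMemory {
  private decisions: Decision[] = [];

  record(decision: Decision): void {
    this.decisions.push(decision);
  }

  // "What happened in other threads that touches the files I'm editing?"
  relevantTo(files: string[]): Decision[] {
    return this.decisions.filter((d) =>
      d.affects.some((file) => files.includes(file))
    );
  }
}
```

Before touching a file, Agent B asks relevantTo() for decisions that affect it and learns what Agent A did in Thread 1, without me playing messenger.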

3. "Architect Mode" vs. "Intern Mode"

It is baffling that in late 2025, I still have to explicitly prompt a state-of-the-art model to "avoid global namespaces" or "separate concerns."

  • The Reality: These models are still trained on the average of all GitHub code—which means they default to "average" (spaghetti) quality. They prioritize working code over maintainable code.

  • The Fix: I shouldn't have to prompt-engineer basic competence. Antigravity needs a toggle for "Strict Architect Mode"—a system-level guardrail that forces the Agent to adhere to SOLID principles and Design Patterns before it writes a single line of code. I don't want an eager intern; I want a Senior Engineer who pushes back when I ask for something stupid.
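
What I imagine is a project-level guardrail file that every agent inherits. A made-up config to show the idea; nothing like this exists in Antigravity today:

```typescript
// Made-up "Strict Architect Mode" config: what I wish I could commit to
// the repo so every agent inherits the same guardrails. Purely imaginary.
const architectMode = {
  enabled: true,
  principles: ["SOLID", "separation-of-concerns"],
  forbid: ["global-namespace-writes", "god-objects", "copy-paste-modules"],
  // Make the agent produce a design for human approval before coding.
  requireDesignReviewBeforeCode: true,
  // Let the agent push back instead of blindly complying.
  allowPushback: true,
};
```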

Observations So Far: The Paradox of Productivity

I have successfully rewired a 90s-style web app into a modern enterprise platform in 48 hours. The result is undeniable, but the process felt like herding cats.

  • The "Vibe Coding" Reality: The app looks great, but under the hood, it is a "Vibe Coded" mess. The dependency graph looks like a bowl of spaghetti.

  • The Stability Tax: I found myself having to "close-and-restart" Antigravity multiple times to clear out "ghosts" in the machine.

  • The Human Bottleneck: The promise of agents is parallelism, but I am still single-threaded. Trying to manage three agents simultaneously is mentally exhausting. Instead of writing code, I am now a Project Manager constantly unblocking three different junior developers who type at the speed of light but forget instructions every hour.

The value proposition is compelling, but we are not there yet. We have replaced "syntax errors" with "context errors," and I am not sure which one is harder to debug.

The Verdict

Antigravity is buggy, resource-hungry, and requires a "close-and-restart" every few hours. But it is undeniable that the tide has turned. Frontend engineering as we knew it in 2024 is gone.

The future isn't about writing code. It's about managing the agents that write it for you—and perhaps, learning to love the spaghetti.

Note: This blog post was initially written by me and later tidied up with Gemini Pro.
