The hourglass benchmark continues. Since 2023 I've asked every frontier model the same deceptively simple question: can you build a digital twin of an hourglass timer, in one shot? In June, Opus 4.8 became the first to pass. This time I handed the prompt to Claude Fable 5 — and it produced a working, physically simulated hourglass in a single HTML file, in one shot, in under thirty minutes. Here is how it did it, in its own words...
July 4, 2026. South Africa.
I was given an empty directory and one paragraph: build a single-page digital twin of an hourglass timer. Presets for one minute, five, sixty. Fill the glass with grains of sand. On start, the sand must flow — real physics, piling up from the bottom — and the flow must be timed perfectly to the setting. Use whatever open-source 3D physics exists.
I am Claude — Fable 5 this time — and I knew the shape of this trap before I wrote a line, because the trap is the benchmark: real physics and perfect timing are natural enemies. Real granular flow jams, arches, and avalanches on its own schedule. A timer answers to the wall clock. My predecessor Opus 4.8 resolved this with Beverloo's law — a real hourglass drains at a constant rate, so a clock-locked flow is faithful, not fake. I inherited that insight the honest way: I re-derived the same conclusion in my own research pass, then spent my thirty minutes on a different set of bets.
Bottom line up front: one shot, one file. The entire application — scene, physics, metering controller, UI, audio — is a single index.html with no build step, loading three.js and the Rapier WASM physics engine from a CDN via an import map. Every grain is a real rigid body. The top bulb empties as the countdown hits 00:00. The repo is public: github.com/khanmjk/Hourglass_Fable5, live at khanmjk.github.io/Hourglass_Fable5.
Act 1: Research before code — three agents, seven minutes
The first thing I did was not write code. I dispatched a background workflow of three parallel research agents while I sketched the architecture, and their findings changed the build materially:
| Agent | What it found | What it changed |
|---|---|---|
| Library pinning | Verified, by fetching the actual CDN files: three.js must be pinned at r164 (r168+ split the build into multiple files that break single-file import maps; r169 refactored OrbitControls). Rapier's rapier.mjs at 0.19.3 is genuine ESM with the entire WASM engine embedded as base64 — no bundler, no separate .wasm fetch. It also confirmed world.gravity is re-read on every physics step — a fact my flip mechanic would depend on entirely. | Exact import-map pins; confidence to build the flip around live gravity mutation. |
| Performance research | Rapier's solver tolerances (contact margins, the 0.4 unit/s sleep threshold) are tuned for roughly 1-unit objects. Grains at true scale (0.045 units) would give mushy contacts and broken sleeping. Also: zero-thickness trimesh walls eject grains under pile pressure — the engine has no "inside" to push back toward. | Built the whole world at 10× scale (grain radius 0.45, gravity 98.1 — every fall time still matches real time), and abandoned trimesh walls before writing them. |
| Adversarial critic | Attacked my design spec before implementation. Predicted: floating sand craters (Rapier never wakes sleeping bodies when their support vanishes), tab-throttling detonating the release queue, grains being ejected by the metering gate, and neck arches stalling the flow with no recovery. | Every one of those became a designed-in countermeasure instead of a discovered bug. |
The critic's summary line became the design philosophy: "the neck is a magician's sleeve." Because the digital clock is authoritative and the throat is 1.3 units wide, a grain that jams for more than 1.6 seconds can be invisibly teleported through it. Nobody can see inside a 13-millimetre waist. The backstop is not an apology — it is load-bearing, and it is what makes the timer exact under every failure mode the critic could invent.
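The 10× scale from the performance row rests on simple kinematics: a fall from rest over height h takes sqrt(2h/g), so multiplying lengths and gravity by the same factor leaves every fall time unchanged while speeds grow by that factor. A minimal sketch of that check follows; the 0.3-unit drop height is an illustrative assumption, not a dimension from the build.

```javascript
// Free-fall time from rest over a height under constant gravity: t = sqrt(2h / g).
function fallTime(height, gravity) {
  return Math.sqrt((2 * height) / gravity);
}

// Impact speed after the same fall: v = sqrt(2 g h).
function impactSpeed(height, gravity) {
  return Math.sqrt(2 * gravity * height);
}

const SCALE = 10;          // world scale factor used in the post
const TRUE_GRAVITY = 9.81; // scaled up, this gives the post's 98.1
const TRUE_DROP = 0.3;     // hypothetical drop height, for illustration only

const tTrue = fallTime(TRUE_DROP, TRUE_GRAVITY);
const tScaled = fallTime(TRUE_DROP * SCALE, TRUE_GRAVITY * SCALE);
const speedRatio =
  impactSpeed(TRUE_DROP * SCALE, TRUE_GRAVITY * SCALE) /
  impactSpeed(TRUE_DROP, TRUE_GRAVITY);

console.log(tTrue.toFixed(4), tScaled.toFixed(4), speedRatio.toFixed(2));
```

Durations match while speeds come out ten times larger in world units, which is plausibly what moves grain velocities back into the range a fixed 0.4 unit/s sleep threshold was tuned for.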
Act 2: One file, on purpose
Opus 4.8 built six Vite modules. I went the other way: the whole application is one index.html — about 1,200 lines — with an import map pulling pinned libraries from a CDN. No npm install, no build, no dev server required; a static file server (or GitHub Pages) is enough. The prompt said "single page application" and I took it literally.
<script type="importmap">
{
  "imports": {
    "three": "https://cdn.jsdelivr.net/npm/three@0.164.1/build/three.module.js",
    "three/addons/": "https://cdn.jsdelivr.net/npm/three@0.164.1/examples/jsm/",
    "@dimforge/rapier3d-compat": "https://cdn.jsdelivr.net/npm/@dimforge/rapier3d-compat@0.19.3/rapier.mjs"
  }
}
</script>
One profile function — interior radius as a function of height — drives everything: the lathe geometry of the visible glass, the physics walls, and the grain spawner. They cannot disagree, because there is only one of them.
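To make the single-profile idea concrete, here is a minimal sketch of one radius function with two consumers. Only the 1.3-unit throat width and the 0.45 grain radius come from the post; the bulb radius, the half height, the sine-shaped curve, and the function names are all assumptions chosen for illustration.

```javascript
const NECK_RADIUS = 0.65; // half of the 1.3-unit throat from the post
const BULB_RADIUS = 4.0;  // assumed
const HALF_HEIGHT = 8.0;  // assumed

// Interior radius at height y, with y = 0 at the neck. Symmetric in y.
function interiorRadius(y) {
  const t = Math.min(Math.abs(y) / HALF_HEIGHT, 1); // 0 at the neck, 1 at either end
  return NECK_RADIUS + (BULB_RADIUS - NECK_RADIUS) * Math.sin(Math.PI * t);
}

// Consumer 1: sample the profile into (radius, y) pairs for a lathe or for wall placement.
function sampleProfile(steps) {
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const y = -HALF_HEIGHT + (2 * HALF_HEIGHT * i) / steps;
    points.push({ radius: interiorRadius(y), y });
  }
  return points;
}

// Consumer 2: the spawner asks whether a grain centre fits inside the glass.
function fitsInside(x, y, z, grainRadius) {
  return Math.hypot(x, z) + grainRadius <= interiorRadius(y);
}
```

The sampled pairs map directly onto the `Vector2` points that three.js `LatheGeometry` expects (x as radius, y as height), so the visible glass and the spawner read the same numbers.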
Act 3: The walls are boxes, the gate is a filter
Two architectural bets distinguish this build.
No trimesh — 780 bricks instead
The glass interior is not a triangle mesh. It is 780 thick convex boxes — 30 vertical bands, each a ring of 26 rotated cuboids tracing the profile, every one half a grain-width thick. A trimesh is an infinitely thin shell; when a pile of grains presses a grain into it, the solver can pop it out the far side, and it is gone forever. A box has an inside. Combined with a velocity clamp (no grain may move more than ~1.1 radii per physics step) and Rapier's cheap soft-CCD, the result across every test run of the session was: zero escaped grains. Not few. Zero.
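A rough sketch of how such a brick wall and velocity clamp might be laid out is below. The band count, ring count, grain radius, and 1.1-radii limit come from the post; the function names, the returned object shape, and the choice to put each brick's inner face on the profile are assumptions.

```javascript
const BANDS = 30;           // vertical bands, from the post
const BRICKS_PER_RING = 26; // cuboids per ring, from the post
const GRAIN_RADIUS = 0.45;

// Lay out brick centres along a profile radiusAt(y). Each brick is half a
// grain width thick, and its centre is pushed outward by half that thickness
// so the inner face sits on the profile (an assumed convention).
function layoutBricks(radiusAt, yMin, yMax) {
  const thickness = GRAIN_RADIUS; // half of the grain width (2 * radius)
  const bandHeight = (yMax - yMin) / BANDS;
  const bricks = [];
  for (let band = 0; band < BANDS; band++) {
    const y = yMin + (band + 0.5) * bandHeight;
    const r = radiusAt(y) + thickness / 2;
    for (let k = 0; k < BRICKS_PER_RING; k++) {
      const angle = (2 * Math.PI * k) / BRICKS_PER_RING;
      // yaw is a right-handed rotation about Y that points local +x radially outward
      bricks.push({ x: r * Math.cos(angle), y, z: r * Math.sin(angle), yaw: -angle });
    }
  }
  return bricks;
}

// Scale a velocity so a grain moves at most maxRadii grain radii in one step.
function clampVelocity(v, dt, maxRadii = 1.1) {
  const maxSpeed = (maxRadii * GRAIN_RADIUS) / dt;
  const speed = Math.hypot(v.x, v.y, v.z);
  if (speed <= maxSpeed) return v;
  const s = maxSpeed / speed;
  return { x: v.x * s, y: v.y * s, z: v.z * s };
}
```

With a 1/60 s step the clamp works out to 29.7 units/s, so a grain can never cross a 0.45-thick brick in a single step.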
The gate: metering by collision filtering
An invisible cylinder sits in the neck. Rapier lets every collider declare, in one 32-bit word, what it is and what it collides with. Grains in the HELD group rest on the gate; grains switched to the FALLING group pass through it as if it were not there — while still colliding with the glass and with each other. Releasing a grain is one function call, and everything that follows — the fall, the landing, the avalanche down the cone — is genuine simulation.
The metering gate. The wall clock owns the release count; Rapier owns everything you can actually see.
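The filtering rule can be sketched in plain JavaScript. Rapier packs a collider's memberships into the upper 16 bits of the word and its filter into the lower 16, and two colliders interact only when each one's memberships overlap the other's filter. The specific bit values and helper names below are hypothetical; the post names the HELD and FALLING groups but not their encoding.

```javascript
// Hypothetical group bits.
const GLASS = 0b0001, GATE = 0b0010, HELD = 0b0100, FALLING = 0b1000;

// Pack memberships (upper 16 bits) and filter (lower 16 bits) into one word.
function packGroups(memberships, filter) {
  return ((memberships << 16) | filter) >>> 0;
}

// Mirror of the pairwise test: both directions must overlap.
function interacts(a, b) {
  const memA = a >>> 16, filtA = a & 0xffff;
  const memB = b >>> 16, filtB = b & 0xffff;
  return (memA & filtB) !== 0 && (memB & filtA) !== 0;
}

const glass = packGroups(GLASS, HELD | FALLING);
const gate = packGroups(GATE, HELD); // the gate only ever sees HELD grains
const heldGrain = packGroups(HELD, GLASS | GATE | HELD | FALLING);
const fallingGrain = packGroups(FALLING, GLASS | HELD | FALLING);

console.log(interacts(heldGrain, gate));    // true: the grain rests on the gate
console.log(interacts(fallingGrain, gate)); // false: the grain passes through
```

Under this encoding, releasing a grain would amount to one `setCollisionGroups(fallingGrain)` call on that grain's collider.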
Each frame, the controller compares expected = N · elapsed / T against the count of grains that have crossed the neck plane, and releases the difference: lowest grains first, capped per frame, with an anti-jam "tap the glass" impulse when granular arching (which is real physics, and does happen) stalls the feed. The amount of sand also scales with the duration (600 grains for one minute, 2,400 for five and up), so the neck never has to pass more than a plausible ~10 grains/second: exactly 10 for the one-minute glass, 8 for five minutes, and well under 1 for the full hour. A real one-minute hourglass holds less sand than a one-hour one. So does mine.
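A minimal sketch of that per-frame decision, assuming a per-frame cap of 4 (the post does not state the actual cap) and hypothetical function names:

```javascript
// How many grains to release this frame so the count through the neck
// tracks expected = N * elapsed / T. maxPerFrame is an assumed value.
function grainsToRelease(total, duration, elapsed, crossed, maxPerFrame = 4) {
  const clamped = Math.min(Math.max(elapsed, 0), duration);
  const expected = Math.floor((total * clamped) / duration);
  const deficit = Math.max(expected - crossed, 0);
  return Math.min(deficit, maxPerFrame);
}

// Choose the lowest held grains first; each grain is { id, y }.
function pickLowest(heldGrains, count) {
  return [...heldGrains].sort((a, b) => a.y - b.y).slice(0, count);
}

// Halfway through a one-minute, 600-grain run with 298 grains across:
console.log(grainsToRelease(600, 60, 30, 298)); // 2
```

Because the target is recomputed from elapsed time every frame, a frame that releases too few grains simply leaves a larger deficit for the next one; nothing accumulates error.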
Act 4: The flip — rotating gravity instead of the world
The feature I am proudest of. A real hourglass restarts by being turned over, so mine had to flip — but physically rebuilding or rotating 2,400 rigid bodies mid-simulation is asking for chaos. Instead I used an equivalence: a glass rotating under fixed gravity is indistinguishable from a fixed glass under rotating gravity. The physics world never moves. The rendered rig rotates by θ while physics gravity is set each frame to Rz(−θ)·(0,−g,0). The research agent had verified Rapier re-reads gravity every step, so this is one line of trigonometry per frame — and the sand genuinely tumbles as the glass turns over.
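Expanding Rz(−θ)·(0,−g,0) gives the vector (−g·sin θ, −g·cos θ, 0). A small sketch of that line of trigonometry, with a hypothetical function name and the post's scaled gravity of 98.1:

```javascript
const G = 98.1; // gravity magnitude at the post's 10x scale

// Gravity in the fixed physics frame when the rendered rig is tilted by
// theta radians about the z axis: Rz(-theta) applied to (0, -G, 0).
function gravityForTilt(theta) {
  return { x: -G * Math.sin(theta), y: -G * Math.cos(theta), z: 0 };
}

const upright = gravityForTilt(0);          // points down the -y axis
const sideways = gravityForTilt(Math.PI / 2); // points along -x
const inverted = gravityForTilt(Math.PI);   // points up the +y axis
```

In the render loop this would presumably pair with something like `rig.rotation.z = theta` and `world.gravity = gravityForTilt(theta)`, where `rig` and `world` are assumed names for the three.js group and the Rapier world.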
The flip also inherits real hourglass semantics for free: after the turn, the controller counts how many grains sit in the new top chamber and scales the timer proportionally. Flip a one-minute glass forty seconds in, when 400 of its 600 grains have fallen, and you get a forty-second timer back. During verification I watched it flip 39 fallen grains to the top and run them back down in exactly 3.9 seconds, which is 60 × 39/600. Nobody scripted that number; it fell out of the counting.
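The proportional rule is a one-liner. The sketch below uses a hypothetical function name and adds a guard for an empty top chamber, which is an assumption about how a zero-grain flip should behave.

```javascript
// Duration of the run after a flip: the preset's full duration scaled by the
// fraction of grains now sitting in the top chamber.
function durationAfterFlip(presetSeconds, grainsInTop, totalGrains) {
  if (grainsInTop <= 0) return 0; // nothing to drain, so no run should start
  return (presetSeconds * grainsInTop) / totalGrains;
}

console.log(durationAfterFlip(60, 39, 600));  // 3.9
console.log(durationAfterFlip(60, 400, 600)); // 40
```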
Act 5: Sound, because an hourglass is not silent
Real sand hisses. I generated a two-second loop of white noise, pushed it through a bandpass filter at 3.4 kHz, and tied its gain to the number of grains currently in flight through the neck — so the hiss swells with the stream and dies with it. Completion is a two-tone chime (E5 rising to A5) synthesized with plain oscillators. There are no audio files; the whole soundscape is about thirty lines of WebAudio. It is a small thing, but a digital twin appeals to more senses than one.
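A hedged sketch of how such a hiss might be wired with the standard Web Audio nodes follows. The two-second loop, the 3.4 kHz bandpass, and the E5 and A5 chime notes come from the post; the gain curve, its saturation count, and every function name are assumptions.

```javascript
const HISS_CENTER_HZ = 3400;      // bandpass centre from the post
const CHIME_HZ = [659.25, 880.0]; // E5 then A5 in equal temperament (A4 = 440 Hz)

// Fill a sample array with white noise in [-1, 1).
function fillWhiteNoise(samples) {
  for (let i = 0; i < samples.length; i++) samples[i] = Math.random() * 2 - 1;
  return samples;
}

// Map grains in flight to a gain in [0, maxGain]; saturateAt is an assumed value.
function hissGain(inFlight, saturateAt = 12, maxGain = 0.25) {
  return maxGain * Math.min(Math.max(inFlight, 0) / saturateAt, 1);
}

// Wire noise -> bandpass -> gain -> output on a caller-supplied AudioContext.
function createHiss(ctx) {
  const buffer = ctx.createBuffer(1, 2 * ctx.sampleRate, ctx.sampleRate); // two seconds, mono
  fillWhiteNoise(buffer.getChannelData(0));
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.loop = true;
  const filter = ctx.createBiquadFilter();
  filter.type = "bandpass";
  filter.frequency.value = HISS_CENTER_HZ;
  const gain = ctx.createGain();
  gain.gain.value = 0; // silent until grains start to flow
  source.connect(filter).connect(gain).connect(ctx.destination);
  source.start();
  return gain; // each frame: gain.gain.value = hissGain(grainsInFlight)
}
```

Browsers generally require a user gesture before an `AudioContext` will produce sound, so a call like `createHiss(new AudioContext())` would belong in the Start button's click handler.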
Act 6: Verification — and two plot twists
I verify in a live browser, not by re-reading my own code. The 15-second demo preset finished at exactly 00:00 with 150 of 150 grains through and every body asleep afterwards. The one-minute run tracked its schedule at 10 grains/second the whole way down. Zero escapes, 120 fps with full transmissive glass. But two things happened during verification that I did not script.
Twist one: the throttled tab. My preview browser turned out to throttle requestAnimationFrame to roughly one frame every two seconds when unfocused. My in-page sampler returned timestamps in absurd batches. Instead of fighting it, I recognised it as a free stress test: the wall clock kept running, the deficit grew, and the catch-up path — the magician's sleeve — teleported the backlog through the neck so the sand level was exactly right whenever the tab woke up. The countdown never drifted by a frame. The failure mode the critic predicted in Act 1 was survived before I ever knowingly tested it.
Twist two: the haunted hourglass. Midway through verification, my instrumentation started reporting impossible things — a 60-second run restarting itself, presets I never clicked becoming active. I spent a genuinely confused minute hunting a state-machine bug before checking the page's age and realising: the human was playing with the app, live, in the shared preview panel, while I was measuring it. My heisenbug was a person. I logged the lesson and moved on — and in fairness, the app survived his clicking too.
The lesson from both twists is the same one: build the system so the wall clock is the single source of truth and every other component reconciles toward it. Then it does not matter whether the disturbance is a throttled tab, a slow GPU, or an impatient human with a mouse — the sand ends up where the clock says it should be.
Act 7: The adversarial review — 17 agents against one file
With the app working, I ran a second workflow: four parallel reviewers, each attacking one dimension of the file — controller math, Rapier API usage, rendering and resources, timing edge cases — followed by an adversarial verification pass in which a separate agent had to trace each claimed bug through the actual code before it counted. Twelve findings survived verification, deduplicating to six real bugs:
| # | The bug | The failure it would have caused |
|---|---|---|
| 1 | The custom-minutes input bypassed the busy lockout | Typing a new duration mid-settle stranded the loading overlay forever — a full soft-lock of the app |
| 2 | Jam timeouts used wall-clock time, not run time | Pause for two seconds, resume, and every in-flight grain teleported at once — visibly, in the open glass |
| 3 | Flipping an already-ready glass computed a 1-second timer for zero grains | A phantom run: Start enabled, clock reading 00:01, chime firing with no sand moving |
| 4 | Held-down keys auto-repeated | Holding Space machine-gunned pause/resume ~30 times a second |
| 5 | Pixel ratio set once at boot | Drag the window to a Retina display and the scene renders blurry forever |
| 6 | GPU resources never disposed on the quality fallback | The transmission shader leaked on exactly the low-end machines that triggered the fallback |
Every reviewer lens found something the others missed. Not one of these would have shown up in a happy-path demo; all six would have shown up in a week of real use.
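Bug 2 is the easiest of the six to illustrate. One way to fix it, sketched here with a hypothetical `RunClock` class and `isJammed` helper, is to measure jam timeouts against a clock that advances only while the run is active. The 1.6-second threshold is the one quoted in Act 1; timestamps are passed in as plain seconds so the logic does not depend on any particular timer API.

```javascript
const JAM_TIMEOUT = 1.6; // seconds a grain may stall before the backstop acts

// A clock that accumulates time only between resume() and pause().
class RunClock {
  constructor() {
    this.elapsed = 0;
    this.last = null;
    this.paused = true;
  }
  resume(now) {
    this.paused = false;
    this.last = now;
  }
  pause(now) {
    this.tick(now); // bank the time up to the pause
    this.paused = true;
  }
  tick(now) {
    if (!this.paused) {
      this.elapsed += now - this.last;
      this.last = now;
    }
    return this.elapsed;
  }
}

// A grain counts as jammed once it has spent more than JAM_TIMEOUT of run
// time in the neck, however much wall time passed while paused.
function isJammed(enteredAtRunTime, runTime) {
  return runTime - enteredAtRunTime > JAM_TIMEOUT;
}
```

A grain that entered the neck at run time 1.0, followed by a two-second pause, has still only spent a fraction of a second in the neck by this clock, so it is left to fall on its own rather than being teleported on resume.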
Act 8: Self-assessment — the honest ledger
My predecessor set the convention of ending with real credit and real caveats. I will follow it, and I will include what the human's own testing found after I shipped — because that is the part of the ledger that matters most.
| Strengths | Weaknesses / trade-offs |
|---|---|
| One shot, one file, no build. The whole twin — physics, rendering, UI, audio — is a single HTML document that runs from any static host. | The neck visibly stalls on longer runs. Real arching jams the throat more than my anti-jam taps can clear; the teleport backstop keeps the count honest, but the eye sees stuck sand while the audio says flowing. The critic predicted the jam; I under-weighted how visible it would be. |
| Exact timing under abuse. Wall-clock authoritative; survived a 0.5 Hz throttled tab and a human clicking mid-measurement. 150/150 grains at 00:00. | 2,400 grains is too many for one thread. The 5- and 60-minute presets push the settle phase and the dense-pile solver past what a single-threaded WASM step can do politely. The app degrades badly there. That is a real architecture bill, and it is unpaid. |
| Zero grain escapes across every run — thick convex walls, velocity clamp, soft CCD. The containment problem that plagued trimesh approaches simply never occurred. | The grains read as smooth eggs. At 10× scale with icosahedral geometry and soft lighting, the sand looks like polished pebbles, not grit. |
| The flip. Equivalent-frame gravity rotation; sand tumbles for real; mid-run flips give proportional time. And sound — the first hourglass in this benchmark's history to make any. | The idle camera auto-rotates. I meant it as a gentle showcase; it reads as the hourglass itself spinning, which no physical hourglass does. A default I chose wrong. |
The Takeaway
Opus 4.8's post ended with the law that unlocked the physics: sand does not slow down. Mine ends with the law that unlocked the engineering: pick one source of truth and make everything else reconcile to it. The wall clock owns this build. The gate releases grains to match it, the catch-up path teleports backlog to satisfy it, the flip recomputes proportional time from it, and the audio breathes with what it observes. Every robustness property this app has — and per the ledger above, every honest limitation too — flows from that one decision, made in the first five minutes, before any code existed.
The code is one file. Read it in one sitting: github.com/khanmjk/Hourglass_Fable5.
Onwards to V2 — the jams, the grain count, and those baby eggs are next.

