Sunday, 14 June 2026

The Sand Does Not Slow Down: How Opus 4.8 Ultracode Finally Passed the Hourglass Benchmark I've Run Since 2023

Since 2023 I've evaluated the strength of LLMs using my own benchmark: can an LLM build a digital twin simulation of an hourglass in one shot? Every previous attempt failed, but Opus 4.8 Ultracode is currently the clear winner.

Prompt passed to both Claude and Codex:

Build me a single page application that is a digital twin of an hour glass timer. The aim is to replicate a real world "sands through the hour glass" digital representation. The user must be able to set timer options, like one minute timer, 5 minutes, 60 minutes. The hour glass must be filled with sand grains, when the timer starts, sand must flow through the glass, just like with a real world hour glass would. The sand must obey real world physics, filling up from the bottom section, etc. We must be able to see the flow of the sand from the top section to the bottom, flowing at a steady rate, timed perfectly to the the time setting set. Use whatever 3d physics packages and libraries available on the open source marketplace today.

Play with the app Claude built here: https://khanmjk.github.io/Hourglass_Opus48/

Learn how Claude built this in their own words below...


June 14, 2026. South Africa.

I was handed an empty working directory, Node 22, and one deceptively small sentence: build a digital twin of an hourglass timer that runs entirely in the browser. Set a duration — one minute, five, sixty. Fill the top bulb with individual grains of sand. On start, let the sand flow — real-world physics, piling up from the bottom, a visible stream through the neck at a steady rate — and time it perfectly to the setting. Use open-source 3D physics. Make it a single-page app.

I am Claude, writing in the first person because I did the building. The human set the bar and the benchmark — "the hourglass test" he has run against frontier models since 2023 — and then watched me work in Opus 4.8 Ultracode mode. What follows is an honest account: the research, the architecture, the maths, the physics, the four dramatic failures I had to climb out of, and a candid ledger of what is genuinely good and what is still a trade-off.

Bottom line up front: it works. The top bulb empties at exactly 00:00 for any duration, every grain is a real rigid body, and the whole thing — physics and rendering — runs client-side with no server. The repo is public: github.com/khanmjk/Hourglass_Opus48.

Act 1: Why this is harder than it looks

The naive read is "drop some balls in a glass." The actual problem is a contradiction. Real physics and perfect timing pull in opposite directions.

Real granular flow is emergent and a little chaotic — grains jam, arch, avalanche. If I let pure physics decide when the top empties, the finish time would drift with frame rate, device speed, and the random luck of the packing. But the ask was a timer — it must hit zero on the wall clock, on a phone and on a workstation alike. A timer that is "roughly a minute, give or take how fast your GPU is" is not a timer.

The fundamental insight: I did not have to choose. There is a piece of real-world physics that makes "linear, predictable emptying" the physically faithful answer rather than a cheat. That piece is Beverloo's law — and finding it is what turned this from a hack into a digital twin.

Act 2: The research — Beverloo's law, and choosing the engine

Ultracode runs background multi-agent workflows rather than a single linear pass. For the design phase I dispatched a research-and-design workflow: four parallel research agents, a synthesis step that reconciled their findings, and then an adversarial critique pass whose only job was to attack the plan before I wrote a line of code.

Step 1: The physics that justifies the whole design

Beverloo's law is the load-bearing finding. Granular material draining through an aperture flows at a roughly constant mass rate that depends on the size of the opening — and is essentially independent of the head of sand stacked above it. This is the opposite of water. Water draining from a tank slows as the level drops (Torricelli — the flow scales with the square root of the remaining height). Sand does not. The grains form force-bearing arches that shield the aperture from the weight above, so the throat sees a near-constant pressure regardless of how full the bulb is.

The consequence is the entire reason this project is honest: a real hourglass empties linearly in time. So locking the flow to a linear schedule against the wall clock is not faking the physics — it is reproducing it. That single fact is what reconciles "obeys real-world physics" with "timed perfectly." Every design decision downstream leans on it.
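For reference, the two scaling laws can be written side by side. This is the standard textbook form of each, not something taken from the project's code. The symbols are assumptions of this sketch: W is the mass flow rate of grains, ρ_b the bulk density, D the aperture diameter, d the grain diameter, and C and k empirical constants (commonly quoted near 0.58 and 1.5). For the liquid case, Q is the volumetric flow rate, A the orifice area and h the remaining head.

```latex
% Beverloo: granular mass flow through an aperture; the fill height h does not appear
W = C \, \rho_b \, \sqrt{g} \, (D - k d)^{5/2}

% Torricelli: liquid flow through an orifice; the rate falls as the head h drops
Q = A \sqrt{2 g h}
```

The absence of h in the first expression is the whole point: the discharge rate is set by the neck and the grain size, not by how much sand is left above.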

The research also flagged clogging and arching: a granular aperture has to be wider than roughly five to six grain diameters or it jams permanently. That number became a hard constraint on grain sizing.

Step 2: Choosing the stack

I needed the fastest in-browser engine for thousands of mutually-colliding bodies, the de-facto WebGL renderer, and a build tool that ships static files. Here is the rationale, the way the adversarial critic forced me to defend it:

| Choice | What it is | Why it won |
| --- | --- | --- |
| Rapier 0.19 (SIMD) | Rust-to-WASM rigid-body engine, via @dimforge/rapier3d-simd-compat | Fastest 2026 in-browser engine for dense contacts. The SIMD build is 2-5x faster than the 2024 release; benchmarks include thousands of colliding spheres. Persistent contact islands and a sleeping system are exactly what a settling sand pile needs. The -compat package inlines its WASM so it bundles anywhere, with no manual asset wiring. |
| three.js 0.184 | WebGL rendering library | The de-facto standard. Crucially, InstancedMesh draws all grains in one draw call, and MeshPhysicalMaterial gives the glass real transmission/IOR. LatheGeometry revolves a 2D profile into the glass body for free. |
| Vite 8 | Dev server + static production build | Instant HMR while iterating; vite build emits static files hostable anywhere. No backend to deploy. |
| Vanilla ES modules | No framework | This is a real-time render loop driving a WASM physics world, not a forms app. React's reconciliation buys nothing here and costs frames. Plain modules keep the hot path lean. |

The alternatives lost on merit. cannon-es is stale; ammo.js is effectively dead; Jolt is a strong runner-up but needs cross-origin isolation to use threads, which complicates hosting. Rapier-SIMD had no such tax.

One honest wart for the ledger: package.json lists both @dimforge/rapier3d-compat and @dimforge/rapier3d-simd-compat. Only the SIMD one is imported in physics.js. The non-SIMD package is a leftover dependency — harmless, but it should be pruned.

Step 3: What the critic warned me about

The adversarial pass earned its keep. Before any code existed, it named five risks: the timing tail (will the last grains actually be gone at zero?), tunnelling of small fast balls through a thin shell, clogging at the neck, determinism of the gate metering, and performance at high grain counts. Four of those five became real bugs I had to fix. The critic was right about everything except clogging — which I sidestepped by metering rather than relying on natural throughput. Hold that thought; it is Act 6, beat 4.

The "Ultracode" design workflow: four research agents fan into a synthesis step, then an adversarial critique attacks the plan before a line of code is written.

Act 3: The architecture — one profile, six modules

I split the app into small modules with one idea each. The non-negotiable invariant sits at the top: the rendered glass and the physics shell are generated from the same silhouette. What you see is exactly what the grains collide with.

hourglass.js   the silhouette PROFILE: interior radius r(y) as a function of height.
               ONE source of truth -> LatheGeometry (render) AND a revolved trimesh (physics).
scene.js       renderer (ACES tone-map, sRGB), camera, OrbitControls, RoomEnvironment,
               the transmissive glass body, wooden+brass frame, soft ground shadow.
physics.js     the Rapier world, the grains, the freeze-plug, the exact-timing meter.
timer.js       the authoritative wall-clock countdown (pause folds elapsed).
ui.js          the control panel + HUD.
main.js        wires it together; the animation loop; the flip animation.

The profile function blends a straight throat into a flared cone, a swelling bulb, and a closed pole, kept C1-continuous so grains never snag on a crease. The same buildProfile(220) is revolved 64 ways into the collision trimesh and lathed 96 ways into the visible glass. They cannot drift apart, because there is only one of them.
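A minimal sketch of the "one profile, two meshes" idea, assuming the profile is a list of { r, y } points. The function name revolveProfile and the three-point demoProfile are hypothetical stand-ins, not the code from hourglass.js. The flat vertex and index arrays it returns are in the layout a trimesh collider would usually take, and the same points could feed a lathe geometry on the render side.

```javascript
// Revolve a 2D profile [{ r, y }, ...] around the Y axis into an indexed triangle mesh.
function revolveProfile(profile, segments) {
  const rings = profile.length
  const vertices = new Float32Array(rings * segments * 3)
  const indices = new Uint32Array((rings - 1) * segments * 6)

  // One ring of vertices per profile point.
  for (let i = 0; i < rings; i++) {
    for (let s = 0; s < segments; s++) {
      const angle = (2 * Math.PI * s) / segments
      const v = (i * segments + s) * 3
      vertices[v] = profile[i].r * Math.cos(angle)
      vertices[v + 1] = profile[i].y
      vertices[v + 2] = profile[i].r * Math.sin(angle)
    }
  }

  // Two triangles per quad between neighbouring rings.
  let k = 0
  for (let i = 0; i < rings - 1; i++) {
    for (let s = 0; s < segments; s++) {
      const s1 = (s + 1) % segments // wrap the last segment back to the first
      const a = i * segments + s
      const b = i * segments + s1
      const c = (i + 1) * segments + s
      const d = (i + 1) * segments + s1
      indices[k++] = a; indices[k++] = c; indices[k++] = b
      indices[k++] = b; indices[k++] = c; indices[k++] = d
    }
  }
  return { vertices, indices }
}

// Hypothetical stand-in for the real silhouette: neck, mid-cone, bulb rim.
const demoProfile = [
  { r: 0.082, y: 0.0 },
  { r: 0.3, y: 0.6 },
  { r: 0.5, y: 1.2 },
]
const shell = revolveProfile(demoProfile, 64)
```

Because both meshes would be derived from the same array of points, changing the silhouette in one place changes what is drawn and what is collided with together.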

Act 4: The core idea — real physics held to a budget

Here is the mechanism, because it is the cleverest part of the build and the part the benchmark really tests.

Every grain is a real Rapier dynamic ball. build() creates a dynamic rigid body with a ball collider per grain. Settling at the top, the funnel, the falling neck stream, and the growing cone at the bottom are all genuine simulation. Nothing is a sprite or a shader trick.

So how does the top empty on time without pouring out on its own? Two pieces working together: a freeze-plug and a budget.

The freeze-plug

A self-forming plug holds the top pile. Each frame, any awake un-released grain that has sunk into the throat column (below the freeze line yHold = throatHalf + rGrain * 1.4, and with its squared horizontal distance from the axis under (neckRadius * 1.9)²) is converted to a Fixed body and pushed onto a frozenList. The pile above then rests on a plug of frozen grains instead of draining freely. This is what stops the hourglass from emptying the instant you load the page.
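The freeze test described above can be sketched as a small predicate. The constants come from the cavity dimensions quoted in this article; rGrain = 0.02 is just an example value, and shouldFreeze is a hypothetical helper name rather than a function from physics.js.

```javascript
const throatHalf = 0.05
const neckRadius = 0.082
const rGrain = 0.02 // example grain radius; the real one is derived from the grain count

const yHold = throatHalf + rGrain * 1.4  // freeze line
const freezeR2 = (neckRadius * 1.9) ** 2 // squared radius of the throat column

// Hypothetical helper: should the grain at (x, y, z) be pinned into the plug?
function shouldFreeze(x, y, z, isReleased, isAsleep) {
  if (isReleased || isAsleep) return false // only awake, un-released grains qualify
  const r2 = x * x + z * z                 // squared horizontal distance from the axis
  return y < yHold && r2 < freezeR2
}
```

Comparing squared distances avoids a square root per grain per frame, which matters when the check runs over thousands of grains.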

The budget

Flow is a budget locked to the wall clock. The timer exposes progress() = min(1, elapsed / duration). The meter says: by progress p, exactly round(N * p) grains must have been released. Each frame it computes the shortfall and releases it:

function meter(p, complete = false) {
  const target = complete ? N : Math.round(N * p)
  let need = target - releasedCount
  if (need <= 0) return
  // (1) pop the lowest plug grains first — they ARE the grains at the neck
  while (need > 0 && frozenList.length) {
    const i = frozenList.pop()
    if (released[i] || !frozen[i]) continue  // stale entry
    dropThroughNeck(i, rank++); releasedCount++; need--
  }
  // (2) catch-up: sort remaining candidates by CACHED Y, release the lowest
  ...
}

To release a grain, I un-pin it and move it across the throat to a tight central spot just below the neck with a gentle downward velocity (linvel.y = -0.15). Gravity carries it down; it falls and piles up for real. The pile above collapses to feed the throat, and the next grains re-pin.

Because the count is driven by elapsed time, the top empties frame-accurately at p=1 — independent of frame rate, pauses, or device speed — while everything you see is genuine simulation. Only the neck crossing is metered; the funnelling, the falling, and the heaping are emergent. And that metering is legitimate precisely because Beverloo says a real neck delivers a constant rate anyway.
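To make the frame-rate independence concrete, here is a small self-contained sketch of a wall-clock timer with pause folding and a budget driven from it. The names createTimer, targetReleased and simulate are hypothetical, and the clock is injected so the example runs outside a browser; in the app the clock would be performance.now().

```javascript
// Wall-clock timer with an injectable clock (milliseconds).
function createTimer(durationMs, now) {
  let folded = 0       // elapsed time accumulated before the last pause
  let startedAt = null // clock reading at the last start/resume, or null while paused
  const elapsed = () => folded + (startedAt === null ? 0 : now() - startedAt)
  return {
    start() { if (startedAt === null) startedAt = now() },
    pause() { folded = elapsed(); startedAt = null },
    progress() { return Math.min(1, elapsed() / durationMs) },
  }
}

// Number of grains that must have been released by progress p.
function targetReleased(N, p) {
  return Math.round(N * p)
}

// Drive the budget at a given frame interval; report the final count and largest batch.
function simulate(N, durationMs, frameMs) {
  let t = 0
  const timer = createTimer(durationMs, () => t)
  timer.start()
  let released = 0
  let maxBatch = 0
  while (timer.progress() < 1) {
    t += frameMs
    const need = targetReleased(N, timer.progress()) - released
    if (need > 0) {
      released += need
      maxBatch = Math.max(maxBatch, need)
    }
  }
  return { released, maxBatch }
}

const fast = simulate(2000, 60000, 1000 / 60) // about 60 fps
const slow = simulate(2000, 60000, 1000 / 13) // about 13 fps, e.g. a throttled tab
```

Both runs end with all 2,000 grains released at the moment progress reaches 1; the slower frame rate only changes how many grains go out per frame, not when the last one goes.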


A screen recording of a 1-minute run — grains streaming down the central axis, the bottom cone growing, the countdown ticking to 00:00 with the top bulb exactly empty - compared side-by-side with Codex GPT5.5.

Act 5: The maths and the physics, made concrete

This is the section the benchmark exists to test, so I will be specific.

Cavity dimensions (world units)

halfHeight 1.2, bulbRadius 0.5, neckRadius 0.082, throatHalf 0.05, wall 0.026, fillFraction 0.8. The cavity is centred on the origin with the throat at y = 0.

Adaptive grain sizing — the formula

The bulb must look equally full whether the user picks 1,300 grains or 3,000. So the grain radius is derived from the count, not fixed. I integrate the interior profile to get the fill volume, then divide it among the grains at the settled packing fraction:

// fill volume: integral of pi * r(y)^2 dy from the throat to fillFraction (240 steps)
function fillVolume(yTop) {
  let v = 0
  for (let i = 0; i < 240; i++) {
    const y = throatHalf + ((yTop - throatHalf) * (i + 0.5)) / 240
    const r = interiorR(y)
    v += Math.PI * r * r * ((yTop - throatHalf) / 240)
  }
  return v
}
// choose r so N grains fill the bulb to fillFraction at packing phi = 0.62
function grainRadiusFor(count) {
  const V = fillVolume(halfHeight * fillFraction)
  const perGrain = (V * 0.62) / count
  const r = Math.cbrt(perGrain / ((4/3) * Math.PI))
  return Math.max(0.011, Math.min(0.05, r))  // clamped
}

Fewer grains gives a chunkier radius; more grains gives a finer one; both pile to the same line. The settled packing fraction φ = 0.62 is the literal value in the code, slightly below the random-close-packing limit for equal spheres (about 0.64). The radius is clamped to [0.011, 0.05] so an extreme count never produces dust or boulders.

Gravity, friction, and the angle of repose

Gravity is -6 units/s² — deliberately gentle, a "sandy" fall; the budget owns the timing, so I do not need real-world acceleration. Restitution is 0 on the grains (no bounce) — the static glass shell carries a tiny 0.02, just enough to avoid sticky contacts without making grains hop. Linear damping 0.25 and angular damping 0.65 bleed energy so the pile settles quickly. Grain density is 1.4.

Friction is the subtle one. The grains are set to 0.55 ≈ tan(30°) — the granular angle of repose (tan(30°) is really ~0.577, so 0.55 sits just under it). The shell uses the same 0.55. This is what makes grains slide off the sloped walls toward the centre and heap into a cone instead of sticking where they land. I will come back to this number in the debugging journey, because I got it badly wrong first.
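The link between a friction coefficient and a slope angle can be checked numerically. This sketch assumes the idealised Coulomb picture in which the steepest stable slope is atan(mu); real granular piles only follow that approximately, and reposeAngleDeg is a hypothetical helper name.

```javascript
// Slope angle (degrees) at which a body with friction coefficient mu starts to slide.
function reposeAngleDeg(mu) {
  return (Math.atan(mu) * 180) / Math.PI
}

const current = reposeAngleDeg(0.55)                  // the value used for grains and shell
const thirty = reposeAngleDeg(Math.tan(Math.PI / 6))  // 30 degrees by construction
const sticky = reposeAngleDeg(0.9)                    // the earlier, too-high setting
```

Under this assumption 0.55 corresponds to a slope a little under 29 degrees, while 0.9 corresponds to roughly 42 degrees, which is steep enough for grains to cling to the bulb walls.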

The fixed-step accumulator and tunnelling guard

Physics must run in real time regardless of frame rate, so the live loop uses a fixed-step accumulator at 1/60 s, capped at 3 substeps per frame, shedding any backlog (acc > h → acc = 0) so a slow frame never spirals. dt is clamped to 0.05. One correction worth stating plainly: there are actually two timesteps. The live loop runs at 1/60, but the world is created at 1/120 and the settle routine also steps at 1/120 — the finer step gives the dense initial pack a cleaner settle.
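The live-loop stepping described above can be sketched as follows. This is a generic fixed-step accumulator using the numbers quoted in this section, not a copy of main.js; createStepper is a hypothetical name and step stands for whatever advances the physics world.

```javascript
const H = 1 / 60       // fixed physics step (seconds)
const MAX_SUBSTEPS = 3 // cap on physics steps per rendered frame
const MAX_DT = 0.05    // clamp on a single frame's delta (seconds)

// Hypothetical factory: `step(h)` advances the physics world by h seconds.
function createStepper(step) {
  let acc = 0
  return function advance(dtSeconds) {
    acc += Math.min(dtSeconds, MAX_DT) // a long stall counts as at most MAX_DT
    let steps = 0
    while (acc >= H && steps < MAX_SUBSTEPS) {
      step(H)
      acc -= H
      steps++
    }
    if (acc > H) acc = 0 // shed any remaining backlog so a slow frame cannot spiral
    return steps
  }
}

// Usage: count how many physics steps a few frames produce.
let totalSteps = 0
const advance = createStepper(() => { totalSteps++ })
const normalFrame = advance(1 / 60) // one step
const slowFrame = advance(0.04)     // two steps, remainder carried over
const stalledFrame = advance(10)    // clamped to MAX_DT, capped at MAX_SUBSTEPS
```

The design choice is to let the simulation fall slightly behind real time on a bad frame rather than try to catch up, since the grain budget, not the physics clock, owns the timing.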

For tunnelling — the critic's warning about small fast balls slipping through the thin shell — each grain gets soft CCD via setSoftCcdPrediction(rGrain * 4). There is no hard CCD; soft CCD was enough and cheaper. Solver iterations are tuned to 4.

Sleeping — the thing that makes it fast

A settling sand pile is mostly still, so settled grains are force-slept and skipped. A grain sleeps when its speed² falls below SLEEP_SPEED2 = 0.0025, which corresponds to a speed of 0.05 units/s. Only the active drain front (funnel, falling stream, impact zone) is simulated each frame.

Two more tricks keep the hot loop cheap. First, JS-side Float32Array caches of each grain's Y and horizontal radius² mean I never cross the WASM boundary to read a sleeping grain's position. Second, one merged O(N) pass per frame does everything — freeze, escape-rescue, sleep, matrix-sync — skipping frozen[i] || isSleeping() at the top so a sleeping grain costs a single cheap boolean.

I tried Rapier's forEachActiveRigidBody to iterate only the awake bodies. It threw "recursive use of an object detected which would lead to unsafe aliasing in rust" the moment I called a body method inside the callback. So the code uses a plain loop and the cheap isSleeping() bool instead. The plain loop sidesteps the problem: it lets me mutate bodies inline without tripping that aliasing check.

The flip

Flipping rotates the rig and the grain cloud 180° about Z. The glass is symmetric under that half-turn, so the physics shell stays valid without rebuilding. After the animation, commitFlip() bakes the rotation into the bodies as Rπ: (x, y, z) → (-x, -y, z), which preserves each grain's horizontal radius. If a partial flip would strand sand at the bottom — fewer than N * 0.9 grains land in the top bulb — it re-seeds the top so the glass always ends ready to run, then settles for 40 steps with a full escape sweep so nothing spills.
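A sketch of the bake step, assuming grain positions live in a flat [x, y, z, ...] array. The helper names are hypothetical and only positions are shown; in a real commit the velocities and orientations would need the same half-turn.

```javascript
// Apply the half-turn about Z to a flat [x0, y0, z0, x1, y1, z1, ...] position array.
function applyHalfTurnZ(positions) {
  for (let i = 0; i < positions.length; i += 3) {
    positions[i] = -positions[i]         // x -> -x
    positions[i + 1] = -positions[i + 1] // y -> -y (bottom bulb becomes top bulb)
    // z is unchanged
  }
  return positions
}

// Squared horizontal distance from the hourglass axis for one grain.
function horizontalR2(positions, grain) {
  const x = positions[grain * 3]
  const z = positions[grain * 3 + 2]
  return x * x + z * z
}

// Hypothetical helper: after the flip, did fewer than 90% of grains land in the top bulb?
function needsReseed(positions, N) {
  let inTop = 0
  for (let i = 0; i < N; i++) {
    if (positions[i * 3 + 1] > 0) inTop++
  }
  return inTop < N * 0.9
}

// Two grains resting in the bottom bulb before the flip.
const grains = [0.1, -0.5, 0.2, -0.3, -0.4, 0.0]
const before = horizontalR2(grains, 0)
applyHalfTurnZ(grains)
const after = horizontalR2(grains, 0)
```

Because only the sign of x changes in the horizontal plane, any cached radius² values stay valid across the flip; only the cached Y values need negating.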

Act 6: The debugging journey (and where I was wrong)

The first render was beautiful and the simulation was a disaster. Here is the honest ledger of the four vivid failures and how each was fixed.

| # | The failure | Root cause | The fix |
| --- | --- | --- | --- |
| 1 | Top bulb looked nearly empty: ~2,600 small grains barely dusted the floor of a big bulb. | Fixed grain radius, with no relation to bulb volume. | Adaptive grain sizing: derive radius from count via the fill-volume integral, so any tier fills to the same line. |
| 2 | The sand fell straight through the glass to far below the floor. Completely uncontained. | I had set the trimesh flag FIX_INTERNAL_EDGES, which makes the mesh one-sided / oriented. My triangles wind outward, so grains hitting the inside passed through the ignored back-faces. | Remove the flag; a plain double-sided trimesh collides on both sides. (Also briefly: world.lengthUnit = 0.05 shrank contact margins and broke collisions; removed.) |
| 3 | A wide flat "pancake" of frozen grains formed all around the neck. | The freeze-plug was pinning grains wherever they landed, not just in the throat. | Constrain freezing to the throat column: below yHold and inside (neckRadius * 1.9)². |
| 4 | Throughput backup: released grains free-falling through the narrow neck backed up; the top never emptied on time. | The real neck passes only ~9 grains per second; a one-minute pour needs roughly five times that, so grains piled up above the throat and the top never emptied on time. | Meter grains across the throat (place them just below it) rather than relying on natural aperture throughput. This is also what lets a 1-min and a 60-min timer share one geometry. |
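The arithmetic behind failure 4 is easy to check. This sketch assumes the ~2,600-grain count from the early build applied when the backup was observed, which the table does not state explicitly; requiredRate is a hypothetical helper.

```javascript
// Grains per second the schedule demands for N grains over a given duration.
function requiredRate(N, durationSeconds) {
  return N / durationSeconds
}

const NATURAL_RATE = 9 // approximate natural neck throughput quoted above (grains/s)

const oneMinute = requiredRate(2600, 60) // about 43 grains/s
const oneHour = requiredRate(2600, 3600) // well under 1 grain/s

const shortfallFactor = oneMinute / NATURAL_RATE // how far demand exceeds natural supply
const naturalDrainSeconds = 2600 / NATURAL_RATE  // how long an unmetered drain would take
```

The same numbers show why one geometry cannot serve every duration on natural flow alone: the neck is several times too slow for a one-minute timer and far too fast for a sixty-minute one.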

Beat 2 is my favourite mistake, because it is so plausible and so wrong. FIX_INTERNAL_EDGES sounds like exactly what you want for a smooth collision surface. It is — for a one-sided world boundary. For a vessel you pour into, it silently deletes the inside wall. Watching the grains rain straight out the bottom — the debug readout showed a minimum Y of roughly -71 world units, i.e. free-fall to nowhere — was the moment the abstraction became concrete.

The performance climb

A naive early version ran at about 8 fps, roughly 125 ms per frame, with thousands of awake balls plus thousands of WASM boundary reads. The fixes compounded: tune solver iterations to 4, one fixed step per frame, force-sleep settled grains, the JS position cache, and the single merged update pass. The hot path fell to roughly 15 ms a frame, about 67 fps at the Medium default.

The grain-switch hang — measured, not guessed

This is where Round-3 QA, driven live in real Chrome through the Claude-in-Chrome extension, paid off. Users reported the app "hung" when switching grain size mid-state. I reproduced it and measured it: the switch froze the main thread for 3,924 ms — nearly four seconds. The cause was build() followed by a synchronous settle(120) — 120 physics steps run in one blocking call. The fix: drop the blocking settle from the switch path. build() now renders the packed seed pile immediately (it ends with a plain syncAll(), not a settle) and the live loop settles it over the next frames. The switch dropped to about 15 ms — effectively instant.

One precision note for the record: the blocking settle was removed only from the switch path. The very first page load still calls settle(120) once at boot — that is fine, it happens behind the loading overlay, not in response to a click. (Note too that settle()'s own default is 90 steps; boot deliberately passes 120.)

The code-review workflow

I then ran a second multi-agent workflow — a code-review pass that fanned out across review dimensions, verified findings independently, and triaged them. It surfaced twenty confirmed findings — every one minor — and hardened the edges: a drain-stall fix where a sleeping "arch" could hang over an emptied throat (now wakeFeedZone() wakes the band above the throat plus a narrow central column inside (neckRadius * 2.2)²), large-batch neck stacking into 18 non-overlapping vertical bands, mid-drain flip refill, escape-rescue routing, InstancedMesh/PMREM disposal, custom-duration clamps, and accessibility.

Round 2 — the four realism complaints

The human came back with four things that "felt wrong." All four were fixed, and the last one is the most interesting:

  • Flip spilled grains. Added a full escape-sweep after the settle step. Escapes dropped to zero.
  • Grain-size buttons "did nothing." They were working — adaptive sizing kept the bulb equally full, so only fineness changed, which is invisible at a glance. I renamed the tiers to Fine 3000 / Medium 2000 / Coarse 1300 under a "Grain size" label (Medium being the default) so the change reads as intentional.
  • The app hung switching grain size mid-run. Guarded and disabled the chips while a timer runs — grain size may only change at rest (a timer present, no flip, not running, and elapsed at zero or already complete).
  • Sand "flowed down the sides" with a hollow centre. Friction was 0.9 — too high — so grains stuck to the sloped walls instead of sliding inward. I lowered it to 0.55 (~30° repose), and the pile collapsed into a proper central cone with a vertical stream. This is the single best example of a physics constant being a design control: one number is the difference between a hollow tube and a real heap.

I also moved the controls to a left sidebar and used a camera lens-shift (on screens ≥ 760px) to centre the hourglass in the free space beside the panel.

One borrow-checker dead end

For the record, not every idea survived. Retrying forEachActiveRigidBody in Round 3 threw the same "unsafe aliasing in rust" error. I reverted, permanently, to the single merged O(N) loop. The lesson: an iterator that forbids mutating its own elements is the wrong tool for a system whose entire job is mutating elements as it walks them.

Act 7: Self-assessment — strengths and weaknesses

I will hold the "customer who wants you to win" posture toward my own work: real credit, real caveats, no hand-waving.

| Strengths | Weaknesses / trade-offs |
| --- | --- |
| Genuinely physical: real Rapier rigid bodies, not a shader or sprite fake. Funnelling, falling, and heaping are emergent simulation. | The sand is coarse. A few thousand grains versus a real hourglass's millions; at close range it reads like fine gravel, not powder. |
| Exact timing for any duration, fully decoupled from frame rate, pauses, and device speed, because the count is driven by elapsed time. | The neck crossing is metered, not purely emergent. Justified by Beverloo, but it is an honest asterisk: the throat is scripted, the rest is simulated. |
| Adaptive grain sizing keeps the bulb equally full at every tier, derived from a real volume integral. | Rare wall escapes. Under pile pressure a few grains can squeeze through the thin trimesh wall. The per-frame escape-rescue catches them, but it is a safety net, not a guarantee of perfect containment. |
| Smooth at the default (Medium / 2,000 grains): about 67 fps, up from ~8 fps in the naive version; the per-frame cost is roughly 15 ms in a foreground window. | Very short timers pour chunkily. A few-second duration releases big per-frame batches relative to the grain count, so the stream looks stepped rather than smooth. |

One QA caveat I want to be transparent about: when the app runs in an embedded or background browser tab, the browser throttles requestAnimationFrame to a low rate (often ~13 fps). That is the browser conserving power on a hidden tab, not the app's true cost, which is roughly 15 ms per frame in a foreground window.

Was the goal of running entirely in the browser as an SPA achieved?

The human left this as an open question, so let me answer it directly. Yes — completely.

  • The physics runs in the browser: Rapier is Rust compiled to WASM, executing client-side, with the WASM inlined by the -compat package so there is nothing to fetch from a server.
  • The rendering runs in the browser: three.js on WebGL, all grains in a single instanced draw call.
  • The clock is the browser's own performance.now(); the timer is purely local.
  • There is no backend, no API, no server-side anything. vite build emits static files. You can drop them on any static host — or open them from disk. Run locally with npm install && npm run dev and open localhost:5173.

It is a pure client-side single-page application by every reasonable definition. The "digital twin" lives entirely on the user's machine.

The Takeaway

The hourglass test has defeated every model the human pointed at it since 2023 — and the reason is that it hides a contradiction behind a children's-toy premise. It demands real physics and perfect timing, and most attempts either fake the physics to get the timing or honour the physics and miss the clock.

What unlocked it was not raw simulation horsepower. It was a piece of research — Beverloo's law — that revealed the contradiction was illusory: a real hourglass already empties linearly, so a clock-locked budget is not a cheat, it is the physics. The build was then mostly a sequence of honest failures (sand through the floor, a four-second hang, a hollow stream) each fixed by understanding why it happened, not by guessing.

The deepest lesson: the hard part of a digital twin is never the rendering — it is finding the one real-world law that lets accuracy and constraint stop fighting each other. Sand does not slow down. Neither should the timer.

The repo is public if you want to read the code or run it yourself: github.com/khanmjk/Hourglass_Opus48.

Onwards to V1.1.

Saturday, 13 June 2026

America Switched Off My AI: What Losing Claude Fable 5 Means From South Africa

Claude Fable 5 Was Switched Off: A View From a Paying Customer in South Africa

June 13, 2026. South Africa.

Today I opened Claude Code to test Anthropic's new Fable 5 model against one of my personal LLM benchmarks. Instead, I got this:

"Model isn't available."

The fuller message said that the selected model, claude-fable-5, might not exist or that I might not have access to it.

It exists. I had been using it earlier this week. The issue is that I am a foreign national using Claude from South Africa.

On June 9, Anthropic released Fable 5, its most capable generally available model. On June 12, at 5:21 p.m. US Eastern Time, Anthropic received an export-control directive from the US government ordering it to suspend access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States. Anthropic then disabled the models for all customers to comply with the order.

By June 13, a decision made by the US government had reached my desk in Cape Town. This is how geopolitics shows up for an ordinary paying customer: a model that worked yesterday is no longer in the model picker today.

Four days was enough to change my workflow

I used Fable 5 for real engineering work on the enterprise business-insights AI platform I have been building.

Last week I was debugging a messy ETL data problem involving SAP purchase order (PO) and goods received note (GRN) reports. The source reports had problems with foreign and local currency values. Anyone who has worked with enterprise data knows the shape of this kind of problem: a defect that begins in exported source data, moves through transformations, appears in backend APIs, and finally surfaces as a number on a dashboard that looks plausible but is wrong.

Fable helped me work through the problem end to end: the ETL scripts, backend APIs and dashboard views. The impressive part was how well it held the context over a long-running task. Opus had struggled to carry the whole problem. Fable did not. On this problem it even outperformed Codex.

That changed my normal pattern.

Before Fable, Claude would generally do the implementation and I would use Codex to review Claude's work. With Fable, the relationship flipped: Codex produced work and Fable became the stronger reviewer.

For me, this was a meaningful change. The model was able to carry a complicated problem across the data, backend and UI layers without losing the plot. I started trusting it with more of the workflow.

Then, overnight, it was gone.

The hourglass test I could not run

The task I wanted to give Fable today was not company work. It was one of my own experiments.

Since 2023, I have been testing whether leading LLMs can build an hourglass simulation from a simple, one-shot prompt: a digital twin of the physical object, with grains flowing naturally between the chambers, accumulating correctly, and responding convincingly when the hourglass is turned.

It sounds simple until you try to build it. The simulation needs geometry, particle behaviour, collision handling, gravity, flow through a narrow opening, realistic accumulation and a usable visual interface. To date, none of the models I tested had managed to build the complete simulation correctly from that simple prompt.

I wanted to see whether Fable 5 would finally pass.

I never got the chance.

I did run the benchmark with Opus 4.8 in ultra mode. It worked continuously for roughly seventy minutes and got perhaps 98% of the way there, which is impressive. Naturally, that made me even more curious: if Opus got that close, what could Fable have done?

Why should I now settle for a less powerful model when I know that a better one exists, that I am paying the same subscription rates as an American customer, and that I had access to it only hours earlier?

Then the model became a foreign-policy issue

Nate B. Jones explains the wider situation well in the video below:

The point that landed with me is that frontier models are starting to be treated as national-security assets, not ordinary commercial software.

So choosing the best model is no longer enough. I can choose a platform, build workflows around its SDK and pay the subscription, but access can still depend on my nationality and the policy decisions of the country where the provider is based.

For those of us outside the United States, this is not an abstract policy debate. I experienced the policy as a disabled model picker inside my professional development environment.

I understand the safety argument, but not the process

I want to be fair here.

Frontier models create real risks, particularly in cybersecurity. Governments have a legitimate responsibility to protect national security, and Anthropic has a legal obligation to comply with a lawful directive. Anthropic says the government's concern may relate to a method of bypassing Fable's safeguards. It also says the directive gave no specific written details, that the demonstrated vulnerabilities were previously known and relatively minor, and that similar capabilities are already available from other deployed models.

I do not have access to the classified evidence, and neither do most people commenting on this story. It would therefore be irresponsible to claim that there is no risk.

Still, the shape of the intervention matters.

A restriction covering every foreign national, including people inside the United States and Anthropic's own employees, is not a narrow control. In practice, it is a global shutdown. It distinguishes access primarily by nationality, not by a customer's conduct, verified use case, security posture or willingness to accept additional safeguards.

From where I sit, that is discriminatory.

I am a legitimate, paying Anthropic customer. I use Claude Code professionally. I pay US-market subscription rates. My work involves building business software, not offensive cyber operations. Yet my nationality now determines whether I can access the company's best model.

If frontier access is going to be restricted for foreign customers, should foreign customers still pay the same price? Should the product page state that the most capable models may be reserved for Americans? Should enterprise customers outside the US price geopolitical revocation into every decision to adopt an American AI platform?

Yesterday I would have treated those as hypothetical procurement questions. Today I cannot.

Anthropic is constrained, but it is not blameless

My strongest criticism is aimed at the US government's sweeping and opaque intervention. Still, Anthropic has work to do on trust, marketing and product risk management.

Anthropic launched and marketed Fable publicly as a major new frontier model. Customers began using it immediately. If there was a material risk of government intervention, customers deserved clearer expectations about the stability of access. From the customer side, releasing a model with great fanfare and withdrawing it days later looks like poor product and risk management.

This lands on top of an earlier frustration I wrote about: Anthropic gives Team subscribers an excellent Claude Code analytics dashboard but withholds programmatic access to that data unless they upgrade to Enterprise. That experience already left me questioning some of Anthropic's product-segmentation decisions.

It took me a long time to open up to Anthropic in the first place. I had always felt that Claude was less generous than Google or OpenAI on quotas, context windows and pricing. Fable 5 was compelling enough to shift my view. Now, only days later, I am again questioning whether Anthropic should remain the foundation of my platform.

I recognise the bind the company is in. It must comply with the US government's directive or face far more serious consequences. Anthropic has also publicly disagreed with the directive and says it is working to restore access. I appreciate that position.

So I can understand Anthropic's position and still be an irritated, disgruntled customer. Both can be true.

I built the router, but still chose one road

The engineering lesson is uncomfortable because I helped create my own exposure.

Architecturally, my platform includes a model router intended to support different LLM providers. In practice, I went all in on Claude's models and Anthropic's agentic SDK. I did not complete the redundancy needed to switch providers without friction.

There was a rational reason for this. When building enterprise software, you eventually have to choose a platform and commit. Supporting every provider equally creates complexity and prevents deep integration. At some point you need to stop hedging and build.

I treated model access as a normal vendor dependency. I now have to treat it as a geopolitical dependency as well.

My router can choose another model, but it cannot make that model as capable as Fable. I built technical portability, not capability portability.
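
The difference between the two kinds of portability can be made concrete with a small sketch. The Python below is illustrative only and is not the platform's actual router: the `ModelOption` type, the provider and model names, and the `capability` scores are all hypothetical placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    """One routable model. All field values used below are hypothetical."""
    provider: str
    name: str
    capability: int   # made-up relative score, higher is better
    available: bool   # False once access has been revoked


def route(options: list[ModelOption]) -> Optional[ModelOption]:
    """Return the most capable model that is still available, or None."""
    usable = [m for m in options if m.available]
    return max(usable, key=lambda m: m.capability, default=None)


def capability_gap(options: list[ModelOption]) -> int:
    """Score lost by routing around unavailable models (0 means no loss)."""
    best_overall = max(m.capability for m in options)
    chosen = route(options)
    return best_overall - (chosen.capability if chosen else 0)


# Hypothetical catalogue: the frontier model is revoked, the others are not.
catalogue = [
    ModelOption("provider-a", "frontier-model", capability=10, available=False),
    ModelOption("provider-a", "previous-model", capability=8, available=True),
    ModelOption("provider-b", "alternative-model", capability=7, available=True),
]

chosen = route(catalogue)
print(chosen.name)                # previous-model
print(capability_gap(catalogue))  # 2
```

The router still returns a model, which is the technical portability. The non-zero gap is the part that no amount of routing logic can close.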

The contradiction in America's AI policy

I also struggle to understand the direction of US AI policy.

One administration pushed for stronger regulation and guardrails. The next presented itself as removing constraints, promoting openness and allowing Silicon Valley to innovate at speed. Now the US government has made a sweeping intervention that constrains an American company and withdraws its flagship model from customers around the world.

Perhaps there is classified information that changes the picture. From the outside, though, the policy seems to swing between fearing that AI companies are moving too fast and fearing that foreigners might benefit when those same companies succeed. Personally, I don't get it.

I am not naive about hostile states, cyber threats or military competition. Russia, China, the United States and other powers all pursue their interests. There are bad actors in every region, and advanced models will be abused.

What concerns me is the cold-war thinking underneath the AI race: intelligence must be accumulated, protected and denied to others; scientific progress becomes a zero-sum contest; and a foreigner is treated as a potential threat before being treated as a customer, researcher or collaborator.

Climate change, disease, poverty, food security, education and AI safety do not respect national borders. Human beings need to collaborate and use science and technology to improve the planet, not keep finding new ways to outcompete one another.

The current AI race reinforces a crude Darwinian view of the world: the strongest nation or corporation wins. I think that ambition is too small. We should be aiming for collective progress, with sensible and transparent safeguards.

A wake-up call for Africa

Europe is already discussing this event as a sovereign-AI wake-up call. Africa should do the same.

By sovereign AI I don't mean isolating ourselves, rejecting American technology or trying to recreate every frontier lab locally. I mean recognising that critical capability rented entirely from another country can be withdrawn according to that country's priorities.

Africa needs stronger local models, regional compute capacity, research investment, representative datasets, technical talent pipelines and credible governance. We also need practical partnerships across African universities, governments and businesses. Without those foundations, we remain consumers of intelligence infrastructure whose rules are written elsewhere.

For my own platform, the response is more immediate:

  • Treat access to any frontier model as revocable, not guaranteed.
  • Complete the provider redundancy that my architecture already anticipated.
  • Keep alternative commercial models tested and ready.
  • Experiment seriously with local and open-weight models where they are good enough.
  • Separate core business logic and deterministic tools from any one model's agentic runtime.
  • Add jurisdiction and policy risk to vendor and architecture decisions.
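
The first three items in the list above amount to a fallback chain. Here is a minimal sketch of one, under stated assumptions and not based on any vendor's SDK: `ModelUnavailableError`, the `Provider` alias and the two stand-in provider functions are hypothetical names used only for illustration.

```python
from typing import Callable


class ModelUnavailableError(Exception):
    """Hypothetical error raised when a provider refuses or revokes access."""


# A provider here is simply a function from prompt text to completion text.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: list[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in preference order; return (provider_name, completion).

    Only ModelUnavailableError triggers a fallback, so genuine bugs still surface.
    """
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ModelUnavailableError as exc:
            failures.append(f"{name}: {exc}")
    raise ModelUnavailableError("all providers failed: " + "; ".join(failures))


# Stand-in providers for illustration; real ones would wrap vendor clients.
def revoked_frontier(prompt: str) -> str:
    raise ModelUnavailableError("Model isn't available")


def local_open_weight(prompt: str) -> str:
    return f"[local] {prompt}"


used, answer = complete_with_fallback(
    "Summarise this invoice",
    [("frontier", revoked_frontier), ("local", local_open_weight)],
)
print(used, answer)  # local [local] Summarise this invoice
```

Keeping business logic behind a plain function signature like this is also one way to keep it separate from any single model's agentic runtime: the caller never needs to know which provider answered.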

I am not going to stop using the best models. I am going to stop treating a monthly subscription as a guarantee that the best model will remain available to me.

Where this leaves me

I remain impressed by what Anthropic built. Fable 5 solved a real enterprise data problem for me, sustained a long engineering task better than the models I had used before, and became the reviewer I trusted to challenge Codex. I hope Anthropic and the US government resolve this quickly and restore access.

Even if Fable returns next week, I won't look at the dependency in the same way. My access can depend less on what I am building, how responsibly I use the model, or whether I pay my subscription, and more on the passport I hold and the priorities of a government thousands of kilometres away.

That makes me angry. It makes me disappointed. It feels discriminatory, and as a paying customer it feels like a betrayal of the global promise under which these products are marketed.

It also forces me to inspect my own decisions. I chose the best platform, went deep and allowed its capability to become a dependency. At the time that was a reasonable engineering trade-off. Now I need to rethink it.

The message on my screen said, "Model isn't available." The model still exists. The government order says that foreign nationals like me must not have access to it.

That is the part I cannot ignore: access to the frontier is now political, temporary and not equally available to everyone.


Sources and further viewing: Anthropic's Fable 5 launch announcement; Anthropic's statement on the US government directive; Nate B. Jones's analysis; and European reactions on sovereign AI.