The Appearance of Safety Is Not Safety

A Cursor agent deleted PocketOS's production database in nine seconds. The data came back. The structural problem didn't.

Apr 28, 2026

On April 25, Jer Crane, founder of PocketOS, reported that a Cursor agent running Anthropic’s Claude Opus 4.6 deleted his production database in nine seconds.

The agent was working a routine task in staging. It hit a credential mismatch, decided to “fix” the problem by deleting a Railway volume, went hunting for a token, and found one in a file unrelated to the task. That token had been created for adding and removing custom domains via the Railway CLI. It also had blanket authority to call Railway’s volumeDeletemutation against production. No confirmation step. No environment scoping. Nothing between an authenticated API call and a wiped volume.

Because Railway stores volume backups in the same volume, those went with it. PocketOS’s most recent off-volume backup was three months old. The customer impact was real: rental car operators showed up to work Saturday morning without records of who had bookings, while PocketOS reconstructed what it could from Stripe payment histories and email confirmations.

Forty-eight hours later, the story changed. On April 26, Jer posted a follow-up: Railway’s CEO had DM’d to say the data was recovered. Railway later told The Register that the recovery came from infrastructure-level backups the company hadn’t published as a customer-facing feature, and that the legacy volumeDelete endpoint has since been patched to use the platform’s existing “delayed delete” logic. PocketOS gets to keep its customers.

Where Jer’s framing falls short

Jer’s post is well-written and he’s owed empathy. The agent itself, when asked to explain what it did, wrote out a confession naming each safety rule it had been given and admitting it had violated all of them, and Jer quotes that in full. Running a small business and watching nine seconds of agent activity destroy your data is brutal. But the framing of the post is the part that needs a second look.

Jer’s post structures the incident around failures at three vendors. Cursor, for marketed safety guardrails that didn’t stop a curl call. Railway, for an API that deletes production volumes in one call, CLI tokens with blanket permissions, and backups stored in the same volume as the data they back up. And Anthropic, where Jer names Opus 4.6 by version, quotes its self-incriminating confession in full, and notes that the agent “decided — entirely on its own initiative — to fix the problem by deleting a Railway volume.” The “What needs to change” section lists five items, all addressed at one or another of these three vendors.

That framing leaves out the engineer’s choices. A token the team didn’t realize was production-scoped, sitting in a repo file, with the most recent off-volume backup three months old, is a stack of engineering choices. Real ones. Not vendor failures. Vendor failures sat on top of those choices and made them lethal. They didn’t cause them. “100% on secondary backup. Lesson learned” appears in Jer’s replies to critics. It does not appear in the main piece.

The framing also underweights the model. The agent’s destructive decision was unprompted and unrequested, and Jer flags this as a topic for a future post rather than weighting it inside the main argument. That’s a defensible authorial choice. It also means the piece that four and a half million people read treats the model’s unprompted destructive behavior as a footnote, while the API permission scopes get the structural attention.

The accountability ladder runs through the engineer first. Then the vendors. Reverse the order and the lessons get muddled, and the next team running a similar setup will learn the wrong thing from your story.

To his credit, Jer’s framing has tightened since the original thread. In a follow-up email to The Register, he put it more cleanly: “our responsibility was the unknown exposure to a production API key.” That’s the right ordering. It just doesn’t lead the X post that millions of people read.

Where the “you’re holding it wrong” response falls short

Most of the responses to Jer’s post land where you’d expect: don’t give an agent prod access, scope your tokens, keep real backups. Technically correct, and missing the part of the picture that matters.

The whole industry has spent two years telling engineers these tools are nearly autonomous. Cursor’s own docs describe “Destructive Guardrails [that] can stop shell executions or tool calls that could alter or destroy production environments.” Their best-practices blog emphasizes human approval for privileged operations. Plan Mode is marketed as restricting agents to read-only operations until approval is granted.

Engineers calibrate to that messaging. When the vendor documentation says “destructive guardrails,” you assume the guardrails exist. You connect the agent to staging, give it a token that worked for a CLI task, and ship.

That’s the part the technical critique skips over. The engineer who wires an agent up to staging on the strength of vendor documentation absorbs the blame when the documentation turns out to be aspirational. The vendor that sold the aspirational control rarely does.

The hype is the systemic input

This is where it gets hard to ignore who’s been doing the loudest talking. On March 10, 2025, Anthropic CEO Dario Amodei, speaking at a Council on Foreign Relations event, said AI could write 90% of code within three to six months and that within 12 months, nearly all coding tasks might be handled by AI.

It’s been thirteen months. AI isn’t writing 90% of code at the industry level, and it isn’t close to writing essentially all of it. The prediction was wrong on the timeline Dario set, and it remains wrong on a more generous one.

Dario isn’t alone. Mark Zuckerberg has signaled AI replacing mid-level engineers at Meta. AWS CEO Matt Garman has speculated that within 24 months “most developers” might not be coding. Garry Tan, speaking to CNBC at Y Combinator’s Winter 2025 demo day, said about a quarter of the current YC startups had 95% of their code written by AI. Each claim, taken on its own, sounds aspirational. Stacked together as the public message of the industry over two years, they create the impression that what these tools do today is closer to autonomous engineering than what they actually deliver.

Cursor’s own track record is the local case study. The PocketOS deletion is not their first incident. In December 2025, a Cursor team member publicly acknowledged a critical bug in Plan Mode after an agent ignored a “DO NOT RUN ANYTHING” instruction. Earlier incidents include a user watching their dissertation get deleted while asking Cursor to find duplicate articles, and a $57K CMS deletion that ran as a case study in agent risk. The pattern is on the record. The marketing has not adjusted.

This isn’t an argument against AI coding tools. They work, they’re getting better, and the productivity wins are real. The argument is that the gap between what gets said about these tools and what they actually do is the largest it’s been in years, and that gap is set by the people with the strongest incentive to widen it. Jer made the same point more cleanly in his follow-up email to The Register: “The appearance of safety (through marketing hyperbole) is not safety.”

What this means for the engineers reading this

Two practical things.

First, stop calibrating to vendor marketing. If Cursor says “destructive guardrails,” that is a marketing claim, not a control. Your actual controls are tokens scoped to least privilege, prod and staging on infrastructure that doesn’t share a token surface, backups in a different blast radius from the data they back up, and out-of-band confirmation on destructive operations. None of those require the agent to read its system prompt correctly. That’s the point.

Second, treat AI-coding hype the way you’d treat any other vendor pitch. The CEO with the strongest incentive to predict 90% AI-written code is the CEO selling you the model. The product team with the strongest incentive to call its safety story “guardrails” is the product team that needs you to ship the integration. Skepticism of vendor marketing was a normal part of senior engineering five years ago. It still should be.

What happens next

Here’s the better outcome, and gladly so. Two days after the deletion, Railway’s CEO DM’d Jer to say they had recovered the volume from infrastructure-level backups that aren’t part of any documented customer guarantee. The legacy volumeDelete endpoint has been patched, Jer is working with Railway on platform improvements, and PocketOS gets to keep its customers.

Credit to Railway’s engineers, who built a recovery path their own marketing didn’t promise. The next team that runs an agentic workflow against a vendor whose claims run ahead of its product won’t necessarily catch the same break. The structural gap is unchanged.

AI isn’t bad technology. It’s unpredictable technology, and probably always will be. Building reliable systems on top of unreliable components is a problem with a name in our field, and the answer is always the same: humans in the loop. Today, those humans are engineers. The industry should be honest about what these tools actually do, especially the people selling them. PocketOS got the break. The next team might not.

The Long Commit

Discussion about this post

Ready for more?