I Didn't Know How Much I'd Handed Over to AI
The drift from skeptical user to autopilot, and the discipline I built to fight it.
A small thing happened the other day that I keep thinking about. I’d been saying yes to Claude Code without really reading what it was about to do. I caught myself mid-approval, and the thing that stopped me wasn’t the command. It was the realization that I hadn’t read the previous one either, or the one before that.
What bothered me wasn’t the lapse. It was that I’d read every horror story this year. The Replit incident. The PocketOS deletion I wrote about last week. The Gemini CLI file deletions. I’d read them, written them, and somewhere in the back of my head I’d filed all of them under that’s never happening to me. I had Claude under control. I reviewed things. I was careful.
And here I was, saying yes to commands I hadn’t read.
That’s the moment I want to start with. Not the catastrophic version, where the agent deletes a production database and you’re explaining to customers why their reservations are gone. The quieter version, where nothing goes wrong, and the only damage is to your sense of yourself as the careful one. I’d been there for every step of the drift. I couldn’t point at the day I stopped reviewing carefully.
Trust runs ahead of evidence
Trust in AI tools doesn’t go from zero to total in one step. It builds slowly, in small approvals you’d struggle to recall later.
At first, you read every output carefully. The skepticism is well-founded and does real work. After a while, the output is mostly right, and you start trusting it on the easy cases. Eventually the math shifts. Reviewing every line, every command of a thing that feels mostly right starts to feel like the inefficient part of the loop, and the share of careful review shrinks. By the time you’ve handed over the keys, the skimming feels normal because the output keeps being mostly right.
People who study aviation, medicine, and process control have been describing this pattern since the 1990s. Parasuraman and Manzey’s 2010 review of automation complacency walks through the mechanics. Complacency arises when systems are perceived as highly and constantly reliable. Even expert users can’t overcome it through practice. The most reliable systems produce the deepest disengagement. The word “complacency” in human factors research is older than I am.
What’s new is the speed. Pilots get years of training that includes specific work on staying engaged when the autopilot is on, and complacency keeps showing up as a contributing factor in fatal accidents. There’s no equivalent training keeping engineers engaged in code review while the agent writes. The arc that took aviation a generation to navigate, software is running through in eighteen months.
The trap is that you don’t notice the drift while it’s happening. You notice it later, when something goes wrong, and even then you mostly notice the thing that went wrong, not the slow erosion that put you there.
When the threshold of “critical” creeps
The yes-without-reading moment was one slice of the drift. The structural version is worse, and harder to notice from inside it.
I don’t review the code on non-critical projects anymore. Not because I decided I don’t need to. Because I stopped. There’s a thought that runs, almost wordlessly: this isn’t critical, I just want to go fast, the code is probably fine. Maybe it’s not perfect, but it works. At the start of all this, a year and change ago, I was reading every line. I was making changes by hand, honestly writing most of the code by hand. Now, for a lot of things, I just let it through.
What’s strange is that the threshold of “non-critical” has been creeping. Things I would have called critical eighteen months ago now feel like normal work. The category that used to require careful review has narrowed. I notice this only when I think about it directly; the rest of the time, the new threshold just feels like how I work now.
It also keeps showing up in public. Last week, I wrote about the PocketOS deletion. A Cursor agent running Claude Opus 4.6 deleted a production database and all volume-level backups in nine seconds, via a Railway API call. The agent had been doing routine work in staging, hit a credential mismatch, and decided to resolve the problem by deleting a Railway volume. To do it, the agent went looking for credentials, found a token in a file completely unrelated to the task, and used it. The token had blanket permissions across the account. There was no confirmation step between the API call and the wiped data. PocketOS serves car rental businesses; the customers who lost reservations were real people arriving at counters expecting cars.
The piece I wrote about it focused on the systemic failures in the marketing-to-product gap. I want to surface a different part of the same incident here. The agent wasn’t rogue. It was doing exactly what it had been trained to do, which was solve the problem efficiently with the tools it had. What this piece is about is what made all of that possible: the slow erosion of the careful review that should have been catching the setup before the agent ever ran. The token shouldn’t have had blanket permissions, and it shouldn’t have lived in a findable, unrelated file. No agent path to a destructive action should run unreviewed. At some point people started treating prompts like deterministic controls in code, and prompts don’t behave that way under pressure. None of those failures arose at the moment of the deletion. They were all in place long before it.
PocketOS is the most recent instance, not the only one. Replit’s vibe-coding incident in July 2025 was the same shape: production database wiped despite repeated all-caps instructions during a code freeze, fabricated user records, agent initially insisting recovery was impossible. The Gemini CLI file-deletion incident around the same time was the same shape, the only difference being which destructive command got misinterpreted. The cycle keeps repeating because the same disengagement keeps happening, and the only thing that changes is which model and which infrastructure provider get named in the post-mortem.
I stopped reading my own drafts
The autonomy I’d given the agent in writing was worse than the version in code. I just didn’t see it for three months.
I had an automation running every morning. Claude would generate four or five social drafts, ready for me to review with my coffee. The intent was reasonable. I’d pick the one I liked most, edit it, post to LinkedIn and X. The system was supposed to give me a head start on a slow part of the day.
At the start, that’s what happened. I’d read the drafts carefully, rewrite the parts that sounded off, change the framing where I disagreed, sometimes throw all four out and write something from scratch. The drafts were a starting point. I was still doing the work.
The drift was slow. After a few weeks, I was editing less. The drafts were competent enough that the line edits felt like polish rather than substance. After a month, I was mostly skimming, picking the one that felt closest, fixing a sentence or two, and publishing. By the end of the three months, I was barely changing a word. Some days I read the draft once, decided it was fine, and hit post.
What I’d actually granted the agent, by that point, was permission to publish under my name. I hadn’t decided to grant that. I’d just stopped doing the work that would have prevented it. If the draft had a wrong claim in it, that claim went out under my name. If the draft argued something I didn’t actually believe, my readers had no way to know the difference. The agent doesn’t have a reputation to protect. I do. And I’d been letting the agent put things into the world under my reputation as if it had one.
The week I noticed, I cancelled the entire automation. Not adjusted the prompts, not added a review step. Cancelled.
What I do now is different. The same automation runs each morning, but it doesn’t draft anything. It does the research and surfaces the topics: what’s broken in the discourse this week, what’s worth reacting to, which threads are picking up. I read what it surfaces, think about it, and write the post myself. It takes me ten more minutes than picking from a draft. The productivity gain from the old setup was small, maybe fifteen minutes a day. The cost of one bad post under my name, with a wrong claim or a thought I don’t actually hold, would be much more than fifteen minutes can buy back. The trade was always lopsided. I just didn’t see it until I’d already spent three months on the wrong side of it.
The last step stays with me
The biggest change isn’t a tool. It’s what I let the agent do, and what I don’t.
I used to let agents run tasks end-to-end. Configure the system, deploy the change, update the data. The agent had keys, the agent had access, the agent did the work. Now I don’t. The shape of what I delegate has narrowed: the agent prepares the work, I run the last step. For deployments, that means the agent builds the Terraform or Pulumi scripts, and I run them. Most of the time I can simulate the deployment first, see exactly what’s about to change, and apply it once I’m happy with what I see. The agent never touches the environment directly. For anything that affects data on a system that matters, the agent gets read-only access and prepares a script. The script doesn’t run unless I run it.
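If it helps to see the shape of that loop, here’s a minimal sketch in Python, assuming a Terraform project and the terraform binary on PATH. The wrapper itself is illustrative, not the exact script I run; the point is only that the agent writes the configuration and a person runs the step that changes anything.

```python
#!/usr/bin/env python3
"""Sketch of the "agent prepares, I run the last step" loop for a deploy."""
import subprocess
import sys


def run(cmd):
    print("$ " + " ".join(cmd))
    return subprocess.call(cmd)


def main():
    # 1. Simulate first: the plan shows exactly what is about to change.
    if run(["terraform", "plan", "-out=tfplan"]) != 0:
        print("Plan failed; nothing was applied.")
        return 1

    # 2. A human reads the plan output and makes the final call.
    answer = input("Apply this plan? Type 'yes' to proceed: ").strip()
    if answer != "yes":
        print("Aborted; the environment was not touched.")
        return 0

    # 3. Only the reviewed, saved plan gets applied.
    return run(["terraform", "apply", "tfplan"])


if __name__ == "__main__":
    sys.exit(main())
```

The same shape works with pulumi preview and pulumi up. The apply only ever happens after someone has read what it will do.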
This isn’t a security pattern, even though it looks like one. It’s a discipline pattern. The act of running the last step manually is the thing that forces me to actually look at what I’m about to do. If I let the agent run end-to-end, the review step disappears, because there’s no moment between the agent’s decision and the consequence. Putting the last step back in my hands puts a moment of decision back in my hands too. Some of those moments, I cancel the run. Most of them, I don’t. But the moment exists, and it didn’t before.
I work at a security company and I’m a security-minded person, which probably makes me more cautious about this than the average engineer. The principle generalizes anyway. The question isn’t whether you trust the agent. It’s whether you want to be the one making the final call on something you’ll be accountable for either way.
That’s the part that drove the change, honestly. The agent doesn’t get fired when things go wrong. I do. The agent doesn’t lose credibility with readers, with my team, with myself. I do. Nobody calls the agent to complain. Whatever the agent ships under my name, it’s me people come to about it. The accountability never leaves me, even if I let the work leave me. So the work doesn’t leave me anymore, at least not all the way.
The smaller things I’ve added matter less. There’s a pre-commit hook that fires whether the commit comes from Claude or from me, prompting me to actually look at what’s about to land. I tried an editor template before that, but it didn’t hold up when Claude was writing the PRs. None of these are solutions on their own. They’re nudges that help me notice when I’m drifting back toward the autopilot.
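The hook itself is nothing clever. Here’s a minimal sketch of the idea, assuming a POSIX setup where a hook can read from /dev/tty; the prompt wording and the Python shape are illustrative, not the exact hook I run.

```python
#!/usr/bin/env python3
# .git/hooks/pre-commit -- a sketch of the "look at what's about to land" nudge.
# Assumes a POSIX environment where the hook can read from /dev/tty.
import subprocess
import sys


def main():
    # Show a summary of what's staged, whoever (or whatever) staged it.
    subprocess.call(["git", "diff", "--cached", "--stat"])

    try:
        # Hooks don't get the terminal on stdin, so read the answer from the TTY.
        with open("/dev/tty") as tty:
            print("Commit these changes? Type 'yes' to proceed: ", end="", flush=True)
            answer = tty.readline().strip()
    except OSError:
        # No terminal available (CI, some GUI clients): warn instead of blocking.
        print("pre-commit: no TTY, skipping confirmation.", file=sys.stderr)
        return 0

    if answer != "yes":
        print("Commit aborted.", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the agent commits through plain git, the same hook path fires either way, unless something passes --no-verify. That symmetry is the whole point.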
The autonomy I didn’t know I was giving my agents is the part I’m still untangling. I’ve pulled it back where the consequences are obvious. On deployments, on data, on what gets published under my name. I disconnected some of the tools the agent had access to, and I dropped the scopes on others to read-only. The harder question is everywhere else. The places where I’ve handed something over without realizing it, and where I won’t notice until something forces me to.
I’m not trying to settle this for you. I’m flagging that the autonomy is granted by drift, not by decision, and asking what you notice in your own work when you look for it. The agent doesn’t lose anything when it gets a call wrong. You do. That asymmetry doesn’t disappear because the work feels faster.
Reply to this email. I read everything.


