The Sign-Off Layer Is Becoming the Real Engineering System

AI made code generation cheaper. The part that still matters is whether a human can understand, verify, and own what the agent produced.

Jun 09, 2026

Let me start with a story. A senior engineer opens a pull request and everything looks good: the tests pass, the description is clean, and the diff is a bit long but nothing crazy. There are comments explaining the migration path, a generated test file, and one of those careful little summaries that makes the whole thing feel more understood than it probably is.

But then the PR states that it was co-authored with Claude Code (or any other harness, for that matter). It is no surprise nowadays that AI had its claws all over the code, but there’s an interesting question that I think needs answering: who is truly responsible for the changes? Is there a human being who’s willing to say they understand the change well enough to own it when it breaks, defend it in review, explain it during an incident, and accept the consequences of having let it into the system?

That is the part of AI-assisted engineering that still feels under-discussed to me. We spend a lot of time talking about the generation layer. Which model wrote the code. Which agent can use the terminal. Which IDE has the better context window. Which benchmark moved by three points.

But the production system does not end when code appears.

Today, we cover:

Why AI made the approval step more important, not less important.
Why the engineering work is shifting from creation toward supervision.
Why faster code generation exposes the rest of the delivery system.
What sign-off actually means once agents can create real artifacts.
What a sane sign-off system might look like without turning every autocomplete into a compliance event.

The short version is this: AI made generation cheaper. It did not make ownership cheaper.

1. The approval step did not go away

The cleanest AI coding policy I have seen recently comes from a place that is not known for being loose about software process: the Linux kernel.

The penguin itself doesn’t ban the use of AI, as some of you might expect, but it doesn’t recognize AI as an entity in the process either. It treats AI-assisted contributions like any other contribution. All contributions still need to comply with licensing requirements. AI agents must not add Signed-off-by tags. Only humans can certify the Developer Certificate of Origin. The human submitter is responsible for reviewing the generated code, ensuring licensing compliance, adding their own sign-off, and taking responsibility for the contribution.

That last part is the system.

The kernel also added an Assisted-by tag for AI involvement, including the agent name, model version, and relevant tools. The point is not to shame anyone for using AI. The point is to keep the work attributable enough that reviewers and maintainers can reason about what happened.
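To make the rule concrete, here is a minimal sketch of what a check on commit trailers could look like. It is not the kernel's tooling. The `AGENT_IDENTITIES` set and the function names are hypothetical, and the sketch only looks at whether a human Signed-off-by is present; it does not validate the format of the Assisted-by value.

```python
import re

# Hypothetical identities that a team has registered for its AI agents.
# This set is an assumption for illustration, not part of the kernel policy.
AGENT_IDENTITIES = {"example agent <agent@example.com>"}

# A trailer line looks like "Key: value", e.g. "Signed-off-by: Name <email>".
TRAILER = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.+)$")


def parse_trailers(message: str) -> list:
    """Return (key, value) pairs found in the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER.match(line.strip())
        if match:
            trailers.append((match.group(1).lower(), match.group(2).strip()))
    return trailers


def check_sign_off(message: str) -> list:
    """Return a list of problems; an empty list means a human signed off."""
    sign_offs = [v for k, v in parse_trailers(message) if k == "signed-off-by"]
    problems = []
    if not sign_offs:
        problems.append("no Signed-off-by trailer from a human submitter")
    for value in sign_offs:
        if value.lower() in AGENT_IDENTITIES:
            problems.append(f"Signed-off-by belongs to an AI agent: {value}")
    return problems
```

A check like this cannot tell whether the human actually understood the change. It only keeps the record honest about who is claiming to.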

The companion tool-generated content guidelines are even more explicit about the underlying problem. Tooling can increase contribution volume, but reviewer and maintainer bandwidth is scarce. If a meaningful amount of content was created by a tool, contributors should be transparent about the tool, the affected parts, the input when it matters, and how the submission was tested.

My read is that the kernel landed on the right framing because it did not start from AI exceptionalism. It started from the existing engineering practices.

The contribution has an origin. The origin needs to be legible. A reviewer needs to know what they are reviewing. A maintainer needs to know who understands the change. A human in the sign-off chain needs to be able to answer questions later.

And this is important: it’s a process with three layers.

The generation layer is where the agent creates something: code, tests, dashboards, documentation, migration plans, internal tools, release notes, incident summaries, or the first version of a design.

The verification layer is where someone checks whether that thing is correct, secure, licensed, observable, compatible with the existing system, and appropriate for the operational risk it carries.

The sign-off layer is where a human accepts ownership in a way the organization can audit later.
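One way to picture the three layers is as a record that cannot reach the last step without passing through the middle one. The sketch below is my own illustration, with a hypothetical `Artifact` class; it assumes that "verification" means at least one recorded piece of evidence and that sign-off requires a named human.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artifact:
    """A generated change moving through generation, verification, and sign-off."""
    description: str
    generated_by: str                                   # generation layer: agent or tool
    evidence: List[str] = field(default_factory=list)   # verification layer
    signed_off_by: Optional[str] = None                 # sign-off layer: a named human

    def add_evidence(self, note: str) -> None:
        self.evidence.append(note)

    def sign_off(self, human: str) -> None:
        # Sign-off without verification evidence would be approval on faith.
        if not self.evidence:
            raise ValueError("cannot sign off: no verification evidence recorded")
        if not human.strip():
            raise ValueError("cannot sign off: a named human is required")
        self.signed_off_by = human

    @property
    def owned(self) -> bool:
        return self.signed_off_by is not None
```

The ordering is the point of the sketch: generation fills in the first fields for free, but `owned` only becomes true after someone has checked something and put a name on it.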

Most of the AI tooling conversation is still obsessed with the first layer. That makes sense. Generation is where the demo happens. It is where the speed is visible. It is where a model can do in three minutes what used to take an afternoon.

But the expensive part of engineering was never only producing text that compiles.

The expensive part was knowing what that text means inside a system that already exists.

2. The work is moving from creation to supervision

This is not just a philosophical concern. The work is already moving.

In a longitudinal study submitted to arXiv on May 22, 2026, Annie Vella and Kelly Blincoe followed professional software engineers across two questionnaires six months apart. They had 158 eligible participants at the first point, 101 at the second, and 95 matched participants across both rounds. The headline finding is not subtle: 82% of participants reported spending less time writing code.

But the more interesting finding is what replaced the writing.

The authors describe a shift from creation toward verification activities, and they propose a category they call supervisory engineering work: directing, evaluating, and correcting AI output.

That phrase is a little academic, but the underlying job feels familiar. You ask the agent to make a change. You read the result. You notice it solved the wrong edge case. You redirect it. You ask for tests. You inspect the tests because the tests might only prove the bug it introduced. You check whether the code fits the conventions of the repo. You decide whether the diff is worth keeping, splitting, rewriting, or throwing away.

The same paper reports what it calls a productivity-experience paradox. At both time points, 84% of participants reported productivity improvement. At the same time, among matched participants, the share reporting worsened developer experience in at least one dimension nearly doubled from 14% to 27%. Flow state and cognitive load got worse while feedback loops improved.

That tracks with my own experience much more than the clean productivity story does.

I do feel faster with AI tools. Some days, dramatically faster. I can generate a first pass at an internal script, a content workflow, a test suite, or a migration plan before I would have finished arranging my own thoughts in a blank file.

But I have also noticed the weird part: the tool often moves the work into a mode where I am approving, redirecting, checking, and reconciling. The artifact shows up quickly. The judgment still takes time. Sometimes the judgment takes more attention because the artifact looks more finished than my own unfinished work would have looked at the same point.

This is the approval behavior leak I have written about before. The dangerous moment is not always the spectacular agent failure where a tool tries to delete the repo or run a command it should not run. The dangerous moment is much more ordinary: I am tired, the diff looks plausible, the explanation is polished, and I find myself clicking yes before I have really understood what I am approving.

That moment is going to matter more as the generation layer gets better.

Bad generated code is annoying, but at least it announces itself. Good-looking generated code is harder. It asks for trust before it has earned it.

Now, this is not to say there aren’t players out there completely ignoring this layer. Some vibe coders and developers rely on AI so heavily that they no longer care about the code. That’s not engineering. It’s true that they are building things, but I surely hope those things are not business critical. Want to vibe code a small internal tool, a dashboard, something that makes your life easier? Go ahead! But don’t use the same practice with your critical production systems.

3. Faster generation exposes the delivery system

The optimistic version of AI coding says that if engineers can produce more code, teams can ship more value. Sometimes that is true. It is also incomplete.

If code generation gets faster and the rest of the delivery system stays the same, the bottleneck does not disappear. It moves.

Harness’s 2026 DevOps modernization report is useful here because it looks downstream of the IDE. The report says very frequent AI coding users are more likely to deploy daily or faster, but they also report more deployment pain. According to Harness, 69% of very frequent AI coding users say AI-generated code leads to deployment problems at least half the time. Very frequent users also report longer mean time to recovery for deployment-related production incidents: 7.6 hours, compared with 6 hours for frequent users and 6.3 hours for occasional users.

That does not prove AI caused those incidents. Harness says as much. Teams using AI heavily may simply be pushing more change through systems that were already strained.

But that is exactly the point.

AI does not just generate code. It changes the volume and shape of work flowing through review, CI, deployment, security checks, incident response, documentation, and support. If those systems were held together by a few senior engineers doing late-stage heroics, AI does not remove the heroics. It may increase the number of moments where those heroics are required.

Harness’s engineering excellence material points at the same invisible work from another angle. In a survey of 700 engineering practitioners and managers, Harness argues that developers are becoming validators of machine-generated output, with conventional productivity frameworks failing to capture validation time, agent accuracy, cognitive load, and trust calibration. It reports that 81% of engineering leaders say code review time has increased since deploying AI, and that 31% of a developer’s day is now consumed by AI-related invisible work that appears in no metric.

You can quibble with any one survey number. You should. Vendor surveys come with incentives, definitions, and sampling choices that deserve skepticism.

But the direction matches what I see in practice.

Teams get excited about the gross output. More code. More tasks started. More PRs opened. More internal tools appearing from nowhere. The net output is harder to see because the cost is distributed across review queues, context switching, subtle bug fixing, security review, unplanned coordination, and the cognitive load of deciding what deserves trust.

That is why the sign-off layer matters. It is the place where gross output becomes owned output.

Without that layer, AI adoption creates a pile of things that look done and behave like debt.

4. Sign-off is not code review with a new label

It is tempting to make this a code review essay. That would be too small.

Code review is one part of sign-off, but sign-off is broader than the PR approval button. It includes knowing what tools touched the work, what context they had, what the human author understands, what validation evidence exists, who owns the risk, and where the decision can be reconstructed later.

This matters because agents are no longer just autocomplete in the editor.

Vercel’s Open Agents project is a good example of where the architecture is going. It is an open-source reference app for background coding agents on Vercel. The system includes a web UI, durable agent workflow, sandbox orchestration, and GitHub integration. It can clone repos, work on branches, use file and shell tools, resume runs, and optionally auto-commit, push, and create PRs.

That is a very different object from a tab-completion tool.

Once agents become background systems with authentication, repo access, sandbox lifecycles, workflow state, cancellation behavior, and optional PR creation, “human in the loop” is too vague to be useful. Which human? At what checkpoint? With what evidence? After which tools ran? Before which irreversible action? Under whose account? With what audit trail?
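Those questions map fairly directly onto a checkpoint record. What follows is a rough sketch under my own assumptions, not how Open Agents or any other product works: the `IRREVERSIBLE` set, the `Approval` fields, and the `Checkpoint` class are all hypothetical names chosen to mirror the questions above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical set of actions treated as irreversible; a real team would define its own.
IRREVERSIBLE = {"push", "create_pr", "deploy", "delete_branch"}


@dataclass(frozen=True)
class Approval:
    human: str                   # which human
    account: str                 # under whose account the action runs
    evidence: Tuple[str, ...]    # what the human looked at
    tools_run: Tuple[str, ...]   # which tools ran before the checkpoint


class Checkpoint:
    def __init__(self) -> None:
        self.audit_log = []  # every decision is recorded, allowed or not

    def run(self, action: str, approval: Optional[Approval] = None) -> bool:
        """Allow reversible actions; otherwise require an approval that carries evidence."""
        needs_gate = action in IRREVERSIBLE
        allowed = not needs_gate or (approval is not None and bool(approval.evidence))
        self.audit_log.append({
            "action": action,
            "allowed": allowed,
            "human": approval.human if approval else None,
            "account": approval.account if approval else None,
            "tools_run": approval.tools_run if approval else (),
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return allowed
```

The design choice worth noticing is that denied attempts are logged too. An audit trail that only records successes cannot answer what the agent tried to do.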

The research world is converging on similar concerns. A 2026 paper in Automated Software Engineering on open, accountable, and trustworthy AI-IDEs frames traceability and validation loops as architecture, not as after-the-fact process decoration. That is the right instinct. If assistant-generated work is going to become normal, the record of how that work came to exist becomes part of the engineering system.

I do not think every autocomplete needs a confession booth. That would be absurd, and it would kill the productivity gain for the least risky cases.

The distinction has to be risk-based.

If an agent completes a line, fixes spelling, renames a variable, or formats a file, I do not need a policy ceremony. That is below the threshold where explicit AI attribution helps the team.

If an agent creates a meaningful function, modifies security-sensitive code, touches production configuration, generates a migration, writes a dashboard people will use for decisions, prepares a customer-facing incident summary, or opens a PR from a background workflow, the standard should be different.

At that point, the question is not “did a model help?”

The question is whether the work is legible enough for a human to own.

The faster the generation layer gets, the more explicit the verification and sign-off layers have to become.
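The risk-based distinction above can be sketched as a small classifier. Everything specific in it is an assumption for illustration: the tier names, the `SENSITIVE_PREFIXES` paths, and the `TRIVIAL_KINDS` labels are hypothetical, and a real repository would choose its own.

```python
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    TRIVIAL = 0      # no explicit attribution needed
    MEANINGFUL = 1   # attribution plus normal review
    SENSITIVE = 2    # attribution, human rationale, named owner


# Hypothetical path prefixes and change kinds; each repo would define its own.
SENSITIVE_PREFIXES = ("auth/", "billing/", "migrations/", "deploy/")
TRIVIAL_KINDS = {"completion", "rename", "format", "spelling"}


def classify(kind: str, paths: List[str], opened_by_background_agent: bool = False) -> Tier:
    """Pick the sign-off tier for a change; sensitive paths win over a trivial kind."""
    if opened_by_background_agent or any(p.startswith(SENSITIVE_PREFIXES) for p in paths):
        return Tier.SENSITIVE
    if kind in TRIVIAL_KINDS:
        return Tier.TRIVIAL
    return Tier.MEANINGFUL
```

Note that a rename inside `auth/` still lands in the highest tier here. In this sketch, where the change lands matters more than how small it looks.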

5. The manager and staff engineer angle

This is where the topic becomes less about tools and more about senior work.

A junior engineer can use AI to produce more code. So can a senior engineer. So can an engineering manager who has not opened the repo in six months. The generation layer does not care very much about the title, but the sign-off layer does.

Senior engineers and staff engineers are going to spend more time deciding whether generated work fits the system. Not whether it compiles. Not whether the happy path works. Whether the change belongs, whether it preserves the right abstractions, whether the test evidence is meaningful, whether the operational story is complete, whether the blast radius is bounded, and whether the author actually understands what they are asking the team to merge.

Engineering managers are going to face the same shift from a different angle.

It is not enough to tell teams to use more AI. That is just pressure on the generation layer. The management work is designing the operating system around it: review expectations, ownership records, security gates, incident accountability, tool budgets, measurement practices, and the norms that let reviewers slow something down without being treated as blockers to the AI strategy.

If leadership measures AI adoption mostly through visible output, the rational team behavior is to create more visible output. More PRs. More generated docs. More internal dashboards. More agent activity. The sign-off work becomes a hidden tax paid by the people who care enough to read carefully.

That is a bad system. It punishes the engineers doing the work that makes AI safe enough to use. It also teaches everyone else that the organization values generation more than ownership.

This is why I keep coming back to attention. Sign-off consumes real attention, and senior attention is already the scarcest engineering resource in many companies. The fact that an agent can produce a 2,000-line refactor quickly does not mean a staff engineer can responsibly approve it quickly. It may mean the staff engineer now has a harder object to review because the diff is large, coherent, and slightly alien.

The uncomfortable part is that good sign-off will sometimes look slow compared with the demo. That does not mean it is waste. It may be the only part of the system that knows what the demo is allowed to become.

6. What a sane sign-off system might include

I do not think the answer is a giant AI policy document that nobody reads.

The answer is probably a set of boring defaults that make generated work easier to trust, easier to reject, and easier to audit later.

For meaningful AI-assisted contributions, the PR should say what kind of assistance was used. Not a dramatic disclosure. Just enough context for the reviewer to understand whether they are reading hand-shaped work, agent-shaped work, or a mix. The Linux Assisted-by idea is a good starting point because it treats attribution as an engineering aid, not a moral judgment.

Generated work should be smaller by default, not larger. This is one of the places where AI incentives are backwards. Agents are good at creating big coherent patches, but reviewers are still human. If a change cannot be reviewed without trusting the generator’s own summary, it is probably too large.

Reviewers should be allowed to ask for a human-written rationale. Not a polished AI summary. A short explanation from the author: what changed, why this approach was chosen, what alternatives were rejected, what could break, and how the change was validated. If the author cannot explain it, the team should not merge it just because the agent can.

Validation evidence should travel with the work. Tests, security checks, migration dry-runs, performance notes, screenshots, logs, or rollout plans should be linked where they matter. This is not about making PR descriptions longer. It is about making approval less dependent on faith.

Agent-created artifacts need owners. A dashboard that changes a product decision needs an owner. A generated support workflow needs an owner. An internal tool that writes to production-like data needs an owner. A migration plan generated by an agent needs an owner. Ownership cannot stop at “the agent made it.”

Destructive or cross-boundary actions need explicit gates. Anything that touches production data, customer accounts, deployment configuration, billing, auth, secrets, or broad repo state should have boring checkpoints that are hard to bypass accidentally.

The organization should measure the work AI creates around the code, not only the code itself. Review time, rework, incident recovery, subtle bug fixing, context switching, and validation burden are part of the cost. If those stay invisible, leaders will keep making decisions from the wrong side of the ledger.
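Several of these defaults can be expressed as a boring pre-merge check for meaningful AI-assisted changes. This is a sketch, not a recommendation of specific numbers: `MAX_REVIEWABLE_LINES`, the `PullRequest` fields, and the wording of each gap are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

MAX_REVIEWABLE_LINES = 400  # hypothetical default, not a figure from any study


@dataclass
class PullRequest:
    changed_lines: int
    assistance: str = ""            # which tool helped, and on which parts
    human_rationale: str = ""       # written by the author, not the agent
    evidence_links: List[str] = field(default_factory=list)
    owner: str = ""
    touches_gated_area: bool = False
    gate_approved_by: str = ""


def missing_for_sign_off(pr: PullRequest) -> List[str]:
    """List what is still missing before a human can honestly approve."""
    gaps = []
    if not pr.assistance:
        gaps.append("state what AI assistance was used")
    if pr.changed_lines > MAX_REVIEWABLE_LINES:
        gaps.append("split the change into smaller reviewable pieces")
    if not pr.human_rationale:
        gaps.append("add a human-written rationale")
    if not pr.evidence_links:
        gaps.append("link validation evidence")
    if not pr.owner:
        gaps.append("name an owner")
    if pr.touches_gated_area and not pr.gate_approved_by:
        gaps.append("get explicit approval for the gated area")
    return gaps
```

The output is a list of gaps rather than a pass or fail flag, so a reviewer can push back with something specific instead of a vague feeling that the PR is not ready.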

None of this needs to apply equally to every case.

The threshold should rise with risk. A local helper script is not the same as a payment flow. A generated README edit is not the same as an auth middleware change. A one-line refactor is not the same as a background agent opening a PR after a long autonomous run.

The point is not to make AI usage feel dangerous.

The point is to make approval honest.

Takeaways

The approval problem did not shrink with the implementation problem. If anything, the implementation problem becoming cheaper makes the approval problem easier to ignore. That is a bad trade. The system still needs a human who can understand, defend, and own the change.

The valuable AI skill is not only prompting. Prompting matters, but the higher-leverage skill is making generated work legible enough that another human can approve it without pretending. That means smaller changes, better rationale, visible validation, and enough traceability to reconstruct what happened later.

The bottleneck is moving into senior judgment. Staff engineers, tech leads, and engineering managers are going to feel this first because they already sit near the ownership boundary. They will be asked to approve more work that arrives looking finished. The hard part will be noticing which finished-looking work still has not earned sign-off.

AI policy should start from the existing engineering contract. The Linux kernel guidance is strong because it does not treat AI as magic. The normal process still applies. Humans sign. Tool assistance is attributed where meaningful. Responsibility does not move to the model.

The sign-off layer is where AI coding becomes real engineering. Demos end at generation. Production starts at ownership. Between those two is the part most teams have not designed carefully enough yet.

The PR did not become safer because an agent generated it quickly. It became safer only when a human could explain the change, bound the risk, show the validation, and put their name under it.

That is slower than the demo.

It is also the part that lets the demo survive contact with the real world.

The Long Commit
