The Hiring Signal Is Moving Out of the Code

AI did not make technical interviews less technical. It changed what the interview has to prove.

Jun 16, 2026

The easiest mistake in a code review is stopping when the code runs.

The code compiles and the tests pass. Then you notice the part the test did not cover: the retry behavior changed, an implicit contract moved, or the abstraction will make the next incident harder to debug.

That is the gap hiring has always tried to measure. Code is the thing a company can put in front of a candidate, so code became the proxy. Solve the problem. Talk through the implementation. Pass the tests. From there, the hiring team inferred judgment, ownership, taste, and level.

AI makes that inference weaker.

A candidate can now arrive at correct-looking code with a model beside them. That does not make the work fake, and it does not make coding interviews useless. It means the useful questions start after the answer appears: what did the candidate trust, what did they reject, what did they test, and what risk would they own if this shipped?

That is the shift this piece is about. The hiring signal is moving out of code output and into the work around it: how the candidate reviews, validates, recovers, communicates, and owns the result.

Today, we cover:

Why the old proxy was strained before AI.
Why behavioral interviews were already testing technical work.
Why cheating is the loud symptom, not the root problem.
What process evidence looks like when AI is part of the work.
Why senior and staff candidates now have to prove level differently.
What a better hiring loop might measure.

1. The old proxy stopped being clean

Coding interviews were never meant to model the whole job. They were a shortcut for a problem hiring cannot avoid: a company has to make a decision before it has watched someone work inside its systems, with its constraints, for months.

That shortcut made sense when the artifact was hard enough to produce that it carried real signal on its own. A candidate who could reason through a problem, write a working solution, and explain the implementation gave the interviewer something concrete. It was incomplete, but it was useful.

Developers were already skeptical of the trade. HackerRank’s 2025 Developer Skills Report says 78% of developers believe assessments do not align with real-world tasks, and 56% find algorithm-based questions irrelevant to their jobs. That complaint predates the current AI wave. A lot of engineers have been saying some version of this for years: the interview problem is cleaner than the job.

I share that skepticism. On the power-plant telemetry platform I worked on earlier in my career, the difficult parts were not confined to writing a parser or moving bytes from an edge device to the cloud. The work was in latency, sensor behavior, failure modes, backpressure, retry logic, observability, and the question of what downstream data analysts could trust. A hiring process that only looked at whether someone could make code appear would miss a lot of the engineering that made the system reliable.

AI widens that gap. In the same HackerRank report, 97% of developers said they use at least one AI assistant, 61% said they use two or more AI tools at work, and AI-generated code accounted for 29% of developers’ code on average. Vendor survey numbers should always be read carefully, but the direction is hard to miss. AI-assisted code is no longer a weird edge case.

That changes the weight of the artifact. The problem is not that every AI-assisted answer is suspicious. The problem is that the answer explains less about the person behind it. If an assessment already felt detached from daily engineering work, and now the artifact it rewards can be produced or polished by a model, the old signal gets squeezed from both sides.

That is the first pressure on the interview loop. It cannot only ask whether the candidate reached an answer. It has to ask whether the path to the answer exposed enough understanding to trust the candidate with the job.

2. The behavioral interview was already more technical than we admitted

The word “behavioral” does a lot of damage in engineering hiring. It makes the round sound soft, secondary, almost ornamental. First you prove you can code, then you talk about teamwork and conflict and communication, as if those things happen somewhere outside the technical work.

That split has always been artificial.

In an April 2026 Pragmatic Engineer piece, Steve Huynh, formerly a Principal Engineer at Amazon, reflected on nearly 1,000 interviews, including around 600 as an Amazon Bar Raiser. His point is useful because it does not come from AI hype. It comes from years of interview loops before this current wave fully landed. The behavioral round, in his telling, often decides fit and level because it exposes the work that coding rounds do not: how someone handles projects going sideways, disagreements, incomplete information, stakeholder pressure, and influence without authority.

Those are not decorations around engineering. They are engineering at senior levels.

A mid-level engineer can often succeed inside a bounded technical problem. A senior engineer has to make progress when the boundary is blurry. A staff engineer has to make other people’s work easier, align teams that do not naturally agree, and turn a technical direction into something the organization can actually execute. At those levels, “tell me about a time a project went wrong” is not a personality test. It is a probe for operational judgment.

That matters now because the artifact is no longer isolated from tools, templates, copied context, and generated suggestions. Interviewers need to inspect the candidate’s relationship to the artifact: what they trusted, what they rejected, what they tested, what they understood about the codebase, and what risk they noticed before someone else named it.

The old behavioral round was already asking versions of those questions. The problem is that many loops treated it as a separate soft-skills filter instead of part of the technical evaluation.

3. Cheating is the loud symptom

The cheating story is real. It is also not the whole story.

CodeSignal said in February 2026 that detected cheating and fraud attempt rates in proctored assessments rose from 16% in 2024 to 35% in 2025. For entry-level assessments, CodeSignal said the rate increased from 15% to 40%. Those are vendor numbers, and CodeSignal sells assessment integrity products, so they should be treated carefully. The company is measuring detected and flagged attempts, not proving that every flagged session was successful cheating or that its data represents the whole market.

Still, the direction matches what many hiring teams feel. HackerRank reports that 76% of developers say AI makes gaming assessments easier, and 73% feel it is unfair to lose out to candidates who use AI to game tests. Once AI tools are available to everyone, any assessment that rewards final output while hiding the process becomes easier to distort.

The tempting response is to make the old proxy more tightly controlled. Ban AI. Add proctoring. Detect screen switching. Flag pasted code. Train interviewers to spot suspicious pauses, suspicious speed, suspicious fluency. Some of that is necessary. Companies need fair processes, and candidates deserve not to lose to someone performing a fake version of competence.

But fraud detection can only protect the fairness of the test in front of it. It cannot make a weak test more representative of the job.

If the job itself now includes AI-assisted coding, banning AI from every interview creates a strange mismatch. If the job does not allow AI because of security, compliance, or risk, then banning it in the interview makes sense. But many engineering teams are somewhere messier: developers use AI for debugging, code review, codebase understanding, tests, refactors, scripts, documentation, and first-pass implementation. The interview then pretends the work happens without the tools that shape the work.

That mismatch creates bad incentives. Candidates hide tool use. Companies hunt for hidden tool use. Both sides spend energy preserving the appearance of a clean artifact.

The real design question is narrower and harder: can the interview separate a candidate who used AI responsibly from one who used it to mask weak understanding?

A two-lane flow diagram comparing the old hiring proxy with the AI-era signal map. The old lane moves from candidate writes code, to interviewer evaluates the artifact, to hiring team infers skill. The AI-era lane moves from candidate and AI produce code, to interviewer evaluates process evidence, to hiring team infers judgment and level. Process evidence includes codebase understanding, validation, trade-offs, recovery, communication, and ownership. — *When AI can produce the artifact, the interview has to move closer to the process that created it.*

4. The interview has to collect process evidence

Karat’s NextGen work is useful as a market signal because it shows one direction hiring infrastructure is moving. In its December 2025 launch announcement, Karat described interviews where candidates work on complex multi-file projects with an integrated AI assistant while human interviewers probe reasoning, trade-offs, and judgment in real time. In an April 2026 follow-up, Karat said it now gives engineering leaders evidence such as skill scores, rationale write-ups, and timestamped markers tied to moments in the interview.

Again, this is vendor evidence. Karat sells interview infrastructure, so it has every reason to argue that interviews need more infrastructure. But the underlying problem is real: if the output and the candidate’s skill are decoupled, the hiring system needs an evidence trail that lives somewhere other than the final code.

That maps more closely to how AI-assisted work is settling inside real teams. Stack Overflow’s May 2026 pulse survey found that AI agent usage at work rose to 59%, up from 31% in the previous annual survey, but also that 63% of technologists still rarely or never let agents run fully on autopilot. The interesting part is the combination. People are using agents more, but the dominant working mode is still supervised, reviewed, and bounded.

A May 2026 longitudinal study by Annie Vella and Kelly Blincoe makes the same shift visible from another angle. The authors found that 82% of participants reported spending less time writing code, and they describe a broader move from creation toward verification activities. Their term for the new category of work is supervisory engineering: directing, evaluating, and correcting AI output.

If that is increasingly the work, then interviews that only measure artifact creation are behind the job.

A better AI-era technical interview does not need to become a surveillance exercise. It needs to make process observable. Give the candidate a realistic codebase slice. Let them use tools if the role would let them use tools. Ask them to inspect a change, critique a model suggestion, write or adjust tests, explain where the model’s answer is brittle, choose between two implementation paths, and name what they would monitor if this shipped.

The strongest moments in that kind of interview are often not the moments where code appears. They are the moments where the candidate slows down for the right reason.

In practice, the signal is often small. The candidate notices that a generated function handles the happy path but changes an implicit contract. They keep the refactor but rename a concept because the model erased domain meaning. Before changing a data model, they ask about production constraints. When the prompt omits a failure case, they add the test anyway.
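A minimal, hypothetical sketch of the “implicit contract” case, using a telemetry-style example (every function name here is invented for illustration, not taken from any real codebase). The original parser treats a blank reading as “no data” and returns None, which the downstream average relies on. The generated rewrite is shorter and passes the happy-path check, but it turns blank input into 0.0, which quietly changes what the data means.

```python
from typing import List, Optional


def parse_reading_original(raw: str) -> Optional[float]:
    """Original (hypothetical): blank or malformed input means 'no data' (None)."""
    try:
        return float(raw)
    except ValueError:
        return None


def parse_reading_generated(raw: str) -> Optional[float]:
    """Hypothetical model rewrite: tidier, but blank input becomes 0.0."""
    try:
        return float(raw or 0)
    except ValueError:
        return None


def average(readings: List[Optional[float]]) -> Optional[float]:
    """Downstream consumer: averages only readings that are present."""
    values = [r for r in readings if r is not None]
    return sum(values) / len(values) if values else None


def check_happy_path(parse) -> None:
    # The check the prompt came with: both versions pass it.
    assert parse("42.5") == 42.5


def check_missing_reading(parse) -> None:
    # The failure case the candidate adds anyway: blank must stay "no data".
    assert parse("") is None


if __name__ == "__main__":
    for parse in (parse_reading_original, parse_reading_generated):
        check_happy_path(parse)
    check_missing_reading(parse_reading_original)
    try:
        check_missing_reading(parse_reading_generated)
    except AssertionError:
        print("added test catches the contract change")
    raw = ["10.0", "", "20.0"]
    print(average([parse_reading_original(r) for r in raw]))   # 15.0
    print(average([parse_reading_generated(r) for r in raw]))  # 10.0
```

Neither version crashes, and neither fails the original check. The difference only shows up when someone asks what a blank reading is supposed to mean downstream, which is the question the interview is trying to surface.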

The candidate is doing engineering there. The old prompt just did not have a clean way to score it.

5. Senior and staff candidates have to prove level differently

The higher the role, the more dangerous it is to treat the finished patch as the whole signal.

That does not mean senior and staff engineers get a pass on code. A senior engineer who cannot reason through code is a liability, especially now that AI can produce plausible nonsense quickly. But the senior signal is not only implementation under time pressure. It is whether the candidate can choose the right problem shape, reduce risk, and make the result usable by other people.

Huynh’s framework in The Pragmatic Engineer piece is helpful here. The excerpt from his book describes level through four dimensions: scope, contribution, impact, and difficulty. Those dimensions are hard to infer from a standalone coding artifact. They show up more clearly in how a candidate reviews a change, explains a decision, notices ambiguity, and handles a problem that does not stay inside the prompt.

A useful task at these levels is not a blank editor. It is a small service change where the generated patch fixes the visible bug, but the API is used by another team, the migration has no rollback path, and the test suite only covers the happy path. A strong senior candidate should catch the missing test and explain the production risk. A staff candidate should usually go further: ask who depends on the contract, identify the rollout constraint, and challenge whether this is the right place to solve the problem.

That is the level difference the code alone will not show. The same artifact can hide very different kinds of judgment. One candidate sees a local implementation problem. Another sees a system boundary, an organizational dependency, and a future incident if the rollout goes wrong.

This matters more in the current market because the bar has moved up. The Pragmatic Engineer’s public piece on tech interviews in 2025 describes higher expectations in DSA and system design interviews, more demanding senior and staff bars, and more downleveling. Whether every company experiences this the same way is less important than the direction. In a market with more qualified candidates than open roles, companies can ask for more evidence before they say yes.

AI raises the same pressure from the other side. If more candidates can produce polished artifacts, the differentiator moves to the explanation around the artifact. Senior candidates need to show the shape of their judgment. Staff candidates need to show the scope of it.

This is where “storytelling” gets misunderstood. The goal is not to tell a smoother story. The goal is to make the work legible. A good senior story is evidence of what was ambiguous, what you owned, which alternatives you considered, what changed because of your decision, where the trade-off hurt, and what you learned when reality disagreed with the plan. The code artifact rarely carries all of that by itself.

A table showing how the same AI-assisted code change can expose different hiring signals at different levels. The mid-level signal is local correctness and debugging. The senior signal adds risk, validation, and production ownership. The staff signal adds cross-team dependency, rollout design, and whether the change should exist in this part of the system. — *The same patch can prove different things depending on whether the candidate treats it as a local fix, a production change, or a system-level decision.*

6. A better loop measures the work around the code

The better hiring loop is not simply “allow AI.” That is too shallow. A company can allow AI and still run a bad interview. It can require AI and accidentally select for prompt performance over engineering judgment. It can ban AI and still run a fair process if the actual job bans AI too. The tool policy matters, but it is not the heart of the design.

The heart of the design is whether the loop creates comparable evidence of how the candidate works.

Comparable matters because process-heavy interviews can become unfair very quickly. The more an interview depends on narration, confidence, and polish, the more it can reward candidates who have been coached into the right performance. It can also punish candidates who are strong engineers but less comfortable speaking in a high-pressure environment, or who are working in a second language, or who come from teams where the local storytelling norms are different.

So the answer cannot be “make everything behavioral” and call it modern.

A better loop would still be structured. It would ask consistent questions. It would use a calibrated rubric. It would document evidence rather than vibes. It would give candidates the same kind of task, the same rules around AI, and the same chance to explain what they did. It would score implementation and supervision as related but distinct signals.
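One way to picture “evidence rather than vibes” with implementation and supervision scored separately is a scorecard that refuses a score unless a timestamped observation backs it. This is a minimal sketch under stated assumptions: the two signal names, the 1-4 scale, and the `Scorecard` and `EvidenceNote` types are all hypothetical, not any vendor’s or company’s actual rubric.

```python
from dataclasses import dataclass, field

# Hypothetical: the two signals the loop scores as related but distinct.
SIGNALS = ("implementation", "supervision")
# Hypothetical calibrated scale: 1 (no evidence of skill) to 4 (strong evidence).
SCALE = range(1, 5)


@dataclass
class EvidenceNote:
    timestamp: str    # moment in the interview, e.g. "00:14:30"
    signal: str       # which signal this observation supports
    observation: str  # what the candidate did, not an impression of them


@dataclass
class Scorecard:
    candidate_id: str
    scores: dict[str, int] = field(default_factory=dict)
    notes: list[EvidenceNote] = field(default_factory=list)

    def add_note(self, note: EvidenceNote) -> None:
        if note.signal not in SIGNALS:
            raise ValueError(f"unknown signal: {note.signal}")
        self.notes.append(note)

    def score(self, signal: str, value: int) -> None:
        if signal not in SIGNALS:
            raise ValueError(f"unknown signal: {signal}")
        if value not in SCALE:
            raise ValueError(f"score must be {SCALE.start}-{SCALE.stop - 1}")
        # Refuse a score that no recorded observation supports.
        if not any(n.signal == signal for n in self.notes):
            raise ValueError(f"no evidence recorded for {signal}")
        self.scores[signal] = value

    def is_complete(self) -> bool:
        # Both signals must be scored; neither stands in for the other.
        return all(s in self.scores for s in SIGNALS)


if __name__ == "__main__":
    card = Scorecard("cand-123")
    card.add_note(EvidenceNote("00:09:10", "implementation",
                               "Fixed the bug without changing the public API."))
    card.add_note(EvidenceNote("00:21:45", "supervision",
                               "Rejected the model's broad except; added a failure-case test."))
    card.score("implementation", 3)
    card.score("supervision", 4)
    print(card.is_complete())  # True
```

The design choice worth noting is the refusal in `score`: forcing every number to point at a specific, timestamped moment makes interviewer scores easier to compare across candidates and harder to drift toward confidence or polish.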

For a senior backend role, that might mean giving the candidate a small service with a bug, an incomplete test suite, and an AI assistant. The task is not to produce the most code. The task is to understand the behavior, make a safe change, explain the risk, and show how they would validate it. For a staff role, the loop might add an architectural constraint, a cross-team dependency, or a migration choice. The candidate still writes code, but the interview watches how they reason around it.

This would also make interviews feel less strange to candidates who already work with AI every day. The candidate would not have to pretend their workflow is cleaner than it is. They would have to show that their workflow is responsible.

That is a higher bar in some ways. It is easier to memorize a pattern than to explain why the model’s pattern is wrong for this codebase. It is easier to generate a passing solution than to defend the test strategy. It is easier to look productive than to show good judgment when the tool gives you something almost right.

The phrase “almost right” is doing a lot of work here. AI-generated code often fails in the place where interviews used to stop looking. It compiles, but it misunderstands the domain. It passes the visible tests, but it weakens the invariant. It follows the local style, but it changes the operational behavior. It gives you the answer a good interviewer would now use as the start of the interview, not the end.
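A minimal, hypothetical sketch of the last case, where the code passes the visible test but changes operational behavior (the names, the error type, and the retry count are invented for illustration). The original helper retries only transient failures. A tidier rewrite broadens the `except` clause, so a request that should fail fast is now retried three times, and the only existing test still passes.

```python
class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def fetch_original(call, attempts=3):
    """Original: retry only TransientError, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise


def fetch_refactored(call, attempts=3):
    """Hypothetical rewrite: shorter, but now retries on any exception."""
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except Exception as exc:  # broadened catch: the behavior change
            last_error = exc
    raise last_error


def check_happy_path():
    # The only visible test: both versions pass it.
    assert fetch_original(lambda: "ok") == "ok"
    assert fetch_refactored(lambda: "ok") == "ok"


class CountingCall:
    """Counts invocations and always fails with a non-transient error."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        raise ValueError("bad request")


if __name__ == "__main__":
    check_happy_path()
    original, refactored = CountingCall(), CountingCall()
    for fetch, counter in ((fetch_original, original), (fetch_refactored, refactored)):
        try:
            fetch(counter)
        except ValueError:
            pass
    print(original.calls, refactored.calls)  # 1 3
```

Three retries of a bad request are harmless in a toy. Against a rate-limited or non-idempotent endpoint, the same change can multiply load or duplicate writes during an incident, which is why the interesting interview question starts after the test turns green.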

Takeaways

The coding round has to become less isolated. The AI-era interview should not drop the technical bar. It should stop treating a finished artifact as the whole technical record. A candidate who cannot reason through code is still not ready for a serious engineering role. A candidate who can produce code without explaining the decisions around it is also harder to trust than they used to be.

Behavioral signal needs a better name. Many of the signals companies call behavioral are really senior engineering signals: handling ambiguity, communicating trade-offs, influencing without authority, owning mistakes, and making decisions when the available information is incomplete. AI did not make these skills newly important. It made them harder to keep outside the technical evaluation.

Integrity tools are necessary but incomplete. Hiring teams need some way to prevent candidates from faking work. That is a real problem, especially for early-career assessments where the pressure is intense and the signal is thin. But if the assessment is built around output without process, proctoring can only defend the shape of the test. It cannot make the test more like the job.

The best interview evidence will look more like review evidence. In daily AI-assisted work, the important human actions are often direction, evaluation, correction, and ownership. Hiring loops need to make those actions visible. That might mean live review, timestamped evidence, structured rationales, realistic codebase tasks, or interviewer notes tied to specific moments. The exact format can vary, but the evidence has to move closer to the work.

Candidates need to make ownership visible. AI-polished artifacts will raise the floor on what many candidates can show. The way to stand out is not to pretend the tools are not there. It is to show the part of engineering the tools do not own: the risk you saw, the trade-off you made, the context you asked for, the change you would not ship, and the consequence you were willing to be responsible for.

I do not think this ends with one standard interview format. Some roles should ban AI because the job does. Some should permit it because the job does. Some should test both modes. The mistake is treating the passed test as the end of the evidence.

The code can still be on the screen. It can still pass. The interview cannot stop there.

The next question is the one that looks more like real work: what would the candidate trust, what would they change, what would they test again, and what would they be willing to own in production?

I do not think hiring has a clean answer yet. But any process that cannot see that layer is measuring less of the job than it thinks.

The Long Commit
