Measuring Project Work, Not Just Answers
Assessments expanded toward project work, replayable process, workspace signals, and benchmarks that explain how candidates actually build.
Traditional coding assessments are good at one thing: checking whether a final answer passes tests. Modern engineering work asks for more than that.
What changed
We pushed assessments toward project-style work, replayable process, terminal and workspace signals, similarity review, and benchmark-oriented reporting. The product direction is simple: if the candidate's process matters, the platform should preserve enough of it to review fairly.
This does not mean every signal deserves equal weight. It means the final score should not be the only artifact. A candidate's planning, iteration, verification, and use of assistance can all help explain what happened.
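One way to picture an artifact that carries more than a final score is the rough sketch below. It is illustrative only and does not describe the platform's actual data model: the `ProcessSignal` and `AssessmentArtifact` names, their fields, and the weights are all hypothetical, chosen to show a score stored alongside unequally weighted process evidence.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessSignal:
    """One observation about how the candidate worked (hypothetical shape)."""
    name: str      # e.g. "planning", "iteration", "verification", "assistance"
    note: str      # reviewer-readable summary of what was observed
    weight: float  # relative importance, chosen by the reviewing team (assumed)


@dataclass
class AssessmentArtifact:
    """Final score plus the process evidence that helps explain it."""
    final_score: float
    signals: list[ProcessSignal] = field(default_factory=list)

    def review_order(self) -> list[ProcessSignal]:
        # Heaviest signals first, so reviewers see the most relevant evidence first.
        return sorted(self.signals, key=lambda s: s.weight, reverse=True)


# Example values are made up for illustration.
artifact = AssessmentArtifact(
    final_score=0.9,
    signals=[
        ProcessSignal("assistance", "used an AI suggestion, then rewrote it", 0.2),
        ProcessSignal("verification", "added edge-case tests before submitting", 0.5),
        ProcessSignal("planning", "outlined the approach in a scratch file", 0.3),
    ],
)
ordered_names = [s.name for s in artifact.review_order()]
```

The score stays on the artifact unchanged. The weights only decide the order in which a reviewer reads the supporting evidence, which keeps "not every signal deserves equal weight" separate from the score itself.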
Why it matters
Two candidates can land on the same final answer for very different reasons. One reasoned through the problem, tested edge cases, and used tools carefully. The other got lucky or followed a brittle path that would collapse in a larger codebase.
[Assessments](/product/assessments) should help teams see the difference. That is especially important as AI becomes normal in software work. The point is not to punish assistance. The point is to measure whether assistance was used with judgment.
Where it points
This work connects to the [benchmark](/oa/benchmark) direction: clearer standards, better comparison, and less hand-waving around what a score means. The long-term goal is an assessment artifact that a candidate, recruiter, and engineer can all understand.