Exact-output judging is simple until a problem statement says "output any valid answer."
One permutation problem exposed the issue clearly. The statement allowed multiple optimal orders when items had equal coefficients, but the sample tests' expected outputs broke those ties differently from one case to the next. A single correct solution could match one expected string and fail another.
What changed
The validator learned to consult the test input when the output shape suggested a permutation-style answer.
If the objective matched and the permutation contained exactly the right indices, items with equal values could appear in any order relative to one another.
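A minimal sketch of that kind of check, in Python, might look like the following. It is not the platform's actual validator. The function name, the 1-based indexing, and the assumption that the objective depends only on the sequence of coefficients are all illustrative choices made here.

```python
def accepts_tie_reordering(coeffs, expected, actual):
    """Hypothetical input-aware comparison for permutation answers.

    coeffs   -- per-item coefficients read from the test input
    expected -- reference permutation (1-based indices), assumed valid
    actual   -- contestant's permutation (1-based indices)

    Returns True when `actual` is a permutation of 1..n and differs from
    `expected` only in how items with equal coefficients are ordered.
    """
    n = len(coeffs)

    # The answer must use every index from 1 to n exactly once.
    if sorted(actual) != list(range(1, n + 1)):
        return False

    # Position by position, the coefficient must match the reference.
    # Assumption: the objective depends only on this coefficient sequence,
    # so equal sequences imply an equal objective value.
    return all(coeffs[a - 1] == coeffs[e - 1] for a, e in zip(actual, expected))


coeffs = [5, 3, 5]      # items 1 and 3 tie
expected = [1, 3, 2]    # reference order

print(accepts_tie_reordering(coeffs, expected, [3, 1, 2]))  # tied items swapped
print(accepts_tie_reordering(coeffs, expected, [2, 1, 3]))  # different coefficient order
print(accepts_tie_reordering(coeffs, expected, [1, 1, 2]))  # not a permutation
```

Under these assumptions the first call is accepted and the other two are rejected. A checker for a problem whose objective is not determined by the coefficient sequence alone would need to recompute the objective from the input instead.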
Why this matters
For learners, a platform that rejects a valid solution feels arbitrary. For a problem library, these failures create hidden debt: the tests look precise, but the precision is fake.
The fix made the checker closer to the actual problem contract. That is the real standard for judging: not whether the output matches our favorite string, but whether it satisfies the promise made in the statement.
Boundary
This was not a general-purpose special-judge system. It was a targeted, input-aware comparison for one class of permutation tasks. The broader lesson still holds: when a statement permits many answers, the judge has to understand the answer space.