Ghost Replay Eligibility Without Publishing Raw Labels
The shipped gates that decide whether a completed run can become a self ghost, join the human ghost pool, or enter the solution bank.
Figure: Ghost eligibility is a set of conservative public gates over shipped replay telemetry, not the unfinished classifier.
Ghost races only work if the opponent is meaningful. A stored run should feel like racing a real prior attempt, not a pasted final answer.
The shipped eligibility helper in lib/ghost-eligibility.ts separates three use cases: self ghosts, the human ghost pool, and the solution bank.
The public gates
A submission has to be accepted, complete, and from an allowed context before it can be considered. Practice and battle contexts are allowed. Assessment and interview contexts are excluded because they have different privacy and fairness expectations.
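A minimal sketch of these base gates, assuming a simplified `Submission` shape. The type, constant, and function names (`Submission`, `ALLOWED_CONTEXTS`, `passesBaseGates`) are hypothetical and are not taken from lib/ghost-eligibility.ts.

```typescript
// Hypothetical shapes for illustration; not the real schema.
type SubmissionContext = "practice" | "battle" | "assessment" | "interview";

interface Submission {
  accepted: boolean;
  complete: boolean;
  context: SubmissionContext;
}

// Only these contexts can ever produce a ghost; assessment and interview are excluded.
const ALLOWED_CONTEXTS: ReadonlySet<SubmissionContext> = new Set<SubmissionContext>([
  "practice",
  "battle",
]);

// Base gate shared by every ghost use case: accepted, complete, allowed context.
function passesBaseGates(s: Submission): boolean {
  return s.accepted && s.complete && ALLOWED_CONTEXTS.has(s.context);
}

console.log(passesBaseGates({ accepted: true, complete: true, context: "battle" })); // true
console.log(passesBaseGates({ accepted: true, complete: true, context: "interview" })); // false
```

Keeping the allowed contexts in an explicit allow-list means any context not listed stays ineligible until someone adds it on purpose.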
For the human ghost pool, the run has a stricter bar:
There is also a paste-like guard: if final code is longer than 350 characters but the trace has fewer than 8 events and fewer than 2 snapshots, the run is not treated as a clean replay candidate.
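The guard can be sketched as a single predicate over a replay summary. The `ReplaySummary` shape and the names below are illustrative assumptions; only the thresholds (350 characters, 8 events, 2 snapshots) come from the rule above.

```typescript
// Hypothetical replay summary carrying only what the guard needs.
interface ReplaySummary {
  finalCode: string;
  eventCount: number;
  snapshotCount: number;
}

// Thresholds as stated in the note.
const PASTE_CODE_LENGTH = 350;
const MIN_EVENTS = 8;
const MIN_SNAPSHOTS = 2;

// True when a long final answer arrived with almost no replay history behind it.
function looksPasteLike(r: ReplaySummary): boolean {
  return (
    r.finalCode.length > PASTE_CODE_LENGTH &&
    r.eventCount < MIN_EVENTS &&
    r.snapshotCount < MIN_SNAPSHOTS
  );
}

const pasted = { finalCode: "x".repeat(400), eventCount: 3, snapshotCount: 1 };
const typed = { finalCode: "x".repeat(400), eventCount: 120, snapshotCount: 6 };
console.log(looksPasteLike(pasted)); // true
console.log(looksPasteLike(typed)); // false
```

As the rule is worded, all three conditions must hold: a short solution is never flagged, and a long answer with five events but three snapshots is not flagged by this guard alone.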
The thresholds are intentionally conservative. Eight events and two snapshots are not a claim that a run is perfectly human. They are a minimum floor for replay usefulness. A race against a final paste is not a race; it is just a delayed reveal of the answer.
The helper also keeps the decision explainable. A run can be rejected because it came from an assessment, because the replay is too sparse, because assistance was flagged, or because the integrity metadata is missing. Those reasons are better than a black-box "not eligible" state.
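One way to keep the decision explainable is to collect every failing gate as a named reason instead of returning a bare boolean. This is a sketch under assumptions: the reason codes, the `RunFacts` shape, and the `integrity` field are hypothetical stand-ins for the four rejection reasons listed above.

```typescript
// Hypothetical reason codes mirroring the rejection reasons described in the note.
type IneligibleReason =
  | "excluded_context"
  | "sparse_replay"
  | "assistance_flagged"
  | "missing_integrity";

interface RunFacts {
  context: "practice" | "battle" | "assessment" | "interview";
  replayTooSparse: boolean;
  assistanceFlagged: boolean;
  integrity?: { checksum: string }; // illustrative integrity metadata
}

// Return every failing gate; an empty array means no gate rejected the run.
function rejectionReasons(run: RunFacts): IneligibleReason[] {
  const reasons: IneligibleReason[] = [];
  if (run.context === "assessment" || run.context === "interview") reasons.push("excluded_context");
  if (run.replayTooSparse) reasons.push("sparse_replay");
  if (run.assistanceFlagged) reasons.push("assistance_flagged");
  if (run.integrity === undefined) reasons.push("missing_integrity");
  return reasons;
}

const verdict = rejectionReasons({ context: "assessment", replayTooSparse: false, assistanceFlagged: true });
console.log(verdict.join(", ")); // excluded_context, assistance_flagged, missing_integrity
```

Reporting all failing gates, rather than stopping at the first, lets a rejected run show every reason at once.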
Why self ghosts are looser
Self ghosts are useful even when the trace is not clean enough for another person to race. If you solved a problem, you can use your own prior result as a time-trial target. That does not imply the run should enter a ranked human pool.
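The two tiers can be sketched as one function that applies the base gates to both and adds the replay-quality gates only for the human pool. The field names and the exact composition of the human-pool bar are assumptions for illustration; the solution bank is not modeled because this note does not state its gates.

```typescript
// Hypothetical, simplified run record.
interface Run {
  accepted: boolean;
  complete: boolean;
  context: "practice" | "battle" | "assessment" | "interview";
  cleanReplay: boolean; // e.g. neither too sparse nor paste-like
  assistanceFlagged: boolean;
  hasIntegrityMetadata: boolean;
}

interface GhostEligibility {
  selfGhost: boolean;
  humanPool: boolean;
}

function eligibility(run: Run): GhostEligibility {
  const base =
    run.accepted && run.complete && (run.context === "practice" || run.context === "battle");
  // Self ghost: the owner races their own prior run, so the base gates suffice.
  const selfGhost = base;
  // Human pool: someone else races this run, so replay quality and integrity must also hold.
  const humanPool =
    base && run.cleanReplay && !run.assistanceFlagged && run.hasIntegrityMetadata;
  return { selfGhost, humanPool };
}

const sparse = eligibility({
  accepted: true,
  complete: true,
  context: "practice",
  cleanReplay: false,
  assistanceFlagged: false,
  hasIntegrityMetadata: true,
});
console.log(sparse); // { selfGhost: true, humanPool: false }
```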
This keeps the product useful without over-claiming fairness.
The product consequence is subtle but important. A player can still get motivation from their own history, while the public competitive surface keeps a higher bar. That lets the feature be generous privately and stricter socially.
Self ghosts also give the team a lower-risk way to learn about replay UX. If a self ghost feels jumpy or misleading, the blast radius is one user reviewing their own work. Human-pool ghosts need more polish because they become someone else's opponent.
Why assessments are excluded
Assessment submissions can contain candidate work, recruiter settings, AI mode choices, and review context. Even if a run looks clean, turning it into a public ghost would mix product surfaces that should remain separate.
The eligibility helper keeps that boundary explicit.
This is partly privacy and partly measurement. An assessment run may be valid evidence for a hiring team while still being inappropriate for a public ladder. The candidate may have worked under a specific AI policy, timer, rubric, or recruiter configuration. Reusing that run as entertainment would collapse those contexts into one bucket.
The helper therefore treats context as a first-class gate, not as metadata to interpret later.
How the fallback ladder behaves
The match flow can prefer live ranked opponents, then eligible ranked ghosts, then bot practice. That order keeps the product honest. Live competition is still the strongest signal. A clean human ghost is a useful substitute when the lobby is empty. A bot is a practice fallback, not a ranked substitute.
The eligibility layer makes that ladder possible because each fallback has a different evidence contract. A live opponent does not need replay eligibility. A stored human ghost does. A bot should be labeled as a bot.
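The ladder order can be sketched as a discriminated union of opponent kinds with a selector that tries each tier in turn. `Opponent`, `Lobby`, and `pickOpponent` are hypothetical names, and the sketch assumes the ghost list already contains only runs that passed human-pool eligibility.

```typescript
// Hypothetical opponent kinds; each carries a different evidence contract.
type Opponent =
  | { kind: "live"; playerId: string }
  | { kind: "ghost"; runId: string }
  | { kind: "bot"; label: "Bot" };

interface Lobby {
  liveRankedPlayers: string[];
  eligibleGhostRunIds: string[]; // assumed pre-filtered by human-pool eligibility
}

// Prefer live ranked, then eligible ranked ghosts, then bot practice.
function pickOpponent(lobby: Lobby): Opponent {
  const [player] = lobby.liveRankedPlayers;
  if (player !== undefined) return { kind: "live", playerId: player };

  const [ghostRun] = lobby.eligibleGhostRunIds;
  if (ghostRun !== undefined) return { kind: "ghost", runId: ghostRun };

  // The bot is a practice fallback and is always labeled as a bot.
  return { kind: "bot", label: "Bot" };
}

console.log(pickOpponent({ liveRankedPlayers: [], eligibleGhostRunIds: ["run-42"] }).kind); // ghost
```

Encoding the kind in the returned value keeps the label honest downstream: a UI that switches on `kind` cannot present a bot as a ranked ghost by accident.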
What is not published here
This note does not publish the unfinished keystroke classifier as research. It also does not expose raw replay labels, raw code, per-user traces, or private attempts.
The shipped research surface is the heuristic contract: which gates exist, why they exist, and how they reduce risk before a replay becomes a competitive artifact.
That boundary is deliberate. A public research note should not leak raw labels, private code, or half-built classifier plumbing. It should explain the shipped behavior a user can encounter and the product reasoning behind it.
Resulting product behavior
The player sees a more reliable queue: live ranked first, then ranked ghosts when available, then bot practice as a fallback. The ghost owner does not lose live ranked Elo while offline. The active player still gets a meaningful race when the lobby is empty.
That is the product point. Ghosts should make practice available without pretending every stored run is equally eligible for every pool.