Research
Jun 21, 2026 | AlgoArena | 9 min read

Ghost Replay Eligibility Without Publishing Raw Labels

The shipped gates that decide whether a completed run can become a self ghost, join the human ghost pool, or enter the solution bank.

Figure: Ghost Replay Eligibility Without Publishing Raw Labels

Ghost eligibility is a set of conservative public gates over shipped replay telemetry, not the unfinished classifier.

  • Accepted and complete: required for all pools
  • Practice or battle context: assessments excluded
  • Clean replay telemetry: human-pool requirement
  • No assistance flags: human-pool requirement

Eligibility outputs

  • Self ghost: gate 1
  • Human pool: gate 2
  • Solution bank: gate 3

Replay floor: 8+ events (minimum event count for human-pool eligibility) | Snapshots: 2+ (minimum timed code snapshots) | Pools: 3 (self ghost, human ghost, solution bank)

Ghost races only work if the opponent is meaningful. A stored run should feel like racing a real prior attempt, not a pasted final answer.


The shipped eligibility helper in lib/ghost-eligibility.ts separates three use cases:


  • self ghosts, where a player can race their own accepted work
  • human-pool ghosts, where another player can race a prior run
  • solution-bank eligibility, which additionally requires explicit user consent

    The public gates


    A submission has to be accepted, complete, and from an allowed context before it can be considered. Practice and battle contexts are allowed. Assessment and interview contexts are excluded because they have different privacy and fairness expectations.
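    The base gate described above can be sketched roughly as follows. This is a minimal illustration, not the shipped code: the type names, field names, and the function passesBaseGate are hypothetical, and the real shapes in lib/ghost-eligibility.ts may differ.

```typescript
// Hypothetical shapes for illustration only; not taken from lib/ghost-eligibility.ts.
type SubmissionContext = "practice" | "battle" | "assessment" | "interview";

interface BaseSubmission {
  accepted: boolean;
  complete: boolean;
  context: SubmissionContext;
}

// Contexts the note lists as allowed. Assessment and interview are excluded.
const ALLOWED_CONTEXTS: ReadonlySet<SubmissionContext> =
  new Set<SubmissionContext>(["practice", "battle"]);

// True only when the run is accepted, complete, and from an allowed context.
function passesBaseGate(s: BaseSubmission): boolean {
  return s.accepted && s.complete && ALLOWED_CONTEXTS.has(s.context);
}
```

    Under these assumptions, an accepted and complete practice run passes, while the same run from an assessment does not.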


    For the human ghost pool, the run must clear a stricter bar:


  • replay telemetry exists
  • at least 8 replay events
  • at least 2 snapshots
  • final code is present
  • the attempt is timed, or has timed replay events
  • Rena, AI assistance, hints, and imported context are not flagged
  • keystroke integrity is clean
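    One way to read the checklist above is as a single conjunction over replay metadata. The sketch below assumes a hypothetical ReplayRun shape and a hypothetical meetsHumanPoolBar function. Only the thresholds (8 events, 2 snapshots) come from this note.

```typescript
// Hypothetical run shape for illustration; field names are assumptions.
interface ReplayRun {
  replayEventCount: number | null; // null means no replay telemetry exists
  snapshotCount: number;
  finalCode: string;
  timed: boolean;
  hasTimedReplayEvents: boolean;
  flags: { rena: boolean; aiAssistance: boolean; hints: boolean; importedContext: boolean };
  keystrokeIntegrity: "clean" | "flagged" | "missing";
}

const MIN_REPLAY_EVENTS = 8; // floor stated in the note
const MIN_SNAPSHOTS = 2;     // floor stated in the note

// Every listed condition must hold for the run to meet the human-pool bar.
function meetsHumanPoolBar(run: ReplayRun): boolean {
  const anyAssistance =
    run.flags.rena || run.flags.aiAssistance || run.flags.hints || run.flags.importedContext;
  return (
    run.replayEventCount !== null &&
    run.replayEventCount >= MIN_REPLAY_EVENTS &&
    run.snapshotCount >= MIN_SNAPSHOTS &&
    run.finalCode.trim().length > 0 &&
    (run.timed || run.hasTimedReplayEvents) &&
    !anyAssistance &&
    run.keystrokeIntegrity === "clean"
  );
}
```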

    There is also a paste-like guard: if final code is longer than 350 characters but the trace has fewer than 8 events and fewer than 2 snapshots, the run is not treated as a clean replay candidate.
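    The paste-like guard reads as a simple predicate. The sketch below follows the wording literally (long code, and both a sparse event trace and too few snapshots). The function name looksPasteLike and its parameters are hypothetical.

```typescript
const PASTE_CODE_LENGTH = 350; // character threshold stated in the note
const SPARSE_EVENTS = 8;       // "fewer than 8 events"
const SPARSE_SNAPSHOTS = 2;    // "fewer than 2 snapshots"

// True when long final code arrives with almost no trace behind it.
// Both sparsity conditions must hold, matching the note's "and".
function looksPasteLike(finalCode: string, eventCount: number, snapshotCount: number): boolean {
  return (
    finalCode.length > PASTE_CODE_LENGTH &&
    eventCount < SPARSE_EVENTS &&
    snapshotCount < SPARSE_SNAPSHOTS
  );
}
```

    For example, a 400-character solution with 3 events and 1 snapshot would trip the guard, while a 200-character solution with the same trace would not.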


    The thresholds are intentionally conservative. Eight events and two snapshots are not a claim that a run is perfectly human. They are a minimum floor for replay usefulness. A race against a final paste is not a race; it is just a delayed reveal of the answer.


    The helper also keeps the decision explainable. A run can be rejected because it came from an assessment, because the replay is too sparse, because assistance was flagged, or because the integrity metadata is missing. Those reasons are better than a black-box "not eligible" state.
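    An explainable decision can be modeled as a result that carries its rejection reasons. The sketch below is an assumption about shape, not the shipped API: the reason codes, the RunSummary fields, and explainHumanPoolDecision are all hypothetical names chosen to mirror the four reasons listed above.

```typescript
// Hypothetical reason codes mirroring the four rejection causes named in the note.
type RejectionReason =
  | "assessment_context"
  | "sparse_replay"
  | "assistance_flagged"
  | "integrity_missing";

// Hypothetical pre-computed summary of a run.
interface RunSummary {
  fromAssessment: boolean;
  replayEvents: number;
  snapshots: number;
  assistanceFlagged: boolean;
  integrityPresent: boolean;
}

interface Decision {
  eligible: boolean;
  reasons: RejectionReason[];
}

// Collect every failing gate instead of stopping at the first one,
// so the caller can show all reasons rather than a bare "not eligible".
function explainHumanPoolDecision(run: RunSummary): Decision {
  const reasons: RejectionReason[] = [];
  if (run.fromAssessment) reasons.push("assessment_context");
  if (run.replayEvents < 8 || run.snapshots < 2) reasons.push("sparse_replay");
  if (run.assistanceFlagged) reasons.push("assistance_flagged");
  if (!run.integrityPresent) reasons.push("integrity_missing");
  return { eligible: reasons.length === 0, reasons };
}
```

    Collecting all reasons, rather than returning on the first failure, is a design choice in this sketch. It makes a backfill report or a debug view more useful at the cost of a slightly larger result.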


    Why self ghosts are looser


    Self ghosts are useful even when the trace is not clean enough for another person to race. If you solved a problem, you can use your own prior result as a time-trial target. That does not imply the run should enter a ranked human pool.


    This keeps the product useful without over-claiming fairness.
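    The split between the two pools can be pictured as two outputs derived from the same inputs. The sketch below assumes the base gate (accepted, complete, allowed context) and the clean-replay check have already been computed. GateInputs, PoolEligibility, and poolsFor are hypothetical names.

```typescript
// Hypothetical pre-computed gate results for one run.
interface GateInputs {
  passesBaseGate: boolean; // accepted, complete, practice or battle context
  cleanReplay: boolean;    // meets the stricter human-pool replay bar
}

interface PoolEligibility {
  selfGhost: boolean;
  humanPool: boolean;
}

// Self ghosts need only the base gate; the human pool also needs a clean replay.
function poolsFor(inputs: GateInputs): PoolEligibility {
  return {
    selfGhost: inputs.passesBaseGate,
    humanPool: inputs.passesBaseGate && inputs.cleanReplay,
  };
}
```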


    The product consequence is subtle but important. A player can still get motivation from their own history, while the public competitive surface keeps a higher bar. That lets the feature be generous privately and stricter socially.


    Self ghosts also give the team a lower-risk way to learn about replay UX. If a self ghost feels jumpy or misleading, the blast radius is one user reviewing their own work. Human-pool ghosts need more polish because they become someone else's opponent.


    Why assessments are excluded


    Assessment submissions can contain candidate work, recruiter settings, AI mode choices, and review context. Even if a run looks clean, turning it into a public ghost would mix product surfaces that should remain separate.


    The eligibility helper keeps that boundary explicit.


    This is partly privacy and partly measurement. An assessment run may be valid evidence for a hiring team while still being inappropriate for a public ladder. The candidate may have worked under a specific AI policy, timer, rubric, or recruiter configuration. Reusing that run as entertainment would collapse those contexts into one bucket.


    The helper therefore treats context as a first-class gate, not as metadata to interpret later.


    How the fallback ladder behaves


    The match flow can prefer live ranked opponents, then eligible ranked ghosts, then bot practice. That order keeps the product honest. Live competition is still the strongest signal. A clean human ghost is a useful substitute when the lobby is empty. A bot is a practice fallback, not a ranked substitute.


    The eligibility layer makes that ladder possible because each fallback has a different evidence contract. A live opponent does not need replay eligibility. A stored human ghost does. A bot should be labeled as a bot.
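    The ladder can be sketched as an ordered selection with a distinct result type per rung. This is an illustration under assumptions: the Lobby shape, the Opponent union, and pickOpponent are hypothetical, and the ghost list is assumed to contain only runs that already passed human-pool eligibility.

```typescript
// Hypothetical opponent kinds; each rung carries its own label.
type Opponent =
  | { kind: "live"; playerId: string }
  | { kind: "ghost"; runId: string }
  | { kind: "bot" };

// Hypothetical lobby snapshot. eligibleGhostRunIds is assumed to be pre-filtered
// by the eligibility helper, so no replay checks happen here.
interface Lobby {
  livePlayerIds: string[];
  eligibleGhostRunIds: string[];
}

// Prefer a live opponent, then an eligible ghost, then a clearly labeled bot.
function pickOpponent(lobby: Lobby): Opponent {
  const live = lobby.livePlayerIds[0];
  if (live !== undefined) return { kind: "live", playerId: live };

  const ghost = lobby.eligibleGhostRunIds[0];
  if (ghost !== undefined) return { kind: "ghost", runId: ghost };

  return { kind: "bot" };
}
```

    Keeping the bot as its own variant, rather than a ghost with a flag, means downstream code has to handle it explicitly and cannot rank it by accident.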


    What is not published here


    This note does not publish the unfinished keystroke classifier as research. It also does not expose raw replay labels, raw code, per-user traces, or private attempts.


    The shipped research surface is the heuristic contract: which gates exist, why they exist, and how they reduce risk before a replay becomes a competitive artifact.


    That boundary is deliberate. A public research note should not leak raw labels, private code, or half-built classifier plumbing. It should explain the shipped behavior a user can encounter and the product reasoning behind it.


    Resulting product behavior


    The player sees a more reliable queue: live ranked first, then ranked ghosts when available, then bot practice as a fallback. The ghost owner does not lose live ranked Elo while offline. The active player still gets a meaningful race when the lobby is empty.


    That is the product point. Ghosts should make practice available without pretending every stored run is equally eligible for every pool.


    Source trail

    lib/ghost-eligibility.ts
    scripts/backfill-ghost-eligibility.cjs
    lib/keystroke-integrity.ts
