How Scoring Works

We measure how you work with AI, not whether you memorized algorithms. Here's exactly what we evaluate and what "good" looks like.

Level Playing Field

Every candidate gets the same AI tools (Claude, GPT, and DeepSeek) for the duration of the assessment. The assessment fee covers all AI usage, so candidates who can't afford premium AI subscriptions aren't penalized. You compete on skill, not wallet.

Five Dimensions We Evaluate

Each dimension captures a different facet of how you work. Together, they create a holistic picture of your AI fluency.

📋

Planning

We look at how you approach a problem before writing code. Do you read the requirements carefully? Do you make notes or create a plan? Or do you immediately start prompting AI without thinking?

What "good" looks like

A strong candidate spends 2-5 minutes reading the problem, jotting down approach notes, and identifying edge cases before touching the IDE. They create or edit AI-generated plans rather than blindly proceeding.

🤖

AI Collaboration

We evaluate how effectively you work with AI tools. This includes the specificity and quality of your prompts, whether you iterate and refine AI suggestions, and how critically you evaluate AI-generated code.

What "good" looks like

A strong candidate writes specific, context-rich prompts. They don't blindly accept AI output. They review it, test it, and iterate when the first suggestion isn't right. They might reject an AI suggestion and try a different approach.

✅

Verification

We measure how actively you test and verify your code. Do you run tests after making changes? Do you debug issues methodically? Or do you submit untested code?

What "good" looks like

A strong candidate runs their code frequently: after AI applies changes, after manual edits, and before submitting. They use test cases to verify correctness and debug failures systematically rather than asking AI to fix everything blindly.

⚡

Execution Efficiency

We track whether you use your time productively. Are you actively coding, prompting, and testing? Or spending long periods idle? This isn't about speed. It's about sustained engagement.

What "good" looks like

A strong candidate maintains a steady rhythm of thinking, prompting, coding, and testing. They don't rush, but they also don't freeze up. Brief pauses to think are fine. Long unexplained idle periods are a concern.

✨

Product Thinking

We evaluate the quality and thoughtfulness of your final deliverable. This includes code structure, design decisions, and whether the solution actually works well, not just whether it compiles.

What "good" looks like

A strong candidate produces clean, well-structured code. For UI tasks, the design is polished and functional. They think about edge cases and user experience, not just the happy path.

What We Don't Measure

✗ Algorithm memorization
✗ LeetCode pattern matching
✗ Typing speed or raw keystroke count
✗ Whether you use AI at all (you should!)
✗ How fast you finish (engagement matters more)
✗ Trick questions or gotcha puzzles

Frequently Asked Questions

Can I game the score?

We track behavioral patterns, not surface-level metrics. Artificially inflating keystrokes or running empty test commands won't improve your score. The algorithm looks at meaningful work patterns: genuine iteration, thoughtful prompts, and real verification. The best strategy is to actually work well with AI, which is what we're measuring.

What AI tools can I use?

During AlgoArena assessments, every candidate gets the same built-in AI assistant powered by state-of-the-art models (Claude, GPT, DeepSeek). You don't need your own subscriptions. Everyone gets the same capabilities, so it's a level playing field.

Is my code tracked? What data do you collect?

Yes, we track code changes, AI prompts, test runs, tab switches, paste events, and timing. This data is used for session replay and scoring. It's shared with the company that invited you to the assessment. We don't sell your data or use it for any other purpose.

How is this different from CodeSignal's scoring?

CodeSignal's scoring criteria are largely opaque, so candidates and recruiters don't know what's being measured. AlgoArena documents its evaluation dimensions publicly (this page), and recruiters see the full weighted breakdown with raw signals. We believe transparency leads to better hiring decisions.

Do you measure algorithm knowledge?

No. Traditional OAs test whether you memorized sorting algorithms or dynamic programming patterns. AlgoArena tests how you work: planning, collaborating with AI, verifying your code, and shipping quality solutions. These are the skills that actually predict on-the-job performance in modern engineering.

Ready to see how you score?