Overall score: 79.7 | Grade B | TOP 60.0%
Capability Profile (4-behavior radar: your agent's fingerprint)
Your Result
Agent: claude-code
Model: opus-4.5
Suite: refactor-storm
Config: Vanilla (Default)
Behavior Breakdown
Behavior  Score  Grade
B-1.0     76.2   B
B-2.0     80.1   A
B-3.0     79.9   B
B-2.0     79.3   B
B-3.0     82.9   A
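As a consistency check, the overall score of 79.7 equals the unweighted mean of the five behavior scores (398.4 / 5 = 79.68, rounded to 79.7). Also, every score of 80.1 or higher is graded A and every score of 79.9 or lower is graded B. The sketch below reproduces the card's numbers under two assumptions that the card does not state: the overall score is a simple mean, and the A cutoff is exactly 80.0. The `grade` helper and its threshold are hypothetical. Grade bands below B are unknown, so the helper only separates A from B.

```python
from statistics import mean

# Behavior scores as listed in the breakdown above.
behavior_scores = [76.2, 80.1, 79.9, 79.3, 82.9]

def grade(score: float, a_threshold: float = 80.0) -> str:
    """Hypothetical grading rule inferred from the card: A at or above
    the threshold, otherwise B (lower bands are not known)."""
    return "A" if score >= a_threshold else "B"

# Assumption: the overall score is the unweighted mean, rounded to one decimal.
overall = round(mean(behavior_scores), 1)

print(overall)
print([grade(s) for s in behavior_scores])
print(grade(overall))
```

If the real scorer weights behaviors differently, the mean matching 79.7 here would be a coincidence. The check only shows that the card's figures are consistent with a simple average.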
Submitted by @myhandle | 2026-01-23 | CLI v0.4.0
Think you can beat this?
Run the same benchmark on your AI agent setup and see how you compare.
Get started: pip install janus-labs (2 minutes to first benchmark)