72.4
TOP 100%
Grade C
Capability Profile
4-behavior radar - your agent's fingerprint
Radar chart series: Your Result, Vanilla Baseline
Agent: claude
Model: claude-sonnet-4-5-20250929
Suite: refactor-storm
Config: Vanilla (Default)
Behavior Breakdown
Behavior  Score  Grade
B-1.0     90.0   A
B-2.0     88.0   A
B-3.0      0.0   F
B-4.0     79.3   B
B-5.0     90.9   A
B-2.0     82.3   B
B-3.0     76.0   B
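The headline score of 72.4 matches the unweighted mean of the seven behavior scores listed above. The sketch below reproduces that arithmetic. It assumes a plain average; the page does not state how the overall score is actually aggregated, and the variable names are illustrative only.

```python
# Behavior scores copied from the breakdown above, in the order listed.
scores = [90.0, 88.0, 0.0, 79.3, 90.9, 82.3, 76.0]

# Assumption: the overall score is the unweighted mean, rounded to one decimal.
# The benchmark's real aggregation rule is not documented on this page.
overall = round(sum(scores) / len(scores), 1)

print(overall)  # 72.4
```

The single 0.0 result pulls the average down by about 12 points. The mean of the other six scores is roughly 84.4.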
Submitted by
@alexanderaperry-arch
2026-02-28 | CLI v0.8.0
Think you can beat this?
Run the same benchmark on your AI agent setup and see how you compare.
Get Started: pip install janus-labs (2 minutes to first benchmark)