Score: 83.7 | Top 50.0% | Grade B
Capability Profile
[Radar chart: 4-behavior radar - your agent's fingerprint. Series: Your Result, Vanilla Baseline]
Agent: claude
Model: claude-opus-4-6
Suite: refactor-storm
Config: Vanilla (Default)
Behavior Breakdown
Behavior  Score  Grade
B-1.01    88.4   A
B-2.01    85.7   A
B-3.01    83.6   B
B-4.01    81.9   B
B-5.01    87.5   A
O-2.01    79.8   B
O-3.01    78.8   B
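The headline score of 83.7 is consistent with a plain unweighted average of the seven behavior scores listed in the Behavior Breakdown. The page does not say how the overall score is computed, so the aggregation below is an assumption, and `overall_score` is a hypothetical helper rather than part of any Janus Labs API. A minimal sketch of that arithmetic:

```python
from statistics import mean

# Per-behavior scores copied from the Behavior Breakdown on this page.
behavior_scores = {
    "B-1.01": 88.4,
    "B-2.01": 85.7,
    "B-3.01": 83.6,
    "B-4.01": 81.9,
    "B-5.01": 87.5,
    "O-2.01": 79.8,
    "O-3.01": 78.8,
}


def overall_score(scores):
    """Unweighted mean rounded to one decimal.

    Assumption: equal weighting per behavior. The real benchmark may
    weight behaviors differently; this only checks consistency with
    the numbers shown on the page.
    """
    return round(mean(scores.values()), 1)


print(overall_score(behavior_scores))  # prints 83.7
```

The match with the reported 83.7 suggests equal weighting, but does not confirm it; a weighted scheme could produce the same rounded value.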
Submitted by @alexfosterinvis
2026-02-28 | CLI v0.8.0
Think you can beat this?
Run the same benchmark on your AI agent setup and see how you compare.
Get Started
pip install janus-labs - 2 minutes to first benchmark