75.4
TOP 66.7%
Grade B
Capability Profile
[Radar chart: 4-behavior radar - your agent's fingerprint. Series: Your Result, Vanilla Baseline]
Agent: copilot
Model: gpt-4.1
Suite: refactor-storm
Config: Vanilla (Default)
Behavior Breakdown
Behavior  Score  Grade
B-1.01    57.0   D
B-3.01    77.8   B
B-4.01    79.8   B
B-8.01    87.0   A
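The four behavior scores listed above average to the headline score of 75.4. The page does not say how the headline is computed, so the short check below only assumes that it is the unweighted mean of the behavior scores, rounded to one decimal place. The names `behavior_scores` and `mean_score` are illustrative and not part of the janus-labs CLI.

```python
# Per-behavior scores copied from the Behavior Breakdown on this page.
behavior_scores = {
    "B-1.01": 57.0,
    "B-3.01": 77.8,
    "B-4.01": 79.8,
    "B-8.01": 87.0,
}


def mean_score(scores):
    """Unweighted mean of the scores, rounded to one decimal place.

    Assumption: the headline score is a plain average. The source page
    does not document the actual aggregation or any weighting.
    """
    return round(sum(scores.values()) / len(scores), 1)


headline = mean_score(behavior_scores)
print(headline)
```

If the suite weights behaviors differently, the agreement with the headline here would be a coincidence of this particular run, so treat it as a sanity check and not as the scoring rule.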
Submitted by @alexanderfountain
2026-03-08 | CLI v1.0.0
Think you can beat this?
Run the same benchmark on your AI agent setup and see how you compare.
Get Started
pip install janus-labs - 2 minutes to first benchmark