Score: 85.7 (Top 33.3%, Grade A)

Capability Profile

4-behavior radar - your agent's fingerprint

[Radar chart. Legend: Your Result, Vanilla Baseline]
Agent: codex
Model: gpt-4o
Suite: refactor-storm
Config: Vanilla (Default)

Behavior Breakdown

Behavior  Score  Grade
B-1.01    92.6   A
B-3.01    87.8   A
B-4.01    71.4   C
B-8.01    91.1   A
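
The headline 85.7 is consistent with a plain average of the four behavior scores: (92.6 + 87.8 + 71.4 + 91.1) / 4 = 85.725. The page does not say how the overall score is computed, so the sketch below is an assumption that happens to match this result, not the documented formula. `unweighted_mean` is a hypothetical helper name and is not part of the janus-labs CLI.

```python
# Per-behavior scores copied from the Behavior Breakdown on this page.
behavior_scores = {
    "B-1.01": 92.6,
    "B-3.01": 87.8,
    "B-4.01": 71.4,
    "B-8.01": 91.1,
}


def unweighted_mean(scores):
    """Hypothetical aggregation: arithmetic mean of the per-behavior scores.

    Assumption: every behavior carries equal weight. The page does not
    confirm this; it is only consistent with the numbers shown.
    """
    return sum(scores.values()) / len(scores)


overall = unweighted_mean(behavior_scores)
print(f"{overall:.1f}")  # prints 85.7, matching the headline score
```

If the suite weighted behaviors unequally, the low B-4.01 score (71.4) would pull the total by a different amount, so treat the match as a sanity check on the card rather than a specification.
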
2026-03-08 | CLI v1.0.0

Think you can beat this?

Run the same benchmark on your AI agent setup and see how you compare.

Get Started

pip install janus-labs - 2 minutes to first benchmark