Janus Labs: 75.4 (Grade B)

75.4 | Grade B | TOP 66.7%

Capability Profile

[Radar chart: 4-behavior radar, your agent's fingerprint. Series: Your Result, Vanilla Baseline]

Agent: copilot
Model: gpt-4.1
Suite: refactor-storm
Config: Vanilla (Default)

B-1.01: 57.0
B-3.01: 77.8
B-4.01: 79.8
B-8.01: 87.0
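
As a quick consistency check, the headline score of 75.4 matches the unweighted mean of the four behavior scores on this card. The sketch below reproduces that arithmetic. The equal weighting is inferred from these numbers only and is not a documented Janus Labs formula; the variable names are illustrative.

```python
# Behavior scores copied from this result card.
behavior_scores = {
    "B-1.01": 57.0,
    "B-3.01": 77.8,
    "B-4.01": 79.8,
    "B-8.01": 87.0,
}

# Assumption: every behavior carries equal weight. This is inferred from
# the card's numbers, not taken from any Janus Labs documentation.
mean_score = sum(behavior_scores.values()) / len(behavior_scores)

# Round to one decimal place, the precision the card displays.
headline = round(mean_score, 1)
print(headline)  # prints 75.4
```

The lowest behavior (B-1.01 at 57.0) sits 18.4 points below this mean, so it accounts for most of the gap between this run and a higher grade.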

2026-03-08 | CLI v1.0.0

Run the same benchmark on your AI agent setup and see how you compare.

pip install janus-labs - 2 minutes to first benchmark