Profile Your AI Agent
Discover your agent's capability shape across code quality and error resilience.
Optional instruction-resilience diagnostics are available via janus-labs diagnose.
Like 3DMark for AI coding assistants.
Profile
Run standardized tests that map to 2 active capability axes. Get a radar chart, not just a number.
Compare
Overlay your results against bundled baselines across Claude, GPT, Gemini, and Copilot.
Compete
Submit your score, see your capability shape on the leaderboard, and share your profile.
Quick Start
pip install janus-labs
janus-labs --version
Windows: If janus-labs is not in PATH, use python -m janus_labs instead.
# Offline smoke run with deterministic mock scoring
janus-labs run --suite refactor-storm --mock -o result.json
# Backend-hosted judging (no API key needed)
janus-labs run --suite refactor-storm -o result.json
# Suite alias
janus-labs refactor-storm -o result.json
run is the primary workflow. It produces one result file for the full 4-behavior suite.
janus-labs submit result.json --github your-handle
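Before submitting, you can sanity-check the result file. A minimal Python sketch for pulling out per-axis scores; the "axes" field name here is an assumption for illustration only, so inspect your own result.json for the actual schema:

```python
import json

def axis_scores(result_text: str) -> dict:
    """Pull per-axis scores out of a result file's JSON text.

    NOTE: the 'axes' field name is an assumption for illustration;
    check your own result.json for the real schema.
    """
    return dict(json.loads(result_text).get("axes", {}))

# Illustrative payload, not real Janus Labs output:
sample = '{"axes": {"code_quality": 82.5, "error_resilience": 74.0}}'
print(axis_scores(sample))
```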
janus-labs init --suite refactor-storm --output ./janus-task
cd janus-task/BHV-001-test-cheating
janus-labs status --workspace .
janus-labs score --workspace . --output result.json
Use this when you want to hand one workspace to an external coding agent and inspect the repo diff yourself.
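After the external agent has edited the workspace, the repo diff can also be pulled programmatically. A minimal Python sketch using plain git, nothing Janus-specific, and assuming the initialized workspace is a git checkout (if it is not, compare against a pristine copy instead):

```python
import subprocess

def workspace_diff(workspace: str) -> str:
    """Return the uncommitted working-tree diff of a workspace git repo."""
    return subprocess.run(
        ["git", "-C", workspace, "diff"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Point it at the directory you handed to your agent, e.g. the path created by janus-labs init.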
Why capability profiling?
A single score hides real differences. Janus Labs profiles your agent across
Code Quality and Error Resilience.
Use janus-labs diagnose when you want an additional instruction-resilience readout against vanilla baselines.
Capability Axes
Code Quality
Does your agent fix the right code? Does it preserve test integrity and reduce complexity?
Error Resilience
Can your agent handle errors gracefully and fix bugs without thrashing in loops?
Measured via the Refactor Storm suite. The suite ships 4 built-in behaviors in v2.0.0; the public composite averages Code Quality and Error Resilience.
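The public composite described above is just the arithmetic mean of the two axis scores. A minimal sketch (function name is illustrative, not a Janus Labs API):

```python
def public_composite(code_quality: float, error_resilience: float) -> float:
    """Average the two public capability axes into one composite score."""
    return (code_quality + error_resilience) / 2

print(public_composite(82.5, 74.0))  # 78.25
```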
How It Works
Install
pip install janus-labs works with any AI coding agent
Run Suite
janus-labs run --suite refactor-storm executes the 4-behavior suite and saves one result file
Submit
janus-labs submit result.json --github your-handle posts your suite result to the leaderboard
Profile
Review your 2-axis profile and compare it against bundled baselines
Full Documentation
Rendered from the same README used for the package docs. If the source is unavailable, the site falls back to cached content.
Visit PyPI for full documentation.
Ready to profile your agent?
Discover your agent's capability shape.