Pricing
Three tiers, monthly billing, GBP. Pricing is calibrated to frontier-lab eval-team budgets. The Free tier is meant for real use — not a 14-day trial.
Need a signed third-party report?
Enterprise customers receive cryptographically attested PDF reports suitable for procurement due diligence, vendor assessments, and AI governance filings. Reports include per-server breakdowns, failure taxonomy, methodology version, and test-set attestation.
Free
- Public dev set (40 tasks, 10 MCP servers)
- Self-hosted eval via open-source runner
- Public leaderboard placement (results visible to all)
- 1 submission per day
- Community Discord / Slack
Pro
- Private test set (160 tasks, monthly refresh)
- Hosted eval — we run it in our sandboxed CF Containers
- Private results until you choose to publish
- Pre-release model evaluation slots
- 50 submissions per day
- Slack channel with the eval team
- CI webhook integration
- Per-run JSON export
Enterprise
- Custom MCP servers added to the benchmark
- Signed PDF reports for procurement / vendor due diligence
- Dedicated test set rotation cadence
- SLA on eval turnaround (≤ 4h on staged submissions)
- Priority support, named technical contact
- Quarterly methodology review with your eval team
- Data Processing Agreement (DPA)
- Vendor due diligence pack (completed questionnaire, security PDF)
- Confidential pre-release evaluation slots
Why this pricing
Frontier labs operate eval programmes with explicit monthly budgets, and Pro is calibrated to that. Enterprise is for buyers who need a signed third-party report: procurement teams, regulator-facing organisations, and AI Safety Institutes (AISIs). We do not meter usage on the Free tier because metering would create the wrong incentive (paying customers gaming the free quota); instead we cap it at 1 submission per day and trust honest use.
Common questions
Can I pay annually?
Yes — annual billing is available with a 10% discount. Email hello@compatool.com to arrange.
Do you offer procurement-friendly invoicing?
Yes. Enterprise customers can pay by invoice with net-30 terms. A formal order form and DPA are available.
What counts as a 'submission'?
One submission is one complete pass of your agent across all tasks in the selected set (40 for dev, 160 for private). Each task within a submission counts as one run, so a dev submission comprises 40 runs and a private submission comprises 160.
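As a rough illustration of how the daily caps translate into task runs, here is a minimal sketch. The task counts (40 and 160) and the caps (1 and 50 submissions per day) come from this page; the function and variable names are hypothetical, and pairing the Free cap with the dev set and the Pro cap with the private set is an assumption based on the tier feature lists.

```python
# Task counts per set, as stated on this page.
TASKS_PER_SET = {"dev": 40, "private": 160}


def max_task_runs_per_day(task_set: str, submissions_per_day: int) -> int:
    """Upper bound on task runs per day: each submission covers every task in the set."""
    return submissions_per_day * TASKS_PER_SET[task_set]


# Assumed pairing: Free cap (1/day) on the dev set, Pro cap (50/day) on the private set.
print(max_task_runs_per_day("dev", 1))       # 40
print(max_task_runs_per_day("private", 50))  # 8000
```

The daily limit is counted in submissions, not in individual task runs, so these figures are upper bounds rather than separate quotas.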
Is there a free trial for Pro?
Email hello@compatool.com — we can arrange a one-week trial of the private test set for qualified teams.