Readiness

The Readiness page answers the most important question: “Can I ship this AI?”

Purpose

Readiness is the decision surface. It provides:
  • A clear ship/no-ship signal
  • Status on all 10 readiness questions
  • Visibility into what’s failing and why
Important: Readiness is directional, not absolute. It provides defensible confidence, not guarantees.

What You See

Ship Confidence Score

A single number (0-100) representing overall ship-readiness. It is derived from:
  • Pass rates across all 10 readiness questions
  • Severity weighting of failures
  • Coverage completeness
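
As a rough illustration of how these three inputs could combine into one number, here is a minimal sketch. The actual formula is not documented on this page, so the weighting below, the `QuestionResult` type, and the `ship_confidence` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Hypothetical per-question inputs; field names are assumptions."""
    name: str
    pass_rate: float         # fraction of scenarios passed, 0.0 to 1.0
    failure_severity: float  # 0.0 (minor) to 1.0 (critical), for observed failures
    coverage: float          # fraction of relevant features covered, 0.0 to 1.0


def ship_confidence(results: list[QuestionResult]) -> int:
    """Combine pass rate, severity, and coverage into a 0-100 score.

    This is an assumed formula for illustration, not the product's own.
    """
    if not results:
        return 0
    total = 0.0
    for r in results:
        failure_rate = 1.0 - r.pass_rate
        # Severe failures count up to twice as much as minor ones.
        penalty = failure_rate * (1.0 + r.failure_severity)
        quality = max(0.0, 1.0 - penalty)
        # Low coverage scales the question's contribution down.
        total += quality * r.coverage
    return round(100 * total / len(results))
```

Under these assumptions, a question with a 50% pass rate, minor failures, and full coverage contributes 50 points, while the same pass rate with critical failures contributes 0.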

The 10 Readiness Questions

Each question shows:
  • Current status (Pass / Warn / Fail)
  • Score (0-100)
  • Number of scenarios tested
#    Question        What It Checks
1    Intent          Does it do the right thing?
2    Grounding       Is it truthful & grounded?
3    Hallucination   Did it hallucinate?
4    Rules           Did it follow our rules?
5    Safety          Did it avoid harm?
6    Consistency     Is it consistent?
7    Quality         Is it good enough?
8    Robustness      Is it robust to manipulation?
9    Brand Safety    Is it brand-safe?
10   Schema          Is the output structurally valid?
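
The relationship between a question's score and its Pass / Warn / Fail status can be pictured as a simple threshold mapping. The sketch below assumes cutoffs of 80 and 60; these values and the `status_for` function are hypothetical, since this page does not state the real thresholds.

```python
from enum import Enum


class Status(Enum):
    PASS = "Pass"
    WARN = "Warn"
    FAIL = "Fail"


def status_for(score: int, pass_at: int = 80, warn_at: int = 60) -> Status:
    """Map a 0-100 score to a status using assumed cutoffs."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= pass_at:
        return Status.PASS
    if score >= warn_at:
        return Status.WARN
    return Status.FAIL
```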

Failing Scenarios

When something fails, you see:
  • Which scenario failed
  • The input that triggered the failure
  • The actual output compared with the expected output
  • LLM judge reasoning (why it failed)
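
If you export or log failures for debugging, the four items above map naturally onto a small record. This is a sketch only: the `FailingScenario` type, its field names, and `summarize` are hypothetical and do not describe an actual export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailingScenario:
    """Hypothetical record mirroring what the page shows for a failure."""
    scenario: str         # which scenario failed
    input: str            # the input that triggered the failure
    actual_output: str    # what the AI produced
    expected_output: str  # what it should have produced
    judge_reasoning: str  # the LLM judge's explanation


def summarize(failure: FailingScenario) -> str:
    """Render a failure as a short plain-text report."""
    return "\n".join([
        f"Scenario: {failure.scenario}",
        f"Input:    {failure.input}",
        f"Actual:   {failure.actual_output}",
        f"Expected: {failure.expected_output}",
        f"Why:      {failure.judge_reasoning}",
    ])
```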

Feature Map

A list of AI features detected in your codebase, with:
  • Feature name and location
  • Number of scenarios covering it
  • Current pass rate
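
One way to read the Feature Map is as a list of entries from which coverage gaps fall out directly. The sketch below assumes a hypothetical `FeatureEntry` type and a `min_scenarios` threshold of 3; neither comes from the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureEntry:
    """Hypothetical Feature Map row; field names are assumptions."""
    name: str
    location: str   # e.g. a file path in your codebase
    scenarios: int  # number of scenarios covering this feature
    passed: int     # how many of those scenarios currently pass

    @property
    def pass_rate(self) -> Optional[float]:
        # A feature with no scenarios has no meaningful pass rate.
        if self.scenarios == 0:
            return None
        return self.passed / self.scenarios


def coverage_gaps(features: list[FeatureEntry], min_scenarios: int = 3) -> list[str]:
    """Return names of features covered by fewer than min_scenarios scenarios."""
    return [f.name for f in features if f.scenarios < min_scenarios]
```

Returning `None` rather than 0 for an uncovered feature keeps "untested" distinct from "tested and failing", which matters when deciding where to add scenarios.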

User Flows

Readiness is central to these flows:
  1. First-Time Setup - See initial status after discovery
  2. Debugging a Failure - Understand what went wrong
  3. Expanding Coverage - Identify gaps to fill
  4. Quarterly Safety Review - Export for leadership