flightline check
check is the most powerful command in the Flightline toolkit. It serves as the definitive “ship-readiness gate” for your AI features, answering the seven questions that determine whether they are ready for production.
Usage
The 7 Ship-Blocking Questions
When you run a check, Flightline’s intelligence layer analyzes your traces to answer these seven questions:
- Task Completion: Does it do the right thing?
- Grounding: Is it truthful and based on the provided context?
- Hallucination: Did it make up facts or ignore constraints?
- Rule Compliance: Did it follow your specific business rules?
- Safety: Did it avoid producing harmful or biased content?
- Consistency: Is it producing stable results across similar inputs?
- Quality: Is the output tone and formatting up to your standards?
How it Works: The Intelligence Layer
The check command runs a two-tier analysis:
Tier 1: Deterministic (Local)
Fast, high-confidence checks that run on your machine.
- Format Validation: Ensures JSON is valid and matches expected schemas.
- PII Detection: Scans for accidental leakage of sensitive data.
- Pattern Matching: Checks for required fields and specific keywords.
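To make the Tier 1 idea concrete, here is a minimal sketch of what deterministic checks of this kind can look like. It is an illustration only, not Flightline's implementation: the function name `run_local_checks`, the result keys, and the two regular expressions are all assumptions, and real PII detection needs far broader coverage than an email and an SSN-style pattern.

```python
import json
import re

# Hypothetical, deliberately narrow PII patterns used only for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def run_local_checks(raw_output: str, required_fields: list[str]) -> dict[str, bool]:
    """Return one pass/fail flag per deterministic check for a single output."""
    # Format validation: the output must parse as a JSON object.
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        payload = None
    format_ok = isinstance(payload, dict)

    # PII detection: pass only if no pattern matches the raw text.
    pii_ok = not (EMAIL_RE.search(raw_output) or SSN_RE.search(raw_output))

    # Pattern matching: every required field must be present in the object.
    pattern_ok = format_ok and all(field in payload for field in required_fields)

    return {"format": format_ok, "pii": pii_ok, "pattern": pattern_ok}
```

Because none of these checks call a model, they return the same answer for the same input every time, which is what makes them suitable for running locally.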
Tier 2: Reasoning (Cloud/LLM)
Semantic analysis that understands context.
- Semantic Comparison: Checks if the meaning of the output matches the intent.
- Rubric Grading: Evaluates the output against qualitative criteria you define.
- Risk Assessment: Identifies potential failure modes and hallucinations.
Key Options
| Option | Description |
|---|---|
| --traces | Path to the directory of traces to evaluate. |
| --config, -c | Path to your evaluation spec (default: flightline.eval.yaml). |
| --offline | Run Tier 1 deterministic checks only. |
| --verbose, -v | Show the reasoning behind every pass/fail decision. |
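When the command is launched from a script, the options above can be assembled into an argument list. The sketch below assumes the command is invoked as `flightline check` and that the listed flags can be combined freely; the helper name `build_check_command` and the `./traces` path in the usage line are hypothetical.

```python
from typing import Optional


def build_check_command(
    traces: Optional[str] = None,
    config: Optional[str] = None,
    offline: bool = False,
    verbose: bool = False,
) -> list[str]:
    """Assemble an argv list using only the options documented in the table."""
    cmd = ["flightline", "check"]
    if traces is not None:
        cmd += ["--traces", traces]
    if config is not None:
        # Omitting --config leaves the documented default, flightline.eval.yaml.
        cmd += ["--config", config]
    if offline:
        cmd.append("--offline")
    if verbose:
        cmd.append("--verbose")
    return cmd


# Hypothetical usage: Tier 1 checks only, against a local traces directory.
argv = build_check_command(traces="./traces", offline=True)
```

Building a list rather than a single string avoids shell quoting problems when a path contains spaces.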
Exit Codes
flightline check is designed for use in CI/CD pipelines. It uses standardized exit codes:
- 0: PASS (Ship it)
- 1: WARN (Review recommended)
- 2: BLOCK (Critical failures detected)
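A pipeline step can turn these exit codes into a gating decision. The following is a minimal sketch under stated assumptions: the names `Verdict`, `interpret_exit_code`, and `run_gate` are illustrative, and `fail_on_warn` is a hypothetical policy switch of the wrapper script, not a Flightline option.

```python
import subprocess
from enum import IntEnum


class Verdict(IntEnum):
    """The three documented exit codes of flightline check."""
    PASS = 0
    WARN = 1
    BLOCK = 2


def interpret_exit_code(code: int, fail_on_warn: bool = False) -> tuple[Verdict, bool]:
    """Map an exit code to a verdict and decide whether the pipeline should fail."""
    verdict = Verdict(code)  # raises ValueError for any undocumented exit code
    should_fail = verdict is Verdict.BLOCK or (fail_on_warn and verdict is Verdict.WARN)
    return verdict, should_fail


def run_gate(argv: list[str], fail_on_warn: bool = False) -> tuple[Verdict, bool]:
    """Run the given command (for example a flightline check invocation) and interpret it."""
    completed = subprocess.run(argv)
    return interpret_exit_code(completed.returncode, fail_on_warn)
```

Treating WARN as non-fatal by default mirrors the "Review recommended" wording; a stricter team could set `fail_on_warn=True` so that only a clean PASS lets the build through.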
Example
Deep System Insights
Beyond the pass/fail verdict, check provides a “Profile” view of your system health, including latency anatomy, token efficiency, and model fit recommendations.
