flightline run
Run your LLM prompts against generated test data and validate the outputs.
Usage
What It Does
The run command:
- Loads synthetic test data from your configured directory
- Runs your prompt against each test case
- Applies the Fact-Checker to validate outputs
- Reports pass/fail results
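The four steps above can be pictured as a simple loop. The sketch below is not Flightline's implementation. `Case`, `Result`, `load_cases`, `run_suite`, `report`, and the one-JSON-file-per-case layout are hypothetical names chosen for illustration, and the prompt is any Python callable standing in for a real LLM call.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

# A check receives (source record, model output) and returns failure reasons; [] means pass.
Check = Callable[[dict, str], list[str]]


@dataclass
class Case:
    case_id: str
    source: dict  # the synthetic record the prompt is filled from


@dataclass
class Result:
    case_id: str
    reasons: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.reasons


def load_cases(directory: str) -> list[Case]:
    """Step 1: load test data. Assumes one JSON object per *.json file (hypothetical layout)."""
    return [Case(p.stem, json.loads(p.read_text())) for p in sorted(Path(directory).glob("*.json"))]


def run_suite(cases: list[Case], prompt: Callable[[dict], str], checks: list[Check]) -> list[Result]:
    """Steps 2-3: run the prompt on each case, then apply every check to its output."""
    results = []
    for case in cases:
        output = prompt(case.source)
        reasons = [reason for check in checks for reason in check(case.source, output)]
        results.append(Result(case.case_id, reasons))
    return results


def report(results: list[Result]) -> str:
    """Step 4: one PASS/FAIL line per case plus a summary line."""
    lines = []
    for r in results:
        status = "PASS" if r.passed else "FAIL: " + "; ".join(r.reasons)
        lines.append(f"{r.case_id}: {status}")
    passed = sum(r.passed for r in results)
    lines.append(f"{passed}/{len(results)} passed")
    return "\n".join(lines)
```

Passing a stub prompt such as `lambda src: f"Approved {src['amount']}"` makes the loop testable without calling a model. Keeping checks as plain functions that return failure reasons lets a single case report several problems at once.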
Example
Validation Checks
The Fact-Checker applies several validation checks, including:
Numerical Consistency
Verifies that numbers in the LLM output match the source data. This catches hallucinated numbers before they reach production.
Safety Guardrails
Ensures that safety-critical responses are triggered when the input calls for them. For example, a low credit score should result in a rejection, not an approval.
CI/CD Integration
Add Flightline to your CI pipeline to catch regressions before merge. When evaluations fail, the pipeline fails, blocking bad prompts from reaching production.
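The Numerical Consistency and Safety Guardrails checks, and the pass/fail signal that the CI/CD Integration section relies on, can be sketched as plain functions. This is a hedged illustration, not the Fact-Checker's actual logic: the function names, the `credit_score` field, the 580 threshold, and the keyword patterns are all assumptions made for the example.

```python
import re

# --- Numerical Consistency (sketch) ---------------------------------------
# Matches comma-grouped numbers like 85,000 first, then plain ones like 720 or 3.5.
NUMBER = re.compile(r"-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?")


def _numbers_in(value) -> set[float]:
    """Recursively collect every number that appears in the source record."""
    if isinstance(value, bool):  # bool is a subclass of int; skip it
        return set()
    if isinstance(value, (int, float)):
        return {float(value)}
    if isinstance(value, str):
        return {float(t.replace(",", "")) for t in NUMBER.findall(value)}
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        found = set()
        for item in value:
            found |= _numbers_in(item)
        return found
    return set()


def check_numbers(source: dict, output: str) -> list[str]:
    """Flag every number in the output that does not occur in the source data."""
    allowed = _numbers_in(source)
    return [
        f"number {tok} not found in source data"
        for tok in NUMBER.findall(output)
        if float(tok.replace(",", "")) not in allowed
    ]


# --- Safety Guardrails (sketch) -------------------------------------------
MIN_CREDIT_SCORE = 580  # hypothetical policy threshold, not a Flightline setting
APPROVE = re.compile(r"\bapprov(?:e|es|ed|al)\b", re.IGNORECASE)
REJECT = re.compile(r"\b(?:reject(?:s|ed|ion)?|den(?:y|ied|ial)|declin(?:e|ed))\b", re.IGNORECASE)


def check_low_score_rejected(source: dict, output: str) -> list[str]:
    """A low credit score must produce a rejection and never an approval."""
    score = source.get("credit_score")
    if score is None or score >= MIN_CREDIT_SCORE:
        return []  # guardrail does not apply to this case
    failures = []
    if APPROVE.search(output):
        failures.append(f"credit score {score} is below {MIN_CREDIT_SCORE} but output approves")
    if not REJECT.search(output):
        failures.append(f"credit score {score} is below {MIN_CREDIT_SCORE} but output does not reject")
    return failures


# --- CI gate (sketch) -----------------------------------------------------
def ci_exit_status(failures_per_case: dict[str, list[str]]) -> int:
    """Return 0 when every case passed, 1 otherwise.

    An empty run also returns 1, so a misconfigured data directory cannot
    silently pass the pipeline.
    """
    if not failures_per_case:
        return 1
    return 0 if all(not reasons for reasons in failures_per_case.values()) else 1
```

The keyword matching is deliberately crude: an output such as "not approved" still trips the approval rule, which errs toward failing the case. Likewise, the numeric check flags legitimate numbers the model derives (totals, dates) unless they are added to the allowed set. In a CI script, ending with `sys.exit(ci_exit_status(...))` turns any failure into a non-zero exit status, which is what most CI systems use to mark a step as failed.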
