
flightline run

Run your LLM prompts against generated test data and validate the outputs.

Usage

$ flightline run

What It Does

The run command:
  1. Loads synthetic test data from your configured directory
  2. Runs your prompt against each test case
  3. Applies the Fact-Checker to validate outputs
  4. Reports pass/fail results
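The four steps above can be pictured as a simple loop. The sketch below is an illustration only, not Flightline's implementation: `load_cases`, `evaluate`, `report`, `CaseResult`, the JSON file layout, and the exact-match comparison are all hypothetical stand-ins for the real loader and Fact-Checker.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class CaseResult:
    name: str
    expected: str
    received: str

    @property
    def passed(self) -> bool:
        # Stand-in for the Fact-Checker: a plain exact-match comparison.
        return self.received == self.expected


def load_cases(directory: str) -> list[dict]:
    """Step 1: read every *.json file in the directory (assumed layout)."""
    return [json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))]


def evaluate(cases: list[dict], run_prompt: Callable[[dict], str]) -> list[CaseResult]:
    """Steps 2 and 3: run the prompt on each case and record the comparison."""
    return [CaseResult(c["name"], c["expected"], run_prompt(c["input"])) for c in cases]


def report(results: list[CaseResult]) -> str:
    """Step 4: summarize pass/fail results as text."""
    lines = [f"{'PASS' if r.passed else 'FAIL'}: {r.name}" for r in results]
    failed = sum(not r.passed for r in results)
    lines.append(f"{len(results) - failed} passed, {failed} failed")
    return "\n".join(lines)
```

Here `run_prompt` is any callable that takes a test record and returns the model's text, so a fake function can stand in for the LLM while experimenting.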

Example

$ flightline run

> Running 'Financial Summary' prompt against 20 records...
> ❌ FAILURE: Scenario #4 (Negative Income).
> Expected: "Applicant rejected."
> Received: "Applicant approved with $0 income."

Validation Checks

The Fact-Checker applies validation checks including:

Numerical Consistency

Verifies that numbers in the LLM output match the source data. This catches hallucinated numbers before they reach production.
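One way to picture a numerical consistency check is to pull every number out of the LLM output and confirm that each one appears in the source record. The sketch below assumes this approach. The names `extract_numbers` and `unsupported_numbers` are hypothetical, and the actual Fact-Checker may work differently (for example, it may tolerate rounding or derived values).

```python
import re
from typing import Iterable

# Matches integers and decimals, allowing thousands separators (e.g. 52,000 or 3.5).
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> set[float]:
    """Return every number found in the text, with commas stripped."""
    return {float(match.replace(",", "")) for match in NUMBER_RE.findall(text)}


def unsupported_numbers(output: str, source_values: Iterable[float]) -> list[float]:
    """Return numbers in the output that do not appear in the source data."""
    allowed = {float(value) for value in source_values}
    return sorted(extract_numbers(output) - allowed)
```

An empty list means every number in the output is backed by the source record. Any returned value is a candidate hallucination to report as a failure.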

Safety Guardrails

Verifies that inputs meeting a safety-critical condition produce the required response. For example, a low credit score should result in a rejection, not an approval.
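The credit-score example can be sketched as a rule that pairs a condition on the input with a required outcome in the output. Everything here is an assumption made for illustration: the `check_guardrail` name, the `credit_score` field, the threshold of 580, and the keyword matching are not taken from Flightline.

```python
def check_guardrail(record: dict, output: str, min_credit_score: int = 580) -> bool:
    """Return True if the output respects the rejection rule for this record.

    The threshold and the keyword test are hypothetical placeholders.
    """
    must_reject = record["credit_score"] < min_credit_score
    if not must_reject:
        # The guardrail only constrains records below the threshold.
        return True
    text = output.lower()
    return "rejected" in text and "approved" not in text
```

A keyword test like this is crude. It is used here only to show the shape of the check, where a low score combined with an approval counts as a failure.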

CI/CD Integration

Add Flightline to your CI pipeline to catch regressions before merge. When evaluations fail, the pipeline fails, blocking bad prompts from reaching production.
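A CI step can be as small as a wrapper that runs the command and passes its exit code through to the pipeline. The sketch below assumes that `flightline run` exits with a non-zero status when an evaluation fails. This page implies that behavior but does not state it, so check it against your installed version. The `run_gate` and `main` helpers are hypothetical.

```python
import subprocess
import sys


def run_gate(command: list[str]) -> int:
    """Run the command and return its exit code unchanged."""
    completed = subprocess.run(command)
    return completed.returncode


def main() -> None:
    # Assumption: a failing evaluation makes `flightline run` exit non-zero,
    # which in turn fails the CI job that invokes this script.
    sys.exit(run_gate(["flightline", "run"]))
```

Calling `main()` from the CI job's script step makes the job fail whenever the wrapped command does.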
