Skip to main content

flightline generate

The generate command mass-produces realistic test data and scenarios. It ensures you have a comprehensive set of inputs to test your AI system, covering everything from happy paths to hostile edge cases.

Usage

# Generate scenarios directly from discovery (Default)
flightline generate --from-discover flightline.discovery.json --count 20

# Generate from a learned profile (Refinement)
flightline generate --profile profile.json --count 100

Generation Modes

1. From Discovery (Cold-Start)

This is the primary way to get started. It analyzes your code and prompts to infer what the inputs should look like.
  • Goal: Quickly bootstrap a test suite for a new feature.
  • Coverage: Uses “Scenario Space” targeting to ensure scenarios cover different complexity and risk dimensions based on your implementation.

2. From Profile (Refinement)

Used when you have existing sample data and want to enhance the realism of your scenarios. It replicates the patterns and business logic found during the learn step.
  • Goal: Create large, statistically accurate datasets or refine scenarios with real-world domain nuances.
  • Privacy: Automatically generates fake values that look realistic but are entirely synthetic.

Key Options

OptionDescription
--from-discoverPath to your flightline.discovery.json.
--count, -nNumber of records or scenarios to generate.
--model, -mThe model used for synthesis (default: google/gemini-3-flash-preview).
--output, -oCustom path for the generated file.

Refinement with learn

If your generated scenarios feel too generic, you can use real data samples to ground them. Run flightline learn on a sample file first, then point generate at the resulting profile:
flightline learn my_samples.json -o profile.json
flightline generate --profile profile.json --count 50

Scenario Categories

When generating from discovery, Flightline targets specific categories:
  • Happy Path: Standard, well-formatted inputs.
  • Edge Case: Boundary conditions and unusual but valid inputs.
  • Failure Mode: Inputs designed to trigger known AI failure modes in your domain.
  • Adversarial: Hostile or nonsensical inputs to test system robustness.

Next Steps

With your test data ready, the next step is to capture how your system handles these inputs.

Capture Traces

Use fltrace to record real LLM execution data.