Guardrails
Validate and constrain LLM outputs
Guardrails are post-completion checks that validate LLM outputs before they propagate to the rest of your workflow. Stevora runs these checks after the model produces its final response (after tool calling completes, if applicable) and takes action based on the results.
Configuration
Add a guardrails object to any LLM step. It takes two fields:
- postChecks -- an array of checks to run against the model's response
- onFailure -- what to do when one or more checks fail
{
"type": "llm",
"name": "generate-report",
"model": "gpt-4o",
"messages": [
{ "role": "user", "content": "Generate a risk report for {{state.portfolio}}" }
],
"responseFormat": "json",
"outputSchema": {
"riskScore": { "type": "number" },
"summary": { "type": "string" },
"recommendations": { "type": "array" }
},
"guardrails": {
"postChecks": [
{ "type": "schema_validation" },
{ "type": "content_safety" },
{ "type": "confidence_threshold" }
],
"onFailure": "retry_with_feedback"
}
}
The schema is validated by guardrailConfigSchema in llm.schema.ts:
const guardrailConfigSchema = z.object({
postChecks: z.array(guardrailCheckSchema).default([]),
onFailure: z.enum(['block', 'warn', 'retry_with_feedback']).default('warn'),
});
Check types
schema_validation
Validates that the model's response is valid JSON and contains all required fields defined in the step's outputSchema. This check runs in two stages:
- JSON parsing -- the response is extracted and parsed as JSON. If parsing fails, the check fails with "Response is not valid JSON".
- Field presence -- every top-level key in outputSchema must be present in the parsed output. Missing fields are reported by name.
{
"type": "schema_validation"
}
If no outputSchema is defined on the step, this check is skipped automatically.
This check uses Stevora's extractJson helper which handles common LLM response quirks: clean JSON, JSON wrapped in code fences, and JSON embedded in explanatory text.
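The document names extractJson but does not show it; the following is an illustrative sketch of one way a helper could handle those three cases (clean JSON, fenced JSON, JSON embedded in prose). It is an assumption about the behavior, not Stevora's actual implementation:

```typescript
// Illustrative extractJson-style helper (not Stevora's actual code).
// Tries clean JSON first, then a ```json code fence, then the first {...} span.
function extractJson(text: string): unknown {
  // 1. Clean JSON: the whole response parses as-is.
  try {
    return JSON.parse(text);
  } catch {
    /* fall through */
  }
  // 2. JSON wrapped in a code fence, with or without a "json" language tag.
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fenced) {
    try {
      return JSON.parse(fenced[1]);
    } catch {
      /* fall through */
    }
  }
  // 3. JSON embedded in explanatory text: take the outermost brace span.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) {
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      /* fall through */
    }
  }
  return undefined;
}
```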
content_safety
Checks that the model returned a non-empty response. In the current implementation this ensures the model did not return a blank or whitespace-only string, which is a common failure mode when a model refuses a request silently.
{
"type": "content_safety"
}
This check can be extended with a custom content moderation service (e.g., the OpenAI Moderation API) by configuring the config field on the check.
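The blank-response check described above is small enough to sketch directly. This is an illustrative version, assuming a result shape matching the GuardrailResult interface shown later in this page; it is not Stevora's actual handler:

```typescript
// Illustrative sketch of the content_safety check: fail on a blank or
// whitespace-only response (a common silent-refusal failure mode).
interface GuardrailResult {
  passed: boolean;
  checkType: string;
  message: string;
}

function contentSafetyCheck(responseText: string): GuardrailResult {
  const isEmpty = responseText.trim().length === 0;
  return {
    passed: !isEmpty,
    checkType: "content_safety",
    message: isEmpty ? "Model returned an empty response" : "ok",
  };
}
```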
confidence_threshold
Checks whether the model's output was truncated. If the LLM response has a finishReason of "length" (meaning the output was cut off by the token limit rather than completing naturally), this check fails.
{
"type": "confidence_threshold"
}
A truncated response usually means maxTokens was set too low for the task, or the model entered a verbose loop. This check catches those cases before the partial output reaches downstream steps.
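The truncation check can be sketched the same way. The LlmResponseLike type below is an assumption for illustration, not Stevora's actual response type:

```typescript
// Illustrative sketch of the confidence_threshold check: a finishReason of
// "length" means the output was cut off by the token limit rather than
// completing naturally.
interface LlmResponseLike {
  text: string;
  finishReason: "stop" | "length" | "tool_calls";
}

function confidenceThresholdCheck(response: LlmResponseLike): { passed: boolean; message: string } {
  if (response.finishReason === "length") {
    return { passed: false, message: "Response was truncated by the token limit" };
  }
  return { passed: true, message: "ok" };
}
```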
Failure modes
The onFailure field controls what happens when any check fails. There are three modes:
warn
Log the failure and continue. The step completes successfully despite the guardrail violation. Use this during development or for non-critical checks where you want visibility without blocking.
{
"onFailure": "warn"
}
block
Immediately fail the step. The step returns with status "failed", error code GUARDRAIL_BLOCKED, and the details of which checks failed. The workflow will not proceed past this step (unless retry policies re-run it).
{
"onFailure": "block"
}
retry_with_feedback
Re-prompt the model with the failure details and ask it to fix its response. This is the most powerful mode -- Stevora constructs a new message that includes the model's previous response and a description of what went wrong:
Your previous response failed validation:
- schema_validation: Missing required fields: recommendations
Please fix and try again.
The model then gets another chance to produce a valid response. Stevora retries up to 2 times (controlled by MAX_GUARDRAIL_RETRIES in llm-step-handler.ts). If the model still fails after all retries, the step fails with error code GUARDRAIL_RETRIES_EXHAUSTED.
{
"onFailure": "retry_with_feedback"
}
Each retry is a full LLM call that is recorded in the cost tracker with an incremented attempt number, so guardrail retries are reflected in your cost reporting.
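A sketch of how the feedback message shown above could be assembled. The buildFeedbackMessage and CheckFailure names are illustrative, not Stevora's internals, and the real message also includes the model's previous response, which is omitted here for brevity:

```typescript
// Illustrative construction of the retry-with-feedback prompt: one line per
// failed check, wrapped in the template shown in the docs above.
interface CheckFailure {
  checkType: string;
  message: string;
}

function buildFeedbackMessage(failures: CheckFailure[]): string {
  const lines = failures.map((f) => `- ${f.checkType}: ${f.message}`).join("\n");
  return `Your previous response failed validation:\n${lines}\nPlease fix and try again.`;
}
```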
How checks are executed
All post-checks run synchronously in order after the final LLM response is received. The runPostChecks function maps each check to its handler:
export function runPostChecks(
checks: GuardrailCheck[],
response: LlmResponse,
outputSchema?: Record<string, unknown>,
): GuardrailResult[] {
return checks.map((check) => runSingleCheck(check, response, outputSchema));
}
Each check returns a GuardrailResult:
interface GuardrailResult {
passed: boolean;
checkType: string;
message: string;
details?: unknown;
}
If any result has passed: false, the failure mode kicks in. Multiple checks can fail at the same time -- all failures are reported together in the feedback message or error details.
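The dispatch from check results to failure mode can be sketched as follows. The applyFailureMode function and Outcome type are illustrative names (GuardrailResult is repeated so the sketch is self-contained); this is a guess at the shape of the logic, not Stevora's actual code:

```typescript
// Illustrative dispatch: collect all failed checks, then act per onFailure.
interface GuardrailResult {
  passed: boolean;
  checkType: string;
  message: string;
}

type OnFailure = "block" | "warn" | "retry_with_feedback";

type Outcome =
  | { kind: "pass" }
  | { kind: "warn"; failures: GuardrailResult[] }
  | { kind: "block"; failures: GuardrailResult[] }
  | { kind: "retry"; failures: GuardrailResult[] };

function applyFailureMode(results: GuardrailResult[], onFailure: OnFailure): Outcome {
  // All failures are gathered together, so a retry prompt or error report
  // can list every violation at once.
  const failures = results.filter((r) => !r.passed);
  if (failures.length === 0) return { kind: "pass" };
  switch (onFailure) {
    case "warn":
      return { kind: "warn", failures };
    case "block":
      return { kind: "block", failures };
    case "retry_with_feedback":
      return { kind: "retry", failures };
  }
}
```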
Example: strict JSON output
A common pattern is to enforce structured output from the model. Combine responseFormat: "json", an outputSchema, and the schema_validation guardrail:
{
"type": "llm",
"name": "extract-entities",
"model": "claude-sonnet-4-20250514",
"systemPrompt": "Extract entities from the text. Return JSON with 'people', 'companies', and 'locations' arrays.",
"messages": [
{ "role": "user", "content": "{{state.document}}" }
],
"responseFormat": "json",
"outputSchema": {
"people": { "type": "array" },
"companies": { "type": "array" },
"locations": { "type": "array" }
},
"guardrails": {
"postChecks": [
{ "type": "schema_validation" },
{ "type": "content_safety" }
],
"onFailure": "retry_with_feedback"
}
}
If the model omits one of the required arrays, Stevora tells it exactly which fields are missing and asks it to try again. This gives you reliable structured output without writing custom parsing logic.