# Model Fallback

Automatic failover between LLM providers
Stevora supports automatic model fallback so that when an LLM call fails, the step retries with a different model (potentially from a different provider) instead of failing immediately. This gives your workflows resilience against provider outages, rate limits, and transient errors.
## Configuration

Add a `fallbackModels` array to any LLM step. The array lists models to try, in order, if the primary model fails:
```json
{
  "type": "llm",
  "name": "analyze-document",
  "model": "claude-sonnet-4-20250514",
  "fallbackModels": ["gpt-4o", "gpt-4o-mini"],
  "messages": [
    { "role": "user", "content": "Analyze this document: {{state.document}}" }
  ],
  "responseFormat": "json"
}
```

In this example, Stevora tries:
1. `claude-sonnet-4-20250514` (Anthropic) -- primary model
2. `gpt-4o` (OpenAI) -- first fallback
3. `gpt-4o-mini` (OpenAI) -- second fallback
The `fallbackModels` field is optional. If omitted, the step only tries the primary model and fails if that call fails.
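The resulting candidate list can be sketched in a few lines. This is illustrative, not Stevora's actual types: `LlmStepConfig` and `candidateModels` are hypothetical names, though the derivation matches the `[model, ...fallbackModels]` list the engine builds (see "How failover works" below).

```typescript
// Minimal sketch of the relevant step fields; the real step type has more.
interface LlmStepConfig {
  model: string;
  fallbackModels?: string[];
}

// The engine tries the primary model first, then each fallback, in order.
function candidateModels(step: LlmStepConfig): string[] {
  return [step.model, ...(step.fallbackModels ?? [])];
}

const withFallbacks = candidateModels({
  model: 'claude-sonnet-4-20250514',
  fallbackModels: ['gpt-4o', 'gpt-4o-mini'],
});
// withFallbacks: ['claude-sonnet-4-20250514', 'gpt-4o', 'gpt-4o-mini']

const primaryOnly = candidateModels({ model: 'claude-sonnet-4-20250514' });
// primaryOnly: ['claude-sonnet-4-20250514']
```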
## How failover works

The execution engine builds a combined list of `[model, ...fallbackModels]` and iterates through it. For each model in the list:
1. Resolve the provider -- Stevora looks up the provider using the model name prefix (e.g., `claude-` maps to Anthropic, `gpt-` maps to OpenAI). If no provider is registered for that model, it skips to the next.
2. Attempt the LLM call -- the full step execution runs, including the tool calling loop and guardrail checks.
3. On success -- the result is returned immediately. Remaining fallback models are not tried.
4. On failure -- the error is logged, a failed `llmCall` record is saved to the database, and the next model in the list is tried.

```ts
// From llm-step-handler.ts
const modelsToTry = [step.model, ...(step.fallbackModels ?? [])];

for (let attempt = 0; attempt < modelsToTry.length; attempt++) {
  const model = modelsToTry[attempt]!;
  const provider = resolveProvider(model);
  if (!provider) {
    logger.warn({ model }, 'no provider for model, trying next');
    continue;
  }
  try {
    const result = await executeWithModel(ctx, step, stepRunId, model, provider.name, attempt + 1);
    if (result) return result;
  } catch (err) {
    logger.warn({ model, err }, 'LLM call failed, trying fallback');
    await recordLlmCall({
      stepRunId,
      workflowRunId: ctx.workflowRunId,
      workspaceId: ctx.workspaceId,
      provider: provider.name,
      model,
      promptJson: buildMessages(ctx, step),
      attempt: attempt + 1,
      status: 'failed',
      error: { message: err instanceof Error ? err.message : String(err) },
    });
  }
}
```

If every model in the list fails, the step returns a final error with code `LLM_ALL_FAILED` and lists all the models that were tried:
```json
{
  "status": "failed",
  "error": {
    "message": "All models failed: claude-sonnet-4-20250514, gpt-4o, gpt-4o-mini",
    "code": "LLM_ALL_FAILED"
  }
}
```

## What triggers failover
Failover is triggered by any uncaught error during the LLM call. Common causes include:
- Provider outages -- the API returns a 500 or 503 error
- Rate limiting -- the API returns a 429 error
- Network errors -- DNS failures, timeouts, connection resets
- Authentication errors -- expired or invalid API keys
- Provider-specific errors -- model deprecated, context length exceeded
Guardrail failures do not trigger fallback. If a model's output fails guardrail checks, the guardrail failure mode (`block`, `warn`, or `retry_with_feedback`) handles it within the current model attempt. Fallback only activates on infrastructure-level errors.
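The error classes listed above can be pictured with a small classifier. This is a hypothetical sketch, not Stevora code: in practice any thrown error triggers failover, and the error shape below is an assumption (real provider SDK errors differ).

```typescript
// Hypothetical error shape for illustration; real provider SDK errors differ.
interface ProviderError {
  status?: number; // HTTP status, if the call reached the API
  code?: string;   // e.g. 'ETIMEDOUT', 'ECONNRESET' for network-level failures
}

// Names the common failover causes; every thrown error triggers failover regardless.
function describeFailoverCause(err: ProviderError): string {
  if (err.status === 500 || err.status === 503) return 'provider outage';
  if (err.status === 429) return 'rate limited';
  if (err.status === 401 || err.status === 403) return 'authentication error';
  if (err.code === 'ETIMEDOUT' || err.code === 'ECONNRESET') return 'network error';
  return 'provider-specific error'; // e.g. model deprecated, context length exceeded
}
```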
## Cost implications
Each fallback attempt is a separate LLM call that consumes tokens and incurs cost. Stevora tracks every attempt individually:
- Each attempt is recorded as a separate `llmCall` row with its own token counts and cost
- The `attempt` field increments across fallback models (1 for the primary, 2 for the first fallback, etc.)
- Failed attempts that error before receiving a response record zero tokens but are still logged
Because fallback models are often cheaper alternatives, a common pattern is to order them from most capable (and expensive) to least capable (and cheapest):
```json
{
  "model": "claude-opus-4-20250514",
  "fallbackModels": ["claude-sonnet-4-20250514", "gpt-4o-mini"]
}
```

Stevora tracks per-model pricing internally. The models currently in the pricing table and their approximate costs:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `gpt-4-turbo` | $10.00 | $30.00 |
| `o1` | $15.00 | $60.00 |
| `o1-mini` | $3.00 | $12.00 |
| `o3-mini` | $1.10 | $4.40 |
| `claude-sonnet-4-20250514` | $3.00 | $15.00 |
| `claude-haiku-4-5-20251001` | $0.80 | $4.00 |
| `claude-opus-4-20250514` | $15.00 | $75.00 |
These costs are computed automatically by `computeCostCents` and aggregated per workflow run. See the cost tracking documentation for details on monitoring spend.
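The per-attempt cost computation can be sketched from the table above. The name `computeCostCents` comes from the docs, but this body and the pricing map literal are reconstructions, not Stevora's actual implementation (only a subset of models is shown):

```typescript
// Prices in dollars per 1M tokens, copied from the pricing table (subset shown).
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-sonnet-4-20250514': { input: 3, output: 15 },
};

// Sketch of computeCostCents: dollars-per-1M-tokens pricing -> cost in cents.
function computeCostCents(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING[model];
  if (!price) return 0; // unknown models record zero cost in this sketch
  const dollars =
    (inputTokens / 1_000_000) * price.input +
    (outputTokens / 1_000_000) * price.output;
  return dollars * 100;
}

// 1,000 input + 500 output tokens on gpt-4o:
// input: 0.001 * $2.50 = $0.0025; output: 0.0005 * $10.00 = $0.0050 -> 0.75 cents
const cost = computeCostCents('gpt-4o', 1_000, 500);
```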
## Cross-provider fallback
Because provider resolution is based on model name prefixes, fallback works seamlessly across providers. You can mix OpenAI and Anthropic models in the same fallback chain:
```json
{
  "model": "gpt-4o",
  "fallbackModels": ["claude-sonnet-4-20250514", "gpt-4o-mini"]
}
```

The only requirement is that the corresponding API key is set in the environment. If `ANTHROPIC_API_KEY` is not configured and a Claude model is in the fallback list, Stevora skips it (logs a warning) and moves to the next model.
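Prefix resolution plus the API-key check can be sketched as follows. The name `resolveProvider` appears in the handler code earlier, but this body is an assumption: the `env` parameter is added here for illustration, the `OPENAI_API_KEY` variable name and the `o1-`/`o3-` prefix mapping are assumed (the docs only confirm `claude-`/`gpt-` and `ANTHROPIC_API_KEY`).

```typescript
interface Provider {
  name: 'openai' | 'anthropic';
  apiKeyEnv: string;
}

// Sketch: map the model-name prefix to a provider, then require its API key.
// Returning undefined makes the failover loop skip to the next model.
function resolveProvider(
  model: string,
  env: Record<string, string | undefined>,
): Provider | undefined {
  let provider: Provider | undefined;
  if (model.startsWith('claude-')) {
    provider = { name: 'anthropic', apiKeyEnv: 'ANTHROPIC_API_KEY' };
  } else if (model.startsWith('gpt-') || model.startsWith('o1') || model.startsWith('o3')) {
    provider = { name: 'openai', apiKeyEnv: 'OPENAI_API_KEY' }; // env name assumed
  }
  if (!provider || !env[provider.apiKeyEnv]) return undefined;
  return provider;
}
```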
## Fallback with retry policies

Fallback and step-level retry are independent mechanisms. The `retry` configuration on a step controls retries of the entire step (including all fallback attempts), while `fallbackModels` controls which models are tried within a single step attempt:
```json
{
  "model": "gpt-4o",
  "fallbackModels": ["claude-sonnet-4-20250514"],
  "retry": {
    "maxAttempts": 3,
    "backoffMs": 1000,
    "backoffMultiplier": 2
  }
}
```

In this configuration, each retry attempt tries `gpt-4o` first and then `claude-sonnet-4-20250514`. If both fail, the step waits for the backoff period and retries the full sequence again, up to 3 times. This gives you up to 6 total LLM calls (2 models times 3 retry attempts) before the step permanently fails.
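The interaction can be pictured as two nested loops: retry on the outside, fallback on the inside. The wrapper below is an illustrative sketch assuming simple exponential backoff, not Stevora's actual scheduler:

```typescript
interface RetryPolicy {
  maxAttempts: number;
  backoffMs: number;
  backoffMultiplier: number;
}

// Outer loop: step attempts with backoff. Inner loop: the model fallback chain.
// Worst case is models.length * maxAttempts calls before the step fails.
async function runWithRetryAndFallback(
  models: string[],
  retry: RetryPolicy,
  callModel: (model: string) => Promise<string>,
): Promise<string> {
  let delay = retry.backoffMs;
  for (let stepAttempt = 1; stepAttempt <= retry.maxAttempts; stepAttempt++) {
    for (const model of models) {
      try {
        return await callModel(model); // first success wins
      } catch {
        // this model failed; fall through to the next one in the chain
      }
    }
    if (stepAttempt < retry.maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= retry.backoffMultiplier; // exponential backoff between step attempts
    }
  }
  throw new Error(`All models failed: ${models.join(', ')}`);
}
```

With the configuration above (2 models, `maxAttempts: 3`), an always-failing provider pair yields exactly 6 calls before the final error.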