# Model Fallback

Automatic failover between LLM providers
Stevora supports automatic model fallback so that when an LLM call fails, the step retries with a different model (potentially from a different provider) instead of failing immediately. This gives your workflows resilience against provider outages, rate limits, and transient errors.
## Configuration

Add a `fallbackModels` array to any LLM step. The array lists models to try, in order, if the primary model fails:
```json
{
  "type": "llm",
  "name": "analyze-document",
  "model": "claude-sonnet-4-20250514",
  "fallbackModels": ["gpt-4o", "gpt-4o-mini"],
  "messages": [
    { "role": "user", "content": "Analyze this document: {{state.document}}" }
  ],
  "responseFormat": "json"
}
```

In this example, Stevora tries:
1. `claude-sonnet-4-20250514` (Anthropic) -- primary model
2. `gpt-4o` (OpenAI) -- first fallback
3. `gpt-4o-mini` (OpenAI) -- second fallback
The `fallbackModels` field is optional. If omitted, the step only tries the primary model and fails if that call fails.
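The resulting candidate list can be sketched in a few lines. This is illustrative, not Stevora's actual types: `LlmStepConfig` and `candidateModels` are hypothetical names, though the derivation matches the `[model, ...fallbackModels]` list the engine builds (see "How failover works" below).

```typescript
// Minimal sketch of the relevant step fields; the real step type has more.
interface LlmStepConfig {
  model: string;
  fallbackModels?: string[];
}

// The engine tries the primary model first, then each fallback, in order.
function candidateModels(step: LlmStepConfig): string[] {
  return [step.model, ...(step.fallbackModels ?? [])];
}

const withFallbacks = candidateModels({
  model: 'claude-sonnet-4-20250514',
  fallbackModels: ['gpt-4o', 'gpt-4o-mini'],
});
// withFallbacks: ['claude-sonnet-4-20250514', 'gpt-4o', 'gpt-4o-mini']

const primaryOnly = candidateModels({ model: 'claude-sonnet-4-20250514' });
// primaryOnly: ['claude-sonnet-4-20250514']
```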
## How failover works

The execution engine builds a combined list of `[model, ...fallbackModels]` and iterates through it. For each model in the list:
1. Resolve the provider -- Stevora looks up the provider using the model name prefix (e.g., `claude-` maps to Anthropic, `gpt-` maps to OpenAI). If no provider is registered for that model, it skips to the next.
2. Attempt the LLM call -- the full step execution runs, including the tool calling loop and guardrail checks.
3. On success -- the result is returned immediately. Remaining fallback models are not tried.
4. On failure -- the error is logged, a failed `llmCall` record is saved to the database, and the next model in the list is tried.

```ts
// From llm-step-handler.ts
const modelsToTry = [step.model, ...(step.fallbackModels ?? [])];

for (let attempt = 0; attempt < modelsToTry.length; attempt++) {
  const model = modelsToTry[attempt]!;
  const provider = resolveProvider(model);
  if (!provider) {
    logger.warn({ model }, 'no provider for model, trying next');
    continue;
  }
  try {
    const result = await executeWithModel(ctx, step, stepRunId, model, provider.name, attempt + 1);
    if (result) return result;
  } catch (err) {
    logger.warn({ model, err }, 'LLM call failed, trying fallback');
    await recordLlmCall({
      stepRunId,
      workflowRunId: ctx.workflowRunId,
      workspaceId: ctx.workspaceId,
      provider: provider.name,
      model,
      promptJson: buildMessages(ctx, step),
      attempt: attempt + 1,
      status: 'failed',
      error: { message: err instanceof Error ? err.message : String(err) },
    });
  }
}
```

If every model in the list fails, the step returns a final error with code `LLM_ALL_FAILED` and lists all the models that were tried:
```json
{
  "status": "failed",
  "error": {
    "message": "All models failed: claude-sonnet-4-20250514, gpt-4o, gpt-4o-mini",
    "code": "LLM_ALL_FAILED"
  }
}
```

## What triggers failover
Failover is triggered by any uncaught error during the LLM call. Common causes include:
- Provider outages -- the API returns a 500 or 503 error
- Rate limiting -- the API returns a 429 error
- Network errors -- DNS failures, timeouts, connection resets
- Authentication errors -- expired or invalid API keys
- Provider-specific errors -- model deprecated, context length exceeded
Guardrail failures do not trigger fallback. If a model's output fails guardrail checks, the guardrail failure mode (`block`, `warn`, or `retry_with_feedback`) handles it within the current model attempt. Fallback only activates on infrastructure-level errors.
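The error classes listed above can be pictured with a small classifier. This is a hypothetical sketch, not Stevora code: in practice any thrown error triggers failover, and the error shape below is an assumption (real provider SDK errors differ).

```typescript
// Hypothetical error shape for illustration; real provider SDK errors differ.
interface ProviderError {
  status?: number; // HTTP status, if the call reached the API
  code?: string;   // e.g. 'ETIMEDOUT', 'ECONNRESET' for network-level failures
}

// Names the common failover causes; every thrown error triggers failover regardless.
function describeFailoverCause(err: ProviderError): string {
  if (err.status === 500 || err.status === 503) return 'provider outage';
  if (err.status === 429) return 'rate limited';
  if (err.status === 401 || err.status === 403) return 'authentication error';
  if (err.code === 'ETIMEDOUT' || err.code === 'ECONNRESET') return 'network error';
  return 'provider-specific error'; // e.g. model deprecated, context length exceeded
}
```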
## Cost implications
Each fallback attempt is a separate LLM call that consumes tokens and incurs cost. Stevora tracks every attempt individually:
- Each attempt is recorded as a separate `llmCall` row with its own token counts and cost
- The `attempt` field increments across fallback models (1 for the primary, 2 for the first fallback, etc.)
- Failed attempts that error before receiving a response record zero tokens but are still logged
Because fallback models are often cheaper alternatives, a common pattern is to order them from most capable (and expensive) to least capable (and cheapest):
```json
{
  "model": "claude-opus-4-20250514",
  "fallbackModels": ["claude-sonnet-4-20250514", "gpt-4o-mini"]
}
```

Stevora tracks per-model pricing internally. The models currently in the pricing table and their approximate costs:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `gpt-4-turbo` | $10.00 | $30.00 |
| `o1` | $15.00 | $60.00 |
| `o1-mini` | $3.00 | $12.00 |
| `o3-mini` | $1.10 | $4.40 |
| `claude-sonnet-4-20250514` | $3.00 | $15.00 |
| `claude-haiku-4-5-20251001` | $0.80 | $4.00 |
| `claude-opus-4-20250514` | $15.00 | $75.00 |
These costs are computed automatically by `computeCostCents` and aggregated per workflow run. See the cost tracking documentation for details on monitoring spend.
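The per-attempt cost computation can be sketched from the table above. The name `computeCostCents` comes from the docs, but this body and the pricing map literal are reconstructions, not Stevora's actual implementation (only a subset of models is shown):

```typescript
// Prices in dollars per 1M tokens, copied from the pricing table (subset shown).
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-sonnet-4-20250514': { input: 3, output: 15 },
};

// Sketch of computeCostCents: dollars-per-1M-tokens pricing -> cost in cents.
function computeCostCents(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING[model];
  if (!price) return 0; // unknown models record zero cost in this sketch
  const dollars =
    (inputTokens / 1_000_000) * price.input +
    (outputTokens / 1_000_000) * price.output;
  return dollars * 100;
}

// 1,000 input + 500 output tokens on gpt-4o:
// input: 0.001 * $2.50 = $0.0025; output: 0.0005 * $10.00 = $0.0050 -> 0.75 cents
const cost = computeCostCents('gpt-4o', 1_000, 500);
```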
## Cross-provider fallback
Because provider resolution is based on model name prefixes, fallback works seamlessly across providers. You can mix OpenAI and Anthropic models in the same fallback chain:
```json
{
  "model": "gpt-4o",
  "fallbackModels": ["claude-sonnet-4-20250514", "gpt-4o-mini"]
}
```

The only requirement is that the corresponding API key is set in the environment. If `ANTHROPIC_API_KEY` is not configured and a Claude model is in the fallback list, Stevora skips it (logs a warning) and moves to the next model.
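Prefix resolution plus the API-key check can be sketched as follows. The name `resolveProvider` appears in the handler code earlier, but this body is an assumption: the `env` parameter is added here for illustration, the `OPENAI_API_KEY` variable name and the `o1-`/`o3-` prefix mapping are assumed (the docs only confirm `claude-`/`gpt-` and `ANTHROPIC_API_KEY`).

```typescript
interface Provider {
  name: 'openai' | 'anthropic';
  apiKeyEnv: string;
}

// Sketch: map the model-name prefix to a provider, then require its API key.
// Returning undefined makes the failover loop skip to the next model.
function resolveProvider(
  model: string,
  env: Record<string, string | undefined>,
): Provider | undefined {
  let provider: Provider | undefined;
  if (model.startsWith('claude-')) {
    provider = { name: 'anthropic', apiKeyEnv: 'ANTHROPIC_API_KEY' };
  } else if (model.startsWith('gpt-') || model.startsWith('o1') || model.startsWith('o3')) {
    provider = { name: 'openai', apiKeyEnv: 'OPENAI_API_KEY' }; // env name assumed
  }
  if (!provider || !env[provider.apiKeyEnv]) return undefined;
  return provider;
}
```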
## Fallback with retry policies

Fallback and step-level retry are independent mechanisms. The `retry` configuration on a step controls retries of the entire step (including all fallback attempts), while `fallbackModels` controls which models are tried within a single step attempt:
```json
{
  "model": "gpt-4o",
  "fallbackModels": ["claude-sonnet-4-20250514"],
  "retry": {
    "maxAttempts": 3,
    "backoffMs": 1000,
    "backoffMultiplier": 2
  }
}
```

In this configuration, each retry attempt tries `gpt-4o` first and then `claude-sonnet-4-20250514`. If both fail, the step waits for the backoff period and retries the full sequence again, up to 3 times. This gives you up to 6 total LLM calls (2 models times 3 retry attempts) before the step permanently fails.
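The interaction can be pictured as two nested loops: retry on the outside, fallback on the inside. The wrapper below is an illustrative sketch assuming simple exponential backoff, not Stevora's actual scheduler:

```typescript
interface RetryPolicy {
  maxAttempts: number;
  backoffMs: number;
  backoffMultiplier: number;
}

// Outer loop: step attempts with backoff. Inner loop: the model fallback chain.
// Worst case is models.length * maxAttempts calls before the step fails.
async function runWithRetryAndFallback(
  models: string[],
  retry: RetryPolicy,
  callModel: (model: string) => Promise<string>,
): Promise<string> {
  let delay = retry.backoffMs;
  for (let stepAttempt = 1; stepAttempt <= retry.maxAttempts; stepAttempt++) {
    for (const model of models) {
      try {
        return await callModel(model); // first success wins
      } catch {
        // this model failed; fall through to the next one in the chain
      }
    }
    if (stepAttempt < retry.maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= retry.backoffMultiplier; // exponential backoff between step attempts
    }
  }
  throw new Error(`All models failed: ${models.join(', ')}`);
}
```

With the configuration above (2 models, `maxAttempts: 3`), an always-failing provider pair yields exactly 6 calls before the final error.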