Stevora

LLM Step

Orchestrate LLM calls with tools, guardrails, and fallbacks

The LLM step calls a language model with full support for tool use loops, guardrail checks, model fallbacks, structured outputs, and template variables. It is the core building block for AI agent workflows in Stevora.

Schema

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | "llm" | Yes | | Step type discriminator |
| name | string (1-100 chars) | Yes | | Unique step name within the workflow |
| model | string | Yes | | Primary model identifier (e.g., gpt-4o, claude-sonnet-4-20250514) |
| fallbackModels | string[] | No | | Models to try if the primary model fails |
| systemPrompt | string | No | | System message prepended to the conversation |
| messages | LlmMessage[] | Yes | | Conversation messages (min 1) |
| temperature | number | No | | Sampling temperature (0 -- 2) |
| maxTokens | number | No | | Maximum tokens in the response |
| responseFormat | "json" \| "text" | No | | Expected response format |
| outputSchema | Record<string, unknown> | No | | JSON Schema the LLM output must conform to |
| tools | LlmToolDef[] | No | | Tools the model can call |
| guardrails | GuardrailConfig | No | | Post-completion validation checks |
| maxToolRounds | number | No | 10 | Maximum tool call round-trips (1 -- 20) |
| retry | RetryPolicy | No | | Retry policy for failed executions |

LlmMessage

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | "system" \| "user" \| "assistant" | Yes | Message role |
| content | string | Yes | Message content |

LlmToolDef

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Tool function name |
| description | string | Yes | Description shown to the model |
| parameters | Record<string, unknown> | Yes | JSON Schema for tool parameters |

GuardrailConfig

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| postChecks | GuardrailCheck[] | No | [] | Checks to run after each LLM response |
| onFailure | "block" \| "warn" \| "retry_with_feedback" | No | "warn" | Action when a check fails |

GuardrailCheck

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "content_safety" \| "schema_validation" \| "confidence_threshold" | Yes | Check type |
| config | Record<string, unknown> | No | Check-specific config |
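The schema tables can be collected into a single TypeScript sketch. These interfaces are illustrative, derived from the tables above; they are not Stevora's published SDK types, and RetryPolicy is left opaque since it is documented elsewhere.

```typescript
// Illustrative TypeScript shapes mirroring the schema tables; not Stevora's
// published SDK types.
interface LlmMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface LlmToolDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for tool parameters
}

interface GuardrailCheck {
  type: "content_safety" | "schema_validation" | "confidence_threshold";
  config?: Record<string, unknown>;
}

interface GuardrailConfig {
  postChecks?: GuardrailCheck[]; // default: []
  onFailure?: "block" | "warn" | "retry_with_feedback"; // default: "warn"
}

interface LlmStep {
  type: "llm";
  name: string; // 1-100 chars, unique within the workflow
  model: string;
  fallbackModels?: string[];
  systemPrompt?: string;
  messages: LlmMessage[]; // min 1
  temperature?: number; // 0 -- 2
  maxTokens?: number;
  responseFormat?: "json" | "text";
  outputSchema?: Record<string, unknown>;
  tools?: LlmToolDef[];
  guardrails?: GuardrailConfig;
  maxToolRounds?: number; // 1 -- 20, default 10
  retry?: unknown; // RetryPolicy, documented elsewhere
}
```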

Configuration Example

{
  "type": "llm",
  "name": "generate-outreach-email",
  "model": "gpt-4o",
  "fallbackModels": ["claude-sonnet-4-20250514"],
  "systemPrompt": "You are a sales development representative writing personalized outreach emails.",
  "messages": [
    {
      "role": "user",
      "content": "Write a cold outreach email to {{state.enrichedLead.name}} at {{state.enrichedLead.company}}. Their role is {{state.enrichedLead.title}}. Our product helps with {{input.valueProposition}}."
    }
  ],
  "temperature": 0.7,
  "maxTokens": 1000,
  "responseFormat": "json",
  "outputSchema": {
    "type": "object",
    "properties": {
      "subject": { "type": "string" },
      "body": { "type": "string" }
    },
    "required": ["subject", "body"]
  },
  "guardrails": {
    "postChecks": [
      { "type": "content_safety" },
      { "type": "schema_validation" }
    ],
    "onFailure": "retry_with_feedback"
  },
  "maxToolRounds": 5,
  "retry": {
    "maxAttempts": 3,
    "backoffMs": 2000,
    "backoffMultiplier": 2
  }
}

Template Variables

Message content and the system prompt support {{variable}} template syntax. Variables are resolved from the workflow context at execution time.

| Prefix | Source | Example |
| --- | --- | --- |
| state. | workflowState | {{state.enrichedLead.name}} |
| input. | workflowInput | {{input.companyName}} |

The template engine uses dot-notation traversal. If a path resolves to undefined or null, it is replaced with an empty string. Unrecognized prefixes are left as-is in the output.

// Template:
"Summarize this for {{state.recipientName}}: {{state.rawContent}}"

// With workflowState = { recipientName: "Alice", rawContent: "..." }
// Resolves to:
"Summarize this for Alice: ..."
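A minimal sketch of these resolution rules, assuming a simple regex-based engine (resolveTemplate is a hypothetical helper, not Stevora's actual implementation):

```typescript
// Hypothetical sketch of the template resolution rules described above.
type TemplateContext = {
  state: Record<string, unknown>; // workflowState
  input: Record<string, unknown>; // workflowInput
};

function resolveTemplate(template: string, ctx: TemplateContext): string {
  return template.replace(/\{\{([^}]+)\}\}/g, (match, expr: string) => {
    const [prefix, ...path] = expr.trim().split(".");
    // Only the documented prefixes are resolved; anything else is left as-is.
    const root =
      prefix === "state" ? ctx.state :
      prefix === "input" ? ctx.input :
      undefined;
    if (root === undefined) return match;
    // Dot-notation traversal; undefined/null resolves to an empty string.
    const value = path.reduce<unknown>(
      (acc, key) => (acc == null ? undefined : (acc as Record<string, unknown>)[key]),
      root
    );
    return value == null ? "" : String(value);
  });
}
```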

Tool Use Loop

When tools are defined, the LLM step supports multi-round tool calling:

  1. The step sends the messages and tool definitions to the model.
  2. If the model responds with tool calls, Stevora executes them and appends the results as new messages.
  3. The updated conversation is sent back to the model for another round.
  4. This loop continues until the model responds without tool calls, or maxToolRounds is reached.

If the model exceeds maxToolRounds without producing a final response, the step fails with error code MAX_TOOL_ROUNDS.
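The four steps above can be sketched as a loop. callModel and executeTool are hypothetical stand-ins for the real model and tool integrations, and the sketch is synchronous for brevity (the actual loop is asynchronous):

```typescript
// Sketch of the tool-use loop described above; callModel and executeTool
// are hypothetical stand-ins, shown synchronously for brevity.
type LoopMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };
type ModelResponse = { toolCalls: { name: string; args: unknown }[]; content: string };

function runToolLoop(
  messages: LoopMessage[],
  maxToolRounds: number,
  callModel: (msgs: LoopMessage[]) => ModelResponse,
  executeTool: (name: string, args: unknown) => string
): string {
  for (let round = 0; round < maxToolRounds; round++) {
    const response = callModel(messages);
    // No tool calls: this is the final response.
    if (response.toolCalls.length === 0) return response.content;
    // Execute each requested tool and append the results as new messages,
    // then send the updated conversation back for another round.
    for (const call of response.toolCalls) {
      messages.push({ role: "tool", content: executeTool(call.name, call.args) });
    }
  }
  // The model kept calling tools past the limit.
  throw new Error("MAX_TOOL_ROUNDS");
}
```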

{
  "type": "llm",
  "name": "research-company",
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Research {{state.companyName}} and summarize their recent funding, team size, and tech stack."
    }
  ],
  "tools": [
    {
      "name": "search_web",
      "description": "Search the web for information",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" }
        },
        "required": ["query"]
      }
    },
    {
      "name": "fetch_page",
      "description": "Fetch the content of a web page",
      "parameters": {
        "type": "object",
        "properties": {
          "url": { "type": "string" }
        },
        "required": ["url"]
      }
    }
  ],
  "maxToolRounds": 8,
  "responseFormat": "json"
}

Guardrails

Guardrails validate the LLM output before the step is marked complete. Three check types are available:

| Check Type | Description |
| --- | --- |
| content_safety | Checks the response for harmful or unsafe content |
| schema_validation | Validates the response against the outputSchema |
| confidence_threshold | Checks if the model's response meets a confidence threshold |

Failure modes

The onFailure field controls what happens when a guardrail check fails:

  • warn (default) -- Log the failure but allow the step to complete. The output is used as-is.
  • block -- Fail the step immediately with error code GUARDRAIL_BLOCKED.
  • retry_with_feedback -- Send the guardrail failure details back to the model as a user message and ask it to fix the response. Up to 2 guardrail retries are attempted before the step fails with GUARDRAIL_RETRIES_EXHAUSTED.
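A sketch of this dispatch logic, assuming a check runner that returns pass/fail results (handleGuardrailFailures and the retry counter are hypothetical names, not Stevora internals):

```typescript
// Hypothetical sketch of the onFailure dispatch described above.
type CheckResult = { type: string; passed: boolean; details: string };
type OnFailure = "block" | "warn" | "retry_with_feedback";

function handleGuardrailFailures(
  results: CheckResult[],
  onFailure: OnFailure,
  guardrailRetriesUsed: number
): "complete" | "retry" {
  const failures = results.filter((r) => !r.passed);
  if (failures.length === 0) return "complete";
  switch (onFailure) {
    case "warn":
      // Log the failure but allow the step to complete with the output as-is.
      console.warn("guardrail failed:", failures.map((f) => f.type).join(", "));
      return "complete";
    case "block":
      // Fail the step immediately.
      throw new Error("GUARDRAIL_BLOCKED");
    case "retry_with_feedback":
      // Up to 2 guardrail retries before the step fails.
      if (guardrailRetriesUsed >= 2) throw new Error("GUARDRAIL_RETRIES_EXHAUSTED");
      // Caller appends the failure details as a user message and re-calls the model.
      return "retry";
  }
}
```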

Model Fallbacks

When fallbackModels is set, the step tries models in order:

  1. Try the primary model.
  2. If it fails (API error, timeout, etc.), try the first fallback.
  3. Continue through fallbackModels until one succeeds.
  4. If all models fail, the step fails with error code LLM_ALL_FAILED.

Each attempt -- successful or failed -- is recorded as an LLM call for cost tracking.
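The fallback sequence can be sketched as follows. callWithFallbacks, callModel, and recordCall are hypothetical names, and the sketch is synchronous for brevity:

```typescript
// Sketch of the fallback sequence described above; callModel and recordCall
// are hypothetical stand-ins for the real model call and cost tracking.
function callWithFallbacks(
  model: string,
  fallbackModels: string[],
  callModel: (model: string) => string,
  recordCall: (model: string, ok: boolean) => void
): string {
  // Try the primary model first, then each fallback in order.
  for (const candidate of [model, ...fallbackModels]) {
    try {
      const result = callModel(candidate);
      recordCall(candidate, true); // every attempt is recorded for cost tracking
      return result;
    } catch {
      recordCall(candidate, false); // failed attempts are recorded too
    }
  }
  // All models failed.
  throw new Error("LLM_ALL_FAILED");
}
```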

{
  "model": "gpt-4o",
  "fallbackModels": ["claude-sonnet-4-20250514", "gpt-4o-mini"]
}

Output Format

The completed step output includes the model response plus LLM metadata:

{
  "subject": "Quick question about your infrastructure",
  "body": "Hi Alice, I noticed...",
  "_llm": {
    "model": "gpt-4o",
    "inputTokens": 342,
    "outputTokens": 187
  }
}

When responseFormat is "json", the engine parses the response content as JSON (handling markdown code fences that models sometimes add). The parsed fields become top-level keys in the output and are also written to stateUpdates so subsequent steps can access them from workflowState.
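A minimal sketch of that parsing step; the fence-stripping regex is an assumption about how such cleanup is typically done, not Stevora's exact behavior:

```typescript
// Hypothetical sketch: strip an optional markdown code fence that models
// sometimes wrap around JSON output, then parse the remainder.
function parseJsonResponse(raw: string): Record<string, unknown> {
  const fenced = raw.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  const body = fenced ? fenced[1] : raw.trim();
  return JSON.parse(body) as Record<string, unknown>;
}
```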

When responseFormat is "text" (or unset), the output is { content: "..." }.