
Specification Details

The core specification (v1/spec.yaml) defines the standard vocabulary that all provider manifests and runtimes share.

These parameters have consistent meaning across all providers:

| Parameter | Type | Description |
|---|---|---|
| temperature | float | Randomness control (0.0 – 2.0) |
| max_tokens | integer | Maximum response tokens |
| top_p | float | Nucleus sampling threshold |
| stream | boolean | Enable streaming response |
| stop | string[] | Stop sequences |
| tools | object[] | Tool/function definitions |
| tool_choice | string/object | Tool selection mode |
| response_format | object | Structured output format |

Provider manifests map these standard names to provider-specific parameter names. For example, OpenAI uses max_completion_tokens while Anthropic uses max_tokens.
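A minimal sketch of what this translation step could look like at runtime. The `OPENAI_PARAM_MAP` dict and `translate_params` helper are illustrative, not part of the spec; the only spec-derived fact is the `max_tokens` → `max_completion_tokens` rename mentioned above.

```python
# Hypothetical illustration: a runtime renaming standard parameter keys
# to provider-specific ones, driven by a manifest-derived mapping.
OPENAI_PARAM_MAP = {
    "max_tokens": "max_completion_tokens",  # OpenAI's name for this field
    # Parameters absent from the map keep their standard name.
}

def translate_params(params: dict, param_map: dict) -> dict:
    """Rename standard parameter keys to provider-specific ones."""
    return {param_map.get(key, key): value for key, value in params.items()}

request = {"temperature": 0.7, "max_tokens": 1024}
print(translate_params(request, OPENAI_PARAM_MAP))
# {'temperature': 0.7, 'max_completion_tokens': 1024}
```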

The specification defines unified streaming event types that runtimes emit:

| Event | Description |
|---|---|
| PartialContentDelta | Text content fragment |
| ThinkingDelta | Reasoning/thinking block (extended thinking models) |
| ToolCallStarted | Function/tool invocation begins |
| PartialToolCall | Tool call argument streaming |
| ToolCallEnded | Tool invocation complete |
| StreamEnd | Response stream complete |
| StreamError | Stream-level error |
| Metadata | Usage statistics, model info |

Provider manifests declare JSONPath-based rules that map provider-specific events to these standard types.
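The rule-matching idea can be sketched as follows. The rule tuples, the `lookup` helper, and the example Anthropic-style event types are assumptions for illustration; the actual manifest schema and JSONPath engine are defined by the spec and may differ.

```python
# Hypothetical sketch: each rule matches a JSONPath-style path in the
# provider event against an expected value, and re-emits the event
# under the corresponding standard type.
MAPPING_RULES = [
    # (path into the provider event, expected value, standard event type)
    ("$.type", "content_block_delta", "PartialContentDelta"),
    ("$.type", "message_stop", "StreamEnd"),
]

def lookup(event: dict, path: str):
    """Resolve a simple dotted JSONPath like '$.type' against a dict."""
    value = event
    for part in path.lstrip("$.").split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value

def normalize(event: dict) -> str:
    """Map a provider-specific event to a standard event type."""
    for path, expected, standard_type in MAPPING_RULES:
        if lookup(event, path) == expected:
            return standard_type
    return "StreamError"  # unmatched events surface as stream errors here

print(normalize({"type": "message_stop"}))  # StreamEnd
```

A real runtime would use a full JSONPath implementation; the dotted lookup above only covers the simple paths needed for the sketch.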

V2 defines 13 standardized error codes. Provider-specific errors are mapped to these codes for consistent handling across runtimes:

| Code | Name | Category | Retryable | Fallbackable |
|---|---|---|---|---|
| E1001 | invalid_request | Client | No | No |
| E1002 | authentication | Client | No | Yes |
| E1003 | permission_denied | Client | No | No |
| E1004 | not_found | Client | No | No |
| E1005 | request_too_large | Client | No | No |
| E2001 | rate_limited | Rate | Yes | Yes |
| E2002 | quota_exhausted | Rate | No | Yes |
| E3001 | server_error | Server | Yes | Yes |
| E3002 | overloaded | Server | Yes | Yes |
| E3003 | timeout | Server | Yes | Yes |
| E4001 | conflict | Operational | Yes | No |
| E4002 | cancelled | Operational | No | No |
| E9999 | unknown | Unknown | No | No |
  • Retryable — Runtimes may retry the request (with backoff) for transient failures
  • Fallbackable — Runtimes may try an alternative provider or model in a fallback chain
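The two flags imply a simple decision order a runtime could follow: retry first, then fall back, then fail. The in-memory table and action names below are illustrative assumptions; only the flag values come from the error table above.

```python
# Illustrative subset of the error table (flags taken from the spec's
# table; the dict layout itself is an assumption for this sketch).
ERROR_TABLE = {
    "E2001": {"name": "rate_limited", "retryable": True, "fallbackable": True},
    "E1002": {"name": "authentication", "retryable": False, "fallbackable": True},
    "E9999": {"name": "unknown", "retryable": False, "fallbackable": False},
}

def next_action(code: str) -> str:
    """Decide what a runtime does with a normalized error code."""
    entry = ERROR_TABLE.get(code, ERROR_TABLE["E9999"])  # unknown codes map to E9999
    if entry["retryable"]:
        return "retry_with_backoff"
    if entry["fallbackable"]:
        return "try_fallback_provider"
    return "fail"

print(next_action("E2001"))  # retry_with_backoff
```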

The spec defines standard retry strategies:

```yaml
retry_policy:
  strategy: "exponential_backoff"
  max_retries: 3
  initial_delay_ms: 1000
  max_delay_ms: 30000
  backoff_multiplier: 2.0
  retryable_errors:
    - "rate_limited"
    - "overloaded"
    - "server_error"
    - "timeout"
```
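Under the usual exponential-backoff convention (a sketch, not a normative formula from the spec), each attempt's delay is the initial delay times the multiplier raised to the attempt index, capped at the maximum:

```python
# Delay schedule implied by the retry_policy fields above, assuming the
# common convention delay_i = initial * multiplier**i, capped at max.
def backoff_delays(initial_ms: int, multiplier: float, max_ms: int, retries: int):
    return [min(int(initial_ms * multiplier**i), max_ms) for i in range(retries)]

print(backoff_delays(1000, 2.0, 30000, 3))  # [1000, 2000, 4000]
```

Real runtimes typically also add jitter so concurrent clients don't retry in lockstep, though the snippet above omits it.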

Normalized finish reasons for response completion:

| Reason | Description |
|---|---|
| end_turn | Natural completion |
| max_tokens | Token limit reached |
| tool_use | Model wants to call a tool |
| stop_sequence | Stop sequence encountered |
| content_filter | Filtered by content policy |
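A caller might branch on these normalized reasons like so; the handler action names here are hypothetical stand-ins for whatever the application does next.

```python
# Illustrative dispatch on normalized finish reasons (action names are
# assumptions, not part of the spec).
def handle_finish(reason: str) -> str:
    if reason == "tool_use":
        return "execute_tool_calls"    # model requested a tool call
    if reason == "max_tokens":
        return "warn_truncated"        # response was cut off at the limit
    if reason == "content_filter":
        return "surface_policy_block"  # filtered by content policy
    return "complete"                  # end_turn or stop_sequence

print(handle_finish("tool_use"))  # execute_tool_calls
```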

Providers are categorized into API families to prevent request/response format confusion:

  • openai — OpenAI-compatible APIs (also used by Groq, Together, DeepSeek, etc.)
  • anthropic — Anthropic Messages API
  • gemini — Google Gemini API
  • custom — Provider-specific format