
The TokenRateLimitPolicy Custom Resource Definition (CRD)

TokenRateLimitPolicy

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec | TokenRateLimitPolicySpec | Yes | The specification for the TokenRateLimitPolicy custom resource |
| status | TokenRateLimitPolicyStatus | No | The status for the custom resource |

TokenRateLimitPolicySpec

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| targetRef | LocalPolicyTargetReferenceWithSectionName | Yes | Reference to the Kubernetes resource that the policy attaches to. See LocalPolicyTargetReferenceWithSectionName below |
| defaults | MergeableTokenRateLimitPolicySpec | No | Default limit definitions. Mutually exclusive with the limits field |
| overrides | MergeableTokenRateLimitPolicySpec | No | Override limit definitions. Mutually exclusive with the limits and defaults fields. Only allowed for policies whose targetRef.kind is Gateway |
| limits | `Map<String: TokenLimit>` | No | Limit definitions. Mutually exclusive with the defaults field |

LocalPolicyTargetReferenceWithSectionName

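| Field | Type | Required | Description |
| --- | --- | --- | --- |
| LocalPolicyTargetReference | LocalPolicyTargetReference | Yes | Reference to a local policy target. Its fields (group, kind, name) are set directly under targetRef |
| sectionName | SectionName | No | Section name for further specificity (if needed) |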

LocalPolicyTargetReference

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| group | Group | Yes | Group of the target resource |
| kind | Kind | Yes | Kind of the target resource |
| name | ObjectName | Yes | Name of the target resource |

SectionName

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| SectionName | v1.SectionName (String) | Yes | The name of a section in a Kubernetes resource |

How SectionName is interpreted depends on the target resource:

  • Gateway: Listener name
  • HTTPRoute: HTTPRouteRule name
  • Service: Port name
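
As an illustration of sectionName, the following sketch attaches a policy to a single listener of a Gateway rather than to the whole Gateway. The policy name, the Gateway name ai-gateway, the listener name https, and the limit values are assumptions made for this example.

```yaml
# Hypothetical example: resource names and limit values are illustrative only.
apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: https-listener-token-limit
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: ai-gateway
    sectionName: https   # assumed to match a listener name defined on the Gateway
  limits:
    global:
      rates:
      - limit: 100000
        window: 1h
```

Note that group, kind, and name sit directly under targetRef, next to sectionName.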

MergeableTokenRateLimitPolicySpec

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| strategy | String | No | Merge strategy to apply when merging with other policies. Values: atomic (default), merge |
| limits | `Map<String: TokenLimit>` | Yes | Map of named token-based rate limit configurations |
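
A minimal sketch of a defaults block that sets the strategy explicitly, assuming a Gateway named api-gateway. The policy name, limit name, and values are hypothetical. Because defaults is used, the top-level limits field is left out, in line with the mutual-exclusivity rule above.

```yaml
# Hypothetical example: gateway-wide defaults that other policies may refine.
apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: gateway-default-limits
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: api-gateway
  defaults:
    strategy: merge        # omit to fall back to the default, atomic
    limits:
      per-user:
        rates:
        - limit: 50000
          window: 24h
        counters:
        - expression: auth.identity.userid
```

The intent of merge here is that named limits are combined with those of other policies instead of the whole set being treated as one unit. The exact merge semantics are not spelled out in this reference, so treat that reading as an assumption to verify.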

TokenLimit

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| rates | []Rate | No | List of rate limit details, including limit and window. If not specified, no rate limits are applied for this limit definition |
| when | []WhenPredicate | No | List of predicates for this limit. Used in combination with top-level predicates |
| counters | []Counter | No | CEL expressions that define counter keys for rate limiting. If not specified, rate limiting is applied globally without user-specific tracking |

Rate

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| limit | Number | Yes | Maximum token count allowed for the given window |
| window | Duration | Yes | Time window for the limit (e.g., "1h", "24h", "1m", "1d") |
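
Since rates is a list, a single limit can carry more than one rate. The fragment below is a sketch of a short-window cap combined with a daily cap. The limit name and the numbers are made up for illustration, and the assumption is that a request has to fit within every listed rate.

```yaml
# Fragment of spec.limits; the limit name and values are illustrative only.
limits:
  burst-and-daily:
    rates:
    - limit: 5000        # at most 5,000 tokens per minute
      window: 1m
    - limit: 200000      # and at most 200,000 tokens per 24 hours
      window: 24h
```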

WhenPredicate

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| predicate | String | Yes | CEL expression that must evaluate to true for the limit to apply. See Well-known Attributes |

Counter

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| expression | String | Yes | CEL expression that defines the counter key for rate limiting. See Well-known Attributes |

TokenRateLimitPolicyStatus

The status object for TokenRateLimitPolicy follows the PolicyStatus pattern from Gateway API.

| Field | Type | Description |
| --- | --- | --- |
| observedGeneration | Number | Generation of the resource that was last reconciled |
| conditions | []Condition | Current state of the policy |

Condition

Standard Kubernetes condition fields following Gateway API conventions:

| Field | Type | Description |
| --- | --- | --- |
| type | String | Type of condition (e.g., "Accepted", "Enforced") |
| status | String | Status of the condition ("True", "False", "Unknown") |
| observedGeneration | Number | Generation observed when this condition was last updated |
| lastTransitionTime | Timestamp | Last time the condition transitioned from one status to another |
| reason | String | Machine-readable reason for the condition's last transition |
| message | String | Human-readable message indicating details about the last transition |
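
To make the field layout concrete, here is a sketch of what a populated status might look like for a policy that has been accepted and enforced. The reason strings, messages, generation numbers, and timestamps are hypothetical placeholders, not output captured from a real cluster.

```yaml
# Illustrative status only; reasons, messages, and timestamps are hypothetical.
status:
  observedGeneration: 1
  conditions:
  - type: Accepted
    status: "True"
    observedGeneration: 1
    lastTransitionTime: "2025-01-01T00:00:00Z"
    reason: Accepted
    message: Policy has been accepted
  - type: Enforced
    status: "True"
    observedGeneration: 1
    lastTransitionTime: "2025-01-01T00:00:05Z"
    reason: Enforced
    message: Policy has been enforced
```

The status values are quoted because the field is a string ("True", "False", "Unknown"), not a YAML boolean.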

Token Usage Tracking

TokenRateLimitPolicy automatically tracks token consumption from AI/LLM responses by monitoring the usage.total_tokens field in response bodies. This enables accurate usage-based rate limiting where:

  • Request Phase: The policy evaluates predicates and descriptors during the request
  • Response Phase: The policy extracts actual token usage from the response body
  • Rate Limiting: Limitador receives the actual token count as hits_addend for precise accounting

Supported Response Format

The policy automatically parses token usage from response bodies in the following format:

{
  "usage": {
    "total_tokens": 150,
    "prompt_tokens": 100,
    "completion_tokens": 50
  }
}

This is compatible with OpenAI-style API responses and similar AI/LLM services.

Important: Currently, only non-streaming responses are supported, that is, requests where stream is false or omitted.

CEL Expression Context

TokenRateLimitPolicy provides access to request attributes through CEL expressions. For a comprehensive list of available attributes, see the Well-known Attributes RFC.

Common attributes include:

| Context | Available Attributes | Example Usage |
| --- | --- | --- |
| Request | request.method, request.url_path, request.headers | request.method == "POST" |
| Authentication | auth.identity.*, request.auth.claims.* | auth.identity.userid, request.auth.claims["tier"] |
| Request Body | requestBodyJSON(path) | requestBodyJSON("model") |
| Remote Address | source.address, source.port | source.address |
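
The attributes above can be used in both when predicates and counters expressions. The fragment below is a sketch that counts tokens per client address for POST requests only. The limit name and values are assumptions chosen for illustration.

```yaml
# Fragment of spec.limits; the limit name and values are illustrative only.
limits:
  per-client-address:
    rates:
    - limit: 20000
      window: 1h
    when:
    - predicate: 'request.method == "POST"'   # Request context
    counters:
    - expression: source.address              # Remote Address context
```

Keying the counter on source.address is one option when no authenticated identity is available. Where an identity exists, a counter such as auth.identity.userid gives per-user accounting instead.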

Examples

Basic Token Rate Limiting

apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: basic-token-limit
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: ai-gateway
  limits:
    global:
      rates:
      - limit: 100000
        window: 1h

User-Based Token Limiting

apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: user-token-limits
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: api-gateway
  limits:
    free:
      rates:
      - limit: 50000
        window: 24h
      when:
      - predicate: request.path == "/v1/chat/completions"
      - predicate: 'auth.identity.groups.split(",").exists(g, g == "free")'
      counters:
      - expression: auth.identity.userid
    gold:
      rates:
      - limit: 200000
        window: 24h
      when:
      - predicate: request.path == "/v1/chat/completions"
      - predicate: 'auth.identity.groups.split(",").exists(g, g == "gold")'
      counters:
      - expression: auth.identity.userid

Model-Specific Limiting

apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: model-limits
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: ai-api
  limits:
    gpt-4:
      rates:
      - limit: 100000
        window: 24h
      when:
      - predicate: request.path == "/v1/chat/completions"
      - predicate: 'requestBodyJSON("model") == "gpt-4"'
      counters:
      - expression: auth.identity.userid
    gpt-3:
      rates:
      - limit: 500000
        window: 24h
      when:
      - predicate: request.path == "/v1/chat/completions"
      - predicate: 'requestBodyJSON("model") == "gpt-3.5-turbo"'
      counters:
      - expression: auth.identity.userid

Gateway Overrides

apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: org-wide-limits
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: api-gateway
  overrides:
    strategy: atomic
    limits:
      org-quota:
        rates:
        - limit: 1000000
          window: 24h
        counters:
        - expression: auth.identity.org_id

See Also