The TokenRateLimitPolicy Custom Resource Definition (CRD)
TokenRateLimitPolicy
TokenRateLimitPolicySpec
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| targetRef | LocalPolicyTargetReferenceWithSectionName | Yes | Reference to the Kubernetes resource that the policy attaches to. For more information, see LocalPolicyTargetReferenceWithSectionName. |
| defaults | MergeableTokenRateLimitPolicySpec | No | Default limit definitions. Mutually exclusive with the limits field. |
| overrides | MergeableTokenRateLimitPolicySpec | No | Override limit definitions. Mutually exclusive with the limits and defaults fields. Only allowed for policies whose targetRef.kind is Gateway. |
| limits | Map<String, TokenLimit> | No | Limit definitions. Mutually exclusive with the defaults field. |
LocalPolicyTargetReferenceWithSectionName
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| LocalPolicyTargetReference | LocalPolicyTargetReference | Yes | Reference to a local policy target. |
| sectionName | SectionName | No | Section name for further specificity (if needed). |
LocalPolicyTargetReference
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| group | Group | Yes | Group of the target resource. |
| kind | Kind | Yes | Kind of the target resource. |
| name | ObjectName | Yes | Name of the target resource. |
SectionName
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| SectionName | v1.SectionName (String) | Yes | Name of a section in a Kubernetes resource. It is interpreted as follows: for a Gateway, the listener name; for an HTTPRoute, the HTTPRouteRule name; for a Service, the port name. |
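As a minimal sketch of how sectionName narrows a target, the manifest below attaches a policy to a single listener of a Gateway. The Gateway name api-gateway is borrowed from the examples later in this document; the policy name, the listener name https, the limit name, and the limit values are hypothetical placeholders.

```yaml
apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: listener-token-limit      # hypothetical policy name
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: api-gateway
    sectionName: https            # hypothetical listener name on the Gateway
  limits:
    per-listener:                 # hypothetical limit name
      rates:
      - limit: 100000
        window: 1h
```

Omitting sectionName would attach the same policy to the Gateway as a whole.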
MergeableTokenRateLimitPolicySpec
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| strategy | String | No | Merge strategy to apply when merging with other policies. Values: atomic (default), merge. |
| limits | Map<String, TokenLimit> | Yes | Map of named token-based rate limit configurations. |
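The sketch below shows the defaults form with an explicit strategy, using only the fields listed above. The policy name, limit name, and values are hypothetical. How the merged result interacts with policies attached elsewhere depends on the operator's merge semantics, which this reference does not describe in detail.

```yaml
apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: gateway-default-limits    # hypothetical policy name
  namespace: gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: api-gateway
  defaults:                       # mutually exclusive with spec.limits
    strategy: merge               # "atomic" is used when strategy is omitted
    limits:
      baseline:                   # hypothetical limit name
        rates:
        - limit: 20000
          window: 1h
        counters:
        - expression: auth.identity.userid
```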
TokenLimit
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| rates | []Rate | No | List of rate limit details, including limit and window. If not specified, no rate limits are applied for this limit definition. |
| when | []WhenPredicate | No | List of predicates for this limit. Used in combination with top-level predicates. |
| counters | []Counter | No | CEL expressions that define counter keys for rate limiting. If not specified, rate limiting is applied globally, without user-specific tracking. |
Rate
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| limit | Number | Yes | Maximum token count allowed for the given window. |
| window | Duration | Yes | Time window for the limit (e.g., "1h", "24h", "1m", "1d"). |
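Because rates is a list, a single limit definition can carry more than one limit/window pair. The fragment below is a sketch of a spec.limits entry that combines a short window with a daily one. The limit name and the values are hypothetical, and the fragment assumes that each entry is enforced independently, which this reference does not state explicitly.

```yaml
# Fragment of spec.limits (also valid under defaults.limits or overrides.limits)
limits:
  burst-and-daily:        # hypothetical limit name
    rates:
    - limit: 5000         # at most 5,000 tokens per minute
      window: 1m
    - limit: 200000       # at most 200,000 tokens per 24 hours
      window: 24h
```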
WhenPredicate
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| predicate | String | Yes | CEL expression that must evaluate to true for the limit to apply. See Well-known Attributes. |
Counter
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| expression | String | Yes | CEL expression that defines the counter key for rate limiting. See Well-known Attributes. |
TokenRateLimitPolicyStatus
The status object for TokenRateLimitPolicy follows the PolicyStatus pattern from Gateway API.
| Field | Type | Description |
|-------|------|-------------|
| observedGeneration | Number | Generation of the resource that was last reconciled. |
| conditions | []Condition | Current state of the policy. |
Condition
Standard Kubernetes condition fields following Gateway API conventions:
| Field | Type | Description |
|-------|------|-------------|
| type | String | Type of condition (e.g., "Accepted", "Enforced"). |
| status | String | Status of the condition ("True", "False", "Unknown"). |
| observedGeneration | Number | Generation observed when this condition was last updated. |
| lastTransitionTime | Timestamp | Last time the condition transitioned from one status to another. |
| reason | String | Machine-readable reason for the condition's last transition. |
| message | String | Human-readable message indicating details about the last transition. |
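For orientation, a status stanza built from the fields above might look like the following sketch. Every value is illustrative: the generation numbers and timestamps are made up, and the reason and message strings are placeholders, since the actual strings are set by the controller and are not listed in this reference.

```yaml
status:
  observedGeneration: 1
  conditions:
  - type: Accepted
    status: "True"
    observedGeneration: 1
    lastTransitionTime: "2025-01-01T00:00:00Z"    # illustrative timestamp
    reason: Accepted                              # placeholder reason
    message: Policy has been accepted             # placeholder message
  - type: Enforced
    status: "True"
    observedGeneration: 1
    lastTransitionTime: "2025-01-01T00:00:05Z"    # illustrative timestamp
    reason: Enforced                              # placeholder reason
    message: Policy has been enforced             # placeholder message
```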
Token Usage Tracking
TokenRateLimitPolicy automatically tracks token consumption from AI/LLM responses by monitoring the usage.total_tokens field in response bodies. This enables accurate usage-based rate limiting:

1. Request phase: the policy evaluates predicates and descriptors during the request.
2. Response phase: the policy extracts the actual token usage from the response body.
3. Rate limiting: Limitador receives the actual token count as hits_addend for precise accounting.

The policy automatically parses token usage from response bodies in the following format:
    {
      "usage": {
        "total_tokens": 150,
        "prompt_tokens": 100,
        "completion_tokens": 50
      }
    }

This is compatible with OpenAI-style API responses and similar AI/LLM services.
Important: Currently, only non-streaming responses are supported, that is, requests in which stream is set to false or omitted.
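To make the accounting concrete, the fragment below pairs a small per-user limit with a worked tally in comments. It is a sketch under stated assumptions: the limit name and values are hypothetical, and every response is assumed to report "total_tokens": 150, as in the sample body above. How the request that crosses the limit is treated is not covered by this reference, so the tally stops before that point.

```yaml
# Fragment of spec.limits
limits:
  per-user:                       # hypothetical limit name
    rates:
    - limit: 1000                 # tokens allowed per window
      window: 1m
    counters:
    - expression: auth.identity.userid
# Illustrative tally for one user within a single 1m window,
# assuming each response reports usage.total_tokens = 150:
#   after response 1:  150 tokens counted (850 of 1000 remaining)
#   after response 2:  300 tokens counted (700 remaining)
#   after response 6:  900 tokens counted (100 remaining)
```

Because the count comes from the response body, the cost of a request is only known once the upstream service has answered it.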
CEL Expression Context
TokenRateLimitPolicy provides access to request attributes through CEL expressions. For a comprehensive list of available attributes, see the Well-known Attributes RFC.
Common attributes include:
| Context | Available Attributes | Example Usage |
|---------|----------------------|---------------|
| Request | request.method, request.url_path, request.headers | request.method == "POST" |
| Authentication | auth.identity.*, request.auth.claims.* | auth.identity.userid, request.auth.claims["tier"] |
| Request Body | requestBodyJSON(path) | requestBodyJSON("model") |
| Remote Address | source.address, source.port | source.address |
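The attributes in the table can be combined within one limit definition. The fragment below is a sketch that uses only attribute names listed above; the limit name, the values, and the "premium" tier are hypothetical. It assumes, as the examples in this document suggest, that all predicates under when must hold for the limit to apply.

```yaml
# Fragment of spec.limits
limits:
  premium-posts:                  # hypothetical limit name
    rates:
    - limit: 300000
      window: 24h
    when:
    - predicate: 'request.method == "POST"'
    - predicate: 'request.auth.claims["tier"] == "premium"'   # hypothetical tier value
    counters:
    - expression: source.address  # one counter per client address
```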
Examples
Basic Token Rate Limiting
    apiVersion: kuadrant.io/v1alpha1
    kind: TokenRateLimitPolicy
    metadata:
      name: basic-token-limit
      namespace: gateway-system
    spec:
      targetRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: ai-gateway
      limits:
        global:
          rates:
          - limit: 100000
            window: 1h

User-Based Token Limiting
    apiVersion: kuadrant.io/v1alpha1
    kind: TokenRateLimitPolicy
    metadata:
      name: user-token-limits
      namespace: gateway-system
    spec:
      targetRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: api-gateway
      limits:
        free:
          rates:
          - limit: 50000
            window: 24h
          when:
          - predicate: request.path == "/v1/chat/completions"
          - predicate: 'auth.identity.groups.split(",").exists(g, g == "free")'
          counters:
          - expression: auth.identity.userid
        gold:
          rates:
          - limit: 200000
            window: 24h
          when:
          - predicate: request.path == "/v1/chat/completions"
          - predicate: 'auth.identity.groups.split(",").exists(g, g == "gold")'
          counters:
          - expression: auth.identity.userid

Model-Specific Limiting
    apiVersion: kuadrant.io/v1alpha1
    kind: TokenRateLimitPolicy
    metadata:
      name: model-limits
      namespace: gateway-system
    spec:
      targetRef:
        group: gateway.networking.k8s.io
        kind: HTTPRoute
        name: ai-api
      limits:
        gpt-4:
          rates:
          - limit: 100000
            window: 24h
          when:
          - predicate: request.path == "/v1/chat/completions"
          - predicate: 'requestBodyJSON("model") == "gpt-4"'
          counters:
          - expression: auth.identity.userid
        gpt-3:
          rates:
          - limit: 500000
            window: 24h
          when:
          - predicate: request.path == "/v1/chat/completions"
          - predicate: 'requestBodyJSON("model") == "gpt-3.5-turbo"'
          counters:
          - expression: auth.identity.userid

Gateway Overrides
    apiVersion: kuadrant.io/v1alpha1
    kind: TokenRateLimitPolicy
    metadata:
      name: org-wide-limits
      namespace: gateway-system
    spec:
      targetRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: api-gateway
      overrides:
        strategy: atomic
        limits:
          org-quota:
            rates:
            - limit: 1000000
              window: 24h
            counters:
            - expression: auth.identity.org_id

See Also