Skip to content

RLP can target a Gateway resource

Previous version: https://hackmd.io/IKEYD6NrSzuGQG1nVhwbcw

Based on: https://hackmd.io/_1k6eLCNR2eb9RoSzOZetg

Introduction

The current RateLimitPolicy CRD already implements a targetRef with a reference to Gateway API's HTTPRoute. This doc captures the design and some implementation details of allowing the targetRef to reference a Gateway API's Gateway.

Having in place this HTTPRoute - Gateway hierarchy, we are also considering to apply Policy Attachment's defaults/overrides approach to the RateLimitPolicy CRD. But for now, it will only be about targeting the Gateway resource.

On designing Kuadrant's rate limiting and considering Istio/Envoy's rate limiting offering, we hit two limitations (described here). Therefore, not giving up entirely in existing Envoy's RateLimit Filter, we decided to move on and leverage the Envoy's Wasm Network Filter and implement rate limiting wasm-shim module compliant with the Envoy's Rate Limit Service (RLS). This wasm-shim module accepts a PluginConfig struct object as input configuration object.

Use Cases targeting a gateway

A key use case is being able to provide governance over what service providers can and cannot do when exposing a service via a shared ingress gateway. As well as providing certainty that no service is exposed without my ability as a cluster administrator to protect my infrastructure from unplanned load from badly behaving clients etc.

Goals

The goal of this document is to define:

  • The schema of this PluginConfig struct.
  • The kuadrant-operator behavior filling the PluginConfig struct having as input the RateLimitPolicy k8s objects
  • The behavior of the wasm-shim having the PluginConfig struct as input.

Envoy's Rate Limit Service Protocol

Kuadrant's rate limit relies on the Rate Limit Service (RLS) protocol, hence the gateway generates, based on a set of actions, a set of descriptors (one descriptor is a set of descriptor entries). Those descriptors are send to the external rate limit service provider. When multiple descriptors are provided, the external service provider will limit on ALL of them and return an OVER_LIMIT response if any of them are over limit.

Schema (CRD) of the RateLimitPolicy

---
apiVersion: kuadrant.io/v1
kind: RateLimitPolicy
metadata:
  name: my-rate-limit-policy
spec:
  # Reference to an existing networking resource to attach the policy to. REQUIRED.
  # It can be a Gateway API HTTPRoute or Gateway resource.
  # It can only refer to objects in the same namespace as the RateLimitPolicy.
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute / Gateway
    name: myroute / mygateway

  # The limits definitions to apply to the network traffic routed through the targeted resource.
  # Equivalent to if otherwise declared within `defaults`.
  limits:
    "my_limit":
      # The rate limits associated with this limit definition. REQUIRED.
      # E.g., to specify a 50rps rate limit, add `{ limit: 50, duration: 1, unit: secod }`
      rates: []

      # Counter qualifiers.
      # Each dynamic value in the data plane starts a separate counter, combined with each rate limit.
      # E.g., to define a separate rate limit for each user name detected by the auth layer, add `metadata.filter_metadata.envoy\.filters\.http\.ext_authz.username`.
      # Check out Kuadrant RFC 0002 (https://github.com/Kuadrant/architecture/blob/main/rfcs/0002-well-known-attributes.md) to learn more about the Well-known Attributes that can be used in this field.
      counters: []

      # Additional dynamic conditions to trigger the limit.
      # Use it for filtering attributes not supported by HTTPRouteRule or with RateLimitPolicies that target a Gateway.
      # Check out Kuadrant RFC 0002 (https://github.com/Kuadrant/architecture/blob/main/rfcs/0002-well-known-attributes.md) to learn more about the Well-known Attributes that can be used in this field.
      when: []

    # Explicit defaults. Used in policies that target a Gateway object to express default rules to be enforced on
    # routes that lack a more specific policy attached to.
    # Mutually exclusive with `overrides` and with declaring `limits` at the top-level of the spec.
    defaults:
      limits: {}

    # Overrides. Used in policies that target a Gateway object to be enforced on all routes linked to the gateway,
    # thus also overriding any more specific policy occasionally attached to any of those routes.
    # Mutually exclusive with `defaults` and with declaring `limits` at the top-level of the spec.
    overrides:
      limits: {}

.spec.rateLimits holds a list of rate limit configurations represented by the object RateLimit. Each RateLimit object represents a complete rate limit configuration. It contains three fields:

  • rules (optional): Rules allow matching hosts and/or methods and/or paths. Matching occurs when at least one rule applies against the incoming request. If rules are not set, it is equivalent to matching all the requests.

  • configurations (required): Specifies a set of rate limit configurations that could be applied. The rate limit configuration object is the equivalent of the config.route.v3.RateLimit envoy object. One configuration is, in turn, a list of rate limit actions. Each action populates a descriptor entry. A vector of descriptor entries compose a descriptor. Each configuration produces, at most, one descriptor. Depending on the incoming request, one configuration may or may not produce a rate limit descriptor. These rate limiting configuration rules provide flexibility to produce multiple descriptors. For example, you may want to define one generic rate limit descriptor and another descriptor depending on some header. If the header does not exist, the second descriptor is not generated, but traffic keeps being rate limited based on the generic descriptor.

configurations:

  - actions:
    - request_headers:
        header_name: "X-MY-CUSTOM-HEADER"
        descriptor_key: "custom-header"
        skip_if_absent: true
  - actions:
    - generic_key:
        descriptor_key: admin
        descriptor_value: "1"
  • limits (optional): configuration of the rate limiting service (Limitador). Check out limitador documentation for more information about the fields of each Limit object.

Note: No namespace/domain defined. Kuadrant operator will figure out.

Note: There is no PREAUTH, POSTAUTH stage defined. Ratelimiting filter should be placed after authorization filter to enable authenticated rate limiting. In the future, stage can be implemented.

Kuadrant-operator's behavior

One HTTPRoute can only be targeted by one rate limit policy.

Similarly, one Gateway can only be targeted by one rate limit policy.

However, indirectly, one gateway will be affected by multiple rate limit policies. It is by design of the Gateway API, one gateway can be referenced by multiple HTTPRoute objects. Furthermore, one HTTPRoute can reference multiple gateways.

The kuadrant operator will aggregate all the rate limit policies that apply for each gateway, including RLP targeting HTTPRoutes and Gateways.

"VirtualHosting" RateLimitPolicies

Rate limit policies are scoped by the domains defined at the referenced HTTPRoute's hostnames and Gateway's Listener's Hostname.

Multiple HTTPRoutes with the same hostname

When there are multiple HTTPRoutes with the same hostname, HTTPRoutes are all admitted and envoy merge the routing configuration in the same virtualhost. In these cases, the control plane has to "merge" the rate limit configuration into a single entry for the wasm filter.

Overlapping HTTPRoutes

If some RLP targets a route for *.com and other RLP targets another route for api.com, the control plane does not do any merging. A request coming for api.com will be rate limited with the rules from the RLP targeting the route api.com. Also, a request coming for other.com will be rate limited with the rules from the RLP targeting the route *.com.

examples

RLP A -> HTTPRoute A (api.toystore.com) -> Gateway G (*.com)

RLP B -> HTTPRoute B (other.toystore.com) -> Gateway G (*.com)

RLP H -> HTTPRoute H (*.toystore.com) -> Gateway G (*.com)

RLP G -> Gateway G (*.com)

Request 1 (api.toystore.com) -> apply RLP A and RLP G

Request 2 (other.toystore.com) -> apply RLP B and RLP G

Request 3 (unknown.toystore.com) -> apply RLP H and RLP G

Request 4 (other.com) -> apply RLP G

rate limit domain / limitador namespace

The kuadrant operator will add domain attribute of the Envoy's Rate Limit Service (RLS). It will also add the namespace attribute of the Limitador's rate limit config. The operator will ensure that the associated actions and rate limits have a common domain/namespace.

The value of this domain/namespace seems to be related to the virtualhost for which rate limit applies.

Schema of the WASM filter configuration object: the PluginConfig

Currently the PluginConfig looks like this:

#  The filter’s behaviour in case the rate limiting service does not respond back. When it is set to true, Envoy will not allow traffic in case of communication failure between rate limiting service and the proxy.
failure_mode_deny: true
ratelimitpolicies:
  default/toystore: # rate limit policy {NAMESPACE/NAME}
    hosts: # HTTPRoute hostnames

      - '*.toystore.com'
    rules: # route level actions
      - operations:
          - paths:
              - /admin/toy
            methods:
              - POST
              - DELETE
        actions:
          - generic_key:
              descriptor_value: yes
              descriptor_key: admin
    global_actions: # virtualHost level actions
      - generic_key:
          descriptor_value: yes
          descriptor_key: vhaction
    upstream_cluster: rate-limit-cluster # Limitador address reference
    domain: toystore-app # RLS protocol domain value

Proposed new design for the WASM filter configuration object (PluginConfig struct):

#  The filter’s behaviour in case the rate limiting service does not respond back. When it is set to true, Envoy will not allow traffic in case of communication failure between rate limiting service and the proxy.
failure_mode_deny: true
rate_limit_policies:

  - name: toystore
    rate_limit_domain: toystore-app
    upstream_cluster: rate-limit-cluster
    hostnames: ["*.toystore.com"]
    gateway_actions:
      - rules:
          - paths: ["/admin/toy"]
            methods: ["GET"]
            hosts: ["pets.toystore.com"]
        configurations:
          - actions:
            - generic_key:
                descriptor_key: admin
                descriptor_value: "1"

Update highlights:

  • [minor] rate_limit_policies is a list instead of a map indexed by the name/namespace.
  • [major] no distinction between "rules" and global actions
  • [major] more aligned with RLS: multiple descriptors structured by "rate limit configurations" with matching rules

WASM-SHIM

WASM filter rate limit policies are not exactly the same as user managed RateLimitPolicy custom resources. The WASM filter rate limit policies is part of the internal configuration and therefore not exposed to the end user.

At the WASM filter level, there are no route level or gateway level rate limit policies. The rate limit policies in the wasm plugin configuration may not map 1:1 to user managed RateLimitPolicy custom resources. WASM rate limit policies have an internal logical name and a set of hostnames to activate it based on the incoming request’s host header.

The WASM filter builds a tree based data structure holding the rate limit policies. The longest (sub)domain match is used to select the policy to be applied. Only one policy is being applied per invocation.

rate limit configurations

The WASM filter configuration object contains a list of rate limit configurations to build a list of Envoy's RLS descriptors. These configurations are defined at

rate_limit_policies[*].gateway_actions[*].configurations

For example:

configurations:

- actions:
   - generic_key:
        descriptor_key: admin
        descriptor_value: "1"

How to read the policy:

  • Each configuration produces, at most, one descriptor. Depending on the incoming request, one configuration may or may not produce a rate limit descriptor.

  • Each policy configuration has associated, optionally, a set of rules to match. Rules allow matching hosts and/or methods and/or paths. Matching occurs when at least one rule applies against the incoming request. If rules are not set, it is equivalent to matching all the requests.

  • Each configuration object defines a list of actions. Each action may (or may not) produce a descriptor entry (descriptor list item). If an action cannot append a descriptor entry, no descriptor is generated for the configuration.

Note: The external rate limit service will be called when the gateway_actions object produces at least one not empty descriptor.

example

WASM filter rate limit policy for *.toystore.com. I want some rate limit descriptors configuration only for api.toystore.com and another set of descriptors for admin.toystore.com. The wasm filter config would look like this:

failure_mode_deny: true
rate_limit_policies:

  - name: toystore
    rate_limit_domain: toystore-app
    upstream_cluster: rate-limit-cluster
    hostnames: ["*.toystore.com"]
    gateway_actions:
      - configurations:  # no rules. Applies to all *.toystore.com traffic
          - actions:
              - generic_key:
                  descriptor_key: toystore-app
                  descriptor_value: "1"
      - rules:
          - hosts: ["api.toystore.com"]
        configurations:
          - actions:
              - generic_key:
                  descriptor_key: api
                  descriptor_value: "1"
      - rules:
          - hosts: ["admin.toystore.com"]
        configurations:
          - actions:
              - generic_key:
                  descriptor_key: admin
                  descriptor_value: "1"
  • When a request for api.toystore.com hits the filter, the descriptors generated would be:

descriptor 1

("toystore-app", "1")
descriptor 2
("api", "1")

  • When a request for admin.toystore.com hits the filter, the descriptors generated would be:

descriptor 1

("toystore-app", "1")
descriptor 2
("admin", "1")

  • When a request for other.toystore.com hits the filter, the descriptors generated would be: descriptor 1
    ("toystore-app", "1")