Policy Machinery for reconciliation¶

Feature Name: policy_machinery
Start Date: 2024-07-03
RFC PR: Kuadrant/architecture#95
Issue tracking: Kuadrant/architecture#29

Summary¶

Explain how Kuadrant's Policy Machinery can be used for reconciliation.

Motivation¶

The Policy Machinery project (repo, pkg.go) offers a set of types and functions for implementing Gateway API policies, namely:

highly flexible representation of topologies of targetable resources;
calculating effective policies based on custom or default merge strategies;
tooling to watch and reconcile resources based on cluster events.

These can be used to tailor the implementation of Kuadrant policies and Kuadrant instances. See the example provided.

Leveraging the Policy Machinery can be key to:

Improving flow control of concurrent reconciliation events
Simplifying the calculation of effective policies with respect to the topological routing path of requests
Correctly implementing the Defaults & Overrides merge strategy (RFC 0009)
Supporting multiple policies targeting the same resource
Supporting targeting sections of a resource (e.g. Gateway listeners, HTTPRouteRules)
Supporting multiple targetRefs in a policy
Extending policies to target other kinds of resources (e.g. GatewayClass, Service, Namespace) – future

Guide-level explanation¶

Although essentially an implementation detail of the Kuadrant Operator, leveraging the Policy Machinery may introduce the following user-perceived features:

Acknowledgement of the topological routing path of the request with respect to applicable effective policies
New form of targeting sections of a resource with a policy
Elevated meaning of uniquely identifiable concepts across policy resources (e.g. named policy rules)
Possibility of multiple policies of a kind targeting the same resource
(Window of opportunity for) introducing plural targetRefs

User-acknowledgeable reference of the topological routing path of the request¶

Anyone who specifies Kuadrant policies targeting resources at any level of the hierarchy Gateway → Listener → HTTPRoute → HTTPRouteRule¹ shall expect the effect of such policies to always be reported with respect to the lowest level of the hierarchy that the kind of policy allows targeting. E.g.:

A DNSPolicy that allows targeting a Gateway, hypothetically with or without specifying a sectionName, actually targets gateway listeners; the user wants to reason about the state of DNSPolicies regarding their effect on each listener specified in the Gateway. In a context with 2+ DNSPolicies simultaneously targeting both a Gateway and specific listeners, DNS records for some listener hostnames may have been reconciled according to the specification from one policy or another, or occasionally from no policy at all.
A RateLimitPolicy that allows targeting a Gateway, an HTTPRoute or a specific HTTPRouteRule actually targets (directly or indirectly) HTTPRouteRule objects; the user wants to reason about the state of all RateLimitPolicies with respect to each HTTPRouteRule, where some HTTPRouteRules may be protected by one RateLimitPolicy, by a combination of multiple RateLimitPolicies, or occasionally by no RateLimitPolicy at all.

In the specific case of policy kinds that allow targeting HTTPRouteRules, due to the complex network topologies supported by Gateway API, including in particular HTTPRoutes with multiple Gateway parents, the same HTTPRouteRule may or may not be affected by a policy, depending on which routing path in the network topology a request flows through. Therefore, users will ultimately reason about policies and effective policies in terms of the paths that a request can follow between, at least, the gateways and the lowest levels targeted by the policies, and possibly in terms of all possible paths between Gateways and Services.
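To illustrate why the same HTTPRouteRule can be affected on one routing path but not on another, here is a minimal sketch. All types and names in it (`Node`, `Topology`, `Paths`, `PoliciesOnPath`, the resource and policy names) are hypothetical and are not the Policy Machinery API; it only assumes a topology where one HTTPRoute has two Gateway parents and a policy is attached to just one of them.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a targetable in the topology.
type Node struct{ Kind, Name string }

// Topology is a hypothetical adjacency list mapping each parent to its children.
type Topology map[Node][]Node

// Paths returns every path from the given node down to nodes of kind leafKind.
func (t Topology) Paths(from Node, leafKind string) [][]Node {
	if from.Kind == leafKind {
		return [][]Node{{from}}
	}
	var paths [][]Node
	for _, child := range t[from] {
		for _, tail := range t.Paths(child, leafKind) {
			paths = append(paths, append([]Node{from}, tail...))
		}
	}
	return paths
}

// PoliciesOnPath collects, in path order, the policies attached to any node of the path.
func PoliciesOnPath(path []Node, attached map[Node][]string) []string {
	var names []string
	for _, n := range path {
		names = append(names, attached[n]...)
	}
	return names
}

func main() {
	gwA, gwB := Node{"Gateway", "gw-a"}, Node{"Gateway", "gw-b"}
	lA, lB := Node{"Listener", "gw-a#http"}, Node{"Listener", "gw-b#http"}
	route := Node{"HTTPRoute", "my-route"}
	rule := Node{"HTTPRouteRule", "my-route#rule-1"}
	topo := Topology{gwA: {lA}, gwB: {lB}, lA: {route}, lB: {route}, route: {rule}}

	// Only gw-a is targeted, so rule-1 is affected on one path but not on the other.
	attached := map[Node][]string{gwA: {"rlp-gw-a"}}
	for _, gw := range []Node{gwA, gwB} {
		for _, p := range topo.Paths(gw, "HTTPRouteRule") {
			fmt.Println(p, "=>", PoliciesOnPath(p, attached))
		}
	}
}
```

Reporting per path, rather than per object, is what lets a user see that requests entering through one gateway are covered while requests entering through the other are not.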

Kuadrant shall provide users with such visibility. Leveraging Policy Machinery is an implementation detail that nonetheless makes achieving this goal easier.

User-facing changes to the policy APIs¶

Targeting sections of network resource¶

Leveraging Policy Machinery may also motivate some user-facing changes. In particular, replacing AuthPolicy's and RateLimitPolicy's routeSelectors with a targetRef with optional sectionName (made possible since kubernetes-sigs/gateway-api#2895).

This change would cause a policy of AuthPolicy or RateLimitPolicy kind to always be attached to its targets entirely, i.e. without having rules that attach to some sections and other rules to other sections or no section at all. This differs from the current situation, where a policy of those kinds can be attached to an HTTPRoute while some of the policy's rules are attached more specifically to individual HTTPRouteRules only, including with multiple policy rules attached to different HTTPRouteRules of the same targeted HTTPRoute. Instead, attaching a policy must be a cohesive, unambiguous operation that occasionally requires users to specify more fine-grained policy objects to be attached only to sections of a resource.
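As a rough illustration of whole-policy attachment, the sketch below models a target reference with an optional section name. The `TargetRef` and `Policy` types, the `Covers` method and the `section` helper are simplified, hypothetical types written for this example; they are not the Gateway API or Kuadrant types, and the resource names are made up.

```go
package main

import "fmt"

// TargetRef is a hypothetical, simplified target reference with an optional section.
type TargetRef struct {
	Group, Kind, Name string
	SectionName       *string // nil means the whole resource is targeted
}

// Policy is a hypothetical policy that attaches entirely to a single target.
type Policy struct {
	Name      string
	TargetRef TargetRef
}

// Covers reports whether the policy, as a whole, applies to the given section
// of the given resource.
func (p Policy) Covers(kind, name, section string) bool {
	ref := p.TargetRef
	if ref.Kind != kind || ref.Name != name {
		return false
	}
	return ref.SectionName == nil || *ref.SectionName == section
}

// section is a small helper to take the address of a string literal.
func section(s string) *string { return &s }

func main() {
	group := "gateway.networking.k8s.io"
	whole := Policy{Name: "rlp-route", TargetRef: TargetRef{Group: group, Kind: "HTTPRoute", Name: "my-route"}}
	fine := Policy{Name: "rlp-rule-1", TargetRef: TargetRef{Group: group, Kind: "HTTPRoute", Name: "my-route", SectionName: section("rule-1")}}

	for _, p := range []Policy{whole, fine} {
		for _, s := range []string{"rule-1", "rule-2"} {
			fmt.Printf("%s covers my-route#%s: %v\n", p.Name, s, p.Covers("HTTPRoute", "my-route", s))
		}
	}
}
```

Under this model, what used to be one policy with per-rule selectors becomes two policy objects, each attached unambiguously to its own target.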

Identity of concepts across multiple policy objects¶

In some cases, splitting policy objects for the purpose of targeting sections of a network resource, without breaking the semantics of having a single set of policy objects cohesively defined, also implies that definitions of the same entity or concept within a policy (e.g. a limit definition), repeated across multiple policy objects, need a way to indicate that they refer to the same thing (e.g. the same set of counters).

This is the case, for example, of limit definitions in a RateLimitPolicy as well as cache configs in an AuthPolicy. To avoid creating multiple rate-limit counter namespaces (analogously, multiple authorization rule cache entries) for definitions that are effectively about the same entity, despite being specified in multiple policy objects, the APIs must provide a way for users to convey one or the other intent: definitions refer to the same thing versus definitions refer to different things.²
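One way to picture the two intents is a counter namespace that is derived either from the limit name alone or from the policy plus the limit name. The sketch below is purely illustrative: the `Limit` type, its `Shared` flag and the `CounterNamespace` function are assumptions made for this example, and the RFC does not prescribe this or any other mechanism.

```go
package main

import "fmt"

// Limit is a hypothetical limit definition inside a rate-limit policy.
type Limit struct {
	Name   string
	Shared bool // hypothetical flag: true if same-named limits share their counters
}

// CounterNamespace derives the namespace under which counters are kept.
// Shared limits ignore the policy name, so the same name maps to the same counters.
func CounterNamespace(policy string, l Limit) string {
	if l.Shared {
		return "shared/" + l.Name
	}
	return policy + "/" + l.Name
}

func main() {
	shared := Limit{Name: "per-user", Shared: true}
	private := Limit{Name: "per-user"}

	// Same thing: both policies count against one set of counters.
	fmt.Println(CounterNamespace("rlp-rule-1", shared), CounterNamespace("rlp-rule-2", shared))
	// Different things: each policy keeps its own counters.
	fmt.Println(CounterNamespace("rlp-rule-1", private), CounterNamespace("rlp-rule-2", private))
}
```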

Other user-facing changes¶

The following other possible (non-required) user-facing changes can be enabled by leveraging Policy Machinery at no marginal implementation cost:

Plural targetRefs.
Multiple policies of a kind targeting the same network resource/resource section ("horizontal Defaults & Overrides").
A merely aesthetic distinction between Direct and Inherited policies, implied by the merge strategies that each kind of policy implements, rather than a necessary one.
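The difference between defaults and overrides when two policies affect the same target can be sketched as a rule-level merge with opposite precedence. The `Rules` type and the `Merge`, `ApplyDefaults` and `ApplyOverrides` functions below are hypothetical and much simpler than the strategies described in RFC 0009; they only show the direction of precedence.

```go
package main

import "fmt"

// Rules is a hypothetical set of named policy rules.
type Rules map[string]string

// Merge returns a copy of base in which every rule in winner takes precedence.
func Merge(base, winner Rules) Rules {
	out := make(Rules, len(base)+len(winner))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range winner {
		out[k] = v
	}
	return out
}

// ApplyDefaults lets the more specific policy win; defaults only fill the gaps.
func ApplyDefaults(defaults, specific Rules) Rules { return Merge(defaults, specific) }

// ApplyOverrides lets the overriding policy win over the more specific one.
func ApplyOverrides(overrides, specific Rules) Rules { return Merge(specific, overrides) }

func main() {
	gateway := Rules{"global": "100/min", "per-user": "10/min"}
	route := Rules{"per-user": "50/min"}

	fmt.Println("defaults: ", ApplyDefaults(gateway, route))
	fmt.Println("overrides:", ApplyOverrides(gateway, route))
}
```

With the same two functions, "horizontal" merging of sibling policies on one target only needs an ordering rule to decide which policy plays which role.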

Reference-level explanation¶

Usage of the Policy Machinery consists of importing two packages:

machinery: provides the types and abstractions to build Gateway API topologies of targetable network resources, policies and adjacent objects;
controller: offers tools for implementing topology-based custom controllers of reconciliation logic.

From there, the following steps drive the adoption of the Policy Machinery in the Kuadrant Operator:

1. Implement the machinery.Policy interface for all kinds of policies.
   Example provided for the DNSPolicy, TLSPolicy, AuthPolicy and RateLimitPolicy kinds.
2. Define wrappers that implement the machinery.Object interface for any kind of adjacent object whose unique identifier as a node in the topology graph cannot be based on the default controller.RuntimeObject type provided (if any).
3. Implement the linking functions for all kinds of adjacent objects and corresponding parents and children in the topology graph, including types such as Kuadrant, Istio's WasmPlugin, ConfigMap, etc.
   The Kuadrant custom resources shall be the roots of a directed acyclic graph (DAG) from which an entire topology of targetable network resources, adjacent objects and policies is connected.
4. Start a controller.Controller that:
   1. Watches for all kinds of objects to be represented in the topology.
   2. Triggers a controller.Workflow on events related to any of the watched resources.
5. At every reconciliation event³:
   1. Reconcile the internal objects for setting up the environment for a Kuadrant instance (deployments, gateway controller configs, etc).
   2. For each kind of policy and applicable path in the topology graph relevant for the policy kind:
      1. Compute an effective policy and give it a unique identifier.
      2. Perform (or delegate) the policy-specific configuration (DNS, TLS, Auth, RL) of the effective policy with the policy decision/enforcement point⁴.
      3. For policy kinds enforced at request time (data-plane policies), configure the gateway to call the policy decision point (PDP) on requests that match the attributes of the path⁵, passing in the payload to the PDP the identifier of the effective policy.
   3. Update the status stanzas of all targetables in the topology whose paths were configured for an effective policy (or lack of such), with a map that allows users to inspect, for a given path, what effective policy (if any) will be enforced.
      Note: If unsuitable for the status stanza of the object, the details of the effective policies may require additional tooling to be inspected by the users and the mapping must be to the unique identifier of the effective policy.
   4. Store a DOT representation of the topology graph in a ConfigMap.
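Two of the reconciliation steps above lend themselves to a small sketch: giving each effective policy a unique identifier per path, and rendering the topology as DOT. The code below is an assumption-laden illustration, not the Policy Machinery implementation: the `Edge` type, the `EffectivePolicyID` and `ToDOT` functions, the choice of a truncated SHA-256 hash as identifier and all resource names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Edge is a hypothetical parent -> child link between two topology nodes.
type Edge struct{ From, To string }

// EffectivePolicyID derives a stable identifier from the policy kind and the path.
// Hashing is only one possible scheme, chosen here for illustration.
func EffectivePolicyID(kind string, path []string) string {
	sum := sha256.Sum256([]byte(kind + "|" + strings.Join(path, ">")))
	return hex.EncodeToString(sum[:8])
}

// ToDOT renders the edges as a DOT digraph, sorted so the output is deterministic.
func ToDOT(edges []Edge) string {
	sorted := append([]Edge(nil), edges...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].From != sorted[j].From {
			return sorted[i].From < sorted[j].From
		}
		return sorted[i].To < sorted[j].To
	})
	var b strings.Builder
	b.WriteString("digraph topology {\n")
	for _, e := range sorted {
		fmt.Fprintf(&b, "  %q -> %q;\n", e.From, e.To)
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	paths := [][]string{
		{"gw-a", "gw-a#http", "my-route", "my-route#rule-1"},
		{"gw-b", "gw-b#http", "my-route", "my-route#rule-1"},
	}
	// status maps each path to the identifier of its effective policy.
	status := map[string]string{}
	var edges []Edge
	for _, p := range paths {
		status[strings.Join(p, ">")] = EffectivePolicyID("RateLimitPolicy", p)
		for i := 0; i+1 < len(p); i++ {
			edges = append(edges, Edge{p[i], p[i+1]})
		}
	}
	fmt.Println(status)
	fmt.Print(ToDOT(edges))
}
```

The path-to-identifier map is the kind of data that could back the status stanzas, and the DOT string is what would be stored in the ConfigMap.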

Drawbacks¶

Part of the work consists of refactoring, with no value perceived by the user.

Rationale and alternatives¶

Annotations¶

Use of annotations to track back references from targeted objects to policies. This approach has been gradually deprecated in favour of an in-memory directed acyclic graph (DAG) representing the relationship between network objects and policies⁶.

Bottom-up reconciliation¶

Bottom-up reconciliation by default, focusing on the policy resources first. This approach has been gradually refactored to use mappers (event handlers) that often fan out a lower-level event into multiple top-down ones, occasionally producing repeated events.

DAG 1.0¶

Preliminary version of the topology DAG⁶ that:

is bootstrapped at every reconciliation event (though leveraging k8s.io/apimachinery's and sigs.k8s.io/controller-runtime's caches);
does not include all kinds of targetables – missing object sections in particular;
does not include internal configuration objects;
was designed for one single kind of policy in each instance of the topology.

Effective policy-less reconciliation¶

Configuration of internal resources for implementing effective policy behavior:

tailored for each specific kind of policy, without leveraging generic and reusable merge strategy functions;
not organically integrated with the topology DAG;
decoupled from the user-acknowledgeable reference of the topological routing path of the request.

Prior art¶

Envoy Gateway state-of-the-world reconciliation¶

Envoy Gateway has implemented a custom controller for Gateway API and provider-specific resources with the following characteristics similar to the Policy Machinery controller package's approach:

Based on controller-runtime
Single Reconcile function that:
1. lists all watched resources from the cluster at every reconciliation event;
2. rebuilds and updates a long-living watchable map of all the resources;
3. triggers reconciliation logic subscribed to changes to the map – goroutines decoupled from controller-runtime.
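The "rebuild the map, notify subscribers on change" pattern described above can be sketched with a mutex and buffered channels. This is a simplified illustration written for this document, not Envoy Gateway's code or the library it uses; `Snapshot`, `WatchableMap`, `Subscribe` and `Replace` are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// Snapshot is a hypothetical view of all watched resources, keyed by resource name.
type Snapshot map[string]string

// WatchableMap notifies subscribers whenever its content actually changes.
type WatchableMap struct {
	mu   sync.Mutex
	data Snapshot
	subs []chan Snapshot
}

// Subscribe returns a channel that always holds at most the latest snapshot.
func (m *WatchableMap) Subscribe() <-chan Snapshot {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch := make(chan Snapshot, 1)
	m.subs = append(m.subs, ch)
	return ch
}

// clone copies a snapshot so that callers and subscribers never share a map.
func clone(s Snapshot) Snapshot {
	out := make(Snapshot, len(s))
	for k, v := range s {
		out[k] = v
	}
	return out
}

// Replace stores the rebuilt state and reports whether anything changed.
// An unread, stale snapshot is dropped before the new one is delivered.
func (m *WatchableMap) Replace(next Snapshot) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if reflect.DeepEqual(m.data, next) {
		return false
	}
	m.data = clone(next)
	for _, ch := range m.subs {
		select {
		case <-ch: // discard the stale snapshot, if any
		default:
		}
		ch <- clone(next)
	}
	return true
}

func main() {
	m := &WatchableMap{}
	updates := m.Subscribe()

	m.Replace(Snapshot{"gateway/gw-a": "v1"})
	m.Replace(Snapshot{"gateway/gw-a": "v1"}) // unchanged: no notification
	m.Replace(Snapshot{"gateway/gw-a": "v2"}) // replaces the unread v1 snapshot

	fmt.Println(<-updates) // the subscriber only sees the latest state
}
```

Because subscribers read from their own channel, the reconciliation logic can run in goroutines that are decoupled from the function that lists and rebuilds the state.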

Unresolved questions¶

The fanout problem, especially of status reconciliation.
~~How to avoid changes performed by the reconciliation function to loop back in the form of multiple other (no-op) reconciliation events.~~ ⇨ issue state-modifying actions against the API server directly and safe-guard against create/update/delete events for states already reflected in the topology
~~How to compact multiple equifinal reconciliation events waiting in the queue into a single one, thus avoiding unnecessary loops.~~ ⇨ rely on controller-runtime event coalescing
~~What to do in case of reconciliation failures without retry~~ ⇨ always move the system to a final state, with proper status updating, and wait until state-of-the-world reconciliation kicks in again on the next event
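The two resolutions above, skipping writes for states already reflected in the topology and coalescing equivalent queued events, can be pictured with the following sketch. It imitates the ideas only; it is not controller-runtime's work queue, and `Queue`, `Add`, `Drain` and `NeedsWrite` are hypothetical names.

```go
package main

import "fmt"

// Queue is a hypothetical FIFO that coalesces duplicate keys still waiting to be processed.
type Queue struct {
	pending map[string]bool
	order   []string
}

// Add enqueues the key unless an equal key is already waiting.
func (q *Queue) Add(key string) {
	if q.pending == nil {
		q.pending = map[string]bool{}
	}
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Drain returns the waiting keys in arrival order and empties the queue.
func (q *Queue) Drain() []string {
	keys := q.order
	q.order, q.pending = nil, nil
	return keys
}

// NeedsWrite reports whether the desired state differs from the state already
// observed in the topology, so that no-op writes (and the events they cause) are avoided.
func NeedsWrite(observed map[string]string, key, desired string) bool {
	current, ok := observed[key]
	return !ok || current != desired
}

func main() {
	q := &Queue{}
	for _, k := range []string{"gw-a", "my-route", "gw-a", "gw-a"} {
		q.Add(k)
	}
	observed := map[string]string{"gw-a": "limits-v2"}
	desired := map[string]string{"gw-a": "limits-v2", "my-route": "limits-v1"}

	for _, key := range q.Drain() {
		fmt.Printf("%s: write needed = %v\n", key, NeedsWrite(observed, key, desired[key]))
	}
}
```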

Future possibilities¶

Extending policies to target other kinds of resources (e.g. GatewayClass, Service, Namespace).

¹ Gateway listeners and HTTPRouteRules can be targeted by specifying in a policy their main Gateway and HTTPRoute objects as targets, in combination with either a sectionName (supported in the targetRef field of the policy) or routeSelectors (in the policy spec proper), respectively. Not all kinds of policies support targeting all 4 kinds of targetables of the Gateway → Listener → HTTPRoute → HTTPRouteRule hierarchy; some kinds of policies may support targeting only a few of those.
² In the context of rate-limit, this problem is also referred to as the problem of the identity of a limit.
³ Specific steps can be filtered by type of event.
⁴ While the DNS operator, as well as the configuration performed by the Kuadrant Operator for a TLSPolicy, are closer to the enforcement of the specifications in the DNS and TLS policy objects, Authorino and Limitador are rather policy decision points (PDPs). Regardless, control-plane operations that configure a service based on the specification of a policy, as well as the data-plane protection services that act at request time along with the gateways, are all part of the policy enforcement.
⁵ The attributes of a path in the topology from a Gateway to an HTTPRouteRule object typically include a hostname and the set of HTTPRouteMatches specified in the HTTPRouteRule.
⁶ See kuadrant-operator#530.