Commit a53fb30

Add experiment support for Gateway API traffic router (#116)
* Add experiment support for Gateway API traffic router

  This implementation handles adding experiment services to HTTPRoute when an experiment is active and removing them when the experiment completes. Addresses issue #112.

  Signed-off-by: vthiruveedhi <[email protected]>

* The function now uses additionalDestinations passed from the interface instead of pulling from rollout status for better responsiveness. Addresses issue #112.

  Signed-off-by: vthiruveedhi <[email protected]>

---------

Signed-off-by: vthiruveedhi <[email protected]>
1 parent 50fbc6b commit a53fb30

9 files changed

+570
-2
lines changed

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# Argo Rollouts Gateway API Experiment Support

This feature adds support for conducting experiments with Argo Rollouts using the Kubernetes Gateway API. Experiments allow you to test multiple versions of your application simultaneously with precise control over traffic distribution.

## Overview

When using the Gateway API traffic router with Argo Rollouts, you can now define experiments that:

- Automatically adjust traffic weights in HTTPRoutes for the additional services created for experiment variants
- Clean up experiment services when experiments complete

## How It Works

The plugin automatically:

1. Detects when an experiment is active in a rollout
2. Adjusts the stable service weight to accommodate experiment traffic
3. Adds experiment services to the HTTPRoute with appropriate weights
4. Removes experiment services when the experiment completes
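
Step 3 can be sketched in isolation. The snippet below is a minimal illustration, not the plugin's code: `Backend` and `AddExperimentBackends` are hypothetical, simplified stand-ins for the Gateway API backend reference types. It only shows how experiment services might be appended to a rule without duplicating entries that are already present, so that repeated reconciliations leave the route unchanged.

```go
package main

import "fmt"

// Backend is a hypothetical, simplified stand-in for an HTTPRoute backendRef.
type Backend struct {
	Name   string
	Weight int32
}

// AddExperimentBackends appends every experiment backend that is not already
// present in the rule. Existing entries are left untouched, so calling it
// again with the same input does not create duplicates.
func AddExperimentBackends(rule []Backend, experiment []Backend) []Backend {
	seen := make(map[string]bool, len(rule))
	for _, b := range rule {
		seen[b.Name] = true
	}
	for _, e := range experiment {
		if seen[e.Name] {
			continue
		}
		rule = append(rule, e)
		seen[e.Name] = true
	}
	return rule
}

func main() {
	rule := []Backend{
		{Name: "demo-app-stable", Weight: 80},
		{Name: "demo-app-canary", Weight: 0},
	}
	experiment := []Backend{
		{Name: "demo-app-exp-baseline", Weight: 10},
		{Name: "demo-app-exp-canary", Weight: 10},
	}

	rule = AddExperimentBackends(rule, experiment)
	rule = AddExperimentBackends(rule, experiment) // second call adds nothing
	for _, b := range rule {
		fmt.Printf("%s=%d\n", b.Name, b.Weight)
	}
}
```

The map lookup replaces the nested loop a direct scan would need; for the handful of backends a rule usually carries, either approach would do.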

## Example Usage

The included example demonstrates a rollout with an experiment step that tests:
- A baseline variant based on the stable version (10% traffic)
- A canary variant based on the new version (10% traffic)

During the experiment:
- The stable service receives 80% of traffic (reduced from 100%)
- The canary service continues to receive 0% traffic
- The experiment variants receive their specified weights (10% and 10%)

After the experiment completes, traffic distribution returns to normal with stable receiving 100% until the next step begins.
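
The split above is simple arithmetic: the stable service gets whatever the canary and the experiment variants do not claim (100 - 0 - 10 - 10 = 80). The sketch below spells that out. `StableWeight` is a hypothetical helper written for this illustration; it is not part of the plugin, and the handler in this commit does not necessarily derive the stable weight this way.

```go
package main

import (
	"errors"
	"fmt"
)

// StableWeight returns the percentage left for the stable service once the
// canary weight and all experiment variant weights have been subtracted
// from 100. It reports an error if the weights do not fit into 100%.
func StableWeight(canaryWeight int32, experimentWeights []int32) (int32, error) {
	remaining := int32(100) - canaryWeight
	for _, w := range experimentWeights {
		if w < 0 {
			return 0, errors.New("experiment weight must not be negative")
		}
		remaining -= w
	}
	if remaining < 0 {
		return 0, fmt.Errorf("weights exceed 100%% by %d", -remaining)
	}
	return remaining, nil
}

func main() {
	// Canary at 0%, baseline and canary variants at 10% each.
	stable, err := StableWeight(0, []int32{10, 10})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("stable=%d%%\n", stable) // prints "stable=80%"
}
```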

### Sample Manifest

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-app
  namespace: demo
spec:
  strategy:
    canary:
      canaryService: demo-app-canary
      stableService: demo-app-stable
      trafficRouting:
        plugins:
          argoproj-labs/gatewayAPI:
            httpRoute: demo-app-route
            namespace: demo
      steps:
        - experiment:
            duration: 5m
            templates:
              - name: experiment-baseline
                specRef: stable
                service:
                  name: demo-app-exp-baseline
                weight: 10
              - name: experiment-canary
                specRef: canary
                service:
                  name: demo-app-exp-canary
                weight: 10
        # Remaining steps...
```

## Implementation Details

The experiment handler:

1. Identifies the matching rule in the HTTPRoute for the rollout
2. Checks if an experiment is active by examining `rollout.Status.Canary.CurrentExperiment`
3. For active experiments:
   - Sets the stable service weight to 80%
   - Adds the experiment services passed in as `additionalDestinations`
4. For inactive experiments:
   - Removes any experiment services from the HTTPRoute
   - Resets the stable service weight to 100% and the canary service weight to 0%
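
The rule lookup (step 1) and the cleanup (step 4) can be illustrated without any Kubernetes dependencies. This is a minimal sketch under stated assumptions: `Backend`, `Rule`, `FindRuleIndex` and `ResetBackends` are hypothetical names using plain structs in place of the Gateway API types, and the code mirrors the described behavior rather than reproducing the handler.

```go
package main

import "fmt"

// Backend and Rule are hypothetical, simplified stand-ins for the
// HTTPRoute backendRef and rule types.
type Backend struct {
	Name   string
	Weight int32
}

type Rule struct {
	Backends []Backend
}

// FindRuleIndex returns the index of the first rule that references the
// stable or canary service, or -1 if no rule does.
func FindRuleIndex(rules []Rule, stable, canary string) int {
	for i, r := range rules {
		for _, b := range r.Backends {
			if b.Name == stable || b.Name == canary {
				return i
			}
		}
	}
	return -1
}

// ResetBackends drops every backend except stable and canary and restores
// their weights to 100 and 0 respectively.
func ResetBackends(backends []Backend, stable, canary string) []Backend {
	kept := make([]Backend, 0, 2)
	for _, b := range backends {
		switch b.Name {
		case stable:
			b.Weight = 100
			kept = append(kept, b)
		case canary:
			b.Weight = 0
			kept = append(kept, b)
		}
	}
	return kept
}

func main() {
	rules := []Rule{{Backends: []Backend{
		{Name: "demo-app-stable", Weight: 80},
		{Name: "demo-app-canary", Weight: 0},
		{Name: "demo-app-exp-baseline", Weight: 10},
		{Name: "demo-app-exp-canary", Weight: 10},
	}}}

	idx := FindRuleIndex(rules, "demo-app-stable", "demo-app-canary")
	if idx == -1 {
		fmt.Println("no matching rule")
		return
	}
	rules[idx].Backends = ResetBackends(rules[idx].Backends, "demo-app-stable", "demo-app-canary")
	fmt.Println(rules[idx].Backends)
}
```

Note that this kind of cleanup treats every backend other than stable and canary as an experiment service, so it would also remove unrelated backends that share the rule.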

## Requirements

- Kubernetes cluster with Gateway API CRDs installed
- Argo Rollouts v1.5.0 or newer
- Simple HTTP Gateway (TLS configuration optional)

## See Also

- [Argo Rollouts Documentation](https://argoproj.github.io/argo-rollouts/)
- [Gateway API Documentation](https://gateway-api.sigs.k8s.io/)
- [Experiment Documentation](https://argoproj.github.io/argo-rollouts/features/experiment/)
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: app-gateway
  namespace: demo
spec:
  gatewayClassName: gke-l7-rilb
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
kind: HTTPRoute
2+
apiVersion: gateway.networking.k8s.io/v1beta1
3+
metadata:
4+
name: demo-app-route
5+
namespace: demo
6+
labels:
7+
managed-by: external-dns
8+
spec:
9+
parentRefs:
10+
- kind: Gateway
11+
name: app-gateway
12+
namespace: demo
13+
hostnames:
14+
- "demo.example.com"
15+
rules:
16+
- backendRefs:
17+
- name: demo-app-stable
18+
namespace: demo
19+
port: 80
20+
weight: 100
21+
- name: demo-app-canary
22+
namespace: demo
23+
port: 80
24+
weight: 0
Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
apiVersion: argoproj.io/v1alpha1
2+
kind: Rollout
3+
metadata:
4+
name: demo-app
5+
namespace: demo
6+
spec:
7+
replicas: 3
8+
strategy:
9+
canary:
10+
canaryService: demo-app-canary
11+
stableService: demo-app-stable
12+
trafficRouting:
13+
plugins:
14+
argoproj-labs/gatewayAPI:
15+
httpRoute: demo-app-route
16+
namespace: demo
17+
steps:
18+
- experiment:
19+
duration: 5m
20+
templates:
21+
- name: experiment-baseline
22+
specRef: stable
23+
service:
24+
name: demo-app-exp-baseline
25+
weight: 10
26+
metadata:
27+
labels:
28+
app: demo-app
29+
- name: experiment-canary
30+
specRef: canary
31+
service:
32+
name: demo-app-exp-canary
33+
weight: 15
34+
metadata:
35+
labels:
36+
app: demo-app
37+
- pause: {} # Empty pause means indefinite - will require manual promotion
38+
- setWeight: 30
39+
- pause: { duration: 5m }
40+
- setWeight: 60
41+
- pause: { duration: 5m }
42+
- setWeight: 100
43+
- pause: { duration: 5m }
44+
revisionHistoryLimit: 2
45+
selector:
46+
matchLabels:
47+
app: demo-app
48+
template:
49+
metadata:
50+
labels:
51+
app: demo-app
52+
spec:
53+
containers:
54+
- name: demo-app
55+
image: argoproj/rollouts-demo:blue # change to green for next version
56+
ports:
57+
- name: http
58+
containerPort: 8080
59+
protocol: TCP
60+
resources:
61+
requests:
62+
memory: 64Mi
63+
cpu: 10m
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
---
2+
apiVersion: v1
3+
kind: Service
4+
metadata:
5+
name: demo-app-canary
6+
namespace: demo
7+
spec:
8+
ports:
9+
- port: 80
10+
targetPort: http
11+
protocol: TCP
12+
name: http
13+
selector:
14+
app: demo-app
15+
---
16+
apiVersion: v1
17+
kind: Service
18+
metadata:
19+
name: demo-app-stable
20+
namespace: demo
21+
spec:
22+
ports:
23+
- port: 80
24+
targetPort: http
25+
protocol: TCP
26+
name: http
27+
selector:
28+
app: demo-app

pkg/plugin/experiment.go

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
package plugin

import (
	"context"
	"fmt"

	"github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
	"github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	gatewayv1 "sigs.k8s.io/gateway-api/apis/v1"
	gatewayApiClientset "sigs.k8s.io/gateway-api/pkg/client/clientset/versioned"
)

// HandleExperiment adds the experiment services in additionalDestinations to
// the HTTPRoute rule that serves the rollout while an experiment is active,
// and removes them once the experiment is over. It only mutates httpRoute in
// memory; it does not write the route back to the cluster.
func HandleExperiment(ctx context.Context, clientset *kubernetes.Clientset, gatewayClient *gatewayApiClientset.Clientset, logger *logrus.Entry, rollout *v1alpha1.Rollout, httpRoute *gatewayv1.HTTPRoute, additionalDestinations []v1alpha1.WeightDestination) error {
	// Guard before the canary strategy is dereferenced below.
	if rollout.Spec.Strategy.Canary == nil {
		return fmt.Errorf("rollout %s does not use the canary strategy", rollout.Name)
	}
	stableService := rollout.Spec.Strategy.Canary.StableService
	canaryService := rollout.Spec.Strategy.Canary.CanaryService

	// Find the first rule that references the stable or canary service.
	ruleIdx := -1
	for i, rule := range httpRoute.Spec.Rules {
		if ruleIdx != -1 {
			break
		}
		for _, backendRef := range rule.BackendRefs {
			if string(backendRef.Name) == stableService || string(backendRef.Name) == canaryService {
				ruleIdx = i
				break
			}
		}
	}

	if ruleIdx == -1 {
		return fmt.Errorf("no matching rule found for rollout %s", rollout.Name)
	}

	isExperimentActive := rollout.Status.Canary.CurrentExperiment != ""

	// Any backend other than stable or canary is treated as an experiment service.
	hasExperimentServices := false
	for _, backendRef := range httpRoute.Spec.Rules[ruleIdx].BackendRefs {
		serviceName := string(backendRef.Name)
		if serviceName != stableService && serviceName != canaryService {
			hasExperimentServices = true
			break
		}
	}

	if isExperimentActive {
		logger.Infof("Found active experiment %s", rollout.Status.Canary.CurrentExperiment)

		if len(additionalDestinations) == 0 {
			logger.Info("No experiment services found in additionalDestinations, skipping experiment service addition")
			return nil
		}

		// 80% is the documented stable share during an experiment
		// (two variants at 10% each).
		stableWeight := int32(80)
		for i, backendRef := range httpRoute.Spec.Rules[ruleIdx].BackendRefs {
			if string(backendRef.Name) == stableService {
				httpRoute.Spec.Rules[ruleIdx].BackendRefs[i].Weight = &stableWeight
				break
			}
		}

		for _, additionalDest := range additionalDestinations {
			serviceName := additionalDest.ServiceName
			weight := additionalDest.Weight

			exists := false
			for _, backendRef := range httpRoute.Spec.Rules[ruleIdx].BackendRefs {
				if string(backendRef.Name) == serviceName {
					exists = true
					break
				}
			}

			if !exists {
				logger.Infof("Adding experiment service to HTTPRoute: %s with weight %d", serviceName, weight)

				service, err := clientset.CoreV1().Services(rollout.Namespace).Get(ctx, serviceName, metav1.GetOptions{})
				if err != nil {
					logger.Warnf("Failed to get service %s: %v", serviceName, err)
					continue
				}

				// Prefer the port named "http", then the service's first port,
				// and fall back to 8080 only if the service exposes no ports.
				port := gatewayv1.PortNumber(8080)
				namedPortFound := false
				for _, servicePort := range service.Spec.Ports {
					if servicePort.Name == "http" {
						port = gatewayv1.PortNumber(servicePort.Port)
						namedPortFound = true
						break
					}
				}
				if !namedPortFound && len(service.Spec.Ports) > 0 {
					port = gatewayv1.PortNumber(service.Spec.Ports[0].Port)
				}

				namespace := gatewayv1.Namespace(rollout.Namespace)
				httpRoute.Spec.Rules[ruleIdx].BackendRefs = append(httpRoute.Spec.Rules[ruleIdx].BackendRefs, gatewayv1.HTTPBackendRef{
					BackendRef: gatewayv1.BackendRef{
						BackendObjectReference: gatewayv1.BackendObjectReference{
							Name:      gatewayv1.ObjectName(serviceName),
							Namespace: &namespace,
							Port:      &port,
						},
						Weight: &weight,
					},
				})
			}
		}
		return nil
	}

	if !isExperimentActive && hasExperimentServices {
		logger.Info("Experiment is no longer active, removing experiment services from HTTPRoute")

		stableWeight := int32(100)
		filteredBackendRefs := []gatewayv1.HTTPBackendRef{}

		// Keep only stable (100%) and canary (0%); drop everything else.
		for _, backendRef := range httpRoute.Spec.Rules[ruleIdx].BackendRefs {
			serviceName := string(backendRef.Name)

			if serviceName == stableService {
				backendRef.Weight = &stableWeight
				filteredBackendRefs = append(filteredBackendRefs, backendRef)
			} else if serviceName == canaryService {
				zeroWeight := int32(0)
				backendRef.Weight = &zeroWeight
				filteredBackendRefs = append(filteredBackendRefs, backendRef)
			} else {
				logger.Infof("Removing experiment service from HTTPRoute: %s", serviceName)
			}
		}

		httpRoute.Spec.Rules[ruleIdx].BackendRefs = filteredBackendRefs
		logger.Info("Experiment services removed from HTTPRoute")
	}

	return nil
}
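
The port lookup in the handler follows a preference order: the service port named `http`, then the service's first port, then a default of 8080. The sketch below shows that order as a standalone function so it can be exercised without a cluster. `ServicePort` and `ResolvePort` are hypothetical names introduced for this illustration, with a plain struct standing in for the Kubernetes service port type.

```go
package main

import "fmt"

// ServicePort is a hypothetical, simplified stand-in for a Kubernetes
// service port entry.
type ServicePort struct {
	Name string
	Port int32
}

// ResolvePort returns the port whose name matches preferredName. If there is
// no such port it returns the first port, and if the service exposes no
// ports at all it returns fallback.
func ResolvePort(ports []ServicePort, preferredName string, fallback int32) int32 {
	for _, p := range ports {
		if p.Name == preferredName {
			return p.Port
		}
	}
	if len(ports) > 0 {
		return ports[0].Port
	}
	return fallback
}

func main() {
	ports := []ServicePort{
		{Name: "metrics", Port: 9090},
		{Name: "http", Port: 8080},
	}
	fmt.Println(ResolvePort(ports, "http", 8080)) // named port wins: 8080
	fmt.Println(ResolvePort(ports[:1], "http", 8080)) // first port: 9090
	fmt.Println(ResolvePort(nil, "http", 8080)) // fallback: 8080
}
```

Returning as soon as the named port is found avoids comparing the result against the default value, which would misfire when the named port happens to equal the default.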
