[pull] dev from ray-project:master #57

Closed
wants to merge 129 commits into from
129 commits
2faf3b3
[docs] Correct typos in CONTRIBUTING.md and api-server README.md (#3492)
LeoLiao123 Apr 26, 2025
9620772
[Feature] Upgrade grpc gateway version manually (#3491)
JiangJiaWei1103 Apr 26, 2025
014b1c7
[TEST] e2e test for Cluster in `resource_manager` (#3432)
machichima Apr 26, 2025
510827f
Only try once in HTTP health check commands (#3469)
epall Apr 26, 2025
022ff0d
[Prometheus] Add `kuberay_cluster_provisioned_duration_seconds` metri…
win5923 Apr 26, 2025
2ba0dd7
[Apiserver] Set the right amount of resource in e2e test (#3465)
owenowenisme Apr 26, 2025
ed7f3db
Fix upgrade gomega (#3483)
owenowenisme Apr 28, 2025
cbde878
[CI][HELM] Use chart-testing to install Helm charts (#3412)
ChenYi015 Apr 28, 2025
410e8fb
Bump github.com/Masterminds/semver/v3 in /ray-operator (#3500)
dependabot[bot] Apr 28, 2025
321f985
Fix: Helm lint and test CI failed (#3505)
ChenYi015 Apr 28, 2025
f6a401a
[Feature] Upgrade ginkgo (#3503)
LeoLiao123 Apr 29, 2025
0561ba1
[CI] Upload logs as artifacts to BuildKite (#3405)
win5923 Apr 29, 2025
16e44d3
Bump the google-golang group across 5 directories with 3 updates (#3493)
dependabot[bot] Apr 29, 2025
1be2ae0
[Feature] Upgrade net package (#3485)
400Ping Apr 29, 2025
4a4471a
Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 in /kubectl-plugin (#…
dependabot[bot] Apr 29, 2025
1a94b43
[Feature] Manually upgrade k8s package group (#3486)
LeoLiao123 Apr 30, 2025
c031ac8
feat: use specified `--ray-version` in `--image` (#3514)
davidxia Apr 30, 2025
62302d8
[SLI Metrics] kuberay_job_execution_duration_seconds (#3488)
troychiu Apr 30, 2025
2a98241
docs: update dev docs to use Golang 1.24 (#3515)
davidxia Apr 30, 2025
5b76625
[apiserversdk] implement the apiserversdk proxy (#3494)
rueian May 1, 2025
d35c919
[apiserversdk] use config.Middleware at most once (#3522)
rueian May 2, 2025
b8484af
[Fix][kubectl-plugin] Remove filepath.Clean for ray job submit workin…
MortalHappiness May 2, 2025
4b46822
[Fix]remove broken link in doc (#3519)
simo-hsieh May 2, 2025
52e330b
[Feat][kubectl-plugin] Support -v flag for kubectl ray job submit (#3…
MortalHappiness May 2, 2025
4ff8316
[Refactor] Improve developer experience of API server e2e-test (#3466)
JiangJiaWei1103 May 2, 2025
796bf06
[Apiserver] Determine the minimum resource requirements for KubeRay A…
kenchung285 May 3, 2025
c7fe15b
[Fix][Operator] Explicitly wait for pod not found for satisfying the d…
MortalHappiness May 3, 2025
2d6cdb1
[apiserver] Support setting headServiceAnnotations (#3523)
troychiu May 3, 2025
5318a73
[TEST] Unit tests for `ray_job_submission_service_server.go` (#3532)
machichima May 4, 2025
2590a0b
Bump github.com/jarcoal/httpmock from 1.2.0 to 1.4.0 in /ray-operator…
dependabot[bot] May 5, 2025
f21b999
[TEST] Improve unit test coverage for apiserver pkg/model (#3495)
JiangJiaWei1103 May 5, 2025
6cefc40
[Prometheus] Refactor `kuberay_cluster_provisioned_duration_seconds` …
win5923 May 6, 2025
2f2c1a2
[Fix] RayCluster fails to transit Status.State to Ready when numOfHos…
CheyuWu May 6, 2025
f45155b
[Feature] Add timeout for apiserver grpc server (#3427)
machichima May 6, 2025
9514884
[apiserversdk] Add query filter to proxy (#3534)
owenowenisme May 7, 2025
6cbb8e7
[Feature] Fix dependency upgrade for gomock (#3558)
400Ping May 7, 2025
abd3f87
[CI] Fix: /etc/docker/daemon.json: No such file or directory (#3565)
win5923 May 8, 2025
3875356
[Bug][kubectl-plugin] Wrong behavior for InteractiveMode RayJob with …
CheyuWu May 8, 2025
0d813b4
[Bug][CI] Multi-platform build fails with docker driver in GitHub Act…
400Ping May 9, 2025
064e0ef
Revert "[Bug][CI] Multi-platform build fails with docker driver in Gi…
kevin85421 May 9, 2025
67596d3
[CI] Fix MultiArch image push (#3575)
kevin85421 May 9, 2025
801f081
[RayCluster][Expectation] Add a test to ensure expectations work well…
kenchung285 May 9, 2025
0721d8f
[apiserver] ListAllServices with pagination (#3490)
tinaxfwu May 9, 2025
4a12d78
Add more grouping to resolve inconsistencies when bumping versions (#…
kenmcheng May 9, 2025
c13498b
[Feature] Auto detect MIG GPUs and pass them into Ray’s logical resou…
siyuanfoundation May 10, 2025
bf8a931
[refactor][operator]: make `RayStartParams` optional (#3202)
davidxia May 11, 2025
7db8f69
[Feature] Add unit test for update service request validation (#3546)
LeoLiao123 May 12, 2025
9306b50
refactor: remove unnecessary type args when type can be inferred (#3585)
davidxia May 13, 2025
7b1f69b
[apiserversdk] check service belongs to kuberay (#3563)
troychiu May 13, 2025
36267ed
Add dashboard component to master (#3566)
han-steve May 13, 2025
5e42b8e
refactor tests: use testify instead of Fatal (#3593)
davidxia May 13, 2025
16d87f1
chore operator: improve `TestIsAutoscalingEnabled` test (#3583)
davidxia May 13, 2025
5b0b9af
Remove unused icon from dashboard (#3599)
han-steve May 13, 2025
6070f60
[apiserversdk] make withFieldSelector private and consistent 'KubeRay…
rueian May 13, 2025
6c235d8
Bump @babel/runtime from 7.24.1 to 7.27.1 in /dashboard (#3591)
dependabot[bot] May 13, 2025
91245ad
Bump braces from 3.0.2 to 3.0.3 in /dashboard (#3590)
dependabot[bot] May 13, 2025
f605b6c
Bump nanoid from 3.3.7 to 3.3.11 in /dashboard (#3589)
dependabot[bot] May 13, 2025
a9c48c1
feat: add Version to AutoscalerOptions (#3578)
davidxia May 13, 2025
4c8bbce
[refactor] Combine TestCalculateMinReplicas and TestCalculateMaxRepli…
tinaxfwu May 13, 2025
b7c084a
[kubectl-plugin] Use dashboard API instead of the stdout of the ray j…
LeoLiao123 May 14, 2025
cf7bbab
[refactor] Use mutate funcs to clearly show per-test field changes (#…
LeoLiao123 May 14, 2025
8779e92
[chore] Remove misleading log (#3601)
kevin85421 May 15, 2025
27632f5
test: remove duplicate delete worker group test (#3605)
emmanuel-ferdman May 15, 2025
f7a76e8
feat plugin: support enabling autoscaler both v1 and v2 (#3459)
davidxia May 15, 2025
2409109
[Prometheus] Add kuberay_cluster_info metric (#3535)
win5923 May 16, 2025
03eb92c
[Refactor] Remove duplicate definition of `get_ray_cluster_status` (#…
LeoLiao123 May 16, 2025
2ebbb39
[docs] Fix typos (#3609)
omahs May 16, 2025
837c3fa
[SLI Metrics] Add metric kuberay_job_info (#3621)
troychiu May 17, 2025
d2d6ab1
chore CI: use Go 1.24 everywhere (#3584)
davidxia May 17, 2025
d9a1801
[SLI-Metrics] Ray service info (#3604)
owenowenisme May 18, 2025
f3ebea7
[Doc][CI] Align K8s version in Doc and CI with minimal required versi…
kenchung285 May 19, 2025
768b29e
[Test][Autoscaler] Add an E2E test for CPU tasks on GPU nodes. (#3629)
LeoLiao123 May 20, 2025
75f7f75
refactor tests: use testify instead of Fatal everywhere (#3600)
davidxia May 20, 2025
60bc89d
[CI] fix missing Go module release step (#3644)
davidxia May 20, 2025
196f789
[kubectl-plugin] Support node selectors for kubectl ray job submit (#…
CheyuWu May 20, 2025
d0683a9
Single go.mod file (#3640)
troychiu May 21, 2025
0601bfa
[SLI Metrics] Add metric kuberay_cluster_condition_provisioned (#3635)
win5923 May 21, 2025
dc689b1
doc: mention kubectl plugin in README (#3652)
davidxia May 21, 2025
2ef997b
[apiserver] Start apiserver v2 in apiserver/cmd/main.go (#3603)
troychiu May 21, 2025
a5cad39
[Test][Autoscaler] Add an E2E test for updating maxReplicas on a work…
machichima May 22, 2025
bc2e2c6
[Test][Autoscaler] Add an E2E test for not removing idle nodes requir…
rueian May 22, 2025
9f013a3
[Grafana] Allow auto-load dashboard jsons (#3643)
owenowenisme May 22, 2025
692138b
[Feature][Ray-operator] Improve RayJob validation for `shutdownAfterJ…
CheyuWu May 22, 2025
aec68d1
test: reduce requests in sample ray service yaml config (#3636)
pawelpaszki May 22, 2025
ee0d6c1
[SLI-Metrics] kuberay_service_ready (#3577)
owenowenisme May 22, 2025
3a925f3
[Hotfix] Extend Autoscaler e2e tests timeout (#3665)
kevin85421 May 22, 2025
a865920
[Test][Autoscaler] Add E2E test for ray.autoscaler.sdk.request_resour…
nadongjun May 23, 2025
bdfb012
[SLI-Metric] kuberay_service_condition_upgrade_in_progress (#3663)
owenowenisme May 23, 2025
bb5b788
[RayJob] Add `RayJobInfo` to RayJob CRD status (#3673)
kevin85421 May 24, 2025
abb0bf4
[Fix][kubectl-plugin] Remove controller-runtime logger warning in `ku…
EagleLo May 24, 2025
105e880
[Metric] kuberay_job_deployment_status (#3656)
troychiu May 24, 2025
b66763d
[RayService] don't update serveConfigV2 in current ray cluster if ray…
fscnick May 24, 2025
942e266
[Test][Autoscaler] Add an E2E test for adding a new worker group (#3680)
kenmcheng May 24, 2025
f7102b2
[Prometheus] Add serviceMonitor for KubeRay Operator (#3530)
win5923 May 24, 2025
d901fd0
[Chore] Add kubectl plugin and dashboard to components in issue templ…
MortalHappiness May 24, 2025
d125ab7
[Autoscaler] Improve `TestRayClusterAutoscalerAddNewWorkerGroup` (#3682)
kevin85421 May 25, 2025
9e8b4ce
[docs] Remove unused docs (#3683)
kevin85421 May 25, 2025
357295d
[docs] Remove unused docs (#3684)
kevin85421 May 25, 2025
955eac7
[apiserver] Make local-e2e-test hermetic (#3513)
troychiu May 25, 2025
773a475
[API Server] Add v2 related helm (#3677)
troychiu May 25, 2025
da78df4
[Feature][kubectl-plugin] Expose setting `shutdownAfterJobFinishes` a…
CheyuWu May 26, 2025
82a587d
[Test][Autoscaler] Add an E2E test for placement groups (#3687)
rueian May 26, 2025
02909a2
[CI] Fix autoscaler e2e test flakiness caused by timeout (#3668)
nadongjun May 26, 2025
846416e
[API Server] consolidate e2e test (#3674)
troychiu May 26, 2025
2285202
[ray-operator][Bug] Rayjob is Failed or Succeed, but Raycluster statu…
dushulin May 26, 2025
be22ecf
[Doc] Update README (#3695)
kevin85421 May 26, 2025
455eb3f
[kubectl-plugin] Generate `submission_id` in `job_submit.go` (#3693)
LeoLiao123 May 27, 2025
348ef38
Fix broken link in documentation (#3697)
nadongjun May 27, 2025
f6637d7
[Grafana] Add flag for enabling auto load dashboards (#3689)
owenowenisme May 28, 2025
7d0eae4
[Ray-operator] Feature flag login bash (#3679)
fscnick May 28, 2025
c88b174
[DOCS] Apiserver improve docs readability (#3564)
machichima May 28, 2025
f95ea65
[chore] Update user to kuberay instead of a contributor's name (#3706)
kevin85421 May 28, 2025
7080cc0
[refactor] Refactor enable login shell (#3704)
kevin85421 May 28, 2025
5769a65
Fix issue where unescaped semicolons caused task execution failures. …
xianlubird May 28, 2025
b8c4e5c
[Fix][Release] Fix KubeRay dashboard image build pipeline (#3702)
MortalHappiness May 29, 2025
dcb97ce
Add Grafana Dashboard for KubeRay Operator (#3676)
win5923 May 29, 2025
75a63a5
[CI] Split Autoscaler e2e tests into 2 buildkite runners (#3715)
kevin85421 May 30, 2025
11c75ea
Add kuberay operator servicemonitor (#3717)
troychiu May 30, 2025
bc61ad9
[Feat][apiserver] Support CORS config (#3711)
MortalHappiness May 30, 2025
05b77e1
Bump next from 15.2.3 to 15.2.4 in /dashboard (#3709)
dependabot[bot] May 30, 2025
23d0195
[apiserver] Use ClusterIP instead of NodePort for KubeRay API server …
machichima Jun 1, 2025
d55dfc3
[Doc] add ray cluster uv sample yaml (#3720)
fscnick Jun 1, 2025
05f0cf3
[Test][Autoscaler] deflaky unexpected dead actors in tests by higher …
rueian Jun 1, 2025
37808c7
feat: upgrade to Ray 2.46.0 (#3547)
davidxia Jun 1, 2025
3e28b56
[doc] Update API server v1 doc (#3723)
kevin85421 Jun 1, 2025
7cc3548
[Chore] Upgrade Ray to 2.46.0 follow-up (#3722)
MortalHappiness Jun 1, 2025
764971f
[Test][Autoscaler] deflaky autoscaler idle timeout e2e tests by a lon…
rueian Jun 1, 2025
848d400
[Grafana] Update Grafana dashboard (#3726)
win5923 Jun 1, 2025
adcea8e
chore: run yamlft pre-commit hook (#3729)
davidxia Jun 2, 2025
2 changes: 1 addition & 1 deletion .buildkite/setup-env.sh
@@ -32,7 +32,7 @@ apt-get update
 apt-get install -y python3.11 python3-pip

 # Install requirements
-pip install --break-system-packages ray[default]==2.41.0
+pip install --break-system-packages ray[default]==2.46.0

 # Bypass Git's ownership check due to unconventional user IDs in Docker containers
 git config --global --add safe.directory /workdir
52 changes: 41 additions & 11 deletions .buildkite/test-e2e.yml
@@ -10,11 +10,13 @@
     - bash ../.buildkite/build-start-operator.sh
     - kubectl wait --timeout=90s --for=condition=Available=true deployment kuberay-operator
     # Run e2e tests and print KubeRay operator logs if tests fail
-    - echo "--- START:Running e2e rayservice (nightly operator) tests"
+    - echo "--- START:Running e2e (nightly operator) tests"
     - if [ -n "${KUBERAY_TEST_RAY_IMAGE}"]; then echo "Using Ray Image ${KUBERAY_TEST_RAY_IMAGE}"; fi
     - set -o pipefail
-    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e 2>&1 | awk -f ../.buildkite/format.awk || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay && exit 1)
-    - echo "--- END:e2e rayservice (nightly operator) tests finished"
+    - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp
+    - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR"
+    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1)
+    - echo "--- END:e2e (nightly operator) tests finished"

 - label: 'Test E2E rayservice (nightly operator)'
   instance_size: large
@@ -31,10 +33,12 @@
     - echo "--- START:Running e2e rayservice (nightly operator) tests"
     - if [ -n "${KUBERAY_TEST_RAY_IMAGE}"]; then echo "Using Ray Image ${KUBERAY_TEST_RAY_IMAGE}"; fi
     - set -o pipefail
-    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2erayservice 2>&1 | awk -f ../.buildkite/format.awk || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay && exit 1)
+    - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp
+    - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR"
+    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2erayservice 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-rayservice-log.tar -T - && exit 1)
     - echo "--- END:e2e rayservice (nightly operator) tests finished"

-- label: 'Test Autoscaler E2E (nightly operator)'
+- label: 'Test Autoscaler E2E Part 1 (nightly operator)'
   instance_size: large
   image: golang:1.24
   commands:
@@ -46,11 +50,33 @@
     - bash ../.buildkite/build-start-operator.sh
     - kubectl wait --timeout=90s --for=condition=Available=true deployment kuberay-operator
     # Run e2e tests and print KubeRay operator logs if tests fail
-    - echo "--- START:Running Autoscaler e2e (nightly operator) tests"
+    - echo "--- START:Running Autoscaler E2E Part 1 (nightly operator) tests"
     - if [ -n "${KUBERAY_TEST_RAY_IMAGE}"]; then echo "Using Ray Image ${KUBERAY_TEST_RAY_IMAGE}"; fi
     - set -o pipefail
-    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2eautoscaler 2>&1 | awk -f ../.buildkite/format.awk || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay && exit 1)
-    - echo "--- END:Autoscaler e2e (nightly operator) tests finished"
+    - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp
+    - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR"
+    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 60m -v ./test/e2eautoscaler/raycluster_autoscaler_test.go ./test/e2eautoscaler/support.go 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-autoscaler-log.tar -T - && exit 1)
+    - echo "--- END:Autoscaler E2E Part 1 (nightly operator) tests finished"
+
+- label: 'Test Autoscaler E2E Part 2 (nightly operator)'
+  instance_size: large
+  image: golang:1.24
+  commands:
+    - source .buildkite/setup-env.sh
+    - kind create cluster --wait 900s --config ./ci/kind-config-buildkite.yml
+    - kubectl config set clusters.kind-kind.server https://docker:6443
+    # Build nightly KubeRay operator image
+    - pushd ray-operator
+    - bash ../.buildkite/build-start-operator.sh
+    - kubectl wait --timeout=90s --for=condition=Available=true deployment kuberay-operator
+    # Run e2e tests and print KubeRay operator logs if tests fail
+    - echo "--- START:Running Autoscaler E2E Part 2 (nightly operator) tests"
+    - if [ -n "${KUBERAY_TEST_RAY_IMAGE}"]; then echo "Using Ray Image ${KUBERAY_TEST_RAY_IMAGE}"; fi
+    - set -o pipefail
+    - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp
+    - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR"
+    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 60m -v ./test/e2eautoscaler/raycluster_autoscaler_part2_test.go ./test/e2eautoscaler/support.go 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-autoscaler-log.tar -T - && exit 1)
+    - echo "--- END:Autoscaler E2E Part 2 (nightly operator) tests finished"

 - label: 'Test E2E Operator Version Upgrade (v1.3.0)'
   instance_size: large
@@ -66,15 +92,17 @@
     - kubectl wait --timeout=90s --for=condition=Available=true deployment kuberay-operator
     # Run e2e tests and print KubeRay operator logs if tests fail
     - echo "--- START:Running e2e Operator upgrade (v1.2.2 to v1.3.0 operator) tests"
-    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m KUBERAY_TEST_UPGRADE_IMAGE=v1.3.0 go test -timeout 30m -v ./test/e2eupgrade | awk -f ../.buildkite/format.awk || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay && exit 1)
+    - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp
+    - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR"
+    - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m KUBERAY_TEST_UPGRADE_IMAGE=v1.3.0 go test -timeout 30m -v ./test/e2eupgrade 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-upgrade-log.tar -T - && exit 1)
     - echo "--- END:e2e Operator upgrade (v1.2.2 to v1.3.0 operator) tests finished"

 - label: 'Test Apiserver E2E (nightly operator)'
   instance_size: large
   image: golang:1.24
   commands:
     - source .buildkite/setup-env.sh
-    - kind create cluster --wait 900s --config ./ci/kind-config-buildkite-e2e-apiserver.yml
+    - kind create cluster --wait 900s --config ./ci/kind-config-buildkite.yml
     - kubectl config set clusters.kind-kind.server https://docker:6443
     # Build nightly KubeRay operator image
     - pushd ray-operator
@@ -87,5 +115,7 @@
     # Run e2e tests and print KubeRay api server logs if tests fail
     - echo "--- START:Running e2e apiserver (nightly operator) tests"
     - set -o pipefail
-    - E2E_API_SERVER_URL="http://docker:31888" go test -parallel 4 -timeout 60m -v ./test/e2e/... 2>&1 | awk -f ../.buildkite/format.awk || (kubectl logs -l app.kubernetes.io/component=kuberay-apiserver --namespace ray-system > /artifact-mount/kuberay-apiserver-logs.txt && exit 1)
+    - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp
+    - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR"
+    - E2E_API_SERVER_URL="http://docker:31888" go test -parallel 4 -timeout 60m -v ./test/e2e/... 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs -l app.kubernetes.io/component=kuberay-apiserver --namespace ray-system | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-apiserver.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-apiserver-log.tar -T - && exit 1)
     - echo "--- END:Apiserver e2e (nightly operator) tests finished"
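After this change every job in `.buildkite/test-e2e.yml` follows the same shell shape: print the Ray image if one is set, run the tests under `set -o pipefail` so a failing `go test` is not masked by the `awk`/`tee` stages after it, and on failure tar every `*.log` in `KUBERAY_TEST_OUTPUT_DIR` into `/artifact-mount` so the logs can be kept as Buildkite artifacts. The `$$` in the YAML is Buildkite's escape for a literal `$`, so `$KUBERAY_TEST_OUTPUT_DIR` is expanded by the job's shell at run time instead of being interpolated when the pipeline is uploaded. The bash sketch below reproduces that shape as standalone functions; `report_image`, `run_and_collect`, and the placeholder operator log written in place of `kubectl logs` are hypothetical, not part of the repository. It writes the image check as `[ -n "${image}" ]`, with a space before `]`. In the pipeline's form, `[ -n "${KUBERAY_TEST_RAY_IMAGE}"]`, the `]` is glued onto the argument, so `[` reports a missing `]` when an image is set, and when the variable is empty it evaluates `[ -n ]`, which is true, and prints an empty image name.

```bash
#!/usr/bin/env bash
# Sketch of the per-job shell shape in .buildkite/test-e2e.yml.
# Function names and the placeholder operator log are hypothetical.
set -o pipefail  # a failing test command now fails the whole `cmd | tee` pipeline

# Print the Ray image only when one is set (note the space before `]`).
report_image() {
  local image="$1"
  if [ -n "${image}" ]; then echo "Using Ray Image ${image}"; fi
}

# run_and_collect OUT_DIR ARCHIVE CMD [ARGS...]
# Runs CMD, tees its combined output to OUT_DIR/gotest.log and, if CMD fails,
# archives every *.log under OUT_DIR into ARCHIVE (an absolute path), then returns 1.
run_and_collect() {
  local out_dir="$1" archive="$2"
  shift 2
  mkdir -p "$out_dir"
  "$@" 2>&1 | tee "$out_dir/gotest.log" || {
    # Placeholder for `kubectl logs ... | tee $OUT_DIR/kuberay-operator.log`.
    echo "operator logs would be captured here" > "$out_dir/kuberay-operator.log"
    # Same archiving idiom as the pipeline: tar reads the file list from stdin.
    (cd "$out_dir" && find . -name "*.log" | tar -cf "$archive" -T -)
    return 1
  }
}
```

Inside a job, the e2e step would then read roughly `run_and_collect "$(pwd)/tmp" /artifact-mount/e2e-log.tar go test -timeout 30m -v ./test/e2e`, with the `awk -f ../.buildkite/format.awk` formatting stage omitted for brevity.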
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/bug-report.yml
@@ -31,6 +31,8 @@ body:
       options:
         - "ray-operator"
         - "apiserver"
+        - "kubectl-plugin"
+        - "dashboard"
         - "ci"
         - "Others"
     validations:
7 changes: 7 additions & 0 deletions .github/dependabot.yml
@@ -15,6 +15,13 @@ updates:
       kubernetes:
         patterns:
           - "k8s.io/*"
+          - "sigs.k8s.io/*"
+      google-golang:
+        patterns:
+          - "google.golang.org/*"
+      github-dependencies:
+        patterns:
+          - "github.com/*"
       all-dependencies: # for all other dependencies not listed above
         patterns:
           - "*"
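The groups above partition Go module updates by path prefix, with `all-dependencies` catching whatever the earlier groups miss, as its comment says. Assuming Dependabot assigns each dependency to the first group whose pattern matches, the routing for the groups visible in this hunk can be sketched with a bash `case` statement, which also tries its patterns in order and whose `*` matches across `/`. `dependabot_group` is a hypothetical helper, and any groups defined outside this hunk are not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a Go module path to the Dependabot group it would
# land in, assuming first-match semantics and only the groups shown in the hunk.
dependabot_group() {
  case "$1" in
    k8s.io/* | sigs.k8s.io/*) echo "kubernetes" ;;
    google.golang.org/*)      echo "google-golang" ;;
    github.com/*)             echo "github-dependencies" ;;
    *)                        echo "all-dependencies" ;;  # catch-all, tried last
  esac
}
```

For example, `dependabot_group sigs.k8s.io/controller-runtime` prints `kubernetes`, so with the new `sigs.k8s.io/*` pattern controller-runtime bumps would travel with the `k8s.io/*` packages instead of falling through to the catch-all.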
6 changes: 3 additions & 3 deletions .github/workflows/consistency-check.yaml
@@ -24,7 +24,7 @@ jobs:
         uses: actions/setup-go@v3
         with:
           # Use the same go version with build job
-          go-version: v1.22
+          go-version: v1.24

       - name: Check golang version
         working-directory: ./ray-operator
@@ -51,7 +51,7 @@ jobs:
         uses: actions/setup-go@v3
         with:
           # Use the same go version with build job
-          go-version: v1.22
+          go-version: v1.24

       - name: Check golang version
         working-directory: ./ray-operator
@@ -75,7 +75,7 @@ jobs:
         uses: actions/setup-go@v3
         with:
           # Use the same go version with build job
-          go-version: v1.22
+          go-version: v1.24

       - name: Update CRD/RBAC YAML files
         working-directory: ./ray-operator
88 changes: 0 additions & 88 deletions .github/workflows/helm-lint.yaml

This file was deleted.

136 changes: 136 additions & 0 deletions .github/workflows/helm.yaml
@@ -0,0 +1,136 @@
+name: Helm
+
+on:
+  push:
+    branches:
+      - master
+      - release-*
+  pull_request:
+    branches:
+      - master
+      - release-*
+
+env:
+  CHART_DIR: helm-chart
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.actor }}
+  cancel-in-progress: true
+
+jobs:
+  lint-test:
+    name: Lint and Test
+
+    runs-on: ubuntu-24.04
+
+    strategy:
+      matrix:
+        chart:
+          - kuberay-operator
+          - kuberay-apiserver
+          - ray-cluster
+
+    steps:
+      - name: Determine branch name
+        id: get_branch
+        run: |
+          BRANCH=""
+          if [ "${{ github.event_name }}" == "push" ]; then
+            BRANCH=${{ github.ref_name }}
+          elif [ "${{ github.event_name }}" == "pull_request" ]; then
+            BRANCH=${{ github.base_ref }}
+          fi
+          echo "BRANCH=$BRANCH" >> "$GITHUB_OUTPUT"
+
+      - name: Checkout Code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Set up Helm
+        uses: azure/[email protected]
+        with:
+          version: v3.17.3
+
+      - uses: actions/[email protected]
+        with:
+          python-version: 3.13
+
+      - name: Install Helm unittest plugin
+        run: helm plugin install https://github.com/helm-unittest/helm-unittest.git --version 0.8.1
+
+      - name: Run Helm unittest
+        run: helm unittest ${{ env.CHART_DIR }}/${{ matrix.chart }} --file "tests/**/*_test.yaml" --strict --debug
+
+      - name: Set up chart-testing
+        uses: helm/[email protected]
+
+      - name: Run chart-testing (list-changed)
+        id: list-changed
+        env:
+          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
+        run: |
+          changed=$(ct list-changed --target-branch $BRANCH --chart-dirs ${{ env.CHART_DIR }} | grep ${{ matrix.chart }} || true)
+          if [[ -n "$changed" ]]; then
+            echo "changed=true" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Run chart-testing (lint)
+        if: steps.list-changed.outputs.changed == 'true'
+        env:
+          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
+        run: |
+          # Run 'helm lint', version checking, YAML schema validation on 'Chart.yaml',
+          # YAML linting on 'Chart.yaml' and 'values.yaml', and maintainer.
+          # [Doc]: https://github.com/helm/chart-testing/blob/main/doc/ct_lint.md
+          ct lint --charts ${{ env.CHART_DIR }}/${{ matrix.chart }} --validate-maintainers=false
+
+      - name: Create kind cluster
+        if: steps.list-changed.outputs.changed == 'true'
+        uses: helm/[email protected]
+        with:
+          cluster_name: kind
+
+      - name: Build Docker image (kuberay-operator)
+        if: steps.list-changed.outputs.changed == 'true' && matrix.chart == 'kuberay-operator'
+        run: |
+          cd ray-operator && make docker-image -e IMG=kuberay/operator:local
+
+      - name: Build Docker image (kuberay-apiserver)
+        if: steps.list-changed.outputs.changed == 'true' && matrix.chart == 'kuberay-apiserver'
+        run: |
+          cd apiserver && make docker-image -e IMG=kuberay/apiserver:local
+
+      - name: Build Docker image (security-proxy)
+        if: steps.list-changed.outputs.changed == 'true' && matrix.chart == 'kuberay-apiserver'
+        run: |
+          cd experimental && make docker-image -e IMG=kuberay/security-proxy:local
+
+      - name: Load image to kind cluster (kuberay-operator)
+        if: steps.list-changed.outputs.changed == 'true' && matrix.chart == 'kuberay-operator'
+        run: |
+          kind load docker-image kuberay/operator:local
+
+      - name: Load image to kind cluster (kuberay-apiserver)
+        if: steps.list-changed.outputs.changed == 'true' && matrix.chart == 'kuberay-apiserver'
+        run: |
+          kind load docker-image kuberay/apiserver:local
+
+      - name: Load image to kind cluster (security-proxy)
+        if: steps.list-changed.outputs.changed == 'true' && matrix.chart == 'kuberay-apiserver'
+        run: |
+          kind load docker-image kuberay/security-proxy:local
+
+      - name: Install Custom Resource Definitions to kind cluster
+        if: steps.list-changed.outputs.changed == 'true' && matrix.chart == 'ray-cluster'
+        working-directory: helm-chart/kuberay-operator
+        run: kubectl create -f crds
+
+      - name: Run chart-testing
+        if: steps.list-changed.outputs.changed == 'true'
+        env:
+          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
+        run: |
+          # Run 'helm install', 'helm test', and optionally 'helm upgrade' on specified charts.
+          # [Doc]: https://github.com/helm/chart-testing/blob/main/doc/ct_install.md
+          ct install --charts ${{ env.CHART_DIR}}/${{ matrix.chart }}
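Two `run:` steps in this workflow carry small pieces of shell logic: choosing the branch that `ct list-changed --target-branch` compares against (the pushed branch on `push`, the pull request's base branch on `pull_request`), and turning the `ct list-changed` output into a `changed=true` step output only when the matrix chart is among the changed charts. The bash functions below are a hypothetical re-expression of those two steps that take the event fields as arguments and read the `ct` output from stdin, instead of using `${{ github.* }}` expressions and a live `ct` call. The `|| true` matters because `grep` exits 1 when nothing matches, and GitHub Actions runs Linux `run:` scripts with `bash -e` by default, so without it a chart with no changes would fail the step rather than simply skip the later, `if:`-guarded ones.

```bash
#!/usr/bin/env bash
# Hypothetical re-expression of two run: steps from .github/workflows/helm.yaml.

# "Determine branch name": which branch chart-testing should diff against.
# Usage: target_branch EVENT_NAME REF_NAME BASE_REF
target_branch() {
  local event_name="$1" ref_name="$2" base_ref="$3"
  local branch=""
  if [ "$event_name" = "push" ]; then
    branch="$ref_name"        # the branch that was pushed
  elif [ "$event_name" = "pull_request" ]; then
    branch="$base_ref"        # the branch the pull request targets
  fi
  echo "$branch"
}

# "list-changed": read `ct list-changed` output on stdin and print
# "changed=true" only if CHART appears in it.
# Usage: ct list-changed ... | chart_changed CHART
chart_changed() {
  local chart="$1" changed
  # grep exits 1 on no match; `|| true` keeps a `bash -e` step from aborting.
  changed=$(grep -- "$chart" || true)
  if [[ -n "$changed" ]]; then
    echo "changed=true"
  fi
}
```

Like the original `grep ${{ matrix.chart }}`, `chart_changed` matches substrings, which is safe for `kuberay-operator`, `kuberay-apiserver`, and `ray-cluster` because none of these names contains another.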