Add E2E test for cluster class #2235

Merged: 1 commit, Jun 20, 2022

Contributor

@shysank shysank commented Apr 15, 2022

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
Adds an e2e test for ClusterClass. The e2e test works as follows:

  1. Create a ClusterClass: ci-default
  2. Create a cluster that uses ci-default as its topology class.

To do (1), I've created a new flavor called topology and use its generated template to create the ClusterClass. This is not ideal, but it helps to get started with the existing CAPI test framework.
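
For readers unfamiliar with the flow, step (2) can be sketched roughly as follows. This is only an illustration in Python, not the template added by this PR: the cluster name, namespace, and the worker class name "worker-class" are hypothetical, and the Kubernetes version is borrowed from the test output later in this thread. The field names under spec.topology follow the Cluster API v1beta1 Cluster type.

```python
import json


def cluster_from_class(name, namespace, cluster_class, k8s_version, workers=1):
    """Build a Cluster manifest whose topology points at an existing ClusterClass."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "topology": {
                # Name of the ClusterClass created in step (1).
                "class": cluster_class,
                "version": k8s_version,
                "controlPlane": {"replicas": 1},
                "workers": {
                    "machineDeployments": [
                        # "worker-class" is a hypothetical name; it would have to
                        # match a worker class defined in the ClusterClass.
                        {"class": "worker-class", "name": "md-0", "replicas": workers}
                    ]
                },
            }
        },
    }


# Hypothetical cluster name and namespace, for illustration only.
manifest = cluster_from_class("example-cc", "default", "ci-default", "v1.22.9")
print(json.dumps(manifest, indent=2))
```

The point of the sketch is that the Cluster carries only the class reference and a few per-cluster values; the infrastructure, control plane, and bootstrap templates all come from the ClusterClass.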

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2234

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot added the do-not-merge/work-in-progress, release-note-none, kind/cleanup, cncf-cla: yes, and size/XXL labels Apr 15, 2022
@k8s-ci-robot added the needs-rebase label Apr 15, 2022
@k8s-ci-robot removed the needs-rebase label Apr 15, 2022
@shysank
Contributor Author

shysank commented Apr 15, 2022

/test pull-cluster-api-provider-azure-e2e

@shysank changed the title from "[WIP] Add E2E test for cluster class" to "Add E2E test for cluster class" Apr 18, 2022
@k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 18, 2022
@shysank
Contributor Author

shysank commented Apr 18, 2022

/test ls

@k8s-ci-robot
Contributor

@shysank: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-apiversion-upgrade
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-ci-entrypoint
  • /test pull-cluster-api-provider-azure-conformance
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-coverage
  • /test pull-cluster-api-provider-azure-e2e-exp
  • /test pull-cluster-api-provider-azure-e2e-optional
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade
  • /test pull-cluster-api-provider-azure-upstream-windows-dockershim
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-ci-entrypoint
  • pull-cluster-api-provider-azure-coverage
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-verify

@shysank
Contributor Author

shysank commented Apr 18, 2022

/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-exp
/test pull-cluster-api-provider-azure-e2e-optional
/test pull-cluster-api-provider-azure-capi-e2e
/test pull-cluster-api-provider-azure-e2e-workload-upgrade

@shysank
Contributor Author

shysank commented Apr 18, 2022

/test pull-cluster-api-provider-azure-e2e

@shysank
Contributor Author

shysank commented Apr 27, 2022

/test pull-cluster-api-provider-azure-ci-entrypoint
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Apr 27, 2022

/test pull-cluster-api-provider-azure-e2e

Contributor

@jsturtevant jsturtevant left a comment

thanks for including windows in the initial implementation 👍

templates/test/ci/prow-topology/base.yaml (2 outdated review threads, resolved)
@shysank
Contributor Author

shysank commented Apr 29, 2022

@CecileRobertMichon @mboersma ptal, whenever you get a chance.

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

machineHealthCheck:
  maxUnhealthy: 100%
  unhealthyConditions:
  - type: E2ENodeUnhealthy
Contributor

Where does this value come from? I am looking into adding a health check for Windows in the other e2e test but cannot determine whether this health condition is valid. I can't find this type in the capz/capi repositories, and the docs give examples using Ready on the machines: https://cluster-api.sigs.k8s.io/tasks/healthcheck.html#creating-a-machinehealthcheck

Using Ready on the machines makes sense, but I wasn't sure if I was missing something.
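
To make the question concrete, here is a simplified Python model of how an unhealthyConditions entry might be matched against a node's conditions. It is a sketch under stated assumptions, not the Cluster API controller code: the quoted snippet shows only the condition type, so the status and timeout values below are hypothetical, and real health checking involves more than this (for example startup timeouts and the maxUnhealthy budget).

```python
from datetime import datetime, timedelta, timezone


def is_unhealthy(node_conditions, unhealthy_conditions, now):
    """Return True if any rule matches a node condition that has lasted past its timeout."""
    for rule in unhealthy_conditions:
        for cond in node_conditions:
            # A rule only applies to a condition with the same type and status.
            if cond["type"] != rule["type"] or cond["status"] != rule["status"]:
                continue
            if now - cond["lastTransitionTime"] >= rule["timeout"]:
                return True
    return False


now = datetime(2022, 5, 1, 12, 0, tzinfo=timezone.utc)

# A node that has reported Ready=False for ten minutes.
node = [{"type": "Ready", "status": "False",
         "lastTransitionTime": now - timedelta(minutes=10)}]

# Hypothetical rules: status and timeout are assumptions, not taken from the template.
ready_rule = [{"type": "Ready", "status": "False", "timeout": timedelta(minutes=5)}]
e2e_rule = [{"type": "E2ENodeUnhealthy", "status": "True", "timeout": timedelta(seconds=30)}]

print(is_unhealthy(node, ready_rule, now))  # True
print(is_unhealthy(node, e2e_rule, now))    # False
```

Under this model, a rule keyed on a condition type that no node ever reports never matches, while a rule on Ready does. Whether that matches the intent of E2ENodeUnhealthy in the existing templates is exactly what the question above is asking.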

Contributor Author

I took it from the existing templates, but I'm not sure how it works.

test/e2e/azure_test.go (outdated review thread, resolved)
@shysank
Contributor Author

shysank commented May 26, 2022

/test pull-cluster-api-provider-azure-e2e-optional

@shysank
Contributor Author

shysank commented May 26, 2022

/test pull-cluster-api-provider-azure-e2e

@shysank
Contributor Author

shysank commented May 26, 2022

@CecileRobertMichon I think I've addressed all the comments. PTAL, whenever you get a chance.

identityRef:
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  kind: AzureClusterIdentity
location: replace_me
Contributor

just curious, is "replace_me" an arbitrary string or is it some agreed upon contract/pattern for required variables in clusterclass?

Member

@sbueringer sbueringer May 27, 2022

It's unfortunately just a workaround for fields that have to be set for the validating webhook and are then later overwritten per cluster.

I think this workaround kind of becomes a pattern right now, but there is no standardization (yet).

Not sure how we can improve this, maybe it's an option that these fields should be optional when the resource is referenced in a ClusterClass. But not sure how feasible this is.

(as far as I'm aware CAPA is using a similar but not the same string)
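
The workaround described above can be illustrated with a small Python sketch. It is not how the topology controller is implemented: it only shows a template carrying the placeholder so that validation of required fields passes, and a JSON-patch-style replace overwriting it for one cluster. The patch path and the value "westus2" are assumptions made for the example.

```python
import copy

PLACEHOLDER = "replace_me"


def apply_replace(doc, path, value):
    """Apply a single JSON-patch-style 'replace' at a slash-separated path, returning a copy."""
    patched = copy.deepcopy(doc)
    keys = [k for k in path.split("/") if k]
    target = patched
    for key in keys[:-1]:
        target = target[key]
    if keys[-1] not in target:
        # 'replace' requires the field to exist already.
        raise KeyError(path)
    target[keys[-1]] = value
    return patched


def find_placeholders(doc, prefix=""):
    """List the paths of all fields that still hold the placeholder value."""
    found = []
    for key, value in doc.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            found.extend(find_placeholders(value, path))
        elif value == PLACEHOLDER:
            found.append(path)
    return found


# Trimmed-down template spec mirroring the snippet under review.
template = {"spec": {"template": {"spec": {
    "identityRef": {
        "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
        "kind": "AzureClusterIdentity",
    },
    "location": PLACEHOLDER,
}}}}

# "westus2" is a hypothetical per-cluster value.
cluster_copy = apply_replace(template, "/spec/template/spec/location", "westus2")

print(find_placeholders(template))      # ['/spec/template/spec/location']
print(find_placeholders(cluster_copy))  # []
```

A check like find_placeholders could in principle catch a cluster where a placeholder was never overwritten, but nothing in this thread says such a check exists today.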

Contributor Author

CAPA is using REPLACEME. I don't particularly have a preference, but I vaguely remember replace_me being used in other places where a similar mechanism was involved.

Contributor

I'm inferring from reading this history that empty string val doesn't suffice?

Member

@sbueringer sbueringer Jun 20, 2022

Afaik the validation webhook will complain

Contributor

Is there a capi issue tracking how we might improve this? IMO this is a fairly significant bit of UX friction for users to deal with, we should do this work on behalf of them (happy to help).

Everything else here actually looks fine to me and test signal is green, I'm going to mark this as /lgtm.

Thanks so much @shysank, really happy to have ClusterClass coverage in capz!

Member

I'm not sure. We probably have some guidance around cluster-specific fields and that they shouldn't be added to cluster templates (which was in general nicely done by CAPZ). But I'm not sure if that falls under the "cluster-specific" field category.

I think it makes sense to open an issue in CAPI and see what we have / need.

We can discuss documentation improvements or potential implementation changes to mitigate this issue (e.g. maybe some fields shouldn't be mandatory on a cluster template / machine template when it's used as template in a ClusterClass).

@CecileRobertMichon
Contributor

let's squash commits?

@shysank force-pushed the cc_templates branch 2 times, most recently from 1766e20 to 238fcf5, May 27, 2022 15:55
@shysank
Contributor Author

shysank commented May 27, 2022

/test pull-cluster-api-provider-azure-ci-entrypoint

@shysank
Contributor Author

shysank commented May 27, 2022

/test pull-cluster-api-provider-azure-e2e-optional

@k8s-ci-robot added the needs-rebase label Jun 7, 2022
@CecileRobertMichon
Contributor

@shysank will you be able to rebase this or would you like someone else to take over?

@jackfrancis
Contributor

/assign

Happy to inherit this by default if @shysank no longer has cycles

@sbueringer
Member

@shysank will you be able to rebase this or would you like someone else to take over?

cc @sonasingh46 (just fyi)

@k8s-ci-robot removed the needs-rebase label Jun 9, 2022
@shysank
Contributor Author

shysank commented Jun 9, 2022

@CecileRobertMichon I've rebased it with main.
@jackfrancis @sonasingh46 feel free to take over. I've sent you an invite to access my fork, if that helps.

@CecileRobertMichon
Contributor

/milestone v1.4

@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone Jun 9, 2022
@jackfrancis
Contributor

jackfrancis commented Jun 20, 2022

Got a passing test locally after rebasing this PR.

$ GINKGO_FOCUS="Creating clusters using clusterclass" GINKGO_SKIP="" SKIP_CLEANUP="true" ./scripts/ci-e2e.sh
...
[2] PASS
[1] clusterclass.cluster.x-k8s.io/ci-default created
[1] kubeadmcontrolplanetemplate.controlplane.cluster.x-k8s.io/ci-default-kubeadm-control-plane created
[1] azureclustertemplate.infrastructure.cluster.x-k8s.io/ci-default-azure-cluster created
[1] azuremachinetemplate.infrastructure.cluster.x-k8s.io/ci-default-control-plane created
[1] kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/ci-default-worker created
[1] azuremachinetemplate.infrastructure.cluster.x-k8s.io/ci-default-worker created
[1] kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/ci-default-worker-win created
[1] azuremachinetemplate.infrastructure.cluster.x-k8s.io/ci-default-worker-win created
[1] azureclusteridentity.infrastructure.cluster.x-k8s.io/cluster-identity created
[1] cluster.cluster.x-k8s.io/capz-e2e-vbsu0e-cc created
[1] clusterresourceset.addons.cluster.x-k8s.io/capz-e2e-vbsu0e-cc-calico created
[1] clusterresourceset.addons.cluster.x-k8s.io/csi-proxy created
[1] configmap/cni-capz-e2e-vbsu0e-cc-calico created
[1] configmap/csi-proxy-addon created
[1] 
[1] INFO: Waiting for the cluster infrastructure to be provisioned
[1] STEP: Waiting for cluster to enter the provisioned phase
[1] INFO: Waiting for control plane to be initialized
[1] INFO: Waiting for the first control plane machine managed by capz-e2e-vbsu0e/capz-e2e-vbsu0e-cc-9qc88 to be provisioned
[1] STEP: Waiting for one control plane node to exist
[1] INFO: Waiting for control plane to be ready
[1] INFO: Waiting for control plane capz-e2e-vbsu0e/capz-e2e-vbsu0e-cc-9qc88 to be ready (implies underlying nodes to be ready as well)
[1] STEP: Waiting for the control plane to be ready
[1] INFO: Waiting for the machine deployments to be provisioned
[1] STEP: Waiting for the workload nodes to exist
[1] STEP: Waiting for the workload nodes to exist
[1] INFO: Waiting for the machine pools to be provisioned
...
[1] PASS

Ginkgo ran 1 suite in 21m48.582663228s
Test Suite Passed
...
$ k get clusterclasses -A
NAMESPACE         NAME         AGE
capz-e2e-vbsu0e   ci-default   25m
$ k get nodes -o wide
NAME                                           STATUS   ROLES                  AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION      CONTAINER-RUNTIME
capz-e2e-ggf2v                                 Ready    <none>                 6m17s   v1.22.9   10.1.0.5      <none>        Windows Server 2019 Datacenter   10.0.17763.2803     containerd://1.6.1
capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   Ready    control-plane,master   9m14s   v1.22.9   10.0.0.4      <none>        Ubuntu 20.04.4 LTS               5.13.0-1022-azure   containerd://1.6.1
capz-e2e-vbsu0e-cc-md-0-infra-nsdgd-gj7c2      Ready    <none>                 7m40s   v1.22.9   10.1.0.4      <none>        Ubuntu 20.04.4 LTS               5.13.0-1022-azure   containerd://1.6.1
$ k get pods -A -o wide
NAMESPACE     NAME                                                                   READY   STATUS    RESTARTS      AGE   IP                NODE                                           NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-969cf87c4-d66hn                                1/1     Running   0             19m   192.168.149.130   capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   calico-node-nw9vg                                                      1/1     Running   0             17m   10.1.0.4          capz-e2e-vbsu0e-cc-md-0-infra-nsdgd-gj7c2      <none>           <none>
kube-system   calico-node-ts25d                                                      1/1     Running   0             19m   10.0.0.4          capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   calico-node-windows-gppfw                                              2/2     Running   1 (14m ago)   16m   10.1.0.5          capz-e2e-ggf2v                                 <none>           <none>
kube-system   coredns-78fcd69978-77tln                                               1/1     Running   0             19m   192.168.149.129   capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   coredns-78fcd69978-qj69s                                               1/1     Running   0             19m   192.168.149.131   capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   csi-proxy-686k5                                                        1/1     Running   0             15m   10.1.0.5          capz-e2e-ggf2v                                 <none>           <none>
kube-system   etcd-capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7                      1/1     Running   0             19m   10.0.0.4          capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   kube-apiserver-capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7            1/1     Running   0             19m   10.0.0.4          capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   kube-controller-manager-capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   1/1     Running   0             19m   10.0.0.4          capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   kube-proxy-jdc56                                                       1/1     Running   0             19m   10.0.0.4          capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>
kube-system   kube-proxy-lq2mf                                                       1/1     Running   0             17m   10.1.0.4          capz-e2e-vbsu0e-cc-md-0-infra-nsdgd-gj7c2      <none>           <none>
kube-system   kube-proxy-windows-kpqch                                               1/1     Running   0             16m   10.1.0.5          capz-e2e-ggf2v                                 <none>           <none>
kube-system   kube-scheduler-capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7            1/1     Running   0             19m   10.0.0.4          capz-e2e-vbsu0e-cc-control-plane-gngsg-67cd7   <none>           <none>

I think I see some things that can be cleaned up, will do so now. But this looks generally good to go.

cc @CecileRobertMichon

@jackfrancis
Contributor

/test pull-cluster-api-provider-azure-e2e-optional

Contributor

@jackfrancis jackfrancis left a comment

/lgtm

@k8s-ci-robot added the lgtm label Jun 20, 2022
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Thanks for validating @jackfrancis

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

@k8s-ci-robot added the approved label Jun 20, 2022
@k8s-ci-robot k8s-ci-robot merged commit 54749d7 into kubernetes-sigs:main Jun 20, 2022
Labels: approved, cncf-cla: yes, kind/cleanup, lgtm, release-note-none, size/XXL
Linked issue that merging this pull request may close: Migrate default template
7 participants