Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Remove the requirement for VAP #3908

Merged

Conversation

varshaprasad96
Copy link
Member

@varshaprasad96 varshaprasad96 commented Dec 23, 2024

What type of PR is this?

Fixes: #3496

What this PR does / why we need it:

VAP is a default admission plugin enabled while starting an API server for visibility (ref). The Kueue
controller has additional permissions to watch those GVKs even though it is not required. Disabling the plugin from api server helps in keeping it minimal and maintaining compatibility with previous versions of K8s.

For details:
By default, the admission server enables the default set of admission plugins while starting an API server in K8s: https://github.com/kubernetes/kubernetes/blob/e85c72d4177fba224cb1baa1b5abfb5980e6d867/pkg/kubeapiserver/options/admission.go#L59. However, all the plugins are not necessary, as the visibility server is publishing metrics though read-only APIs, rather than modifying/validating the request.

Does this PR introduce a user-facing change?

Disable the unnecessary Validating Admission Policy for the visibility server, and drop the associated RBAC permissions to make the server minimal. This also prevents periodic error logging on clusters above Kubernetes 1.29+.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 23, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 23, 2024
Copy link

netlify bot commented Dec 23, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 81ec4ac
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/676bf3867239050008877bf9

@@ -77,7 +78,7 @@ func applyVisibilityServerOptions(config *genericapiserver.RecommendedConfig) er
o.SecureServing.BindPort = 8082
// The directory where TLS certs will be created
o.SecureServing.ServerCert.CertDirectory = "/tmp"

o.Admission.DisablePlugins = []string{validatingadmissionpolicy.PluginName}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the visibility server can be minimal and any other plugins which are not required can be disabled. I believe the only plugin we would need would be NamespaceLifecycle, ServiceAccount, ResourceQuota.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is interesting. What are your motivations for trying to get rid of only the VAP plugin?
What if when we enable only NamespaceLifecycle, ServiceAccount, and RersourceQuota?

Additionally, why do we need ResourceQuota plugin? IIUC, the visibility server doesn't touch ResourceQuta everywhere.

Copy link
Member Author

@varshaprasad96 varshaprasad96 Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your motivations for trying to get rid of only the VAP plugin?

There are 2 benefits to this:

  1. Latest releases of Kueue would become compatible with previous versions of K8s - (like 1.29) where VAP is not enabled by default in the admission plugin.
  2. Currently Kueue has CRUD perms for VAP API which is not required by the controller at all. So this would reduce unnecessary privileges being given to the controller.

Additionally, why do we need ResourceQuota plugin? IIUC, the visibility server doesn't touch ResourceQuta everywhere.

Fair. Ideally it would be good to spawn the visibility server with just what's required rather than enabling all the plugins by default. If ResourceQuota isn't required we can remove it from the list too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! Makes sense to me to enable just the features we are using.

Copy link
Member

@tenzen-y tenzen-y Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Ideally it would be good to spawn the visibility server with just what's required rather than enabling all the plugins by default. If ResourceQuota isn't required we can remove it from the list too.

@varshaprasad96 That makes sense. Could you open an issue for that?

@@ -38,8 +38,6 @@ const (
// +kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch;update
// +kubebuilder:rbac:groups="admissionregistration.k8s.io",resources=mutatingwebhookconfigurations,verbs=get;list;watch;update
// +kubebuilder:rbac:groups="admissionregistration.k8s.io",resources=validatingwebhookconfigurations,verbs=get;list;watch;update
// +kubebuilder:rbac:groups="admissionregistration.k8s.io",resources=validatingadmissionpolicies,verbs=get;list;watch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why these roles were put in cert rotations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added to give Kueue controller permissions when errors were coming with k8s 1.31. Turns out the root cause for it is from the API server rather than cert controller (ref).

@kannon92
Copy link
Contributor

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 23, 2024
@varshaprasad96 varshaprasad96 force-pushed the visibility-server-minimal branch from aff35fd to 26a6b55 Compare December 24, 2024 13:15
VAP is a default admission plugin enabled while
starting an API server for visibility. The Kueue
controller has additional permissions to watch those
GVKs even though it is not required. Disabling the plugin
from api server helps in keeping it minimal and maintaining
compatibility with previous versions of K8s.

Signed-off-by: Varsha Prasad Narsing <[email protected]>
@dgrove-oss
Copy link
Contributor

This should fix #3496

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

/lgtm
/approve
/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Dec 31, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 31, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 3093dbda981fd855113fcaaa556f831f49e7c53e

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y, varshaprasad96

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 31, 2024
@tenzen-y
Copy link
Member

@mimowo We might be able to consider this PR as a bug fix and cherry-pick this to the release-0.10 branch since this PR tries to remove the unnecessary permissions.

WDYT?

@k8s-ci-robot k8s-ci-robot merged commit 869d003 into kubernetes-sigs:main Dec 31, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.11 milestone Dec 31, 2024
@varshaprasad96
Copy link
Member Author

@tenzen-y could we back port this to v0.10.x and v0.9.x too? Both the versions would have this bug.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 3, 2025

@tenzen-y could we back port this to v0.10.x and v0.9.x too? Both the versions would have this bug.

Before we backport this to older versions, as I mentioned #3908 (comment), let me have an agreement with Michal.

@mimowo
Copy link
Contributor

mimowo commented Jan 7, 2025

Before we backport this to older versions, as I mentioned #3908 (comment), let me have an agreement with Michal.

Thank you for investigating the issue, I'm good to cherry-pick this back to 0.9 and 0.10 as a bugfix.

@mimowo
Copy link
Contributor

mimowo commented Jan 7, 2025

/kind bug

IIUC it also fixes #3496.

@varshaprasad96 please add a release note, and add a note "Fixes #3496" to indicate that. As for the release note maybe:
`Disable the unnecessary Validating Admission Policy for the visibility server, and drop the associated RBAC permissions to
make the server minimal. This also prevents periodic error logging on Kubernetes 1.29.".

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 7, 2025
@kannon92
Copy link
Contributor

kannon92 commented Jan 7, 2025

/cherry-pick release-0.10

@k8s-infra-cherrypick-robot
Copy link
Contributor

@kannon92: new pull request created: #3940

In response to this:

/cherry-pick release-0.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mimowo
Copy link
Contributor

mimowo commented Jan 8, 2025

@kannon92 any particular reason to close the cherry-pick PR?

@varshaprasad96
Copy link
Member Author

@mimowo Done, thank you!

@tenzen-y
Copy link
Member

tenzen-y commented Jan 8, 2025

/cherry-pick release-0.9
/cherry-pick release-0.10

@k8s-infra-cherrypick-robot
Copy link
Contributor

@tenzen-y: new pull request created: #3946

In response to this:

/cherry-pick release-0.9
/cherry-pick release-0.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot
Copy link
Contributor

@tenzen-y: #3908 failed to apply on top of branch "release-0.9":

Applying: Remove the requirement for VAP
Using index info to reconstruct a base tree...
M	charts/kueue/templates/rbac/role.yaml
M	config/components/rbac/role.yaml
M	pkg/visibility/server.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/visibility/server.go
Auto-merging config/components/rbac/role.yaml
CONFLICT (content): Merge conflict in config/components/rbac/role.yaml
Auto-merging charts/kueue/templates/rbac/role.yaml
CONFLICT (content): Merge conflict in charts/kueue/templates/rbac/role.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Remove the requirement for VAP

In response to this:

/cherry-pick release-0.9
/cherry-pick release-0.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 8, 2025

@varshaprasad96 Could you manually create a cherry pick PR against release-0.9 branch?

ChristianZaccaria pushed a commit to opendatahub-io/kueue that referenced this pull request Jan 10, 2025
VAP is a default admission plugin enabled while
starting an API server for visibility. The Kueue
controller has additional permissions to watch those
GVKs even though it is not required. Disabling the plugin
from api server helps in keeping it minimal and maintaining
compatibility with previous versions of K8s.

Signed-off-by: Varsha Prasad Narsing <[email protected]>
ChristianZaccaria pushed a commit to opendatahub-io/kueue that referenced this pull request Jan 13, 2025
VAP is a default admission plugin enabled while
starting an API server for visibility. The Kueue
controller has additional permissions to watch those
GVKs even though it is not required. Disabling the plugin
from api server helps in keeping it minimal and maintaining
compatibility with previous versions of K8s.

Signed-off-by: Varsha Prasad Narsing <[email protected]>
ChristianZaccaria pushed a commit to opendatahub-io/kueue that referenced this pull request Jan 13, 2025
VAP is a default admission plugin enabled while
starting an API server for visibility. The Kueue
controller has additional permissions to watch those
GVKs even though it is not required. Disabling the plugin
from api server helps in keeping it minimal and maintaining
compatibility with previous versions of K8s.

Signed-off-by: Varsha Prasad Narsing <[email protected]>
@mimowo
Copy link
Contributor

mimowo commented Jan 15, 2025

@varshaprasad96 @ChristianZaccaria I see some commits on your branch, are there some unforeseen complications for cherry-picking to 0.9? Asking cause I would like to release 0.10.1 along with 0.9.2 this week.

ChristianZaccaria pushed a commit to opendatahub-io/kueue that referenced this pull request Jan 15, 2025
VAP is a default admission plugin enabled while
starting an API server for visibility. The Kueue
controller has additional permissions to watch those
GVKs even though it is not required. Disabling the plugin
from api server helps in keeping it minimal and maintaining
compatibility with previous versions of K8s.

Signed-off-by: Varsha Prasad Narsing <[email protected]>
@varshaprasad96
Copy link
Member Author

@tenzen-y Sorry, missed your message last week. I can create a cherry-pick PR now.

@mimowo Are you referring to these? Backporting to v0.9.x shouldn't be an issue afaik. We cherry-picked this commit and back ported it to 0.10.x in downstream because of internal release deadlines and were planning to use 0.10 directly instead of 0.9.

@mimowo
Copy link
Contributor

mimowo commented Jan 15, 2025

Thank you! Cool, yeah I was referring to the commits as they are referenced by GH, wasn't sure why they are displayed.

@varshaprasad96
Copy link
Member Author

#3977 -> cherry pick of this fix in 0.9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repeated errors in kueue-manager log when Kueue 0.9 is installed on Kubernetes 1.29
7 participants