nftables kube-proxy blog post for 1.33 #49393

Open: wants to merge 1 commit into main


danwinship (Contributor)

Blog post about kube-proxy nftables mode, to celebrate it becoming GA in 1.33.

This can be published at any point before or after the 1.33 release. (The code is entirely usable and stable in beta in 1.32 so we don't mind calling attention to it early.)

cc @aojea @npinaeva

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sftim for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the area/blog label (Issues or PRs related to the Kubernetes Blog subproject) Jan 11, 2025
@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.), language/en (Issues or PRs related to English language), and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) labels Jan 11, 2025

netlify bot commented Jan 11, 2025

Pull request preview available for checking

Built without sensitive environment variables

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 6de931a |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/6782c6a61f70ed00080b0e1c |
| 😎 Deploy Preview | https://deploy-preview-49393--kubernetes-io-main-staging.netlify.app |

in Kubernetes 1.29. Currently in beta, it will officially be GA as of
1.33. The new mode fixes long-standing performance problems with the
iptables mode and is now recommended for most users running on systems
with reasonably-recent kernels. (For compatibility reasons, even once
Member

can we offer a hint of what reasonably-recent means? 5.x?
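For context on "reasonably-recent": the draft later states that nftables mode currently requires a 5.13 or newer kernel. The sketch below is one rough way to compare a node's kernel release string against that minimum. The helper `kernel_at_least` is hypothetical. It only looks at the leading major.minor numbers, and it is not how kube-proxy itself decides whether nftables mode can run.

```python
import platform
import re

# Minimum kernel for kube-proxy's nftables mode, as stated in the draft post.
MIN_NFTABLES_KERNEL = (5, 13)


def kernel_at_least(release: str, minimum: tuple = MIN_NFTABLES_KERNEL) -> bool:
    """Return True if a `uname -r`-style release string is >= minimum.

    Hypothetical helper: only the leading "major.minor" is compared, and
    distro suffixes such as "-91-generic" are ignored. Distro kernels may
    backport features, so treat the result as a heuristic.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version >= minimum


if __name__ == "__main__":
    release = platform.release()
    try:
        print(f"{release}: meets 5.13+ requirement: {kernel_at_least(release)}")
    except ValueError as err:
        print(err)
```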

Comment on lines +116 to +118
Even with those optimizations, it can still be necessary to make use of
kube-proxy's `minSyncPeriod` config option to ensure that it doesn't
spend every waking second trying to push iptables updates.
Member

People not familiar with this option may not understand what this means. Maybe one sentence introducing the concept of minSyncPeriod would help, something like: "kube-proxy resyncs the iptables rules whenever Services or endpoints change, but never more often than the interval set by the `minSyncPeriod` configuration option. Even with those optimizations, you may need to bump `minSyncPeriod` ..."
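A simplified model of what a minimum sync interval does, assuming changes arrive at known timestamps (in seconds): a change triggers a sync, but two syncs never start closer together than the minimum period, so changes that arrive in between are batched into a single sync. The function `sync_times` and the sample timestamps are hypothetical; this is not kube-proxy's actual implementation.

```python
from typing import List, Optional


def sync_times(change_times: List[float], min_sync_period: float) -> List[float]:
    """Simulate when syncs start for a stream of Service/endpoint changes.

    Simplified model (not kube-proxy's code): a change triggers a sync,
    but two syncs never start closer together than min_sync_period.
    Every change that has arrived by the time a sync starts is handled by it.
    """
    times = sorted(change_times)
    syncs: List[float] = []
    last_sync: Optional[float] = None
    i = 0
    while i < len(times):
        start = times[i]
        if last_sync is not None:
            # Never start a sync sooner than min_sync_period after the previous one.
            start = max(start, last_sync + min_sync_period)
        syncs.append(start)
        last_sync = start
        # This sync picks up every change that arrived at or before its start time.
        while i < len(times) and times[i] <= start:
            i += 1
    return syncs


if __name__ == "__main__":
    burst = [0.0, 0.1, 0.2, 0.3, 5.0]
    print(sync_times(burst, 0.0))  # [0.0, 0.1, 0.2, 0.3, 5.0]: one sync per change
    print(sync_times(burst, 1.0))  # [0.0, 1.0, 5.0]: the burst is batched
```

In this model, a larger minimum period trades fewer syncs during a burst of changes for up to that much extra delay before a given change takes effect.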


The nftables APIs allow for doing much more incremental updates, and
when kube-proxy in nftables mode does an update, the size of the
update is only **O(n)** in the number of _changed_ services and
Member

Maybe add a parenthetical or something to clarify that this one is about the number of _changed_ services, while the other is about the total number of services (_changed_ vs. total)?
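One way to illustrate the _changed_ vs. total distinction: if the desired state is a map from each service to its endpoints, an incremental update needs only one operation per service that was added, removed, or modified, while a full rewrite emits one entry per service no matter how little changed. This is a hypothetical model (the function names and data are made up) and does not reflect the actual nftables ruleset kube-proxy generates.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical model: service name -> set of endpoint IPs.
State = Dict[str, FrozenSet[str]]
Op = Tuple[str, str]


def incremental_ops(old: State, new: State) -> List[Op]:
    """Operations needed to go from old to new: one per changed service.

    The number of operations depends only on how many services changed,
    not on how many exist. (This sketch finds the changes by scanning both
    maps; a real proxy would typically track changes as they arrive.)
    """
    ops: List[Op] = [("delete", name) for name in sorted(old.keys() - new.keys())]
    for name in sorted(new):
        if name not in old:
            ops.append(("add", name))
        elif old[name] != new[name]:
            ops.append(("update", name))
    return ops


def full_rewrite_ops(new: State) -> List[Op]:
    """For contrast: re-emit every service, whether or not it changed."""
    return [("write", name) for name in sorted(new)]


if __name__ == "__main__":
    old = {f"svc-{i}": frozenset({f"10.0.{i // 256}.{i % 256}"}) for i in range(10000)}
    new = dict(old)
    new["svc-42"] = frozenset({"10.1.0.1"})  # endpoints changed
    del new["svc-7"]                         # service removed
    new["svc-new"] = frozenset({"10.2.0.1"}) # service added
    print(incremental_ops(old, new))   # [('delete', 'svc-7'), ('update', 'svc-42'), ('add', 'svc-new')]
    print(len(full_rewrite_ops(new)))  # 10000
```

Both calls describe the same final state, but with 10,000 services and three changes, the incremental version emits three operations while the full rewrite emits 10,000.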

kube-proxy is using iptables or nftables.)

Second, the nftables mode will not work on older Linux distributions;
currently it requires a 5.13 or newer kernel. Additionally, because of
Member

Comment on lines +154 to +156
Finally, kube-proxy in nftables mode is intentionally not 100%
compatible with kube-proxy in iptables mode, because some of iptables
mode's legacy behaviors are inherently less efficient or less secure.
Member

I would phrase it differently, as this reads negatively. You always explain this very well as: "since iptables mode has been GA for a long time, there are certain behaviors we wanted to change but couldn't, because changing them would break compatibility; the nftables mode allowed us to make these modifications and improve those behaviors, at the cost of breaking compatibility between the different proxy modes."

it is not the _default_, and we do not yet have a plan for changing
that. We will continue to support the iptables mode for a long time.

The future of the IPVS mode of kube-proxy is less certain: its main
Member

yes, we should start to create awareness

@aojea (Member) commented Jan 13, 2025

LGTM, only cosmetic comments.

Member

@npinaeva left a comment

nice summary!

Member

maybe that's just me, but "5000 service cluster" and "10000 service cluster" sound like they are different clusters. Do you think just "5000 services" and "10000 services" would be more precise?

the two graphs, but you may have to squint to see the nftables
results!:

![kube-proxy iptables-vs-nftables first packet latency, at various percentiles, in clusters of various sizes](iptables-vs-nftables.png)
Member

I think this link should be .svg?

result, kube-proxy's nftables updates can be done much more
efficiently than with iptables.

(Unfortunately I don't have cool graphs for this part.)
Member

There's one comparison I did of programming time: https://docs.google.com/presentation/d/1WMNZApX8HDHbi7ZnUFuy86UUYov22mBj/edit#slide=id.g30c6bb99704_1_0
Not much data, but maybe still useful?


## Future Plans

As mentioned above, while nftables is now the _best_ kube-proxy mode,
Member

I would love to fix or at least get some fresh data on kube-proxy restart time (which is where it may not be the best yet). I am going to do that before 1.33, so maybe we can update this post later with the fresh data, or ignore this part completely?

performance as IPVS mode (actually, slightly better), without any of
the downsides:

![kube-proxy ipvs-vs-nftables first packet latency, at various percentiles, in clusters of various sizes](ipvs-vs-nftables.png)
Member

.svg too?


4 participants