nftables kube-proxy blog post for 1.33 #49393
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files.
in Kubernetes 1.29. Currently in beta, it will officially be GA as of
1.33. The new mode fixes long-standing performance problems with the
iptables mode and is now recommended for most users running on systems
with reasonably recent kernels. (For compatibility reasons, even once
Can we offer a hint of what "reasonably recent" means? 5.x?
Even with those optimizations, it can still be necessary to make use of
kube-proxy's `minSyncPeriod` config option to ensure that it doesn't
spend every waking second trying to push iptables updates.
People not familiar with this option may not understand what this means. Maybe one sentence introducing the concept of `minSyncPeriod` would help, e.g. "kube-proxy waits at least `minSyncPeriod` between successive resyncs of the iptables rules; even with those optimizations, you may need to increase `minSyncPeriod` ..."
The nftables APIs allow for doing much more incremental updates, and
when kube-proxy in nftables mode does an update, the size of the
update is only **O(n)** in the number of _changed_ services and
Maybe a parenthetical or something to clarify that this one is about the number of _changed_ services and the other is about the total number of services? _changed_ vs. total.
kube-proxy is using iptables or nftables.)
Second, the nftables mode will not work on older Linux distributions;
currently it requires a 5.13 or newer kernel. Additionally, because of
I see, you have it here: https://github.com/kubernetes/website/pull/49393/files#r1913406303
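A minimal sketch of comparing a kernel release string (as printed by `uname -r`, e.g. `5.15.0-91-generic`) against the 5.13 minimum mentioned above. The helper `kernelAtLeast` is hypothetical and is not necessarily how kube-proxy itself checks for nftables support; distributions often backport features, so a version comparison is only a rough heuristic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast is a hypothetical helper reporting whether a kernel release
// string such as "5.15.0-91-generic" is at least major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	// Only the leading "X.Y" matters; the rest ("0-91-generic") is ignored.
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// Without a patch level the minor field may carry a suffix ("6.8-rc1"),
	// so keep only its leading digits.
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	gotMinor, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return gotMajor > major || (gotMajor == major && gotMinor >= minor), nil
}

func main() {
	for _, rel := range []string{"5.15.0-91-generic", "5.4.0-150-generic", "6.8-rc1"} {
		ok, err := kernelAtLeast(rel, 5, 13)
		fmt.Println(rel, ok, err)
	}
}
```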
Finally, kube-proxy in nftables mode is intentionally not 100%
compatible with kube-proxy in iptables mode, because some of iptables
mode's legacy behaviors are inherently less efficient or less secure.
I would phrase it differently, as this reads negatively. You always explain this very well as: "Since iptables mode has been GA for a long time, there are certain behaviors we wanted to change but couldn't, because changing them would break compatibility. The nftables mode allowed us to make these modifications and improve those behaviors, at the cost of breaking compatibility between the different proxy modes."
it is not the _default_, and we do not yet have a plan for changing
that. We will continue to support the iptables mode for a long time.
The future of the IPVS mode of kube-proxy is less certain: its main
yes, we should start to create awareness
LGTM, only cosmetic comments.
nice summary!
Maybe that's just me, but "5000 service cluster" and "10000 service cluster" sound like they are different clusters. Do you think just "5000 services" and "10000 services" would be more precise?
the two graphs, but you may have to squint to see the nftables
results!
![kube-proxy iptables-vs-nftables first packet latency, at various percentiles, in clusters of various sizes](iptables-vs-nftables.png)
I think this link should be .svg?
result, kube-proxy's nftables updates can be done much more
efficiently than with iptables.
(Unfortunately, I don't have cool graphs for this part.)
There is one comparison I've done for programming time: https://docs.google.com/presentation/d/1WMNZApX8HDHbi7ZnUFuy86UUYov22mBj/edit#slide=id.g30c6bb99704_1_0
Not much data, but maybe still useful?
## Future Plans
As mentioned above, while nftables is now the _best_ kube-proxy mode,
I would love to fix or at least get some fresh data on kube-proxy restart time (which is where it may not be the best yet). I am going to do that before 1.33, so maybe we can update this post later with the fresh data, or ignore this part completely?
performance as IPVS mode (actually, slightly better), without any of
the downsides:
![kube-proxy ipvs-vs-nftables first packet latency, at various percentiles, in clusters of various sizes](ipvs-vs-nftables.png)
.svg too?
Blog post about kube-proxy nftables mode, to celebrate it becoming GA in 1.33.
This can be published at any point before or after the 1.33 release. (The code is entirely usable and stable in beta in 1.32 so we don't mind calling attention to it early.)
cc @aojea @npinaeva