CA DRA: support DaemonSet/static pods using DRA #7684
Labels: area/cluster-autoscaler, area/core-autoscaler, kind/feature, wg/device-management
Which component are you using?:
/area cluster-autoscaler
/area core-autoscaler
/wg device-management
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
To ensure correct scheduling simulations, Cluster Autoscaler has to predict the exact DaemonSet/static pods that will be present on a new Node created from a given NodeGroup. This can be done in 2 major ways:

- An existing Node from the NodeGroup, together with the DaemonSet/static pods running on it, can be duplicated and "sanitized" (identifying fields like Name, UID, and NodeName are changed) so that the result stands in for a fresh Node.
- `CloudProvider.TemplateNodeInfo()` can be called to obtain a template NodeInfo, including the expected DaemonSet/static pods, from the cloud provider.
When the sanitized Node/Pods use DRA, we have to sanitize the relevant DRA objects as well.
Sanitizing/duplicating ResourceSlices is easy. When we duplicate a Node, we also duplicate all ResourceSlices that are local to that Node (on the assumption that creating a new Node will create new Node-local ResourceSlices). When sanitizing the slices, in addition to changing the usual Name, UID, and NodeName, we change the names of the listed Device Pools (Pool names have to be unique within a DRA driver, and each Node has dedicated Pools for its local Devices).
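For illustration, the ResourceSlice sanitization could look roughly like the Go sketch below. This is not CA's actual code; it assumes the `resource.k8s.io/v1beta1` API types, and the uniqueness suffix is a made-up parameter:

```go
package example

import (
	resourceapi "k8s.io/api/resource/v1beta1"
	"k8s.io/apimachinery/pkg/util/uuid"
)

// sanitizeNodeResourceSlice duplicates a Node-local ResourceSlice for a
// simulated fresh Node. Besides the usual Name/UID/NodeName changes, the
// Device Pool name gets a fresh suffix, since Pool names have to be unique
// within a DRA driver and each Node has dedicated Pools for its local Devices.
func sanitizeNodeResourceSlice(slice *resourceapi.ResourceSlice, newNodeName, suffix string) *resourceapi.ResourceSlice {
	result := slice.DeepCopy()
	result.Name = slice.Name + "-" + suffix
	result.UID = uuid.NewUUID()
	result.Spec.NodeName = newNodeName
	result.Spec.Pool.Name = slice.Spec.Pool.Name + "-" + suffix
	return result
}
```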
It's not immediately clear how to sanitize DS/static Pods that reference ResourceClaims. For the DRA autoscaling MVP, we went with the following logic:

- Shared claims referenced by the Pods are not duplicated; the duplicated Pod is just added to the `ReservedFor` field of the shared claim when the fresh NodeInfo is added to the ClusterSnapshot (the claim is stored in `dynamicresources.Snapshot` inside ClusterSnapshot), as sketched below.
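A minimal sketch of that shared-claim handling, again assuming the `resource.k8s.io/v1beta1` types (the function name is illustrative, not one of CA's actual helpers):

```go
package example

import (
	v1 "k8s.io/api/core/v1"
	resourceapi "k8s.io/api/resource/v1beta1"
)

// reserveClaimForPod records a duplicated Pod as a consumer of a shared
// ResourceClaim by appending to the claim's in-memory ReservedFor list,
// instead of duplicating the claim itself.
func reserveClaimForPod(claim *resourceapi.ResourceClaim, pod *v1.Pod) {
	for _, ref := range claim.Status.ReservedFor {
		if ref.UID == pod.UID {
			return // already reserved for this Pod
		}
	}
	claim.Status.ReservedFor = append(claim.Status.ReservedFor, resourceapi.ResourceClaimConsumerReference{
		Resource: "pods",
		Name:     pod.Name,
		UID:      pod.UID,
	})
}
```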
The logic described above has at least the following caveats:

- Whenever `CloudProvider.TemplateNodeInfo()` has to be called, all ResourceClaim allocations in the returned NodeInfo have to be provided by the CloudProvider (illustrated below). The CloudProvider has to run scheduler predicates internally to obtain the allocations, or have them precomputed somehow. Precomputing will probably be impossible/cumbersome for more complex cases.

We should figure out if these limitations are important for DRA use-cases in practice. If they are, we need to remove them. If they aren't, we could start validating them.
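To illustrate the burden this puts on providers: every claim returned in a template NodeInfo would need its `Status.Allocation` pre-filled, roughly as below. All concrete driver/pool/device names here are made up:

```go
package example

import resourceapi "k8s.io/api/resource/v1beta1"

// templateClaimAllocation shows the kind of precomputed allocation a
// CloudProvider would have to attach (as claim.Status.Allocation) to every
// claim it returns in a template NodeInfo. All values are hypothetical.
func templateClaimAllocation() *resourceapi.AllocationResult {
	return &resourceapi.AllocationResult{
		Devices: resourceapi.DeviceAllocationResult{
			Results: []resourceapi.DeviceRequestAllocationResult{{
				Request: "gpu",                // request name from the claim's spec
				Driver:  "gpu.example.com",    // hypothetical DRA driver
				Pool:    "sanitized-pool-xyz", // must point at the template Node's Pool
				Device:  "gpu-0",              // hypothetical device in that Pool
			}},
		},
	}
}
```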
Describe the solution you'd like.:
We could make implementing `CloudProvider.TemplateNodeInfo()` easier by running scheduler predicates for Pods referencing claims after obtaining the NodeInfo from the CloudProvider. Then `CloudProvider.TemplateNodeInfo()` could just return unallocated claims, which should be straightforward.
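A rough sketch of that proposed flow. Everything here is a hypothetical stand-in for CA's internals (`NodeInfo`, `NodeGroup`, and `runDRAPredicates` are not the real types/helpers):

```go
package example

import v1 "k8s.io/api/core/v1"

// Hypothetical stand-ins for CA's internal types, just to make the sketch
// self-contained; they are not the real interfaces.
type NodeInfo struct {
	Node *v1.Node
	Pods []*v1.Pod
}

type NodeGroup interface {
	TemplateNodeInfo() (*NodeInfo, error)
}

// runDRAPredicates stands in for invoking the scheduler's DRA plugin to
// compute and record the Pod's claim allocations on the template Node.
func runDRAPredicates(pod *v1.Pod, nodeInfo *NodeInfo) error {
	// ... run scheduler predicates against the cluster snapshot ...
	return nil
}

// templateNodeInfoWithDRA sketches the proposed division of labor: the
// provider returns claims unallocated, and CA computes the allocations by
// running scheduler predicates afterwards.
func templateNodeInfoWithDRA(ng NodeGroup) (*NodeInfo, error) {
	nodeInfo, err := ng.TemplateNodeInfo()
	if err != nil {
		return nil, err
	}
	for _, pod := range nodeInfo.Pods {
		if err := runDRAPredicates(pod, nodeInfo); err != nil {
			return nil, err
		}
	}
	return nodeInfo, nil
}
```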
Additional context.:
This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. An MVP of the support was implemented in #7530 (with the whole implementation tracked in kubernetes/kubernetes#118612). There are a number of post-MVP follow-ups to be addressed before DRA autoscaling is ready for production use - this is one of them.