Current Limitations:
In obi, it is not enough to use index and labels to point to the same pod or node: when a deploy or node is updated, the same index may point to a different pod or node.
There is no resource or field that can monitor all nodes, or all pods under a deploy, even though schedulers often need exactly that.
OBIG example:
```yaml
apiVersion: arbiter.k8s.com.cn/v1alpha1
kind: ObservabilityIndicantGroup
metadata:
  name: metric-server-node-cpu
spec:
  obiHistoryLimit: 10 # How many additional instances of expired obi to keep, 0 means coincide with the actual resource updates, no expired obi to keep
  # Same as obi below, with only 2 minor differences:
  # 1. no `spec.targetRef.index`
  # 2. `spec.targetRef.kind` only support `Node` and `Deploy` now.
  metric:
    historyLimit: 1
    metricIntervalSeconds: 15
    metrics:
      cpu:
        aggregations:
          - time
        description: ""
        query: ""
        unit: 'm'
    timeRangeSeconds: 3600
  source: metric-server
  targetRef:
    group: ""
    kind: Node
    labels:
      "data-test": "data-test"
    name: ""
    namespace: ""
    version: v1
```
OBIG running logic:
The logic for running obig is as follows:
When an obig is created to monitor nodes, the observer queries the current set of nodes and creates one obi per node (the obi's spec.targetRef.name is the node name). When a new node appears, it adds a new obi to monitor it. When a node is deleted, it stops updating that node's obi and deletes it once the history length limit spec.obiHistoryLimit is exceeded.
An obig that monitors a deploy works the same way: the observer creates one obi per pod (the obi's spec.targetRef.name is the pod name).
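The reconcile step above can be sketched as follows. This is a hypothetical illustration, not the real observer code: `reconcile`, `makeObi`-style shapes, and the `targetName`/`expired` fields are all invented for the sketch, and the real controller would work against the Kubernetes API instead of plain arrays.

```javascript
// Sketch of one reconcile pass for a node-targeted obig.
// obig:         { obiHistoryLimit: number }
// currentNodes: array of node names currently in the cluster
// existingObis: array of { targetName, expired } owned by this obig
function reconcile(obig, currentNodes, existingObis) {
  const nodeSet = new Set(currentNodes);
  const toCreate = [];
  const toExpire = [];

  // Create an obi for every node that does not have a live one yet.
  for (const node of currentNodes) {
    if (!existingObis.some((o) => o.targetName === node && !o.expired)) {
      toCreate.push({ targetName: node, expired: false });
    }
  }

  // Stop updating obis whose node is gone: mark them expired.
  for (const obi of existingObis) {
    if (!obi.expired && !nodeSet.has(obi.targetName)) {
      toExpire.push(obi);
    }
  }

  // Delete expired obis beyond spec.obiHistoryLimit, oldest first.
  const expired = existingObis.filter((o) => o.expired).concat(toExpire);
  const toDelete = expired.slice(
    0,
    Math.max(0, expired.length - obig.obiHistoryLimit)
  );
  return { toCreate, toExpire, toDelete };
}
```

With obiHistoryLimit: 0, every expired obi is deleted immediately, matching the "coincide with the actual resource updates" comment in the example above.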
How arbiter-scheduler uses OBIG:
This lets the scheduler schedule based on actual resource usage:
When the arbiter-scheduler starts, it automatically creates one obig to monitor the nodes' actual cpu and memory usage.
The scheduler is only responsible for creating this obig; the observer creates and updates the obis. If this obig already exists, the one already created by the user is used.
The scheduler only consumes the obis created by this obig (identified by the obig in their ownerReferences), reading node.metric.cpu and node.metric.mem to update the metrics that the JS in the scheduler's Score CRD will use.
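A Score expression consuming those metrics might look like the sketch below. This is purely illustrative: the text only says the JS sees node.metric.cpu and node.metric.mem, so the weighting, the 0-100 range assumption, and the function shape are all invented for the example.

```javascript
// Hypothetical Score sketch: prefer nodes with lower actual usage.
// Assumes node.metric.cpu / node.metric.mem are usage percentages (0-100).
function score(node) {
  const cpuFree = 100 - node.metric.cpu;
  const memFree = 100 - node.metric.mem;
  // Average the free capacity so a node maxed out on one resource
  // still scores lower than a balanced, lightly loaded node.
  return Math.round((cpuFree + memFree) / 2);
}
```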
> In obi, it is not enough to use index and label to point to the same pod or node. When a deploy or node is updated, the same index may point to a different pod or node.
> The lack of a resource or field to meet the need to monitor all nodes or all pods under a deploy is a requirement often used in schedulers
We probably need an OBIG in a later phase, but for now I think we can keep it simple with the approach below:
For pods, we don't need metrics from a specific pod: it may be recreated at any time, and the restarted pod will probably have different metrics than the one before, so scheduling should not depend on a specific pod's metrics. In most cases the metrics should come from the service level for more general metrics, or we can use labels/annotations for resource traits.
For nodes, the name is more stable than for pods, and for now we can let users create a separate OBI for each node manually.
So, until we have more real user cases, we can keep it simple and make a better decision later.
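For reference, a manually created per-node OBI could look like the sketch below, mirroring the OBIG example from the issue body but pinning one node in spec.targetRef.name. The obi name, the node name node-1, and the index value are placeholders, not values from the source.

```yaml
apiVersion: arbiter.k8s.com.cn/v1alpha1
kind: ObservabilityIndicant
metadata:
  name: metric-server-node-cpu-node-1 # placeholder name
spec:
  metric:
    historyLimit: 1
    metricIntervalSeconds: 15
    metrics:
      cpu:
        aggregations:
          - time
        description: ""
        query: ""
        unit: 'm'
    timeRangeSeconds: 3600
  source: metric-server
  targetRef:
    group: ""
    index: 0
    kind: Node
    name: node-1 # the specific node to monitor (example name)
    namespace: ""
    version: v1
```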