Document CI cluster selection, CPU : RAM ratio / machine types, and general recommendations specific to prow.k8s.io #34139

Open
BenTheElder opened this issue Jan 13, 2025 · 0 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

What would you like to be added:

We don't have a single place to point to that explains which cluster: you should use and why, and how many resources to request (including how to avoid pointlessly scheduling minuscule amounts of memory per CPU core, which ultimately costs us more: when workloads request far more CPU than memory, the nodes fill up on CPU while the memory we pay for sits unused).
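To illustrate the CPU : RAM point, here is a rough sketch of how a job's requests can strand memory on a node. The node and job shapes below are made-up numbers, not the actual machine types of any prow.k8s.io build cluster, and the model ignores system-reserved resources and mixed workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """CPU and memory of a node, or the requests of a job (hypothetical values)."""
    cpu_cores: float
    memory_gib: float

    @property
    def gib_per_core(self) -> float:
        return self.memory_gib / self.cpu_cores


def stranded_memory_gib(node: Shape, job: Shape) -> float:
    """Memory left unrequested after packing as many identical jobs as fit."""
    jobs_by_cpu = int(node.cpu_cores // job.cpu_cores)
    jobs_by_mem = int(node.memory_gib // job.memory_gib)
    jobs = min(jobs_by_cpu, jobs_by_mem)
    return node.memory_gib - jobs * job.memory_gib


# Assumed shapes, for illustration only.
node = Shape(cpu_cores=16, memory_gib=128)  # 8 GiB per core
job = Shape(cpu_cores=4, memory_gib=4)      # 1 GiB per core

print(node.gib_per_core)               # 8.0
print(job.gib_per_core)                # 1.0
print(stranded_memory_gib(node, job))  # 112
```

In this toy case four jobs exhaust the node's CPU while 112 of 128 GiB stay unrequested; requesting memory closer to the node's own ratio would cost nothing extra.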

We should do this per-cluster and create a doc somewhere discoverable, perhaps under config/jobs.

We should also consider adding details like:

  • kubekins / CI image recommendations
    • docker in docker
  • Additional pointers to the hacks we have employed in the clusters (like pre-allocating loop devices, tuning sysctls ...).

Why is this needed:

So contributors can understand the Kubernetes-specific CI environment, how to schedule to it effectively, and how to write prow.k8s.io-specific jobs.

/sig testing k8s-infra
@kubernetes/sig-k8s-infra-leads @kubernetes/sig-testing-leads


These are really not discoverable:

https://github.com/kubernetes/k8s.io/blob/86089ae44dd87d86fa1a2a651bb0d6f4ceb06270/infra/aws/terraform/prow-build-cluster/terraform.prod.tfvars#L39C32-L39C44

https://github.com/kubernetes/k8s.io/blob/86089ae44dd87d86fa1a2a651bb0d6f4ceb06270/infra/gcp/terraform/k8s-infra-prow-build/main.tf#L101

Along with "what is the trusted cluster" etc.

We should also deprecate the eks-job-migration doc and the associated job report results, and we should consider how to balance scheduling between EKS and GKE more generally now that the budgets are similar and all the workloads are running in community accounts. (And also how to approach Azure with the much smaller budget ...)

@BenTheElder BenTheElder added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 13, 2025
@k8s-ci-robot k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. labels Jan 13, 2025