
Policy Server: reduce memory usage #528

Closed
flavio opened this issue Jul 21, 2023 · 13 comments

Comments

@flavio
Member

flavio commented Jul 21, 2023

While setting up Kubewarden at a demo booth of SUSECon we noticed that policy-server was consuming more memory compared to the other workloads defined inside of the cluster.

This led to some quick investigation that resulted in this fix. However, we decided to schedule more time to look into other improvements.

@flavio flavio transferred this issue from kubewarden/policy-server Sep 18, 2023
@flavio flavio changed the title from "Scalability performance improvements of policy-server" to "Policy Server: reduce memory usage" Sep 20, 2023
@flavio
Member Author

flavio commented Sep 20, 2023

I've started looking into this issue during the last days.

TLDR: I think we can close this issue because we do not have high memory usage. There are some architectural changes we could make, but they would reduce the requests per second that a single Policy Server instance can process.

The environment

I ran the tests against Kubewarden 1.7.0-rc3, using a Policy Server with replicaSize 1.

Everything has been installed from our helm charts. I've enabled the recommended policies of the kubewarden-default chart, plus I've added the ones used by our load-testing suite.

It's important to highlight that I've not loaded any of the new "vanilla WASI" policies introduced by 1.7.0. The numbers shown inside of the memory reduction PR were skewed by the usage of the go-wasi-template policy, which is built with the official Go compiler. It's huge (~20 MB) and provides a distorted picture of our memory usage; see this comment I've just added on the closed PR.

To recap, I've been testing a Policy Server hosting 10 policies:

Contents of policies.yml
clusterwide-do-not-run-as-root:
  namespacedName:
    Namespace: ''
    Name: do-not-run-as-root
  url: ghcr.io/kubewarden/policies/user-group-psp:v0.4.9
  policyMode: monitor
  allowedToMutate: true
  settings:
    run_as_group:
      rule: RunAsAny
    run_as_user:
      rule: MustRunAsNonRoot
    supplemental_groups:
      rule: RunAsAny
clusterwide-do-not-share-host-paths:
  namespacedName:
    Namespace: ''
    Name: do-not-share-host-paths
  url: ghcr.io/kubewarden/policies/hostpaths-psp:v0.1.9
  policyMode: monitor
  allowedToMutate: false
  settings:
    allowedHostPaths:
    - pathPrefix: "/tmp"
      readOnly: true
clusterwide-drop-capabilities:
  namespacedName:
    Namespace: ''
    Name: drop-capabilities
  url: ghcr.io/kubewarden/policies/capabilities-psp:v0.1.13
  policyMode: monitor
  allowedToMutate: true
  settings:
    required_drop_capabilities:
    - ALL
clusterwide-no-host-namespace-sharing:
  namespacedName:
    Namespace: ''
    Name: no-host-namespace-sharing
  url: ghcr.io/kubewarden/policies/host-namespaces-psp:v0.1.6
  policyMode: monitor
  allowedToMutate: false
  settings:
    allow_host_ipc: false
    allow_host_network: false
    allow_host_pid: false
clusterwide-no-privilege-escalation:
  namespacedName:
    Namespace: ''
    Name: no-privilege-escalation
  url: ghcr.io/kubewarden/policies/allow-privilege-escalation-psp:v0.2.6
  policyMode: monitor
  allowedToMutate: true
  settings:
    default_allow_privilege_escalation: false
clusterwide-no-privileged-pod:
  namespacedName:
    Namespace: ''
    Name: no-privileged-pod
  url: ghcr.io/kubewarden/policies/pod-privileged:v0.2.7
  policyMode: monitor
  allowedToMutate: false
  settings:
namespaced-default-policy-echo:
  namespacedName:
    Namespace: default
    Name: policy-echo
  url: ghcr.io/kubewarden/policies/echo:latest
  policyMode: protect
  allowedToMutate: false
  settings: {}
namespaced-default-psp-apparmor:
  namespacedName:
    Namespace: default
    Name: psp-apparmor
  url: ghcr.io/kubewarden/policies/apparmor-psp:v0.1.9
  policyMode: protect
  allowedToMutate: false
  settings:
    allowed_profiles:
    - runtime/default
namespaced-default-psp-capabilities:
  namespacedName:
    Namespace: default
    Name: psp-capabilities
  url: ghcr.io/kubewarden/policies/capabilities-psp:v0.1.9
  policyMode: protect
  allowedToMutate: true
  settings:
    allowed_capabilities:
    - CHOWN
    required_drop_capabilities:
    - NET_ADMIN
namespaced-default-psp-user-group:
  namespacedName:
    Namespace: default
    Name: psp-user-group
  url: ghcr.io/kubewarden/policies/user-group-psp:v0.2.0
  policyMode: protect
  allowedToMutate: true
  settings:
    run_as_group:
      rule: RunAsAny
    run_as_user:
      rule: RunAsAny
    supplemental_groups:
      rule: RunAsAny
Memory usage

I've collected the memory usage statistics of the Policy Server pod using the metrics server.

An idle instance with 2 workers consumes 141 MB right after bootstrap. When I ran our load tests simulating 10 users, the memory usage increased to 143 MB and remained constant during the whole testing time.

Virtual Memory

During the research I got confused by the memory usage of the policy-server process. I logged onto the machine where policy-server was running and looked at the memory consumption via ps and htop.
This is when I discovered the process had allocated 160 GB of virtual memory while having 145 MB of RES memory.

I started to look into the memory maps of the process using the pmap utility. Here I discovered that each Wasm module being loaded (one per policy, each worker has one -> 20 modules) takes 6 GB of virtual memory.

The reason for this memory usage boils down to how the wasmtime JIT optimizes the module to achieve the highest execution speed.

This is all documented here and here.

Long story short: by default a wasmtime::Engine performs a static allocation of the module memory because it provides better performance. On 64-bit architectures each module is assigned a 4 GB address space; on top of that, an extra 2 GB address space is added as a guard to further optimize the compilation of the Wasm instructions to native code.

Having a process with high virtual memory is not unusual and is not an issue. We can safely ignore that. As a matter of fact, the process virtual memory is not even shown by the metrics server because it is not relevant.
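For completeness, these reservations are tunable on the wasmtime::Engine if we ever wanted to trade some execution speed for a smaller address-space footprint. A minimal sketch, assuming the static-memory knobs exposed by wasmtime's Config at the time; the values and the "policy.wasm" path are only illustrative, this is not something policy-server does today:

use wasmtime::{Config, Engine, Module};

fn main() {
    let mut config = Config::new();
    // Shrink the static linear-memory reservation and its guard region.
    // The 64-bit defaults (4 GB memory + 2 GB guard) favour faster generated
    // code at the cost of a large virtual-memory footprint per module.
    config.static_memory_maximum_size(256 * 1024 * 1024); // 256 MiB
    config.static_memory_guard_size(64 * 1024 * 1024); // 64 MiB

    let engine = Engine::new(&config).expect("failed to build the engine");
    // "policy.wasm" stands in for any compiled policy module.
    let module = Module::from_file(&engine, "policy.wasm").expect("failed to load the module");
    println!("module imports: {}", module.imports().count());
}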

Update: this is the output produced by running smap against the demo code I wrote, the one that instantiated one Wasm module using wasmtime.

Further work

The current code loads all the Wasm modules, optimizes them and keeps them in memory inside of a list of wasmtime::Module objects. This list is then shared with all the workers that are part of our worker pool. Each worker then creates one wapc::WapcHost instance per policy to be evaluated.

Reduce the number of wapc::WapcHost

A user might define 10 policies, all based on the same Wasm module, each with slightly different configuration values. This results in only 1 instance of wasmtime::Module, but 10 wapc::WapcHost instances. Theoretically we could optimize this kind of scenario to have just one instance of wapc::WapcHost; however, that would pose some challenges with the mechanism we use to identify which policy is performing a host callback. Right now we leverage the unique ID assigned to each wapc::WapcHost to understand which policy is making a host callback and take decisions. Conflating multiple policies under the same ID would cause visibility issues (it would not be possible to understand which policy generated a trace event) and security issues (two policies might have different settings with regards to the Kubernetes resources they are granted access to).

We can think of a way to reduce the number of wapc::WapcHost instances inside of the workers without having to conflate everything.
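A rough sketch of the idea, at the worker level. The types below are hypothetical stand-ins, not the actual policy-server API:

use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-ins for the real policy-server types.
struct PolicyEvaluator {} // would wrap a wapc::WapcHost and its wasmtime stack
struct PolicySettings {} // per-policy configuration values

// One evaluator per unique Wasm module, shared by every policy of the worker
// that is backed by that module. Per-policy settings are tracked separately;
// the host-callback identification problem described above still needs to be
// solved on top of this.
#[derive(Default)]
struct Worker {
    evaluators: HashMap<String, Rc<PolicyEvaluator>>, // key: module digest
    policies: HashMap<String, (Rc<PolicyEvaluator>, PolicySettings)>, // key: policy name
}

impl Worker {
    fn add_policy(&mut self, name: &str, module_digest: &str, settings: PolicySettings) {
        // Reuse the evaluator if another policy already loaded the same module.
        let evaluator = self
            .evaluators
            .entry(module_digest.to_string())
            .or_insert_with(|| Rc::new(PolicyEvaluator {}))
            .clone();
        self.policies.insert(name.to_string(), (evaluator, settings));
    }
}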

Spawn wapc::WapcHost on demand

Each worker has one wapc::WapcHost ready for each policy that has been defined by the user. Having this object ready allows us to quickly evaluate incoming requests.

I've done a test and changed the current code to create the wapc::WapcHost instances on demand, and just drop them once the evaluation is done.

This is summarized by these sequence charts:

Current architecture
sequenceDiagram
    Webhook->>+Worker: process admission request
    Worker-->>Worker: Wasm eval
    Worker->>-Webhook: evaluation result
New architecture
sequenceDiagram
    Webhook->>+Worker: process admission request
    Worker-->>Worker: create wapcHost
    Worker-->>Worker: Wasm eval
    Worker->>-Webhook: evaluation result
    Worker-->>Worker: delete WapcHost

This approach led to lower memory usage, but it caused a massive drop in performance: the requests per second went from ~1100 down to ~90.

The amount of time required to set up a wapc::WapcHost instance is causing this performance drop. Right now, the quick POC I wrote leverages the public API of wapc, which means we create a wasmtime::Instance object from scratch on each evaluation. I think we can improve performance by saving some computation via the usage of a wasmtime::InstancePre. Internally the wasmtime-provider crate already uses it (see this code I've added); we could try exposing that to the consumers of the library.
However, I think it will be hard to go from 90 RPS back to 1100 RPS with this change.
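This is not what the public wapc API offers today, but the general wasmtime pattern would look roughly like this (hypothetical helper functions; anyhow is used for error plumbing, as wasmtime itself does):

use wasmtime::{Engine, InstancePre, Linker, Module, Store};

// Done once per policy, at startup: compile the module and pre-resolve its
// imports (the waPC host callbacks would be registered on the linker here).
fn prepare(engine: &Engine, wasm: &[u8]) -> anyhow::Result<InstancePre<()>> {
    let module = Module::new(engine, wasm)?;
    let linker: Linker<()> = Linker::new(engine);
    Ok(linker.instantiate_pre(&module)?)
}

// Done per evaluation: instantiation is cheap because the imports are already
// resolved; the Store, and with it the instance memory, is dropped as soon as
// the evaluation is over.
fn evaluate(engine: &Engine, pre: &InstancePre<()>) -> anyhow::Result<()> {
    let mut store = Store::new(engine, ());
    let _instance = pre.instantiate(&mut store)?;
    // ... perform the waPC guest call through the instance here ...
    Ok(())
}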

Have a dynamic worker pool

Currently the size of the worker pool can either be specified by the user or calculated from the number of CPU cores.

We could allow the user to set a minimum and a maximum number of active workers. We could then grow and shrink the pool depending on the amount of incoming requests to be evaluated.

This would allow an idle server to have a smaller memory consumption.
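A sketch of the sizing logic only, with made-up names; the actual spawning and tear-down of workers is left out:

// Hypothetical sizing logic for a bounded, dynamic worker pool.
struct PoolConfig {
    min_workers: usize,
    max_workers: usize,
}

struct WorkerPool {
    config: PoolConfig,
    active: usize,
}

impl WorkerPool {
    // Pick a target pool size from the number of queued requests, clamped to
    // the user-provided [min, max] range.
    fn target_size(&self, queued_requests: usize) -> usize {
        queued_requests
            .max(self.config.min_workers)
            .min(self.config.max_workers)
    }

    fn resize(&mut self, queued_requests: usize) {
        let target = self.target_size(queued_requests);
        if target > self.active {
            // spawn (target - active) new workers, each with its own set of
            // policy evaluators
        } else if target < self.active {
            // ask (active - target) idle workers to shut down, freeing their
            // wasmtime stacks
        }
        self.active = target;
    }
}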

Summary

I think the current memory usage is fine. We can do further optimizations, but I think right now we have more important issues to address.

Given the time, I think I would definitely try to investigate how we can reuse wapc::WapcHost instances and how we can implement a dynamic worker pool.

I think going to the on-demand model would be good for memory usage, but it will never provide the high throughput of requests that we currently offer.

@flavio flavio moved this to Pending review in Kubewarden Sep 20, 2023
@flavio
Member Author

flavio commented Sep 20, 2023

Another possible activity for the future: rewrite the load-tests to make use of k6.

Currently the load-tests are written with Locust, which is doing a fine job. However, if we were to migrate to k6 we could then have a performance testing environment made of:

  • prometheus: to collect node and pods resource usage
  • k6: to generate load
  • grafana: to inspect the data collected

The most interesting - and appealing - advantage of this solution is the ability to correlate the load generated by the k6 suite with the resource usage. Right now this analysis requires quite some manual work on my side. It would be great to have something that produces better and more consistent results.

IMHO this is the first thing we should address.

@jvanz
Member

jvanz commented Sep 21, 2023

I agree with @flavio's arguments; we can leave these enhancements for the future. Maybe we can keep the performance-test improvements in the queue and defer the other changes. Then, if we find any issue, we can bring those improvements back to the table.

@viccuad
Member

viccuad commented Sep 25, 2023

This is incredible work, thanks for the insights and the learning pointers.

From what is presented, I agree on the priorities. I would also like to tackle a dynamic worker pool and try wasmtime::InstancePre, yet I prefer to punt on those for now.
I would start with refactoring the load-tests with k6.

@flavio flavio moved this from Pending review to Done in Kubewarden Sep 25, 2023
@ish-xyz

ish-xyz commented Nov 14, 2023

We are using Kubewarden in some of our staging clusters, and with 61 cluster-wide policies deployed the policy-server is using a lot of memory (40-50 GB).
It seems that the memory used by the policy-server scales linearly with the number of policies, which seems a bit excessive.

Any suggestion or optimization I could do on my side?

@flavio
Member Author

flavio commented Nov 14, 2023

Can you provide some details about your usage pattern? This would help us fine-tune the optimization ideas we have.

Some questions:

  • How did you get this metric, is that via prometheus/cAdvisor?
  • Do you happen to have the same policy deployed multiple times, but with different settings?
  • Have you configured the number of workers used by the policy server instance?
  • Do you happen to be using the experimental Kyverno policy?
  • What is the percentage of ClusterAdmissionPolicy vs AdmissionPolicy? Do you have the same (policy + settings) policy deployed inside of multiple namespaces via AdmissionPolicy? Scratch that, you said you're using only ClusterAdmissionPolicies.
  • Do you have context aware policies loaded? If so, what kind of resources are they accessing and how many instances of these resources do you have in the cluster?
  • Is the memory consumption increasing over time? I wonder if there's a policy that is leaking memory.

@flavio flavio self-assigned this Nov 17, 2023
@flavio flavio moved this from Done to In Progress in Kubewarden Nov 17, 2023
@flavio
Member Author

flavio commented Nov 20, 2023

@ish-xyz I'm currently working on a fix for this issue. Can you please provide more information about your environment (see previous comment)?

flavio added a commit to flavio/policy-server that referenced this issue Nov 23, 2023
Reduce the memory consumption of Policy Server when multiple instances
of the same Wasm module are loaded.

Thanks to this change, a worker will have only one instance of
`PolicyEvaluator` (hence of the wasmtime stack) per unique type of module.

Concretely, if a user has the `apparmor` policy deployed 5 times
(different names, settings, ...) only one instance of `PolicyEvaluator`
will be allocated for it.

Note: the optimization works at the worker level, meaning that workers do
NOT share these `PolicyEvaluator` instances between themselves.

This commit helps to address issue kubewarden/kubewarden-controller#528

Signed-off-by: Flavio Castelli <[email protected]>
@flavio
Member Author

flavio commented Nov 23, 2023

JFYI: there's a PR under review that reduces the amount of memory consumed: kubewarden/policy-server#596

flavio added a commit to flavio/policy-server that referenced this issue Nov 23, 2023
flavio added a commit to flavio/policy-server that referenced this issue Nov 24, 2023
@ish-xyz

ish-xyz commented Nov 24, 2023

@flavio
  • How did you get this metric, is that via prometheus/cAdvisor? Yes, Prometheus.
  • Do you happen to have the same policy deployed multiple times, but with different settings? That's correct, we use Kubewarden as a PSP replacement.
  • Have you configured the number of workers used by the policy server instance? We are using the default settings here.
  • Do you happen to be using the experimental Kyverno policy? No.
  • Do you have context aware policies loaded? Unsure, I need to double check. I'll get back to you on this one.
  • Is the memory consumption increasing over time? Yes, it looks like it.
  • I wonder if there's any policy that is leaking memory. Could be, I'll double check this.

@flavio flavio moved this from In Progress to Pending review in Kubewarden Dec 12, 2023
@flavio flavio moved this from Pending review to Todo in Kubewarden Dec 12, 2023
flavio added a commit to flavio/policy-server that referenced this issue Dec 12, 2023
@flavio
Member Author

flavio commented Dec 15, 2023

@ish-xyz we've finished a major refactor of the policy-server codebase (see kubewarden/policy-server#596 (comment)).

Beginning of next year (Jan 2024), we are going to release a new version of Kubewarden. In the meantime it would be great if you could give the changes a quick test by consuming the latest version of the policy-server image.

@vincebrannon

vincebrannon commented Jan 8, 2024

@ish-xyz
EDIT
Ignore this, as there was confusion between me and Flavio. The issue is with the kube-apiserver and not the policy server.
The memory increase is due to the ReplicaSet count increasing by thousands, which we cannot explain.

Flavio pointed me at this issue; we have another support case with this problem. Kubewarden, when installed, spams ReplicaSets and floods the apiserver, eating up memory. He suggested I collect info from the customer and add it here:

• How did you get this metric, is that via prometheus/cAdvisor? If you are referring to the number of replicasets, it is the one reported in Rancher's UI.
• Do you happen to have the same policy deployed multiple times, but with different settings? No
• Have you configured the number of workers used by the policy server instance? It is running with 3 replicas
• Do you happen to be using the experimental Kyverno policy? No
• Do you have context aware policies loaded? If so, what kind of resources are they accessing and how many instances of these resources do you have in the cluster? I do not think so. The policies that we have are the following:

NAME                                         POLICY SERVER   MUTATING   BACKGROUNDAUDIT   MODE      OBSERVED MODE   STATUS   AGE
disallow-service-loadbalancer                default         false      true              protect   protect         active   26d
do-not-run-as-root                           default         true       true              protect   protect         active   26d
do-not-share-host-paths                      default         false      true              protect   protect         active   26d
drop-capabilities                            default         true       true              monitor   monitor         active   23d
kubewarden-mandatorynamespacelabels-policy   default         true       true              protect   protect         active   26d
no-host-namespace-sharing                    default         false      true              protect   protect         active   26d
no-privilege-escalation                      default         true       true              protect   protect         active   26d
no-privileged-pod                            default         false      true              protect   protect         active   26d

All of them have been provided by Rancher; the kubewarden-mandatorynamespacelabels-policy is not using any regex, it just checks that there is a projectId label to prevent namespaces from being created outside of projects.

settings:
  mandatory_labels:
    - field.cattle.io/projectId

• Is the memory consumption increasing over the time? I wonder if there's any policy that is leaking memory. No, we have not seen the memory consumption increasing over time in any of the Kubewarden pods. For instance, every Policy Server pod is consuming a flat 700M. What happened was related to the apiserver in the control plane. I will attach some screenshots of the CPU, memory and disk utilization around the first time it happened.
(Screenshots attached: CPU, disk, and memory utilization.)

@viccuad
Member

viccuad commented Jan 26, 2024

To clarify the last comment from @vincebrannon: it seems that the culprit was misconfigured mutating policies that were fighting with a k8s controller, so ReplicaSets were continuously created. This has been fixed for the default policies (which now only target Pods) in Kubewarden 1.10.0.
Also, the docs have been expanded with an explanation here.

@viccuad
Member

viccuad commented Jan 26, 2024

For general memory consumption, the proposed changes (and more) are now included in the newly released Kubewarden 1.10.0. This release reduces the memory footprint and, more importantly, memory consumption is now constant regardless of the number of worker threads or horizontal scaling.

You can read more about it in the 1.10.0 release blogpost.

Closing this card then. Please don't hesitate to reopen, comment here, or open any other card if this becomes an issue again!

@viccuad viccuad closed this as completed Jan 26, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in Kubewarden Jan 26, 2024