AWS cost attribution (feedback 4): Clarify situation on what is to be rolled out for all AWS clusters #4928

Closed
Tracked by #4872
consideRatio opened this issue Oct 3, 2024 · 2 comments

Comments


consideRatio commented Oct 3, 2024

#4872 says that we should roll out a system to all clusters, but this is a bit ambiguous, and doing it in the full sense is a compromise on cloud costs and startup times.

Context

The Grafana-based AWS cost attribution system can capture most 2i2c-attributed costs, and it can categorize the kinds of costs (compute, home storage, ...) without trouble. This functionality can easily be rolled out to all AWS clusters.

However, cost attribution to specific hubs can only be done thoroughly with a compromise on cloud costs and startup times, and with some complexity added to our cloud infra. Since it's a tradeoff and not a pure improvement, a decision should be made on how we roll this out to communities.

Explanation of the additional cloud costs and startup times

I'll explain this by analogy, where stones are user servers, buckets are nodes that incur cost while in use, and the act of getting a new bucket is starting up a node, which incurs startup time.

Say we have some amount of generic stones to fit into some amount of generic buckets; at worst we will need to leave up to one bucket partially unused. However, if we must treat stones differently based on Z different colors, and put them only in buckets of matching color, we will instead at worst have to leave up to Z buckets partially unused.

This means that on average we'll use cloud resources less efficiently, and with more buckets in use, we also expect more bucket addition/removal events on average, which is what incurs startup times.
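The analogy can be made concrete with a toy simulation (my own illustrative sketch, not 2i2c code): each color needs its own set of buckets, so each color can leave up to one partially filled bucket behind.

```python
import math

def buckets_needed(stones_by_color, bucket_capacity):
    """Buckets needed when each stone must go in a bucket of its own color.

    Each color is packed independently, so each color can leave up to
    one partially used bucket (hence up to Z partial buckets for Z colors).
    """
    return sum(math.ceil(count / bucket_capacity)
               for count in stones_by_color.values())

capacity = 4  # stones per bucket (think: user servers per node)

# 12 generic stones: they pack into 3 full buckets.
one_color = {"generic": 12}

# The same 12 stones split across 3 colors (hub-specific node pools):
# now 5 buckets are needed, 3 of them only partially used.
three_colors = {"red": 5, "green": 5, "blue": 2}

print(buckets_needed(one_color, capacity))     # 3
print(buckets_needed(three_colors, capacity))  # 5
```

The numbers are arbitrary, but the pattern holds in general: splitting one pool of work into Z isolated pools increases the worst-case wasted capacity from one bucket to Z buckets.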

Overview of complexities to consider

  • Hub specific compute cost attribution
    Introducing this means hub specific colored buckets (node pools), which is a tradeoff in cloud costs, startup time, and some cloud infra complexity.
  • Hub specific home storage cost attribution
    • Introducing this when we use AWS EFS just incurs some cloud infra complexity, but it can also be a benefit from a maintenance perspective, as backups can be restored for a subset of users instead of all users, etc.
    • Introducing this when we use jupyterhub-home-nfs for storage, or GCP Filestore on GCP, gives us a tradeoff again.
      This is because with jupyterhub-home-nfs and GCP Filestore we pay for a fixed amount of storage capacity that we increase when needed, while with AWS EFS we only pay for what we actually use.
      In practice, we end up with the colored buckets situation again if we transition away from AWS EFS.
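The storage tradeoff above can be sketched with a toy cost model (illustrative only; the prices and volume sizes are assumptions, not real quotes):

```python
def pay_per_use_cost(used_gb, price_per_gb):
    """AWS EFS-style model: cost tracks actual usage."""
    return used_gb * price_per_gb

def provisioned_cost(used_gb, provisioned_gb, price_per_gb):
    """Fixed-capacity model (jupyterhub-home-nfs disk, GCP Filestore):
    you pay for the provisioned size regardless of how much is used."""
    assert used_gb <= provisioned_gb
    return provisioned_gb * price_per_gb

price = 0.30  # assumed illustrative $/GB/month

# One shared 100 GB provisioned volume with 70 GB used...
shared = provisioned_cost(70, 100, price)

# ...versus three hub-specific volumes, each provisioned with its own
# headroom (the colored buckets situation again).
per_hub = sum(provisioned_cost(used, prov, price)
              for used, prov in [(30, 50), (25, 50), (15, 50)])

print(shared, per_hub)  # 30.0 45.0
```

Under the pay-per-use model the split is free (70 GB used costs the same either way), which is why hub-specific EFS storage is mostly a complexity question, while splitting provisioned storage costs real money.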

Definition of done

  • We have come to a decision on when to roll out what kind of cost attribution setup to which communities

consideRatio commented Oct 16, 2024

After having thought about this further, I think the best path forward is to do a rollout for all AWS hubs without splitting apart node pools or EFS storage, and to let hub specific node pools and storage be opt-in for both existing and new hubs.

The reason hub specific cost attribution of node pools and EFS storage should be opt-in is that it would incur additional cloud costs and startup times for communities, and additional setup complexity for 2i2c.

An example where this isn't merited is nasa-cryo: they only have staging/prod hubs, and don't care about hub specific cost attribution.

@consideRatio

With acceptance from @Gman0909, the decision is to default to not providing hub specific cost attribution cloud infra (hub specific EFS storage and hub specific node pools). So, we'll roll that out.
