#4872 says that we should roll out a system to all clusters, but this is a bit ambiguous, and doing it in the full sense is a compromise on cloud costs and startup times.
Context
The Grafana-based AWS cost attribution system can capture most 2i2c-attributed costs, and it can without trouble categorize the kinds of costs (compute, home storage, ...). This functionality can easily be rolled out to all AWS clusters.
However, cost attribution to specific hubs can only be done thoroughly with a compromise on cloud costs, startup times, and some complexity added to our cloud infra. Since it's a tradeoff and not just a pure improvement, a decision on how we roll this out to communities should be made.
Explanation of the additional cloud costs and startup times
I'll explain this by analogy, where stones are user servers, buckets are nodes incurring cost when used, and the act of getting a new bucket is starting up a node - which incurs startup time.
Let's say we have some amount of generic stones to fit into some amount of generic buckets: at worst we will leave up to one bucket only partially used. However, if we need to treat stones differently based on Z different colors, and put them only in buckets of a matching color, we will instead at worst leave up to Z buckets only partially used.
This means we'll use cloud resources less efficiently on average, and with the use of more buckets, we also expect more addition/removal events to happen on average, which is what incurs startup times.
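To make the analogy concrete, here is a minimal sketch (not part of our infra; the node capacity and hub names are made up) that packs user servers onto fixed-size nodes, first with one generic pool and then with one pool per hub, and compares how many nodes each approach needs.

```python
# A minimal sketch of the analogy above: pack user servers ("stones") onto
# nodes ("buckets") of fixed capacity, with and without per-hub ("colored")
# node pools, and compare node counts. Numbers and hub names are hypothetical.
import math
from collections import Counter

NODE_CAPACITY = 4  # hypothetical: user servers that fit on one node


def nodes_needed(servers_per_hub, shared_pool=True):
    """Count nodes needed to host the given per-hub server counts."""
    if shared_pool:
        # One generic pool: at most one partially-filled node overall.
        return math.ceil(sum(servers_per_hub.values()) / NODE_CAPACITY)
    # One pool per hub: up to one partially-filled node *per hub*.
    return sum(math.ceil(n / NODE_CAPACITY) for n in servers_per_hub.values())


servers = Counter({"staging": 1, "prod": 5, "workshop": 2})
print("shared pool:  ", nodes_needed(servers, shared_pool=True))   # 2 nodes
print("per-hub pools:", nodes_needed(servers, shared_pool=False))  # 1 + 2 + 1 = 4 nodes
```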
Overview of complexities to consider
Hub specific compute cost attribution
Introducing this means moving to hub specific colored buckets (hub specific node pools), which is a tradeoff in cloud costs, startup time, and some cloud infra complexity.
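For what this could look like in practice, here is a minimal sketch assuming a KubeSpawner-based hub: user pods get pinned to a hub specific node pool via a node selector and toleration, so compute cost on those nodes can be attributed to the hub. The label key and values below are hypothetical, not our actual labels.

```python
# Fragment of a hypothetical jupyterhub_config.py (KubeSpawner-based hub).
# Pin this hub's user pods to nodes carrying a hub specific label, so the
# cost of those nodes can be attributed to this hub.
c.KubeSpawner.node_selector = {"2i2c.org/hub": "prod"}  # hypothetical label

# The matching node pool would carry the same label and, optionally, a taint
# tolerated below so that only this hub's user pods land on it.
c.KubeSpawner.tolerations = [
    {"key": "2i2c.org/hub", "operator": "Equal", "value": "prod", "effect": "NoSchedule"},
]
```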
Hub specific home storage cost attribution
Introducing this when we use AWS EFS only incurs some cloud infra complexity, and it can also be a benefit from a maintenance perspective, as backups can be restored for a subset of users instead of all users.
Introducing this when we use jupyterhub-home-nfs for storage, or if we are on GCP using Filestore, means we have a tradeoff again.
This is because with jupyterhub-home-nfs and GCP Filestore, we pay for a fixed amount of provisioned storage capacity that we increase when needed, while with AWS EFS we just pay for what we use.
In practice, we'll end up with the colored buckets situation again if we transition away from AWS EFS.
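A minimal sketch of the two billing models (all prices and sizes are hypothetical, not actual quotes): with pay-per-use storage like AWS EFS the bill follows actual usage whether or not it's split per hub, while with provisioned capacity (GCP Filestore, or a block volume behind jupyterhub-home-nfs) each hub specific volume carries its own headroom, so the total grows with the number of hubs.

```python
# Hypothetical numbers only: compare pay-per-use billing with provisioned
# capacity billing, shared vs split per hub.
PRICE_PER_GB_MONTH = 0.30  # made-up flat rate used for both models

usage_gb = {"staging": 20, "prod": 450}                  # data actually stored per hub
per_hub_provisioned_gb = {"staging": 100, "prod": 600}   # headroom per hub if split
shared_provisioned_gb = 500                              # one shared volume sized for everyone

pay_per_use = sum(usage_gb.values()) * PRICE_PER_GB_MONTH
shared = shared_provisioned_gb * PRICE_PER_GB_MONTH
per_hub = sum(per_hub_provisioned_gb.values()) * PRICE_PER_GB_MONTH

print(f"pay-per-use (EFS-like), shared or split: ${pay_per_use:.0f}/month")  # $141
print(f"provisioned, one shared volume:          ${shared:.0f}/month")       # $150
print(f"provisioned, one volume per hub:         ${per_hub:.0f}/month")      # $210
```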
Definition of done
We have come to a decision on when to roll out what kind of cost attribution setup to which communities
After thinking about this further, I think the best path forward is to do a rollout for all AWS hubs without splitting apart node pools or EFS storage, and let hub specific node pools and storage be opt-in for both existing and new hubs.
The reason hub specific cost attribution of node pools and EFS storage should be opt-in is that it would incur additional cloud costs and startup times for communities, and additional setup complexity for 2i2c.
A situation where this isn't merited is nasa-cryo, for example: they just have staging/prod hubs and don't care about hub specific cost attribution.
With acceptance from @Gman0909, the decision is to default to not providing hub specific cost attribution cloud infra (hub specific EFS storage and hub specific node pools). So, we'll roll that out.