AWS cost attribution (feedback 4): Clarify situation on what is to be rolled out for all AWS clusters #4928

Closed
Tracked by #4872
consideRatio opened this issue Oct 3, 2024 · 2 comments

Comments


consideRatio commented Oct 3, 2024

#4872 says that we should roll out a system to all clusters, but this is a bit ambiguous, and doing it in the full sense is a compromise on cloud costs and startup times.

Context

The Grafana-based AWS cost attribution system can capture most 2i2c-attributed costs, and it can categorize the kinds of costs (compute, home storage, ...) without trouble. This functionality can easily be rolled out to all AWS clusters.

However, cost attribution to specific hubs can only be done thoroughly with a compromise on cloud costs and startup times, and with some complexity added to our cloud infra. Since it's a tradeoff and not a pure improvement, a decision should be made on how we roll this out to communities.

Explanation of the additional cloud costs and startup times

I'll explain this by analogy, where stones are user servers, buckets are nodes that incur cost while in use, and the act of getting a new bucket is starting up a node, which incurs startup time.

Say we have some amount of generic stones to fit into some amount of generic buckets; at worst we will need to leave up to one bucket partially unused. However, if we must treat stones differently based on Z different colors, and put them only in buckets of matching color, we will instead at worst have to leave up to Z buckets partially unused.

This means that on average we'll use cloud resources less efficiently, and with more buckets in use, we also expect more bucket addition/removal events on average, which is what incurs startup times.
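The analogy can be made concrete with a toy simulation (my own illustrative sketch, not 2i2c code): each color needs its own set of buckets, so each color can leave up to one partially filled bucket behind.

```python
import math

def buckets_needed(stones_by_color, bucket_capacity):
    """Buckets needed when each stone must go in a bucket of its own color.

    Each color is packed independently, so each color can leave up to
    one partially used bucket (hence up to Z partial buckets for Z colors).
    """
    return sum(math.ceil(count / bucket_capacity)
               for count in stones_by_color.values())

capacity = 4  # stones per bucket (think: user servers per node)

# 12 generic stones: they pack into 3 full buckets.
one_color = {"generic": 12}

# The same 12 stones split across 3 colors (hub-specific node pools):
# now 5 buckets are needed, 3 of them only partially used.
three_colors = {"red": 5, "green": 5, "blue": 2}

print(buckets_needed(one_color, capacity))     # 3
print(buckets_needed(three_colors, capacity))  # 5
```

The numbers are arbitrary, but the pattern holds in general: splitting one pool of work into Z isolated pools increases the worst-case wasted capacity from one bucket to Z buckets.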

Overview of complexities to consider

  • Hub specific compute cost attribution
    Introducing this means hub specific colored buckets (node pools), which is a tradeoff in cloud costs, startup time, and some cloud infra complexity.
  • Hub specific home storage cost attribution
    • Introducing this when we use AWS EFS just incurs some cloud infra complexity, but it can also be a benefit from a maintenance perspective, as backups can be restored for a subset of users instead of all users, etc.
    • Introducing this when we use jupyterhub-home-nfs for storage, or GCP Filestore on GCP, gives us a tradeoff again.
      This is because with jupyterhub-home-nfs and GCP Filestore we pay for a fixed amount of storage capacity that we increase when needed, while with AWS EFS we only pay for what we actually use.
      In practice, we end up with the colored buckets situation again if we transition away from AWS EFS.
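The storage tradeoff above can be sketched with a toy cost model (illustrative only; the prices and volume sizes are assumptions, not real quotes):

```python
def pay_per_use_cost(used_gb, price_per_gb):
    """AWS EFS-style model: cost tracks actual usage."""
    return used_gb * price_per_gb

def provisioned_cost(used_gb, provisioned_gb, price_per_gb):
    """Fixed-capacity model (jupyterhub-home-nfs disk, GCP Filestore):
    you pay for the provisioned size regardless of how much is used."""
    assert used_gb <= provisioned_gb
    return provisioned_gb * price_per_gb

price = 0.30  # assumed illustrative $/GB/month

# One shared 100 GB provisioned volume with 70 GB used...
shared = provisioned_cost(70, 100, price)

# ...versus three hub-specific volumes, each provisioned with its own
# headroom (the colored buckets situation again).
per_hub = sum(provisioned_cost(used, prov, price)
              for used, prov in [(30, 50), (25, 50), (15, 50)])

print(shared, per_hub)  # 30.0 45.0
```

Under the pay-per-use model the split is free (70 GB used costs the same either way), which is why hub-specific EFS storage is mostly a complexity question, while splitting provisioned storage costs real money.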

Definition of done

  • We have come to a decision on when to roll out what kind of cost attribution setup to which communities

consideRatio commented Oct 16, 2024

After having thought about this further, I think the best path forward is to do a rollout for all AWS hubs without splitting apart node pools or EFS storage, and to let hub specific node pools and storage be opt-in for both existing and new hubs.

The reason hub specific cost attribution of node pools and EFS storage should be opt-in is that it would incur additional cloud costs and startup times for communities, and additional setup complexity for 2i2c.

An example where this isn't merited is nasa-cryo: they only have staging/prod hubs, and don't care about hub specific cost attribution.

@consideRatio

With acceptance from @Gman0909, the decision is to default to not providing hub specific cost attribution cloud infra (hub specific EFS storage and hub specific node pools). So, we'll roll that out.
