docs: add proposal for grafana alerting #1349
Personally, I'm in favor of option three. It would also let us implement this in multiple passes, starting with alert rules, since the routing is label-based anyway.
Thanks for taking another shot at this @theSuess, here are my initial thoughts. I'm still trying to figure out how I feel about the suggestion, and I will probably change my mind 3 times ;) I lean towards following how it's done in terraform and having a specific resource for Rule Groups, so doing option 1 but instead of calling it One of my main concerns with this, given the number of resources that point to each other, is how we should connect them. This is an easy problem to solve in terraform, since you can just reference the Datasource resource in your RuleGroup resource; sadly, this is not an easy thing to do in Kubernetes. The issue is the same in How do you see
Calling the resource Regarding references to the datasource: UIDs can be set during creation, though this might require some changes in the operator. I'm trying to get every API to support UIDs at creation on the Grafana side 🤞 Mute timings and notification templates are identified by name, so we should be good here.
That is great. I have been looking at the HTTP API docs a lot (https://grafana.com/docs/grafana/latest/developers/http_api/data_source/), but thinking about it now, those are mostly examples and not the full schema 🤦 Getting the APIs to support UIDs at creation would be awesome; Igor has an old issue about that for teams, if I remember correctly. How do you see the linking happening between the resources? Should we use UIDs? It's not the most user-friendly way of solving it, but it sure makes life easy for us ;) As for changing our existing APIs, I'm all open to it. We can always release a Also, to make this design document easier to read in the future, it would be nice to have full examples of how we want the CRDs to look. I think that will also make things clearer during the implementation, since we will have something to compare against. Of course, we might change the design a bit during the implementation, but at least we will have a discussion point before doing so.
I think UID linking is the way to go here. We don't have many places where we actually need to link things. I think this linking makes sense:

flowchart LR
    AlertGroup-- uid -->Folder
    AlertGroup-- label selector --> GrafanaInstance
    AlertGroup-- uid --> DataSource
    Folder-- label selector --> GrafanaInstance
    NotificationPolicy-- label selector --> GrafanaInstance
    NotificationPolicy-- uid --> ContactPoint
    ContactPoint-- label selector --> GrafanaInstance

Everything needs a label selector to decide which Grafana instance it applies to, and cross-references are made using UIDs.
@theSuess is the idea then to use predictable UIDs (chosen by the user) to make it easier to link the resources together?
Yes, that's the way I'd go about it.
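To make the linking scheme discussed above concrete, here is a minimal Go sketch, assuming equality-based label matching and user-chosen UIDs. Every name in it (`Instance`, `AlertGroup`, `InstanceSelector`, `FolderUID`, `DataSourceUID`, and the helper functions) is hypothetical and only illustrates the idea; none of it is the operator's actual API or the CRD shape from the proposal.

```go
package main

import "fmt"

// NOTE: all types and fields here are hypothetical illustrations,
// not the operator's real CRDs.

// Instance stands in for a Grafana custom resource carrying Kubernetes labels.
type Instance struct {
	Name   string
	Labels map[string]string
}

// AlertGroup selects Grafana instances by label and points at a folder
// and a data source by user-chosen, predictable UIDs.
type AlertGroup struct {
	Name             string
	InstanceSelector map[string]string
	FolderUID        string
	DataSourceUID    string
}

// selectorMatches reports whether every key/value pair in selector is
// present in labels. An empty selector matches every label set.
func selectorMatches(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// matchingInstances returns the names of the instances the group applies to.
func matchingInstances(g AlertGroup, instances []Instance) []string {
	var names []string
	for _, in := range instances {
		if selectorMatches(g.InstanceSelector, in.Labels) {
			names = append(names, in.Name)
		}
	}
	return names
}

// missingUIDs lists the UIDs referenced by the group that are not in the
// set of UIDs known to exist on the target instance.
func missingUIDs(g AlertGroup, known map[string]bool) []string {
	var missing []string
	for _, uid := range []string{g.FolderUID, g.DataSourceUID} {
		if !known[uid] {
			missing = append(missing, uid)
		}
	}
	return missing
}

func main() {
	instances := []Instance{
		{Name: "grafana-dev", Labels: map[string]string{"env": "dev"}},
		{Name: "grafana-prod", Labels: map[string]string{"env": "prod"}},
	}
	group := AlertGroup{
		Name:             "cpu-alerts",
		InstanceSelector: map[string]string{"env": "prod"},
		FolderUID:        "alerts-folder",
		DataSourceUID:    "prometheus-main",
	}
	// Only the folder UID is known here, so the data source is reported missing.
	known := map[string]bool{"alerts-folder": true}

	fmt.Println("applies to:", matchingInstances(group, instances))
	fmt.Println("unresolved UIDs:", missingUIDs(group, known))
}
```

Because the UIDs are plain strings chosen by the user, a resource can be written before the thing it references exists; a check like the hypothetical `missingUIDs` is one way a reconciler could decide to wait and retry rather than fail.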
We'll also need to gather information on how alert rule groups interact with subfolders.
Updated the proposal with more examples and some more findings.
Added an implementation detail in a comment. I don't know if it needs to be added to the design document, but it's good to think about.
When implementing this feature, it would be nice to use the new go-sdk for the Grafana API, as stated in #1357.
All in all I think this looks great. Nice job @theSuess
Would love some feedback on this PR from the community.
This picks up #911 and #1144. The proposal contains three different options for realizing alerting support in the operator and should serve as a base for discussion regarding this topic.