Build health metrics dimensions #952
Report per pipeline
A solution we are looking at could be to produce a histogram metric per pipeline and per result. It would be inspired by the standardized
I was waiting for the OTel CI/CD SIG to standardize such a metric; work is in progress with:
@christophe-kamphaus-jemmic, have you given any thought to such metrics? The metric could look like:
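A minimal Python sketch of what "a histogram per pipeline and per result" means in practice. It is not the plugin's implementation or the OTel CI/CD semantic convention; the bucket boundaries, pipeline names and result values are illustrative assumptions. Each bucket covers (previous boundary, boundary], as in OTel explicit-bucket histograms, with a final overflow bucket.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical explicit bucket upper bounds, in seconds (chosen for illustration only).
BOUNDARIES = [10, 60, 300, 900, 3600, 14400]


@dataclass
class Histogram:
    # One counter per finite bucket plus a final overflow bucket (> last boundary).
    counts: list = field(default_factory=lambda: [0] * (len(BOUNDARIES) + 1))
    total: float = 0.0
    count: int = 0

    def record(self, value: float) -> None:
        # bisect_left places a value equal to a boundary in that boundary's bucket,
        # so each bucket covers (previous_boundary, boundary].
        self.counts[bisect_left(BOUNDARIES, value)] += 1
        self.total += value
        self.count += 1


# One histogram per attribute set, keyed by (pipeline, result).
histograms = defaultdict(Histogram)


def record_run(pipeline: str, result: str, duration_s: float) -> None:
    histograms[(pipeline, result)].record(duration_s)


record_run("my-team/build", "SUCCESS", 42.0)
record_run("my-team/build", "SUCCESS", 7200.0)
# A two-day run ends up in the overflow bucket, indistinguishable from a five-hour one.
record_run("my-team/build", "FAILURE", 172800.0)
```

Whatever boundaries are chosen, every run longer than the last one is lumped into the same bucket, so bucket choice matters when job durations range from seconds to days.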
Report per pipeline step
High cardinality looks like an even bigger risk here. I'm wondering if we should instead solve this with metrics queries on the traces, similar to what TraceQL metrics queries offer.
Controlling cardinality
I'm thinking of helping Jenkins admins control the cardinality of such metrics by enabling allow and deny lists of pipeline names, as we have seen Jenkins instances with thousands of pipelines. @mrh666, is this the kind of idea you had in mind?
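One way allow and deny lists could work, sketched in Python under assumptions: the regexes, the full-name matching and the placeholder value `other` are hypothetical, not existing plugin settings. Pipelines that are not allowed, or that are denied, are folded into one placeholder attribute value, so the number of series stays bounded by the size of the allow list.

```python
import re

# Hypothetical admin configuration: regexes matched against the pipeline full name.
ALLOW_LIST = [re.compile(r"my-team/.*"), re.compile(r"release-.*")]
DENY_LIST = [re.compile(r".*/sandbox/.*")]

# Hypothetical placeholder for pipelines that are filtered out.
OTHER = "other"


def pipeline_attribute(pipeline: str) -> str:
    """Return the value to record as the pipeline attribute.

    Pipelines that match no allow-list entry, or match any deny-list entry,
    are folded into a single placeholder value: their runs are still counted,
    but they no longer add one time series per pipeline.
    """
    allowed = any(p.fullmatch(pipeline) for p in ALLOW_LIST)
    denied = any(p.fullmatch(pipeline) for p in DENY_LIST)
    return pipeline if allowed and not denied else OTHER


for name in ["my-team/build", "my-team/sandbox/try", "release-1.2", "legacy-job"]:
    print(name, "->", pipeline_attribute(name))
```

Dropping filtered-out runs entirely would be an equally valid design; folding them into one value keeps overall counts intact.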
@cyrille-leclerc that's exactly what I have in mind!
Your worries about cardinality are reasonable. In the InfluxDB world, it can easily kill the database's performance. But just make it optional, with something like otel.exporter.otlp.metrics.build_health.enabled
In the Dynatrace world, it's impossible or close to impossible. I've been digging into such functionality and haven't achieved any results.
Thanks, @mrh666. Can you please share with us:
Same question for build steps.
In the current project:
This one is really important!
Here is a proposal:
See:
Feedback welcome, cc @mrh666
Indeed, work in the OTel CI/CD SIG related to metrics is in progress. One issue I can see with using a histogram is that the chosen buckets might not give enough insight to take any action: some jobs might be very short, while others could take hours or even days to complete. I've had very good success using metrics queries on traces/spans for job duration as well as stage duration, using the steps I introduced in #827 (example use in a pipeline: #811 (comment)). This gave me very detailed statistics (e.g. average duration per day or per job) that are filterable per job.
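Statistics like average duration per day or per job can be computed from span start and end times alone. A minimal Python sketch, assuming spans have already been exported as simple records with a job name and UTC timestamps; the `Span` record is hypothetical, and this does not use the steps from #827 or TraceQL.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean


@dataclass
class Span:
    # Hypothetical minimal record of a job's root span, as exported by a tracing backend.
    job: str
    start: datetime
    end: datetime


def average_duration_per_job_per_day(spans):
    """Group spans by (job, UTC day) and average their durations in seconds."""
    durations = defaultdict(list)
    for s in spans:
        day = s.start.astimezone(timezone.utc).date()
        durations[(s.job, day)].append((s.end - s.start).total_seconds())
    return {key: mean(values) for key, values in durations.items()}


def at(hour, minute=0):
    # Illustrative timestamps on a single, arbitrary day.
    return datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)


spans = [
    Span("app/build", at(9), at(9, 10)),
    Span("app/build", at(14), at(14, 20)),
    Span("app/deploy", at(10), at(12)),
]
for (job, day), avg in sorted(average_duration_per_job_per_day(spans).items()):
    print(job, day, avg)
```

Because the exact durations are kept, no bucket boundaries have to be chosen up front; the trade-off is that the aggregation runs at query time over span data.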
Cardinality is certainly an issue when the number of time series scales with a dynamic value like the number of jobs managed by Jenkins. It's not as bad as having a separate time series per build, but it still needs to be managed (the prometheus-plugin has per-build metrics guarded by a checkbox config option). I think controlling which jobs generate this metric on the Jenkins side is a very good option. Alternatively, it's also possible to filter later:
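A back-of-the-envelope estimate of how the number of time series grows, assuming the histogram is exposed in Prometheus format (one `_bucket` series per boundary plus `+Inf`, plus `_sum` and `_count`) and using made-up counts: 1,000 pipelines, the 5 Jenkins build results (SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, ABORTED) and 15 bucket boundaries.

```python
def histogram_series(pipelines: int, results: int, boundaries: int) -> int:
    """Estimate the series one histogram metric exposes in Prometheus format.

    Each (pipeline, result) combination yields one `_bucket` series per
    boundary plus the `+Inf` bucket, one `_sum` series and one `_count` series.
    """
    per_combination = (boundaries + 1) + 2
    return pipelines * results * per_combination


# Illustrative numbers: 1,000 pipelines, 5 Jenkins results, 15 bucket boundaries.
print(histogram_series(1000, 5, 15))  # 90000
print(histogram_series(50, 5, 15))    # 4500 after restricting to 50 pipelines
```

The count is linear in the number of pipelines, which is why restricting which jobs emit the metric has such a direct effect.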
Cc @miraccan00 |
Please use the
See the documentation: https://github.com/jenkinsci/opentelemetry-plugin/blob/main/docs/monitoring-metrics.md#build-duration
I'm marking your enhancement request as solved. Please open new enhancement requests if needed.
@cyrille-leclerc Thank you! I've been trying to use it with Jenkins. Here is the config line from Jenkins:
I've changed those params many times, but I can't trigger the metric. For example:
Am I doing something wrong with the regex? Another point: please fix the documentation at https://github.com/jenkinsci/opentelemetry-plugin/blob/main/docs/monitoring-metrics.md#build-duration. You wrote:
I believe it should be:
Thanks for testing, @mrh666. For the documentation and https://github.com/jenkinsci/opentelemetry-plugin/pull/993/files#r1863138803
@cyrille-leclerc For example, the simple job: https://xxxxxxx/job/telemetry%20test%20pipe/
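Since the actual config line isn't shown here, this is only a guess at one possible pitfall: the job URL is percent-encoded (`%20`), while the job name itself is not. A small Python check, assuming the allow-list regex is matched against the decoded job name and must match the whole name (both assumptions, not confirmed plugin behavior):

```python
import re
from urllib.parse import unquote

# The job's URL path segment, as it appears in the job URL above.
url_segment = "telemetry%20test%20pipe"

# Assumption: a job allow list would be matched against the decoded job name
# ("telemetry test pipe"), not against the URL-encoded path segment.
job_name = unquote(url_segment)

patterns = {
    "url-encoded": re.compile(r"telemetry%20test%20pipe"),
    "literal name": re.compile(r"telemetry test pipe"),
    "prefix": re.compile(r"telemetry.*"),
}
for label, pattern in patterns.items():
    # fullmatch models a "whole name must match" rule, which is also an assumption.
    print(label, bool(pattern.fullmatch(job_name)))
```

Under these assumptions, only the patterns written against the decoded name match.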
@mrh666, please just configure:
Don't put
@cyrille-leclerc Maybe I'm doing something wrong? Here are the settings:
All metrics are coming into Dynatrace perfectly except ci.pipeline.run.duration.
Versions:
Can you verify that you have a histogram metric
@cyrille-leclerc
What could be causing this?
@mrh666, it seems that Dynatrace has just introduced support for OTel histogram metrics:
Friendly ping @mrh666 |
What feature do you want to see added?
@cyrille-leclerc, on the page https://plugins.jenkins.io/opentelemetry/ there is a screenshot of a Kibana dashboard (https://raw.githubusercontent.com/jenkinsci/opentelemetry-plugin/master/docs/images/kibana_jenkins_overview_dashboard.png) with all the graphs I need, e.g. job duration, failed steps, long steps, etc. How can we get those metrics exported to Dynatrace?
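Panels such as failed steps and long steps can be derived from step spans rather than from dedicated metrics. A minimal Python sketch of those two aggregations, assuming step spans have already been exported and flattened into records with the hypothetical fields below; it does not address the export to Dynatrace itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StepSpan:
    # Hypothetical flattened record of one pipeline step span.
    job: str
    step: str
    duration_s: float
    failed: bool


def failed_steps(spans):
    """Count failed executions per (job, step)."""
    return Counter((s.job, s.step) for s in spans if s.failed)


def longest_steps(spans, n=3):
    """Return the n slowest step executions, slowest first."""
    return sorted(spans, key=lambda s: s.duration_s, reverse=True)[:n]


spans = [
    StepSpan("app/build", "sh: make", 120.0, False),
    StepSpan("app/build", "sh: make test", 900.0, True),
    StepSpan("app/build", "sh: make test", 850.0, True),
    StepSpan("app/deploy", "kubectl apply", 30.0, False),
]
print(failed_steps(spans).most_common(1))
print([s.duration_s for s in longest_steps(spans, n=2)])
```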
Upstream changes
No response
Are you interested in contributing this feature?
No response