[FEATURE] adding otlp endpoint #7996
Thanks for the efforts here! Looks good already. 🙂
I did recommend copying the translator struct over to use thanos protos natively instead of doing a full conversion, but it could be a maintenance burden.
Let's wait for other maintainers to have a look as well.
Couple of comments
Thanks for starting this work @nicolastakashi ❤️, I'm adding a couple more suggestions
Thanks! ❤️
This already looks quite good. @nicolastakashi we can probably mark it ready for review, and get others to have a look as well?
LGTM - Thanks for working on this, this makes the project much more relevant in the whole Observability space.
That said, I am not a huge fan of copying files over, but I am okay with it in the name of optimization. The code for the handler looks good; I just wanted to see more code reuse in general (but that is not a problem to solve in this PR). We already have duplication because of Cap'n Proto ingestion, and now also because of OTLP.
Once again, amazing work Nicolas! 💪
Thanks again @nicolastakashi 🙌, this is looking better with every pass.
I'm adding a couple more suggestions to improve the parameters. I hope that's OK and doesn't add too much effort.
Two more things:
- I think it would be good to have a separate documentation section to explain the OTLP endpoint, and mainly to explain the behavior of resource attributes and how the parameters affect the ingested metrics / labels.
- With regards to the translator code, I'm personally also leaning towards just importing it instead of copying all of the code, and then seeing if we need to copy the code over for further optimization. But it's not a strong position; if people see value in including this optimization from the get-go, I'm not opposed.
Couple more additions to the previous changes related to the parameters, otherwise looks good on my side!
Apart from pending comments + conflict, I think it looks good now!
I think there are a few things to note/remember as TODOs:
- otlptranslator duplication to use zlabels. I think having it here works, but would cause some maintenance burden overall, we can choose to remove later if that were to be the case (or even upstream zlabels)
- We are also duplicating a lot of handler code. I had to kind of do the same initially in Support remote write 2.0 on receive #8033 for remote write 2.0. But I think once this is merged, I'll update my PR to make things around handling/replication more generic for rw1, 2.0, otlp and capnproto!
- Having a blog post + more docs about this feature once it lands, maybe even with some reference arch :)
}
}

req, err := remote.DecodeOTLPWriteRequest(r)
This will be called lots of times: https://github.com/prometheus/prometheus/blob/1ea9b72997a116b4a7f7a22635c16d71dc5ab440/storage/remote/codec.go#L873. Maybe this function could accept a []byte so that we could pool this buffer?
Yes, makes sense to me.
Do you think we should change it upstream before we merge this?
Or would you prefer to copy this function and maintain it in Thanos as well?
5ad37d2 to 14179f7
Signed-off-by: Nicolas Takashi <[email protected]>
Co-authored-by: Saswata Mukherjee <[email protected]> Signed-off-by: Nicolas Takashi <[email protected]>
Co-authored-by: Matej Gera <[email protected]> Signed-off-by: Nicolas Takashi <[email protected]>
14179f7 to eda86c3
Signed-off-by: Nicolas Takashi <[email protected]>
@nicolastakashi seems like the test failure is related?
Matej, this seems to be CI flakiness. I can't reproduce the issue locally, and the errors I can see are timeouts.
Signed-off-by: Saswata Mukherjee <[email protected]>