-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sources should report stream latency of stuck events #104
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
istreeter
added a commit
to snowplow-incubator/snowplow-lake-loader
that referenced
this pull request
Jan 3, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and Pubsub sources: - PubSub source completely re-written to be a wrapper around UnaryPull snowplow-incubator/common-streams#101 - Kinesis source is more efficient when the stream is re-sharded snowplow-incubator/common-streams#102 - Kinesis source better tuned for larger deployments snowplow-incubator/common-streams#99 And improvements to latency metrics: - Sources should report stream latency of stuck events snowplow-incubator/common-streams#104
istreeter
added a commit
to snowplow-incubator/snowplow-bigquery-loader
that referenced
this pull request
Jan 3, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and Pubsub sources: - PubSub source completely re-written to be a wrapper around UnaryPull snowplow-incubator/common-streams#101 - Kinesis source is more efficient when the stream is re-sharded snowplow-incubator/common-streams#102 - Kinesis source better tuned for larger deployments snowplow-incubator/common-streams#99 And improvements to latency metrics: - Sources should report stream latency of stuck events snowplow-incubator/common-streams#104
pondzix
reviewed
Jan 6, 2025
modules/streams-core/src/main/scala/com.snowplowanalytics.snowplow/sources/SourceAndAck.scala
Show resolved
Hide resolved
Currently in common-streams apps, the _application_ (i.e. not this library) is responsible for tracking latency of events consumed from the stream. Because of the way it is implemented in the apps, if an event gets stuck (e.g. cannot be written to destination) then the latency drops to zero for the period when the app is retrying the stuck event. This commit aims to fix the problem where the `latency_millis` metric wrongly drops to zero. 1. The source itself now tracks the latency of the event which is currently being processed by the downstream application. 2. The Metrics class allows metric starting values to be wrapped in and Effect, i.e. `F[Metrics.State]`. This means the application can pass in the latency reported by the Source.
The `SourceAndAck` is now responsible for pushing the latency metric. This consolidates the metric across multiple apps into one place. The `TokenedEvents` now does not contain a stream timestamp. The downstream app has no business knowing the stream timestamp, now that latency is calculated inside common-streams lib. The `LowLevelSource` no longer provides a `lastLiveness`. Instead, the low level source is expected to periodically emit a `None` whenever it is healthy. The higher level `SourceAndAck` then takes responsibility for monitoring whether the low level source is healthy. As a result of these changes, the `SourceAndAck` emits a latency metric of zero whenever the source is genuinely healthy but there are no events to be pulled. This means the app's latency metric is only zero if there are no events. This is better than the previous situation, where the app might mis-leadingly emit a zero latency when the app is stuck, e.g. cannot connect to warehouse.
pondzix
approved these changes
Jan 9, 2025
istreeter
force-pushed
the
fix-latency-metric
branch
from
January 9, 2025 17:29
9b1b094
to
78da1d5
Compare
istreeter
added a commit
to snowplow-incubator/snowplow-lake-loader
that referenced
this pull request
Jan 10, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and Pubsub sources: - PubSub source completely re-written to be a wrapper around UnaryPull snowplow-incubator/common-streams#101 - Kinesis source is more efficient when the stream is re-sharded snowplow-incubator/common-streams#102 - Kinesis source better tuned for larger deployments snowplow-incubator/common-streams#99 And improvements to latency metrics: - Sources should report stream latency of stuck events snowplow-incubator/common-streams#104
istreeter
added a commit
to snowplow-incubator/snowplow-bigquery-loader
that referenced
this pull request
Jan 14, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and Pubsub sources: - PubSub source completely re-written to be a wrapper around UnaryPull snowplow-incubator/common-streams#101 - Kinesis source is more efficient when the stream is re-sharded snowplow-incubator/common-streams#102 - Kinesis source better tuned for larger deployments snowplow-incubator/common-streams#99 And improvements to latency metrics: - Sources should report stream latency of stuck events snowplow-incubator/common-streams#104
istreeter
added a commit
to snowplow-incubator/snowplow-bigquery-loader
that referenced
this pull request
Jan 14, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and Pubsub sources: - PubSub source completely re-written to be a wrapper around UnaryPull snowplow-incubator/common-streams#101 - Kinesis source is more efficient when the stream is re-sharded snowplow-incubator/common-streams#102 - Kinesis source better tuned for larger deployments snowplow-incubator/common-streams#99 And improvements to latency metrics: - Sources should report stream latency of stuck events snowplow-incubator/common-streams#104
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently in common-streams apps, the application (i.e. not this library) is responsible for tracking latency of events consumed from the stream. Because of the way it is implemented in the apps, if an event gets stuck (e.g. cannot be written to destination) then the latency drops to zero for the period when the app is retrying the stuck event.
This commit aims to fix the problem where the
latency_millis
metric wrongly drops to zero.F[Metrics.State]
. This means the application can pass in the latency reported by the Source.