Support metric to monitor import lag #179
Hiya! There's only one real answer in software engineering: "it depends." So here are some thoughts and questions. The key questions are about your expected usage and what you're looking to use this on. Effectively you'd be running a COUNT in addition to the query that fetches rows up to the MAX row per poll, then submitting this metric as well as storing it. Assuming it's just for the initial spool-up, this could simply be a COUNT run on startup of the task; you'd then more or less see the lag run down to zero over time, which would be relatively easy to manage, if a touch awkward. BUT if you are running fast enough that you expect the connector to fall behind a lot, you'd likely be better served by something that reads directly from the write-ahead log rather than querying. Questions:
We plan to use such a metric for the initial spool-up. After all rows are imported, other processes in our application landscape can start processing. In other words, other processes must wait for the initial spool-up to finish, and our main goal is to recognize that the spool-up is finished. Technically, it would be a count on the database. If I understand you correctly, you suggest implementing the count query in jdbcSourceTask.start(…) or jdbcSourceTask.poll, depending on whether the count must be updated or not. To answer your questions:
Short version:
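The approach discussed above — one COUNT when the task starts, then counting down as batches are polled — could be sketched roughly like this. The class and method names here are illustrative, not part of the existing connector code:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Tracks how many rows of the initial import are still unread.
 * Hypothetical helper; kept in memory only, as discussed above.
 */
class ImportLagTracker {
    private final AtomicLong remaining;

    ImportLagTracker(long totalRows) {
        this.remaining = new AtomicLong(totalRows);
    }

    // Run once at task startup: count the rows this task will import.
    static ImportLagTracker fromCount(Connection conn, String table) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return new ImportLagTracker(rs.getLong(1));
        }
    }

    // Call from poll() with the size of each returned batch.
    void recordPolled(int batchSize) {
        remaining.addAndGet(-batchSize);
    }

    long remaining() {
        return Math.max(0, remaining.get());
    }

    // "Spool-up finished" signal for downstream processes.
    boolean spoolUpFinished() {
        return remaining() == 0;
    }
}
```

Because the count is taken once and only decremented, the value can drift if rows are inserted during the spool-up; for the static-table scenario described above that should be acceptable.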
Hey @Ugbot, when you talk about metrics, do you mean a metric published via a JMX MBean? Or is there another way to provide metrics? I'm not sure I got what you mean by 'metrics'.
Hiya! Storing this data is a bit of an issue. You could put it into your DB, but from the description of the task it does not actually need to be stored. The table is not going to change, so having to do the count again on task startup should not be a huge hit. I'd literally keep it in memory and just have it published with the task poll logging. I'll need to dig into the JMX stuff a bit to see if I can confirm this will work as expected.
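If JMX turns out to be the route, publishing the in-memory count as a standard MBean is fairly small. A minimal sketch — the ImportLag/ImportLagMBean names and the kafka.connect.jdbc domain are assumptions for illustration, not existing connector code:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Standard MBean convention: the interface name must be the
// implementation class name plus the "MBean" suffix.
interface ImportLagMBean {
    long getRemainingRows();
}

// Publishes the remaining-row count over JMX (hypothetical names).
class ImportLag implements ImportLagMBean {
    private final AtomicLong remaining;

    ImportLag(long totalRows) {
        this.remaining = new AtomicLong(totalRows);
    }

    public long getRemainingRows() {
        return remaining.get();
    }

    // Decrement from poll() with each batch size.
    void recordPolled(int batchSize) {
        remaining.addAndGet(-batchSize);
    }

    // Register on the platform MBeanServer so JMX exporters
    // (e.g. the Prometheus JMX exporter) can scrape it.
    static ImportLag register(long totalRows, String connectorName) throws Exception {
        ImportLag bean = new ImportLag(totalRows);
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(
            "kafka.connect.jdbc:type=import-lag,connector=" + connectorName);
        server.registerMBean(bean, name);
        return bean;
    }
}
```

The getter `getRemainingRows` surfaces as a `RemainingRows` attribute, which the Prometheus JMX exporter can pick up with a matching rule.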
Hi @Ugbot, I did some programming and opened a pull request with a solution!
Is there anything I can do to speed up the merge process?
Hey @Ugbot, I'd like to kindly ask whether there is any feedback from the code review?
We need a metric to measure the import lag while processing messages from a JDBC source. We were wondering whether it is possible to add further metrics to the source connector so that they appear alongside the Kafka Connect Prometheus metrics.
Our goal: especially during the initial import of database rows, we want to monitor how many rows still need to be imported and have not yet been processed.
There is a metric that reports the number of polled rows (e.g. source-record-poll-total). It represents the number of rows the connector has already read from the database; one cannot determine from it how many unpolled rows are left in the database and still need to be read. I am new to Kafka Connect plugin development, so first of all: is it possible to add such a metric to this Kafka Connect plugin? Second, if we implemented that metric, would you merge that pull request into your code base?
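For reference, a lag value can also be derived outside the connector by combining a one-off table count with the existing source-record-poll-total counter read over JMX. The sketch below registers a stand-in DynamicMBean in place of a running Connect worker so it is self-contained; the kafka.connect:type=source-task-metrics name follows Kafka's monitoring documentation, but real workers may quote the connector and task values, so verify the exact ObjectName with jconsole:

```java
import java.lang.management.ManagementFactory;
import javax.management.*;

// Derives import lag as: total rows in the table minus rows already
// polled, using Connect's source-record-poll-total task metric.
class ImportLagReader {
    static long importLag(MBeanServer server, String connector, int task,
                          long totalRows) throws Exception {
        ObjectName name = new ObjectName(
            "kafka.connect:type=source-task-metrics,connector=" + connector
                + ",task=" + task);
        double polled = ((Number) server.getAttribute(
            name, "source-record-poll-total")).doubleValue();
        return Math.max(0L, totalRows - (long) polled);
    }
}

// Stand-in for the worker's metrics MBean, so the sketch runs alone.
// Kafka's metrics are DynamicMBeans since attribute names contain dashes.
class StubPollMetrics implements DynamicMBean {
    private final double polled;
    StubPollMetrics(double polled) { this.polled = polled; }
    public Object getAttribute(String attribute) { return polled; }
    public void setAttribute(Attribute attribute) { }
    public AttributeList getAttributes(String[] attributes) { return new AttributeList(); }
    public AttributeList setAttributes(AttributeList attributes) { return new AttributeList(); }
    public Object invoke(String actionName, Object[] params, String[] signature) { return null; }
    public MBeanInfo getMBeanInfo() {
        return new MBeanInfo(getClass().getName(), "stub poll metrics",
            new MBeanAttributeInfo[] {
                new MBeanAttributeInfo("source-record-poll-total", "double",
                    "records polled so far", true, false, false) },
            null, null, null);
    }
}
```

This only approximates lag for the initial spool-up of a static table, since source-record-poll-total counts all records ever polled by the task, not rows remaining.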