
Add ingest/processed/bytes metric #17581

Open · neha-ellur wants to merge 10 commits into master
Conversation

@neha-ellur commented Dec 17, 2024

A new metric, ingest/processed/bytes, has been introduced to track the total number of bytes processed during ingestion, covering native batch ingestion, streaming ingestion, and multi-stage query (MSQ) ingestion tasks. This metric provides a unified view of data processing across the different ingestion pathways.

Key changed/added classes in this PR

This metric was added in four key ingestion task classes (a condensed sketch of the emission pattern follows the list):

  • IndexTask: a sequential ingestion task. The processed bytes are retrieved from the RowIngestionMetersTotals object (buildSegmentsMeters) and emitted directly after segment publication.
  • ParallelIndexSupervisorTask: a task that supervises parallel ingestion. Processed bytes are aggregated from the subtasks' ingestion metrics (RowIngestionMetersTotals).
  • SeekableStreamIndexTaskRunner: a runner for ingestion tasks that consume data from seekable streams (e.g., Kafka). The processed bytes are calculated from the size of the data buffers (e.g., ByteEntity buffers) for each record, and the metric is emitted per processed record.
  • ControllerImpl (MSQ): the ingest/processed/bytes metric is emitted by aggregating bytes processed across all stages and workers during MSQ task execution: counters are fetched from the CounterSnapshotsTree, bytes from all input channels are summed for each worker and stage using stream-based aggregation, and the aggregated total is emitted as ingest/processed/bytes for the entire MSQ task.
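In each of these classes the emission itself follows the same basic pattern. A condensed sketch, adapted from the IndexTask change later in this PR (toolbox, task, and rowIngestionMeters are assumed to be in scope in the enclosing task):

  // Build the metric event with the standard task dimensions and emit the byte count.
  final ServiceMetricEvent.Builder metricBuilder = new ServiceMetricEvent.Builder();
  IndexTaskUtils.setTaskDimensions(metricBuilder, task);
  toolbox.getEmitter().emit(
      metricBuilder.setMetric("ingest/processed/bytes", rowIngestionMeters.getProcessedBytes())
  );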

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions bot added the labels Area - Batch Ingestion and Area - MSQ (for multi-stage queries: https://github.com/apache/druid/issues/12262) on Dec 18, 2024
@kfaraz self-requested a review on December 18, 2024
@apache deleted a comment from neha-ellur on Dec 18, 2024
@@ -329,6 +331,27 @@ public void run(final QueryListener queryListener) throws Exception
}
// Call onQueryComplete after Closer is fully closed, ensuring no controller-related processing is ongoing.
queryListener.onQueryComplete(reportPayload);

long totalProcessedBytes = reportPayload.getCounters().copyMap().values().stream()
@cryptoe (Contributor) · Jan 7, 2025:
This seems like the wrong place to put this logic.
ingest/processed/bytes seems like an ingestion-only metric, no?
If that is the case, we should emit the metric only if the query is an ingestion query.

You could probably expose a method here https://github.com/apache/druid/blob/9bebe7f1e5ab0f40efbff620769d0413c943683c/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java#L517 that emits summary metrics, and pass it the task report and the query.
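A hypothetical shape for that hook (the method name, parameters, and the isIngestion check are illustrative, not the actual API):

  // Illustrative only: a summary-metrics hook on ControllerImpl.
  private void emitSummaryMetrics(final MSQTaskReportPayload reportPayload, final MSQSpec querySpec)
  {
    // Only emit the ingestion-only metric for INSERT/REPLACE queries.
    if (MSQControllerTask.isIngestion(querySpec)) {
      // computeTotalProcessedBytes is a hypothetical helper; see the refactor sketch further down.
      emitMetric("ingest/processed/bytes", computeTotalProcessedBytes(reportPayload.getCounters()));
    }
  }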

@neha-ellur (Author) replied:

Moved the logic

@cryptoe (Contributor) replied:
I think the place it has been moved to is correct.
Rather than ingest in the metric name, can we rename the metric to input/processed/bytes or something, since we would want that metric for MSQ SELECT queries as well?

Also, the MSQ code might need to be adjusted so that only leaf nodes contribute to this metric, no? Otherwise an equivalent batch ingest with range partitioning will show fewer processed bytes, since the shuffle stage's input is not being counted. A simple test should be sufficient to rule this out.

Try a query like replace bar all using select * from extern(http) partitioned by day clustered by col1, and an equivalent range-partitioning spec for batch ingestion over the same http input source.
cc @kfaraz

@neha-ellur (Author) · Jan 11, 2025:

@cryptoe This metric will be used in the billing console and should be named ingest/processed/bytes.
Regarding the MSQ change to count only leaf nodes, where would that go? Also, any pointers to existing tests would be helpful; this is my first time in this area of the code.

@cryptoe (Contributor) commented Jan 7, 2025:

Also there are some static check failures which need to be looked at.

@neha-ellur (Author) commented Jan 9, 2025:

> Also there are some static check failures which need to be looked at.

@cryptoe fixed

Comment on lines 661 to 664
long bytesProcessed = 0;
// Sum the remaining bytes across all buffers for this record.
for (ByteEntity entity : record.getData()) {
  bytesProcessed += entity.getBuffer().remaining();
}
Contributor:

Can we just reuse the rowIngestionMeters.getProcessedBytes() instead of computing the value explicitly here?
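A sketch of the suggested simplification, assuming the runner already holds the rowIngestionMeters reference:

  // The meters already accumulate processed bytes, so there is no need to re-sum the buffers.
  final long bytesProcessed = rowIngestionMeters.getProcessedBytes();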

@neha-ellur (Author) replied:

Updated

Comment on lines 542 to 556
long totalProcessedBytes = msqTaskReportPayload.getCounters() != null
    ? msqTaskReportPayload.getCounters().copyMap().values().stream().mapToLong(
        integerCounterSnapshotsMap -> integerCounterSnapshotsMap.values().stream()
            .mapToLong(counterSnapshots -> {
              Map<String, QueryCounterSnapshot> workerCounters = counterSnapshots.getMap();
              return workerCounters.entrySet().stream().mapToLong(
                  channel -> {
                    if (channel.getKey().startsWith("input")) {
                      ChannelCounters.Snapshot snapshot = (ChannelCounters.Snapshot) channel.getValue();
                      return snapshot.getBytes() == null ? 0L : Arrays.stream(snapshot.getBytes()).sum();
                    }
                    return 0L;
                  }).sum();
            }).sum()).sum()
    : 0;
Contributor:

This seems a little difficult to read.
Can we clean up this logic a little, maybe by returning early when msqTaskReportPayload.getCounters() is null?
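One possible cleanup along those lines, sketched here using only the types visible in the snippet above (the helper name and @Nullable annotation are illustrative):

  // Sketch: the same aggregation, restructured with an early return and plain loops.
  private static long computeTotalProcessedBytes(@Nullable final CounterSnapshotsTree counters)
  {
    if (counters == null) {
      return 0L;
    }
    long totalBytes = 0L;
    for (Map<Integer, CounterSnapshots> workersForStage : counters.copyMap().values()) {
      for (CounterSnapshots workerSnapshots : workersForStage.values()) {
        for (Map.Entry<String, QueryCounterSnapshot> channel : workerSnapshots.getMap().entrySet()) {
          // Only input channels contribute to processed bytes.
          if (channel.getKey().startsWith("input")) {
            final ChannelCounters.Snapshot snapshot = (ChannelCounters.Snapshot) channel.getValue();
            if (snapshot.getBytes() != null) {
              totalBytes += Arrays.stream(snapshot.getBytes()).sum();
            }
          }
        }
      }
    }
    return totalBytes;
  }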

@neha-ellur (Author) replied:

Refactored the code.

final ServiceMetricEvent.Builder metricBuilder = new ServiceMetricEvent.Builder();
IndexTaskUtils.setTaskDimensions(metricBuilder, task);
toolbox.getEmitter().emit(
    metricBuilder.setMetric("ingest/processed/bytes", rowIngestionMeters.getProcessedBytes())
);
@kfaraz (Contributor) · Jan 15, 2025:

Shouldn't this metric be emitted once per task, probably at the end of the run method?
The other task types seem to do that.

@kfaraz (Contributor) commented Jan 15, 2025:

@neha-ellur, just found PR #14582.
I wonder if the changes here are even needed, since the ingest/input/bytes metric already contains the processed bytes for index, index_kafka, and some other task types.

We probably just need to wire up things for MSQ tasks.

Labels: Area - Batch Ingestion · Area - Ingestion · Area - MSQ (for multi-stage queries: https://github.com/apache/druid/issues/12262)