
[POC][DEPRECATED] exporter batcher - byte size based batching #12017

Draft · wants to merge 2 commits into main
Conversation

@sfc-gh-sili sfc-gh-sili commented Jan 5, 2025

DEPRECATED: see #12091 for the updated POC.

Description

This is a POC of serialized-size-based batching.

Configuration is supported via an additional field to MaxSizeConfig.

type MaxSizeConfig struct {
	MaxSizeItems int `mapstructure:"max_size_items"`
	MaxSizeBytes int `mapstructure:"max_size_bytes"`
}

We will validate that at most one of the above fields is specified (TODO) and switch between item-count-based and byte-size-based batching accordingly.
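The mutual-exclusion check described above could look like the following sketch. The `Validate` method name and error wording are assumptions for illustration, not the PR's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// MaxSizeConfig mirrors the struct from the description.
type MaxSizeConfig struct {
	MaxSizeItems int `mapstructure:"max_size_items"`
	MaxSizeBytes int `mapstructure:"max_size_bytes"`
}

// Validate rejects configurations that set both limits at once
// (sketch of the TODO mentioned above).
func (c MaxSizeConfig) Validate() error {
	if c.MaxSizeItems > 0 && c.MaxSizeBytes > 0 {
		return errors.New("at most one of max_size_items and max_size_bytes may be specified")
	}
	return nil
}

func main() {
	fmt.Println((MaxSizeConfig{MaxSizeItems: 10, MaxSizeBytes: 100}).Validate())
}
```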

To get the byte size of OTLP protos, this PR updates pdata/internal/cmd/pdatagen/internal/templates/message.go.tmpl to expose a Size() method. This change applies to all pdatagen-generated files.
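A byte-size batcher can then consult Size() before merging a request into the current batch. The interface and helper names below are assumptions sketched for illustration, not the PR's actual code:

```go
package main

import "fmt"

// sizer abstracts the Size() method that pdatagen-generated types
// expose in this POC (interface name is an assumption).
type sizer interface {
	Size() int
}

// exceedsByteLimit reports whether merging req into a batch currently
// holding batchBytes would cross maxSizeBytes, i.e. whether the batcher
// must flush or split first. Hypothetical helper.
func exceedsByteLimit(batchBytes int, req sizer, maxSizeBytes int) bool {
	return batchBytes+req.Size() > maxSizeBytes
}

// fakeReq stands in for a pdata-backed request in this sketch.
type fakeReq struct{ bytes int }

func (r fakeReq) Size() int { return r.bytes }

func main() {
	fmt.Println(exceedsByteLimit(900, fakeReq{bytes: 200}, 1000)) // true: must split or flush
}
```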


Performance Benchmark

Conclusions

  • Byte-size-based batching is more expensive.
  • Both merging and splitting are expensive.
    • The cost of splitting depends on the number of logs that are "extracted". The worst case is when the incoming request has to be split, but the remaining capacity is large enough to hold most of its logs.
  • As a best practice, users should make sure maxSizeLimit is large enough for most requests (Assumption 1).
  • Under Assumption 1, Optimization 1 would save us from the worst case above.
  • Optimization 2 can significantly improve performance at the cost of accuracy. It helps when many requests are merged into a single batch.

Benchmark 1

This benchmark merge-splits 1000 requests of 10 logs each. None of the incoming requests results in a split. The accumulated batch is ~1 MB.

BenchmarkSplittingBasedOnItemCountManySmallLogs-10    	     338	   3,551,946 ns/op
BenchmarkSplittingBasedOnByteSizeManySmallLogs-10     	       9	 121,470,181 ns/op

Benchmark 2

The following benchmarks test the case where every request would result in a split.

Case 1: Merging 10 requests, each containing 10001 logs (~1 MB) and slightly above the limit

BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      49	  24,918,513 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	3,388,820,625 ns/op

Case 2: Merging 10 requests, each containing 9999 logs (~1 MB) and slightly below the limit

BenchmarkSplittingBasedOnItemCountManyLogsSlightlyBelowLimit-10    	      54	  22,659,051 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyBelowLimit-10     	      37	  31,614,578 ns/op

Benchmark 3

This benchmark merge-splits a request with 100000 logs (9.6 MB) into 10 batches. Given Assumption 1 above, this should rarely occur in practice.

BenchmarkSplittingBasedOnItemCountHugeLog-10    	      40	  30,575,706 ns/op
BenchmarkSplittingBasedOnByteSizeHugeLog-10     	       1	9,909,146,541 ns/op

Optimization 1

Instead of splitting precisely, simply start a new batch whenever the incoming request would exceed the remaining capacity.
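The idea can be sketched as a simple packing loop. This is a hypothetical illustration over request byte sizes, not the PR's implementation:

```go
package main

import "fmt"

// packWithoutSplitting implements the Optimization 1 idea: instead of
// splitting a request exactly at the byte limit, flush the current batch
// and start a fresh one whenever merging would overflow. Batches may end
// up under-filled, but no request is ever split.
func packWithoutSplitting(reqSizes []int, maxSizeBytes int) [][]int {
	var batches [][]int
	var cur []int
	curBytes := 0
	for _, s := range reqSizes {
		if curBytes+s > maxSizeBytes && len(cur) > 0 {
			// Would overflow: close the current batch instead of splitting.
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, s)
		curBytes += s
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	fmt.Println(packWithoutSplitting([]int{600, 500, 400}, 1000)) // [[600] [500 400]]
}
```

This also shows why the optimization backfires when requests are slightly above the limit: every request lands in its own batch, and oversized requests still have to be handled separately.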
Running Benchmark 2 again with optimization:

Case 1: Merging 10 requests, each containing 10001 logs (~1 MB) and slightly above the limit

// Before optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      49	  24,918,513 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	3,388,820,625 ns/op

// After optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      48	  24,296,155 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	11,746,665,625 ns/op

Takeaway: if the size limit is not large enough, this optimization may hurt performance.

Case 2: Merging 10 requests, each containing 9999 logs (~1 MB) and slightly below the limit

// Before optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyBelowLimit-10    	      54	  22,659,051 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyBelowLimit-10     	      37	  31,614,578 ns/op

// After optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyBelowLimit-10    	      52	  22,252,627 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyBelowLimit-10     	      43	  25,942,651 ns/op

Optimization 2

Optimization 2 trades accuracy for speed. For example, with Benchmark 1, the actual byte size is 1,010,000 while the estimated byte size is 1,011,010. The delta is expected to be larger in real deployments, since logs will carry more metadata.

(The original cost is not high either, so Optimization 2 is not essential.)

// Before optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      49	  24,918,513 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	3,388,820,625 ns/op

// After optimization
BenchmarkSplittingBasedOnItemCountManySmallLogs-10    	     340	   3,470,374 ns/op
BenchmarkSplittingBasedOnByteSizeManySmallLogs-10     	     493	   2,784,610 ns/op
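The caching idea behind Optimization 2 (also echoed in the Jan 14 feedback below) can be sketched as follows. All names here are hypothetical illustrations, not the PR's code:

```go
package main

import "fmt"

// request caches its serialized size once at creation, so the batcher
// never re-serializes to measure it (the Optimization 2 idea).
type request struct {
	cachedSize int
}

// estimateBatchSize estimates the merged batch's size as the sum of the
// cached per-request sizes. The estimate can drift from the true merged
// size (e.g. shared resource/scope metadata counted once per request),
// which is the inaccuracy the benchmark above observed
// (1,011,010 estimated vs 1,010,000 actual).
func estimateBatchSize(reqs []request) int {
	total := 0
	for _, r := range reqs {
		total += r.cachedSize
	}
	return total
}

func main() {
	fmt.Println(estimateBatchSize([]request{{100}, {250}})) // 350
}
```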

Link to tracking issue

Fixes #3262

Testing

Documentation

TODO: ByteSize() should return int64 instead of int


codecov bot commented Jan 5, 2025

Codecov Report

Attention: Patch coverage is 5.88235% with 192 lines in your changes missing coverage. Please review.

Project coverage is 90.98%. Comparing base (306c939) to head (43e811c).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
exporter/exporterhelper/logs_batch.go 3.75% 77 Missing ⚠️
exporter/exporterbatcher/config.go 0.00% 8 Missing and 4 partials ⚠️
exporter/internal/queue/default_batcher.go 75.00% 2 Missing and 1 partial ⚠️
exporter/exporterhelper/internal/request.go 0.00% 2 Missing ⚠️
exporter/exporterhelper/logs.go 0.00% 2 Missing ⚠️
exporter/exporterhelper/metrics.go 0.00% 2 Missing ⚠️
exporter/exporterhelper/traces.go 0.00% 2 Missing ⚠️
...xporter/exporterhelper/xexporterhelper/profiles.go 0.00% 2 Missing ⚠️
pdata/pcommon/generated_instrumentationscope.go 0.00% 2 Missing ⚠️
pdata/pcommon/generated_resource.go 0.00% 2 Missing ⚠️
... and 43 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12017      +/-   ##
==========================================
- Coverage   91.70%   90.98%   -0.73%     
==========================================
  Files         455      455              
  Lines       24053    24253     +200     
==========================================
+ Hits        22058    22066       +8     
- Misses       1625     1812     +187     
- Partials      370      375       +5     


@sfc-gh-sili sfc-gh-sili force-pushed the sili-byte-size branch 3 times, most recently from 812f050 to d156151 Compare January 8, 2025 02:49
@sfc-gh-sili sfc-gh-sili force-pushed the sili-byte-size branch 2 times, most recently from 6f4e460 to e86780d Compare January 14, 2025 02:19
@sfc-gh-sili (Contributor, Author) commented Jan 14, 2025

Feedback from Bogdan:

  • Performance numbers look bad.
  • We should be able to do Optimization 2 without compromising accuracy (take a look at the Size() implementation).
  • Cache the size in the request (as in Optimization 2).

@sfc-gh-sili sfc-gh-sili force-pushed the sili-byte-size branch 2 times, most recently from a8a28cd to 716bbda Compare January 15, 2025 02:53
@sfc-gh-sili sfc-gh-sili changed the title [POC] exporter batcher - byte size based batching [POC][DEPRECATED] exporter batcher - byte size based batching Jan 15, 2025
Development

Successfully merging this pull request may close these issues.

[processors/batch] Size calculations inconsistent, causing unbounded batch size in bytes