
[POC][DEPRECATED] exporter batcher - byte size based batching #12017

Draft · wants to merge 2 commits into main
Conversation

@sfc-gh-sili sfc-gh-sili commented Jan 5, 2025

DEPRECATED: see #12091 for the updated POC.

Description

This is a POC of serialized-size-based batching.

Configuration is supported via an additional field to MaxSizeConfig.

type MaxSizeConfig struct {
	MaxSizeItems int `mapstructure:"max_size_items"`
	MaxSizeBytes int `mapstructure:"max_size_bytes"`
}

We will validate that at most one of the above fields is specified (TODO) and switch between item-count-based and byte-size-based batching accordingly.
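The mutual-exclusion check described above could look like the following sketch. The `Validate` method name and error wording are assumptions for illustration, not the PR's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// MaxSizeConfig mirrors the struct from the description.
type MaxSizeConfig struct {
	MaxSizeItems int `mapstructure:"max_size_items"`
	MaxSizeBytes int `mapstructure:"max_size_bytes"`
}

// Validate rejects configurations that set both limits at once
// (sketch of the TODO mentioned above).
func (c MaxSizeConfig) Validate() error {
	if c.MaxSizeItems > 0 && c.MaxSizeBytes > 0 {
		return errors.New("at most one of max_size_items and max_size_bytes may be specified")
	}
	return nil
}

func main() {
	fmt.Println((MaxSizeConfig{MaxSizeItems: 10, MaxSizeBytes: 100}).Validate())
}
```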

To get the byte size of OTLP protos, this PR updates pdata/internal/cmd/pdatagen/internal/templates/message.go.tmpl to expose a Size() method. This change applies to all pdatagen-generated files.
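A byte-size batcher can then consult Size() before merging a request into the current batch. The interface and helper names below are assumptions sketched for illustration, not the PR's actual code:

```go
package main

import "fmt"

// sizer abstracts the Size() method that pdatagen-generated types
// expose in this POC (interface name is an assumption).
type sizer interface {
	Size() int
}

// exceedsByteLimit reports whether merging req into a batch currently
// holding batchBytes would cross maxSizeBytes, i.e. whether the batcher
// must flush or split first. Hypothetical helper.
func exceedsByteLimit(batchBytes int, req sizer, maxSizeBytes int) bool {
	return batchBytes+req.Size() > maxSizeBytes
}

// fakeReq stands in for a pdata-backed request in this sketch.
type fakeReq struct{ bytes int }

func (r fakeReq) Size() int { return r.bytes }

func main() {
	fmt.Println(exceedsByteLimit(900, fakeReq{bytes: 200}, 1000)) // true: must split or flush
}
```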


Performance Benchmark

Conclusions

  • Byte-size-based batching is more expensive.
  • Both merging and splitting are expensive.
    • The cost of splitting depends on the number of logs that are "extracted". The worst case is when the incoming request has to be split, but the remaining capacity is large enough to hold most of its logs.
  • As a best practice, users should make sure maxSizeLimit is large enough for most requests (Assumption 1).
  • Under Assumption 1, Optimization 1 would save us from the worst case above.
  • Optimization 2 can significantly improve performance at the cost of accuracy. It helps when many requests are merged into a single batch.

Benchmark 1

This benchmark merge-splits 1000 requests of 10 logs each. None of the incoming requests results in a split. The accumulated batch is ~1 MB.

BenchmarkSplittingBasedOnItemCountManySmallLogs-10    	     338	   3,551,946 ns/op
BenchmarkSplittingBasedOnByteSizeManySmallLogs-10     	       9	 121,470,181 ns/op

Benchmark 2

The following benchmarks test the case where every request would result in a split.

Case 1: Merging 10 requests, each containing 10001 logs (~1 MB) and slightly above the limit

BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      49	  24,918,513 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	3,388,820,625 ns/op

Case 2: Merging 10 requests, each containing 9999 logs (~1 MB) and slightly below the limit

BenchmarkSplittingBasedOnItemCountManyLogsSlightlyBelowLimit-10    	      54	  22,659,051 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyBelowLimit-10     	      37	  31,614,578 ns/op

Benchmark 3

This benchmark merge-splits a request with 100000 logs (9.6 MB) into 10 batches. Given Assumption 1 above, this should rarely occur in practice.

BenchmarkSplittingBasedOnItemCountHugeLog-10    	      40	  30,575,706 ns/op
BenchmarkSplittingBasedOnByteSizeHugeLog-10     	       1	9,909,146,541 ns/op

Optimization 1

Instead of splitting precisely, simply start a new batch whenever the incoming request would exceed the remaining capacity.
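The idea can be sketched as a simple packing loop. This is a hypothetical illustration over request byte sizes, not the PR's implementation:

```go
package main

import "fmt"

// packWithoutSplitting implements the Optimization 1 idea: instead of
// splitting a request exactly at the byte limit, flush the current batch
// and start a fresh one whenever merging would overflow. Batches may end
// up under-filled, but no request is ever split.
func packWithoutSplitting(reqSizes []int, maxSizeBytes int) [][]int {
	var batches [][]int
	var cur []int
	curBytes := 0
	for _, s := range reqSizes {
		if curBytes+s > maxSizeBytes && len(cur) > 0 {
			// Would overflow: close the current batch instead of splitting.
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, s)
		curBytes += s
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	fmt.Println(packWithoutSplitting([]int{600, 500, 400}, 1000)) // [[600] [500 400]]
}
```

This also shows why the optimization backfires when requests are slightly above the limit: every request lands in its own batch, and oversized requests still have to be handled separately.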
Running Benchmark 2 again with optimization:

Case 1: Merging 10 requests, each containing 10001 logs (~1 MB) and slightly above the limit

// Before optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      49	  24,918,513 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	3,388,820,625 ns/op

// After optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      48	  24,296,155 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	11,746,665,625 ns/op

Takeaway: if the size limit is not large enough, this optimization may hurt performance.

Case 2: Merging 10 requests, each containing 9999 logs (~1 MB) and slightly below the limit

// Before optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyBelowLimit-10    	      54	  22,659,051 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyBelowLimit-10     	      37	  31,614,578 ns/op

// After optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyBelowLimit-10    	      52	  22,252,627 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyBelowLimit-10     	      43	  25,942,651 ns/op

Optimization 2

Optimization 2 trades accuracy for speed. For example, with Benchmark 1, the actual byte size is 1,010,000 while the estimated byte size is 1,011,010. The delta is expected to be larger in real deployments, since logs will carry more metadata.

(The original cost is not high either, so Optimization 2 is not essential.)

// Before optimization
BenchmarkSplittingBasedOnItemCountManyLogsSlightlyAboveLimit-10    	      49	  24,918,513 ns/op
BenchmarkSplittingBasedOnByteSizeManyLogsSlightlyAboveLimit-10     	       1	3,388,820,625 ns/op

// After optimization
BenchmarkSplittingBasedOnItemCountManySmallLogs-10    	     340	   3,470,374 ns/op
BenchmarkSplittingBasedOnByteSizeManySmallLogs-10     	     493	   2,784,610 ns/op
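The caching idea behind Optimization 2 (also echoed in the Jan 14 feedback below) can be sketched as follows. All names here are hypothetical illustrations, not the PR's code:

```go
package main

import "fmt"

// request caches its serialized size once at creation, so the batcher
// never re-serializes to measure it (the Optimization 2 idea).
type request struct {
	cachedSize int
}

// estimateBatchSize estimates the merged batch's size as the sum of the
// cached per-request sizes. The estimate can drift from the true merged
// size (e.g. shared resource/scope metadata counted once per request),
// which is the inaccuracy the benchmark above observed
// (1,011,010 estimated vs 1,010,000 actual).
func estimateBatchSize(reqs []request) int {
	total := 0
	for _, r := range reqs {
		total += r.cachedSize
	}
	return total
}

func main() {
	fmt.Println(estimateBatchSize([]request{{100}, {250}})) // 350
}
```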

Link to tracking issue

Fixes #3262

Testing

Documentation

TODO: ByteSize() should return int64 instead of int


codecov bot commented Jan 5, 2025

Codecov Report

Attention: Patch coverage is 5.88235% with 192 lines in your changes missing coverage. Please review.

Project coverage is 90.98%. Comparing base (306c939) to head (43e811c).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
exporter/exporterhelper/logs_batch.go 3.75% 77 Missing ⚠️
exporter/exporterbatcher/config.go 0.00% 8 Missing and 4 partials ⚠️
exporter/internal/queue/default_batcher.go 75.00% 2 Missing and 1 partial ⚠️
exporter/exporterhelper/internal/request.go 0.00% 2 Missing ⚠️
exporter/exporterhelper/logs.go 0.00% 2 Missing ⚠️
exporter/exporterhelper/metrics.go 0.00% 2 Missing ⚠️
exporter/exporterhelper/traces.go 0.00% 2 Missing ⚠️
...xporter/exporterhelper/xexporterhelper/profiles.go 0.00% 2 Missing ⚠️
pdata/pcommon/generated_instrumentationscope.go 0.00% 2 Missing ⚠️
pdata/pcommon/generated_resource.go 0.00% 2 Missing ⚠️
... and 43 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12017      +/-   ##
==========================================
- Coverage   91.70%   90.98%   -0.73%     
==========================================
  Files         455      455              
  Lines       24053    24253     +200     
==========================================
+ Hits        22058    22066       +8     
- Misses       1625     1812     +187     
- Partials      370      375       +5     


@sfc-gh-sili sfc-gh-sili force-pushed the sili-byte-size branch 3 times, most recently from 812f050 to d156151 Compare January 8, 2025 02:49
@sfc-gh-sili sfc-gh-sili force-pushed the sili-byte-size branch 2 times, most recently from 6f4e460 to e86780d Compare January 14, 2025 02:19
@sfc-gh-sili (Contributor, Author) commented Jan 14, 2025

Feedback from Bogdan:

  • Performance numbers look bad.
  • We should be able to do Optimization 2 without compromising accuracy (take a look at the Size() implementation).
  • Cache the size in the request (as in Optimization 2).

@sfc-gh-sili sfc-gh-sili force-pushed the sili-byte-size branch 2 times, most recently from a8a28cd to 716bbda Compare January 15, 2025 02:53
@sfc-gh-sili sfc-gh-sili changed the title [POC] exporter batcher - byte size based batching [POC][DEPRECATED] exporter batcher - byte size based batching Jan 15, 2025
Development

Successfully merging this pull request may close these issues.

[processors/batch] Size calculations inconsistent, causing unbounded batch size in bytes