Allow users to vary the size of the byteArrayBuffer in source connectors #386

Open
wants to merge 2 commits into
base: s3-source-release
from

Conversation

aindriu-aiven
Contributor

@aindriu-aiven aindriu-aiven commented Jan 13, 2025

This allows users to choose the chunk size used when splitting the byte stream in the source connectors.
One change was to use SourceCommonConfig instead of AbstractConfig in the transformers, so that the new calls to get the max byte buffer size are available.

It also adds the stream length to the Transformer, which should lead to improvements, specifically in the ByteArrayTransformer.
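
As a rough illustration of why a known stream length could help a byte-array transformer, the sketch below sizes each read buffer to the smaller of the configured maximum and the bytes still remaining, so a short final chunk would not need a trimming copy. This is not the code from this PR: `ChunkSizing`, `nextBufferSize`, and the negative `UNKNOWN_LENGTH` sentinel are hypothetical names chosen for the example.

```java
/** Hypothetical helper for illustration only; not part of the connector code base. */
final class ChunkSizing {
    /** Assumed sentinel for "length not known"; the PR may use a different convention. */
    static final long UNKNOWN_LENGTH = -1L;

    private ChunkSizing() {
    }

    /**
     * Returns the buffer size to allocate for the next read.
     *
     * @param maxBufferSize configured upper bound for a chunk, must be positive
     * @param streamLength  total stream length in bytes, or UNKNOWN_LENGTH
     * @param bytesConsumed bytes already read from the stream
     */
    static int nextBufferSize(final int maxBufferSize, final long streamLength, final long bytesConsumed) {
        if (maxBufferSize <= 0) {
            throw new IllegalArgumentException("maxBufferSize must be positive");
        }
        if (streamLength < 0) {
            // Length unknown: fall back to the configured maximum.
            return maxBufferSize;
        }
        final long remaining = Math.max(0L, streamLength - bytesConsumed);
        // The result is capped by maxBufferSize, so the cast cannot overflow.
        return (int) Math.min(maxBufferSize, remaining);
    }
}
```

With a 4096-byte maximum and a 10000-byte stream, the first two reads would use full buffers and the third would allocate only the 1808 bytes that remain.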

@aindriu-aiven aindriu-aiven requested review from a team as code owners January 13, 2025 14:07
@aindriu-aiven aindriu-aiven changed the title Allow users to varify the size of the byteArrayBuffer in source connectors Allow users to varfy the size of the byteArrayBuffer in source connectors Jan 14, 2025
Contributor

@Claudenw Claudenw left a comment

Overall looks good

@aindriu-aiven aindriu-aiven changed the title Allow users to varfy the size of the byteArrayBuffer in source connectors Allow users to vary the size of the byteArrayBuffer in source connectors Jan 14, 2025
@aindriu-aiven aindriu-aiven force-pushed the aindriu-aiven/make-byte-buffer-configureable branch from 1821406 to e45686e Compare January 14, 2025 10:10
Contributor

@muralibasani muralibasani left a comment

Looks good. A few minor comments.

@aindriu-aiven aindriu-aiven force-pushed the aindriu-aiven/make-byte-buffer-configureable branch 3 times, most recently from d662053 to 2348113 Compare January 14, 2025 12:51

- final StreamSpliterator spliterator = createSpliterator(inputStreamIOSupplier, topic, topicPartition,
-         sourceConfig);
+ final StreamSpliterator spliterator = createSpliterator(inputStreamIOSupplier, streamLength, topic,
Contributor

Suggested change
- final StreamSpliterator spliterator = createSpliterator(inputStreamIOSupplier, streamLength, topic,
+ final StreamSpliterator spliterator = createSpliterator(inputStreamIOSupplier, objectSize, topic,

Streams do not have a length in general. Here the iterator is passing the object size, and in this repo we are dealing with objects.

Contributor Author

It refers to the stream that is passed into the createSpliterator method.
This is commons code, so it is not specific to the object size but to the stream length.
Changing the name would make it a misnomer, I feel.

Contributor

This repo mainly deals with cloud storage objects. Even if the code is common and we later use it for Azure or GCP, I think objectSize or something similar reads better.

Contributor Author

I'll let @Claudenw be the deciding vote on that one 👍

  try {
      final int bytesRead = IOUtils.read(inputStream, buffer);
      if (bytesRead == 0) {
          return false;
      }
-     if (bytesRead < MAX_BUFFER_SIZE) {
+     if (bytesRead < maxBufferSize) {
Contributor

We can simplify this condition:

byte[] data = (bytesRead < maxBufferSize)
        ? Arrays.copyOf(buffer, bytesRead)
        : buffer;

action.accept(new SchemaAndValue(null, data));

Contributor Author

To be honest, I think the current code is worth keeping for readability; it is much clearer what is happening.

Contributor

It's a simple ternary operator. Should be ok to replace.
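
For readers following this thread, here is a minimal, self-contained sketch of the read step under discussion, written with the ternary form. It is not the connector's code: `ChunkReader` and `readChunks` are hypothetical names, the JDK's `InputStream.readNBytes` (Java 9+) stands in for the `IOUtils.read` call in the diff so the example has no dependencies, and the chunks are collected into a list rather than passed to `action.accept(new SchemaAndValue(null, data))`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the spliterator's read step; not the connector's class. */
final class ChunkReader {
    private ChunkReader() {
    }

    /** Splits the stream into chunks of at most maxBufferSize bytes. */
    static List<byte[]> readChunks(final InputStream in, final int maxBufferSize) throws IOException {
        if (maxBufferSize <= 0) {
            throw new IllegalArgumentException("maxBufferSize must be positive");
        }
        final List<byte[]> chunks = new ArrayList<>();
        while (true) {
            // A fresh buffer per chunk, so a full buffer can be handed out without copying.
            final byte[] buffer = new byte[maxBufferSize];
            // readNBytes blocks until the buffer is full or the stream ends.
            final int bytesRead = in.readNBytes(buffer, 0, buffer.length);
            if (bytesRead == 0) {
                return chunks;
            }
            // Ternary form from the review: only a short (final) chunk is trimmed.
            final byte[] data = bytesRead < maxBufferSize ? Arrays.copyOf(buffer, bytesRead) : buffer;
            chunks.add(data);
        }
    }

    public static void main(final String[] args) throws IOException {
        final List<byte[]> chunks = readChunks(new ByteArrayInputStream(new byte[10]), 4);
        for (final byte[] chunk : chunks) {
            System.out.println(chunk.length);
        }
    }
}
```

Both the if/else and the ternary produce the same chunks. Whichever form is kept, handing out `buffer` without a copy is only safe when a new array is allocated for every read, as in this sketch; if the buffer were reused across reads, a full chunk would also need to be copied.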

Contributor

@muralibasani muralibasani left a comment

Have a few minor comments.

@aindriu-aiven aindriu-aiven force-pushed the aindriu-aiven/make-byte-buffer-configureable branch from 2348113 to 1ea3437 Compare January 15, 2025 13:38
@aindriu-aiven aindriu-aiven force-pushed the aindriu-aiven/make-byte-buffer-configureable branch from 1ea3437 to f6b6adf Compare January 15, 2025 14:00
3 participants