Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the S3 source connector to use a later version of Kafka #383

Open
wants to merge 1 commit into
base: s3-source-release
Choose a base branch
from

Conversation

aindriu-aiven
Copy link
Contributor

@aindriu-aiven aindriu-aiven commented Jan 3, 2025

Previously this was on Version 1.1.0 and upgrading to be on version 3.3.0.

This will allow us in the future to use the exactly once kip that was introduced in 3.3.0 without causing any upgrade path issues for users.

@aindriu-aiven aindriu-aiven requested review from a team as code owners January 3, 2025 15:04
Copy link
Contributor

@muralibasani muralibasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think testKafkaVersion is still ok

@@ -21,7 +21,7 @@ plugins { id("aiven-apache-kafka-connectors-all.java-conventions") }
val amazonS3Version by extra("2.29.34")
val amazonSTSVersion by extra("2.29.34")
val s3mockVersion by extra("0.2.6")
val testKafkaVersion by extra("3.7.1")
val kafkaVersion by extra("3.7.2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this version is applied only on tests, we had it named with test prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I changed it to apply to all kafka versions so everything will stay in sync going forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing updates :
testImplementation(apache.kafka.connect.api)
testImplementation(apache.kafka.connect.runtime)
testImplementation(apache.kafka.connect.json)

Considering previous discussions on not doing this, tagging @ivanyu @AnatolyPopov

Other options :

  • Should we apply this change at repo root level (settings.gradle.kts) gets applied on all connectors
  • Apply on s3 source and s3 sink
  • not make this change at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @muralibasani
testImplementation(apache.kafka.connect.api)
testImplementation(apache.kafka.connect.runtime)
testImplementation(apache.kafka.connect.json)
are updated on lines 83-85

I decided here to make this change only to the S3-source-connector in this PR as I am working on verifying the changes at a root level for the sink connectors.

I was hoping to have a discussion on this on Tuesday when people are back from their holidays, as to the benefits for the sink connector.

But for the source connector which can operate independently with this change, we gain access to https://kafka.apache.org/documentation/#connect_exactlyoncesourceconnectors as long as upgrade to at least 3.3.0 which is a very powerful tool for the source connectors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aindriu-aiven While I am generally in favor of upgrades (and had proposed this myself three months ago), there are a couple of considerations to keep in mind:

Compatibility: We've been developing the S3 source connector to align (formats and everything) with the S3 sink (and not operate independently always). If Kafka clusters running the S3 sink are on an older version and they attempt to use this S3 source, the connector may fail to load. This backward compatibility is critical for users who wish to integrate both connectors.

Exactly-once semantics: This is a separate concern. Even with the relevant configurations enabled, the code would still require updates to handle operations like abort, commit, and transaction boundaries. Without these changes, the value of enabling this feature is limited.

@aindriu-aiven aindriu-aiven force-pushed the aindriu-aiven/upgrade-s3-kafka-version branch 2 times, most recently from cd43bb3 to b83f078 Compare January 13, 2025 08:56
Copy link
Contributor

@muralibasani muralibasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants