[Java API] Rough edges when partitioning by time types #11899
I think the issue here is that the `copy` method of GenericRecord does not do type checking. The accessor fails because the GenericRecord holds an illegal object. We should have failed when the LocalDateTime was inserted, because LocalDateTime is not a valid Iceberg type for a struct and does not match the Java class Iceberg uses for Timestamp (a Long of microseconds from the epoch). See iceberg/api/src/main/java/org/apache/iceberg/types/Type.java, lines 31 to 49 in 821aec3, for the Java classes the reference library uses for the various Iceberg types.
So to fix this, convert the LocalDateTime into the actual Iceberg representation before putting it in the GenericRecord:

```java
LocalDateTime val = LocalDateTime.parse("2024-10-08T13:18:20.053");
// Iceberg's internal Timestamp representation: microseconds from the epoch as a long
long epochMicros = DateTimeUtil.microsFromTimestamp(val);
Record rec = GenericRecord.create(schema).copy(
    ImmutableMap.of(
        "year", epochMicros,
        "day", epochMicros));
```
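To make "we should have failed when the LocalDateTime is inserted" concrete, here is a minimal stdlib-only sketch of a record that validates time-typed values at insert time against the internal Java classes listed in Type.java (Integer for date, Long for time and timestamp). `CheckedRecord` and `TimeType` are hypothetical names for illustration, not Iceberg APIs; GenericRecord itself does not perform this check.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckedRecord {
  // Hypothetical subset of Iceberg's time types, each paired with the internal
  // Java class Type.java associates with it (date -> Integer, time/timestamp -> Long).
  enum TimeType {
    DATE(Integer.class), TIME(Long.class), TIMESTAMP(Long.class);

    final Class<?> javaClass;

    TimeType(Class<?> javaClass) {
      this.javaClass = javaClass;
    }
  }

  private final Map<String, TimeType> schema;
  private final Map<String, Object> values = new HashMap<>();

  CheckedRecord(Map<String, TimeType> schema) {
    this.schema = schema;
  }

  // Rejects a value of the wrong class immediately, instead of letting it
  // surface later as a failure inside a partition accessor.
  CheckedRecord set(String field, Object value) {
    TimeType type = schema.get(field);
    if (type == null) {
      throw new IllegalArgumentException("Unknown field: " + field);
    }
    if (value != null && !type.javaClass.isInstance(value)) {
      throw new IllegalArgumentException(
          "Field " + field + " expects " + type.javaClass.getSimpleName()
              + " but got " + value.getClass().getSimpleName());
    }
    values.put(field, value);
    return this;
  }

  Object get(String field) {
    return values.get(field);
  }
}
```

With a TIMESTAMP field, `set("year", 1728393500053000L)` succeeds, while passing a `LocalDateTime` throws `IllegalArgumentException` at the point of insertion.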
@RussellSpitzer I noticed that in Kafka Connect, the values corresponding to TimestampType are all LocalDateTime. Is there a problem with this approach?
That code applies an additional conversion (line 47 in e769add) to turn the date/time objects into Long before determining the partition or writing; see iceberg/data/src/main/java/org/apache/iceberg/data/InternalRecordWrapper.java, lines 56 to 61 in c07f2aa.
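The conversion InternalRecordWrapper performs can be illustrated with a small stdlib-only sketch. It maps java.time values to Iceberg's internal representations: LocalDate to Integer days from the epoch, LocalTime to Long microseconds of the day, and LocalDateTime/OffsetDateTime to Long microseconds from the epoch. This is a simplification under stated assumptions: `toInternal` is a hypothetical name, and it dispatches on the value's runtime class, whereas the real wrapper chooses a converter from the Iceberg schema type of each field.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class InternalConversion {
  private static final OffsetDateTime EPOCH =
      OffsetDateTime.of(1970, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);

  // Hypothetical converter from java.time values to internal representations.
  static Object toInternal(Object value) {
    if (value instanceof LocalDate) {
      // date: days from 1970-01-01 as an int
      return (int) ((LocalDate) value).toEpochDay();
    } else if (value instanceof LocalTime) {
      // time: microseconds from midnight as a long
      return ((LocalTime) value).toNanoOfDay() / 1000L;
    } else if (value instanceof LocalDateTime) {
      // timestamp without zone: wall-clock time interpreted as UTC
      return ChronoUnit.MICROS.between(EPOCH, ((LocalDateTime) value).atOffset(ZoneOffset.UTC));
    } else if (value instanceof OffsetDateTime) {
      // timestamp with zone: the instant, in microseconds from the epoch
      return ChronoUnit.MICROS.between(EPOCH, (OffsetDateTime) value);
    }
    // Other values pass through unchanged in this sketch.
    return value;
  }
}
```

Partitioning code can then read values that already match the classes listed in Type.java, which is why the Kafka Connect path works even though its records carry LocalDateTime.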
Apache Iceberg version
1.7.1 (latest release)
Query engine
Other
Please describe the bug 🐞
We've been developing an Iceberg connector in Apache Beam using the Java API, and I noticed some rough edges around partitioning with the time transforms (year, month, day, or hour).
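For reference, the Iceberg table spec defines the time transforms as integer offsets from the epoch: year as years from 1970, month as months from 1970-01, day as days from 1970-01-01, and hour as hours from 1970-01-01 00:00. The stdlib-only sketch below computes those values for a timestamp without time zone, treated as UTC. The class and method names are hypothetical and this is not Iceberg's Transforms API; it only shows what the partition values are expected to be.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimeTransforms {
  // Years from 1970
  static int year(LocalDateTime ts) {
    return ts.getYear() - 1970;
  }

  // Months from 1970-01
  static int month(LocalDateTime ts) {
    return (ts.getYear() - 1970) * 12 + (ts.getMonthValue() - 1);
  }

  // Days from 1970-01-01; toEpochDay already rounds pre-1970 dates downward
  static int day(LocalDateTime ts) {
    return (int) ts.toLocalDate().toEpochDay();
  }

  // Hours from 1970-01-01T00:00; floorDiv keeps pre-1970 values in the right bucket
  static int hour(LocalDateTime ts) {
    return (int) Math.floorDiv(ts.toEpochSecond(ZoneOffset.UTC), 3600L);
  }
}
```

For `2024-10-08T13:18:20.053` this gives year 54, month 657, day 20004, and hour 480109.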
See the following code:
I'm applying a simple partition spec to my original record and would expect it to work, but the last line fails with the following error:
We've been able to work around it with this logic, replicated below:
Work-around
So that instead we have this:
This feels a little hacky and I would expect the Iceberg API to handle this by itself. Let me know if I'm missing something!
Willingness to contribute