[608] Ensure column stats for decimal fields have proper scale set #617

the-other-tim-brown
Contributor

Important Read

  • Please ensure the GitHub issue is mentioned at the beginning of the PR

What is the purpose of the pull request

Addresses the bug found in #608.

When the column stats are converted to bytes for Iceberg, BigDecimal#unscaledValue is called. If the value's scale does not match the scale expected for the field, serializing and then deserializing the value produces a different value.
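The failure mode can be reproduced with plain `java.math` classes. The sketch below is a simplified stand-in for the byte conversion, not the Iceberg code itself; the class and method names (`DecimalScaleRoundTrip`, `toBytes`, `fromBytes`) are hypothetical, and it assumes the reader applies the scale declared for the field when decoding.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

public class DecimalScaleRoundTrip {
  // Serialize only the unscaled value; the scale is assumed to live in the schema.
  static byte[] toBytes(BigDecimal value) {
    return value.unscaledValue().toByteArray();
  }

  // Deserialize with the scale declared for the field, not the scale of the original value.
  static BigDecimal fromBytes(byte[] bytes, int fieldScale) {
    return new BigDecimal(new BigInteger(bytes), fieldScale);
  }

  public static void main(String[] args) {
    int fieldScale = 2;
    // Scale 1, unscaled value 15: does not match the field's scale of 2.
    BigDecimal wrongScale = new BigDecimal("1.5");
    // Scale 2, unscaled value 150: matches the field.
    BigDecimal fixedScale = wrongScale.setScale(fieldScale, RoundingMode.UNNECESSARY);

    System.out.println(fromBytes(toBytes(wrongScale), fieldScale)); // prints 0.15
    System.out.println(fromBytes(toBytes(fixedScale), fieldScale)); // prints 1.50
  }
}
```

Setting the scale before serialization is what keeps the unscaled value consistent with the scale used on the read side.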

Brief change log

  • Set the scale when reading back the column stats from the Hudi Metadata Table
  • Set the scale when parsing the Delta Log stats
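As a rough illustration of the second item, stats parsed from JSON may arrive as `Integer`, `Long`, or `Double`, so the resulting `BigDecimal` initially carries whatever scale the text form implies. The sketch below is an assumption about the general approach, not the code in this PR; `ParsedStatToDecimal` and `toDecimal` are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ParsedStatToDecimal {
  // Convert a parsed numeric stat to a BigDecimal that carries the field's declared scale.
  // RoundingMode.UNNECESSARY throws ArithmeticException if digits would have to be dropped.
  static BigDecimal toDecimal(Object value, int scale) {
    return new BigDecimal(String.valueOf(value)).setScale(scale, RoundingMode.UNNECESSARY);
  }

  public static void main(String[] args) {
    System.out.println(toDecimal(3, 2));    // prints 3.00
    System.out.println(toDecimal(10L, 2));  // prints 10.00
    System.out.println(toDecimal(1.5d, 2)); // prints 1.50
  }
}
```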

Verify this pull request

  • Fixed incorrect unit test setup for Hudi column stats extraction
  • Added new tests for converting the column stats values for Delta Lake

@@ -242,18 +244,11 @@ private static Object castObjectToInternalType(Object value, InternalType valueT
return value;
}

-private static BigDecimal numberTypeToBigDecimal(Object value) {
+private static BigDecimal numberTypeToBigDecimal(Object value, InternalSchema schema) {
// BigDecimal is parsed as Integer, Long, BigInteger and double if none of the above.
Contributor

Do we still need this comment?
// BigDecimal is parsed as Integer, Long, BigInteger and double if none of the above.

+int precision = (int) schema.getMetadata().get(InternalSchema.MetadataKey.DECIMAL_PRECISION);
+int scale = (int) schema.getMetadata().get(InternalSchema.MetadataKey.DECIMAL_SCALE);
+return new BigDecimal(String.valueOf(value), new MathContext(precision))
+.setScale(scale, RoundingMode.CEILING);
Contributor

Should this be RoundingMode.HALF_UP by default? Or is there a reason for choosing RoundingMode.CEILING?

Contributor Author

I am going to switch all of these to use RoundingMode.UNNECESSARY where possible.
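For readers comparing the three modes discussed in this thread, the following sketch shows how each behaves in `BigDecimal#setScale`. The class and helper names (`RoundingModeComparison`, `rescale`, `isExact`) are illustrative only and do not appear in the PR.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundingModeComparison {
  static BigDecimal rescale(BigDecimal value, int scale, RoundingMode mode) {
    return value.setScale(scale, mode);
  }

  // Returns true when the value fits the target scale without losing digits.
  static boolean isExact(BigDecimal value, int scale) {
    try {
      value.setScale(scale, RoundingMode.UNNECESSARY);
      return true;
    } catch (ArithmeticException e) {
      // UNNECESSARY refuses to round, so any loss of digits surfaces as an exception.
      return false;
    }
  }

  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("1.231");
    System.out.println(rescale(value, 2, RoundingMode.CEILING)); // prints 1.24
    System.out.println(rescale(value, 2, RoundingMode.HALF_UP)); // prints 1.23
    System.out.println(isExact(value, 2));                       // prints false
    System.out.println(isExact(new BigDecimal("1.5"), 2));       // prints true
  }
}
```

CEILING and HALF_UP both silently change a value that has more digits than the target scale, while UNNECESSARY only succeeds when the rescale is exact, which is the case when a value is padded out to a wider scale.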

-? convertBytesToBigDecimal((ByteBuffer) maxValue, DECIMAL_WRAPPER_SCALE)
-: maxValue;
+? convertBytesToBigDecimal((ByteBuffer) maxValue, scale)
+: ((BigDecimal) maxValue).setScale(scale, RoundingMode.CEILING);
Contributor

Same comment as above about choosing CEILING over HALF_UP for the RoundingMode.

@the-other-tim-brown force-pushed the 608-decimal-stats-parsing branch from cec320f to c8a825f on January 7, 2025 15:22
@the-other-tim-brown merged commit 8c143a7 into apache:main on Jan 7, 2025
2 checks passed
@the-other-tim-brown deleted the 608-decimal-stats-parsing branch on January 7, 2025 16:19