[DOCS] Update catalog docs to show automatic catalog syncs to Snowflake and Glue #549
base: main
@vinishjail97 can you review?
Thanks for the PR.
* Build Apache XTable™ (Incubating) from [source](https://github.com/apache/incubator-xtable)
* Download `iceberg-aws-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
[Clarification] Are AWS libraries required?
Would you suggest keeping it cloud agnostic? I have only tried with AWS S3 for Snowflake. I'm not even sure what libraries would be needed for GCP and Azure.
For Snowflake, we don't need iceberg-aws; it contains integrations with Glue, DynamoDB, etc.
https://github.com/apache/iceberg/tree/main/aws/src/integration/java/org/apache/iceberg/aws

> I'm not even sure what libraries would be needed for GCP and Azure

For Snowflake, we need permissions (IAM for AWS, a service account for GCP, etc.) and an external volume setup.
https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume#create-an-external-volume
XTable can already read from S3/GCS/Azure Blob/HDFS using the Hadoop library dependencies.
https://github.com/apache/incubator-xtable/blob/main/pom.xml#L360
> For snowflake we need permissions (IAM for AWS, service account for GCP etc.) and external volume setup.

Please confirm if my understanding below is correct.
Iceberg supports various catalogs, including JDBC and REST. The Snowflake catalog appears to be JDBC-based [1]. Therefore, when connecting XTable to the Snowflake catalog and updating Iceberg tables, a Snowflake JDBC driver should be a dependency [2]. Iceberg’s JDBC catalog clients should not need Spark or AWS dependencies. However, if someone wants to follow this tutorial end-to-end, they may need Spark runtime and AWS libraries.
If this is correct, it would be helpful to split the prerequisites into two sections: one for what XTable itself needs and another for what the tutorial needs.
[1] https://www.snowflake.com/en/blog/iceberg-tables-catalog-support-available-now/
[2] https://iceberg.apache.org/docs/1.5.0/jdbc/
@sagarlakshmipathy Added comments.
**Pre-requisites:**
* Download iceberg-aws-X.X.X.jar from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
* Download bundle-X.X.X.jar from the [Maven repository](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle)
Download AWS Java SDK `bundle-X.X.X.jar`?
* Download `iceberg-spark-runtime-3.X_2.12/X.X.X.jar` from [here](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/1.4.2/)
* Download `snowflake-jdbc-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc)
Mention "AWS Java SDK" in the AWS bundle download step.
Important Read
#548
What is the purpose of the pull request
Brief change log
Verify this pull request
`npm start` locally