[DOCS] Update catalog docs to show automatic catalog syncs to Snowflake and Glue #549
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,11 +8,6 @@ title: "Snowflake" | |
Currently, Snowflake supports [Iceberg tables through External Tables](https://www.snowflake.com/blog/expanding-the-data-cloud-with-apache-iceberg/) | ||
and also [Native Iceberg Tables](https://www.snowflake.com/blog/iceberg-tables-powering-open-standards-with-snowflake-innovations/). | ||
|
||
:::note NOTE: | ||
Iceberg on Snowflake is currently supported in | ||
[public preview](https://www.snowflake.com/blog/build-open-data-lakehouse-iceberg-tables/) | ||
::: | ||
|
||
## Steps: | ||
These are high-level steps to help you integrate Apache XTable™ (Incubating) synced Iceberg tables on Snowflake. For more information, | ||
refer to the [Getting started with Iceberg tables](https://docs.snowflake.com/LIMITEDACCESS/iceberg-2023/tables-iceberg-getting-started) guide. | ||
|
@@ -47,7 +42,7 @@ TABLE_FORMAT=ICEBERG | |
ENABLED=TRUE; | ||
``` | ||
|
||
### Create an Iceberg table from Iceberg metadata in object storage | ||
### Method 1: Create an Iceberg table from Iceberg metadata in object storage | ||
Refer to additional [examples](https://docs.snowflake.com/LIMITEDACCESS/iceberg-2023/create-iceberg-table#examples) | ||
in the Snowflake Create Iceberg Table guide for more information. | ||
|
||
|
@@ -58,4 +53,45 @@ CATALOG=<catalog_name> | |
METADATA_FILE_PATH='path/to/metadata/<VERSION>.metadata.json'; | ||
``` | ||
|
||
Once the table creation succeeds you can start using the Iceberg table as any other table in Snowflake. | ||
Once the table creation succeeds you can start using the Iceberg table as any other table in Snowflake. | ||
|
||
### Method 2: Using XTable APIs to sync with Snowflake Catalog directly | ||
|
||
#### Pre-requisites: | ||
|
||
* Build Apache XTable™ (Incubating) from [source](https://github.com/apache/incubator-xtable) | ||
* Download `iceberg-aws-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws) | ||
[Clarification] Are AWS libraries required?
Would you suggest keeping it cloud agnostic? I have only tried with AWS S3 for Snowflake. I'm not even sure what libraries would be needed for GCP and Azure.
For Snowflake, we don't need iceberg-aws, it contains integrations with glue, dynamodb etc. For snowflake we need permissions (IAM for AWS, service account for GCP etc.) and external volume setup. XTable can already read from S3/GCS/Azure Blob/HDFS using the hadoop library dependencies.
Please confirm if my understanding below is correct. If this is correct, it would be helpful to separate the prereqs into two sections: one for what XTable needs and another for the tutorial prerequisites. [1] https://www.snowflake.com/en/blog/iceberg-tables-catalog-support-available-now/ |
||
* Download `bundle-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle) | ||
* Download `iceberg-spark-runtime-3.X_2.12/X.X.X.jar` from [here](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/1.4.2/) | ||
* Download `snowflake-jdbc-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc) | ||
Comment on lines +64 to +66
Include AWS Java SDK for aws bundle download. |
||
|
||
Create a `snowflake-sync-config.yaml` file: | ||
|
||
```yaml md title="yaml" | ||
sourceFormat: DELTA | ||
targetFormats: | ||
  - ICEBERG | ||
datasets: | ||
  - | ||
    tableBasePath: s3://path/to/table | ||
    tableName: <table_name> | ||
    namespace: <db_name>.<schema_name> | ||
``` | ||
|
||
Create a `snowflake-sync-catalog.yaml` file: | ||
|
||
```yaml md title="yaml" | ||
catalogImpl: org.apache.iceberg.snowflake.SnowflakeCatalog | ||
catalogName: <catalog_name> | ||
catalogOptions: | ||
  io-impl: org.apache.iceberg.aws.s3.S3FileIO | ||
  warehouse: s3://path/to/table | ||
  uri: jdbc:snowflake://<account-identifier>.snowflakecomputing.com | ||
  jdbc.user: <snowflake-username> | ||
  jdbc.password: <snowflake-password> | ||
``` | ||
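The catalog file above holds the Snowflake password in plain text, so it may be preferable to generate it at run time rather than commit it. The following is a minimal sketch of that idea, not part of XTable: the function name `write_catalog_config` and the variables `CATALOG_NAME`, `WAREHOUSE_PATH`, `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, and `SNOWFLAKE_PASSWORD` are hypothetical, and only the YAML keys are taken from the example above.

```shell
# Write the catalog config to the file named by $1.
# The YAML keys mirror snowflake-sync-catalog.yaml; the variable
# names are hypothetical and must be set by the caller.
write_catalog_config() {
  cat > "$1" <<EOF
catalogImpl: org.apache.iceberg.snowflake.SnowflakeCatalog
catalogName: ${CATALOG_NAME}
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: ${WAREHOUSE_PATH}
  uri: jdbc:snowflake://${SNOWFLAKE_ACCOUNT}.snowflakecomputing.com
  jdbc.user: ${SNOWFLAKE_USER}
  jdbc.password: ${SNOWFLAKE_PASSWORD}
EOF
  # Restrict the file to the current user, since it contains a password.
  chmod 600 "$1"
}
```

With the variables exported in the shell, `write_catalog_config snowflake-sync-catalog.yaml` would produce the file that is passed to `--icebergCatalogConfig`. Values containing YAML-special characters such as `#` or `: ` would still need quoting by hand.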
|
||
Sample command to sync the table with Snowflake: | ||
```shell md title="shell" | ||
java -cp /path/to/iceberg-spark-runtime-3.2_2.12-1.4.2.jar:/path/to/xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:/path/to/snowflake-jdbc-3.13.28.jar:/path/to/iceberg-aws-1.4.2.jar:/path/to/bundle-2.23.9.jar org.apache.xtable.utilities.RunSync --datasetConfig snowflake-sync-config.yaml --icebergCatalogConfig snowflake-sync-catalog.yaml | ||
``` |
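Because the classpath in the sample command is long, it can be easier to review when assembled from a list of jars. The sketch below is one possible way to do that, assuming all jars sit in a single directory; `JAR_DIR`, `JARS`, and `CP` are hypothetical names, while the jar file names, the `RunSync` class, and both flags are copied from the sample command. It prints the command rather than running it.

```shell
# Hypothetical directory holding the downloaded jars; adjust to your setup.
JAR_DIR="/path/to"

# Jar names and versions as they appear in the sample command.
JARS="iceberg-spark-runtime-3.2_2.12-1.4.2.jar
xtable-utilities-0.2.0-SNAPSHOT-bundled.jar
snowflake-jdbc-3.13.28.jar
iceberg-aws-1.4.2.jar
bundle-2.23.9.jar"

# Join the jar paths with ':' to form the Java classpath.
CP=""
for jar in $JARS; do
  CP="${CP:+$CP:}$JAR_DIR/$jar"
done

# Print the command instead of executing it, so it can be inspected first.
echo java -cp "$CP" org.apache.xtable.utilities.RunSync \
  --datasetConfig snowflake-sync-config.yaml \
  --icebergCatalogConfig snowflake-sync-catalog.yaml
```

Removing the leading `echo` would run the sync. If `iceberg-aws` turns out not to be needed for Snowflake, as discussed in the review, dropping it means deleting one line from `JARS`.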
Download AWS Java SDK bundle-X.X.X.jar ?
This is unclear from docs.