Guide to Creating Snowflake-Managed Iceberg Tables

Iceberg Tables, a new type of Snowflake table in public preview, bring Snowflake’s easy platform management and great performance to data stored externally in the open source Apache Iceberg format. This is a step-by-step guide for beginners to building an open data lakehouse with Snowflake, Iceberg, and Amazon S3.

To understand how Snowflake Iceberg Tables power open table format analytics, see https://medium.com/@snowflakewiki/snowflake-iceberg-tables-powering-open-table-format-analytics-784f3d276fb4

In this guide we create a Snowflake-managed Iceberg table that accesses Amazon S3 through an external volume.

An external volume is a Snowflake object that stores information about your cloud storage locations and identity and access management (IAM) entities (for example, IAM roles). Snowflake uses an external volume to establish a connection with your cloud storage in order to access Iceberg metadata and Parquet table data.

1) Create Amazon S3 Bucket

Navigate to the S3 service and create a bucket with a unique name.

Note: Snowflake-managed Iceberg tables are supported only if the S3 bucket is in the same region that hosts your Snowflake account. Otherwise you encounter the region error listed under Common Errors at the end of this guide.

For this demo, the bucket is named iceberg-sfdemo and contains a folder named iceberg-folder.

Make a note of the S3 URL. Here it is s3://iceberg-sfdemo/iceberg-folder/

2) Create IAM policy to give access permissions to the S3 Bucket

Navigate to the IAM service, click Create policy, and select the JSON tab.

Replace the existing code with the policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket>",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "<prefix>/*"
                    ]
                }
            }
        }
    ]
}

Then replace <bucket> with the bucket name created in step 1) (iceberg-sfdemo) and <prefix> with the folder name (iceberg-folder).

On the next screen, enter the policy name iceberg-sfdemo-policy and select Create policy.
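If you prefer to generate the filled-in policy rather than edit it by hand, the substitution can be sketched in Python. This is only an illustration: the function name build_s3_policy is hypothetical, and the policy body is the same one shown above with the two placeholders replaced.

```python
import json


def build_s3_policy(bucket: str, prefix: str) -> dict:
    """Return the step 2) policy with <bucket> and <prefix> filled in."""
    # Tolerate a prefix written with leading or trailing slashes.
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permissions on everything under the prefix.
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Bucket-level permissions, restricted to the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    # Bucket and folder names used in this demo.
    print(json.dumps(build_s3_policy("iceberg-sfdemo", "iceberg-folder"), indent=4))
```

The printed JSON can be pasted into the JSON tab of the policy editor.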

3) Create IAM Role to grant privileges on the S3 bucket

Navigate to the IAM service, click Roles, and select Create role.

Select AWS account as the trusted entity type.

Then allow entities in your existing AWS account, select Require external ID, and enter '0000' as a placeholder value.

Note: These settings are temporary; we replace them with the real values in step 5).

Next, add permissions: locate the policy iceberg-sfdemo-policy created in step 2) and select it.

Enter iceberg_sfrole as the role name and select Create role.

Click the role and copy the role ARN from the Summary page. Here it is arn:aws:iam::530612297165:role/iceberg_sfrole

4) Create External Volume in Snowflake

Create an external volume named iceberg_sfdemo_vol in Snowflake using the S3 URL from step 1) and the role ARN copied in step 3).

CREATE OR REPLACE EXTERNAL VOLUME iceberg_sfdemo_vol
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'iceberg_sfdemo'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://iceberg-sfdemo/iceberg-folder/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::530612297165:role/iceberg_sfrole'
      )
    );

Describe the external volume (DESC EXTERNAL VOLUME iceberg_sfdemo_vol;) and make a note of the following two properties:

STORAGE_AWS_IAM_USER_ARN: arn:aws:iam::211125741898:user/avfj0000-s
STORAGE_AWS_EXTERNAL_ID: ZS06294_SFCRole=2_XLdg4wLX0XKjQxFJ0ede06uRtJw=
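Picking the two values out by eye is error-prone, so here is a small Python sketch that extracts them. It assumes you have copied the storage location value from the describe output as a JSON string containing the keys STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID; verify that against your own output. The function name extract_trust_values and the trimmed sample string are hypothetical.

```python
import json


def extract_trust_values(storage_location_json: str) -> tuple:
    """Return (IAM user ARN, external ID) from a storage-location JSON string."""
    location = json.loads(storage_location_json)
    return (
        location["STORAGE_AWS_IAM_USER_ARN"],
        location["STORAGE_AWS_EXTERNAL_ID"],
    )


# Sample trimmed to the relevant keys, using the values shown in this guide.
sample = json.dumps(
    {
        "NAME": "iceberg_sfdemo",
        "STORAGE_PROVIDER": "S3",
        "STORAGE_BASE_URL": "s3://iceberg-sfdemo/iceberg-folder/",
        "STORAGE_AWS_ROLE_ARN": "arn:aws:iam::530612297165:role/iceberg_sfrole",
        "STORAGE_AWS_IAM_USER_ARN": "arn:aws:iam::211125741898:user/avfj0000-s",
        "STORAGE_AWS_EXTERNAL_ID": "ZS06294_SFCRole=2_XLdg4wLX0XKjQxFJ0ede06uRtJw=",
    }
)

user_arn, external_id = extract_trust_values(sample)
print(user_arn)
print(external_id)
```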

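5) Update the AWS role with Snowflake user details

Back in AWS, open the role created in step 3) and edit its trust policy using the two properties noted in step 4): the IAM user ARN becomes the trusted principal and the external ID replaces the placeholder '0000'. Then update the policy.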
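The guide does not show the trust policy itself, so here is a minimal Python sketch that builds one from the two values noted in step 4). It follows the standard shape of an AWS trust policy that allows sts:AssumeRole for a principal with an sts:ExternalId condition; the function name build_trust_policy is hypothetical, and you should compare the result with what the IAM console shows for your role before saving.

```python
import json


def build_trust_policy(snowflake_iam_user_arn: str, external_id: str) -> dict:
    """Build a trust policy that lets the Snowflake IAM user assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                # STORAGE_AWS_IAM_USER_ARN from step 4).
                "Principal": {"AWS": snowflake_iam_user_arn},
                "Action": "sts:AssumeRole",
                # STORAGE_AWS_EXTERNAL_ID from step 4), replacing '0000'.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy(
        "arn:aws:iam::211125741898:user/avfj0000-s",
        "ZS06294_SFCRole=2_XLdg4wLX0XKjQxFJ0ede06uRtJw=",
    )
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the role's trust policy editor.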

6) Create Iceberg Table in Snowflake

Create an Iceberg table with Snowflake as the catalog, using the external volume created in step 4) and a base location (a directory on the external volume) where Snowflake can write table data and metadata. Here the base location is left empty, so data and metadata are written to the location specified in the external volume definition in step 4).

CREATE OR REPLACE ICEBERG TABLE contacts_iceberg (
  id INT,
  lastname STRING,
  firstname STRING,
  company STRING,
  email STRING,
  workphone STRING,
  cellphone STRING,
  streetaddress STRING,
  city STRING,
  postalcode STRING
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_sfdemo_vol'
  BASE_LOCATION = '';

Check the cloud storage bucket to see that a new folder has been created.

Inside that folder, a metadata folder contains two files: v0.metadata.json and version-hint.text

7) Load data into Iceberg Table via Insert

We load data into the Snowflake-managed Iceberg table with an INSERT command. Other options include COPY INTO and Snowpipe.

INSERT INTO contacts_iceberg
SELECT * FROM aws_load.public.contacts;

After loading data, navigate to the cloud storage bucket to see the new files.

A new folder named data is created, where the loaded data is stored as Parquet files.

Inside the metadata folder, new metadata files are created:
Manifest File: Avro file whose name starts with a number
Manifest List: Avro file whose name starts with snap-
Metadata File: JSON file having the number of the manifest list

After loading data into the Iceberg table, try querying it, for example with SELECT * FROM contacts_iceberg;
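The naming rules above can be sketched as a small Python classifier, which is handy when you list the metadata folder and want to tell the files apart. It relies only on the patterns described in this guide; the function name classify_metadata_file and the example file names are hypothetical, and real file names in your bucket will differ.

```python
def classify_metadata_file(filename: str) -> str:
    """Classify a file from the metadata folder by the naming rules above."""
    # Work on the file name only, ignoring any folder path.
    name = filename.rsplit("/", 1)[-1]
    if name.endswith(".json"):
        return "metadata file"
    if name.endswith(".avro"):
        if name.startswith("snap-"):
            return "manifest list"
        if name[:1].isdigit():
            return "manifest file"
    return "other"


if __name__ == "__main__":
    # Made-up names that only follow the patterns described in this guide.
    for example in [
        "metadata/snap-123-1-abc.avro",
        "metadata/1700000000000-abc.avro",
        "metadata/00001-abc.metadata.json",
        "metadata/version-hint.text",
    ]:
        print(example, "->", classify_metadata_file(example))
```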

8) Common Errors you may encounter

8a) Table creation fails if the storage location is not in the same region as your Snowflake account.

8b) An "Error assuming AWS_ROLE" failure occurs when the IAM role's trusted entities (see step 5) are not configured correctly.

References: https://docs.snowflake.com/en/user-guide/tables-iceberg

Follow and Clap if you like the content and feel free to ask if you have any questions in the comments. I will be more than happy to assist and guide you.

Snowflake Wiki
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Snowflake Basics | Features | New releases | Tricks & Tips | SnowPro Certifications | Solutions | Knowledge Sharing | ~~~ By satitiru (Snowflake DataSuperHero)