Snowflake Iceberg Tables — Powering open table format analytics

Snowflake Wiki
4 min read · Dec 29, 2023


Snowflake Data Cloud makes it easy to run big data workloads on numerous file formats, including Parquet, Avro, ORC, JSON, and XML. Snowflake’s internal, fully managed table format greatly simplifies storage maintenance tasks such as encryption, transactional consistency, versioning, fail-safe, and time travel. However, some organizations with regulatory or other constraints either cannot store all of their data in Snowflake or prefer to store data externally in open formats.

Snowflake Data Cloud currently supports Apache Iceberg in public preview through Iceberg Tables. Iceberg Tables combine the performance and familiar query semantics of Snowflake tables with customer-managed cloud storage.

Tables for different use cases in Snowflake

Snowflake users don’t have to contend with common barriers that stand in the way of realizing the true value of their data. Snowflake makes it possible to eliminate siloed data, securely share complex data sets internally and with outside data partners, and run large-scale analytics tasks on massive data sets quickly and efficiently.

Iceberg Catalog options in Snowflake

An Iceberg catalog enables a compute engine to manage and load Iceberg tables. Snowflake offers a single Iceberg table object with multiple catalog options.

You can use Snowflake as the Iceberg catalog (Snowflake-managed), or use a catalog integration (externally managed) to connect Snowflake to an external Iceberg catalog such as AWS Glue, or to Iceberg metadata files in object storage.

Use Snowflake as the Iceberg catalog

~ An Iceberg table that uses Snowflake as the Iceberg catalog provides full Snowflake platform support with read and write access.
~ The table data and metadata are stored in external cloud storage, which Snowflake accesses using an external volume.
~ Snowflake handles all life-cycle maintenance, such as compaction, for the table.
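
As a minimal sketch of the Snowflake-managed option, the statements below create an Iceberg table with Snowflake as the catalog, then write to it and read from it. The table, column, and external volume names (`customer_orders`, `my_external_volume`) are hypothetical, and the external volume is assumed to exist already.

```sql
-- Hypothetical names throughout; assumes the external volume already exists.
CREATE ICEBERG TABLE customer_orders (
  order_id INT,
  customer STRING,
  amount   NUMBER(10, 2)
)
  CATALOG = 'SNOWFLAKE'                   -- Snowflake acts as the Iceberg catalog
  EXTERNAL_VOLUME = 'my_external_volume'
  BASE_LOCATION = 'customer_orders/';     -- directory on the external volume

-- Read and write access are both available with this catalog option.
INSERT INTO customer_orders VALUES (1, 'Acme', 125.50);

SELECT customer, SUM(amount) AS total_amount
FROM customer_orders
GROUP BY customer;
```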

Use a catalog integration

~ An Iceberg table that uses a catalog integration provides limited Snowflake platform support with read-only access.
~ The table data and metadata are stored in external cloud storage, which Snowflake accesses using an external volume.
~ Snowflake does not assume any life-cycle management on the table. With this table type, Snowflake uses the catalog integration to retrieve information about your Iceberg metadata and schema.
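
The externally managed option can be sketched as follows, assuming an AWS Glue catalog integration named `my_glue_catalog_int`, an external volume named `my_external_volume`, and a Glue table named `orders` (all hypothetical names) already exist. No column list is given because Snowflake reads the schema from the catalog.

```sql
-- Hypothetical names; assumes the catalog integration, the external volume,
-- and the Glue table already exist.
CREATE ICEBERG TABLE glue_orders
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_glue_catalog_int'
  CATALOG_TABLE_NAME = 'orders';

-- Sync the table with the latest metadata recorded in the external catalog.
ALTER ICEBERG TABLE glue_orders REFRESH;

-- Read-only access: queries work, but DML is not available for this table type.
SELECT COUNT(*) FROM glue_orders;
```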

Iceberg Table Concepts in Snowflake

=> Iceberg tables store their data and metadata files in an external cloud storage location (Amazon S3, Google Cloud Storage, or Azure Storage).

=> Snowflake securely connects to your cloud storage with an external volume to access table data, Iceberg metadata, and manifest files that store the table schema, partitions, and other metadata.

=> An external volume is a named, account-level Snowflake object that stores an identity and access management (IAM) entity for your external cloud storage. A single external volume can support one or more Iceberg tables.
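
For illustration, an external volume for Amazon S3 might be defined roughly like this. The volume name, location name, and the angle-bracket placeholders for the bucket and IAM role are assumptions to be replaced with your own values. The IAM role must also be configured on the AWS side to trust Snowflake.

```sql
-- Sketch only: names and placeholders are hypothetical.
CREATE OR REPLACE EXTERNAL VOLUME my_external_volume
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<bucket>/<prefix>/'
      STORAGE_AWS_ROLE_ARN = '<IAM role ARN>'
    )
  );

-- Shows the volume's properties, including the values needed
-- to set up the trust relationship for the IAM role.
DESC EXTERNAL VOLUME my_external_volume;
```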

=> A catalog integration is a named, account-level Snowflake object that defines the source of metadata and schema for an Iceberg table when you do not use Snowflake as the Iceberg catalog. A single catalog integration can support one or more Iceberg tables.
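
A catalog integration for each external option could look like the sketch below. The integration names and the angle-bracket placeholders are hypothetical. The first statement targets AWS Glue, and the second targets Iceberg metadata files in object storage.

```sql
-- Sketch only: names and placeholders are hypothetical.

-- Option 1: AWS Glue as the external Iceberg catalog.
CREATE CATALOG INTEGRATION my_glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = '<Glue database name>'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = '<IAM role ARN>'
  GLUE_CATALOG_ID = '<AWS account ID>'
  ENABLED = TRUE;

-- Option 2: Iceberg metadata files in object storage.
CREATE CATALOG INTEGRATION my_object_store_catalog_int
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE;
```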

=> Create Iceberg Table -

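CREATE ICEBERG TABLE <table_name>
  EXTERNAL_VOLUME = '<external_volume_name>'
  CATALOG = 'SNOWFLAKE' | '<catalog_integration_name>'
  -- Set exactly one of the following, depending on the catalog:
  BASE_LOCATION = '<directory on the external volume>'   -- Snowflake as the catalog (also needs column definitions)
  CATALOG_TABLE_NAME = '<AWS Glue table name>'           -- AWS Glue catalog integration
  METADATA_FILE_PATH = '<path to Iceberg metadata file>' -- Iceberg metadata files in object storage
;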

References:

https://www.snowflake.com/build/asia/

https://docs.snowflake.com/en/user-guide/tables-iceberg



Snowflake Basics | Features | New releases | Tricks & Tips | SnowPro Certifications | Solutions | Knowledge Sharing | ~~~ By satitiru (Snowflake DataSuperHero)