Create Table in Snowflake from Files — An Artistic Approach

Snowflake Wiki
4 min readJul 20, 2023

--

Schema Detection for CSV files is now a public preview feature and the best way to create snowflake tables is by using template with infer schema from the staged files. This feature was previously available for Parquet, Avro, ORC file formats and now enabled for CSV and JSON file formats. Here let us deep dive referencing CSV files -

Step 1) Data Preparation

To demonstrate the accuracy of this feature, I am creating the table with diverse data types, inserting sample values and downloading the table result as csv file.

create or replace table csv_demo 
(id int, name string, doj date, active_flag boolean,
record_timestamp timestamp default current_timestamp());

insert into csv_demo (id, name, doj, active_flag) values
(1, 'Name1', '2023–07–01', 'True'),
(2, 'Name2', '2023–07–02', 'false');

insert into csv_demo (id, name, doj) values
(3, 'Name3', '2023–07–03');

select * from csv_demo;

Step 2) Stage the files

Have the file staged. Either it can be internal (named or user) or external named stage.

I am creating a stage csv_stage and uploaded the downloaded file csv_file.csv using the web UI option referring to my previous post — https://snowflakewiki.medium.com/loading-files-into-stage-through-snowflake-ui-the-complete-guide-321b135f6175

Step 3) Create File Format

Create file format parsing the header without skipping it as below

create or replace file format csv_format_0 
type = csv field_delimiter = ',' parse_header = true;

Step 4) INFER SCHEMA

Execute infer schema table function to detect the file metadata schema and retrieve the column definitions in the staged file.

SELECT * FROM TABLE(INFER_SCHEMA(
LOCATION=>'@csv_stage/csv_file.csv'
, FILE_FORMAT=>'csv_format_0'));

Observe the column TYPE, with accurate data types that we defined the source table in Step 1)

Step 5) CREATE TABLE … USING TEMPLATE

Create table CSV_TBL with the column definitions derived from the staged file using template and infer schema function.

CREATE OR REPLACE TABLE CSV_TBL USING TEMPLATE (
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
WITHIN GROUP (ORDER BY ORDER_ID)
FROM TABLE (INFER_SCHEMA(
LOCATION=>'@csv_stage/csv_file.csv',
FILE_FORMAT=>'csv_format_0')));

Further validate the table structure using GET_DDL command

Step 6) GENERATE COLUMN DESCRIPTION

If you wish to create table manually based on the column definitions of the staged file, we can use the function GENERATE_COLUMN_DESCRIPTION.

SELECT GENERATE_COLUMN_DESCRIPTION
(ARRAY_AGG(OBJECT_CONSTRUCT(*)) WITHIN GROUP (ORDER BY ORDER_ID), 'table')
AS COLUMNS
FROM TABLE (INFER_SCHEMA(
LOCATION=>'@csv_stage/csv_file.csv',
FILE_FORMAT=>'csv_format_0'));

The function output can be used to define the columns in the table as below -

create table csv_tbl (
"ID" NUMBER(1, 0),
"NAME" TEXT,
"DOJ" DATE,
"ACTIVE_FLAG" BOOLEAN,
"RECORD_TIMESTAMP" TIMESTAMP_NTZ
);

This is very helpful if you want to just have just the syntax noted for deployment but not create the table in the snowflake environment.

Step 7) COPY into Table and Data validation

Copy the staged file from Step 2) into the CSV_TBL created in the Step 5) or Step 6) and validate the data.

copy into csv_tbl
from @csv_stage/csv_file.csv
file_format = (skip_header = 1);

select * from csv_tbl;

This way of creating the table is best put to use when you have the file with high number of columns and creating the table using template and infer schema makes work easy.

References:

https://docs.snowflake.com/en/sql-reference/sql/create-table

https://docs.snowflake.com/en/sql-reference/functions/infer_schema

Follow and Clap if you like the content and feel free to ask if you have any questions in the comments. I will be more than happy to assist and guide you.

--

--

Snowflake Wiki
Snowflake Wiki

Written by Snowflake Wiki

Snowflake Basics | Features | New releases | Tricks & Tips | SnowPro Certifications | Solutions | Knowledge Sharing | ~~~ By satitiru (Snowflake DataSuperHero)

Responses (1)