Create Table when Loading File though Snowflake UI — The Complete Guide
We can now load the data from file by creating the table on the run using Snowsight UI and here is the step by step guide how it can be done. This is very beneficial for quick data analytics and data discover without pre-creating the table.
The discussed feature uses built in auto schema detection to suggest the column names, data types, file format, field delimiter etc for delimited files, JSON, Parquet, Avro, and ORC files. But at this point of time, it can be done only on one file at a time.
Review the recommendations and apply caution in below scenarios -
- If you want to use the created table based on one file, for the subsequent set of similar files.
- If the file is not properly delimited or say have error records, the column names may not be suggested, data types may not be appropriate and the field delimiter detected would be random from the file.
- Semi-Structured file at this point of time is not being inferred well in few cases and the data type suggested is not of semi structured data type. This would cause hindrance in downstream analytical queries.
Let us look into each aspect in detail and for demonstration let us use CSV files and JSON file with contact details accessed from here -
https://docs.snowflake.com/en/_downloads/22c3a6290f5d1f4d97075282729f3859/data-load-internal.zip
We can use this feature in 2 approaches -
- Create table directly under database schema, by adding a file and load it
- Create table from stage file to load it
Let us observe both in different scenarios with different files.
Scenario 1 — with proper delimited csv file
1a) In navigation menu after selecting Data → Databases, select the specific database and schema and select Create → Table → From File.
1b) Add a file via Browse from local or Drag & Drop or from Stage.
1c) Let us add contacts1.csv and enter the name of the table (test_tbl) to be created and select Next.
1d) In this scenario snowflake rightly inferred the file format, data types, column names (using File Format Header option — First line contains header). Review to appreciate it and press Load.
1e) 5 rows from contacts1.csv are loaded into test_tbl. Select Done
1f) Validate the data under the loaded table option of Data Preview
Scenario 2 — with improper delimited csv file
2a) In navigation menu after selecting Data → Databases, select the specific database and schema where there are files in the named stage (demo_stage) . Select contacts3.csv file and choose Load into table option.
2b) Enter the name of the table (sample_tbl) to be created and select Next.
2c) Observe this particular file schema is not inferred well. While the contacts3.csv file is pipe delimited with 10 columns, there are 2 error records out of 5.
2d) Manually change the delimiter to vertical bar and skip one line of header. Then we will see there are 11 columns as the file is not delimited properly with auto generated names as the file format header line is skipped.
2e) If we Load it, 5 rows from contacts3.csv are loaded into sample_tbl. Select Done
2f) Validate the data under the loaded table option of Data Preview and can find the error records loaded as well. This is because snowflake doesn’t consider them as error records as the additional data is loaded in 11the column that is inferred.
Scenario 3— with semi-structure json file
3a) In navigation menu after selecting Data → Databases, select the specific database and schema where there are files in the named stage (demo_stage) . Select contacts.json file and choose Load into table option.
3b) Enter the name of the table (sample_json) to be created and select Next.
3c) We get to see parsing error, as snowflake is not able to determine the semi structure data type while loading an array, even though the file format is determined as JSON.
3d) Manually change the file format option strip outer array to True. Then we get to see customer data is detected and though it’s an object type the data type is still inferred as VARCHAR. Here one would expect the data type as VARIANT or OBJECT, so manually change it.
3e) If we Load it, 3rows from contacts.json are loaded into sample_tbl. Select Done
Note: As this feature is currently in Public Preview, Snowflake could make a note of the short comings and fix them before pushing to General Availability.
References: https://docs.snowflake.com/en/user-guide/data-load-web-ui#label-data-load-from-stage-to-table
Follow and Clap if you like the content and feel free to ask if you have any questions in the comments. I will be more than happy to assist and guide you.