Redundant Joins Elimination in Snowflake — The Complete Understanding Guide
SNOWFLAKE only enforces NOT NULL constraint and other constraints of UNIQUE, PRIMARY KEY and FOREIGN KEY are not enforced. Many a times the question is that what is the use of them apart form logical modeling aspect and why take an effort to define them and as a user manage them. The answer for it is to help snowflake queries run in an optimized manner when applicable. Lets see more abut it here, how Snowflake does this and in which cases it is applicable.
Creating emp_tbl & dept_tbl with PK and FK constraints -
create or replace table dept_tbl (
dept_id int, dept_name string,
constraint pk_dept primary key (dept_id)
create or replace table emp_tbl (
emp_id int, emp_name string, dept_id int,
constraint pk_emp primary key (emp_id),
constraint fk_deptno foreign key (dept_id) references dept_tbl (dept_id)
insert into dept_tbl values (101, ‘Dep1’), (102, ‘Dep2’), (103, ‘Dep3’);
insert into emp_tbl values (1, ‘Name1’, 101), (2, ‘Name2’, 102), (3, ‘Name3’, 103);
Checking the constraints
Describing the table shows if any constraints effective on the column, but not the name of the constraint or foreign key dependency.
desc table emp_tbl;
To get the constraint name and referential integrity constraint defined for a table, we can refer to the information schema view table_constraints
select constraint_name, table_name, constraint_type, enforced, rely
from information_schema.table_constraints order by created asc;
Lets consider below 3 illustrations where there are redundant joins that are not necessary to retrieve the data with the constraints defined.
1) SQL statement referring the columns of only 1 table, while joining both the tables on PK column (unnecessary to join on dept_tbl) -
select e.* from emp_tbl e
left outer join dept_tbl d
on e.dept_id = d.dept_id;
2) SQL statement with self join on PK column (unnecessary to join on the same table) -
select e1.emp_id, e2.emp_name
from emp_tbl e1, employee_tbl e2
where e1.emp_id = e2.emp_id;
3) SQL statement refering to only the PK column from table on the right, while joining on FK column and PK column -
select e.emp_id, e.emp_name, d.dept_id
from emp_tbl e, dept_tbl d
where e.dept_id = d.dept_id;
Below are the query plans for the above mentioned 3 illustrations
RELY Constraint Property
If the user is confident on the data integrity that the data is true to the nature of the constraints defined, the user can set the RELY Constraint property to eliminate redundant joins.
alter table dept_tbl alter constraint pk_dept rely;
alter table emp_tbl alter constraint pk_emp rely;
alter table emp_tbl alter constraint fk_deptno rely;
Validate the RELY property is enabled by quering on information schema view table_constraints.
By executing the same 3 illustration queries disabling the result cache (alter session set use_cached_result = false;), now we can observe that joins and references to tables that are not necessary are eliminated.
In the query plan also note that even on this very small amount of data, few milli-seconds performance improvement can be seen. The performance improvement gain will be huge when the tables are huge. As a user especially the illustration 3 queries are very common in real time project and you can benefit a lot by this optimization technique.
Important Note: This redundant join elimination is right now applicable for only select statements. If you use the same select statement for creation of table or insert into table, the redundant join elimination is not happening.
Example — Taking the same illustration query 1 after rely, can observe the join elimination doesn’t happen for insert statement.
Follow and Clap if you like the content and feel free to ask if you have any questions in the comments. I will be more than happy to assist and guide you.