Snowflake Infrastructure with Security Made Easy
Designing a Snowflake solution that uses a version-control tool, integrates with external storage and a workflow-management platform, and manages the infrastructure of your cloud resources can be a challenge without compromising on security.
It becomes far less complex if you go through this POV.
To get started, the following services are taken into consideration:
dbt Cloud: Integrated development environment to transform, test, and document data in the Snowflake environment
GitHub: Version-control platform for storing code, tracking changes, and collaborating with the development team
GCP: Cloud platform on which Snowflake is hosted and which holds the data in GCS buckets
Airflow: Management platform for scheduling and orchestrating data pipelines and workflows
Terraform: Infrastructure-as-Code tool used to manage Snowflake objects like databases, schemas, warehouses, roles, grants, and users in an automated and source-controlled way
Vault by HashiCorp: Secrets-management tool controlling access to passwords, certificates, and API keys, integrating with Snowflake, GitHub, and Airflow
To set up the infrastructure for a Snowflake deployment, follow the steps below:
1. Deploy Category-Specific Repo
i. Deploy the GitHub repository
ii. Integrate the dbt Cloud account with the GitHub account
iii. Install and set up Terraform
2. Category-Specific GCP Project
i. Request a GCP project if not already created
ii. Request a namespace and have an Airflow instance created
iii. Create GCS buckets
iv. Set up Google Artifact Registry
3. Snowflake Setup
i. Deploy Snowflake databases and roles
ii. Deploy the storage integration
4. Fork the Vault repo to set up service accounts
Let us go through in detail how the Snowflake development project is set up and stabilized.
New Snowflake Project Setup through Terraform:
In order to start a new project in Snowflake, you must first have a database created. Along with the database, you will also need a Snowflake role, some permissions or grants for that role, and a warehouse (a compute cluster). The following outlines the files that need to be created or modified in order to get started:
1. Create a Terraform file called terraform/databases/<database_name>.tf.
This file should contain the Terraform resources needed to create the database along with some grants. The following is an example of the resources needed in this file:
// Creates the database
resource "snowflake_database" "<db_name>" {
  name = "<DB_NAME>"
}

// Grants the USAGE privilege to both the DEVELOPER role and the newly created project role
resource "snowflake_database_grant" "<db_name>_usage_grant" {
  database_name = "<DB_NAME>"
  privilege     = "USAGE"
  roles         = ["DEVELOPER", "<ROLE_NAME>"]
  depends_on    = [snowflake_database.<db_name>]
}

// Grants the MODIFY privilege to the newly created project role
resource "snowflake_database_grant" "<db_name>_modify_grant" {
  database_name = "<DB_NAME>"
  privilege     = "MODIFY"
  roles         = ["<ROLE_NAME>"]
  depends_on    = [snowflake_database.<db_name>]
}

// Grants the CREATE SCHEMA privilege to the newly created project role
resource "snowflake_database_grant" "<db_name>_create_schema_grant" {
  database_name = "<DB_NAME>"
  privilege     = "CREATE SCHEMA"
  roles         = ["<ROLE_NAME>"]
  depends_on    = [snowflake_database.<db_name>]
}

// Grants the USAGE privilege on all future schemas to the DEVELOPER role
resource "snowflake_schema_grant" "<db_name>_schemas_grant" {
  database_name = "<DB_NAME>"
  privilege     = "USAGE"
  roles         = ["DEVELOPER"]
  on_future     = true
  depends_on    = [snowflake_database.<db_name>]
}

// Grants the SELECT privilege on all future tables to the DEVELOPER role
resource "snowflake_table_grant" "<db_name>_tables_grant" {
  database_name = "<DB_NAME>"
  privilege     = "SELECT"
  roles         = ["DEVELOPER"]
  on_future     = true
  depends_on    = [snowflake_database.<db_name>]
}

// Grants the SELECT privilege on all future views to the DEVELOPER role
resource "snowflake_view_grant" "<db_name>_views_grant" {
  database_name = "<DB_NAME>"
  privilege     = "SELECT"
  roles         = ["DEVELOPER"]
  on_future     = true
  depends_on    = [snowflake_database.<db_name>]
}
2. Modify the terraform/roles.tf file to include the new role to be created for your database. The role name should be the same as the database name. Follow this example, replacing <db_name> and <ROLE_NAME> with your database and role names:
"<DB_NAME>" = {
  name    = "<ROLE_NAME>"
  comment = "Role for <db_name>."
}
3. If the previously created role needs to be assigned to any users, such as an admin for this project, modify the terraform/role_grants.tf file to include the role grants required. Follow this example:
"<ROLE_NAME>" = {
  role_name = "<ROLE_NAME>"
  roles     = []
  users     = [
    "<USER1>@MAIL.COM",
    "<USER2>@MAIL.COM",
  ]
}
4. Modify the terraform/warehouses.tf file to include the new warehouse that is to be created for the new database. The name of the warehouse should be the same as the database with a suffix of '_WH'. Follow this example:
"<DB_NAME>_WH" = {
  name           = "<DB_NAME>_WH"
  comment        = "Warehouse for the <db_name> database"
  warehouse_size = "small"
  auto_suspend   = 300
  auto_resume    = true
}
5. Modify the terraform/warehouse_grants.tf file to allow the newly created role to use the newly created warehouse. Follow this example:
"<DB_NAME>_WH_GRANT" = {
  warehouse_name    = "<DB_NAME>_WH"
  privilege         = "USAGE"
  roles             = [
    "<ROLE_NAME>",
  ]
  with_grant_option = false
}
Snowflake Service Account Setup through Vault:
In order to start building pipelines and executing code programmatically, you will need to set up a Snowflake service account. This is done dynamically through Vault after some initial setup in the Vault repo:
1. Create a file called "database/snowflake-dev--<sf-role>.tf" that contains the following Terraform resource. Be sure to replace <sf-role> and <sf_role> with your Snowflake role (please keep dashes and underscores consistent) and <sf_warehouse> with the Snowflake warehouse:
resource "vault_database_secret_backend_role" "snowflake-dev--<sf-role>" {
  backend     = vault_mount._.path
  db_name     = local.snowflake_dev
  name        = "snowflake-dev--<sf-role>"
  default_ttl = 1 * 60 * 60
  max_ttl     = 24 * 60 * 60
  creation_statements = [
    "CREATE USER \"{{name}}\" PASSWORD = \"{{password}}\" DAYS_TO_EXPIRY = {{expiration}} DEFAULT_ROLE = <sf_role> DEFAULT_WAREHOUSE = <sf_warehouse>",
    "GRANT ROLE <sf_role> TO USER \"{{name}}\"",
  ]
}
2. Create a new policy that contains the path where the credentials can be accessed. The file name should follow the format "policies/database--snowflake-dev--<sf-role>.hcl" and the contents of the file should be as follows:
path "database/creds/snowflake-dev--<sf-role>" {
  capabilities = ["read"]
}
3. If you do not already have an existing app-role (e.g., for GitHub Actions), you will need to create one to attach the policy to (a GitHub Actions app-role file looks like github--<my_repo_name>.yaml).
Once the app-role is created, add this policy to the Vault app-role:
policies:
  - database--snowflake-dev--<sf-role>
Note: if the file already exists, append the above line to the policies section of github--<my_repo_name>.yaml.
4. Once the app-role is merged into Vault, add the secret-id to your GitHub repository's secret storage. The role-id, by comparison, is not as sensitive and can be used as plaintext in your GitHub Actions workflow (if you so choose). You can verify the setup with a direct read of the credentials endpoint, as sketched below.
VAULT_ROLE: role-id
VAULT_SECRET: secret-id
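Before wiring this into CI, you can sanity-check the setup by reading the dynamic-credentials endpoint: every read mints a fresh, short-lived Snowflake user through the role defined in step 1. The following is a minimal sketch (not part of the original setup) using the hvac Python client; VAULT_ADDR and VAULT_TOKEN are assumed to point at your Vault instance:

# Minimal sanity check: read Vault's dynamic-credentials endpoint for the
# Snowflake role. Assumes the hvac package and that VAULT_ADDR / VAULT_TOKEN
# reference a Vault instance where the role above is already merged.
import os

import hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])

# Equivalent to: vault read database/creds/snowflake-dev--<sf-role>
resp = client.secrets.database.generate_credentials(
    name="snowflake-dev--<sf-role>",  # role name from step 1 (placeholder)
    mount_point="database",
)
print(resp["data"]["username"])  # ephemeral Snowflake user (default_ttl: 1 hour)
print(resp["lease_duration"])    # seconds until Vault revokes the user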
Snowflake Service Account Usage:
Once setup is complete, user credentials will be dynamically created and passed through to GitHub Actions or Kubernetes like any other secrets stored in Vault. The following example shows how to get dynamic credentials in GitHub Actions.
1. In your Actions workflow file, be sure to have the following two actions, which are required to connect to Vault. All that needs to be added are the last two lines of the secrets block (for username and password):
- id: oidc
  uses: colpal/actions-google-sa-to-oidc@v1
  with:
    sa_key: ${{ secrets.VAULT_IAP_SA }} # COPY
    target_audience: ${{ secrets.VAULT_IAP_CLIENT_ID }} # COPY
- id: vault
  uses: hashicorp/vault-action@v2.1.0
  with:
    url: https://public.vault.colpal.cloud # COPY
    method: approle
    roleId: ${{ secrets.VAULT_ROLE }}
    secretId: ${{ secrets.VAULT_SECRET }}
    exportEnv: false
    extraHeaders: "Authorization: Bearer ${{ steps.oidc.outputs.token }}"
    secrets: |
      database/creds/snowflake-dev--<sf-role> username ;
      database/creds/snowflake-dev--<sf-role> password ;
2. In the step that needs to access the generated credentials, the username and password from the previous step can be referenced and injected into the environment as follows:
- name: Python Script
  env:
    SF_USERNAME: ${{ steps.vault.outputs.username }}
    SF_PASSWORD: ${{ steps.vault.outputs.password }}
  run: python3 ./snowflake_connection.py
3. You can now access the newly created Snowflake username and password as environment variables. These credentials expire after 24 hours, but new credentials are generated every time the workflow runs.
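For reference, the following is a minimal sketch of what the snowflake_connection.py script invoked above might look like; the script body is not part of the original post, and the account identifier is a placeholder:

# snowflake_connection.py: a hypothetical sketch, assuming the
# snowflake-connector-python package; <account_identifier> is a placeholder.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    user=os.environ["SF_USERNAME"],      # injected by the Vault step
    password=os.environ["SF_PASSWORD"],  # injected by the Vault step
    account="<account_identifier>",
    warehouse="<DB_NAME>_WH",
    database="<DB_NAME>",
)
try:
    cur = conn.cursor()
    # DEFAULT_ROLE was set by the Vault creation statement, so no USE ROLE is needed
    cur.execute("SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_WAREHOUSE()")
    print(cur.fetchone())
finally:
    conn.close()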
Loading Data from GCS to Snowflake:
In order to start loading data from a GCS bucket into a Snowflake database, you will have to:
1. Create a Snowflake storage integration. To create this integration, modify the terraform/storage_integrations.tf file to include a storage integration Terraform resource. Follow this example:
resource "snowflake_storage_integration" "<db_name>_integration" {
  name                      = "<DB_NAME>_GCS_STOR_INT"
  comment                   = "GCS storage integration for <db_name>"
  type                      = "EXTERNAL_STAGE"
  enabled                   = true
  storage_allowed_locations = ["gcs://<bucket_name/path>"]
  storage_provider          = "GCS"
}
2. Grant your newly created role permission to use the newly created storage integration. To do this, modify the terraform/storage_integration_grants.tf file to include the required grant. Follow this example:
resource "snowflake_integration_grant" "<db_name>_stor_int_grant" {
  integration_name  = "<DB_NAME>_GCS_STOR_INT"
  privilege         = "USAGE"
  roles             = ["<ROLE_NAME>"]
  with_grant_option = false
}
3. Ensure that the GCP service account Snowflake creates for the integration has the necessary permissions on the source GCS bucket (DESC STORAGE INTEGRATION <DB_NAME>_GCS_STOR_INT shows the service account to grant).
4. Once the storage integration and grant have been created and the bucket permissions are in place, you can create an external stage in your project and load data from GCS to Snowflake, as sketched below.
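The last step is plain SQL; the following is a minimal sketch (the stage and table names are hypothetical) of creating the external stage over the integration above and bulk-loading a table:

# Hypothetical sketch: create an external stage over the storage integration
# and COPY staged GCS files into an existing table. Runs as <ROLE_NAME>, which
# holds USAGE on the integration, database, and warehouse created earlier.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    user=os.environ["SF_USERNAME"],
    password=os.environ["SF_PASSWORD"],
    account="<account_identifier>",  # placeholder
    role="<ROLE_NAME>",
    warehouse="<DB_NAME>_WH",
    database="<DB_NAME>",
    schema="PUBLIC",
)
cur = conn.cursor()

# The stage points at an allowed GCS location through the storage integration
cur.execute("""
    CREATE STAGE IF NOT EXISTS GCS_STAGE
      URL = 'gcs://<bucket_name/path>'
      STORAGE_INTEGRATION = <DB_NAME>_GCS_STOR_INT
""")

# Bulk-load the staged files into an existing table
cur.execute("""
    COPY INTO MY_TABLE
      FROM @GCS_STAGE
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
conn.close()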
Deploying Snowflake Objects:
You can start deploying the Snowflake objects seamlessly, with security well taken care of:
1) The code is maintained in the GitHub repository by the users.
a) Any change to the code is managed by branching/forking the repository and then merging into the master branch of the GitHub repository.
2) Object deployment
a) Snowflake objects: Once the change is in the master branch, a GitHub Action is configured to apply it through Terraform to the objects in the Snowflake database.
b) Airflow: Code changes are likewise pushed directly to the Kubernetes cluster through GitHub Actions. The Airflow instance is hosted on a category/function-specific GCP project.
3) GCS buckets: These are managed outside the GitHub repository, and the following conditions are met:
a) Read/write access for the Snowflake service account
b) A storage integration created through Terraform
Clap if you liked the content and follow me for more. Feel free to ask questions in the comments; I will be more than happy to assist and guide you.