Running FASTQC in Snowflake
Disclaimer: The views expressed in this article are mine alone and do not necessarily reflect the views of my current, former, or future employers. The content of this article is for demonstration purposes only.
This article explains how to run FASTQC in Snowflake. FASTQC is a program that identifies potential problems in high-throughput sequencing datasets, that is, in datasets generated by machines that sequence DNA and RNA molecules.
We are going to analyse a sample dataset by running FASTQC as a Docker container directly inside Snowflake. Then we will build a Streamlit application to visualise the results of the analysis.
This article assumes that the reader has a good knowledge of Snowflake. To run the code, the reader must have access to a paid Snowflake account and have installed both the Snowflake CLI and SnowSQL.
We start by setting up our environment in Snowflake.
Initial setup
Log into Snowflake and, using Snowsight, create a database, a schema, an image repository, a stage, and a warehouse.
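-- Create all objects as SYSADMIN
USE ROLE sysadmin;
CREATE DATABASE demo_fastqc;
CREATE SCHEMA fastqc;
-- Repository that will hold the container image
CREATE IMAGE REPOSITORY img_repo;
-- Stage with a directory table enabled so its files can be listed
CREATE STAGE file DIRECTORY = ( ENABLE = true );
CREATE OR REPLACE WAREHOUSE xs WAREHOUSE_SIZE = XSMALL INITIALLY_SUSPENDED = TRUE;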
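As a quick sanity check, the objects created above can be listed before moving on. This is only a minimal sketch: it assumes the setup statements ran successfully and uses the names from this article (demo_fastqc, fastqc, xs). The repository_url column returned by SHOW IMAGE REPOSITORIES is the address to use when pushing a Docker image to the repository.

```sql
-- Sanity check (assumes the setup statements above succeeded)
USE ROLE sysadmin;

-- Lists img_repo; note the repository_url column in the output
SHOW IMAGE REPOSITORIES IN SCHEMA demo_fastqc.fastqc;

-- Lists the stage created for input and output files
SHOW STAGES IN SCHEMA demo_fastqc.fastqc;

-- Confirms the warehouse exists (it should be suspended until first use)
SHOW WAREHOUSES LIKE 'xs';
```

If any of these statements returns no rows, the corresponding CREATE statement probably ran in a different database or schema.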