
Running FASTQC in Snowflake

Daniel Mannino
5 min read · Nov 28, 2024


Disclaimer: The views expressed in this article are mine alone and do not necessarily reflect the views of my current, former, or future employers. The content of this article is for demonstration purposes only.

This article explains how to run FASTQC in Snowflake. FASTQC is a program that identifies potential problems in high-throughput sequencing datasets, that is, datasets generated by machines that sequence DNA and RNA molecules.
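To give a feel for the kind of check a quality-control tool performs on sequencing data, here is a minimal Python sketch that computes the mean base quality at each read position. It is only an illustration and not how FASTQC itself is implemented. It assumes 4-line FASTQ records with Phred+33 quality encoding; the function names and the two sample reads are made up for this example.

```python
# Illustrative sketch only: this is NOT FASTQC's implementation.
# Assumption: 4-line FASTQ records with Phred+33 quality encoding.

def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) for each 4-line record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' of the header


def mean_quality_per_position(records):
    """Return the mean Phred score at each position across all reads."""
    totals, counts = [], []
    for _read_id, _seq, qual in records:
        for pos, ch in enumerate(qual):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - 33  # Phred+33: score = ASCII code - 33
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads; the second one degrades towards its end.
SAMPLE = """@read1
ACGT
+
IIII
@read2
ACGT
+
II5#
"""

means = mean_quality_per_position(parse_fastq(SAMPLE))
print(means)  # [40.0, 40.0, 30.0, 21.0]
```

A drop in mean quality towards the end of the reads, as in this toy sample, is the sort of pattern a sequencing quality report is meant to surface.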

We are going to analyse a sample dataset by running FASTQC as a Docker container directly inside Snowflake. Then we will build a Streamlit application to visualise the results of the analysis.

This article assumes that the reader has a good knowledge of Snowflake. To run the code, the reader must have access to a paid Snowflake account and have installed both the Snowflake CLI and SnowSQL.

We start by setting up our environment in Snowflake.

Initial setup

Let’s start by logging into Snowflake and creating a database, a schema, an image repository, a stage, and a warehouse using Snowsight.

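-- Create all objects as SYSADMIN
USE ROLE sysadmin;
CREATE DATABASE demo_fastqc;
CREATE SCHEMA fastqc;
-- Repository for the container image
CREATE IMAGE REPOSITORY img_repo;
-- Stage with a directory table enabled
CREATE STAGE file DIRECTORY = ( ENABLE = TRUE );
-- Extra-small warehouse, created in a suspended state
CREATE OR REPLACE WAREHOUSE xs WAREHOUSE_SIZE = XSMALL INITIALLY_SUSPENDED = TRUE;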


Written by Daniel Mannino

I am a cloud-native analytics architect and my goal is to bring companies from drowning in data to swimming in innovation
