Running Workflows on ISB-CGC
The ISB-CGC platform is intentionally designed as a lightweight layer on top of the Google Cloud Platform (GCP). As a result, our end users have at their disposal all of the tools and technologies that the GCP has to offer, including full sudo access to Google Compute Engine virtual machines. For more detailed information about GCP tools and technologies, please see their in-depth documentation page.
We have put together some basic documentation and workflow examples for ISB-CGC users who are new both to the GCP and to the workflow languages commonly used in -omics research. The examples cover CWL, Nextflow, Snakemake, WDL, and GeneFlow. If there are other workflow languages you are interested in, please let us know and we’ll put together some examples for them as well (email us at email@example.com).
The links in this section (1) help new users get familiar with the Google Cloud Platform tools and technologies (virtual machines and cloud storage) needed to run workflows with an ISB-CGC Google Cloud Platform project, (2) provide a cheatsheet of the packages and dependencies required on the virtual machines to run the workflow examples provided below, and (3) offer tips and tricks for estimating cloud costs, which is crucial before running large workflows.
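As a rough illustration of the kind of cost estimate mentioned above, the following is a minimal sketch in Python. The function name, its parameters, and every price in it are hypothetical placeholders chosen for illustration, not actual GCP rates; real prices vary by machine type and region and should be taken from the current GCP pricing pages. The sketch also ignores network egress, sustained-use discounts, and preemptible pricing.

```python
# Back-of-the-envelope workflow cost estimate.
# All rates passed in are ASSUMED placeholders, not real GCP prices.

HOURS_PER_MONTH = 730.0  # common approximation for converting monthly rates


def estimate_workflow_cost(
    vm_hourly_usd: float,
    n_vms: int,
    hours: float,
    disk_gb_per_vm: float,
    disk_usd_per_gb_month: float,
    bucket_gb: float,
    bucket_usd_per_gb_month: float,
) -> dict:
    """Return a per-component cost breakdown in USD for one workflow run."""
    fraction_of_month = hours / HOURS_PER_MONTH

    # Compute: every VM is billed for the full duration of the run.
    compute = vm_hourly_usd * n_vms * hours

    # Persistent disk: monthly per-GB rate, prorated to the run duration.
    disk = disk_gb_per_vm * n_vms * disk_usd_per_gb_month * fraction_of_month

    # Cloud Storage: monthly per-GB rate, prorated to the run duration.
    storage = bucket_gb * bucket_usd_per_gb_month * fraction_of_month

    return {
        "compute": compute,
        "disk": disk,
        "storage": storage,
        "total": compute + disk + storage,
    }


if __name__ == "__main__":
    # Hypothetical run: 4 VMs for 12 hours, with placeholder prices.
    breakdown = estimate_workflow_cost(
        vm_hourly_usd=0.20,
        n_vms=4,
        hours=12,
        disk_gb_per_vm=200,
        disk_usd_per_gb_month=0.04,
        bucket_gb=500,
        bucket_usd_per_gb_month=0.02,
    )
    for component, usd in breakdown.items():
        print(f"{component:>8}: ${usd:.2f}")
```

Running a small pilot on a subset of samples and scaling the measured runtime is usually a more reliable input for `hours` than a guess.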
Example introductory bioinformatics workflows
We have put together these examples to (1) demonstrate the syntax of these workflow languages, (2) illustrate them with two commonly used bioinformatics and -omics use cases (RNA-seq and BLAST), and (3) allow users to adapt these examples to their own use cases.
Example workflows accessing data in Google Cloud Storage
We demonstrate here (1) how to use the Google Cloud Storage FUSE (GCSFuse) tool to mount a Cloud Storage bucket on your virtual machine instance; although Cloud Storage buckets are object stores, a mounted bucket can be accessed through ordinary file paths, much like a persistent disk, and (2) how to use GCSFuse to mount a Cloud Storage bucket containing CCLE open-access BAM files on a virtual machine so that they can be used in your workflow of choice.
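The mounting step described above can be sketched as a small Python wrapper around the `gcsfuse` command-line tool. This is a minimal sketch, assuming `gcsfuse` is already installed on the virtual machine and the VM has read access to the bucket. The bucket name `my-example-bucket` and the mount point are hypothetical placeholders, and the flags used (`--implicit-dirs` and `-o ro`) should be checked against the documentation for your installed GCSFuse version.

```python
import shlex
import subprocess
from pathlib import Path


def build_gcsfuse_command(bucket, mount_point, read_only=True, implicit_dirs=True):
    """Build the argument list for mounting `bucket` at `mount_point`."""
    cmd = ["gcsfuse"]
    if implicit_dirs:
        # Show "directories" that exist only as prefixes of object names.
        cmd.append("--implicit-dirs")
    if read_only:
        # Mount read-only so a workflow cannot modify shared input data.
        cmd += ["-o", "ro"]
    cmd += [bucket, str(mount_point)]
    return cmd


def mount_bucket(bucket, mount_point, **options):
    """Create the mount point if needed, then run gcsfuse."""
    Path(mount_point).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_gcsfuse_command(bucket, mount_point, **options), check=True)


if __name__ == "__main__":
    # Placeholder names; replace with your own bucket and directory.
    command = build_gcsfuse_command("my-example-bucket", "/mnt/ccle-bams")
    print(shlex.join(command))  # inspect the command before running it
    # mount_bucket("my-example-bucket", "/mnt/ccle-bams")
```

Once mounted, files in the bucket can be passed to a workflow as ordinary paths under the mount point. Reads still go over the network, so tools that perform many small random reads may run noticeably slower than on a local disk.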