ISB-CGC Notebooks

What’s a notebook?

Notebooks provide an interface to an interactive analysis environment. They mix code (usually R or Python), descriptive explanations, and visualizations, and are often used to demonstrate an analysis step by step. The notebooks below are tutorials for several frequently run analyses. You can run them in JupyterLab, RStudio, or Google Colaboratory.

I’m a novice, how do I…

Get started fast? Python R
Find GDC file locations? Python R
Plot a BigQuery result? Python R
Plot a heatmap using data in BigQuery? Python R
Work with cloud storage? Python  
Create cohorts of patients? Python R
Use PyPika or dbplyr to build a query? Python R
Create a complex cohort? Python R
Join multiple tables? Python  
Get started working with the COSMIC datasets? Python  
Convert a .bam file to a .fastq file with samtools? Python  
Find a GA4GH Tool Repository Service (TRS) tool? Python  
Run workflow execution service (WES) tools? Python  
Use the ISB-CGC APIs? Python R
Explore CPTAC protein abundances? Python  
Compare protein and gene expression in CPTAC? Python  

I’m an advanced user, how do I…

Make a BigQuery table from an NCBI GEO data set? Python  
Compare cohorts with survival analysis and feature comparison? Python R
Run an ANOVA with BigQuery?* Python R
Score gene sets in BigQuery?* Python R
Correlate gene expression and copy number variation? Python  
Compute gene-gene expression correlation using BigQuery? Python  
Create randomized subsets of patients using BigQuery? Python R
Convert a 10X scRNA-seq bam file to fastq with dsub? Python  
Quantify 10X scRNA-seq gene expression with Kallisto and BUStools? Python  
Compute Nearest Centroid Classification using BigQuery? Python R
Analyze data in the COSMIC Cancer Gene Census dataset? Python  
Use a BigQuery user-defined function to perform k-means clustering? Python  
Compute correlations of protein and gene expression in CPTAC? Python  
Compare protein expression from different pipelines using CPTAC data? Python  
Calculate associations between radiomics tumor imaging features and gene expression? Python  
Analyze the correlation between gene mutations and tumor imaging features? Python  
Compare gene expression in tumor against gene expression in normal tissue? Python  

*Notebook inspired by a Query of the Month blog post

