Query of the Month

Welcome to ‘Query of the Month’, where we build a collection of new and interesting queries that demonstrate the powerful combination of big data from NCI cancer programs, such as TCGA, and Google BigQuery.

NOTE! We mostly spend our time producing notebooks for our Community Notebooks collection. Check it out on GitHub: https://github.com/isb-cgc/Community-Notebooks and on ReadTheDocs: https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/HowTos.html

Query of the Month is produced by the ISB-CGC team, with special effort by:

  • David L Gibbs (david.gibbs ( ~ at ~ ) systemsbiology ( ~ dot ~ ) org)

  • Kawther Abdilleh (kawther.abdilleh ( ~ at ~ ) gdit ( ~ dot ~ ) com)

  • Sheila M Reynolds (sheila.reynolds ( ~ at ~ ) systemsbiology ( ~ dot ~ ) org)

Table of Contents


  • July2019: New notebooks added: cohorts and GEO data.

  • June2019: Community Notebooks launched!

  • February2019: BigQuery in R - a refresher

  • January2019: BAM slicing in a cloud-hosted Python notebook.


  • December2018: BigQuery Tips & Tricks

  • November2018: Transforming VCF (DNA variant) files for BigQuery.

  • October2018: Jupyter notebooks & Dataproc clusters … in the cloud.

  • September2018: R scripts in the cloud.

  • August2018: Using BigQuery ML in a shiny app.

  • July2018: First look: BigQuery ML.

  • June2018: Processing BAM files using WDL ‘scatter and gather’.

  • May2018: Processing BAM files using CWL ‘scatter and gather’.

  • April2018: Running CWL workflows in the cloud.

  • March2018: A machine learning classifier in BigQuery?! A Top Scoring Pairs implementation.

  • February2018: BioCircos shiny app, showing pairwise correlations within a pathway.

  • January2018: Gene Set Scoring in BigQuery, using the new hg38 mutation tables.


  • December2017: Using BigQuery to compare TCGA samples to GTEx tissues with Spearman correlation.

  • November2017: Running an R (or Python) script in batch mode using dsub on Google Cloud.

  • October2017: Using plotly for visualization in Shiny apps. We implement an interactive heatmap using heatmaply.

  • September2017: We implement a new statistical test in BigQuery: the one-way ANOVA.

  • August2017: A small demo application using BigQuery as the backend for a Shiny app.

  • July2017: A look at the BigQuery RECORD data type in the GDC methylation tables.

  • March2017: Using BigQuery to compute a pairwise distance matrix and drawing a heatmap in R.

  • February2017: Using BigQuery to define K-means clustering as a user-defined (JavaScript) function.

  • January2017: Comparing Standard SQL and Legacy SQL.


  • December2016: Spearman correlation in BigQuery, comparing the new hg38 expression data to the hg19 data.