Query of the Month

Welcome to ‘Query of the Month’, where we are building a collection of new and interesting queries that demonstrate the powerful combination of big data from NCI cancer programs such as TCGA and BigQuery from Google.

Please let us know if you have an idea or a suggestion for our next QotM!

Query of the Month is produced by:

  • David L Gibbs (david.gibbs ( ~ at ~ ) systemsbiology ( ~ dot ~ ) org)
  • Sheila M Reynolds (sheila.reynolds ( ~ at ~ ) systemsbiology ( ~ dot ~ ) org)

Table of Contents

2018

  • October: Jupyter notebooks & Dataproc clusters … in the cloud.
  • September: R scripts in the cloud.
  • August: Using BigQuery ML in a Shiny app.
  • July: First look: BigQuery ML.
  • June: Processing BAM files using WDL ‘scatter and gather’.
  • May: Processing BAM files using CWL ‘scatter and gather’.
  • April: Running CWL workflows in the cloud.
  • March: Machine learning classifier in BigQuery?! Top Scoring Pairs implementation.
  • February: BioCircos Shiny app, showing pairwise correlations within a pathway.
  • January: Gene Set Scoring in BigQuery, using the new hg38 mutation tables.

2017

  • December2017: BigQuery comparing TCGA samples to GTEx tissues with Spearman correlation.
  • November2017: Run an R (or Python) script in batch mode using dsub on the Google Cloud.
  • October2017: Using plotly for visualization in Shiny apps. We implement an interactive heatmap using heatmaply.
  • September2017: We implement a new statistical test in BigQuery: the one-way ANOVA.
  • August2017: A small demo application using BigQuery as the backend for a Shiny app.
  • July2017: Look at the BigQuery RECORD data type in methylation tables from the GDC.
  • May2017: Continued from April: estimating the distance between samples based on shared mutations in pathways.
  • April2017: Using BigQuery to compute a similarity metric on overlapping mutations between samples. Uses the MC3 mutation table and data from COSMIC.
  • March2017: Using BigQuery to compute a pairwise distance matrix, with a heatmap in R.
  • February2017: Using BigQuery, define K-means clustering as a user-defined (JavaScript) function.
  • January2017: Comparing Standard SQL and Legacy SQL.

2016

  • December2016: Spearman correlation in BigQuery to compare the new hg38 expression data to the hg19 data.