Cost Management

This section describes several use cases and their approximate costs to help users estimate the cloud costs of their own analyses.

Estimating Costs for Common Bioinformatics and Data Analysis Tasks

The following table summarizes order-of-magnitude costs for common data analysis tasks, estimated from the example notebooks. Each value is an upper bound rounded up to the next power of ten. For example, $10 means the task costs at most $10, and an estimated cost between $10 and $100 is reported as $100.

| Bioinformatics / Data Analysis Task | Dataset(s) | Tools | Approx. Cost (Max) |
|---|---|---|---|
| Identify differentially expressed genes | TCGA | BigQuery, Colab, Python, R | $1 |
|  | TCGA | BigQuery, BigQuery ML, Colab, Python, R | $1 ($100)* |
| Train a linear regression model using gene expression data | TCGA | BigQuery, BigQuery ML, Colab, Python | $1 ($100)* |
| Train a deep neural network (DNN) regression model using gene expression data | TCGA | BigQuery, Colab, TensorFlow, Compute Engine w/ GPUs | $1** |
| Analyze RNA-seq data using the GDC workflow | TCGA | Compute Engine, Cloud Storage, CWL | $10*** |

  • *BigQuery ML costs depend on the amount of data processed. In these examples, a subset of the data was extracted to a temporary table and used as the input to BigQuery ML, which reduces costs substantially. If all gene features of a TCGA dataset are used, costs can grow to the order of $100.

  • **With small datasets, using GPUs in Colab incurs no extra cost (unless you use Colab Pro). However, if the TensorFlow code runs on a Compute Engine VM with GPUs, the hourly cost can range from $1 to $10.

  • ***Cost per sample depends on sample size (i.e., number of reads) and processing time.

  • BigQuery ML vs. TensorFlow w/ Compute Engine or Colab GPUs: When choosing between these tools for machine learning, consider the following guidelines:

    • TensorFlow w/ Compute Engine or Colab GPUs: Appropriate for data exploration or parameter tuning requiring multiple iterations of training and evaluation.

    • BigQuery ML: Appropriate for production deployment of machine learning models. For example, after optimizing model parameters, train and deploy the final model with BigQuery ML.
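The rounding convention used for the "Approx. Cost (Max)" column can be expressed as a small helper. This is a minimal sketch, assuming each estimate is rounded up to the next power of ten and that anything at or below $1 is reported as $1 (the smallest bucket in the table); the function name `order_of_magnitude_cap` is hypothetical and not part of any ISB-CGC tool.

```python
import math


def order_of_magnitude_cap(estimated_cost: float) -> int:
    """Round a dollar estimate up to the next power of ten.

    Hypothetical helper illustrating the table's convention: a cost of at
    most $10 is reported as $10, a cost between $10 and $100 as $100, and
    so on. Costs of $1 or less are reported as $1 (an assumption, since
    the table's smallest bucket is $1).
    """
    if estimated_cost < 0:
        raise ValueError("cost must be non-negative")
    if estimated_cost <= 1:
        return 1
    # ceil(log10(x)) is the exponent of the smallest power of ten >= x.
    return 10 ** math.ceil(math.log10(estimated_cost))


# Example: a $45 estimate is reported in the $100 bucket.
print(order_of_magnitude_cap(45))
```

Because the table reports upper bounds, a $3 run and a $9 run both appear as $10; the helper makes that loss of precision explicit.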
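The arithmetic behind the BigQuery ML and GPU notes can be sketched as follows. This is an illustration under stated assumptions, not an official cost calculator: on-demand BigQuery charges scale with bytes processed, which is why feeding BigQuery ML a subset table is cheaper, but the per-TiB price is left as a required input (`price_per_tib`) because rates differ by region and by statement type. The GPU helper uses the $1 to $10 per hour range from the notes above. Both function names are hypothetical, and free-tier allowances and minimum-billing rules are ignored.

```python
TIB = 2 ** 40  # bytes in one tebibyte; on-demand BigQuery usage is billed per TiB


def bytes_scanned_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate an on-demand BigQuery / BigQuery ML charge from bytes processed.

    price_per_tib is an assumed input: look up the current rate for your
    region and statement type (queries and CREATE MODEL are priced
    differently). Free tier and minimum-billing rules are ignored here.
    """
    if bytes_processed < 0 or price_per_tib < 0:
        raise ValueError("inputs must be non-negative")
    return bytes_processed / TIB * price_per_tib


def gpu_vm_cost_range(hours: float, low_rate: float = 1.0,
                      high_rate: float = 10.0) -> tuple[float, float]:
    """Bound the cost of running TensorFlow on a Compute Engine VM with GPUs.

    The default hourly rates are the $1 to $10 range quoted in the notes.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * low_rate, hours * high_rate


# Hypothetical sizes and price: a 1 GiB subset table vs. a 0.5 TiB full table.
subset = bytes_scanned_cost(2 ** 30, price_per_tib=100.0)
full = bytes_scanned_cost(TIB // 2, price_per_tib=100.0)
print(f"subset: ${subset:.2f}, full: ${full:.2f}")
print(gpu_vm_cost_range(3))  # bounds for a 3-hour GPU training run
```

The same hours-times-rate estimate applies per sample to the RNA-seq workflow, where processing time grows with the number of reads.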


Have feedback or corrections? Please email us at feedback@isb-cgc.org. Follow us on BlueSky and X!