Quick-Start Guide

ISB-CGC provides both interactive (through a web application) and programmatic access to data hosted by institutes such as the Genomic Data Commons (GDC) and the Protemoic Data Commons (PDC) of the National Cancer Institute (NCI) and Human Tumor Atlas Network (HTAN), leveraging many aspects of the Google Cloud Platform. To get started, you’ll need a Google Cloud Project. Additionally, to access controlled data, you’ll also need dbGaP authorization.

Google Cloud Project Setup and Data Access

A Google Cloud Project (GCP) is required to make use of all of the data, tools, and Google Cloud functionality.

Obtain a Google identity

Do you or your institution already have a Google identity, such as a Gmail account? If so, you can proceed to the next step.

If not, it only takes a minute to create a Google identity. You can even link a non-Gmail account (eg. scientist@nih.gov) as a Google identity by this method.

Request Google Cloud Credits

Take advantage of a one-time $300 Google Credit.

If you have already used this one-time offer (or there is some other reason you cannot use it), see this information about how to request ISB-CGC Cloud Credits.

Set up a Google Cloud Project

See Google’s documentation about how to create a Google Cloud Project.

Learn about how to add members and roles to a project.

If you’ll be accessing controlled data, register the GCP project.

Enable Required Google Cloud APIs

Connect to ISB-CGC’s cancer data tables in Google BigQuery

To obtain access to the ISB-CGC open access project tables in BigQuery, users can link these tables to their GCP project as described here.

To obtain access to the ISB-CGC controlled access project tables in BigQuery, users can link these tables to their GCP project as described here.

Access open-access data

All individual processed data files are accessible through GDC Google Cloud Storage buckets; ISB-CGC provides pointers to these files. Examples of how to find these URLs are in this section, on each Program’s documentation page; these SQL queries can also be incorporated into notebooks or workflows.

Access controlled data (with proper authorization)

To access controlled data (primarily raw data files in the GDC Google Cloud Storage buckets), users must first be authenticated by NIH (via the ISB-CGC web-app). Upon successful authentication, user dbGaP authorization will be verified. These two steps are required before the user’s Google identity is added to the access control list (ACL) for the controlled data. At this time, this access must be renewed every 24 hours.

Getting Started with Analysis

Now you’re ready to perform analysis. ISB-CGC offers web-based interactive interactive analysis, analysis with Google BigQuery and analysis using APIs and VMs. Please see the next section Getting Started with Analysis to learn more.

Have feedback or corrections? Please email us at feedback@isb-cgc.org. Follow us on BlueSky and X!