Cloud-Hosted Data SetsΒΆ

The ISB-CGC platform hosts the majority of the TCGA data set as well as other reference and annotation datasets in different appropriate Google Cloud technologies:

  • low-level DNA- and RNA-Seq data are stored primarily in Google Cloud Storage;
  • some open-access CCLE sequence data is also available in Google Genomics, where it can be queried using the GA4GH API;
  • high-level clinical, biospecimen, and molecular data are available in a series of carefully curated datasets and tables backed by the massively-parallel analytics engine Google BigQuery;
  • TCGA radiology and tissue image data are now also available in Google Cloud Storage;
  • TCGA proteomics (CPTAC PhaseII) data has also been uploaded to Google Cloud Storage;

The original mission of the ISB-CGC was to host the TCGA dataset. We are now in midst of adding data from the TARGET pediatric cancer. Stay tuned for updates.

Have feedback or corrections? You can file an issue here or email us at