Cloud-Hosted Data Sets

The ISB-CGC platform hosts the majority of the TCGA data set, as well as other reference and annotation datasets, each in the Google Cloud technology best suited to it:

  • low-level DNA- and RNA-Seq data are stored primarily in Google Cloud Storage;
  • some open-access CCLE sequence data are also available in Google Genomics, where they can be queried using the GA4GH API;
  • high-level clinical, biospecimen, and molecular data are available in a series of carefully curated datasets and tables backed by Google BigQuery, a massively parallel analytics engine;
  • TCGA radiology and tissue image data are now also available in Google Cloud Storage;
  • TCGA proteomics (CPTAC Phase II) data have also been uploaded to Google Cloud Storage.
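As a rough illustration of how the BigQuery-backed tables might be explored, the Python sketch below builds a simple aggregate query and, optionally, runs it with the google-cloud-bigquery client library. The table name, column name, and project ID are placeholders chosen for illustration only; they are not actual ISB-CGC identifiers and should be replaced with names taken from the dataset listings.

```python
def build_count_query(table: str, column: str, limit: int = 10) -> str:
    """Return a Standard SQL query counting rows per distinct value of `column`."""
    return (
        f"SELECT {column}, COUNT(*) AS n\n"
        f"FROM `{table}`\n"
        f"GROUP BY {column}\n"
        f"ORDER BY n DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql: str, project: str) -> list:
    """Run `sql` in BigQuery, billing `project`, and return rows as dicts.

    Requires the google-cloud-bigquery package and valid credentials.
    """
    # Imported lazily so build_count_query works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Placeholder names (assumptions): substitute a real table and column.
    sql = build_count_query("my-project.my_dataset.clinical", "disease_code")
    print(sql)
    # rows = run_query(sql, project="my-billing-project")  # needs credentials
```

Building the query text separately from running it makes it easy to inspect the SQL before any data is scanned.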

The original mission of the ISB-CGC was to host the TCGA dataset. We are now in the midst of adding data from the TARGET pediatric cancer program. Stay tuned for updates.

