Programs and Data Sets

The National Cancer Institute (NCI) Genomic Data Commons (GDC) and Proteomics Data Commons (PDC) provide the cancer research community with data repositories that enable data sharing across cancer genomic and proteomic studies (known as Programs) in support of precision medicine.

ISB-CGC started with The Cancer Genome Atlas (TCGA) data sets but has expanded to include data sets from other programs, such as Therapeutically Applicable Research to Generate Effective Treatments (TARGET). Along with the NCI GDC and PDC data sets, ISB-CGC hosts data sets from programs such as the Catalogue Of Somatic Mutations In Cancer (COSMIC) from the Wellcome Sanger Institute. We are always interested in adding new data sets, so if you have suggestions or requests for additional data, please let us know at feedback@isb-cgc.org.

Clinical, Biospecimen and Processed -Omics Data Sets

From Genomic Data Commons


Between ISB-CGC and the NCI GDC, there are many cancer data sets available on the Google Cloud Platform. ISB-CGC hosts some carefully curated, high-level clinical, biospecimen and molecular data sets and tables in Google BigQuery as well as radiology and pathology images in Google Cloud Storage. The GDC hosts several more data sets that include low-level sequencing data. For more information about the GDC, see the GDC Overview.

Clinical, biospecimen and processed -omics data (such as RNASeq) are available in the GDC Cloud Storage buckets, in ISB-CGC BigQuery tables and through ISB-CGC web tools. The table below lists each Program and where its data can be found through ISB-CGC.

  • The detailed documentation for each Program (click on the Program name) includes an example of how to use the metadata stored in ISB-CGC BigQuery tables to locate the Program’s files in the GDC Google Cloud Storage buckets.

  • To learn more about using this data with ISB-CGC web tools, go to the ISB-CGC Web Interface section of this document.

  • To locate these tables in the ISB-CGC BigQuery project, use the ISB-CGC BigQuery Table Search.
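As a rough illustration of the first point above, the sketch below builds a BigQuery query that joins a file metadata table to a table mapping file IDs to Cloud Storage URLs, filtered to one Program. The table IDs and column names (file_id, program_name, data_format, gcs_url) are hypothetical placeholders, not the documented ISB-CGC schema; substitute the real names found through the ISB-CGC BigQuery Table Search. Running the query assumes the google-cloud-bigquery package and Google Cloud credentials are available.

```python
import re

# Hypothetical placeholder table IDs; replace them with real tables located
# through the ISB-CGC BigQuery Table Search.
FILE_METADATA_TABLE = "your-project.your_dataset.file_metadata"
FILE_URL_TABLE = "your-project.your_dataset.file_id_to_gcs_url"

# Table IDs cannot be passed as query parameters, so validate them instead.
_TABLE_ID = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_\-]+$")


def build_file_lookup_query(file_table=FILE_METADATA_TABLE,
                            url_table=FILE_URL_TABLE,
                            limit=10):
    """Return SQL listing Cloud Storage URLs for one Program's files.

    The column names used here are assumptions made for illustration.
    The Program name is supplied later as the @program_name parameter.
    """
    for table in (file_table, url_table):
        if not _TABLE_ID.match(table):
            raise ValueError(f"unexpected table ID: {table!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT f.file_id, f.data_format, u.gcs_url\n"
        f"FROM `{file_table}` AS f\n"
        f"JOIN `{url_table}` AS u ON f.file_id = u.file_id\n"
        "WHERE f.program_name = @program_name\n"
        f"LIMIT {limit}"
    )


def run_file_lookup(program_name, **kwargs):
    """Run the query and return the rows as a list of dicts."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("program_name", "STRING", program_name)
        ]
    )
    sql = build_file_lookup_query(**kwargs)
    return [dict(row) for row in client.query(sql, job_config=job_config).result()]
```

With real table IDs filled in, a call such as run_file_lookup("TCGA", limit=5) would return up to five rows. Passing the Program name as a query parameter rather than formatting it into the SQL string avoids quoting problems.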

Program

GDC Google Cloud Storage

ISB-CGC BigQuery Tables

ISB-CGC Cohort Builder

BEATAML

checkmark

checkmark

checkmark

CCLE

checkmark

checkmark

checkmark

CGCI

checkmark

checkmark

CMI

checkmark

checkmark

CPTAC

checkmark

checkmark

CTSP

checkmark

checkmark

FM

checkmark

checkmark

checkmark

GENIE

checkmark

checkmark *

HCMI

checkmark

checkmark

MMRF

checkmark

checkmark

checkmark

NCICCR

checkmark

checkmark *

OHSU

checkmark

checkmark *

checkmark

ORGANOID

checkmark

checkmark

TARGET

checkmark

checkmark

checkmark

TCGA

checkmark

checkmark

checkmark

TCGA Pathology and Radiology images

checkmark

checkmark

checkmark

VAREPOP

checkmark

checkmark

WCDT

checkmark

checkmark

* Only clinical data and metadata are available

From Proteomics Data Commons

PDC protein expression data are available in ISB-CGC BigQuery tables. The table below lists each Program.

Program

PDC AWS Cloud Storage

ISB-CGC BigQuery Tables

ISB-CGC Cohort Builder

CBTN

checkmark

CPTAC

checkmark

Georgetown Proteomics Research Program

checkmark

checkmark

ICPC

checkmark

Quantitative Digital Maps of Tissue Biopsies

checkmark

From Other Sources

Program

GDC Google Cloud Storage

ISB-CGC BigQuery Tables

ISB-CGC Cohort Builder

COSMIC

No, the COSMIC database is maintained by the Wellcome Sanger Institute, UK

Yes, COSMIC data is in BigQuery for registered users. Learn more about how to gain access to the COSMIC data here

Pan-Cancer Atlas

checkmark

Reference Data Sets

ISB-CGC hosts reference tables in BigQuery with information that describes or annotates human or other genomes, or is necessary to work with data generated by specific platforms.

File Metadata Data Sets

ISB-CGC hosts metadata tables in BigQuery with information that points to the raw and processed cancer data in the NCI GDC Google Cloud Storage buckets.
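Assuming the metadata rows reference files by Cloud Storage URIs of the form gs://bucket/object, a small helper like the one below can split each URI into its bucket and object name and group objects by bucket before downloading. The function names and the example URIs are hypothetical and used only for illustration.

```python
def parse_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object.
    bucket, _, name = uri[len(prefix):].partition("/")
    if not bucket or not name:
        raise ValueError(f"URI must name a bucket and an object: {uri!r}")
    return bucket, name


def group_by_bucket(uris):
    """Group object names by bucket, preserving the input order."""
    grouped = {}
    for uri in uris:
        bucket, name = parse_gcs_uri(uri)
        grouped.setdefault(bucket, []).append(name)
    return grouped


if __name__ == "__main__":
    # Made-up URIs standing in for values read from a metadata table.
    example_uris = [
        "gs://example-bucket-a/dir/sample1.bam",
        "gs://example-bucket-b/sample2.vcf",
        "gs://example-bucket-a/dir/sample1.bam.bai",
    ]
    for bucket, names in group_by_bucket(example_uris).items():
        print(bucket, names)
```

Grouping by bucket is a convenience: access permissions are typically granted per bucket, so it helps to know which buckets a set of files spans before attempting to read them.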

