**********************
Programs and Data Sets
**********************
The National Cancer Institute (NCI) `Genomic Data Commons `_ (GDC) and `Proteomics Data Commons (PDC) `_ provide the cancer research community with data repositories that enable data sharing across cancer genomic and proteomic studies (known as Programs) in support of precision medicine.
ISB-CGC started with The Cancer Genome Atlas (TCGA) data sets and has since expanded to include data sets from other programs, such as Therapeutically Applicable Research to Generate Effective Treatments (TARGET). Along with the NCI GDC and PDC data sets, ISB-CGC hosts data sets from programs such as Pan-Cancer Atlas and HTAN. We are always interested in adding new data sets; if you have suggestions or requests for additional data, please let us know at feedback@isb-cgc.org.
Clinical, Biospecimen and Processed -Omics Data Sets
----------------------------------------------------
From Genomic Data Commons
~~~~~~~~~~~~~~~~~~~~~~~~~
.. figure:: omicsData.png
:align: right
:figwidth: 300px
Between ISB-CGC and the NCI GDC, there are many cancer data sets available on the Google Cloud Platform. ISB-CGC hosts some carefully curated, high-level clinical, biospecimen and molecular data sets and tables in Google BigQuery as well as radiology and pathology images in Google Cloud Storage. The GDC hosts several more data sets that include low-level sequencing data. For more information about the GDC, see the `GDC Overview `_.
Clinical, biospecimen and processed -omics data (such as RNASeq) are available in the GDC Cloud Storage buckets, in ISB-CGC BigQuery tables and through ISB-CGC web tools. The table below lists each Program and where, through ISB-CGC, you can find its data.
- The detailed documentation on each Program (click on the Program name) includes an example of how to use the metadata stored in ISB-CGC BigQuery tables to locate the Program's files in the GDC Google Cloud Storage buckets.
- To learn more about using this data with ISB-CGC web tools, go to the ISB-CGC Web Interface section of this document.
- To locate these tables in the ISB-CGC BigQuery project, use the ISB-CGC BigQuery Table Search.
.. list-table::
:widths: 10 3 3 3
:header-rows: 1
:stub-columns: 1
* - Program
- GDC Google Cloud Storage
- ISB-CGC BigQuery Tables
- ISB-CGC Cohort Builder
* - `APOLLO `_
- |checkmark|
- |checkmark| *
-
* - `BEATAML `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `CCLE `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `CDDP EAGLE `_
- |checkmark|
- |checkmark| *
-
* - `CGCI `_
- |checkmark|
- |checkmark|
-
* - `CMI `_
- |checkmark|
- |checkmark|
-
* - `CPTAC `_
- |checkmark|
- |checkmark|
-
* - `CTSP `_
- |checkmark|
- |checkmark|
-
* - `Exceptional Responders `_
- |checkmark|
- |checkmark|
-
* - `FM `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `GENIE `_
- |checkmark|
- |checkmark| *
-
* - `HCMI `_
- |checkmark|
- |checkmark|
-
* - `MATCH `_
- |checkmark|
- |checkmark| *
-
* - `MMRF `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `MP2PRT `_
- |checkmark|
- |checkmark| *
-
* - `NCICCR `_
- |checkmark|
- |checkmark|
-
* - `OHSU `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `ORGANOID `_
- |checkmark|
- |checkmark|
-
* - `REBC `_
- |checkmark|
- |checkmark| *
-
* - `TARGET `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `TCGA `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `TCGA Pathology and Radiology images `_
- |checkmark|
- |checkmark|
- |checkmark|
* - `TRIO `_
- |checkmark|
- |checkmark| *
-
* - `VAREPOP `_
- |checkmark|
- |checkmark|
-
* - `WCDT `_
- |checkmark|
- |checkmark|
-
.. |checkmark| image:: CheckMark.png
\* Only clinical data and metadata are available

\*\* Only clinical data are available
.. toctree::
:maxdepth: 1
:hidden:
data/BEATAML_about
data/CCLE_top
data/CDDP_EAGLE_about
data/CGCI_about
data/CMI_about
data/CPTAC_about
data/CTSP_about
data/EXC_RESPOND_about
data/FM_about
data/GENIE_about
data/HCMI_about
data/MATCH_about
data/MMRF_about
data/MP2PRT_about
data/NCICCR_about
data/OHSU_about
data/ORGANOID_about
data/REBC_about
data/TARGET_top
data/TCGA_top
data/TCGA-images
data/TRIO_about
data/VAREPOP_about
data/WCDT_about
From Proteomics Data Commons
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PDC protein expression data are available in ISB-CGC BigQuery tables. The table below lists each Program.
.. list-table::
:widths: 10 3 3 3
:header-rows: 1
:stub-columns: 1
* - Program
- PDC AWS Cloud Storage
- ISB-CGC BigQuery Tables
- ISB-CGC Cohort Builder
* - `APOLLO `_
-
- |checkmark|
-
* - Broad Institute
-
- |checkmark|
-
* - `CBTN `_
-
- |checkmark|
-
* - `CPTAC `_
-
- |checkmark|
-
* - `Georgetown Proteomics Research Program `_
-
- |checkmark|
- |checkmark|
* - `ICPC `_
-
- |checkmark|
-
* - `Quantitative Digital Maps of Tissue Biopsies `_
-
- |checkmark|
-
.. toctree::
:maxdepth: 1
:hidden:
data/APOLLO_about
data/CBTN_about
data/CPTAC_about
data/GPRP_about
data/ICPC_about
data/Quant_Maps_Tissue_Biopsies_about
From Other Sources
~~~~~~~~~~~~~~~~~~
.. list-table::
:header-rows: 1
:stub-columns: 1
* - Program
- GDC Google Cloud Storage
- ISB-CGC BigQuery Tables
- ISB-CGC Cohort Builder
* - `Pan-Cancer Atlas `_
-
- |checkmark|
-
* - `HTAN `_
-
- |checkmark|
-
.. toctree::
:maxdepth: 1
:hidden:
PanCancer-Atlas-Mirror
data/HTAN_about
Reference Data Sets
-------------------
ISB-CGC hosts `reference tables `_ in BigQuery with information that describes or annotates human or other genomes, or is necessary to work with data generated by specific platforms.
.. toctree::
:maxdepth: 1
:hidden:
data/Reference-Data
File Metadata Data Sets
------------------------
ISB-CGC hosts `metadata tables `_ in BigQuery with information that points to the raw and processed cancer data in the NCI GDC Google Cloud Storage buckets.
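As a rough illustration of how such metadata tables might be used, the sketch below builds a parameterized query that joins a file metadata table to a table of Cloud Storage URLs, then optionally runs it with the ``google-cloud-bigquery`` client. The table names (``my-project.my_dataset.file_metadata``, ``my-project.my_dataset.file_id_to_gcs_url``) and the column names (``file_id``, ``file_name``, ``gcs_url``, ``program_name``) are hypothetical placeholders, not the actual ISB-CGC schema; look up the real names with the ISB-CGC BigQuery Table Search before running anything.

```python
import re

# Hypothetical placeholders: replace with real table names found through
# the ISB-CGC BigQuery Table Search. Column names below are assumptions too.
METADATA_TABLE = "my-project.my_dataset.file_metadata"
URL_TABLE = "my-project.my_dataset.file_id_to_gcs_url"

# Accept only fully qualified ids of the form project.dataset.table.
_TABLE_ID = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_file_lookup_sql(metadata_table=METADATA_TABLE,
                          url_table=URL_TABLE, limit=10):
    """Return SQL that joins file metadata to Cloud Storage URLs."""
    for table in (metadata_table, url_table):
        if not _TABLE_ID.match(table):
            raise ValueError(f"expected project.dataset.table, got {table!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # The program name is passed as a query parameter (@program),
    # not interpolated into the SQL text.
    return (
        "SELECT meta.file_name, urls.gcs_url\n"
        f"FROM `{metadata_table}` AS meta\n"
        f"JOIN `{url_table}` AS urls USING (file_id)\n"
        "WHERE meta.program_name = @program\n"
        f"LIMIT {limit}"
    )


def run_file_lookup(program, **kwargs):
    """Run the query; needs google-cloud-bigquery and GCP credentials."""
    # Imported lazily so the query builder can be used without the library.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("program", "STRING", program)
        ]
    )
    client = bigquery.Client()
    rows = client.query(build_file_lookup_sql(**kwargs),
                        job_config=job_config).result()
    return [(row["file_name"], row["gcs_url"]) for row in rows]
```

Calling ``build_file_lookup_sql(limit=5)`` only produces the SQL text, so it can be inspected or pasted into the BigQuery console. ``run_file_lookup("TCGA")`` would submit the query and is billed to whichever Google Cloud project your credentials select.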
.. toctree::
:maxdepth: 1
:hidden:
data/FileMetadata