NCI GDC Overview

The NCI hosts data from a variety of cancer genomic studies in the Genomic Data Commons (GDC), a unified data repository that gives the cancer research community a way to share data in support of precision medicine.

The GDC Data Portal allows users to search for data and download it, either directly in a web browser or with the GDC Data Transfer Tool. There are two sets of data available in the GDC: legacy data and harmonized data. The legacy data is the data that the GDC inherited from previous data coordinating centers, such as TCGA-DCC and CGHub. The current data available in the GDC is harmonized data: data from those coordinating centers that the GDC realigned to GRCh38/hg38 and reprocessed, along with new data sets.
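For readers who prefer scripting to the browser, a similar search can be expressed against the GDC's public REST API. The following is a minimal sketch that only builds the request body for the files endpoint. The helper name build_files_query and the example project and data category are illustrative assumptions, not something this page prescribes.

```python
import json

# Public GDC API endpoint for file metadata searches.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id, data_category, size=10):
    """Build a request body that filters files by project and data category.

    Hypothetical helper for illustration; it performs no network access.
    """
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {
                    "field": "data_category",
                    "value": [data_category],
                },
            },
        ],
    }
    return {
        "filters": filters,
        "fields": "file_id,file_name,data_category",
        "format": "JSON",
        "size": size,
    }


# Example values are assumptions chosen for illustration.
payload = build_files_query("TCGA-BRCA", "Transcriptome Profiling")
print(json.dumps(payload, indent=2))
```

Under these assumptions, the payload would be sent as a JSON POST body to GDC_FILES_ENDPOINT, and the file IDs in the response could then be handed to the GDC Data Transfer Tool for download.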

If you have used the GDC portal to create cohorts or file lists, you can follow these tutorials to bring that information into ISB-CGC for use.

A note about legacy and harmonized data sets

Programs like TCGA that predate the Genomic Data Commons will have both legacy data sets (data as originally generated by the program) and harmonized data sets created by the Genomic Data Commons. While these data sets have much in common, several changes can occur during the GDC harmonization process, including the removal or addition of cases and samples and changes in terminology. One of the goals of ISB-CGC is to stay current with changes introduced by the GDC, so you may find differences between the legacy data and the harmonized data.
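As a rough illustration of how such differences might be checked, the sketch below compares two lists of case identifiers with set operations. The function name compare_case_sets and the identifiers are made up for this example; real lists would come from your own legacy and harmonized cohorts.

```python
def compare_case_sets(legacy_cases, harmonized_cases):
    """Report which case identifiers appear in only one data set or in both."""
    legacy = set(legacy_cases)
    harmonized = set(harmonized_cases)
    return {
        "only_legacy": sorted(legacy - harmonized),      # removed during harmonization
        "only_harmonized": sorted(harmonized - legacy),  # added during harmonization
        "shared": sorted(legacy & harmonized),           # present in both
    }


# Made-up identifiers, for illustration only.
legacy_cases = ["CASE-0001", "CASE-0002", "CASE-0003"]
harmonized_cases = ["CASE-0002", "CASE-0003", "CASE-0004"]

diff = compare_case_sets(legacy_cases, harmonized_cases)
print(diff)
```

A check like this only catches added or removed cases; changes in terminology would need a field-by-field comparison of the metadata.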

