TCGA Radiology and Pathology Image Data Set

The TCGA images from The Cancer Imaging Archive (TCIA) as well as the pathology and diagnostic images previously available from the Cancer Digital Slide Archive (CDSA) are all now available in open-access Google Cloud Storage (GCS) buckets and can be explored through the Web App.

Metadata for these files can be found in BigQuery, in the ISB-CGC metadata data sets.

Radiology Images

Over 1.4 million radiology image files in DICOM format, grouped into over 20,000 ZIP files, are available in a GCS bucket called gs://isb-tcia-open/. A ZIP file may contain anywhere from a single image to hundreds of images.

The BigQuery metadata table, isb-cgc.metadata.TCGA_radiology_images, contains the full URLs to these ZIP files, e.g.:

gs://isb-tcia-open/images/TCGA-GBM/TCGA-06-5413/TCIA.image.1.3.6.1.4.1.14519.5.2.1.4591.4001.275342915307453440215680715165.zip

The metadata table also includes the patient identifier in TCGA “barcode” format, e.g. TCGA-06-5413 (which is also part of the GCS URL). Other information available in the table includes the body part examined, image modality, patient age, etc.
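As a minimal sketch, the example URL above can be split into its parts with the Python standard library. The function name parse_radiology_url is hypothetical, and the path layout images/<project>/<patient barcode>/<file>.zip is an assumption inferred from that single example rather than a documented guarantee.

```python
from urllib.parse import urlparse


def parse_radiology_url(gcs_url):
    """Split a radiology ZIP URL into bucket, project, patient barcode, and file name.

    Assumes the layout seen in the example URL:
    gs://<bucket>/images/<project>/<patient barcode>/<file>.zip
    """
    parsed = urlparse(gcs_url)
    if parsed.scheme != "gs":
        raise ValueError(f"not a GCS URL: {gcs_url}")

    # Drop the leading slash, then split the object path into its components.
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "images":
        raise ValueError(f"unexpected object path: {parsed.path}")

    _, project, patient_barcode, file_name = parts
    return {
        "bucket": parsed.netloc,
        "project": project,
        "patient_barcode": patient_barcode,
        "file_name": file_name,
    }


example = (
    "gs://isb-tcia-open/images/TCGA-GBM/TCGA-06-5413/"
    "TCIA.image.1.3.6.1.4.1.14519.5.2.1.4591.4001."
    "275342915307453440215680715165.zip"
)
info = parse_radiology_url(example)
print(info["project"], info["patient_barcode"])
```

Where the metadata table is available, its patient identifier column is the more reliable source; parsing the URL is mainly useful when only a list of URLs is at hand.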

Pathology Images

Over 30,000 TCGA tissue slide images in SVS format are also available in GCS, in the open-access bucket gs://gdc-tcga-phs000178-open/.

These files were uploaded from the GDC legacy archive.

The BigQuery metadata table, isb-cgc.metadata.TCGA_slide_images, contains the full URLs to these SVS files, e.g.:

gs://gdc-tcga-phs000178-open/9c4b1b5c-b5cf-48f6-bf41-047ceb8c883c/TCGA-CR-7365-01A-01-TS1.811bb2b7-66e3-4694-891b-10b436ec300d.svs

This table also includes image metadata and the TCGA case and sample “barcodes”, which can be used to join it with other TCGA clinical, biospecimen, and molecular data tables.
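The slide file name itself appears to begin with the slide barcode, so the case and sample barcodes used as join keys can be recovered from a URL when the metadata table is not at hand. The sketch below assumes the usual TCGA barcode layout, in which the first three hyphen-separated fields identify the case and the first four identify the sample. The function name barcodes_from_slide_url is hypothetical, and the barcode values stored in the metadata table should be preferred when available.

```python
import posixpath


def barcodes_from_slide_url(gcs_url):
    """Derive slide, sample, and case barcodes from an SVS file URL.

    Assumes the file name has the form <slide barcode>.<uuid>.svs and that
    the barcode follows the usual TCGA layout (an assumption, not a guarantee).
    """
    file_name = posixpath.basename(gcs_url)
    # Everything before the first dot is treated as the slide barcode.
    slide_barcode = file_name.split(".", 1)[0]

    fields = slide_barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"unexpected slide file name: {file_name}")

    return {
        "slide_barcode": slide_barcode,
        "sample_barcode": "-".join(fields[:4]),  # e.g. TCGA-CR-7365-01A
        "case_barcode": "-".join(fields[:3]),    # e.g. TCGA-CR-7365
    }


slide_url = (
    "gs://gdc-tcga-phs000178-open/9c4b1b5c-b5cf-48f6-bf41-047ceb8c883c/"
    "TCGA-CR-7365-01A-01-TS1.811bb2b7-66e3-4694-891b-10b436ec300d.svs"
)
codes = barcodes_from_slide_url(slide_url)
print(codes["case_barcode"], codes["sample_barcode"])
```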


Have feedback or corrections? Please email us at feedback@isb-cgc.org.