TCGA Radiology and Pathology Image Data

The TCGA images from The Cancer Imaging Archive (TCIA), as well as the pathology and diagnostic images previously available from the Cancer Digital Slide Archive (CDSA), are now all available in open-access ISB-CGC Google Cloud Storage (GCS) buckets, as described below.

Metadata for these files can be found in BigQuery, in the ISB-CGC metadata dataset.

Radiology Images

Over 1.4 million radiology image files in DICOM format, grouped into more than 20,000 zip files, are available in the GCS bucket gs://isb-tcia-open/. Each zip file may contain hundreds of images or just a single image.
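Once a zip file has been copied out of the bucket, its contents can be inspected with Python's standard `zipfile` module. The sketch below assumes the archive is already on local disk or held in memory; the function name is illustrative, and it only lists member names. Reading DICOM headers or pixel data would need a separate library such as pydicom.

```python
import io
import zipfile


def list_archive_members(zip_source):
    """Return the names of the files stored in a radiology zip archive.

    `zip_source` may be a local file path or a binary file-like object.
    Directory entries are skipped. No file extension is assumed, because
    DICOM files are not always named with a ".dcm" suffix.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


if __name__ == "__main__":
    # Build a tiny in-memory archive that stands in for a downloaded zip file.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("series1/", "")  # directory entry, skipped by the helper
        zf.writestr("series1/0001.dcm", b"placeholder")
    buf.seek(0)
    print(list_archive_members(buf))  # ['series1/0001.dcm']
```

Counting the returned list is a quick way to tell a single-image archive from one holding a full series.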

The BigQuery metadata table isb-cgc.metadata.TCGA_radiology_images contains the full URLs of these zip files.
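A minimal sketch of pulling zip-file URLs out of the radiology metadata table with the `google-cloud-bigquery` client. This page does not name the table's columns, so the URL column is passed in as a parameter (check the table schema for the real name). The helper names are hypothetical, and running the query assumes Google Cloud credentials and a billing project are already configured.

```python
TABLE = "isb-cgc.metadata.TCGA_radiology_images"


def build_url_query(url_column, limit=10):
    """Build SQL that lists zip-file URLs from the radiology metadata table.

    `url_column` is the column holding the gs:// URLs; its real name is an
    assumption you must check against the table schema. The name is checked
    to be a plain identifier because it is interpolated into the SQL text.
    """
    if not url_column.replace("_", "").isalnum():
        raise ValueError(f"unexpected column name: {url_column!r}")
    if int(limit) <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT {url_column} FROM `{TABLE}` LIMIT {int(limit)}"


def fetch_urls(url_column, limit=10):
    """Run the query and return the URLs as a list of strings."""
    # Imported lazily so build_url_query works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(build_url_query(url_column, limit)).result()
    return [row[url_column] for row in rows]
```

Keeping the SQL builder separate from the client call makes the query easy to inspect, or to paste into the BigQuery console, before spending any query quota.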


The metadata table also includes the patient identifier in TCGA “barcode” format, e.g. TCGA-06-5413, which is also part of the GCS URL. Other information in the table includes the body part examined, the imaging modality, the patient's age, and more.
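Because the patient barcode is embedded in each radiology URL, it can be recovered from the URL alone. The exact URL layout is not spelled out here, so this sketch searches for a barcode-shaped substring anywhere in the URL rather than relying on a fixed path position. The pattern follows the usual TCGA patient barcode shape (`TCGA-`, a 2-character tissue source site code, a 4-character participant code); the function name is illustrative.

```python
import re

# Patient-level TCGA barcode: "TCGA", a 2-character tissue source site code and a
# 4-character participant code, e.g. TCGA-06-5413. The negative lookahead rejects
# a match whose participant code is immediately followed by another letter or digit.
PATIENT_BARCODE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}(?![A-Za-z0-9])")


def patient_barcode_from_url(url):
    """Return the first patient barcode found in `url`, or None if there is none."""
    match = PATIENT_BARCODE.search(url)
    return match.group(0) if match else None
```

Longer barcodes (for example sample or slide barcodes that start with the patient barcode) also yield just the patient-level prefix, which is the key shared across the radiology and slide tables.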

Pathology Images

More than 30,000 TCGA tissue slide images in SVS format are also available in GCS, in the open-access bucket gs://isb-tcga-phs000178-open/. These files were uploaded from the NCI-GDC legacy archive.

The BigQuery metadata table isb-cgc.metadata.TCGA_slide_images contains the full URLs of these SVS files (e.g. gs://isb-tcga-phs000178-open/gdc/208fa2ac-69a8-4851-b13e-1f000872bf7f/TCGA-06-5413-01Z-00-DX1.6c5e8a47-c2d0-4873-9b32-36857c5f67ac.svs), along with image metadata and the TCGA case and sample “barcodes”, which can be used to join this table with other TCGA clinical, biospecimen, and molecular data tables.
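Given a slide URL like the example above, the bucket and object name (what a GCS client needs to fetch the file) and the barcodes can be split out with the standard library. This sketch assumes the slide barcode is the part of the file name before the first `.`, that its first 12 characters are the case barcode, and that its first 16 characters correspond to the sample-level barcode; confirm that last point against the sample barcode column in the table itself. The optional download helper uses the `google-cloud-storage` anonymous client, assuming the open-access bucket allows unauthenticated reads. Function names are illustrative.

```python
from urllib.parse import urlparse


def parse_slide_url(url):
    """Split a gs:// slide URL into bucket, object name and TCGA barcodes."""
    parsed = urlparse(url)
    if parsed.scheme != "gs":
        raise ValueError(f"not a gs:// URL: {url!r}")
    object_name = parsed.path.lstrip("/")
    filename = object_name.rsplit("/", 1)[-1]
    # Assumption: the slide barcode is everything before the first "." in the
    # file name, e.g. TCGA-06-5413-01Z-00-DX1.
    slide_barcode = filename.split(".", 1)[0]
    return {
        "bucket": parsed.netloc,
        "object_name": object_name,
        "slide_barcode": slide_barcode,
        "case_barcode": slide_barcode[:12],    # e.g. TCGA-06-5413
        "sample_barcode": slide_barcode[:16],  # assumption: e.g. TCGA-06-5413-01Z
    }


def download_slide(url, dest_path):
    """Download one slide file without credentials (assumes public read access)."""
    # Imported lazily so parse_slide_url works without the library installed.
    from google.cloud import storage

    info = parse_slide_url(url)
    client = storage.Client.create_anonymous_client()
    blob = client.bucket(info["bucket"]).blob(info["object_name"])
    blob.download_to_filename(dest_path)
```

SVS slides can be large, so in practice it may be worth filtering on the metadata table first and downloading only the slides you need.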

