Case and File Metadata

ISB-CGC hosts several metadata tables in Google BigQuery to help users find GDC files in Google Cloud Storage (GCS) and PDC files in Amazon Web Services (AWS) cloud storage. You can preview and query these tables from the BigQuery web UI, from scripting languages such as R and Python, or from the command line with the Cloud SDK utility bq.

For additional details about each of these tables, please use the BigQuery Table Search. To find the metadata tables, select File Metadata under Category.

In the table names listed below, ‘#’ stands for the GDC release number and must be replaced with it when you use a table, for example: isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r28. The metadata for each GDC release is split into several tables in the isb-cgc-bq project. (Older metadata is in the isb-cgc project and follows a slightly different table naming format.)
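The naming convention can be sketched in Python. This is a minimal illustration, not an official ISB-CGC utility: the function names `versioned_table`, `preview_sql` and `run_preview` are hypothetical, and the default data set is taken from the caseData_r28 example and only assumed to hold for the other release tables. Running the query additionally assumes that the `google-cloud-bigquery` package is installed and that Google Cloud credentials are configured.

```python
def versioned_table(base_name: str, release: int,
                    project: str = "isb-cgc-bq",
                    dataset: str = "GDC_case_file_metadata_versioned") -> str:
    """Return a fully qualified table ID for one GDC release.

    base_name is the table name up to and including "_r", for example
    "caseData_r"; release takes the place of the '#' placeholder.
    The default dataset comes from the documented caseData_r28 example
    and is an assumption for other tables.
    """
    if not base_name.endswith("_r"):
        raise ValueError("base_name should end with '_r', e.g. 'caseData_r'")
    if release < 0:
        raise ValueError("release must be non-negative")
    return f"{project}.{dataset}.{base_name}{release}"


def preview_sql(table_id: str, limit: int = 10) -> str:
    """Build a query that previews the first rows of a table."""
    return f"SELECT * FROM `{table_id}` LIMIT {int(limit)}"


def run_preview(table_id: str, limit: int = 10):
    """Run the preview query; needs google-cloud-bigquery and credentials."""
    from google.cloud import bigquery  # imported lazily, only needed here

    client = bigquery.Client()  # billing project comes from the environment
    return list(client.query(preview_sql(table_id, limit)).result())


if __name__ == "__main__":
    print(preview_sql(versioned_table("caseData_r", 28)))
```

Keeping the name construction separate from the query execution makes it easy to check the generated table ID before any BigQuery job is billed.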

Table | Description
caseData_r# | List of all of the cases in GDC
fileData_active_r# | List of the currently active files in GDC along with information related to those files
fileData_legacy_r# | Same as the previous table but with legacy data instead
aliquot2caseIDmap_r# | Helper table to map between identifiers at different levels of aliquot data. The intrinsic hierarchy is program > project > case > sample > portion > analyte > aliquot
slide2caseIDmap_r# | Helper table to map between identifiers at different levels of tissue slide data. The intrinsic hierarchy is program > project > case > sample > portion > slide
GDCfileID_to_GCSurl_r# | Gives the Google Cloud Storage location for each file
per_sample_file_metadata_hg19_gdc_r# or per_sample_file_metadata_hg38_gdc_r# | Provides file IDs and other metadata for samples. These tables are organized by program and are stored in the respective program data set.
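As an illustration of how these tables can be combined, the sketch below builds a query that joins fileData_active_r# to GDCfileID_to_GCSurl_r# to look up the GCS locations of one case's files. The column names `file_gdc_id`, `case_gdc_id` and `gcs_url` are assumptions and are not confirmed by this page, so check the table schemas in BigQuery and override the defaults if they differ. The function names are hypothetical, and running the query assumes the `google-cloud-bigquery` package and configured credentials.

```python
def file_location_sql(release: int,
                      dataset: str = "isb-cgc-bq.GDC_case_file_metadata_versioned",
                      file_id_col: str = "file_gdc_id",  # ASSUMED column name
                      case_id_col: str = "case_gdc_id",  # ASSUMED column name
                      url_col: str = "gcs_url") -> str:  # ASSUMED column name
    """Build a parameterized query returning file IDs and GCS URLs for a case."""
    files = f"{dataset}.fileData_active_r{release}"
    urls = f"{dataset}.GDCfileID_to_GCSurl_r{release}"
    return (
        f"SELECT f.{file_id_col}, u.{url_col}\n"
        f"FROM `{files}` AS f\n"
        f"JOIN `{urls}` AS u\n"
        f"  ON f.{file_id_col} = u.{file_id_col}\n"
        f"WHERE f.{case_id_col} = @case_id"
    )


def find_file_locations(release: int, case_id: str):
    """Run the query; needs google-cloud-bigquery and credentials."""
    from google.cloud import bigquery  # imported lazily, only needed here

    client = bigquery.Client()
    # Pass the case ID as a query parameter instead of formatting it into SQL.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("case_id", "STRING", case_id)
        ]
    )
    rows = client.query(file_location_sql(release), job_config=job_config).result()
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    print(file_location_sql(28))
```

The case identifier is supplied through the `@case_id` query parameter, so only the table and column names are interpolated into the SQL text.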

PDC file and case metadata are stored in the data sets isb-cgc-bq.PDC_metadata_versioned and isb-cgc-bq.PDC_metadata.

Table | Description
file_associated_entity_mapping_V# | List of PDC entities mapped to cases and file IDs
file_metadata_V# | Gives the AWS location and study information for each file, as well as an embargo date if one applies

For examples of querying the metadata tables, please see the ISB-CGC Community Notebook GitHub Repository.

