Case and File Metadata
ISB-CGC hosts several metadata tables in Google BigQuery to help users find GDC files in Google Cloud Storage (GCS) or PDC files in Amazon Web Services (AWS) cloud storage. You can preview and query these tables from the BigQuery web UI, from scripting languages such as R and Python, or from the command line with the Cloud SDK utility bq.
For additional details about each of these tables, please use the BigQuery Table Search. To find the metadata tables, select File Metadata under Category.
In the table names below, ‘#’ stands for the GDC release number; replace it with the actual number when using a table, for example: isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r28. In the isb-cgc-bq project, the metadata for each GDC release is split across several tables, as follows. (Older metadata is in the isb-cgc project and follows a slightly different table naming format.)
Table | Description |
---|---|
caseData_r# | List of all of the cases in GDC |
fileData_active_r# | List of the currently active cases in GDC along with information related to those cases |
fileData_legacy_r# | Same as the previous table but with legacy data instead |
aliquot2caseIDmap_r# | “helper” table to map between identifiers at different levels of aliquot data. The intrinsic hierarchy is program > project > case > sample > portion > analyte > aliquot |
slide2caseIDmap_r# | “helper” table to map between identifiers at different levels of tissue slide data. The intrinsic hierarchy is program > project > case > sample > portion > slide |
GDCfileID_to_GCSurl_r# | Gives the Google Cloud Storage location for each file |
per_sample_file_metadata_hg19_gdc_r# or per_sample_file_metadata_hg38_gdc_r# | Provides file ids and other metadata for samples. Information is stored in these tables by program and these tables are in the respective program data set. |
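As a rough illustration of the naming scheme above, the following Python sketch builds a fully qualified table ID from a table name and a GDC release number, then composes a small preview query. The helper names (gdc_table_id, preview_sql, run_query) and the billing_project parameter are hypothetical and not part of ISB-CGC. run_query assumes that the google-cloud-bigquery package is installed and that you have a Google Cloud project to bill queries to.

```python
# Data set named in the example above; assumed to hold the versioned GDC tables.
GDC_DATASET = "isb-cgc-bq.GDC_case_file_metadata_versioned"


def gdc_table_id(table_name: str, release: int) -> str:
    """Replace the '#' placeholder: ('caseData', 28) -> '...caseData_r28'."""
    if release <= 0:
        raise ValueError("release must be a positive integer")
    return f"{GDC_DATASET}.{table_name}_r{release}"


def preview_sql(table_id: str, limit: int = 5) -> str:
    """Compose a query that returns the first few rows of a table."""
    return f"SELECT * FROM `{table_id}` LIMIT {int(limit)}"


def run_query(sql: str, billing_project: str):
    """Run the query and return rows as dicts (needs credentials and network)."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Prints the query text only; pass it to run_query to execute it.
    print(preview_sql(gdc_table_id("caseData", 28)))
```

The per_sample_file_metadata tables live in the data set of their respective program, so this helper does not cover them; their data set name would need to be supplied separately.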
PDC file and case metadata are stored in the data sets isb-cgc-bq.PDC_metadata_versioned and isb-cgc-bq.PDC_metadata.
Table | Description |
---|---|
file_associated_entity_mapping_V# | List of PDC entities mapped to cases and file IDs |
file_metadata_V# | Gives the AWS location for each file, study information, as well as an embargo date if it applies |
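Because the ‘#’ in the PDC table names varies by version, it can help to list which versioned tables exist before querying one. The sketch below composes such a listing query against BigQuery's INFORMATION_SCHEMA.TABLES view. The function name list_versions_sql is hypothetical, and the sketch assumes your account is allowed to read the data set's INFORMATION_SCHEMA.

```python
# Versioned PDC metadata data set named in the text above.
PDC_DATASET = "isb-cgc-bq.PDC_metadata_versioned"


def list_versions_sql(table_prefix: str) -> str:
    """Compose a query listing every table whose name starts with the prefix."""
    # Allow only identifier-like characters so the prefix is safe to inline.
    if not table_prefix.replace("_", "").isalnum():
        raise ValueError("table_prefix may contain only letters, digits, and '_'")
    return (
        "SELECT table_name\n"
        f"FROM `{PDC_DATASET}.INFORMATION_SCHEMA.TABLES`\n"
        f"WHERE STARTS_WITH(table_name, '{table_prefix}')\n"
        "ORDER BY table_name"
    )


if __name__ == "__main__":
    # Prints the query text; run it in the BigQuery web UI or with a client.
    print(list_versions_sql("file_metadata_V"))
```

STARTS_WITH is used instead of LIKE because the underscore in a table name would otherwise act as a single-character wildcard.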
For examples of querying the metadata tables, please see the ISB-CGC Community Notebook GitHub Repository.