TARGET Data Set
About the Therapeutically Applicable Research to Generate Effective Treatments
The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program applied a comprehensive genomic approach to identify the molecular changes that drive childhood cancers. Its investigators formed a collaborative network to speed the discovery of molecular targets and to translate those findings into the clinic. TARGET is managed jointly by NCI's Office of Cancer Genomics and NCI's Cancer Therapy Evaluation Program.
About the Therapeutically Applicable Research to Generate Effective Treatments Data
ISB-CGC currently hosts several TARGET data sets in BigQuery. Controlled-access TARGET data is available to authorized users through the Genomic Data Commons (GDC). The open-access data is available in BigQuery, together with the open-access clinical and biospecimen information, and includes RNA-seq and miRNA-seq expression levels.
The original TARGET data is available in the GDC Legacy Archive, which contains over 10,000 files for over 5,000 cases. Virtually all of these files are low-level, controlled-access sequence data: 1,702 RNA-seq files and 765 miRNA-seq files, with the remainder being WXS or WGS DNA-seq BAM files. Some of this data has been reprocessed and is available on the main GDC Data Portal. This newer data set currently includes 33,402 files representing 6,197 cases and totals over 200 TB. Over half of these files are controlled-access BAM, VCF, and MAF files derived from WXS, RNA-seq, and miRNA-seq data. The remaining files are open-access and include RNA-seq and miRNA-seq quantification files, as well as clinical and biospecimen supplement files.
BigQuery Therapeutically Applicable Research to Generate Effective Treatments Data
The open-access TARGET data hosted by the ISB-CGC Platform includes:
- Clinical (de-identified) and Biospecimen data: these data were originally provided in XML files (Level-1)
- Gene (mRNA) expression data: these data were originally provided as TSV files (Level-3)
- microRNA expression data: these data were originally provided as TSV files (Level-3)
ISB-CGC consolidates the information scattered across thousands of XLSX and TSV files at the GDC into a series of BigQuery tables, which are much easier to query.
For more information on TARGET data, please refer to the site below:
Accessing the Therapeutically Applicable Research to Generate Effective Treatments Data on the Cloud
In addition to the GDC Data Portal, you can access the files from the GDC Google Cloud Storage bucket. This means that you don't need to download them to perform an analysis. ISB-CGC stores the cloud location of each file in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.
- To access these metadata tables, open the Google BigQuery console.
- Run SQL queries to find the TARGET files. Here is an example:
SELECT
  active.*,
  GCSurl.file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
  ON active.file_gdc_id = GCSurl.file_gdc_id
WHERE active.program_name = 'TARGET'
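You can also run the same lookup from a script instead of the console. The following is a minimal, hypothetical sketch and not an ISB-CGC tool. It makes several assumptions:

- The google-cloud-bigquery package is installed.
- Application default credentials are configured and can run BigQuery jobs billed to your own project.
- program_name is a column of fileData_active_current.
- The file_gdc_url values are gs:// URLs.

The names TARGET_FILES_SQL, split_gcs_url, and list_program_file_urls are illustrative only. The query text matches the console example, except that the program name is passed as a BigQuery query parameter rather than hard-coded.

```python
"""Hypothetical sketch: list TARGET files and their GCS locations via BigQuery.

Assumes google-cloud-bigquery is installed and application default
credentials are configured; these helper names are illustrative only.
"""
from typing import Iterator, Optional, Tuple

# Same join as the console example; @program is a named query parameter.
# Assumption: program_name lives in fileData_active_current.
TARGET_FILES_SQL = """
SELECT
  active.*,
  GCSurl.file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
  ON active.file_gdc_id = GCSurl.file_gdc_id
WHERE active.program_name = @program
"""


def split_gcs_url(url: str) -> Tuple[str, str]:
    """Split a gs://bucket/path/to/object URL into (bucket, object_name).

    Assumption: file_gdc_url values use the gs:// scheme.
    """
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {url!r}")
    bucket, sep, object_name = url[len(prefix):].partition("/")
    if not bucket or not sep or not object_name:
        raise ValueError(f"gs:// URL needs a bucket and an object name: {url!r}")
    return bucket, object_name


def list_program_file_urls(program: str = "TARGET",
                           project: Optional[str] = None) -> Iterator[str]:
    """Yield file_gdc_url for every active file of `program`.

    The query is billed to `project` (or the client's default project).
    The client library is imported here so split_gcs_url works without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("program", "STRING", program)]
    )
    # result() waits for the job to finish, then rows are fetched page by page.
    for row in client.query(TARGET_FILES_SQL, job_config=job_config).result():
        yield row["file_gdc_url"]
```

For example, you can iterate over list_program_file_urls("TARGET") and pass each URL to split_gcs_url. This gives you the bucket name and object name that a Cloud Storage client expects, so you can read the file in place instead of downloading it. Using a query parameter keeps the SQL text fixed, so the same function works for other program names without string concatenation.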