ISB-CGC Hosted Reference Data
To facilitate working with the TCGA data tables that the ISB-CGC hosts in BigQuery, additional reference data tables have been created. Other reference data are hosted by Google Genomics. Suggestions for more are welcome at email@example.com.
Genome Reference Data
This section covers reference data that describes or annotates the human (or other) genome(s).
Reference data hosted by the ISB-CGC in BigQuery tables are available in the
data set. For example, tables based on
gene sets such as Ensembl and GENCODE can be used to find the genomic coordinates and identifiers
for genes of interest, or to write queries that join tables of gene-symbol based data
to tables of genomic-coordinate based data or to tables that use other gene identifiers.
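The kind of join described above can be sketched in a few lines of Python. This is only an illustration of the logic, not a query against the hosted tables: the field names (`gene_symbol`, `chrom`, `start`, `end`), the gene symbols, and the coordinates are all hypothetical placeholders, and the actual table schemas should be checked in the BigQuery web UI.

```python
# Hypothetical rows standing in for a gene-annotation table (for example one
# derived from GENCODE). Field names and coordinates are placeholders.
gene_annotation = [
    {"gene_symbol": "GENE_A", "chrom": "chr1", "start": 1000, "end": 5000},
    {"gene_symbol": "GENE_B", "chrom": "chr2", "start": 20000, "end": 26000},
]

# Hypothetical rows standing in for a gene-symbol based data table.
expression = [
    {"sample": "S1", "gene_symbol": "GENE_A", "value": 7.2},
    {"sample": "S1", "gene_symbol": "GENE_C", "value": 3.1},
    {"sample": "S2", "gene_symbol": "GENE_B", "value": 5.5},
]


def annotate_with_coordinates(data_rows, annotation_rows):
    """Inner-join data rows to annotation rows on gene_symbol."""
    by_symbol = {row["gene_symbol"]: row for row in annotation_rows}
    joined = []
    for row in data_rows:
        ann = by_symbol.get(row["gene_symbol"])
        if ann is None:
            continue  # symbol missing from the annotation: dropped, as in an inner join
        joined.append(
            {**row, "chrom": ann["chrom"], "start": ann["start"], "end": ann["end"]}
        )
    return joined


annotated = annotate_with_coordinates(expression, gene_annotation)
for row in annotated:
    print(row)
```

Rows whose gene symbol has no annotation entry (here `GENE_C`) are dropped, which mirrors what an inner join in BigQuery would do.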
For additional details about any of these tables, open it in the BigQuery web UI and look at the information on the Details page. (The Details button sits between the Schema and Preview buttons, beneath the table name.)
- Gene Ontology Consortium: Tables based on GO annotations and the GO ontology.
- Kaviar: The latest hg19- and hg38-based Kaviar databases are available. Kaviar is a compilation of SNVs, indels, and complex variants observed in humans, designed to facilitate testing for the novelty and frequency of observed variants.
- liftOver_hg19_to_hg38: This table provides a mapping of each hg19 position to the corresponding position in hg38, and can be used to perform a liftOver operation in BigQuery.
- miRTarBase: The recently updated miRTarBase database (release 6.1).
- Ensembl2Reactome
- miRBase2Reactome
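As a rough sketch of how a position-mapping table such as liftOver_hg19_to_hg38 can drive a liftOver operation, the example below runs a SQL join in an in-memory SQLite database, used here purely as a local stand-in for BigQuery. The column names (`chrom`, `hg19_pos`, `hg38_pos`), the `variants_hg19` table, and every position are assumptions made for illustration; the real schema may differ, and the sketch also assumes the chromosome name is unchanged between assemblies.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: one row per mapped hg19 position.
    CREATE TABLE liftOver_hg19_to_hg38 (
        chrom TEXT, hg19_pos INTEGER, hg38_pos INTEGER
    );
    -- Hypothetical table of variants with hg19 coordinates.
    CREATE TABLE variants_hg19 (
        variant_id TEXT, chrom TEXT, pos INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO liftOver_hg19_to_hg38 VALUES (?, ?, ?)",
    [("chr1", 100, 150), ("chr1", 101, 151), ("chr2", 500, 480)],
)
conn.executemany(
    "INSERT INTO variants_hg19 VALUES (?, ?, ?)",
    [("v1", "chr1", 101), ("v2", "chr2", 500), ("v3", "chr3", 42)],
)

# LEFT JOIN keeps variants that have no mapping; their hg38_pos is NULL.
query = """
    SELECT v.variant_id, v.chrom, v.pos AS hg19_pos, m.hg38_pos
    FROM variants_hg19 AS v
    LEFT JOIN liftOver_hg19_to_hg38 AS m
      ON v.chrom = m.chrom AND v.pos = m.hg19_pos
    ORDER BY v.variant_id
"""
lifted = conn.execute(query).fetchall()
for row in lifted:
    print(row)
```

A left join is used so that positions without a mapping remain visible (with a missing hg38 position) instead of disappearing silently from the result.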
Platform Reference Data
Some reference data is necessary to work with data generated by specific platforms, such as the Illumina DNA Methylation array or the Affymetrix Genome-Wide Human SNP Array 6.0. This section provides links to existing sources of information elsewhere on the web and describes additional resources that are hosted by the ISB-CGC. If there are additional platform reference sources that you would like to see hosted in BigQuery tables, please let us know at firstname.lastname@example.org.
- DNA Methylation Platform:
  - Most of the DNA Methylation data produced by the TCGA project was obtained using the Illumina Infinium HumanMethylation450 (aka 450k) BeadChip array. Some of the earlier tumor types were assayed on the older 27k array.
  - Although additional details can be found at the Illumina webpage, we have uploaded the platform annotation information into the BigQuery table
  - Each CpG locus is uniquely identified as described in this technical note, and this unique identifier can be used to look up and cross-reference data between the TCGA DNA methylation data table and the platform annotation table.
  - The original Illumina-provided CpG coordinates have been “lifted over” from hg19 to hg38.
- Genome-Wide SNP Array:
  - The technical documentation for the Affymetrix Genome-Wide Human SNP Array 6.0 can be found here.
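The cross-referencing of CpG identifiers between the DNA methylation data and the platform annotation, mentioned in the DNA Methylation Platform notes above, might look roughly like the following. The probe identifiers, field names, coordinates, and beta values are made-up placeholders, not values taken from the hosted tables.

```python
# Hypothetical platform annotation keyed by CpG probe identifier.
# Identifiers and coordinates are made-up placeholders.
probe_annotation = {
    "cg_demo_0001": {"chrom": "chr1", "position": 12345},
    "cg_demo_0002": {"chrom": "chr7", "position": 67890},
}

# Hypothetical methylation measurements: (sample, probe identifier, beta value).
methylation = [
    ("S1", "cg_demo_0001", 0.82),
    ("S1", "cg_demo_0003", 0.10),
    ("S2", "cg_demo_0002", 0.45),
]


def locate_probes(measurements, annotation):
    """Split measurements into annotated rows and unmatched probe identifiers."""
    located, unmatched = [], set()
    for sample, probe_id, beta in measurements:
        info = annotation.get(probe_id)
        if info is None:
            unmatched.add(probe_id)  # no annotation entry for this probe
        else:
            located.append((sample, probe_id, info["chrom"], info["position"], beta))
    return located, unmatched


located, unmatched = locate_probes(methylation, probe_annotation)
print(located)
print(unmatched)
```

Collecting the unmatched identifiers separately makes it easy to spot probes that are absent from the annotation being used.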