ISB-CGC

Democratizing access to cancer data in the cloud

This documentation describes ISB-CGC features and provides guides and tips for exploring data sets hosted on the Google Cloud Platform.

_images/overview_image.png

The ISB-CGC aims to serve the needs of a broad range of cancer researchers ranging from scientists or clinicians who prefer to use an interactive web-based application to access and explore the rich TCGA, TARGET, CCLE, COSMIC and other data sets, to computational scientists who want to write their own custom scripts using languages such as R or Python, accessing the data through APIs, and to algorithm developers who wish to spin up thousands of virtual machines to analyze hundreds of terabytes of sequence data.

– the ISB-CGC team

About the ISB-CGC Platform

The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of three National Cancer Institute (NCI) Cloud Resources tasked with bringing cancer data and computation power together through cloud platforms. It is a collaboration between the Institute for Systems Biology (ISB) and General Dynamics Information Technology Inc. (GDIT). Since starting in 2014 as part of NCI’s Cloud Pilot Resource initiative, ISB-CGC has provided access to increasing amounts of cancer data in the cloud.

Exploring Cancer Data

The ISB-CGC Platform enables a wide range of users to bring their analysis tools to the data in the cloud, eliminating the need to download and store large data sets. Built with the Google Cloud Platform, it provides several entry points for exploring and analyzing cancer data:

  • The ISB-CGC Web Application allows users to interactively create and explore cohorts of interest. It includes the functionality of the Cancer Data File Browser and the Cohort Builder/Data Explorer as well as other tools.

    • The Cancer Data File Browser allows users to explore a comprehensive selection of cancer related data files in Google Cloud Storage Buckets, such as raw sequencing, cancer nucleotide variation, pathology or radiology images.

    • The Cohort Builder/Data Explorer is a web interface which builds cohorts based on clinical demographics and molecular filters. Compare patient cohorts with various exploration tools including IGV viewer, image viewers, and analytical visualization.

  • The ISB-CGC API gives users the ability to programmatically work with data such as cases, samples, cohorts, files and cloud projects.

  • The ISB-CGC BigQuery Table Search is a discovery tool that allows the user to explore and search for ISB-CGC Google BigQuery tables.

  • On the Google Cloud Platform BigQuery Console, ISB-CGC tables can be viewed and queried directly.

  • Python and R can interface with the ISB-CGC tables, retrieving and analyzing data.

  • Using Google Compute Engine virtual machines (VMs), workflows can be run to perform data analysis.

Please see the USER GUIDE section to learn more about each of these tools and to see Jupyter and R Notebook examples. See the MORE INFORMATION section for tutorials, release notes, Frequently Asked Questions and more.

_images/ToolsForISBCGC.png

Have feedback or corrections? Please email us at feedback@isb-cgc.org.

ISB-CGC Data Overview

ISB-CGC provides access to data from several research programs, such as The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), Cancer Cell Line Encyclopedia (CCLE) and Catalogue of Somatic Mutations in Cancer (COSMIC). The full list is available here.

The majority of the data made available through ISB-CGC originates from NCI Genomic Data Commons (GDC). Users can access GDC data on the cloud through ISB-CGC. Users have access to both raw and processed data from cancer patients.

NCI Proteomics Data Commons (PDC) data is also available in ISB-CGC Google BigQuery tables.

In general, almost all raw data is controlled-access and is accessible through Google Cloud Storage buckets; only those users with proper authorization can access them. The GDC has established bioinformatics workflows/pipelines executed on the raw data to generate processed data. In this way, users can directly access the processed data without having to run compute-intensive workflows themselves. However, users who wish to run their own workflows/pipelines still have access to the raw data as well.

Processed data, however, are generally open-access. ISB-CGC allows users to utilize this processed data in two ways on the platform:

  • Google Cloud Storage: Individual GDC processed data files are accessible through GDC Google Cloud Storage buckets; ISB-CGC provides pointers to these files.

  • Google BigQuery: Processed data are consolidated by data type (e.g., Clinical, DNA Methylation, RNAseq, Somatic Mutation) and transformed into ISB-CGC Google BigQuery tables for ease of access and analysis. This approach allows our users to quickly analyze information from thousands of patients in our curated BigQuery tables.

_images/DataStorageOnISBCGC.png

Google Cloud Storage

Google Cloud Storage (GCS) is a cloud-based object-store that is used to store many types of (usually binary) data, typically processed by custom software pipelines. The data hosted by GDC is contained within Google Cloud Storage. Metadata stored within ISB-CGC BigQuery tables contains pointers to file locations in this GDC data.

Google BigQuery

Google BigQuery (BQ) is a columnar database ideal for storing tabular data. Queries are executed in parallel and scaled automatically, and data is accessed through a SQL interface.

ISB-CGC stores high-level clinical, biospecimen, and molecular data from the main NCI programs in the BigQuery projects isb-cgc-bq and isb-cgc. It also stores a large amount of metadata about files that are stored in the GDC Google Cloud Storage, as well as genome reference sources (e.g., GENCODE and miRBase). Most of these data sets and tables are completely open access and available to the research community.

Terms of Use

Use of the data stored in ISB-CGC is subject to the terms of use of the source from which the data originates.

For reference and other tables, see the table description for specific information.



Quick-Start Guide

ISB-CGC provides both interactive (through a web application) and programmatic access to data hosted by institutes such as the Genomic Data Commons (GDC) of the National Cancer Institute (NCI), and the Wellcome Trust Sanger Institute, leveraging many aspects of the Google Cloud Platform. To get started, you’ll need a Google Cloud Project. Additionally, to access controlled data, you’ll also need dbGaP authorization.

_images/GettingStarted.png

Google Cloud Project Setup and Data Access

A Google Cloud Project is required to make use of all of the data, tools, and Google Cloud functionality.

Obtain a Google identity

  • Do you or your institution already have a Google identity, such as a Gmail account? If so, you can proceed to the next step.

  • If not, it only takes a minute to create a Google identity. You can even link a non-Gmail account (e.g., scientist@nih.gov) as a Google identity by this method.

Request Google Cloud Credits

  • Take advantage of a one-time $300 Google Credit.

  • If you have already used this one-time offer (or there is some other reason you cannot use it), see this information about how to request ISB-CGC Cloud Credits.

Set up a Google Cloud Project

Connect to ISB-CGC’s cancer data tables in Google BigQuery

  • To obtain access to the ISB-CGC open access project tables in BigQuery, users can link these tables to their GCP project as described here.

  • To obtain access to the ISB-CGC controlled access project tables in BigQuery, users can link these tables to their GCP project as described here.

Access open-access data

  • All individual processed data files are accessible through GDC Google Cloud Storage buckets; ISB-CGC provides pointers to these files. Examples of how to find these URLs are in this section, on each Program’s documentation page; these SQL queries can also be incorporated into notebooks or workflows.
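
As a rough illustration of the lookup described above, the following Python sketch builds the kind of SQL query shown on the Program pages and splits a resulting file URL into its bucket and object name. The two table names are taken from the Program pages of this documentation. The helper names, the `@program` query parameter, the `billing_project` argument, and the assumption that `file_gdc_url` values are `gs://` URLs are illustrative only and are not part of ISB-CGC. Running the query also assumes that the `google-cloud-bigquery` package is installed and that Google credentials are configured.

```python
from urllib.parse import urlparse

# Table names as given on the Program pages of this documentation.
FILE_TABLE = "isb-cgc-bq.GDC_case_file_metadata.fileData_active_current"
URL_TABLE = "isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current"


def build_file_url_query():
    """Return SQL joining file metadata to cloud URLs for one program.

    The program name is supplied at run time through the @program parameter.
    """
    return (
        "SELECT active.*, file_gdc_url\n"
        f"FROM `{FILE_TABLE}` AS active\n"
        f"JOIN `{URL_TABLE}` AS GCSurl\n"
        "  ON active.file_gdc_id = GCSurl.file_gdc_id\n"
        "WHERE program_name = @program"
    )


def split_gcs_url(url):
    """Split a gs://bucket/object URL into (bucket, object_name)."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def find_file_urls(program, billing_project):
    """Run the query in BigQuery and return the file URLs (needs credentials)."""
    from google.cloud import bigquery  # lazy import: only needed when querying

    client = bigquery.Client(project=billing_project)
    config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("program", "STRING", program)
        ]
    )
    rows = client.query(build_file_url_query(), job_config=config).result()
    return [row["file_gdc_url"] for row in rows]
```

For example, `find_file_urls("BEATAML1.0", "my-project-id")` would return the URLs for BEATAML1.0 files, where `my-project-id` is a placeholder for your own Google Cloud Project ID.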

Access controlled data (with proper authorization)

  • To access controlled data (primarily raw data files in the GDC Google Cloud Storage buckets), users must first be authenticated by NIH (via the ISB-CGC web-app). Upon successful authentication, user dbGaP authorization will be verified. These two steps are required before the user’s Google identity is added to the access control list (ACL) for the controlled data. At this time, this access must be renewed every 24 hours.

Getting Started with Analysis

Now you’re ready to perform analysis. ISB-CGC offers web-based interactive analysis, analysis with Google BigQuery, and analysis using APIs and VMs. Please see the next section, Getting Started with Analysis, to learn more.



Getting Started with Analysis

ISB-CGC enables researchers to analyze cloud-based cancer data through a collection of powerful web-based tools and Google Cloud technologies. Learn more about the different analytical methods ISB-CGC users employ on their research projects.

Interactive web-based Cancer Data Analysis & Exploration

Explore and analyze ISB-CGC cancer data through a suite of graphical user interfaces (GUIs) that allow you to select and filter data from one or more public data sets (such as TCGA, CCLE, and TARGET), combine these with your own uploaded data, and analyze the results using a variety of built-in visualization tools.

  • Cohort Builder/Data Explorer: Create and explore cohorts of interest

  • Interactive Pathology and Radiology Image Viewers: View images from cancer patients using integrated image viewers

  • Integrative Genomics Viewer (IGV): Explore and visualize genomic data

  • Cancer Data File Browser: Browse and identify files associated with cohorts of interest

  • Mitelman Database for Chromosome Aberrations and Gene Fusions in Cancer: Explore relationships between chromosomal changes and cancer

  • The TP53 Database: Explore TP53 variant data that have been reported in the published literature or are available in other public databases

Cancer data analysis using Google BigQuery

Processed data are consolidated by data type (e.g., Clinical, DNA Methylation, RNAseq, Somatic Mutation, Protein Expression) from sources including the Genomic Data Commons (GDC) and Proteomics Data Commons (PDC) and transformed into ISB-CGC Google BigQuery tables. This allows users to quickly analyze information from thousands of patients in curated BigQuery tables using Structured Query Language (SQL). SQL can be used from the Google BigQuery Console and can also be embedded within Python, R and complex workflows, giving users flexibility. Because BigQuery can cost-effectively scale up compute on demand (“burstability”), you can calculate statistical correlations across millions of combinations of data points within minutes, compared with days or weeks on a non-cloud-based system.

  • BigQuery Table Search User Interface: Learn more about ISB-CGC hosted BigQuery tables

  • Google BigQuery Console: Use SQL to analyze and query ISB-CGC cancer data stored in Google’s cloud-based data warehouse

  • Notebooks: Seamlessly integrate ISB-CGC tables with R and Python to conduct robust analyses
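
To make the idea of embedding SQL in Python more concrete, here is a minimal sketch. It counts files per program in the file metadata table named on the Program pages of this documentation, and it can dry-run the query first to see how many bytes would be scanned. The function names and the `billing_project` argument are hypothetical; the sketch assumes the `google-cloud-bigquery` package is installed and Google credentials are configured.

```python
def build_program_file_count_query(
    table="isb-cgc-bq.GDC_case_file_metadata.fileData_active_current",
):
    """Return SQL counting the files of each program in a metadata table."""
    return (
        "SELECT program_name, COUNT(*) AS file_count\n"
        f"FROM `{table}`\n"
        "GROUP BY program_name\n"
        "ORDER BY file_count DESC"
    )


def bytes_scanned(sql, billing_project):
    """Dry-run the query and return the number of bytes it would process."""
    from google.cloud import bigquery  # lazy import: only needed when querying

    client = bigquery.Client(project=billing_project)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed


def run_query(sql, billing_project):
    """Run the query and return its rows as a list of dictionaries."""
    from google.cloud import bigquery  # lazy import: only needed when querying

    client = bigquery.Client(project=billing_project)
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling `bytes_scanned(...)` before `run_query(...)` is one way to check the size of a query before paying for it; the same SQL string can be pasted into the Google BigQuery Console unchanged.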

Cancer data analysis using APIs & Google Cloud Virtual Machines

ISB-CGC enables the use of as many workflow technologies as possible through documentation, support, and necessary infrastructure.

  • ISB-CGC APIs: Programmatically access data and user-generated cancer patient cohort information

  • Connecting to GA4GH and Cloud Life Sciences APIs: Easily connect to APIs from ISB-CGC

  • Running workflows on ISB-CGC: Execute open-source and custom pipelines/algorithms on scalable virtual machines


How to Request Cloud Credits

ISB-CGC offers free Google Cloud credits for researchers to try out our platform! To get trial funding in an ISB-CGC funded Google Cloud Platform (GCP) project, please send your request to request-gcp@isb-cgc.org.

Note: an ISB-CGC funded GCP project is not required to work with the ISB-CGC platform.

In your request, please:

  • Describe your research goals in some detail.

  • Include information such as:

    • The type of data that you plan to use

    • The algorithms and/or methods you plan to apply

    • An estimate of the storage and computing costs you expect to incur

  • Indicate whether you have students or collaborators who will also be accessing the same cloud project. Teams working on a single project should all use the same cloud project.

  • Describe any previous experience using the Google Cloud Platform, including which specific components you have used (e.g., Compute Engine, BigQuery, Cloud Datalab).

All reasonable requests will receive an initial allocation of $300 towards storage and compute costs. We expect that this amount of funding will be more than enough for you to become familiar with the platform. If you expect that you will need additional funding to complete your planned research, this initial amount may be used to perform prototype analyses and to better estimate your total costs. At that time, you may request additional funding.

Please be aware that we will be monitoring your cloud resource usage on a daily basis and will alert you as you begin to approach your funding limit. If you exceed your allocation limit and we are not able to contact you by email for several days, we may need to take action to shut your project down which could cause you to lose work and data.



Best Practices

Don’t Download the Data

The ISB-CGC platform is one of NCI’s Cancer Cloud Resources and our mission is to host cancer data (such as TCGA and TARGET data) in the cloud so that researchers around the world may work with data without needing to download and store the data at their own local institutions.

Remember those times when you had to wait weeks to download the data? You don’t need to do that any more! The data is already on the cloud, so you can collaborate with other researchers much more easily. Be mindful that if you download data, you’ll incur egress charges; see Google’s egress charges information.

Computing on the Cloud

Most of the same Linux commands, scripts, pipelines/workflows, genomics software packages and Docker containers that you run on your local machine can be executed on virtual machines on Google Cloud.

  1. The basics and best practices on how to launch virtual machines (VMs) are described here in our documentation. NOTE: When launching VMs, please maintain the default firewall settings.

  2. Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images that you can create or import from your existing systems.

Be careful as you spin up a machine, as larger machines cost you more. If you are not using a machine, shut it down. You can always restart it easily when you need it.

Example use-case: You would like to run a Windows-only genomics software package on the TCGA data. You can create a Windows-based VM instance.

  3. More details on how to deploy Docker containers on VMs are available in Google’s documentation on deploying containers.

  4. A good way to estimate costs for running a workflow/pipeline on large data sets is to test it first on a small subset of the data.

  5. There are different VM types depending on the sort of jobs you wish to execute. By default, when you create a VM instance, it remains active until you either stop it or delete it. The costs associated with VM instances are detailed in Google’s compute pricing documentation.

  6. If you plan on running many short compute-intensive jobs (for example, indexing and sorting thousands of large BAM files), you can execute your jobs on preemptible virtual machines, which are up to 80% cheaper than regular instances (see Google’s preemptible VMs documentation).
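
The two cost tips above (test on a small subset first, and consider preemptible VMs) amount to simple arithmetic, sketched below. This is only an illustration: it assumes cost grows linearly with the number of items processed, and it uses the 80% figure quoted above as a default discount. The function names and the example numbers are hypothetical, and actual prices should always be checked against Google’s current pricing.

```python
def extrapolate_cost(pilot_cost_usd, pilot_items, total_items):
    """Scale the measured cost of a small pilot run linearly to the full job."""
    if pilot_items <= 0:
        raise ValueError("pilot_items must be positive")
    return pilot_cost_usd / pilot_items * total_items


def preemptible_cost(regular_cost_usd, discount=0.80):
    """Apply a preemptible discount (0.80 means 80% cheaper) to a regular cost."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return regular_cost_usd * (1.0 - discount)


# Hypothetical example: a pilot run on 10 BAM files cost $2.00, and the full
# job covers 1,000 files.
full_cost = extrapolate_cost(2.0, 10, 1000)   # about $200 on regular VMs
cheaper = preemptible_cost(full_cost)         # about $40 at an 80% discount
```

Preemptible instances can be stopped by Google at any time, so this estimate only makes sense for jobs that can be restarted or resumed.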

Example use-cases:

Storage on the Cloud

The Google Cloud Platform offers a number of different storage options for your virtual machine instances: disks

  1. Block Storage:

  • By default, each virtual machine instance has a single boot persistent disk that contains the operating system. The default size is 10GB but can be adjusted up to 64TB in size. (Be careful! High costs here, spend wisely!)

  • Persistent disks are restricted to the zone where your instance is located.

  • Use persistent disks if you are running analyses that require low latency and high-throughput.

  2. Object Storage: Google Cloud Storage (GCS) buckets are the most flexible and economical storage option.

  • Unlike persistent disks, Cloud Storage buckets are not restricted to the zone where your instance is located.

  • Additionally, you can read and write data to a bucket from multiple instances simultaneously.

  • You can mount a GCS bucket to your VM instance when latency is not a priority or when you need to share data easily between multiple instances or zones.

    An example use-case: You want to slice thousands of bam files and save the resulting slices to share with a collaborator who has instances in another zone to use for downstream statistical analyses.

  • You can save objects to GCS buckets including images, videos, blobs and unstructured data.

    A comparison table detailing the current pricing of Google’s storage options can be found here: storage features



Benefits of Using The Cloud

Working in the cloud is exceptionally scalable and versatile; you only use as much as you need, whether that’s in terms of storage space or processing cores. Cloud-based data is easily read by massively parallel processes, expediting results. When you’re done, resources disappear! You don’t have idle resources sitting around collecting dust.

Don’t be intimidated by the cloud! Scale your analyses using the data on ISB-CGC. If you’ve conducted bioinformatics before using the command line or SQL, this will be just as easy (if not easier) and we are also here to help. Email feedback@isb-cgc.org or visit our Community Notebooks page for guides and tutorials.

Most bioinformaticians today are likely accustomed to using the high performance compute (HPC) resources provided by their institution to conduct high-throughput bioinformatics analyses. Here’s a breakdown on how the Google Cloud Platform compares to your institution’s HPC resources.

Operating Systems

  • Your University’s HPC Resource: Linux, Windows

  • Google Cloud Platform: Virtual machines can run Linux and Windows

Compute

  • Your University’s HPC Resource: Virtual machines not determined by you

  • Google Cloud Platform: You can sign up with your own virtual machines*

Storage

  • Your University’s HPC Resource: Block Storage

    • Small storage is available in your home directory (usually around 1TB)

    • Some scratch storage that is often deleted after a certain amount of time

    • Storage is usually a shared resource

  • Google Cloud Platform: Block Storage & Object Storage

    • Each virtual machine instance has a single boot persistent disk with a default size of 10GB that can be adjusted up to 64TB*

    • For storage that needs IO, consider persistent disks

    • Google Cloud Storage (GCS) buckets are the most flexible and economical storage option

    • You can save objects to GCS buckets including images, videos, blobs, and unstructured data

Pricing

  • Your University’s HPC Resource: Depends on the institution

    • Institution provides basic HPC resources for researchers free of charge

    • PIs requiring larger-scale resources must purchase clusters and storage space

  • Google Cloud Platform: Pay as you go

    • You pay for the compute resources and storage that you use*

Do you have to wait?

  • Your University’s HPC Resource: Yes

    • Resources are shared among users

    • Scheduler systems are used to schedule jobs based on resource availability

  • Google Cloud Platform: No

    • Once you’ve set up a Google Cloud Platform account, you can spin up a virtual machine and begin computing quickly

Is the machine powerful enough?

  • Your University’s HPC Resource: Yes and no, depending on what you’re trying to do; often it’s a no

  • Google Cloud Platform: Compute resources and storage are unlimited, but you have to pay for them*

Accessing Cancer Genomics Data

  • Your University’s HPC Resource: Typically not stored on the HPC; you have to download to your local machine

  • Google Cloud Platform: Data is stored on the cloud

How to connect

  • Your University’s HPC Resource: Log in using Secure Shell protocol (SSH)

  • Google Cloud Platform: Log in using Secure Shell protocol (SSH)

*Be careful of costs. See the Cost Management page for more information.



Cost Management

This section details a few use cases and their approximate costs in order to help users estimate cloud costs for their analyses.

Estimating Costs for Common Bioinformatics and Data Analysis Tasks

The following table summarizes order-of-magnitude costs for common data analysis tasks. For example, an order-of-magnitude cost of $10 indicates that the cost can be up to $10, estimated from the given example notebooks. Estimated costs between $10 and $100 are reported as the next order of magnitude, $100:

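  • Bioinformatics / Data Analysis Task: Identify differentially expressed genes

    • Dataset(s): TCGA

    • Tools: BigQuery, Colab, Python, R

    • Approx. Cost (Max): $1

  • Bioinformatics / Data Analysis Task: (not given)

    • Dataset(s): TCGA

    • Tools: BigQuery, BigQuery ML, Colab, Python, R

    • Approx. Cost (Max): $1 ($100) *

  • Bioinformatics / Data Analysis Task: Train a linear regression model using gene expression data

    • Dataset(s): TCGA

    • Tools: BigQuery, BigQuery ML, Colab, Python

    • Approx. Cost (Max): $1 ($100) *

  • Bioinformatics / Data Analysis Task: Train a deep neural network (DNN) regression model using gene expression data

    • Dataset(s): TCGA

    • Tools: BigQuery, Colab, TensorFlow, Compute Engine w/ GPUs

    • Approx. Cost (Max): $1 **

  • Bioinformatics / Data Analysis Task: Analyze RNA-seq data using the GDC workflow

    • Dataset(s): TCGA

    • Tools: Compute Engine, Cloud Storage, CWL

    • Approx. Cost (Max): $10 ***

Notes: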

  • *BigQuery ML costs depend on data size. In these examples, a subset of data was extracted to a temporary table, which was used as input to BigQuery ML. This reduces costs substantially. If using all gene features of a TCGA dataset, costs can grow to the order of $100.

  • **With small datasets, use of GPUs in Colab does not cost extra (unless using Colab Pro). However, if TensorFlow code is executed in a VM with GPUs, the hourly cost can range from $1 to $10.

  • ***Cost per sample depends on sample size (i.e., number of reads) and processing time.

  • BigQuery ML vs. TensorFlow w/ Compute Engine or Colab GPUs: When choosing between these tools for machine learning, consider the following guidelines:

    • TensorFlow w/ Compute Engine or Colab GPUs: Appropriate for data exploration or parameter tuning requiring multiple iterations of training and evaluation.

    • BigQuery ML: Appropriate for production deployment of machine learning models. For example, after optimizing model parameters, train and deploy the final model with BigQuery ML.
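
The reporting convention described at the top of this section (an estimate is rounded up to the next power of ten, so anything between $10 and $100 is reported as $100) can be written as a small helper. This is a sketch of that rounding rule only; the function name is hypothetical and the inputs are made-up estimates, not measured ISB-CGC costs.

```python
import math


def order_of_magnitude_cost(cost_usd):
    """Round a positive cost estimate up to the nearest power of ten."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return 10 ** math.ceil(math.log10(cost_usd))


# Hypothetical estimates and how they would be reported:
reported_small = order_of_magnitude_cost(0.5)   # reported as $1
reported_exact = order_of_magnitude_cost(10)    # stays at $10
reported_mid = order_of_magnitude_cost(37)      # reported as $100
```

An estimate that is already an exact power of ten is left unchanged, which matches the statement that a reported cost of $10 means "up to $10".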



ISB-CGC Office Hours

Do you need assistance with getting started? Questions on merging your research with cancer data in the cloud? Or possibly help with troubleshooting?

We have virtual Office Hours on Tuesdays and Thursdays for any questions on ISB-CGC functionality or data that you may have. We look forward to speaking with you.

  • Tuesday, 2:00pm – 3:00pm Eastern. Host: Poojitha Gundluru. Link: http://meet.google.com/jkg-cxke-yzs

  • Thursday, 11:00am – 12:00pm Eastern. Host: Poojitha Gundluru. Link: http://meet.google.com/jai-kgkg-sii

Note: If you are unable to join either meeting link, please email feedback@isb-cgc.org.



Programs and Data Sets

The National Cancer Institute (NCI) Genomic Data Commons (GDC) and Proteomics Data Commons (PDC) provide the cancer research community with data repositories that enable data sharing across cancer genomic and proteomic studies (known as Programs) in support of precision medicine.

The ISB-CGC started with The Cancer Genome Atlas (TCGA) data sets but has expanded to include other data sets from programs such as Therapeutically Applicable Research to Generate Effective Treatments (TARGET). Along with the NCI GDC and PDC data sets, ISB-CGC hosts data sets from programs such as Catalogue Of Somatic Mutations In Cancer (COSMIC) from the Wellcome Trust Sanger Institute. We are always interested in adding new data sets, so if you have any suggestions or requests for additional data, please let us know (feedback@isb-cgc.org).

Clinical, Biospecimen and Processed -Omics Data Sets

From Genomic Data Commons

_images/omicsData.png

Between ISB-CGC and the NCI GDC, there are many cancer data sets available on the Google Cloud Platform. ISB-CGC hosts some carefully curated, high-level clinical, biospecimen and molecular data sets and tables in Google BigQuery as well as radiology and pathology images in Google Cloud Storage. The GDC hosts several more data sets that include low-level sequencing data. For more information about the GDC, see the GDC Overview.

Clinical, biospecimen and processed -omics data (such as RNASeq) are available in the GDC Cloud Storage buckets, in ISB-CGC BigQuery tables and through ISB-CGC web tools. The table below lists each Program and where (through ISB-CGC) you can find its data.

  • Within the detailed documentation on each Program (click on the Program name), there is an example of how to use the metadata stored in ISB-CGC BigQuery tables to locate the Program’s files on the GDC Google Cloud Storage buckets.

  • To learn more about using this data with ISB-CGC web tools, go to the ISB-CGC Web Interface section of this document.

  • To locate these tables in the ISB-CGC BigQuery project, use the ISB-CGC BigQuery Table Search.

Program

GDC Google Cloud Storage

ISB-CGC BigQuery Tables

ISB-CGC Cohort Builder

BEATAML

checkmark

checkmark

checkmark

CCLE

checkmark

checkmark

checkmark

CGCI

checkmark

checkmark

CMI

checkmark

checkmark

CPTAC

checkmark

checkmark

CTSP

checkmark

checkmark

Exceptional Responders

checkmark **

FM

checkmark

checkmark

checkmark

GENIE

checkmark

checkmark *

HCMI

checkmark

checkmark

MMRF

checkmark

checkmark

checkmark

MP2PRT

checkmark **

NCICCR

checkmark

checkmark *

OHSU

checkmark

checkmark *

checkmark

ORGANOID

checkmark

checkmark

REBC

checkmark

checkmark *

TARGET

checkmark

checkmark

checkmark

TCGA

checkmark

checkmark

checkmark

TCGA Pathology and Radiology images

checkmark

checkmark

checkmark

TRIO

checkmark

checkmark *

VAREPOP

checkmark

checkmark

WCDT

checkmark

checkmark

*Clinical and metadata only available

**Clinical data only available

BEATAML1.0 Data Set
About the BEATAML1.0

The BEATAML1.0 data is from several studies focused on acute myeloid leukemia (AML) and the effect of different therapies such as the drug Crenolanib. Implementing targeted therapies for AML has been challenging for two reasons: the intricate mutational patterns within and across patients, and a shortage of pharmacologic agents for most mutational events.

Crenolanib was studied because it is a potent type I pan-FLT3 (GeneID:2322) inhibitor, and FLT3 mutations are associated with poor prognosis and are commonly detected in AML patients.

About the BEATAML1.0 Data

The BEATAML1.0 data consists of over 220 files with 56 phenotyped subjects, 672 tumor specimens collected from 562 cases, and over 36 TB of data. The data is made up mainly of BAM, VCF, TXT, and TSV files. The majority of the data is whole-exome sequencing along with RNA sequencing. The Project IDs in the GDC are BEATAML1.0-CRENOLANIB and BEATAML1.0-COHORT.

For more information on the BEATAML1.0 data, please refer to these sites:

Accessing the BEATAML1.0 Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the BEATAML1.0 files. Here is an example:

-- List BEATAML1.0 files together with their cloud file locations
SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
  ON active.file_gdc_id = GCSurl.file_gdc_id
WHERE program_name = 'BEATAML1.0'

Accessing the BEATAML1.0 Data in Google BigQuery

ISB-CGC has BEATAML data, such as clinical and RNA-seq, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with BEATAML selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The BEATAML tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.BEATAML contains the latest tables for each data type.

  • Data set isb-cgc-bq.BEATAML_versioned contains previously released tables, as well as the most current table.
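
As a small illustration of the two data sets described above, the following Python sketch forms the current and versioned data set IDs for a program and lists the tables in one of them. The naming pattern (`isb-cgc-bq.<PROGRAM>` and `isb-cgc-bq.<PROGRAM>_versioned`) is taken from the text; the function names and the `billing_project` argument are hypothetical. Listing tables assumes that the `google-cloud-bigquery` package is installed and Google credentials are configured.

```python
PROJECT = "isb-cgc-bq"  # BigQuery project named in the text above


def dataset_ids(program):
    """Return the (current, versioned) data set IDs for a program."""
    return f"{PROJECT}.{program}", f"{PROJECT}.{program}_versioned"


def list_table_names(dataset_id, billing_project):
    """Return the sorted table names in a data set (needs Google credentials)."""
    from google.cloud import bigquery  # lazy import: only needed when listing

    client = bigquery.Client(project=billing_project)
    return sorted(table.table_id for table in client.list_tables(dataset_id))


current, versioned = dataset_ids("BEATAML")
```

With credentials in place, `list_table_names(current, "my-project-id")` would return the latest tables, where `my-project-id` is a placeholder for your own Google Cloud Project ID.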



CCLE Data Set
About the Cancer Cell Line Encyclopedia

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. The CCLE provides public access to analysis and visualization of DNA copy number, mRNA expression, mutation data and more, for 1000 cancer cell lines.

About the Cancer Cell Line Encyclopedia Data

The CCLE aligned reads (BAM files) are currently available in an open-access Cloud Storage bucket which you can browse here. CCLE data is also available in ISB-CGC Google BigQuery tables.

Accessing the Cancer Cell Line Encyclopedia Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the CCLE files. Here is an example:

-- List CCLE files together with their cloud file locations
SELECT legacy.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current` AS legacy
JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
  ON legacy.file_gdc_id = GCSurl.file_gdc_id
WHERE program_name = 'CCLE'

Accessing the CCLE Data in Google BigQuery

ISB-CGC has CCLE data, such as clinical, biospecimen, copy number segment, RMA Expression and somatic mutation, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with CCLE selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The CCLE tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.CCLE contains the latest tables for each data type.

  • Data set isb-cgc-bq.CCLE_versioned contains previously released tables, as well as the most current table.

Note that some of the tables in the isb-cgc-bq project were migrated from the isb-cgc project. If you were using data sets isb-cgc.ccle_201602_alpha, isb-cgc.CCLE_bioclin_v0 and isb-cgc.CCLE_hg19_data_v0, they still exist but are deprecated.



CGCI Data Set
About the Cancer Genome Characterization Initiative

The Cancer Genome Characterization Initiative is a series of studies sponsored by the Office of Cancer Genomics (OCG) at the National Cancer Institute (NCI). This program utilizes molecular characterization to uncover distinct features of rare cancers such as HIV+ associated cancers and rare pediatric cancers. The Burkitt Lymphoma Genome Sequencing Project (BLGSP) is one of the projects available through GDC. It explores genetic changes in patients with Burkitt lymphoma (BL) that could lead to better prevention, detection, and treatment of this rare and aggressive cancer.

About the Cancer Genome Characterization Initiative Data

CGCI data consists of 120 cases with RNA sequencing, miRNA sequencing, and whole-genome sequencing data. The NCI GDC houses all the clinical, biospecimen, and molecular characterization data with over 589 BAM, 339 TXT, 402 TSV, 237 BCR XML, 120 BCR PPS XML, and 93 BCR SSF XML files in around 50.28 TB of data. The Project ID in the GDC Data Portal is CGCI-BLGSP.

For more information on the CGCI data, please refer to these sites:

Accessing Cancer Genome Characterization Initiative Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the CGCI files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'CGCI'
AND active.file_gdc_id = GCSurl.file_gdc_id
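The same lookup can also be written with an explicit JOIN and a named query parameter, which makes it easier to reuse for other programs. The Python sketch below only builds the query text and the parameter values; it does not contact BigQuery. The function name `file_location_query` is hypothetical, and the table and column names are copied from the example above.

```python
FILE_LOCATION_SQL = """
SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
  ON active.file_gdc_id = GCSurl.file_gdc_id
WHERE program_name = @program
""".strip()


def file_location_query(program):
    """Return (sql, params) for looking up cloud file locations of a program.

    The program name is passed as the named parameter @program instead of
    being pasted into the SQL text.
    """
    if not program or not program.strip():
        raise ValueError("program must be a non-empty string")
    return FILE_LOCATION_SQL, {"program": program.strip()}


sql, params = file_location_query("CGCI")
print(sql)
print(params)
```

With the google-cloud-bigquery client library, the value could be supplied as `bigquery.ScalarQueryParameter("program", "STRING", "CGCI")` inside a `bigquery.QueryJobConfig(query_parameters=[...])`.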
Accessing the CGCI Data in Google BigQuery

ISB-CGC has CGCI data, such as clinical, RNA-seq and masked somatic mutations, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with CGCI selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The CGCI tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.CGCI contains the latest tables for each data type.

  • Data set isb-cgc-bq.CGCI_versioned contains previously released tables, as well as the most current table.



CMI Data Set
About the CMI

Count Me In (CMI) is a non-profit organization stewarded by four organizations: Emerson Collective, Broad Institute of MIT and Harvard, the Biden Cancer Initiative, and the Dana-Farber Cancer Institute. Count Me In works to engage patients via online platforms and share clinical and genomic data through public databases. Count Me In: The Metastatic Breast Cancer (MBC) Project is a patient-driven research initiative to accelerate metastatic breast cancer research. Count Me In: The Angiosarcoma (ASC) Project focuses on angiosarcoma, an exceedingly rare soft tissue sarcoma whose rarity makes large-scale research studies difficult to conduct. This project therefore demonstrates the feasibility of directly engaging patients to democratize research and create a large patient cohort.

About the CMI Data

The CMI data set consists of almost 5,000 files with 236 phenotyped subjects and over 16.09 TB of data. The data is made up mainly of BAM, VCF, TXT, and TSV files. The majority of the data is whole-exome sequencing along with RNA sequencing. The Project IDs in the GDC are CMI-MBC and CMI-ASC.

For more information on the CMI data, please refer to these sites:

Accessing the CMI Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the CMI files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'CMI'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the CMI Data in Google BigQuery

ISB-CGC has CMI data, such as clinical and RNA-seq, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with CMI selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The CMI tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.CMI contains the latest tables for each data type.

  • Data set isb-cgc-bq.CMI_versioned contains previously released tables, as well as the most current table.



CPTAC Data Set
About the NCI Clinical Proteomic Tumor Analysis Consortium

The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis or proteogenomics.

About the NCI Clinical Proteomic Tumor Analysis Consortium Data Set

From GDC

CPTAC data obtained through the Genomic Data Commons (GDC) consists of whole-genome sequencing, whole-exome sequencing, RNA sequencing, and miRNA sequencing. The program analyzed more than 700 cases. The GDC currently has controlled VCF, TSV, and BAM data available. The Project IDs in the GDC Data Portal are CPTAC-2 and CPTAC-3.

For more information on the CPTAC data, please refer to these sites:

From PDC

ISB-CGC also has proteomic CPTAC data, obtained from the Proteomics Data Commons (PDC) API. This includes clinical and protein expression data for breast, ovarian, colon, liver, lung, uterine and other cancers.

The NCI CPTAC has generated a tremendous amount of valuable quantitative proteomics data derived from clinical cancer specimens and makes them publicly accessible to the community. We have imported the data into Google BigQuery, where they can be queried via SQL and easily joined with data tables from TCGA using the BigQuery interface or programmatically with the BigQuery API.

Which studies are available?

  • CCRCC - Clear cell renal cell carcinoma

  • GBM - Glioblastoma multiforme

  • HNSCC - Head and neck squamous cell carcinoma

  • LUAD - Lung adenocarcinoma

  • UCEC - Uterine Corpus Endometrial Carcinoma

  • Breast cancer

  • Colon cancer

  • Ovarian cancer

Most studies have both whole proteome and phosphoproteome data. A few studies also have acetylome and glycoproteome data.

What processing of the raw data is available here?

  • Most data have been processed by the original producers and presented in publications.

  • The same raw data have been processed uniformly through the CPTAC Common Data Analysis Pipeline (CDAP).

  • We provide here the results from the CDAP sourced from the PDC API.

Important considerations:

  • All abundances are presented as log2 ratios as computed by the CDAP.

  • Abundances are comparable within each study since the same reference was used within each study.

  • However, different controls were used for different studies, and therefore extreme caution should be used when comparing abundance values between different studies.

  • Some PDC datasets are embargoed, which means that the data may be examined prior to the end of the embargo period, but no manuscripts may be published until the embargo expires. Currently, ISB-CGC does not host any embargoed data in our BQ datasets.
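Because abundances are presented as log2 ratios, a value can be converted back to a linear fold change with a power of two. The Python sketch below is a minimal illustration. It assumes each value is log2(sample / reference) for the reference used within that study; the function names are illustrative only.

```python
import math


def log2_ratio_to_fold_change(log2_ratio):
    """Convert a log2 abundance ratio to a linear fold change."""
    return 2.0 ** log2_ratio


def fold_change_to_log2_ratio(fold_change):
    """Convert a positive linear fold change back to a log2 ratio."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold_change)


# A log2 ratio of 1 is twice the reference abundance; -1 is half of it.
for value in (1.0, 0.0, -1.0):
    print(value, log2_ratio_to_fold_change(value))
```

The conversion does not make values from different studies comparable, since each study used its own controls.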

Python Jupyter Notebooks showing examples of queries of PDC CPTAC data are available at:

Accessing the NCI Clinical Proteomic Tumor Analysis Consortium Data on the Cloud

Besides accessing the GDC files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the CPTAC files. Here is an example to find CPTAC-2 GDC files:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'CPTAC' AND project_short_name = 'CPTAC-2'
AND active.file_gdc_id = GCSurl.file_gdc_id

Here is an example to find CPTAC-3 GDC files:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'CPTAC' AND project_short_name = 'CPTAC-3'
AND active.file_gdc_id = GCSurl.file_gdc_id
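Each row returned by these queries carries a `file_gdc_url` value. Assuming that value has the usual Google Cloud Storage form `gs://bucket/object` (an assumption; check a few rows first), the Python sketch below splits it into a bucket name and an object path, which is what most storage client libraries expect. The function name and the example URL are hypothetical.

```python
def split_gcs_url(url):
    """Split 'gs://bucket/path/to/object' into (bucket, object_path).

    Assumes the gs:// form; raises ValueError for anything else.
    """
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {url!r}")
    bucket, _, object_path = url[len(prefix):].partition("/")
    if not bucket or not object_path:
        raise ValueError(f"URL is missing a bucket or object path: {url!r}")
    return bucket, object_path


# Hypothetical URL, used only to show the shape of the result.
print(split_gcs_url("gs://example-bucket/path/to/file.bam"))
```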
Accessing the CPTAC Data in Google BigQuery

ISB-CGC has GDC CPTAC data, such as clinical, RNA-Seq and somatic mutation, and PDC CPTAC data, such as clinical and protein expression, stored in Google BigQuery tables.

Information about these tables can be found using the ISB-CGC BigQuery Table Search with CPTAC2 and/or CPTAC3 selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The CPTAC tables are in project isb-cgc-bq.

  • Data set isb-cgc-bq.CPTAC contains the latest tables for each data type.

  • Data set isb-cgc-bq.CPTAC_versioned contains previously released tables, as well as the most current table.

Note that some data are part of a CPTAC2 retrospective study of TCGA data. These tables are labeled as both program CPTAC2 and TCGA and can be found by filtering for either. The tables are in project isb-cgc-bq.

  • Data set isb-cgc-bq.TCGA contains the latest tables for each data type.

  • Data set isb-cgc-bq.TCGA_versioned contains previously released tables, as well as the most current table.

In addition, there are some tables with CPTAC data derived from the 2017 paper Proteogenomics connects somatic mutations to signalling in breast cancer. These are in data set isb-cgc.hg19_data_previews. They are labeled with programs CPTAC2 and TCGA and source LIT (for literature).

To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation. Here is an example of a PDC CPTAC table viewed in the Google BigQuery console: quant_acetylome_prospective_breast_BI_pdc_current



CTSP Data Set
About the Clinical Trials Sequencing Project

The Clinical Trials Sequencing Project (CTSP) is a collaboration between the National Cancer Institute (NCI) and the Division of Cancer Treatment and Diagnosis (DCTD) to promote the use of genomics in NCI-sponsored clinical trials and elucidate the molecular basis of response and resistance to the therapies studied. Breast cancer, renal cell carcinoma, and diffuse large B-cell lymphoma are the cancer types that are currently under study.

About the Clinical Trials Sequencing Project Data

NCI utilized whole-genome sequencing and/or whole-exome sequencing in conjunction with transcriptome sequencing to try to identify recurrent genetic alterations (mutations, deletions, amplifications, rearrangements) and/or gene expression signatures that would be important to the hypotheses submitted by the investigators. The samples are processed and submitted for genomic characterization using pipelines and procedures established within The Cancer Genome Atlas (TCGA) project. There are 89 controlled access BAM files along with TSV files in the GDC.

For more information on the CTSP data, please refer to these sites:

Accessing the Clinical Trials Sequencing Project Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the CTSP files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'CTSP'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the CTSP Data in Google BigQuery

ISB-CGC has CTSP data, such as clinical and RNA-seq, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with CTSP selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The CTSP tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.CTSP contains the latest tables for each data type.

  • Data set isb-cgc-bq.CTSP_versioned contains previously released tables, as well as the most current table.



Exceptional Responders Data Set
About Exceptional Responders

The Exceptional Responders Initiative is a pilot study to investigate the underlying molecular factors driving exceptional treatment responses of cancer patients to drug therapies.

About Exceptional Responders Data

Exceptional Responders has one project, EXCEPTIONAL_RESPONDERS-ER, with 84 cases spanning nine disease types and 20 primary sites. Data categories include sequencing reads, transcriptome profiling and simple nucleotide variation.

For more information on Exceptional Responders data, please refer to the site below:

Accessing the Exceptional Responders Data in Google BigQuery

ISB-CGC has Exceptional Responders data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with EXCEPTIONAL RESPONDERS selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The Exceptional Responders tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.EXC_RESPONDERS contains the latest tables for each data type.

  • Data set isb-cgc-bq.EXC_RESPONDERS_versioned contains previously released tables, as well as the most current table.



FM Data Set
About Foundation Medicine

The Foundation Medicine Adult Cancer Clinical Data Set (FM) was a study conducted by Foundation Medicine Inc. (FMI), which is a molecular information company that specializes in precision medicine. FMI has generated genomic profiles for thousands of cancer patients, which they designed to match each patient with a personalized treatment plan.

About the Foundation Medicine Data

The FM data set consists of more than 18,000 unique solid tumor samples that underwent genomic profiling on a single uniform platform as part of standard clinical care. The data set is derived from the FoundationOne® genomic profiling assay version 2, which interrogates exonic regions of 287 cancer-related genes and selected introns from 19 genes known to undergo rearrangements in human cancer. The Genomic Data Commons (GDC) currently has VCF, TSV, and MAF data available. There are more than 36,008 VCF files, 84 TSV files, and 42 MAF files available; 36,050 of them are Controlled Access and 84 are Open Access. The Project ID in the GDC Data Portal is FM-AD.

For more information on the FM data, please refer to these sites:

Accessing the Foundation Medicine Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the FM files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'FM'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the FM Data in Google BigQuery

ISB-CGC has FM data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with FM selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The FM tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.FM contains the latest tables for each data type.

  • Data set isb-cgc-bq.FM_versioned contains previously released tables, as well as the most current table.



GENIE Data Set
About the AACR Project Genomics Evidence Neoplasia Information Exchange

The AACR Project Genomics Evidence Neoplasia Information Exchange contains data generated from an international pan-cancer registry to serve as an evidence base for the entire cancer community. Genomic and baseline clinical data from more than 40,000 tumors has been made available in the GDC, following the efforts of AACR’s strategic and technical partners, Sage Bionetworks and Memorial Sloan Kettering Cancer Center.

About the AACR Project Genomics Evidence Neoplasia Information Exchange Data

The GENIE data set includes masked annotations, somatic mutations, gene level copy number scores, and transcript fusion analysis. The program analyzed more than 44,000 cases. The Genomic Data Commons (GDC) currently has MAF, TXT, and TSV controlled data available.

For more information on GENIE data, please refer to the site below:

Accessing the AACR Project Genomics Evidence Neoplasia Information Exchange Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the GENIE files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'GENIE'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the GENIE Data in Google BigQuery

ISB-CGC has GENIE data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with GENIE selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The GENIE tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.GENIE contains the latest tables for each data type.

  • Data set isb-cgc-bq.GENIE_versioned contains previously released tables, as well as the most current table.



HCMI Data Set
About the Human Cancer Models Initiative

The Human Cancer Models Initiative (HCMI) is a collaborative international consortium that is generating novel, next-generation, tumor-derived culture models annotated with genomic and clinical data. The collaborating institutions are the National Cancer Institute (NCI), Cancer Research UK (CRUK), Wellcome Sanger Institute (WSI), and foundation Hubrecht Organoid Technology (HUB). The four Cancer Model Development Centers (CMDCs), which are supported by the NCI as part of the HCMI, are Broad Institute of MIT and Harvard (BROAD), Cold Spring Harbor Laboratory (CSHL), Stanford University, and Weill Cornell Medical College.

About the Human Cancer Models Initiative Data

HCMI data consists of 23 cases with over 450 phenotyped subjects with whole-exome sequencing, RNA sequencing, and whole-genome sequencing data. The NCI GDC houses all the clinical, biospecimen, and molecular characterization data with over 460 VCF, 261 BAM, 123 TXT, 57 TSV, and 23 BCR XML files. The Project ID in the GDC Data Portal is HCMI-CMDC.

For more information on the HCMI data, please refer to these sites:

Accessing the Human Cancer Models Initiative Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the HCMI files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'HCMI'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the HCMI Data in Google BigQuery

ISB-CGC has HCMI data, such as clinical, RNA-seq and somatic mutation, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with HCMI selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The HCMI tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.HCMI contains the latest tables for each data type.

  • Data set isb-cgc-bq.HCMI_versioned contains previously released tables, as well as the most current table.



MMRF Data Set
About the Multiple Myeloma Research Foundation

The Multiple Myeloma Research Foundation (MMRF) seeks to provide resources and research for patients with multiple myeloma. MMRF began a ten-year study, led by researchers from the Dana-Farber Cancer Institute, Celgene Corp., and the University of Arkansas for Medical Sciences, to track newly diagnosed multiple myeloma patients and create a rich data set. This trial was named CoMMpass.

About the Multiple Myeloma Research Foundation Data

The CoMMpass trial is a longitudinal observational study of 1,000 newly diagnosed myeloma patients receiving various standard approved treatments. This trial aims to collect tissue samples and genetic information along with quality of life, disease, and clinical outcomes. CoMMpass data consists of 995 cases with RNA, whole-exome, and whole-genome sequencing data. The NCI GDC houses all the molecular characterization data with over 10,918 VCF, 6,577 BAM, 2,577 TXT, and 1,718 TSV files in around 206.63 TB of data. The Project ID in the GDC Data Portal is MMRF-COMMPASS.

For more information on the MMRF data, please refer to these sites:

Accessing Multiple Myeloma Research Foundation Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the MMRF files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'MMRF'
AND active.file_gdc_id = GCSurl.file_gdc_id
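Once rows like these have been fetched, a quick per-format tally helps check them against the file counts and sizes quoted for the data set. The Python sketch below works on plain dictionaries. The column names `data_format` and `file_size` are assumptions about the metadata table schema, the example rows are made up, and sizes are reported in decimal terabytes.

```python
from collections import defaultdict

BYTES_PER_TB = 10**12  # decimal terabytes; an assumption about the reported units


def summarize_by_format(rows, format_key="data_format", size_key="file_size"):
    """Return {format: (file_count, total_size_in_TB)} for query result rows.

    format_key and size_key are assumed column names; adjust them to match
    the actual schema of the metadata table.
    """
    counts = defaultdict(int)
    total_bytes = defaultdict(int)
    for row in rows:
        fmt = row[format_key]
        counts[fmt] += 1
        total_bytes[fmt] += row[size_key] or 0  # treat missing sizes as zero
    return {fmt: (counts[fmt], total_bytes[fmt] / BYTES_PER_TB) for fmt in counts}


# Hypothetical rows standing in for real query results.
example_rows = [
    {"data_format": "BAM", "file_size": 2_000_000_000_000},
    {"data_format": "BAM", "file_size": 1_000_000_000_000},
    {"data_format": "VCF", "file_size": 500_000_000_000},
]
print(summarize_by_format(example_rows))
```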
Accessing the MMRF Data in Google BigQuery

ISB-CGC has MMRF data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with MMRF selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The MMRF tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.MMRF contains the latest tables for each data type.

  • Data set isb-cgc-bq.MMRF_versioned contains previously released tables, as well as the most current table.



MP2PRT Data Set
About MP2PRT

The Molecular Profiling to Predict Response to Treatment (MP2PRT) program is part of the NCI’s Cancer Moonshot Initiative. The study, “Identification of Genetic Changes Associated with Relapse and/or Adaptive Resistance in Patients Registered as Favorable Histology Wilms Tumor on AREN03B2”, performs genomic characterization on trio cases (normal tissue, tumor tissue at time of diagnosis, tumor tissue at time of relapse) from patients who relapsed with Favorable Histology Wilms Tumor.

About MP2PRT Data

The MP2PRT data set includes one project, MP2PRT-WT, with 52 cases. Data categories include sequencing reads, transcriptome profiling, simple nucleotide variation and copy number variation.

For more information on MP2PRT data, please refer to the site below:

Accessing the MP2PRT Data in Google BigQuery

ISB-CGC has MP2PRT data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with MP2PRT selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The MP2PRT tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.MP2PRT contains the latest tables for each data type.

  • Data set isb-cgc-bq.MP2PRT_versioned contains previously released tables, as well as the most current table.



NCICCR Data Set
About the NCI Center for Cancer Research

The NCI Center for Cancer Research (NCICCR) conducted a study on the Genomic Variation in Diffuse Large B Cell Lymphomas (DLBCL) through an integrative analysis of genetic lesions in 574 DLBCLs. The study investigated genomic structural variation, genetic alteration, and their effect on the development and biology of lymphomas by using high throughput sequencing, gene expression, and methylation status.

About the NCI Center for Cancer Research Data

There were around 489 cases that were phenotyped, contributing Authorized-Access, individual-level data. The Genomic Data Commons currently has around 957 controlled access BAM files available. The Project ID in the GDC Data Portal is NCICCR-DLBCL.

For more information on the NCICCR data, please refer to these sites:

Accessing the NCI Center for Cancer Research Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the NCICCR files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'NCICCR'
AND active.file_gdc_id = GCSurl.file_gdc_id
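A query like this can return many rows, and downstream tools often only need the list of distinct cloud locations. The Python sketch below collects the unique, non-empty `file_gdc_url` values from dictionary-like rows. The function name and the example rows are hypothetical; only the column name comes from the query above.

```python
def unique_file_urls(rows, url_key="file_gdc_url"):
    """Return the sorted, de-duplicated URLs found in the given rows.

    Rows without a value for url_key are skipped.
    """
    urls = {row[url_key] for row in rows if row.get(url_key)}
    return sorted(urls)


# Hypothetical rows standing in for real query results.
example_rows = [
    {"file_gdc_id": "a", "file_gdc_url": "gs://example-bucket/a/one.bam"},
    {"file_gdc_id": "b", "file_gdc_url": "gs://example-bucket/b/two.bam"},
    {"file_gdc_id": "a", "file_gdc_url": "gs://example-bucket/a/one.bam"},
    {"file_gdc_id": "c", "file_gdc_url": None},
]
for url in unique_file_urls(example_rows):
    print(url)
```

Because these BAM files are controlled access, listing their locations does not by itself grant permission to read them.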
Accessing the NCICCR Data in Google BigQuery

ISB-CGC has NCICCR data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with NCICCR selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The NCICCR tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.NCICCR contains the latest tables for each data type.

  • Data set isb-cgc-bq.NCICCR_versioned contains previously released tables, as well as the most current table.



OHSU Data Set
About the Oregon Health & Science University

The Oregon Health & Science University (OHSU) data set contains data generated from chronic neutrophilic leukemia (CNL), atypical chronic myeloid leukemia (aCML), and unclassified myelodysplastic syndrome/myeloproliferative neoplasms (MDS/MPN-U), which are a group of rare, heterogeneous myeloid disorders.

About the Oregon Health & Science University Data

The data set consists of whole-exome and RNA sequencing on a cohort of over 100 cases of these rare hematologic malignancies. It presents the most complete survey of the genomic landscape of these diseases to date. The Project ID in the GDC Data Portal is OHSU-CNL.

For more information on the OHSU data, please refer to these sites:

Accessing the Oregon Health & Science University Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the OHSU files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'OHSU'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the OHSU Data in Google BigQuery

ISB-CGC has OHSU data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with OHSU selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The OHSU tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.OHSU contains the latest tables for each data type.

  • Data set isb-cgc-bq.OHSU_versioned contains previously released tables, as well as the most current table.



ORGANOID Data Set
About the Pancreas Cancer Organoid Profiling Program

The Pancreas Cancer Organoid Profiling Program is from the Organoid Profiling Identifies Common Responders to Chemotherapy in Pancreatic Cancer study and contains data generated from a collection of patient-derived pancreatic normal and cancer organoids.

About the Pancreas Cancer Organoid Profiling Data Set

The data set consists of 70 cases and includes whole-genome, targeted exome, and RNA sequencing data on organoids as well as matched tumor and normal tissues. This data set is a valuable resource for pancreas cancer researchers, and those looking to compare primary tissue to organoid culture. The NCI GDC houses all the clinical, biospecimen, and molecular characterization data with over 130 VCF, 298 BAM, 165 TXT, and 110 TSV files in around 21.89 TB of data. The Project ID in the GDC Data Portal is ORGANOID-PANCREATIC.

For more information on the ORGANOID data, please refer to these sites:

Accessing Pancreas Cancer Organoid Profiling Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the ORGANOID files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'ORGANOID'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the ORGANOID Data in Google BigQuery

ISB-CGC has ORGANOID data, such as clinical and RNA-seq, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with ORGANOID selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The ORGANOID tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.ORGANOID contains the latest tables for each data type.

  • Data set isb-cgc-bq.ORGANOID_versioned contains previously released tables, as well as the most current table.
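
To see which tables each data set holds, one option is BigQuery's INFORMATION_SCHEMA.TABLES view. The following minimal Python sketch only builds the SQL text, which can then be pasted into the BigQuery console. The helper name list_tables_sql is hypothetical, and the sketch assumes your account is allowed to list tables in the data set.

```python
def list_tables_sql(program, versioned=False, project="isb-cgc-bq"):
    """Build a query that lists the tables in a program's data set."""
    # Versioned data sets follow the <PROGRAM>_versioned naming shown above.
    dataset = f"{program}_versioned" if versioned else program
    return (
        "SELECT table_name\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES`\n"
        "ORDER BY table_name"
    )


print(list_tables_sql("ORGANOID"))
print(list_tables_sql("ORGANOID", versioned=True))
```
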



REBC Data Set
About REBC

REBC is a comprehensive genomic characterization of radiation-related papillary thyroid cancer in Ukraine after the 1986 Chernobyl nuclear power plant accident. This accident released radioactive contaminants into the surrounding areas in Ukraine, Belarus, and Russia, causing an increased occurrence of thyroid cancer among individuals who were children at the time of the accident or born not long afterwards.

About the REBC Data

The REBC data set includes one project, REBC-THYR, with 440 cases. Data categories include sequencing reads, transcriptome profiling, simple nucleotide variation, and copy number variation.

For more information on REBC data, please refer to the site below:

Accessing the REBC Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the REBC files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'REBC'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the REBC Data in Google BigQuery

ISB-CGC has REBC data, such as clinical and metadata, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with REBC selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The REBC tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.REBC contains the latest tables for each data type.

  • Data set isb-cgc-bq.REBC_versioned contains previously released tables, as well as the most current table.



TARGET Data Set
About the Therapeutically Applicable Research to Generate Effective Treatments

The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program applied a comprehensive genomic approach to determine the molecular changes driving childhood cancers. Investigators formed a collaborative network to facilitate the discovery of molecular targets and translate those findings into the clinic. TARGET is managed by NCI’s Office of Cancer Genomics and Cancer Therapy Evaluation Program.

About the Therapeutically Applicable Research to Generate Effective Treatments Data

The ISB-CGC currently hosts several TARGET data sets in BigQuery. Controlled-access TARGET data is available to authorized users through the Genomic Data Commons. Open-access data, which includes RNA-seq and miRNA-seq expression levels as well as clinical and biospecimen information, is available in BigQuery.

The TARGET data is available at the GDC in the legacy archive, which contains over 10,000 files for over 5,000 cases. Virtually all of this data is low-level (and controlled-access) sequence data (1,702 RNA-seq files and 765 miRNA-seq files, with the remainder being WXS or WGS DNA-seq BAMs). Some of this data has been reprocessed and is available on the main GDC Data Portal. This newer dataset so far includes 33,402 files representing 6,197 cases and totaling over 200 TB. Over half of the files are controlled-access files, including BAM, VCF, and MAF file types, based on WXS, RNA-seq, and miRNA-seq data. The remaining files are open-access files, including RNA-seq and miRNA-seq quantification, as well as clinical and biospecimen supplement files.

BigQuery Therapeutically Applicable Research to Generate Effective Treatments Data

The open-access TARGET data hosted by the ISB-CGC Platform includes:

  • Clinical (de-identified) and Biospecimen data: these data were originally provided in XML files (Level-1)

  • Gene (mRNA) expression data: these data were originally provided as TSV files (Level-3)

  • microRNA expression data: these data were originally provided as TSV files (Level-3)

The information scattered over thousands of XLSX and TSV files at the GDC is provided in a much more accessible form in a series of BigQuery tables.

For more information on TARGET data, please refer to the site below:

Accessing the Therapeutically Applicable Research to Generate Effective Treatments data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the TARGET files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'TARGET'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the TARGET Data in Google BigQuery

ISB-CGC has TARGET data, such as clinical, biospecimen, miRNA and RNA-seq, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with TARGET selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

ISB-CGC also has controlled-access TARGET VCF data in Google BigQuery tables; see here for more information.

The TARGET tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.TARGET contains the latest tables for each data type.

  • Data set isb-cgc-bq.TARGET_versioned contains previously released tables, as well as the most current table.

Note that some of the tables in the isb-cgc-bq project were migrated from the isb-cgc project. If you were using data sets isb-cgc.TARGET_bioclin_v0 and isb-cgc.TARGET_hg38_data_v0, they still exist but are deprecated.



TCGA Data Set
About The Cancer Genome Atlas

The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

The overarching goal of TCGA is to improve our ability to diagnose, treat and prevent cancer. To achieve this goal in a scientifically rigorous manner, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) used a phased-in strategy to launch TCGA. A pilot project developed and tested the research framework needed to systematically explore the entire spectrum of genomic changes involved in more than 20 types of human cancer.

This massive effort was launched in 2006. The final samples were shipped in mid-2014, and analysis of the data produced by this program continues to this day.

About The Cancer Genome Atlas Data

The ISB-CGC hosts several TCGA datasets in BigQuery and more data is available through the Genomic Data Commons (GDC).

The vast majority (over 99%) of the petabyte of TCGA data consists of low-level sequence data, currently stored as files in the GDC (see figure below). Over the course of the TCGA project, this low-level (“Level 1”) data has been processed through a set of standardized pipelines, and the resulting high-level (“Level 3”) data is what most downstream analyses use. The ISB-CGC platform aims to make these different types of data accessible to the widest possible variety of users within the cancer research community.

_images/TCGASizeandComplexity.PNG
The Cancer Genome Atlas Data Platforms

When working with any of the data types, it is important to be aware of both the platform that was used to generate the underlying raw data and the pipeline that was used to process the data. For example, over the course of the TCGA study, DNA methylation data were obtained first using the Illumina HumanMethylation27 platform and later using the HumanMethylation450 platform. Any analysis that combines data from these two platforms across a cohort of samples should take this into consideration. Another example where multiple platforms and/or pipelines were used to produce a single data type is the Level-3 gene expression data: most tumor samples were processed at UNC and the normalized gene-expression values are based on the RSEM method, while some tumor samples were processed at BCGSC and the normalized gene-expression values are based on RPKM.
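
One way to respect this in practice is to split a cohort by platform before combining measurements. The following is a minimal Python sketch, assuming each record carries a platform field. The record layout, field names, and sample values are made up for illustration and do not reflect the actual BigQuery table schemas.

```python
from collections import defaultdict


def group_by_platform(records):
    """Group sample identifiers by the platform that generated their data."""
    groups = defaultdict(list)
    for record in records:
        groups[record["platform"]].append(record["sample"])
    return dict(groups)


# Hypothetical records; real platform labels may be spelled differently.
records = [
    {"sample": "sample-A", "platform": "HumanMethylation27"},
    {"sample": "sample-B", "platform": "HumanMethylation450"},
    {"sample": "sample-C", "platform": "HumanMethylation450"},
]

groups = group_by_platform(records)
if len(groups) > 1:
    # A mixed cohort is a cue to analyze platforms separately or adjust for them.
    print("Cohort mixes platforms:", sorted(groups))
```
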

The Cancer Genome Atlas Data Levels

For each type of data, there are typically three levels of data:

  • Level 1 typically represents raw, unnormalized data

  • Level 2 typically represents an intermediate level of processing and/or normalization of the data

  • Level 3 typically represents aggregated, normalized, and/or segmented data

The results of integrative or pan-cancer analyses are sometimes referred to as “Level 4” data. More information about Data Level Classification can be found on the NCI page.

The Cancer Genome Atlas Data Types

The TCGA data set is unique in that the tumor samples were assayed using a standard set of platforms and pipelines in order to produce a comprehensive data set including:

  • DNA sequencing of tumor samples and matched-normals (typically blood samples) in order to detect somatic mutations

  • SNP array-based DNA copy-number and genotyping analysis of tumor samples and matched-normals

  • DNA methylation of tumor samples

  • messenger RNA (mRNA) expression analysis of the tumor samples to capture the gene expression profile

  • microRNA (miRNA) expression profiling of the tumor samples

In addition, protein expression for a significant fraction (~20%) of all tumor samples was obtained using RPPA (reverse phase protein array).

Open-Access The Cancer Genome Atlas Data

The open-access TCGA data includes:

  • Clinical (de-identified) and Biospecimen data: these data were originally provided in XML files (Level-1)

  • Somatic mutation data: these data were originally provided in MAF files (Level-2)

  • DNA copy-number segments: these data were originally provided as segmentation files (Level-3)

  • DNA methylation data: these data were originally provided as TSV files (Level-3)

  • Gene (mRNA) expression data: these data were originally provided as TSV files (Level-3)

  • microRNA expression data: these data were originally provided as TSV files (Level-3)

  • Protein expression data: these data were originally provided as TSV files (Level-3)

  • TCGA Annotations data: annotations were originally obtained from the TCGA Annotations Manager, and can be found on the GDC Data Portal

The information scattered over tens of thousands of XML and TSV files at the GDC is provided in a much more accessible form in a series of BigQuery tables; see the Accessing the TCGA Data in Google BigQuery section below.

For more details, please see our Community Notebook Repository for tutorials and code examples in Python and R.

Controlled-Access The Cancer Genome Atlas Data

The controlled-access TCGA data includes:

  • SNP array CEL files: these Level-1 data files were provided by the DCC and include over 22,000 files for both tumor and matched-normal samples

  • VCF files: these Level-2 data files were provided by the DCC and include over 15,000 files produced by several different centers (primarily Broad and BCGSC)

  • MAF files: these “protected” mutation files (Level-2) were provided by the DCC (note that these files were not generated uniformly for all tumor types)

  • DNA-seq BAM files: these Level-1 data files were provided by CGHub

    • over 37,000 of these files are available in Google Cloud Storage (GCS)

    • roughly 90% of these BAM files contain exome data, the remaining 10% contain whole-genome data

    • BAM index (BAI) files are also available for all BAM files

  • mRNA- and microRNA-seq BAM files: these Level-1 data files were provided by CGHub

    • over 13,000 mRNA-seq BAM files are available in GCS

    • over 16,000 miRNA-seq BAM files are available in GCS

  • mRNA-seq FASTQ files: these Level-1 data files were provided by CGHub and include over 11,000 tar files

The Cancer Genome Atlas Data Repository History

Historically, the data was obtained from two former TCGA data repositories:

  • TCGA DCC: the TCGA Data Coordinating Center which provided a Data Portal from which users could download open-access or controlled-access data. This portal provided access to all TCGA data except for the low-level sequence data.

  • CGHub: the Cancer Genomics Hub was NCI’s secure data repository for all TCGA BAM and FASTQ sequence data files.

Since June 2016, the official data repository for all TCGA and other NCI CCG data has been the NCI’s Genomic Data Commons (GDC). The original TCGA data, aligned to the hg19 human reference genome, is available from the GDC’s legacy archive, while the new “harmonized” data, realigned to hg38, is available from the GDC’s main data portal.

Accessing The Cancer Genome Atlas Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the TCGA files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'TCGA'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the TCGA Data in Google BigQuery

ISB-CGC has TCGA data, such as clinical, biospecimen, miRNA and RNA-seq, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with TCGA selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The TCGA tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.TCGA contains the latest tables for each data type.

  • Data set isb-cgc-bq.TCGA_versioned contains previously released tables, as well as the most current table.

Note that some of the tables in the isb-cgc-bq project were migrated from the isb-cgc project. If you were using data sets isb-cgc.TCGA_bioclin_v0, isb-cgc.TCGA_hg19_data_v0 and isb-cgc.TCGA_hg38_data_v0, they still exist but are deprecated.
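
If you maintain older queries, a quick check for references to the deprecated data sets can help with migration. The following minimal Python sketch uses only the deprecated data set names mentioned in this documentation. The function name is hypothetical, the table name in the sample query is invented, and the check is a plain substring search rather than a SQL parser.

```python
# Deprecated data set names as listed in the TCGA and TARGET sections.
DEPRECATED_DATASETS = (
    "isb-cgc.TCGA_bioclin_v0",
    "isb-cgc.TCGA_hg19_data_v0",
    "isb-cgc.TCGA_hg38_data_v0",
    "isb-cgc.TARGET_bioclin_v0",
    "isb-cgc.TARGET_hg38_data_v0",
)


def find_deprecated_datasets(sql):
    """Return the deprecated data set names that appear in a query string."""
    return [name for name in DEPRECATED_DATASETS if name in sql]


# "Clinical" is a made-up table name used only to exercise the check.
old_query = "SELECT * FROM `isb-cgc.TCGA_bioclin_v0.Clinical` LIMIT 10"
print(find_deprecated_datasets(old_query))
```
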



TCGA Radiology and Pathology Image Data Set

The TCGA images from The Cancer Imaging Archive (TCIA) as well as the pathology and diagnostic images previously available from the Cancer Digital Slide Archive (CDSA) are available in open-access Google Cloud Storage (GCS) buckets and can be explored through the ISB-CGC Web App.

Metadata for these files can be found in ISB-CGC Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with TCGA selected for filter PROGRAM and FILE METADATA selected for filter CATEGORY. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

Radiology Images

Over 1.4 million radiology image files in DICOM format, grouped together into over 20,000 ZIP files, are available in a GCS bucket called gs://isb-tcia-open/. Each ZIP file may contain hundreds of images or just a single image.

The BigQuery metadata table, isb-cgc-bq.TCGA.radiology_images_tcia_current, contains the full URLs to these ZIP files, e.g.:

gs://isb-tcia-open/images/TCGA-GBM/TCGA-06-5413/TCIA.image.1.3.6.1.4.1.14519.5.2.1.4591.4001.275342915307453440215680715165.zip

The metadata table also includes the patient identifier in TCGA “barcode” format, e.g. TCGA-06-5413 (which is also part of the GCS URL). Other information available in the table includes the body part examined, image modality, patient age, etc.
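
Because the bucket and patient barcode are embedded in the URL, they can be recovered from the URL alone. The following is a minimal Python sketch that assumes the path layout seen in the example above (images, then a collection name, then the patient barcode, then the ZIP file). The function name and the key names in the returned dictionary are hypothetical.

```python
from urllib.parse import urlparse


def split_radiology_url(url):
    """Split a radiology ZIP URL into bucket, collection, barcode and file name."""
    parsed = urlparse(url)
    if parsed.scheme != "gs":
        raise ValueError(f"not a GCS URL: {url}")
    # Layout assumed from the example: /images/<collection>/<barcode>/<file>
    prefix, collection, case_barcode, file_name = parsed.path.lstrip("/").split("/")
    return {
        "bucket": parsed.netloc,
        "collection": collection,
        "case_barcode": case_barcode,
        "file_name": file_name,
    }


example_url = (
    "gs://isb-tcia-open/images/TCGA-GBM/TCGA-06-5413/"
    "TCIA.image.1.3.6.1.4.1.14519.5.2.1.4591.4001."
    "275342915307453440215680715165.zip"
)
print(split_radiology_url(example_url))
```
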

Pathology Images

Note

All tissue slide images from the TCGA program are currently unavailable for viewing. (Diagnostic images will display.)

Over 30,000 TCGA tissue slide images in SVS format are also available in GCS, in the open-access bucket gs://gdc-tcga-phs000178-open/.

These files were uploaded from the GDC legacy archive.

The BigQuery metadata table, isb-cgc-bq.TCGA.slide_images_gdc_current, contains the full URLs to these SVS files, e.g.:

gs://gdc-tcga-phs000178-open/9c4b1b5c-b5cf-48f6-bf41-047ceb8c883c/TCGA-CR-7365-01A-01-TS1.811bb2b7-66e3-4694-891b-10b436ec300d.svs

The table also includes image metadata and the TCGA case and sample “barcodes”, which can be used to join this table with other TCGA clinical, biospecimen and molecular data tables.
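
The metadata table already provides these barcodes, but when only a file name is at hand they can be derived from it. The following minimal Python sketch assumes the standard TCGA barcode layout, in which the first three hyphen-separated fields identify the case and the first four identify the sample. The function name and dictionary keys are hypothetical.

```python
import posixpath


def barcodes_from_slide_url(url):
    """Derive slide, sample and case barcodes from an SVS file URL."""
    file_name = posixpath.basename(url)
    # The slide barcode is the part of the file name before the first dot.
    slide_barcode = file_name.split(".")[0]
    fields = slide_barcode.split("-")
    return {
        "slide_barcode": slide_barcode,
        "sample_barcode": "-".join(fields[:4]),
        "case_barcode": "-".join(fields[:3]),
    }


slide_url = (
    "gs://gdc-tcga-phs000178-open/9c4b1b5c-b5cf-48f6-bf41-047ceb8c883c/"
    "TCGA-CR-7365-01A-01-TS1.811bb2b7-66e3-4694-891b-10b436ec300d.svs"
)
print(barcodes_from_slide_url(slide_url))
```
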



TRIO Data Set
About TRIO

The Ukrainian National Research Center for Radiation Medicine Trio Study contains epidemiologic data of trios of parents (exposed to the radiation from the Chernobyl accident) and their unexposed offspring. The purpose of the study is to investigate the transgenerational effects following nuclear accidents to understand the consequences of parental exposure to ionizing radiation.

About the TRIO Data

The TRIO data set includes whole-genome sequencing (WGS) reads for 339 cases in the project TRIO-CRU.

For more information on TRIO data, please refer to the site below:

Accessing the TRIO Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the TRIO files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'TRIO'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the TRIO Data in Google BigQuery

ISB-CGC has TRIO data, such as clinical and metadata, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with TRIO selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The TRIO tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.TRIO contains the latest tables for each data type.

  • Data set isb-cgc-bq.TRIO_versioned contains previously released tables, as well as the most current table.



VAREPOP Data Set
About the VA Research for Precision Oncology Program

The Research for Precision Oncology Program (RePOP) is a research activity that established a cohort of Veterans who were diagnosed with cancer and had genomic analyses performed on their tumor tissue as part of the standard of care. All data relevant to a patient’s cancer and cancer care was collected under RePOP, including patient demographics, comorbidities, genomic analysis, treatments, medications, lab values, imaging studies, and outcomes. All RePOP participants provided signed or verbal informed consent and a signed HIPAA authorization to have their data stored and shared from RePOP’s Precision Oncology Program Data Repository (PODR).

About the VA Research for Precision Oncology Program Data

The VAREPOP data set consists of 7 cases with somatic mutation and targeted sequencing data. The Genomic Data Commons currently has controlled-access BAM and VCF files. The Project ID in the GDC Data Portal is VAREPOP-APOLLO.

For more information on the VAREPOP data, please refer to these sites:

Accessing the VA Research for Precision Oncology Program Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the VAREPOP files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'VAREPOP'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the VAREPOP Data in Google BigQuery

ISB-CGC has VAREPOP data, such as clinical, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with VAREPOP selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The VAREPOP tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.VAREPOP contains the latest tables for each data type.

  • Data set isb-cgc-bq.VAREPOP_versioned contains previously released tables, as well as the most current table.



WCDT Data Set
About the Genomic Characterization of Metastatic Castration Resistant Prostate Cancer

The overarching goal of the Genomic Characterization of Metastatic Castration-Resistant Prostate Cancer study is to illuminate molecular mechanisms of acquired resistance to therapeutic agents, and particularly androgen signaling inhibitors, in the treatment of metastatic castration-resistant prostate cancer (mCRPC).

About the Genomic Characterization of Metastatic Castration Resistant Prostate Cancer Data

West Coast Prostate Cancer Dream Team (WCDT) data is available from the biopsies of castration-resistant prostate cancer metastases collected during the study. The data consists of 101 cases with over 202 whole-genome sequencing files and 792 RNA sequencing files, totaling 83 TB of data. The Project ID in the GDC Data Portal is WCDT-MCRPC.

For more information on the WCDT data, please refer to these sites:

Accessing Genomic Characterization of Metastatic Castration Resistant Prostate Cancer Data on the Cloud

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage Bucket, which means that you don’t need to download them to perform analysis. ISB-CGC stores the cloud file locations in tables in the isb-cgc-bq.GDC_case_file_metadata data set in BigQuery.

  • To access these metadata files, go to the Google BigQuery console.

  • Perform SQL queries to find the WCDT files. Here is an example:

SELECT active.*, file_gdc_url
FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` as active, `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` as GCSurl
WHERE program_name = 'WCDT'
AND active.file_gdc_id = GCSurl.file_gdc_id
Accessing the WCDT Data in Google BigQuery

ISB-CGC has WCDT data, such as clinical and RNA-seq, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with WCDT selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The WCDT tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.WCDT contains the latest tables for each data type.

  • Data set isb-cgc-bq.WCDT_versioned contains previously released tables, as well as the most current table.



From Proteomics Data Commons

PDC protein expression data are available in ISB-CGC BigQuery tables. The table below lists each Program.

Program

PDC AWS Cloud Storage

ISB-CGC BigQuery Tables

ISB-CGC Cohort Builder

CBTN

checkmark

CPTAC

checkmark

Georgetown Proteomics Research Program

checkmark

checkmark

ICPC

checkmark

Quantitative Digital Maps of Tissue Biopsies

checkmark

CBTN Data Set
About the Children’s Brain Tumor Network

The Children’s Brain Tumor Network (CBTN) seeks to innovate discoveries, to pioneer new treatments and to support open science to improve the health of every child and young adult diagnosed with a brain tumor. Previously, it was named the Children’s Brain Tumor Tissue Consortium (CBTTC).

About the Children’s Brain Tumor Network Data Set

CBTN has the Pediatric Brain Cancer Pilot Study available at the Proteomics Data Commons (PDC). ISB-CGC has procured this data through the PDC API.

Accessing the CBTN Data in Google BigQuery

ISB-CGC has CBTN data, such as clinical and protein expression, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with CBTTC selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The CBTTC tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.CBTTC contains the latest tables for each data type.

  • Data set isb-cgc-bq.CBTTC_versioned contains previously released tables, as well as the most current table.



Georgetown Proteomics Research Program Data Set
About the Georgetown Proteomics Research Program

The Georgetown Proteomics Research Program is part of the Georgetown Lombardi Comprehensive Cancer Center at Georgetown University.

About the Georgetown Proteomics Research Program Data Set

The Georgetown Lung Cancer Proteomics Study is available at the Proteomics Data Commons (PDC). ISB-CGC has procured data from this study through the PDC API.

Accessing the Georgetown Proteomics Research Program Data in Google BigQuery

ISB-CGC has Georgetown Proteomics Research Program (GPRP) clinical data stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with GEORGETOWN PROTEOMICS selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The GPRP tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.GPRP contains the latest tables for each data type.

  • Data set isb-cgc-bq.GPRP_versioned contains previously released tables, as well as the most current table.



ICPC Data Set
About the International Cancer Proteogenome Consortium

The International Cancer Proteogenome Consortium (ICPC) is a voluntary scientific organization that provides a forum for collaboration among some of the world’s leading cancer and proteogenomic research centers. Launched in late 2016, the ICPC includes researchers from over a dozen countries sharing data and results of proteogenomic analysis.

About the International Cancer Proteogenome Consortium Data Set

ICPC has several studies available at the Proteomics Data Commons (PDC). ISB-CGC has procured data for the following studies through the PDC API:

  • Proteogenomics of Gastric Cancer

  • HBV-Related Hepatocellular Carcinoma

  • Oral Squamous Cell Carcinoma Study

  • Academia Sinica LUAD100

Accessing the ICPC Data in Google BigQuery

ISB-CGC has ICPC clinical and protein expression data stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with ICPC selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The ICPC tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.ICPC contains the latest tables for each data type.

  • Data set isb-cgc-bq.ICPC_versioned contains previously released tables, as well as the most current table.



Quantitative Digital Maps of Tissue Biopsies Data Set
About the Quantitative Digital Maps of Tissue Biopsies Program

The program and its data are described in the paper Rapid mass spectrometric conversion of tissue biopsy samples into permanent quantitative digital proteome maps.

About the Quantitative Digital Maps of Tissue Biopsies Data Set

The PCT SWATH Kidney Study is available at the Proteomics Data Commons (PDC). ISB-CGC has procured data from this study through the PDC API.

Accessing the Quantitative Digital Maps of Tissue Biopsies Data in Google BigQuery

ISB-CGC has Quantitative Digital Maps of Tissue Biopsies clinical data stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with QUANT MAPS TISSUE BIOPSIES selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.Quant_Maps_Tissue_Biopsies contains the latest tables for each data type.

  • Data set isb-cgc-bq.Quant_Maps_Tissue_Biopsies_versioned contains previously released tables, as well as the most current table.



From Other Sources

Program | GDC Google Cloud Storage | ISB-CGC BigQuery Tables | ISB-CGC Cohort Builder
COSMIC | No, the COSMIC database is maintained by the Wellcome Sanger Institute, UK | Yes, COSMIC data is in BigQuery for registered users. Learn more about how to gain access to the COSMIC data here |
Pan-Cancer Atlas | | ✓ |
HTAN | | ✓ |

COSMIC Data Set
About the Catalogue Of Somatic Mutations In Cancer

The Catalogue Of Somatic Mutations In Cancer (COSMIC) is the world’s largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. The COSMIC tables in BigQuery are produced in collaboration with the Wellcome Trust Sanger Institute to provide a new way to explore and understand the mutations driving cancer.

About the Catalogue Of Somatic Mutations In Cancer Data

The BigQuery data sets contain all of the CSV and TSV files available for download from the COSMIC Download page for versions 85-92 except version 88. ISB-CGC will not be hosting any version of COSMIC data past version 92 due to licensing costs.

After registering for access, you can explore the tables in BigQuery.

Note: project isb-cgc contains versions 85-91 (except 88), and project isb-cgc-bq contains version 92.

Accessing the Catalogue Of Somatic Mutations In Cancer Data

To access the BigQuery tables, you will need to link your Google identity with a COSMIC account.

  • New COSMIC User: Register for a new COSMIC account. During registration, fill in the ‘Google ID’ field with your base* Google Identity.

A COSMIC account and academic use of the data are free, though commercial use of the COSMIC data is subject to licensing fees. Please review the COSMIC terms for more information.

  • Registered COSMIC User: After logging in, navigate to the Account Settings page and fill in the ‘Google ID’ field with your base* Google Identity.

Once you have linked your Google identity to a COSMIC account, ISB-CGC will obtain your Google Identity. After a short delay, you will have “viewer” access to the COSMIC tables in BigQuery. You will then be able to view the data sets in the BigQuery UI under the isb-cgc-bq Google Cloud project and query the tables with your own Google Cloud Project.

We also have tutorials on using the COSMIC data sets with BigQuery in our Community Notebook Repository that you can check out.

If you are new to using ISB-CGC Google BigQuery data sets, see the Quickstart Guide to learn how to obtain a Google identity and how to set up a Google Cloud Project. Additionally, we offer free cloud credits for cancer research; you can find out more here.

If you can’t successfully run a query or see the COSMIC tables under the isb-cgc-bq project, please verify that the Google ID you have provided is a valid Google account. If you are still unable to run a query or view the data sets under the isb-cgc-bq Google Cloud Project, please contact us at feedback@isb-cgc.org.

* e.g. the base account tb@mylab.org might have a longer-form alias like thomas.brown@mylab.org



Pan-Cancer Atlas BigQuery Data

The Pan-Cancer Atlas BigQuery data set was produced in collaboration with the TCGA research network, the GDC, and the NCI. This rich data set allows for an integrated examination of the full set of tumors characterized in the robust TCGA dataset and provides a new way to explore and analyze the processes driving cancer.

The availability of Pan-Cancer Atlas data in BigQuery enables easy integration of this resource with other public data sets in BigQuery, including other open-access datasets made available by the ISB-CGC (see this and that for more details on other publicly accessible BigQuery data sets).

About Pan-Cancer Atlas Data

The Pan-Cancer Atlas BigQuery tables (accessed here) mirror most of the files shared by the Pan-Cancer Atlas initiative on the GDC PanCanAtlas Publications page.

The tables are generally unmodified uploads of the files in the GDC Pan-Cancer Atlas. The Filtered_* tables were annotated as appropriate with ParticipantBarcode, SampleBarcode, AliquotBarcode, SampleTypeLetterCode, SampleType and TCGA Study. Subsequently the tables were filtered using the Pan-Cancer Atlas whitelist (which is the list of TCGA barcodes included in the Pan-Cancer Atlas). Two exceptions are the (public) MC3 MAF file and the TCGA-CDR resource.

Use of the tables starting with Filtered_* is recommended.

See examples of statistical Jupyter notebooks using the Pan-Cancer Atlas data here.

Adding the Pan-Cancer Atlas tables to your workspace

If you are new to using ISB-CGC Google BigQuery data sets, see the Quickstart Guide to learn how to obtain a Google identity and how to set up a Google Cloud Project.

To add public BigQuery data sets and tables to your “view” in the Google BigQuery Console, you need to know the name of the GCP project that owns the data set(s). The publicly accessible ISB-CGC data sets are in project isb-cgc-bq, which includes the Pan-Cancer Atlas data set (data set name: pancancer_atlas). To add them, pin that project as described under Linking to ISB-CGC BigQuery tables. (Note that these tables also exist in project isb-cgc, although ISB-CGC is migrating current data to project isb-cgc-bq. If you are using the pancancer_atlas tables in isb-cgc, they are still available to you.)

You should now be able to see and explore all of the Pan-Cancer Atlas tables, as well as the tables of other ISB-CGC data sets. Clicking on the blue triangle next to a data set name will open it and show the list of tables in the data set. Clicking on a table name will open information about the table in the main panel, where you can view the Schema, Details, or a Preview of the table.

Additional projects with public BigQuery data sets which you may want to explore (repeating the same process will add these to your BigQuery side-panel) include genomics-public-data and google.com:biggene.

You can also search for and learn about Pan-Cancer Atlas tables through the ISB-CGC BigQuery Table Search UI. Type ‘pancancer’ in the Search box in the upper right-hand corner to filter for them.

Pan-Cancer Atlas BigQuery Query Example

Ready to query? Follow the steps below to run a query in the Google BigQuery Console. More details are here.

Let’s query using the MC3 somatic mutation table.

  • Click on the COMPOSE NEW QUERY button.

  • Paste the sample query below into the text-box.

  • Within a second or two you should see a green circle with a checkmark below the lower right corner of the New Query text-box. If you see a red circle with an exclamation mark instead, click on it to see what the syntax error is.

  • Once you do have the green circle, you can click on it to see a message like: “Valid: This query will process 76.3 MB when run.”

  • To execute the query, click on RUN!

WITH
mutCounts AS (
  SELECT
     COUNT(DISTINCT( Tumor_SampleBarcode )) AS CaseCount,
     Hugo_Symbol,
     HGVSc
  FROM
     `isb-cgc-bq.pancancer_atlas.Filtered_MC3_MAF_V5_one_per_tumor_sample`
  GROUP BY
     Hugo_Symbol,
     HGVSc
),
mutRatios AS (
  SELECT
     HGVSc,
     Hugo_Symbol,
     CaseCount,
     (CaseCount/SUM(CaseCount) OVER (PARTITION BY Hugo_Symbol)) AS ratio
  FROM
     mutCounts
)
SELECT  *
FROM
   mutRatios
WHERE
   CaseCount>=10
   AND ratio>=0.2
   AND HGVSc is not null
ORDER BY
   ratio DESC
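To make the logic of this query concrete, the following Python sketch reproduces its steps on a handful of made-up rows: it counts distinct tumor samples per (gene, HGVSc) pair, divides each count by the per-gene total (the role played by SUM(CaseCount) OVER (PARTITION BY Hugo_Symbol)), and then applies the final filters. The function name recurrent_variants, the sample rows, and the gene names GENE_A and GENE_B are illustrative assumptions, not data from the Pan-Cancer Atlas tables.

```python
from collections import defaultdict


def recurrent_variants(rows, min_cases=10, min_ratio=0.2):
    """rows: iterable of (tumor_sample_barcode, hugo_symbol, hgvsc) tuples."""
    # mutCounts: number of distinct tumor samples per (gene, variant) pair.
    samples = defaultdict(set)
    for sample, gene, hgvsc in rows:
        samples[(gene, hgvsc)].add(sample)
    case_count = {key: len(found) for key, found in samples.items()}

    # mutRatios: per-gene total of CaseCount, the denominator of the ratio.
    gene_total = defaultdict(int)
    for (gene, _hgvsc), count in case_count.items():
        gene_total[gene] += count

    # Final SELECT: apply the thresholds, drop rows with no HGVSc value,
    # and order by ratio, largest first.
    result = []
    for (gene, hgvsc), count in case_count.items():
        ratio = count / gene_total[gene]
        if count >= min_cases and ratio >= min_ratio and hgvsc is not None:
            result.append({"Hugo_Symbol": gene, "HGVSc": hgvsc,
                           "CaseCount": count, "ratio": ratio})
    return sorted(result, key=lambda row: row["ratio"], reverse=True)


# Hypothetical toy input; the repeated S3 row is counted only once.
toy_rows = [
    ("S1", "GENE_A", "c.1A>T"),
    ("S2", "GENE_A", "c.1A>T"),
    ("S3", "GENE_A", "c.1A>T"),
    ("S3", "GENE_A", "c.1A>T"),
    ("S4", "GENE_A", "c.2C>G"),
    ("S5", "GENE_B", None),
]
hits = recurrent_variants(toy_rows, min_cases=2, min_ratio=0.2)
print(hits)
```

With the lowered min_cases threshold used here, only the GENE_A variant seen in three of that gene's four counted cases survives the filters. The SQL query applies the same idea to the full MC3 table with a threshold of 10 cases.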


HTAN Data Set
About HTAN

The Human Tumor Atlas Network (HTAN) is focused on transitions in cancer. Funded by the National Cancer Institute (NCI) Cancer Moonshot initiative, its mission is to construct three-dimensional atlases of the dynamic cellular, morphological, and molecular features of human cancers as they evolve from precancerous lesions to advanced disease (Cell April 2020). Many HTAN studies focus on single cell and multiplex imaging modalities.

About the HTAN Data

HTAN data encompasses at least 11 atlases and 17 primary tumor sites. Data (Release 2) was extracted from the HTAN Data Portal via Synapse (https://humantumoratlas.org/data-download).

To explore HTAN data, please see the HTAN Data Portal.

Accessing the HTAN Data in Google BigQuery

ISB-CGC has HTAN data, such as single cell RNA-Seq, clinical, biospecimen, and metadata, stored in Google BigQuery tables. Information about these tables can be found using the ISB-CGC BigQuery Table Search with HTAN selected for filter PROGRAM. To learn more about this tool, see the ISB-CGC BigQuery Table Search documentation.

The HTAN tables are in project isb-cgc-bq. To learn more about how to view and query tables in the Google BigQuery console, see the ISB-CGC BigQuery Tables documentation.

  • Data set isb-cgc-bq.HTAN contains the latest tables for each data type.

  • Data set isb-cgc-bq.HTAN_versioned contains previously released tables, as well as the most current table.



Reference Data Sets

ISB-CGC hosts reference tables in BigQuery with information that describes or annotates human or other genomes, or is necessary to work with data generated by specific platforms.

Reference Data

ISB-CGC Hosted Reference Data

To facilitate working with the TCGA and other program data tables that the ISB-CGC is hosting in BigQuery, additional reference data tables have been created. Others are hosted by Google Cloud Life Sciences. Suggestions for more are welcome at feedback@isb-cgc.org.

For additional details about each of these tables, please use the BigQuery Table Search. To find the reference tables, select Genomic Reference Database under Category.

Genome Reference Data

Reference data that describes or annotates the human or other genomes is described in this section. Reference data hosted by the ISB-CGC in BigQuery tables is available in the isb-cgc.genome_reference data set. For example, tables based on gene sets such as Ensembl and GENCODE can be used to find the genomic coordinates and identifiers for genes of interest, or to perform queries that join tables with gene-symbol based data to tables with genomic-coordinate based data or tables that use other gene identifiers.

Program/Source

Description

ClinVar

  • ClinVar contains reports of the relationships among human variations and phenotypes.

  • GRCh37

  • GRCh38

Cytoband/UCSC

  • Cytoband to Genomic Coordinate Conversion

  • liftOver_hg19_to_hg38 - This table provides a mapping of each hg19 position to the corresponding position in hg38, and can be used to perform a liftOver operation in BigQuery.

dbSNP

  • dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

  • B150 GRCH37P13

  • B151 GRCH37P13

Ensembl

  • GRCh37: Release 75, the final build of the Ensembl gene-set mapped to GRCh37

  • GRCh38: Release 87, the most recent Ensembl gene-set mapped to GRCh38

GENCODE

  • GRCh37: Release 19, the final build of the GENCODE gene-set mapped to GRCh37

  • GRCh38: Releases 22, 23, and 24 from GENCODE are all available (because the TCGA data has been reprocessed by at least one center using each of these three different releases)

Gene Ontology Consortium

  • Tables based on GO annotations and the GO ontology.

Genome-Wide SNP Array

  • The technical documentation for the Affymetrix Genome-Wide Human SNP Array 6.0 array can be found here.

gnomAD

  • gnomAD aggregates and harmonizes both exome and genome sequencing data from a wide variety of large-scale sequencing projects.

  • GRCH37

ICD

Infinium

  • Infinium EPIC HG19 and HG38 Manifests

  • Infinium HM27 HG19 and HG38 Manifests

  • Infinium HM450 HG19 and HG38 Manifests

ISB-CGC

  • Gene Names Mapping: Data was loaded from multiple sources, including NCBI, HGNC, and Ensembl, in February 2018 to simplify mapping between HGNC IDs, HGNC symbols, Entrez Gene IDs, Ensembl Gene IDs, PubMed IDs, and RefSeq IDs.

Kaviar

  • The latest hg19- and hg38-based Kaviar databases are available. Kaviar is a compilation of SNVs, indels, and complex variants observed in humans, designed to facilitate testing for the novelty and frequency of observed variants.

miRBase

  • GRCh37: The human portion of version 20 of the miRBase database; including genomic coordinates for human microRNAs.

  • GRCh38: The human portion of version 21 of the miRBase database; including genomic coordinates for human microRNAs.

  • GRCh38: The human portion of version 22 of the miRBase database; including genomic coordinates for human microRNAs.

miRTarBase

Reactome

  • Ensembl2Reactome

  • miRBase2Reactome

UniProtKB

  • UniProtKB is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation.

  • UniProtKB Mapping

Platform Reference Data

Some reference data is necessary to work with data generated by specific platforms such as the Illumina DNA Methylation array. The platform_reference data set contains information on the Illumina DNA Methylation Platform.

Program/Source

Description

GDC

  • HG38 DNA Methylation - Most of the DNA Methylation data produced by the TCGA project was obtained using the Illumina Infinium HumanMethylation450 (aka 450k) BeadChip array. Some of the earlier tumor types were assayed on the older, 27k array.

Infinium

  • Illumina DNA Methylation Annotation - Platform annotation information has been uploaded into BigQuery; each CpG locus is uniquely identified as described in this technical note and this unique identifier can be used to look up and cross-reference data between the TCGA DNA methylation data table and the platform annotation table.

Cytoband/UCSC

  • DNA Methylation Annotation Liftover to HG38 Coordinates - The original Illumina-provided CpG coordinates have been “lifted over” from hg19 to hg38.

Genotype Tissue Expression (GTEx) Project Data

The GTEx_v7 data set contains tables with molecular and clinical data (gene read, gene expression, sample attributes, subject phenotype) loaded from the Genotype-Tissue Expression (GTEx) Project Data Portal in November 2017. See the GTEx Portal for more information.

University of California Santa Cruz (UCSC) TOIL RNA-seq recompute project Data

The Toil_recompute data set contains data made available by the UCSC TOIL RNA-seq recompute project. The goal of the project was to process ~20,000 RNA-seq samples to create a consistent meta-analysis of four datasets free of computational batch effects. This is best used to compare TCGA cohorts to TARGET or GTEx cohorts. For more details, see the Xena Browser Data Pages.

Other Reference Data Sources

Google Cloud Life Sciences maintains a list of publicly available data sets, including Reference Genomes, the Illumina Platinum Genomes, information about the Tute Genomics Annotation table, etc.



File Metadata Data Sets

ISB-CGC hosts metadata tables in BigQuery with information that points to the raw and processed cancer data in the NCI GDC Google Cloud Storage buckets.

Case and File Metadata

The ISB-CGC hosts several metadata tables in Google BigQuery to help users find GDC files in Google Cloud Storage (GCS) or PDC files in Amazon Web Services (AWS) cloud storage. Preview and query these tables from the BigQuery web UI, from scripting languages such as R and Python, or from the command line using the Cloud SDK utility bq.

For additional details about each of these tables, please use the BigQuery Table Search. To find the metadata tables, select File Metadata under Category.

Below, the ‘#’ represents the GDC release number and should be replaced by it when using the tables, for example: isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r28. The metadata is split up into several tables per GDC release as follows in the isb-cgc-bq project. (Older metadata is in the isb-cgc project and follows a slightly different table naming format.)

Table | Description
caseData_r# | List of all of the cases in GDC
fileData_active_r# | List of the currently active cases in GDC along with information related to those cases
fileData_legacy_r# | Same as the previous table but with legacy data instead
aliquot2caseIDmap_r# | “helper” table to map between identifiers at different levels of aliquot data. The intrinsic hierarchy is program > project > case > sample > portion > analyte > aliquot
slide2caseIDmap_r# | “helper” table to map between identifiers at different levels of tissue slide data. The intrinsic hierarchy is program > project > case > sample > portion > slide
GDCfileID_to_GCSurl_r# | Gives the Google Cloud Storage location for each file
per_sample_file_metadata_hg19_gdc_r# or per_sample_file_metadata_hg38_gdc_r# | Provides file ids and other metadata for samples. Information is stored in these tables by program and these tables are in the respective program data set.
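The ‘#’ substitution described above can be scripted so that a release number is filled in consistently. The sketch below is a minimal illustration: the helper name versioned_metadata_table is hypothetical, and the default data set is taken from the single example given in the text (isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r28). Whether a given table and release combination actually exists should be checked in the BigQuery Table Search, and the per_sample_file_metadata tables need a program data set passed in explicitly.

```python
def versioned_metadata_table(template, release,
                             dataset="isb-cgc-bq.GDC_case_file_metadata_versioned"):
    """Fill the '#' placeholder in a table-name template with a release number.

    `dataset` defaults to the data set used in the documentation's example;
    pass a different one for tables that live elsewhere.
    """
    if "#" not in template:
        raise ValueError(f"no '#' placeholder in {template!r}")
    table_name = template.replace("#", str(int(release)))
    return f"{dataset}.{table_name}"


print(versioned_metadata_table("caseData_r#", 28))
# prints: isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r28
```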

PDC file and case metadata are stored in data sets isb-cgc-bq.PDC_metadata_versioned and isb-cgc-bq.PDC_metadata.

Table | Description
file_associated_entity_mapping_V# | List of PDC entities mapped to cases and file IDs
file_metadata_V# | Gives the AWS location for each file, study information, as well as an embargo date if it applies

For examples of querying the metadata tables, please see the ISB-CGC Community Notebook GitHub Repository.



ISB-CGC BigQuery Tables

Google BigQuery (BQ) is a massively-parallel analytics engine ideal for working with tabular data. Leveraging the power of BigQuery, we have made the information scattered over tens of thousands of XML and tabular data files in legacy and active archives at the NCI GDC and PDC much more accessible in the form of open-access BigQuery tables.

The interactive BigQuery Table Search UI (https://isb-cgc.appspot.com/bq_meta_search/) makes it easy to explore and learn more about the ISB-CGC hosted BigQuery tables. Users can find tables of interest based on program, category, reference genome build, data type and free-form text search.

Using SQL in the Google BigQuery Console, in Jupyter notebooks or in R, users with Google Cloud Platform (GCP) projects can analyze patient, biospecimen, and molecular data for many cancer programs such as TCGA, TARGET, CCLE, and GTEx from ISB-CGC’s BigQuery tables.

Note that dbGaP authorization is not required to access most tables.

Additional Support

For more information about Google BigQuery, see the following Google support pages:

BigQuery on Google Cloud Platform

In order to use BigQuery, you must have access to a Google Cloud Platform (GCP) project. Your GCP project must be associated with a billing account in order to gain full access to all of the products and services that make up the Google Cloud. Contact us at request-gcp@isb-cgc.org for more information on how to request cloud credits.

Additionally, you will need a Google account identity (freely available with a new account or by linking to an existing email account).

When first logging into the Google Cloud Platform, you will be presented with this page:

_images/NewSignIntoGCP.png

You will then be presented with the sign-in page, which prompts you to enter a Google account login and password:

_images/SignInPage.png

Once you sign in, click on Console at the top of the screen (see arrow in image below) to access a full range of Google cloud products and services including BigQuery.

_images/AfterSignInPage.png

At the home button, scroll down to open BigQuery.

_images/AccessingBigQuery.png


ISB-CGC BigQuery Projects

ISB-CGC has two open-access Google BigQuery projects. To quickly access the ISB-CGC tables from your project on the Google BigQuery Console, you’ll need to link to these projects. This process, known as “pinning a project”, is described here.

  • isb-cgc - This project has been in use since ISB-CGC’s inception.

  • isb-cgc-bq - This is a new project as of July 2020. It will hold all new ISB-CGC tables, and many of the tables in the isb-cgc project will be migrated here over time.

_images/ISBCGC-BQ-projects.png

isb-cgc project

The isb-cgc project contains all of the ISB-CGC BigQuery tables created before July 2020.

Tables in isb-cgc will be retired and labeled as deprecated as we copy them over to the new project. Table descriptions will include the new table location. Eventually the retired tables will be converted to views (with no preview ability) so that existing references continue to work correctly. Many older tables with light usage may remain in isb-cgc and not be copied over; tables with no logged recent usage may be deleted. When using the BigQuery Table Search UI to find these retired tables, select Status of Deprecated.

Many tables will continue to have the status of Current, at least for the time being, until they are copied to the new project. In addition, there are tables with the status of Archived in the isb-cgc project and more may become archived. Archived indicates that the table contains an older version of data; a newer version of the same data exists in another table.

isb-cgc-bq project

The isb-cgc-bq project contains all new ISB-CGC BigQuery tables created after July 1, 2020 as well as tables that have been migrated from project isb-cgc. It features a more intuitive data set and table organization, as well as consistent table naming both within and across cancer research programs.

This new project is a work in progress. Existing tables will be migrated from the isb-cgc project gradually rather than all at once. See the Migration to Project isb-cgc-bq Release Notes to find out which tables have been migrated so far.

All new tables will be created in this project.

isb-cgc-bq Data Set and Table Organization

Each Program has two data sets, one containing the most current data that ISB-CGC has, and one containing versioned tables, which serves as an archive of previously released tables.

As new data releases occur, the data in the “_current” tables will be replaced with this new data. If you want the most up-to-date data, use these tables in your queries. However, if you want to ensure that your queries create a reproducible result, use a table from the “_versioned” data set. The most current data is also in this data set; however, the name of the table will end with the release number or year and not “current”.

See below for more details.

Data Set Name | Data Set Contents | Table Name Format | Table Status
<Program> | Latest tables for each data type (ex. miRNA Expression, File Metadata) that ISB-CGC has, per Program | Data Type, Reference Genome, Source, Current. Ex. TARGET.miRNAseq_hg38_gdc_current | When using the BigQuery Table Search UI to find these tables, select Status of Current.
<Program>_versioned | Previously released tables, as well as the most current table | Data Type, Reference Genome, Source, Release Number or Year. Ex. TARGET_versioned.miRNAseq_hg38_gdc_r22. Here, the name of the most current table will end with the release number or year and not “current”. | Previously released tables have status of Archived. The most current table has the status of Current.
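The naming convention above lends itself to a small helper that builds a fully qualified table ID for either the current or the versioned data set. This is only a sketch of the pattern as described here: the function name table_id is hypothetical, and tables that do not follow the convention exactly (or that do not exist for a given release) will not be caught by it. The reference genome is optional because some table names in this documentation, such as clinical_gdc_current, omit it.

```python
def table_id(program, data_type, source, genome=None, release=None,
             project="isb-cgc-bq"):
    """Build <project>.<data set>.<table> following the convention above.

    With release=None the "_current" table in the <Program> data set is
    returned; otherwise the table in <Program>_versioned ending in `release`.
    """
    suffix = "current" if release is None else str(release)
    dataset = program if release is None else f"{program}_versioned"
    # Skip the genome part when a table name has none (e.g. clinical tables).
    parts = [data_type, genome, source, suffix]
    table = "_".join(part for part in parts if part)
    return f"{project}.{dataset}.{table}"


print(table_id("TARGET", "miRNAseq", "gdc", genome="hg38"))
print(table_id("TARGET", "miRNAseq", "gdc", genome="hg38", release="r22"))
print(table_id("TCGA", "clinical", "gdc", release="r24"))
```

Pinning the release argument in analysis code is one way to follow the advice above about reproducible results, since the versioned table does not change when new data is released.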

See below for a snapshot of the isb-cgc-bq data set and table organization in the Google BigQuery Console.

_images/ISBCGC-BQ-tables.png


Linking to ISB-CGC BigQuery tables

Follow the images below to link the ISB-CGC BigQuery tables in projects isb-cgc and isb-cgc-bq to your Google Cloud Project. Click on image to zoom in.

When you access BigQuery from your Google Cloud Platform Console, you will see an “Add Data” box with a “Pin a Project” option.

_images/AddDataBox.png

When you click on “Pin a Project”, you will be presented with a pop-up box that allows you to either enter a project name or select one from a list. Choose “Enter a Project Name”, enter “isb-cgc”, and then click “Pin”.

Note

If you have Editor Tabs enabled, the “Pin a Project” options are a little different. When you click on “Add Data”, select “Pin a Project” and then “Enter Project Name” from the menu. Then enter the project name in the pop-up box and click “Pin”.

_images/PinAProject.png

You will now see the isb-cgc open access BigQuery tables on the left-hand side pinned to your project. Repeat these steps for “isb-cgc-bq”.

Note

If the data sets and tables within the project don’t display immediately, refresh your screen until they do. They may take a couple of minutes to appear.

_images/PinnedProject.png


BigQuery SQL Examples

_images/BigQuery-menuItem.png

You can write SQL queries to retrieve data from ISB-CGC BigQuery tables directly in the Google BigQuery console. To get to the console from within the Google Cloud Platform, click the Navigation menu in the upper left-hand corner. Expand PRODUCTS and find BigQuery in the BIG DATA section. (If you pin BigQuery, BigQuery will also display in the upper part of the navigation menu, making it easier to find next time.)

These instructions from Google will tell you more about using the BigQuery console.

Query versus Preview

Here is a simple query which retrieves all columns in a table.

SELECT *
FROM `isb-cgc-bq.TCGA_versioned.clinical_gdc_r24`
LIMIT 1000

You can use the “Preview” feature in the BigQuery web UI, at no cost, instead of doing a SELECT * which will do a full table scan! See the picture below.

_images/BQ-console-tablePreview.png
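If you work from a script or notebook rather than the console, a dry run is one way to check how much data a query would read before running it. A dry run asks BigQuery to validate the query and report the bytes it would process without executing it, and keep in mind that adding LIMIT generally does not reduce the bytes scanned by a SELECT *. The sketch below assumes the third-party google-cloud-bigquery package is installed and that credentials are already configured. The billing project name my-billing-project is a hypothetical placeholder, and human_size is an illustrative helper that uses decimal units, which may not match the console's rounding.

```python
SELECT_ALL = """
SELECT *
FROM `isb-cgc-bq.TCGA_versioned.clinical_gdc_r24`
LIMIT 1000
"""


def human_size(num_bytes):
    """Format a byte count using decimal units (1 MB = 1,000,000 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


def estimate_bytes(sql, billing_project):
    """Dry-run a query and return the number of bytes it would process."""
    # Imported here so the rest of the sketch runs without the package.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # returns without running
    return job.total_bytes_processed


# Example call (needs credentials and your own project in place of the
# hypothetical "my-billing-project"):
# print(human_size(estimate_bytes(SELECT_ALL, "my-billing-project")))
```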

Simple Query Examples

Let’s start with a few simple examples to get some practice using BigQuery. You can copy and paste any of the SQL queries on this page into the BigQuery web console at https://console.cloud.google.com/bigquery.

1. How many mutations have been observed in KRAS?

SELECT COUNT(DISTINCT(sample_barcode_tumor)) AS numSamples
FROM `isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r10`
WHERE Hugo_Symbol="KRAS"

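Strictly speaking, this query counts the distinct tumor samples that carry at least one KRAS mutation, not the individual mutation calls. The screenshot below shows the query in the “Query Editor” box, with the results underneath. Click on the “RUN QUERY” button to run the query. Notice the green checkmark indicating that the SQL query syntax is valid.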

_images/SimpleSQLExample1.png

2. What other information is available about these KRAS mutant tumors?

In addition to answering the question above, this next query also illustrates usage of the WITH construct to create an intermediate table on the fly, and then use it in a follow-up SELECT:

WITH temp1 AS (
   SELECT
     project_short_name,
     sample_barcode_tumor,
     Hugo_Symbol,
     Variant_Classification,
     Variant_Type,
     SIFT,
     PolyPhen
   FROM  `isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r10`
   WHERE Hugo_Symbol="KRAS"
   GROUP BY
     project_short_name,
     sample_barcode_tumor,
     Hugo_Symbol,
     Variant_Classification,
     Variant_Type,
     SIFT,
     PolyPhen )
SELECT
   COUNT(*) AS num,
   Hugo_Symbol,
   Variant_Classification,
   Variant_Type,
   SIFT,
   PolyPhen
FROM temp1
GROUP BY
   Hugo_Symbol,
   Variant_Classification,
   Variant_Type,
   SIFT,
   PolyPhen
ORDER BY num DESC

_images/SimpleSQLExample2.png

3. What are the most frequently observed mutations and how often do they occur?

WITH temp1 AS (
   SELECT
     sample_barcode_tumor,
     Hugo_Symbol,
     Variant_Classification,
     Variant_Type,
     SIFT,
     PolyPhen
   FROM `isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r10`
   GROUP BY
     sample_barcode_tumor,
     Hugo_Symbol,
     Variant_Classification,
     Variant_Type,
     SIFT,
     PolyPhen)
SELECT
  COUNT(*) AS num,
  Hugo_Symbol,
  Variant_Classification,
  Variant_Type,
  SIFT,
  PolyPhen
FROM temp1
GROUP BY
  Hugo_Symbol,
  Variant_Classification,
  Variant_Type,
  SIFT,
  PolyPhen
ORDER BY num DESC

_images/SQLSimpleExample3.png

Querying from more than one table (Joining)

Q: For bladder cancer patients who have mutations in the CDKN2A (cyclin-dependent kinase inhibitor 2A) gene, what types of mutations are they, what is their gender, vital status, and days to death - and for three downstream genes (MDM2 (MDM2 proto-oncogene), TP53 (tumor protein p53), CDKN1A (cyclin-dependent kinase inhibitor 1A)), what are the gene expression levels for each patient?

This question was chosen as an interesting example because the p53/Rb pathway is commonly involved in bladder cancer (see TCGA Network paper “Comprehensive Molecular Characterization of Urothelial Bladder Carcinoma”, Figure 4).

This is a complex question that requires information from three tables (somatic mutations, clinical data, and gene expression). We will build up this complex query in three steps.

Step 1

Find the patients with bladder cancer who have mutations in the CDKN2A gene, and display the patient ID and the type of mutation.

SELECT
  mutation.case_barcode,
  mutation.Variant_Type
FROM
  `isb-cgc-bq.TCGA_versioned.somatic_mutation_hg19_DCC_2017_02` AS mutation
WHERE
  mutation.Hugo_Symbol = 'CDKN2A'
  AND project_short_name = 'TCGA-BLCA'
GROUP BY
  mutation.case_barcode,
  mutation.Variant_Type
ORDER BY
  mutation.case_barcode

_images/BigQueryExample1.png

We now have the list of patients who have a mutation in the CDKN2A gene and the type of mutation.

Notice that we have given the “isb-cgc-bq.TCGA_versioned.somatic_mutation_hg19_DCC_2017_02” table the alias “mutation” using the AS keyword. Aliases make complex queries easier to read and compose.

Step 2

Bring in the patient data from the ISB-CGC TCGA Clinical table so that we can see each patient’s gender, vital status and days to death.

SELECT
  case_list.case_barcode AS case_barcode,
  case_list.Variant_Type AS Variant_Type,
  clinical.demo__gender,
  clinical.demo__vital_status,
  clinical.demo__days_to_death
FROM
  /* this will get the unique list of cases having the CDKN2A gene mutation in bladder cancer (BLCA) cases */
  ( SELECT
    mutation.case_barcode,
    mutation.Variant_Type
  FROM
    `isb-cgc-bq.TCGA_versioned.somatic_mutation_hg19_DCC_2017_02` AS mutation
  WHERE
    mutation.Hugo_Symbol = 'CDKN2A'
    AND project_short_name = 'TCGA-BLCA'
  GROUP BY
    mutation.case_barcode,
    mutation.Variant_Type
  ORDER BY
    mutation.case_barcode
    ) AS case_list /* end case_list */
JOIN
  `isb-cgc-bq.TCGA.clinical_gdc_current` AS clinical
ON
  case_list.case_barcode = clinical.submitter_id

_images/BigQueryExample2.png

We now have combined information from two tables through a join (inner join by default). The same information is stored in the case_barcode field in the mutations table and in the submitter_id in the clinical table, which enables us to join on them.
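The effect of the inner join can be tried out offline on toy data. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery, with two tiny made-up tables that borrow only the column names used above. The case identifiers and values are hypothetical. Rows appear in the result only when a case_barcode in the mutation list has a matching submitter_id in the clinical table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mutation (case_barcode TEXT, Variant_Type TEXT);
CREATE TABLE clinical (submitter_id TEXT, demo__gender TEXT);

-- Hypothetical rows: CASE-03 has no clinical record,
-- and CASE-04 has no mutation record.
INSERT INTO mutation VALUES
  ('CASE-01', 'SNP'), ('CASE-02', 'DEL'), ('CASE-03', 'SNP');
INSERT INTO clinical VALUES
  ('CASE-01', 'male'), ('CASE-02', 'female'), ('CASE-04', 'female');
""")

joined = conn.execute("""
SELECT
  case_list.case_barcode,
  case_list.Variant_Type,
  clinical.demo__gender
FROM mutation AS case_list
JOIN clinical
  ON case_list.case_barcode = clinical.submitter_id
ORDER BY case_list.case_barcode
""").fetchall()

print(joined)
# prints: [('CASE-01', 'SNP', 'male'), ('CASE-02', 'DEL', 'female')]
```

CASE-03 and CASE-04 are dropped because an inner join keeps only rows with a match on both sides. A LEFT JOIN would instead keep CASE-03 with an empty gender value.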

Step 3

Show the gene expression levels for the genes of interest (the three downstream genes named in the question plus CCNE1, which the query below also includes), and order them by case ID (case_barcode) and gene name (HGNC_gene_symbol).

SELECT
  genex.case_barcode AS case_barcode,
  genex.sample_barcode AS sample_barcode,
  genex.aliquot_barcode AS aliquot_barcode,
  genex.HGNC_gene_symbol AS HGNC_gene_symbol,
  clinical_info.Variant_Type AS Variant_Type,
  genex.gene_id AS gene_id,
  genex.normalized_count AS normalized_count,
  genex.project_short_name AS project_short_name,
  clinical_info.demo__gender AS gender,
  clinical_info.demo__vital_status AS vital_status,
  clinical_info.demo__days_to_death AS days_to_death
FROM ( /* This will get the clinical information for the cases*/
  SELECT
    case_list.Variant_Type AS Variant_Type,
    case_list.case_barcode AS case_barcode,
    clinical.demo__gender,
    clinical.demo__vital_status,
    clinical.demo__days_to_death
  FROM
    /* this will get the unique list of cases having the CDKN2A gene mutation in bladder cancer BLCA cases*/
    (SELECT
      mutation.case_barcode,
      mutation.Variant_Type
    FROM
      isb-cgc-bq.TCGA_versioned.somatic_mutation_hg19_DCC_2017_02 AS mutation
    WHERE
      mutation.Hugo_Symbol = 'CDKN2A'
      AND project_short_name = 'TCGA-BLCA'
    GROUP BY
      mutation.case_barcode,
      mutation.Variant_Type
    ORDER BY
      mutation.case_barcode
      ) AS case_list /* end case_list */
  INNER JOIN
    isb-cgc-bq.TCGA.clinical_gdc_current AS clinical
  ON
    case_list.case_barcode = clinical.submitter_id /* end clinical annotation */ ) AS clinical_info
INNER JOIN
  isb-cgc-bq.TCGA_versioned.RNAseq_hg19_gdc_2017_02 AS genex
ON
  genex.case_barcode = clinical_info.case_barcode
WHERE
  genex.HGNC_gene_symbol IN ('MDM2', 'TP53', 'CDKN1A','CCNE1')
ORDER BY
  case_barcode,
  HGNC_gene_symbol
_images/BigQueryExample3.png

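We now have all the data together in one table for further analysis. Note that the final join wraps the previous join: the Step 2 query appears as a subquery in the FROM clause, with the new SELECT list above it and the new JOIN below it. This is a common method of performing table joins.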
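Deeply nested subqueries can become hard to read. BigQuery Standard SQL also supports WITH clauses (common table expressions), which let each step be named and listed in order. The sketch below is not part of the original walkthrough: it uses Python's built-in sqlite3 module with made-up toy tables to suggest that the nested form and the WITH form return the same rows.

```python
import sqlite3

# Toy stand-ins for the three tables used in Step 3; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mutation (case_barcode TEXT, Variant_Type TEXT);
    CREATE TABLE clinical (submitter_id TEXT, demo__vital_status TEXT);
    CREATE TABLE genex (case_barcode TEXT, HGNC_gene_symbol TEXT, normalized_count REAL);
    INSERT INTO mutation VALUES ('CASE-A', 'SNP'), ('CASE-B', 'DEL');
    INSERT INTO clinical VALUES ('CASE-A', 'Alive'), ('CASE-B', 'Dead');
    INSERT INTO genex VALUES
        ('CASE-A', 'TP53', 10.0), ('CASE-A', 'MDM2', 20.0),
        ('CASE-B', 'TP53', 30.0), ('CASE-C', 'TP53', 40.0);
""")

# Nested form: each earlier step is wrapped inside the next one.
NESTED = """
    SELECT genex.case_barcode AS case_barcode,
           genex.HGNC_gene_symbol AS HGNC_gene_symbol,
           genex.normalized_count AS normalized_count,
           clinical_info.vital_status AS vital_status
    FROM (
        SELECT case_list.case_barcode AS case_barcode,
               clinical.demo__vital_status AS vital_status
        FROM (SELECT case_barcode FROM mutation GROUP BY case_barcode) AS case_list
        INNER JOIN clinical ON case_list.case_barcode = clinical.submitter_id
    ) AS clinical_info
    INNER JOIN genex ON genex.case_barcode = clinical_info.case_barcode
    ORDER BY genex.case_barcode, genex.HGNC_gene_symbol
"""

# WITH form: the same steps, named and listed top to bottom.
WITH_CTE = """
    WITH case_list AS (
        SELECT case_barcode FROM mutation GROUP BY case_barcode
    ),
    clinical_info AS (
        SELECT case_list.case_barcode AS case_barcode,
               clinical.demo__vital_status AS vital_status
        FROM case_list
        INNER JOIN clinical ON case_list.case_barcode = clinical.submitter_id
    )
    SELECT genex.case_barcode AS case_barcode,
           genex.HGNC_gene_symbol AS HGNC_gene_symbol,
           genex.normalized_count AS normalized_count,
           clinical_info.vital_status AS vital_status
    FROM clinical_info
    INNER JOIN genex ON genex.case_barcode = clinical_info.case_barcode
    ORDER BY genex.case_barcode, genex.HGNC_gene_symbol
"""

nested_rows = conn.execute(NESTED).fetchall()
cte_rows = conn.execute(WITH_CTE).fetchall()
print(nested_rows == cte_rows)
```

Which form to use is largely a readability choice; the nested form shown in the steps above remains valid.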

Saving Query Results

You can download the results from a query in either CSV or JSON format, or save it for further analysis into a Google BigQuery table; see the options under SAVE RESULTS.

_images/SaveResultsButton.png

Large queries that combine multiple tables may be limited by cost and resources, and a query that becomes too complex can take too long to run. Saving results as intermediate tables addresses both issues: later queries read the smaller saved table, so the same result can be obtained more quickly and at a lower cost, and other users can view and use the saved tables.
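Intermediate tables can also be written programmatically. The following is a minimal sketch, assuming the google-cloud-bigquery Python package is installed and that you have write access to a dataset in your own GCP project; the helper names and the project, dataset and table names in the usage note are hypothetical.

```python
def destination_table_id(project: str, dataset: str, table: str) -> str:
    """Build a fully qualified table ID of the form project.dataset.table."""
    parts = (project, dataset, table)
    if not all(parts):
        raise ValueError("project, dataset and table must all be non-empty")
    return ".".join(parts)


def save_query_to_table(sql: str, table_id: str, billing_project: str) -> None:
    """Run sql and write its result to table_id, replacing any existing table."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    job_config = bigquery.QueryJobConfig(
        destination=table_id,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    # result() blocks until the query job has finished writing the table.
    client.query(sql, job_config=job_config).result()
```

For example, `save_query_to_table(step2_sql, destination_table_id("my-project", "scratch", "cdkn2a_cases"), "my-project")` would store the Step 2 result so that Step 3 can join against the saved table instead of repeating the subquery. Stored tables incur storage costs in your own project.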

SQL Functions

Standard SQL includes a large variety of built-in functions and operators including logical and statistical aggregate functions, and mathematical functions, just to name a few. User-defined functions (UDFs) are also supported and can be used to further extend the types of analyses possible in BigQuery. ISB-CGC offers a set of UDFs that implement commonly used statistical tests and methods in cancer research and bioinformatics. Please refer to this page for information on how to use the ISB-CGC UDFs.

Composing Queries Using the bq Command Line Tool

The bq command line tool is part of the Google Cloud SDK and can be used to interact directly with BigQuery from the command line. The Cloud SDK is easy to install and is available for most operating systems. It can be used to create and upload your own tables into BigQuery (if you have your own GCP project) as well as to run queries at the command line like this:

 bq query --use_legacy_sql=false \
'SELECT COUNT(DISTINCT(sample_barcode_tumor)) AS numSamples
  FROM `isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r10`
  WHERE Hugo_Symbol="KRAS"'

Using BigQuery from R

There are a number of resources online as well as through ISB-CGC that demonstrate how to access BigQuery from R:

  • BigQuery can be accessed from R using one of two powerful R packages; please refer to the documentation provided with these packages for more information:

  • If you have a GCP project, you can use R with BigQuery through the Google AI Platform. Please refer to the Google documentation for more detail.

  • Explore our Community Notebook Repository for examples on how to access BigQuery from R.

Using BigQuery from Python
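
The query from the bq command line example above can also be issued from Python. This is a minimal sketch, assuming the google-cloud-bigquery package is installed and that application default credentials are configured for a GCP project that can be billed for the query; the function name and the `billing_project` argument are illustrative, not part of ISB-CGC.

```python
# Same query as the bq example, with the gene passed as a named query parameter.
MUTATED_SAMPLES_SQL = """
SELECT COUNT(DISTINCT sample_barcode_tumor) AS numSamples
FROM `isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r10`
WHERE Hugo_Symbol = @gene
"""


def count_mutated_samples(gene: str, billing_project: str) -> int:
    """Return the number of distinct tumor samples with a mutation in gene."""
    # Imported lazily so this module can be loaded without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("gene", "STRING", gene)]
    )
    rows = client.query(MUTATED_SAMPLES_SQL, job_config=job_config).result()
    return next(iter(rows))["numSamples"]
```

Calling `count_mutated_samples("KRAS", "my-project")` (with a hypothetical project ID) would run the query and bill the scanned bytes to that project. Using a query parameter rather than string formatting avoids quoting problems when the gene symbol comes from user input.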

Getting Help

Aside from the documentation, the best place to look for help using BigQuery and for tips and tricks with SQL is Stack Overflow. If you tag your question with google-bigquery, it will quickly get the attention of Google BigQuery experts.


Have feedback or corrections? Please email us at feedback@isb-cgc.org.

User Defined Functions

BigQuery now supports User Defined Functions (UDFs) in SQL and JavaScript that extend BigQuery for more specialized computations and that can be reused in notebooks and queries. To facilitate the analysis of cancer data, ISB-CGC offers a set of UDFs that implement commonly used statistical tests and methods in cancer research and bioinformatics. The UDFs are located in the isb-cgc-bq.functions data set, and the source code of the functions and examples of how to use them can be found in our Community Notebook GitHub Repository. The following table lists all the functions available in ISB-CGC.

UDF                          Description                                          Notebooks

kmeans                       K-means method for clustering data                   Python
p_fisherexact                p value of the Fisher exact test
mannwhitneyu                 Mann–Whitney U test
kruskal_walis                Kruskal-Wallis test                                  Python
significance_level_ttest2    Significance level of the two-sided t-test           Python
complement_chisquare_cdf     One minus the CDF of the chi-square distribution
jstat_normal_cdf             CDF of the normal distribution



Usage Costs

There are two basic types of usage costs associated with BigQuery: query costs and storage costs.

Query Costs

The main costs associated with using BigQuery are the query costs. In BigQuery, queries are billed according to how much data is scanned during the course of the query, and the rate is $5 per TB, although the first 1 TB is free each month. Queries can be more expensive as they become more computationally intensive.

While most query costs are surprisingly low, it is always important to think carefully about your queries and to make them as efficient as possible. For example, if you want to derive summary information about all ~20,000 genes, you could do that with a single query that might cost a few pennies, or you might write a less-clever query that returns information only about a single gene and then programmatically loop over all genes, running that single-gene query 20,000 times. With this less-clever approach, your overall query costs would be several hundred dollars instead of a few pennies. It would also take significantly more time.
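The arithmetic behind this comparison can be sketched in a few lines of Python. The $5 per TB rate is the one quoted above; the 5 GB scanned per query is a made-up figure for illustration, and the sketch ignores the monthly free tier. It also assumes that a query filtered to a single gene scans as many bytes as the all-genes query, which is typical because BigQuery bills for the columns read rather than the rows returned (partitioned or clustered tables can behave differently).

```python
PRICE_PER_TB_USD = 5.0   # on-demand rate quoted in this section
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabytes


def query_cost_usd(bytes_scanned: int, n_queries: int = 1) -> float:
    """Cost of running a query n_queries times, ignoring the free tier."""
    return bytes_scanned * n_queries / BYTES_PER_TB * PRICE_PER_TB_USD


BYTES_PER_QUERY = 5 * 10 ** 9  # hypothetical: each query scans 5 GB

single_pass = query_cost_usd(BYTES_PER_QUERY)               # one query, all genes
looped = query_cost_usd(BYTES_PER_QUERY, n_queries=20_000)  # one query per gene

print(f"single query: ${single_pass:.3f}")
print(f"20,000 queries: ${looped:.2f}")
```

Under these assumptions the single query costs a few cents while the loop costs hundreds of dollars, which matches the orders of magnitude described above.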

Storage Costs

You may want to upload your own data to BigQuery or to store results of your queries as new BigQuery tables. In BigQuery, storage costs are based on the amount of data stored. For example, ISB-CGC is hosting PanCancer Atlas tables in BigQuery and is paying for the storage costs (with support from NCI). The size of each PanCancer Atlas table is less than 1.5 GB and therefore costs less than $0.25 per year to store.



ISB-CGC Web Interface (Web App)

The ISB-CGC Web Interface (Web App) provides robust functionality for the user to analyze the ISB-CGC cancer data through a user interface. Without needing to use any programming, you can select and filter data from one or more public data sets (such as TCGA, CCLE, TARGET and BEATAML1.0), combine with your own uploaded data and analyze using a variety of built-in plot types. There is also a built-in Integrative Genomics Viewer (IGV) and Radiology Viewer.

Over time we will be updating and enhancing this web interface based on your feedback. We welcome your ideas and needs. Please use this link to provide them.

Login to Web App

The ISB-CGC Web App is accessed through a Google Account identity (freely available with a new account or by linking to an existing email account). If you are not logged into the ISB-CGC Web App, you will be presented with this page:

_images/startscreen-nologin.png

You log in through the “Sign In” link in the upper right.

Also on this page are links to:

  • ISB-CGC BigQuery Table Search

  • Cancer Data File Browser

  • Chromosomal Aberrations & Gene Fusions (Mitelman) database

  • The TP53 Database

  • Cohort Builder/Data Explorer

  • Pipelines and APIs

  • Notebooks

  • Controlled Access Data

  • Documentation

  • ISB-CGC Publications and Citations

If your screen looks like this:

_images/IfYourScreenLooksLikeThis.png

You have successfully logged into the ISB-CGC Web App! Please subscribe for updates provided by ISB-CGC.


Dashboard

Upon signing in with a Google account identity, you will be presented with the following page:

_images/My-Dashboard.png

This is your personal “Dashboard” where your Analyses, Gene and miRNA Lists, Variable Lists, Cohorts, and Saved Programs are readily accessible. Descriptions of how to use each component of this user interface are provided in the individual subsections of this user guide.

Multiple Sample Analyses can be grouped into Workbooks (and saved for later use, editing, and sharing). Workbooks are used to group together multiple related analyses, and can be used for sharing groups of analysis results with specific groups of people. For example, you may use one Workbook for an on-going study of gene mutations and pathways involved in Head and Neck Cancer (with one research group you are part of), and use a different Workbook for another on-going study with a different set of collaborators in which you are investigating survival-time after diagnosis for patients with different types of lung cancers. Think of workbooks as containers in which you can create and group related analyses, and which you can share with specific colleagues.

Breadcrumbs show you where you are in the Web App as you move from one section to another (figure below). These are live links, and can be used to rapidly navigate from one section of the interface to another.

_images/Breadcrumbs.png


Workbooks

Workbooks store the analyses you create, and their related data. You can create multiple analyses in a workbook and store them on separate worksheets within the workbook. The worksheets you create to conduct analysis are based on the source data selected (i.e. Genes and miRNAs, Variables and Cohorts). Workbooks can be used to:

  • Group together multiple related analyses.

  • Share analysis results with specific groups of people.

For example, you can create a Workbook (e.g., Disease A) for identifying gene mutations and pathways involved in Head and Neck Cancer, and share it with research Group A. You could then create another Workbook (e.g., Disease B) with a different group of researchers (Group B) investigating the average time from diagnosis to death for different lung cancers. Think of workbooks as virtual “Excel spreadsheets”. Various related analyses can be created in individual worksheets (“Tabs” within the spreadsheet) and grouped together in one workbook (the overall spreadsheet).

Create a New Workbook

On Your Dashboard, there is a Saved Workbooks panel. This panel displays any previously created, saved workbooks. If you do not have any saved workbooks you will see “Workbooks store the analyses you create, and their related data.” text in the panel. To create a new workbook, click on the Create a New Workbook link.

Selecting Create a New Workbook from the WORKBOOKS menu dropdown also displays a screen where you can create a new workbook.

_images/CreateWorkbook.png

Follow these steps to create a workbook:

  1. From the Workbook creation panel, select an Analysis Type (i.e., Bar Chart, Histogram, Scatter Plot, Violin Plot, Cubby Hole Plot, SeqPeek, OncoPrint or OncoGrid).

    Analysis Type Description

    • Bar Chart - This chart is used to plot a single categorical feature for one or more cohorts. It generates vertical bars to represent the type of data being used. The X axis shows categorical information being used while the other axis (Y axis) displays categorical data chosen in the edit analysis settings.

    • Histogram - This chart is used to plot a single numerical feature for one or more cohorts. It generates vertical bars to represent the type of data being used. The X axis shows numerical information being used while the other axis (Y axis) displays numerical data chosen in the edit analysis settings.

    • Scatter Plot - This chart is used to plot two numerical features (X & Y axis) for one or more cohorts. Can also color code points by a single categorical feature.

    • Violin Plot - This chart is used to plot a categorical feature on the X axis versus a numerical feature on the Y axis. Points in the plot can be colored by another categorical feature.

    • Cubby Hole Plot - This chart is used to plot two categorical features. Boxes are colored by their related p-values.

    • SeqPeek - This visualization shows where somatic mutations have been observed on a linear representation of a specific protein. Each horizontal strip represents the protein, with data from different tumor types (aka cohorts or studies) shown stacked one on top of the other.

    • OncoPrint - This chart is used to plot multiple genomic alterations (somatic mutation) events across a set of samples using color-coded glyphs. OncoPrint is developed and provided by cBioPortal.

    • OncoGrid - This chart is used to visualize the top mutated genes across programs/projects and the number of cases affected. You can also view the mutation frequency, clinical data, data format types, number of gene sets and the number of cases affected.

    Notes:

    • A user has the option to make the axis logarithmic if the plot can display continuous numerical data, e.g. mRNA expression levels.

    • For Violin Plot and Scatter Plot you can select multiple cohorts as your Color By Feature. This will cause the Legend to list all the cohorts that the sample is associated with. Please be aware that you will end up with many permutations if you have many samples that belong to many different cohorts.

    • For OncoPrint, OncoGrid, and SeqPeek analyses, a default gene list is provided. Genes with a consensus score of 6 or higher are added to the default gene list. (Ref: Bailey et al., Cell. 2018 Apr 5;173(2):371-385.e18. doi: 10.1016/j.cell.2018.02.060)

  2. You will then select Genes and miRNAs or Variables (or, optionally both).

    Genes and miRNAs

    Selecting this link (or the ‘+’ adjacent to it) displays the Data Source | Gene & miRNA Favorites screen, which shows previously created “Gene and miRNA Favorites”. Click the Apply to Worksheet button to apply an existing favorite, or click Apply New Gene & miRNA List to create a new list and apply it. Any Gene and miRNA List you create here will automatically be added to your Gene and miRNA Favorites list and can be selected for additional analysis later. (See Gene and miRNA Favorites for details.)

    Variables

    Selecting this link (or the ‘+’ adjacent to it) displays the Data Source | Variable Favorites screen, which shows previously created “Variables Favorites”. Click the Apply to Worksheet button to apply an existing favorite, or click Apply New Variable List to create a new list and apply it. Any Variable Favorites you create here will automatically be added to your Variable Favorites and can be selected for additional analysis later. (See Variable Favorites for details.)

  3. Select your Cohort - Cohorts allow the user to create custom groupings of the samples and/or cases that can be used for further analysis.

    Selecting this link (or the ‘+’ adjacent to it) displays the Data Source | Cohorts screen, which shows previously created “Cohorts”. Click the Apply to Worksheet button to apply an existing cohort, or click the Filter or Barcodes button to create a new cohort and apply it. Any Cohorts you create here will automatically be added to your Cohorts list and can be selected for additional analysis later.

    The user can also add multiple Cohorts to the worksheet if desired. More information about Cohorts can be found here.

  4. Select Edit Plot Settings - This will display the Plot Settings panel displaying the applicable X & Y axis settings (i.e. Categorical or Numerical based on the analysis type selected). Depending on the analysis type selected (e.g., Bar chart, Histogram, Scatter Plot, Violin Plot, Cubby Hole Plot, SeqPeek, OncoPrint or OncoGrid) additional specifications may appear for selection.

  5. Select Toggle Sample Selection - After a plot has been displayed, using the Toggle Sample Selection button allows you to create a smaller cohort from within the plot itself.

  6. Select Redraw - After a plot has been displayed, using the Redraw button will reset the analysis to its original setting after being zoomed-in or moved.

  7. Select Download - After a plot has been displayed, using the Download button will allow you to download the analysis as an SVG, PNG, or JSON file.

  8. Select Toggle Full Screen - After a plot has been displayed, using this button will display the plot in full screen.

Note: If you wish to use your own data in graphing, please review the documentation on how to upload your own data and on how to graph your own data. Working with your own data follows a slightly different approach than is described here.

Saved Workbooks

Selecting Saved Workbooks from the WORKBOOKS menu dropdown displays a screen which lists all of your saved workbooks, and information about the workbooks, including Version and Build, Name, number of Worksheets, Ownership and Last Updated.

To the left of each Workbook, dropdown options allow you to Edit, Duplicate or Delete the Workbook.

  • Edit - Selecting Edit displays a popup screen which allows you to update the Workbook name, build and description.

  • Duplicate - Selecting Duplicate enables you to make a copy of the workbook. Note that the copy will reference the cohorts, variables, and gene lists used in the original workbook; it will not make duplicates of those cohorts, variables, and gene lists.

  • Delete - This option will delete the workbook.

Clicking on the workbook Name will display the Workbook Details screen.

Workbook Details Screen

On the top of the Workbook Details Screen are the Edit Details, Duplicate and Delete buttons. They perform the same functions as described for the workbook dropdown menu options on the Saved Workbooks screen, described above.

_images/WorkbookDetails.png

Share a Workbook

Clicking the Share button allows you to share the workbook in the Web App with users you select by entering the user’s email.

The user will receive an email message that contains a link to your shared workbook and explains that you have invited them to join it. If the email address you entered is not registered with ISB-CGC, a message displays, “The following user emails could not be found; please ask them to log into the site first:(email entered).”

Manipulation of Workbooks and Worksheets

Creating A Worksheet - By selecting the “+” next to an existing worksheet, a user can create a new worksheet to create a new analysis. You can give the new worksheet a unique name and provide a worksheet description. This is an ideal way for the user to easily have access to different graphs with the same data in the same workbook.

Worksheet Drop Down Menu - The worksheet will have a drop down menu that allows the user to edit, duplicate or delete the worksheet. Click the downward pointing arrow next to the name of the worksheet that is open.

Edit Details - This item allows the user to edit the name of the worksheet and also give a brief description of the worksheet being used for analysis. You can also change the build from HG19 to HG38 using this feature. Changing the build allows you to graph data from either build.

Duplicate - This item allows the user to create a duplicate worksheet in the workbook for further analysis and comparison.

Delete - This item will only appear when you are working with multiple worksheets. This will permanently delete the worksheet from the workbook.

Edit Plot Settings - This function allows you to select new Plot Settings for the selected analysis type.

Note: When selecting a gene or miRNA for either the X-axis or Y-axis variable, you will be prompted to select a specification. If you select Gene Expression you have the option of choosing a Select Feature. If you select the Copy Number specification you can choose a Value Filter. If you select the Protein specification you can select a Protein Filter. If you select the Mutation specification you can select a Value Filter. If you select a miRNA expression you can select a Select Feature.

Enable Sample Selection and Edit Analysis Settings - Enable Sample Selection (shown in the image below) allows you to select samples from the displayed analysis and save that selection to a new cohort for further drill-down analysis. Edit Analysis Settings allows you to change the variables you wish to use for your analysis (the options vary by the analysis you choose). Finally, if you select miRNA, you can select the miRNA Expression specification and you will be prompted to select a feature.

_images/edit_analysis_finger.PNG

Comment on a Workbook

Any user who owns a workbook, or has had one shared with them, can comment on it. To open comments, use the Comments button at the top right. A right sidebar will appear and any previous comments will be shown.

On the bottom of the comments sidebar, you can create a new comment and save it. It should appear at the bottom of the list of comments.



Programs

The Programs screen displays information about public programs available through the Web App.

Public Programs

Selecting Public Program from the PROGRAMS menu dropdown displays the Programs screen, PUBLIC PROGRAMS tab. This screen displays details about the public programs currently available in the Web App. It displays the number of projects, the ownership and the last date each program was updated.

Clicking the + adjacent to each program will display a list of all projects in the program, and their last updated dates.



Analyses

This feature allows you to create, edit details, duplicate, delete, or share analyses. You can customize new workbooks with selected data (Genes and miRNAs, Variables, Cohorts) and the following plot types:

  • Bar Chart

  • Histogram

  • Scatter Plot

  • Violin Plot

  • Cubby Hole Plot

  • SeqPeek

  • OncoPrint

  • OncoGrid

Start New Workbook With…<Plot Type>

Selecting Start New Workbook With and one of the above plot types from the Analyses menu dropdown displays a screen which enables you to create, edit details, duplicate, delete, or share analyses.

This is the same screen that is displayed when you choose to create a workbook using the Create a New Workbook link from Your Dashboard or the WORKBOOKS menu, except that the Analysis Type field is prepopulated with your selected plot type.

_images/Analyses-Dropdown.png

Browse All Analyses

Selecting Browse All Analyses from the Analyses menu dropdown displays a screen which provides a visual example and a written description of each type of plot. This information can help you determine which type of plot would be useful in your analysis.

From here, click on the Start a New Workbook <Plot Type> link to go to a screen which enables you to build your analysis.

_images/Analyses-Descriptions1.png _images/Analyses-Descriptions2.png


Genes and miRNAs

This feature allows you to create and manage Gene and miRNA lists for inclusion in workbooks and use in subsequent analyses.

Create a Gene & miRNA Favorite

Selecting Create Gene & miRNA Favorites from the GENES & miRNAs menu dropdown displays the Create Gene & miRNA Favorites screen.

To create a new Gene & miRNA Favorite:

  • Name your new favorite; you can create many favorites and use them later when working with workbooks.

  • Specify the Gene(s) and/or miRNA(s) to include in this list. You can do this by:

    • Uploading a pre-existing list using the Upload Gene and miRNA List link

    • Entering Genes and miRNAs by typing them into the input box (with auto-completion support).

      • To aid in Gene selection, you can access the HGNC portal (HUGO Gene Nomenclature Committee) via the View Gene Identifiers link.

      • To aid in miRNA selection, you can access the miRBase via the View miRNA Identifiers link.

      • If duplicate symbols are entered they will be marked for your deletion or automatically dropped when the list is saved. If an unrecognized item is entered it will also be flagged for your attention.

  • Click Save As Favorite.
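
The duplicate and unrecognized-symbol handling described above can be pictured with a short sketch. This is an illustration only, not the Web App's actual implementation: the function name, the case-insensitive matching and the small `known_symbols` set are all assumptions made for the example.

```python
def clean_symbol_list(symbols, known_symbols):
    """Split symbols into (kept, duplicates, unrecognized), preserving input order.

    Matching is case-insensitive, which is an assumption of this sketch.
    """
    known = {s.upper() for s in known_symbols}
    kept, duplicates, unrecognized = [], [], []
    seen = set()
    for raw in symbols:
        symbol = raw.strip()
        if not symbol:
            continue  # ignore blank entries
        key = symbol.upper()
        if key in seen:
            duplicates.append(symbol)  # repeated entry: would be dropped on save
            continue
        seen.add(key)
        if key in known:
            kept.append(symbol)
        else:
            unrecognized.append(symbol)  # would be flagged for the user's attention
    return kept, duplicates, unrecognized


# Hypothetical input list and reference set.
kept, dups, unknown = clean_symbol_list(
    ["TP53", "MDM2", "tp53", "NOTAGENE"],
    known_symbols={"TP53", "MDM2", "CDKN1A", "CCNE1"},
)
print(kept, dups, unknown)
```

Running a similar check on a list before uploading it can save a round of corrections in the Create Gene & miRNA Favorites screen.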

_images/Gene_Favorite.png

Manage Gene & miRNA Favorites

Selecting Manage Gene & miRNA Favorites from the GENES & miRNAs menu dropdown displays the Saved Gene & miRNA Favorites screen. This screen displays your saved Gene & miRNA Favorites and allows you to edit or delete them, as well as start a new workbook using your favorite.

Clicking on the Create New Favorite button will take you to the Create Gene & miRNA Favorite screen.

Select Genes & miRNAs for a New Workbook

Selecting Select Genes & miRNAs for a New Workbook from the GENES & miRNAs menu dropdown displays the Data Source | Gene & miRNA Favorites screen. This screen displays your saved Gene & miRNA Favorites and allows you to apply them to a new workbook.

  • Check the box adjacent to your favorite and click the Apply to New Worksheet button to create a new workbook using your Gene & miRNA Favorite.

  • Click the Apply New Gene & miRNA List button to create a new favorite. This takes you to the Create Gene & miRNA Favorite screen.

Resources for understanding and working with miRNAs and gene identifiers:



Variables

Creating a variable favorites list is a way of creating custom groupings of the samples and/or cases that you are interested in analyzing further. For example, you can create a variable favorites list that spans across multiple projects, only contains samples for which certain types of data are available, or focuses on specific phenotypic characteristics. A Variable Favorites list can be included in a workbook.

Create Variable Favorites List

To create a variable list from Your Dashboard, click on the Create Variable Favorites link which will display the Create Variable Favorite screen.

Alternatively, select Create Variables Favorite List from the VARIABLES menu dropdown.

To create a new Variable Favorite:

  • Name your new favorite; you can create many favorites and use them later when working with workbooks.

  • Select attributes and features for your variable list by performing one or more of these actions:

    • Select a data set (program) from the Select Data Set drop down list. This will display features for that data set under the COMMON and CLINICAL SEARCH tabs.

    • Common Filter Selection - Filters (attributes and features) that are fairly common across programs are displayed under the COMMON tab. Changing the data set will change the list of available filters.

      • Check the checkbox adjacent to each feature that you are interested in. They will display in the Selected Variables panel.

    • Clinical Filter Feature Search - Click the CLINICAL SEARCH tab. This filter allows the user to search by any clinical feature in the selected data set (program). Changing the data set will change the list of available filters.

      • Enter one or more characters in the Feature Search field. A list of features containing these characters displays. Select a feature from the list and it will display in the Selected Variables panel.

    • Favorites Filter Selection - From the Data Set drop down list, select Favorites. This displays your existing Variable Favorite lists, and their component features. These features can now be selected for a new Variable Favorite list by checking the checkbox adjacent to each feature that you are interested in. They will display in the Selected Variables panel.

    • User Data Filter Search - From the Data Set drop down list, select User Data-User. This option allows you to select from filters that you have uploaded using the upload data functionality. The filters are separated by project within your program; a drop down list will display the associated features.

  • Verify that all your selected filters are displayed in the Selected Variables panel on the right-hand side. Clicking Clear All will remove all selected filters.

  • Click Save As Favorite to save the Variable list.

_images/Create_Variables.png

Manage Variable Favorites List

Selecting Manage Variable Favorites List from the VARIABLES menu dropdown displays the Saved Variable Favorites screen. This screen displays your saved Variables Favorites and allows you to edit or delete them, as well as start a new workbook using your favorite.

  • Editing a Variable Favorites List - Clicking the Edit button displays the Edit Variable Favorite screen, which shows all filters in the selected variable list. Any variables selected will be added to any existing variables in the list. Variables can also be removed from the favorite list. The title of the variable favorite list can be changed. To return to the previous view, you must either save any selected filters, or choose to cancel adding any new filters.

  • Deleting a Variable Favorites List - Clicking the Delete button will delete the variable list.

  • Apply To New Workbook button - Clicking on the Apply to New Workbook button will take you to a screen where you can create a new workbook using your variable list.

Select Variables for a New Workbook

Selecting Variables for a New Workbook from the VARIABLES menu dropdown displays the Data Source | Variables screen. This screen allows you to create a new workbook with the selected variables.

  • Click the Create New Workbook With Selected Variables button to create a new workbook using your selected variables.



Cohorts

Cohorts are a way of creating custom groupings of the samples and/or cases that you are interested in analyzing further. You may frequently reuse a cohort in multiple analyses. Creating a “saved cohort” allows you to do this. If you have any existing saved cohorts, they will display here for you to view, edit and share.

Create a New Cohort

To create a cohort from Your Dashboard, click on the Create a Cohort link in the Saved Cohorts panel at the bottom of the screen and select either “Filter” or “Barcodes” from the dropdown. The Filter link will display the cohort creation page; filters are explained below. The Barcodes link will display a page where you can upload sample or case barcodes and create a cohort from them.

If you already have saved cohorts, they will be listed in the Saved Cohorts panel. Click on the Saved Cohorts link in that panel and a page with the details of your saved cohorts will display. Alternatively, to go directly to a given cohort, click on its name and the cohort details page of that cohort will display.

You can also navigate to these functions by using the drop down options in the COHORTS item on the menu bar.

_images/CreateCohort.png

Cohort Creation - Filters

Using the list of data sets and filters on the left, you can select the attributes and features that interest you from ISB-CGC data or user data. You can create a cohort containing multiple programs.

Select Data Set

This panel in the top left of the screen allows you to pick the programs and user data sets that you want included in the cohort.

The drop down list will display the ISB-CGC data sets that the Web App is currently supporting, as well as your user data. By default, the list is sorted by Node (Genomics Data Commons, Proteomics Data Commons, User) with programs listed below each node header. The sort order can be changed by selecting Program next to Sort By.

_images/SelectDataSet.png

Select Filters

When an ISB-CGC hosted data set is selected, appropriate filters will display under three tabs. Not all tabs are available for every program, but all programs will have some features available on the CASE tab.

  • The CASE tab displays clinical and demographic features applicable to the selected program.

  • The DATA tab displays data types (ex. Aligned Reads, Copy Number Segment Masked) applicable to the selected program.

  • The MOLECULAR tab displays filters pertaining to mutations.

For USER DATA, there is one tab called “PROJECTS & STUDIES” which allows you to filter by the projects or studies you have uploaded to the system.

Click on a filter name to see the selection values. For example, when you click on “Vital Status”, it expands and provides a list containing “Alive”, “Dead”, and “NA” as values you may choose.

Selected filters will display in the Cohort Filters panel. The Data Set Details panel will update the Total Number of Cases and the Total Number of Samples based on the selected filters.

Individual selections within a filter group are “ORed” together, meaning that a case or sample is included in the results if it matches any of the selected values. Different filters, on the other hand, are “ANDed” together, meaning that data must meet all filter criteria in order to be selected. Depending on the combination of filters you have chosen, there may be times when no cases or samples appear in the results.

  • If you use AND and do not see the data you are expecting in the filter, try OR instead. AND is the more restrictive option, requiring all filters to be met; OR is less restrictive, requiring only one criterion to be met for the data to display.

  • You may want to consider adding the term “AND” or “OR” in your saved cohort title since the type of combination used in your cohort does not display in the filters list for a saved cohort.
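The combination rule described above (values within one filter are ORed, separate filters are ANDed) can be sketched in a few lines of Python. This is only an illustration of the logic, not the Web App's implementation; the records and attribute names below are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical in-memory records; the attribute names are illustrative only.
CASES = [
    {"case": "C1", "vital_status": "Alive", "gender": "FEMALE"},
    {"case": "C2", "vital_status": "Dead", "gender": "MALE"},
    {"case": "C3", "vital_status": "NA", "gender": "FEMALE"},
]


def matches(record: Dict[str, str], filters: Dict[str, Set[str]]) -> bool:
    """Return True when the record satisfies every filter (AND).

    Within one filter, any of the selected values is accepted (OR).
    """
    return all(record.get(name) in allowed for name, allowed in filters.items())


def apply_filters(records: List[Dict[str, str]], filters: Dict[str, Set[str]]) -> List[str]:
    """Return the case identifiers of all records that pass the filters."""
    return [r["case"] for r in records if matches(r, filters)]


selected = apply_filters(
    CASES,
    {"vital_status": {"Alive", "Dead"}, "gender": {"FEMALE"}},
)
print(selected)  # ['C1']
```

Adding a second filter can only shrink the result, which is why some filter combinations return no cases or samples at all.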

Note: Hovering over the Disease Code name will display the disease code long name if it’s part of the TCGA, CCLE, or TARGET data set.

Molecular Tab

The Molecular Tab is only available for TCGA data. It enables the user to filter by Gene Mutation Status, creating a cohort based on the presence of a mutation (of various types) in a gene or genes.

To combine multiple gene filters, select AND (requires all filters to be met for the data to be filtered) or OR (at least one criterion needs to be met for the data to be displayed). You can also filter by Genomic Build.

Programs & Projects Tab

The Programs & Projects Tab is only available for User Data. It displays the programs and projects that are part of the user data set.

Cohort Filters Panel

This panel displays the selected filters for the cohort. Filters are listed under the program name. If you click on the program name, the screen will change to display the information for that program.

Selecting an X beside a single filter will remove that filter. Selecting Clear All in the top right of the panel will remove all the filters. Note that you cannot remove filters once the cohort has been saved. (See Set Operations below for more ways to add or remove filters from your cohorts.)

Data Set Details Panel

This panel shows the Total Number of Samples and Total Number of Cases for the currently displayed data set based on the selected filters. If there is a small “timer” icon, the calculation is taking place; the results should appear soon.

Data Set Clinical Features Panel

This panel shows a list of images (called “treemaps”) that give a high level breakdown of the selected samples for a handful of features (ex. Disease Code, Vital Status, Gender, Sample Type, Age at Diagnosis, etc.) for the currently displayed data set based on the selected filters.

By using the Show More button, you can see additional tree maps. Mousing over an image shows the details of each specific section of the image and the number of samples associated with it.

Programs & Projects Panel

This panel displays a list of images (called “treemaps”) similar to the Data Set Clinical Features panel, but is only available when the User Data tab is selected. This panel displays a high level breakdown of the projects and studies you have uploaded to the system. Hovering over the image will show details of that specific section of the image and the number of samples associated with it.

Saving the Cohort

Click the Save as New Cohort button when you are ready to save the cohort based on the filters you have set. You will be asked for a cohort name and the selected filters will be displayed. Enter the name and click the Create Cohort button.

NOTE: When working with multiple programs you will see a yellow notification box stating, “Your cohort contains samples from multiple programs. Please note that filters will only apply to samples from the program indicated by the tab they were chosen on - they will not apply to samples from other programs in this cohort.”

Cohort Creation - Barcodes

This feature allows you to upload or enter your own list of sample or case barcodes from multiple programs. There is a blue Show Instructions button on both the UPLOAD and ENTER tabs.

Upload Tab

This feature allows you to upload a file of barcodes to create a cohort. Files must be in GDC Data Portal case manifest format, or in comma- or tab-delimited case/sample/program format. Delimited files (TSV or CSV) must have an extension of .txt, .csv, or .tsv. The maximum file size is 32MB. After you select and upload the file, its entries will be validated. Any entries found to be invalid will be listed, and you can either omit them and continue with cohort creation or select a new file for verification and upload.

GDC Data Portal Case Manifest Files

GDC Data Portal case manifests can be obtained on the ‘Cases’ tab of the Exploration section of the data portal here. JSON case manifests must have a .json extension, and will be validated against the GDC’s JSON schema. The minimum required properties for each entry in the JSON file are the project object and the submitter_id field. The project object must include the project_id property. All other properties will be ignored.

TSV case manifests must have a .tsv extension, and must contain the first three columns of the GDC TSV case manifest in the following order: Case UUID, Case ID, Project. Any other columns will be ignored. Do not remove the header row of the TSV case manifest.

Because the GDC Data Portal case manifest entries are cases, all samples from a case will be included in the cohort.
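The minimum requirements described above can be checked locally before uploading. The following is a rough sketch in Python, not the validation the Web App performs (which also checks JSON manifests against the GDC's JSON schema). The function names and the sample identifiers are hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List

# First three columns required in a TSV case manifest, in this order.
REQUIRED_TSV_COLUMNS = ["Case UUID", "Case ID", "Project"]


def json_entry_problems(entry: Dict[str, Any]) -> List[str]:
    """List the minimum-property problems for one JSON manifest entry."""
    problems = []
    if "submitter_id" not in entry:
        problems.append("missing submitter_id")
    project = entry.get("project")
    if not isinstance(project, dict):
        problems.append("missing project object")
    elif "project_id" not in project:
        problems.append("project object lacks project_id")
    return problems


def tsv_header_ok(tsv_text: str) -> bool:
    """Check that the header row starts with the three required columns."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    return header[:3] == REQUIRED_TSV_COLUMNS


# Made-up entries: the first is complete, the second is not.
entries = json.loads(
    '[{"submitter_id": "CASE-0001", "project": {"project_id": "PROJ-A"}},'
    ' {"project": {}}]'
)
for entry in entries:
    print(json_entry_problems(entry))

print(tsv_header_ok("Case UUID\tCase ID\tProject\tExtra\nu1\tc1\tp1\tx\n"))
```

Extra properties and extra columns are deliberately not reported, since the text above says they are ignored.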

Below are the instructions which display when the Show Instructions button is clicked.

_images/CreateCohorts-Barcodes-Upload-Instructions1.png _images/CreateCohorts-Barcodes-Upload-Instructions2.png
Enter Tab

This feature will allow you to manually input barcodes for cohort creation. There is a maximum length of 10000 characters for the text box. Please use the file upload option if you need to upload more barcodes than will fit in that space.
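To decide ahead of time whether a barcode list fits in the text box, a quick length check is enough. The sketch below assumes the barcodes are joined with a newline; that separator is an assumption, so use whatever format the on-screen instructions specify.

```python
from typing import Iterable

MAX_TEXT_BOX_CHARS = 10000  # limit stated for the ENTER tab text box


def fits_in_text_box(barcodes: Iterable[str], separator: str = "\n") -> bool:
    """Return True when the joined barcode text fits within the limit.

    The separator is an assumption; adjust it to match the entry format
    shown in the Web App instructions.
    """
    return len(separator.join(barcodes)) <= MAX_TEXT_BOX_CHARS


# Placeholder 12-character barcodes, used only to exercise the check.
print(fits_in_text_box(["X" * 12] * 500))
print(fits_in_text_box(["X" * 12] * 1000))
```

If the check fails, use the file upload option instead.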

Below are the instructions which display when the Show Instructions button is clicked.

_images/CreateCohorts-Barcodes-Enter-Instructions.png

Manage Saved Cohorts

Selecting Manage Saved Cohorts from the COHORT menu dropdown displays the Cohorts screen, SAVED COHORTS tab. This screen displays your saved cohorts and allows you to view, edit, delete, and share them, and to perform set operations on them. In addition, you can start a new workbook using selected cohorts.

To view a cohort, click on the name of the cohort to display the cohort details. Alternately, you can view the cohort details by clicking on its name in the “Saved Cohorts” panel on the “Your Dashboard” page.

From the Cohorts screen, SAVED COHORTS tab, you can perform the following functions. Except for Set Operations, these functions are described in detail in the Cohort Details Screen section, as they are also available there.

  • New Workbook

  • Delete

  • Set Operations

  • Share

Set Operations

Clicking the Set Operations button displays a New Cohort screen where you can create new cohorts from two or more existing cohorts using the union, intersection or complement operations. The Set Operations button will only be available if at least two cohorts are selected on the Cohorts screen.

On the New Cohort screen, enter a name for the new cohort and select a set operation. The intersect and union operations can take any number of cohorts and in any order. The complement operation requires that there is a base cohort, from which the other cohorts will be subtracted. Click Okay to complete the set operation and create the new cohort.

Note: To combine the user uploaded data and the ISB-CGC data, use the Set Operations function. This is possible because the list of barcodes is what is used to create the set operation. For example, to make a cohort of user data samples and ISB-CGC curated samples, Set Union must be used, and to filter user data which is an extension of TCGA or TARGET samples, Set Intersection must be used.

The figure below shows what the results of the set operations will be (represented by I for Intersect, U for Union, and C for Complement). There are two types of sets shown, those that overlap (on the left) and those that are nested (on the right). For the last row (complement operations), the “Subtracted” area is removed from the “Base” area to result in the Complement (C).

_images/SetOperations.PNG
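Because set operations act on lists of barcodes, the three operations map directly onto Python's built-in set type. The sketch below only illustrates the semantics; the function names and barcodes are made up and are not part of any ISB-CGC API.

```python
from functools import reduce
from typing import List, Set


def union(cohorts: List[Set[str]]) -> Set[str]:
    """Barcodes that appear in at least one cohort."""
    return set().union(*cohorts)


def intersection(cohorts: List[Set[str]]) -> Set[str]:
    """Barcodes that appear in every cohort (needs at least one cohort)."""
    return reduce(lambda a, b: a & b, cohorts)


def complement(base: Set[str], subtracted: List[Set[str]]) -> Set[str]:
    """Barcodes in the base cohort that are in none of the other cohorts."""
    return base - union(subtracted)


a = {"S1", "S2", "S3"}
b = {"S2", "S3", "S4"}

print(sorted(union([a, b])))         # ['S1', 'S2', 'S3', 'S4']
print(sorted(intersection([a, b])))  # ['S2', 'S3']
print(sorted(complement(a, [b])))    # ['S1']
```

Union and intersection give the same result in any cohort order, while complement depends on which cohort is chosen as the base.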

Cohort Details Screen

The cohort details screen displays the details of a specific cohort. The title of the cohort is displayed at the top of the page.

_images/CreateDetails.png

The screen is divided into the following sections:

Current Filters Panel

This panel displays current filters on this cohort or any of its ancestors. Saved filters cannot be removed, but new ones can be added using Edit.

Cohort Details Panel

This panel displays the Internal ISB-CGC Cohort ID (the identifier you use to access this cohort through the APIs), and the number of samples and cases in this cohort. The number of samples may be larger than the number of cases because some cases may have provided multiple samples. This panel also displays “Your Permissions” which can be either Owner or Reader, as well as Revision History. If you have edited the cohort, the filters that were used to originally create the cohort are displayed under the “Creation Filters” header. The newly applied filters (after original creation) are displayed under the “Applied Filters” header.

Select Data Set

This panel displays all the programs and user data sets that are included in the cohort; click on the drop down to see them.

By default, the list is sorted by Node (Genomics Data Commons, Proteomics Data Commons, User) with programs listed below each node header. The sort order can be changed by selecting Program next to Sort By. To see details about a program or data set, select it from the drop down list.

Data Set Details Panel

This panel shows the Total Number of Samples and Total Number of Cases for the currently displayed data set (selected from the Data Set drop down) based on the selected filters.

Data Set Clinical Features Panel

This panel shows a list of images (called “treemaps”) that give a high level breakdown of the selected samples for a handful of features (ex. Disease Code, Vital Status, Gender, Sample Type, Age at Diagnosis, etc.) for the selected program.

By using the “Show More” button, you can see additional tree maps. Mousing over an image shows the details of each specific section of the image and the number of samples associated with it.

Cohort Details Screen functions:

Create a New Workbook

Clicking the New Workbook button brings you to a screen where you can create a new workbook using this cohort.

Edit a cohort

Clicking the Edit button displays the Filters panel. Any filters selected will be added to existing filters. To return to the previous view, save any newly selected filters using the Save Changes button, or cancel adding any new filters by clicking the Cancel link.

Comment on a cohort

Clicking the Comments button displays the Comments panel. Here anyone who can see this cohort (such as an owner or someone who has shared access to the cohort) can comment on it. Comments are shared with anyone who can view this cohort. They are ordered by newest on the bottom. Selecting the “X” on the Comments panel will close the panel.

Copy a cohort

To create a copy of the cohort, click on the Duplicate button. This will take you to a new copy of the cohort which has the same list of samples and cases; you will be the owner of the copy.

This is how you create a copy of another researcher’s cohort that they have shared with you. (Note: If they later change their cohort, your cohort will not be updated; it will remain the same as it was at the time you duplicated it).

Delete a cohort

Click the Delete button to delete the cohort. Confirm by clicking the second Delete button presented.

File Browser

Clicking the File Browser button displays a screen with a list of data files associated with your current cohort. This list includes all files which are stored on the Google Cloud, including both controlled access and open access data.

_images/CohortFileBrowser.png

You can use “Show”, “Page”, “Previous” and “Next” to navigate through the list. The columns are sortable by selecting the column header. You can select a subset of the default columns to show by using the “Choose Columns to Display” tool.

You can filter by Genomic Build (HG19 or HG38) and view which platforms and files are available for the build selected.

You can filter by full or partial Case Barcode on all tabs. To remove the search key word, click the “X” button adjacent to it. Filtering by Case Barcode updates the number to the right of all the other filters.

You may also filter by data type, data format, platform, disease code, disease strategy, and/or experimental strategy. Selecting a filter will update the associated list. The number next to each filter refers to the number of files available for that filter.

The tabs “IGV”, “Pathology Images” and “Radiology Images” allow you to filter for files containing, respectively, read-level sequence data (viewed using the IGV viewer), pathology images, and radiology images. Please note: you will be able to select controlled access files to view in the IGV viewer only if you have authenticated as a dbGaP authorized user (CCLE data does not require authorization to view the sequence data in the IGV viewer). Details of how to view Sequences, and Pathology and Radiology Images are provided below.

Viewing a Sequence

When available, sequences in a cohort can be viewed using the IGV viewer. To find those sequences that can be viewed, select the IGV link on the File Browser screen. The File Listing panel will display the files that can be viewed with the IGV viewer. Selecting the checkbox in the “View” column for each file of interest (maximum of five files) and clicking the Launch IGV button in the upper panel will display an IGV view of the selected sequence data.

Controlled access files will be viewable by sequence ONLY if you have authenticated as a dbGaP-authorized user.

More information about Viewing a Sequence in the IGV Viewer.

Using the Image Pathology Viewer

Note

All tissue slide images from the TCGA program are currently unavailable for viewing. (Diagnostic images will display.)

When available, pathology images can be viewed using the caMicroscope tool (more information about caMicroscope is provided here). These are the pathology images that are associated with TCGA samples. To find images that can be viewed, open a saved cohort and select the File Browser button. You can also select the File Browser link from the Dashboard Saved Cohorts panel. The files associated with your cohort will be shown. Click on Pathology Images to see a list of available pathology images. Hovering over the File Name and clicking on “Open in caMicroscope” will open the image file in a new tab using caMicroscope. (HINT: Using a smaller cohort will provide faster response in creating the list of files available.)

To zoom into the image, either click the left mouse button or use your mouse wheel. Use your mouse to move around the image. To zoom out of the image, shift-click the left mouse button or use your mouse wheel. Selecting caMicroscope at the top of the page will send you to the caMicroscope homepage. If you hover over the Slide Barcode section on the top right hand side you will see metadata information listed.

Viewing a Radiology Image

To find images that can be viewed, open a saved cohort and select the File Browser button. You can also click the File Browser link from the Dashboard Saved Cohorts panel. The files associated with your cohort will be shown. Click the Radiology Images tab to view a list of available radiology images. Hovering over the Study Instance UID column and clicking on “Open in CHIF Viewer” will open the series Selection panel in a new tab using Osimis DICOM. (HINT: Using a smaller cohort will provide faster response in creating the list of files available.)

For a more detailed step-by-step process of Viewing Radiology Images using the Osimis DICOM viewer please go here.

Download File List as CSV

To download a list of files that are part of this cohort, select the CSV button in the upper right on the File Listing panel (on all tabs) on the File Browser screen.

The file contains the following information for each file:

  • Case Barcode

  • Sample Barcode

  • Program

  • Platform

  • Experimental Strategy

  • Data Category

  • Data Type

  • Data Format

  • Genomic Data Commons(GDC) File UUID

  • Google Cloud Storage(GCS) location

  • Genomic Data Commons(GDC) Index

  • Index File Google Cloud Storage(GCS) location

  • File Size

  • Access Type (open or controlled access)
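Once downloaded, the file list can be summarized with Python's standard csv module. The sketch below counts files per value of one column. It assumes the CSV header uses the labels listed above (for example "Access Type"), which should be checked against the actual file; the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count rows of the file list per distinct value of one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Tiny stand-in for a downloaded file list; all values are made up and
# only a few of the listed columns are included.
SAMPLE = (
    "Case Barcode,Sample Barcode,Data Type,Access Type\n"
    "CASE-1,SAMPLE-1A,Aligned Reads,controlled\n"
    "CASE-1,SAMPLE-1B,Aligned Reads,controlled\n"
    "CASE-2,SAMPLE-2A,Copy Number Segment Masked,open\n"
)

print(count_by_column(SAMPLE, "Access Type"))
print(count_by_column(SAMPLE, "Data Type"))
```

For a real download, read the file with `open(path, newline="")` and pass its contents, or adapt the function to take a file object.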

Export File List to BigQuery

To export the File List to BigQuery, select the BigQuery button on the File Browser screen. You will need to have registered a Google Cloud Project and a BigQuery dataset to be able to export to BigQuery. More information on how to register a BigQuery Dataset can be found here. You can either make a new table or append to an existing table. You can also give the table a unique name; if left blank, a name will be provided for the table.

The table will contain the following information (for each of the data type tabs):

  • row

  • cohort_id

  • case_barcode

  • sample_barcode

  • project_short_name

  • date_added

  • build

  • gdc_file_uuid

  • gdc_case_uuid

  • platform

  • exp_strategy

  • data_category

  • data_type

  • data_format

  • cloud_storage_location

  • file_size_bytes

  • index_file_gdc_uuid

  • index_file_cloud_storage_location
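After the export, the table can be queried like any other BigQuery table. As a small illustration, the Python helper below builds a standard SQL query that summarizes file counts and total size per data type, using only column names from the list above. The table name is a placeholder and the helper itself is hypothetical; the resulting text can be pasted into the BigQuery console or passed to a client library.

```python
def file_summary_query(table: str) -> str:
    """Build a standard SQL query summarizing an exported file-list table.

    `table` is the fully qualified table name (project.dataset.table).
    It is interpolated directly, so pass only trusted values.
    """
    return (
        "SELECT data_type, COUNT(*) AS n_files, "
        "SUM(file_size_bytes) AS total_bytes\n"
        f"FROM `{table}`\n"
        "GROUP BY data_type\n"
        "ORDER BY total_bytes DESC"
    )


# Placeholder table name; replace with your registered project and dataset.
print(file_summary_query("my-project.my_dataset.my_file_list"))
```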

Export File List to Google Cloud Storage

To export the File List to Google Cloud Storage (GCS), select the GCS button on the File Browser screen. You will need to have registered a Google Cloud Project and a GCS bucket to be able to export to GCS. More information on how to register a GCS bucket can be found here. You can also give the exported object a unique name; if left blank, a name will be provided for it. You will be able to select either CSV or JSON as the file format for exporting into Cloud Storage. All exported files are converted into zip files.

The file will contain the following information (for each of the data type tabs):

  • sample_barcode

  • case_barcode

  • cloud_storage_location

  • file_size_bytes

  • platform

  • data_type

  • data_category

  • exp_strategy

  • data_format

  • gdc_file_uuid

  • gdc_case_uuid

  • project_short_name

  • cohort_id

  • build

  • index_file_storage_location

  • index_file_gdc_uuid

  • date_added
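Because exported files are zipped, they need to be unpacked before they can be read. The following sketch reads a CSV export with Python's standard library. It assumes the archive contains one .csv member whose header row uses the column names listed above; the actual archive layout is not described here, so treat this as a starting point. The in-memory archive and its values are made up for the demonstration.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_zipped_csv(zip_bytes: bytes) -> List[Dict[str, str]]:
    """Read the first .csv member of a zip archive into a list of row dicts.

    Raises StopIteration if the archive holds no .csv member.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        with archive.open(name) as member:
            text = io.TextIOWrapper(member, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small stand-in archive in memory (made-up rows, subset of columns).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "file_list.csv",
        "sample_barcode,case_barcode,file_size_bytes\n"
        "S-1,C-1,1024\n"
        "S-2,C-1,2048\n",
    )

rows = read_zipped_csv(buffer.getvalue())
total_bytes = sum(int(row["file_size_bytes"]) for row in rows)
print(len(rows), total_bytes)  # 2 3072
```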

Cohort export to CSV

Click the CSV button to download the cohort in CSV format. The file will contain a list of sample and case IDs in the cohort.

Cohort export to BigQuery

Clicking the BigQuery button allows you to create a new table or append to an existing table. You must have registered a BigQuery data set with a Google Cloud Project on the registered Google Cloud Projects details page. More information on how to register a BigQuery data set can be found here.

To export a cohort to your own premade table, the table must have the following columns:

{
    'fields': [
        {
            'name': 'cohort_id',
            'type': 'INTEGER',
            'mode': 'REQUIRED'
        },
        {
            'name': 'case_barcode',
            'type': 'STRING',
            'mode': 'REQUIRED'
        },
        {
            'name': 'sample_barcode',
            'type': 'STRING',
            'mode': 'REQUIRED'
        },
        {
            'name': 'project_short_name',
            'type': 'STRING',
            'mode': 'REQUIRED'
        },
        {
            'name': 'date_added',
            'type': 'TIMESTAMP',
            'mode': 'REQUIRED'
        },
        {
            'name': 'case_gdc_uuid',
            'type': 'STRING'
        }
    ]
}

Note: You shouldn’t ever set UUID to ‘required’ because sometimes a sample doesn’t have a UUID, and the attempt to insert a ‘null’ will cause the cohort export to fail.
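A premade table's schema can be compared against these requirements before attempting an export. The sketch below is a hypothetical helper, not part of the Web App: it takes a schema in the same dictionary form as shown above and reports missing columns, mismatched types, and a UUID column that has been marked REQUIRED.

```python
from typing import Any, Dict, List

# Required columns and types, copied from the schema above.
EXPECTED_TYPES = {
    "cohort_id": "INTEGER",
    "case_barcode": "STRING",
    "sample_barcode": "STRING",
    "project_short_name": "STRING",
    "date_added": "TIMESTAMP",
    "case_gdc_uuid": "STRING",
}
# Columns that must accept nulls, per the note above.
MUST_BE_NULLABLE = {"case_gdc_uuid"}


def schema_problems(schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a table schema dictionary."""
    fields = {f["name"]: f for f in schema.get("fields", [])}
    problems = []
    for name, expected_type in EXPECTED_TYPES.items():
        field = fields.get(name)
        if field is None:
            problems.append(f"missing column: {name}")
            continue
        if field.get("type") != expected_type:
            problems.append(f"{name}: expected type {expected_type}")
        if name in MUST_BE_NULLABLE and field.get("mode") == "REQUIRED":
            problems.append(f"{name}: must not be REQUIRED")
    return problems


# Example: a schema that wrongly marks the UUID column as REQUIRED.
bad_schema = {
    "fields": [
        {"name": n, "type": t, "mode": "REQUIRED"}
        for n, t in EXPECTED_TYPES.items()
    ]
}
print(schema_problems(bad_schema))  # ['case_gdc_uuid: must not be REQUIRED']
```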

Cohort export to Cloud Storage

Clicking the GCS button allows you to save the details of the cohort in a specified Google Cloud Storage location. You must have a registered Google Cloud Storage (GCS) bucket with a Google Cloud Project on the registered Google Cloud Projects details page. More information on how to register a GCS bucket can be found here. You will be able to select either CSV or JSON as the file format for exporting into Cloud Storage. All exported files are converted into zip files.

Share a cohort

Clicking the Share button allows you to share the cohort in the Web App with users you select by entering the user’s email.

If the email address you entered is not registered with ISB-CGC, a message displays, “The following user emails could not be found; please ask them to log into the site first:(email entered).”

Public Cohorts

Selecting Public Cohorts from the COHORT menu dropdown displays the Cohorts screen, PUBLIC COHORTS tab. This screen displays details about any public cohorts currently available in the Web App. It displays the cohort name, number of cases, number of samples and the last date each program was updated. Public cohorts can be used for “New Workbook” and “Set Operations”.

To create new workbooks based on a public cohort, check the checkbox adjacent to the public cohort and click on the New Workbook button.


Have feedback or corrections? Please email us at feedback@isb-cgc.org.

Graphing User Data

Once a user has uploaded their own data to the Web App, that data can be visualized using the same graphing tools that are available for graphing TCGA, CCLE, TARGET and BEATAML1.0 data. However, the process for graphing user data is slightly different from how it is done for that data.

Important sections on the Web App Dashboard

The boxes in the figure below are links that are used to graph user data.

_images/TopAnnotated.png

Step 1: Create a Cohort from your project

  • From the Web App Dashboard, click on Create Cohort.

  • Click on the User Data tab and select the project or study that will be the cohort.

  • Save as a new cohort.

_images/CohortCreation.png

Step 2: Create a Variables Favorite

  • From the Web App Dashboard, click on Create Variable Favorites.

  • Click on the Projects tab to see the user supplied studies.

  • Select the variables that will be available to graph. Note that if the study has a large number of selections, using the browser search function can help locate the item.

  • Give the variables a name and click on the Save as Favorite button.

_images/Variables_selected_genes.png

Step 3: Graph the Favorites in a Workbook

  • From the Web App Dashboard, click on Create a new Workbook.

  • Under the Source Data heading, select the Variables and Cohorts that you wish to use in the graph. In each case you will be brought to a page listing all of the available Variables or Cohorts. Simply select the desired ones and then click the Add to Workbook button.

  • Under the Analysis Type heading, select the appropriate graph type. This will cause a window to slide in from the right.

  • Fill in the X and Y axis variables, select a variable to use for coloring and finally select the cohort to use.

_images/GraphingStart.png
  • Click on the Update Plot button to have the system gather the data and generate the plot.

  • If changes need to be made to the plot, click on the Edit Analysis Settings link to bring back the graph dialog box.

_images/GraphingGraphed.png


Integrative Genomics Viewer (IGV)

IGV is a widely used interactive tool for exploring genomic data. A web-based version is integrated into the ISB-CGC Web App, and the IGV desktop version can also be used to access cancer data in Google Cloud Storage (GCS). For more information about IGV, please follow the links in the Acknowledgments section at the bottom of this page.

Accessing the IGV Browser from the Web App

To access IGV, first select a cohort and then go to the cohort file list page by clicking on the “File Browser” button at the top of the page.

_images/cohort.PNG

On the File Browser page, click on IGV in the top menu bar.

The resulting file list can be filtered using the Build (HG19 or HG38) and the other filters listed on the left. Click the View checkbox (far right column) for each file that you want to view in IGV. Sometimes the checkbox cannot be checked; here are some reasons why:

  • Many files viewable in IGV may require that the user have dbGaP authorization to view controlled access data. If the user has been authenticated and authorized through the user details page, the user will be able to select files. Otherwise the cursor will be disabled when the user hovers over a checkbox. Open access data such as the CCLE project do not require dbGaP authorization and can be viewed by any authenticated user.

  • Only a maximum of five files can be selected for viewing at a time.
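The two selection rules above can be summarized as a small check. This is only a model of the rules as described, not the Web App's code. The `access` and `name` keys and their values are hypothetical stand-ins for however a file's access type is recorded.

```python
from typing import Dict, List

MAX_IGV_FILES = 5  # limit stated in the text


def selection_problems(files: List[Dict[str, str]], dbgap_authorized: bool) -> List[str]:
    """Explain why a proposed IGV selection would not be allowed."""
    problems = []
    if len(files) > MAX_IGV_FILES:
        problems.append(f"{len(files)} files selected; the maximum is {MAX_IGV_FILES}")
    for f in files:
        if f["access"] == "controlled" and not dbgap_authorized:
            problems.append(f"{f['name']}: controlled access requires dbGaP authorization")
    return problems


# Made-up selection: one open access file and one controlled access file.
selection = [
    {"name": "open_example.bam", "access": "open"},
    {"name": "controlled_example.bam", "access": "controlled"},
]
print(selection_problems(selection, dbgap_authorized=False))
print(selection_problems(selection, dbgap_authorized=True))
```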

To view the selected files in the IGV Browser, click on the “Launch IGV” button in the upper right of the window.

_images/CCLE_Files.PNG

NOTES:

  • You will only be able to view controlled access sequence files if you have logged in as a registered dbGaP authorized user.

  • You will need to disable your browser pop-up blocker to view files with IGV. If you see a 403 error when using the IGV viewer, the pop-up blocker is the cause of that error. Turn off the blocker and try again.

Using IGV Desktop Application to View Aligned Reads in Google Cloud Storage

You can also download and use the IGV desktop application to view aligned reads stored in BAM files in Google Cloud Storage. To do this, download the most recent version of IGV. After launching IGV, go to the “Settings” menu to enable the Google Menu item in the application (directions on how to do this).

To load BAM files from ISB-CGC Google Cloud Storage, use the “File” > “Load from URL…” menu item in the IGV application, entering the path to the BAM file in GCS. Paths to BAM files stored by ISB-CGC can be found using the cohorts().cloud_storage_file_paths() and samples().cloud_storage_file_paths() APIs described here.
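Before pasting a path into IGV, it can help to confirm that it is a well-formed GCS path to a BAM file. The sketch below assumes paths take the usual gs://bucket/object form. The helper names and the bucket and object names are placeholders, not real ISB-CGC locations.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_gcs_path(path: str) -> Tuple[str, str]:
    """Split a gs://bucket/object path into (bucket, object name)."""
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// path: {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def looks_like_bam(path: str) -> bool:
    """Return True when the object name ends in .bam (case-insensitive)."""
    _, object_name = split_gcs_path(path)
    return object_name.lower().endswith(".bam")


# Placeholder path, used only to exercise the helpers.
example = "gs://example-bucket/some/dir/sample.bam"
print(split_gcs_path(example))
print(looks_like_bam(example))
```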

Acknowledgments

The copyright to the Integrative Genomics Viewer is held by the Broad Institute, and the software has been released under the MIT License. For more information about IGV please see the IGV home page or the IGV github repo.

We are grateful to the IGV team for their assistance in integrating IGV into the ISB-CGC Web App.

Robinson J T, Thorvaldsdottir H, Winckler W, Guttman M, Lander E S, Getz G & Mesirov J P, Integrative genomics viewer, Nature Biotechnology 29, 24-26 (2011).

Thorvaldsdottir H, Robinson J T, Mesirov J P, Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration, Briefings in Bioinformatics 14, 178-192 (2013).



Radiology Viewer

Radiology images are viewed in an Osimis Web Viewer, a plug-in to the Orthanc Image Server (Orthanc). The ISB-CGC web application uses an instance of Orthanc to manage radiology files for the purpose of viewing. Currently only DICOM formatted files from TCGA samples are available for viewing. It may be helpful to review the DICOM Model of the Real World to understand the relationship between patient DICOM studies, DICOM series and DICOM instances.

The ISB-CGC web application File Browser page presents a table of DICOM studies associated with patients in the selected cohort.

_images/OsimisPick.png

Viewer Components

Clicking on a study in the table opens an Osimis Web Viewer in a new tab:

_images/OsimisInitialDisplay.png

All the DICOM series which comprise the selected study are shown as thumbnail images in the Series Selection Panel. Note that it can take several seconds for these thumbnails to appear. By default, the thumbnails are laid out in a grid pattern. This can be changed to a list format by clicking on the List Display button above the thumbnails. The list format displays a description of each series.

_images/OsimisThumbnailList.png

Change back again to the grid pattern by clicking on the Grid Display button.

To the lower right of each thumbnail is a small blue circle in which is displayed the number of DICOM instances which comprise the corresponding DICOM series. In addition, when your cursor hovers over a thumbnail, the viewer cycles through the instances comprising the series. (Note these low resolution images are loaded in the background and may not be available for cycling immediately after the viewer window opens.)

To view a larger rendering of a series, drag its thumbnail into the viewport. The first instance of the series is immediately displayed.

_images/OsimisSingleVP.png

At the same time, at the bottom of the viewport you will notice a grid composed of a series of rectangles corresponding to the instances in the series. The color of each rectangle indicates the following:

  • Black: The corresponding instance is not yet available for viewing

  • Red: A reduced resolution image of the instance is available for viewing

  • Green: The full resolution image of the instance is available for viewing

Typically, the viewer loads reduced resolution images for all series as quickly as possible. It loads full resolution images only when a series is dragged into the viewport.

When instance images have been loaded, you can scroll through the instances using your mouse’s thumbwheel or equivalent. As you scroll, the grid at the bottom of the screen highlights the instance currently being displayed. Clicking on a rectangle in the grid causes the corresponding instance to be displayed. The Play Controls in the lower left corner of the main window enable you to single step forward or backward through the series, and to cycle through the series repeatedly. A frame rate slider pops up when you hover over the play button.

Viewing Functions

A set of buttons above the viewport provides a range of functions.

_images/OsimisViewportButtons.png

The Layout button controls subdividing the viewport for the simultaneous display of one, two or four series. Drag a series into any of the subviewports to display it. Clicking in a subviewport gives it focus for mousewheel and cursor drag operations.

_images/OsimisMultiVP.png

Of the remaining buttons, some are modal, changing the effect of the cursor drag function. A blue line underscores the currently selected mode. Other buttons immediately perform some operation on the subviewport that has focus.

  • The Invert Color button immediately inverts the colors of the series in the (sub)viewport having focus.

  • The Zoom button is modal. When selected, dragging the cursor with the mouse button depressed expands or contracts the series in the (sub)viewport having focus. Expansion/contraction is around the cursor position at which dragging begins.

  • The Pan button is modal. When selected, dragging the cursor with the mouse button depressed causes panning of the series in the (sub)viewport having focus.

  • The Windowing Presets button operates both modally and immediately. Hovering the cursor over the button displays a list of windowing presets, one of which can be selected by clicking on it. The selection immediately sets Window Width (WW) and Window Center (WC) values for the series in the (sub)viewport having focus. The WW,WC value pair specifies a linear conversion from stored pixel values to values to be displayed. See here for further information on Window Center and Window Width.

    DICOM instances generally include WW,WC value pairs and these are used by default. Other WW,WC value pairs that may be appropriate for specific cases can be selected on the pop-up. The Preset #1 selection restores WW,WC to the DICOM setting.

    The Windowing Presets button operates modally when clicked. In this mode, dragging the cursor left or right in a (sub)viewport changes the Window Width value applied to the series in that (sub)viewport. Dragging the cursor up or down in a (sub)viewport changes the Window Center value applied to the series in that (sub)viewport.

  • The Magnifying Glass button is modal. Hovering the cursor over the button displays a pop-up containing two sliders that control the magnification level and size of a virtual magnifying glass. When selected, dragging the cursor with mouse button depressed opens a virtual magnifying glass that displays a magnified rendering of the underlying image in the region of the cursor.

  • The Length Measurement button is modal. When selected, the distance in physical units between two points in an instance can be measured. To perform a measurement, click the mouse button once with the cursor over some point of interest, and then again over a second point of interest. Alternatively, depress and hold the mouse button while the cursor is over the first point of interest, then release the mouse button while the cursor is over the second point of interest. A line joining the two points and its length are displayed. The line will scale if the series is zoomed in or out.

    A length measurement can be moved by clicking on it and dragging. To remove a length measurement, drag it or an endpoint outside of the extent of the instance. Note that if you have “zoomed in” on an instance, its extent may be much larger than the (sub)viewport in which it is displayed. This can make it difficult to drag the measure outside of the extent of the instance. In this case it may be necessary to “zoom out” in order to be able to drag the measure outside of the extent of the instance.

    A length measurement is only visible on the instance on which it was made. There is currently no support for saving length measurements.

  • The Angle Measurement button is modal. When selected, the angle between features in an instance can be measured. To perform a measurement, click on a point of interest in an instance. A pair of lines is displayed. Drag the end points of the lines as needed to form the angle to be measured. The angle between the lines is displayed continuously as any endpoint is dragged.

    An angle measurement can be moved by clicking on one of the lines and dragging it while holding down the mouse button. To remove an angle measurement, drag it or an endpoint outside of the extent of the instance. Note that if you have “zoomed in” on an instance, its extent may be much larger than the (sub)viewport in which it is displayed. This can make it difficult to drag the measure outside of the extent of the instance. In this case it may be necessary to “zoom out” in order to be able to drag the measure outside of the extent of the instance.

    An angle measurement is only visible on the instance on which it was made. There is currently no support for saving angle measurements.

  • The Pixel Probe button is modal. When selected, clicking on a point in an instance displays a circle at the probe point, the X and Y location of the pixel relative to the top left corner of the instance, and the intensity or color of the selected pixel. The value of color instance pixels is specified in RGB coordinates. For monochrome instances, both a Stored Pixel value (SP) and a Modality Pixel value (MO) are displayed. The MO value is calculated as SP * RescaleSlope + RescaleIntercept, where RescaleSlope and RescaleIntercept are DICOM values of the instance.

    A pixel probe can be moved by clicking on the probe indicator and dragging it while holding down the mouse button. To remove a pixel probe, drag it outside of the extent of the instance. Note that if you have “zoomed in” on an instance, its extent may be much larger than the (sub)viewport in which it is displayed. This can make it difficult to drag the measure outside of the extent of the instance. In this case it may be necessary to “zoom out” in order to be able to drag the measure outside of the extent of the instance.

    A pixel probe is only visible on the instance on which it was made. There is currently no support for saving pixel probes.

  • The Elliptical ROI button is modal. When selected, click on an instance and drag either of the small circles to configure an elliptical region of interest. The area, in pixels, of the ellipse is displayed near the ellipse. On monochrome instances, the mean and standard deviation of the intensities of the pixels within the ellipse are also displayed.

    An ellipse can be moved by clicking on its border and dragging it while holding down the mouse button. To remove an elliptical ROI, drag the ellipse or one of its control points outside of the extent of the instance. Note that if you have “zoomed in” on an instance, its extent may be much larger than the (sub)viewport in which it is displayed. This can make it difficult to drag the ROI outside of the extent of the instance. In this case it may be necessary to “zoom out” in order to be able to drag the ROI outside of the extent of the instance.

    An elliptical ROI is only visible on the instance on which it was made. There is currently no support for saving elliptical ROIs.

  • The Rectangle ROI button is modal. When selected, click on an instance and drag either of the small circles to configure a rectangular region of interest. The area, in pixels, of the rectangle is displayed near the rectangle. On monochrome instances, the mean and standard deviation of the intensities of the pixels within the rectangle are also displayed.

    A rectangle can be moved by clicking on its border and dragging it while holding down the mouse button. To remove a rectangular ROI, drag the rectangle or one of its control points outside of the extent of the instance. Note that if you have “zoomed in” on an instance, its extent may be much larger than the (sub)viewport in which it is displayed. This can make it difficult to drag the ROI outside of the extent of the instance. In this case it may be necessary to “zoom out” in order to be able to drag the ROI outside of the extent of the instance.

    A rectangular ROI is only visible on the instance on which it was made. There is currently no support for saving rectangular ROIs.

  • The Rotate Left button immediately performs a ninety degree left rotation of the image in the (sub)viewport that has focus.

  • The Rotate Right button immediately performs a ninety degree right rotation of the image in the (sub)viewport that has focus.

  • The Flip Horizontally button immediately performs a flip about the Y axis of the image in the (sub)viewport that has focus.

  • The Flip Vertically button immediately performs a flip about the X axis of the image in the (sub)viewport that has focus.
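
The two pixel-value conversions described above (Stored Pixel to Modality Pixel value for the Pixel Probe, and the WW,WC linear conversion for the Windowing Presets) can be sketched in Python. This is only an illustration, not the viewer's code: the function names are hypothetical, and the windowing function follows the linear window formula of the DICOM standard, which the viewer is assumed (not confirmed) to apply.

```python
def modality_value(stored_pixel, rescale_slope=1.0, rescale_intercept=0.0):
    """MO = SP * RescaleSlope + RescaleIntercept, as stated for the Pixel Probe."""
    return stored_pixel * rescale_slope + rescale_intercept


def window_linear(value, center, width, out_min=0.0, out_max=255.0):
    """Map a pixel value to a display value using Window Center/Width.

    Implements the DICOM linear window function; requires width > 1.
    Values below the window clamp to out_min, values above it to out_max.
    """
    if width <= 1:
        raise ValueError("this sketch only handles width > 1")
    low = center - 0.5 - (width - 1) / 2.0
    high = center - 0.5 + (width - 1) / 2.0
    if value <= low:
        return out_min
    if value > high:
        return out_max
    scaled = (value - (center - 0.5)) / (width - 1) + 0.5
    return scaled * (out_max - out_min) + out_min


# Hypothetical CT-like values: slope 1, intercept -1024, window WW=400, WC=40.
mo = modality_value(1064, rescale_slope=1.0, rescale_intercept=-1024.0)
print("modality value:", mo)
print("display value:", window_linear(mo, center=40, width=400))
```

Dragging left or right in windowing mode corresponds to changing `width`, and dragging up or down to changing `center`.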


Have feedback or corrections? Please email us at feedback@isb-cgc.org.

Registering a Google Cloud Project

This section will show you how to register a Google Cloud Project (GCP), which you can use to store data from ISB-CGC. Users need to have access to a Google Cloud Project to perform the steps in this section. If you don’t, see the ISB-CGC Quick-Start Guide.

To allow flexibility while working with different research teams and different processes, you can have many GCPs registered with ISB-CGC.

Registering your Google Cloud Project

Click on screen shots to enlarge them.

To register your Google Cloud Project with ISB-CGC, go to the Account Details page. After signing into the ISB-CGC Web App, either select the “persona” icon next to your login name or select Account Details from the drop down menu under your login name, which takes you to the following page:

_images/RegisteredGCPs.png

Click the Register button in the Google Cloud Platform section. That takes you to the following page:

_images/RegisterAGCPForm.png

The instructions will walk you through how to add the necessary ISB-CGC and DCF service accounts to your project. Go to the Google Cloud Platform and follow these steps. You can hide the instructions by selecting the blue Instructions button.

Please be sure to add both service accounts listed below. If you don’t add both service accounts you will run into issues. Then return to the ISB-CGC Register a Google Cloud Project page, enter your Google Cloud Project ID, and click Verify.

_images/RegisterServiceAccountsList.png

Once you have completed these steps, a listing of the Google Cloud Project members will display:

_images/GCPMembers.png

Click the Register button to go to the next screen:

_images/kidsprojectregistered.png

Managing your Google Cloud Projects

You can add or delete Google Cloud Projects by following the instructions below.

Adding additional Google Cloud Projects

To register additional Google Cloud Projects, select the + Register New Google Cloud Project button from the “Registered Google Cloud Projects” page (see screenshot below).

_images/registerAnotherGCP.png

Deleting Google Cloud Projects

To unregister a GCP, select the Unregister Project button from the drop down menu beside the project on the “Registered Google Cloud Projects” page (see screenshot below).

_images/unregisterGCP.png

Registering Cloud Storage Buckets and BigQuery Datasets

Registering a Google Cloud Storage Bucket and a BigQuery Dataset is a prerequisite for storing data downloaded from the Web App to your own Google Cloud location. (Please note: The names of the buckets and data sets are case sensitive.)

How To Register Buckets and Datasets

Once you have created a bucket and a dataset in the Google Cloud Console of your Google Cloud Project, you will need to register them with your project name using the Web App.

Step 1: Click on your user icon in the upper right, or select Account Details from the drop down menu under your name.

_images/Register_Step_1.png

Step 2: Click on the View button under Google Cloud Projects.

_images/Register_Step_2.png

Step 3: Click on the project you wish to use. If you have not registered a project, follow the instructions above.

_images/Register_Step_3.png

Step 4: Use the “Register Cloud Storage Bucket” or “Register BigQuery Dataset” links to add buckets and datasets as needed.

_images/Register_Step_4.png


Data used by the Web App

The Web App retrieves data and computes counts from ISB-CGC Google BigQuery tables, which are based on the latest GDC data release. This means that you will see current data, but the same query can produce different results when run at different times if the Web App was based on a different GDC data release.

Sharing Cohorts between the Web App and the API

Cohorts are one of the central concepts used when analyzing large datasets. Cohorts can be created either in the Web App or via the ISB-CGC REST API. What may not be as clear is that cohorts created by one of the systems can be viewed and used in the other. In other words, you can create a cohort using the API and use it in the Web App or you can create a cohort in the Web App and use it in the API. This can give users significant flexibility in creating and sharing their cohorts.

Choosing a Web Browser

The Web App was optimized for use with the Google Chrome web browser. Most of the functionality should work with recent versions of other web browsers (e.g. Firefox, Safari, Internet Explorer). If you find an issue and you are not using Chrome, please try using Chrome to see if the issue appears to be browser specific.

Web App Time Zone

Please note that the system clock is set to Pacific time. If the times shown for workbooks or cohorts you generated look inconsistent with your local time, this may be the reason.
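
When comparing Web App times with timestamps recorded elsewhere, an explicit conversion can help. The following is a minimal Python sketch, assuming the displayed times are naive timestamps in US Pacific time (taken here to be the `America/Los_Angeles` zone); the helper name is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "Pacific time" corresponds to the America/Los_Angeles zone.
PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_to_utc(naive_timestamp):
    """Interpret a naive datetime as Pacific time and convert it to UTC."""
    return naive_timestamp.replace(tzinfo=PACIFIC).astimezone(timezone.utc)


# Daylight saving time is handled by the zone database.
print(pacific_to_utc(datetime(2022, 6, 1, 9, 0)).isoformat())
print(pacific_to_utc(datetime(2022, 1, 15, 9, 0)).isoformat())
```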



Cohort Builder/Data Explorer

Cohorts are a way of creating custom groupings of the samples and/or cases that you are interested in analyzing further. The Cohort Builder/Data Explorer is an ISB-CGC web interface which allows you to build cohorts based on clinical demographics and molecular filters. Compare patient cohorts with various exploration tools including IGV viewer, image viewers, and analytical visualization.

Selecting Cohort Builder/Data Explorer from the Resources drop down menu on the ISB-CGC home screen will display the Create Cohorts - Filters screen. Another way to get to this screen is to click on the Launch icon in the Cohort Builder/Data Explorer box in the Resources section of the ISB-CGC home page.

You will be able to use the available filters to create a cohort, without needing to log into the ISB-CGC Web Application. Except for the ability to save cohorts, this screen has the same functionality as the one that you navigate to when selecting the COHORTS - Create a New Cohort - Filters option after signing into the Web App. To learn more about this screen, see the Cohorts documentation.

If you expect to reuse a cohort in multiple analyses, you can create a “saved cohort”. To do so, click on the Login to Save New Cohort button.

_images/CreateCohorts-noSignIn.png


Cancer Data File Browser

The Cancer Data File Browser is an ISB-CGC web interface which allows you to explore a comprehensive selection of cancer related data files in Google Cloud Storage Buckets, such as raw sequencing, cancer nucleotide variation, pathology or radiology images.

Selecting Cancer Data File Browser from the Data Browsers drop down menu on the ISB-CGC home screen will display the Cancer Data File Browser screen. Another way to get to this screen is to click on the Launch icon in the Cancer Data File Browser box in the Data Browsers section of the ISB-CGC home page.

You will be able to use the available filters to select a file record list. Click on the CSV button to download this list which includes barcodes and GCS locations, without needing to log into the ISB-CGC Web Application. Except for the ability to save output results to a Google BigQuery table or to a Google Cloud Storage Bucket (GCS), this screen has the same functionality as the one that you navigate to when selecting the File Browser button from the Saved Cohorts screen, after signing into the Web App. To learn more about this screen, see the Cohorts File Browser documentation.

Note that the maximum number of file records that can be downloaded is 65000. You’ll need to use the filters to get the file listing results below this number.

If you decide to log into the ISB-CGC Web App (using Sign In in the upper right-hand corner), you can register a Google Cloud Project and BigQuery data set and export the file record list to a BigQuery table or a Google Cloud Storage Bucket.

_images/DataBrowser-noSignIn.png


Mitelman Database

The Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer is devoted to genes, chromosomes, and cancer. The Mitelman Database is supported by NCI (National Cancer Institute), the Swedish Cancer Society and the Swedish Childhood Cancer Foundation. ISB-CGC hosts the Mitelman Database so that users can access and search the data.

The information in the Mitelman Database relates cytogenetic changes and their genomic consequences, in particular gene fusions, to tumor characteristics, based either on individual cases or associations. All the data have been manually culled from the literature by Felix Mitelman in collaboration with Bertil Johansson and Fredrik Mertens. The database is updated quarterly in January, April, July, and October.

Using the Mitelman Database

The database can be accessed from the ISB-CGC homepage (https://isb-cgc.org/) by clicking on Launch in the Chromosomal Aberrations & Gene Fusions DB box or selecting Chromosomal Aberrations & Gene Fusions DB from the Data Browsers drop down menu on the main menu bar. It can also be accessed directly from https://mitelmandatabase.isb-cgc.org.

The user queries the database by parameters such as topography, morphology, gene characteristics, cytogenetic aberrations, and journal references. There are five searchers available:

  • Cases Cytogenetics Searcher

    • allows you to query the individual patient cases using fields such as the aberration, breakpoint, morphology, and topography

  • Gene Fusions Searcher

    • finds studies pertaining to gene rearrangements, in particular gene fusions, detected either as a consequence of cytogenetic aberrations or identified by sequencing

  • Clinical Associations Searcher

    • searches studies pertaining to clinical associations of cytogenetic aberrations and/or gene rearrangements

  • Recurrent Chromosome Aberrations Searcher

    • provides a way to search for structural and numerical abnormalities that are recurrent, i.e., present in two or more cases with the same morphology and topography

  • References Searcher

    • queries only the references themselves, i.e., the references from the individual cases and the molecular biology and clinical associations
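
The notion of “recurrent” used by the Recurrent Chromosome Aberrations Searcher (an abnormality present in two or more cases with the same morphology and topography) can be illustrated with a short Python sketch. The records, field layout and function name below are made up for illustration; they do not reflect the database schema or how the searcher is implemented.

```python
from collections import Counter


def recurrent_aberrations(cases, min_cases=2):
    """Return {(aberration, morphology, topography): number_of_cases}.

    cases: iterable of (case_id, aberration, morphology, topography) tuples.
    Only combinations seen in at least min_cases distinct cases are kept.
    """
    seen = set()        # guards against counting the same case twice
    counts = Counter()
    for case_id, aberration, morphology, topography in cases:
        key = (aberration, morphology, topography)
        if (case_id, key) not in seen:
            seen.add((case_id, key))
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_cases}


# Hypothetical records, for illustration only.
example_cases = [
    ("case-1", "t(9;22)(q34;q11)", "morphology-A", "topography-X"),
    ("case-2", "t(9;22)(q34;q11)", "morphology-A", "topography-X"),
    ("case-3", "t(9;22)(q34;q11)", "morphology-B", "topography-X"),
    ("case-4", "+8", "morphology-A", "topography-X"),
]
print(recurrent_aberrations(example_cases))
```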

Until June 2022, the resulting genetic location information retrieved from the database was only displayed in karyotypes. Now, genomic coordinates are also displayed. Thanks to procedures incorporated from the web-based tool CytoConverter, karyotypes are converted to genomic coordinates and can be optionally viewed by the Mitelman Database user.

The user has the option of viewing the genomic coordinate information for either individual karyotypes or for multiple karyotypes in a search result. For individual karyotypes, the corresponding chromosome and its start and end position are given. In addition, the type of imbalance (gain or loss) is noted. For multiple karyotypes in the search results, net imbalances across the selected group are displayed in chart, ideogram or tabular format; information includes the chromosome affected, start and end positions, and whether the segment has been lost or gained.

How to Cite

To cite the use of the Mitelman Database, authors should cite the following source:

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer (2022). Mitelman F, Johansson B and Mertens F (Eds.), https://mitelmandatabase.isb-cgc.org

In addition, when using information about chromosomal gains and losses (found on the Karyotype Info and Overall Chromosomal Imbalances pages), please also cite the following:

CytoConverter: a web-based tool to convert karyotypes to genomic coordinates. Wang, J., LaFramboise, T. BMC Bioinformatics 20, 467 (2019). https://doi.org/10.1186/s12859-019-3062-4

More Information

For more information, please see the Mitelman Database About page and User Guide.



The TP53 Database

The TP53 Database (https://tp53.isb-cgc.org) compiles TP53 variant data that have been reported in the published literature since 1989 or are available in other public databases. Database releases are identified by a number. The following data are available:

The TP53 Database is meant to be a source of information on TP53 variants for a broad range of scientists and clinicians who work in different research areas:

  • Basic research, to study the structural and functional aspects of the p53 protein and the TP53 gene

  • Molecular pathology of cancer, to understand the clinical significance of TP53 variants identified in cancer patients

  • Molecular epidemiology of cancer, to analyze the links between specific exposures and TP53 variant patterns in order to make inferences about possible causes of cancer

  • Molecular genetics, to analyze genotype/phenotype relationships

The database includes various annotations on the predicted or experimentally assessed functional impact of TP53 variants, clinicopathologic characteristics of tumors and demographic and life-style information on patients. This information is useful to compile tumor-specific variant patterns and to draw hypotheses on the nature of the molecular events involved in TP53 mutagenesis and allows for the analysis of genotype/phenotype relationships.

Detailed information on data and annotations available is provided in the User Manual.

The ongoing project involves:

  • Performing regular review of the literature on TP53 variants

  • Extracting TP53 data from genetic and genomic databases

  • Developing standard annotations of TP53 variants

  • Performing research on TP53 variants, their patterns, origins and clinical impacts.

How to Cite

When using the database, authors should cite the following source:

The TP53 Database (R20, July 2019): https://tp53.isb-cgc.org

and refer to de Andrade K.C. et al. in the bibliography as below:

de Andrade, K.C., Lee, E.E., Tookmanian, E.M. et al. The TP53 Database: transition from the International Agency for Research on Cancer to the US National Cancer Institute. Cell Death Differ (2022). https://doi.org/10.1038/s41418-022-00976-3

More Information

For information regarding releases, credits and disclaimers, please see the TP53 About page.



The Synthetic Lethality Resource

Synthetic lethal interactions (SLIs), genetic interactions in which the simultaneous inactivation of two genes leads to a lethal phenotype, are promising targets for therapeutic intervention in cancer, as exemplified by the recent success of PARP inhibitors in treating BRCA1/2-deficient tumors. We present SL-Cloud, a cloud-based integrated resource to facilitate the prediction of context-specific SLIs. This resource addresses two main challenges related to SLI inference: the need to wrangle and preprocess large multi-omic datasets and the multiple comparable prediction approaches available.

SL-Cloud provides a cloud-based data access platform coupled with software and well documented computational notebooks that reimplement published synthetic lethality (SL) inference algorithms to facilitate novel investigation into synthetic lethality. In addition, we provide general purpose functions that support these prediction workflows, e.g. saving data in BigQuery tables. We anticipate that users can leverage the resources provided in this project to conduct highly customizable analysis based on their cancer type of interest and particular context.

More information about SL-Cloud can be found in the following links:



Pipelines and APIs

APIs

ISB-CGC provides programmatic access to cancer data (both open and controlled-access) and metadata stored on the Google Cloud Platform through a combination of ISB-CGC APIs and Google APIs. Access to ISB-CGC metadata and user-data such as patient cohort definitions is provided through the ISB-CGC API. For more details on Google Cloud APIs, please read the extensive in-depth Google Cloud APIs documentation.

ISB-CGC APIs

ISB-CGC provides a Swagger API, which interacts with the ISB-CGC Web App data, including user-generated data such as cohorts. We provide API calls pertaining to specific samples, cases, files, cohorts, and users. The syntax for all of these is available on the ISB-CGC API v4.0 UI webpage.

The ISB-CGC APIs can also be used via Python and R. We have tutorial notebooks available in our Community Notebook Repository.

Some example uses of the ISB-CGC API are:

  • Obtaining detailed metadata about a particular patient or sample

  • Creating (or retrieving a previously saved) cohort of patients and samples

  • Retrieving a cohort’s file manifest using the cohort ID or specific filters

  • Registering, refreshing, and unregistering a specified Google Cloud Project (GCP)

Note

APIs calling user-generated data (such as your cohorts) require identity credentials.

Authorization

Some of the APIs - such as for programs, samples, and cases - can be accessed without authorization. APIs that call on information saved in a user’s account, such as the cohorts and GCP APIs, require account authorization.

In order to access the APIs that require ISB-CGC authorization, you will need to generate a credentials file on your local machine or on your VM. To load your credentials into your command line interface:

  1. Clone the ISB-CGC-API scripts GitHub repository to your local machine.

  2. Run the isb_auth.py script either through the command line or within Python.

  3. If you are running the ISB-CGC APIs on a VM, upload the file generated by the above process.

ISB-CGC API v4.0 UI

The ISB-CGC API v4.0 UI displays details about the syntax for each call and also provides an interface to test requests.

To generate a subset of ISB-CGC hosted data with your desired characteristics, we have provided tools to generate cohorts of patients. In addition to the BigQuery command line, users may create and share cohorts using the ISB-CGC Web App and then access them using the Swagger UI API.

Make a Request

As mentioned before, some of the API calls will require authentication - denoted by a small lock symbol. This can be done by using the ‘Authorize’ button at the top right of the page.

For a quick demonstration of the syntax of an API call, one can test the POST /samples request. This API request has the following syntax:

{
  "barcodes": [
    <barcode 1>,
    <barcode 2>,
    ...,
    <barcode n>
  ]
}

TCGA samples are easily selected by using the 16-character barcode, e.g. TCGA-B9-7268-01A, while patients are identified using the 12-character prefix of the sample barcode, in this case TCGA-B9-7268. Other data sets such as CCLE may use other naming conventions.

The value in the Parameters field can be edited by selecting ‘Try it out’. One can change the default sample barcodes or leave them. The request can be run by selecting ‘Execute’.
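
Outside of the Swagger UI, the same request body can be assembled in a script. The sketch below is a minimal Python illustration of the barcode conventions and the body syntax described above; the function names are hypothetical, and it only builds the JSON text without contacting the API.

```python
import json


def case_barcode(sample_barcode):
    """Return the 12-character case (patient) prefix of a TCGA sample barcode."""
    if not sample_barcode.startswith("TCGA-") or len(sample_barcode) != 16:
        raise ValueError(f"not a 16-character TCGA sample barcode: {sample_barcode!r}")
    return sample_barcode[:12]


def samples_request_body(barcodes):
    """Serialize the {"barcodes": [...]} body for a samples request.

    Surrounding whitespace is stripped and empty entries are dropped,
    since stray spaces in the input can make a request fail.
    """
    cleaned = [b.strip() for b in barcodes if b.strip()]
    return json.dumps({"barcodes": cleaned})


print(case_barcode("TCGA-B9-7268-01A"))
print(samples_request_body(["TCGA-B9-7268-01A "]))
```

The prefix rule applies to TCGA barcodes only; other data sets may need different handling.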

Request Response

Swagger UI submits the request and shows the curl code that was submitted. The ‘Response body’ section will display the response to the request. The expected format of the response for the above request is shown below:

{
  "data": [
    {
      "samples": [
        {
          "data_details": [
            {
              <key 1>: <value 1>,
              <key 2>: <value 2>,
              ...,
              <key n>: <value n>
            }
          ],
          "biospecimen_data": {
            <key 1>: <value 1>,
            <key 2>: <value 2>,
            ...,
            <key n>: <value n>
          },
          "sample_barcode": "string",
          "case_barcode": "string"
        }
      ]
    }
  ],
  "code": 0,
  "barcodes_not_found": [
    "string"
  ],
  "total_found": 0,
  "notes": "string"
}

The JSON formatted response can be downloaded by selecting the ‘Download’ button.
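
A downloaded response with the structure shown above can be unpacked with a few lines of Python. This is a sketch under the assumption that the response follows that template exactly; the function name and the example values are hypothetical.

```python
def summarize_samples_response(response):
    """Collect (sample_barcode, case_barcode) pairs and the barcodes not found."""
    pairs = []
    for entry in response.get("data", []):
        for sample in entry.get("samples", []):
            pairs.append((sample.get("sample_barcode"), sample.get("case_barcode")))
    return pairs, response.get("barcodes_not_found", [])


# Hypothetical response following the documented template.
example_response = {
    "data": [
        {
            "samples": [
                {
                    "data_details": [],
                    "biospecimen_data": {},
                    "sample_barcode": "TCGA-B9-7268-01A",
                    "case_barcode": "TCGA-B9-7268",
                }
            ]
        }
    ],
    "code": 200,
    "barcodes_not_found": ["TCGA-XX-0000-01A"],
    "total_found": 1,
    "notes": "",
}

found, missing = summarize_samples_response(example_response)
print(found)
print(missing)
```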

Warning

  • Any special characters in the input field, such as stray spaces in the input box, will cause the request to fail.

  • Please make sure to delete all fields not being used.

  • Case barcode centric requests only pull file paths specific to case entries.

  • Sample centric requests pull file paths specific to sample entries.

  • Cohorts made using the Web App will differ in sample counts from cohorts made using BigQuery tables. The Web App takes into consideration samples which correspond to pathology slide images and this information is currently not in the BigQuery tables.

For any questions or feedback on the API, please do not hesitate to contact us at feedback@isb-cgc.org.



St. Jude Bioinformatics Tools

The following bioinformatics tools and workflows developed by St. Jude have been containerized and made available for execution in the cloud. Each link below navigates to the tools’ original documentation. If you would like guidance on how to run these on ISB-CGC, please attend our office hours or contact us (feedback@isb-cgc.org).

CICERO (Clipped-reads Extended for RNA Optimization) is an assembly-based algorithm to detect diverse classes of driver gene fusions from RNA-seq.

GitHub

RNAIndel calls coding indels from tumor RNA-Seq data and classifies them as somatic, germline, and artifactual.

GitHub

Teltale is a program that computes the fraction of telomeric reads in a BAM file.

GitHub

NetBID (Network-based Bayesian Inference of Drivers) is a data-driven system biology pipeline and toolkit for finding drivers from transcriptomics, proteomics and phosphoproteomics data, where the drivers can be either transcription factors (TF) or signaling factors (SIG).

GitHub



Data Access and Security Overview

Understanding Data Access Levels

  • Public Data Sometimes the word “public” is misinterpreted as meaning “open”. All of the TCGA data is public data, and much of it is open, meaning that it is accessible and available to all users; while some low-level TCGA data is controlled and restricted to authorized users.

  • Open-Access Data Depending on how you categorize the data, most of the TCGA data is open-access data. This includes all de-identified clinical and biospecimen data, as well as all Level-3 molecular data including gene expression data, DNA methylation data, DNA copy-number data, protein expression data, somatic mutation calls, etc.

  • Controlled-Access Data All low-level sequence data (both DNA-seq and RNA-seq), the raw SNP array data (CEL files), germline mutation calls, and a small amount of other data are treated as controlled data and require that a user be properly authenticated and have dbGaP authorization prior to accessing these data.

Note that many public, open-access datasets may still be restricted in various ways. Typically, a License document containing explicit terms of use will be associated with each dataset. Some institutions have their own licenses, though many use one of the Creative Commons licenses. License terms apply to both data and source code, so please be aware of the terms of a license whenever you plan to reuse data or source code produced by someone else. We recommend that you review the TCGA Publication Guidelines.

Understanding Data Security

Much of the low-level TCGA and TARGET data (including DNA and RNA reads, and SNP CEL files, for example) are classified as “controlled access data” and are under the control of the dbGaP Data Access Committee (DAC).

Investigator(s) requesting to receive genomic data in accordance with the NIH Genomic Data Sharing Policy are required to submit:

  • a data access request (DAR)

  • a research use statement (RUS)

Note: Requesters and institutional signing officials (SO) must have NIH eRA user IDs to begin this process. Visit the electronic Research Administration (eRA) for more information on registering for an NIH eRA account. NIH staff may utilize their NIH login. (See the dbGaP Data Access Request Portal for additional instructions.)

Additionally, they must:

  • Submit a Data Use Certification (DUC) co-signed by the designated Institutional Official(s) at their sponsoring institution

  • Protect data confidentiality (any data which has been designated “controlled” must be protected accordingly, unless prior release authorization is obtained from an NCI data custodian)

  • Ensure that appropriate data security measures are in place

Google Cloud Platform and Access Control

In the context of Google Cloud Platform (GCP) projects, it is important to realize that all members of a GCP project must have at least read access to all data stored within that project, as well as to all virtual machines, boot disks, and persistent disks attached to that project.

Therefore, if a principal investigator (PI) establishes a GCP project (project-A) for the purposes of analyzing controlled data (e.g., performing mutation analysis on TCGA sequence data), then:

  • All members of project-A must be authorized to view controlled data.

  • The outputs of certain analyses performed on controlled data, if they are summary in nature, may no longer be controlled data and could be copied to a second GCP project (project-B) for further downstream analyses by researchers who are not authorized to view controlled data.

  • Researchers who are not authorized to view controlled data could be made members of project-B, while users who are authorized could be members of both project-A and project-B.
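
The membership rule above can be sketched as a small check. This is only an illustration in Python, not an ISB-CGC or Google Cloud API: the function name, the member addresses, and the `authorized_users` set are all hypothetical, and nothing here contacts Google Cloud.

```python
# Hypothetical illustration of the project-A / project-B rule described above.
# All names and addresses are made up; no cloud service is called.

def unauthorized_members(project_members, authorized_users):
    """Return members who may NOT belong to a project holding controlled data."""
    return sorted(set(project_members) - set(authorized_users))


authorized_users = {"pi@example.org", "analyst@example.org"}

# project-A holds controlled data; project-B holds summary outputs only.
project_a = ["pi@example.org", "analyst@example.org"]
project_b = ["pi@example.org", "analyst@example.org", "student@example.org"]

# project-A passes: every member is authorized for controlled data.
print(unauthorized_members(project_a, authorized_users))  # []

# project-B would fail the same check, which is why it may only hold
# outputs that are no longer controlled data.
print(unauthorized_members(project_b, authorized_users))  # ['student@example.org']
```

A check like this could be run against a project's member list before any controlled data is copied into it.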

Your Responsibilities

The PI and the PI’s institution, not the cloud service provider, are responsible and will be held accountable for ensuring the security of controlled data. The Google Cloud Platform has been certified as FedRAMP compliant, which means that it has been independently assessed and shown to meet all necessary FedRAMP security controls. This provides assurance that the data security and access control mechanisms implemented by the Google Cloud Platform and made available to end users are sufficient to safeguard the data. However, it remains the PI’s responsibility to ensure that these access control mechanisms are used appropriately and effectively within the context of the PI’s GCP project.

You should think about securing controlled data within the context of your GCP project in the same way that you would think about securing controlled data that you might download to a file server or compute cluster at your own institution. Your responsibilities regarding the appropriate use of the data are the same in a cloud environment. For more information, please refer to the NIH Security Best Practices for Controlled-Access Data.


Have feedback or corrections? Please email us at feedback@isb-cgc.org.

Accessing Controlled Data

In ISB-CGC, you can gain access to controlled data via personal user credentials:

  • Provides access to controlled data for 24 hours at a time;

  • Uses your personal credentials;

  • Example uses: the ISB-CGC Web App, RStudio, or short jobs on Google Compute Engine that complete in under 24 hours.

Note

If you are looking to gain access to COSMIC data, please see the COSMIC documentation.

Prerequisites

You’ll need the following before requesting controlled access via ISB-CGC:

  • A Google identity;

  • An NIH or electronic Research Administration (eRA) account;

  • Database of Genotypes and Phenotypes (dbGaP) permission for each type of controlled access data of interest, linked to your NIH or eRA account;

  • Your Google identity linked to your NIH/eRA account via the ISB-CGC Web App.

1) Google identity

If you don’t have a Google identity yet, please see the ISB-CGC Quick-Start Guide.

2) NIH or eRA account

Intramural researchers can use their NIH log-in account, and extramural researchers will need to have a personal eRA account. Either way, the user’s NIH/eRA account needs to be affiliated with their institution’s eRA account. Your principal investigator (PI) or other authorized person can create your personal eRA account and link it to your institution’s eRA account.

If you already have an NIH/eRA account, you can log into eRA at https://public.era.nih.gov/commons.

  • If the Institution listed for you is not your current one, ask your PI to change it for you.

  • If you are the PI or other authorized person, you can create, link and update accounts from here.

Visit electronic Research Administration (eRA) for more information on registering for an NIH eRA account.

Controlled Access Via Personal User Credentials

The first time that you perform the above steps, you are automatically granted controlled access via your personal user credentials. This access lasts for 24 hours, though it can be extended. To obtain access again later, sign in to the Web App and click on your persona (or Account Details on the drop-down menu next to your name). Then click the Get Controlled Access button below Obtain controlled access for 24 hours.

_images/DataAccess-24hours.png
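
When planning a job that reads controlled data, it can help to check whether it should finish inside the 24-hour window. The following is a minimal sketch using only the Python standard library. The function names and the timestamp are hypothetical, and the sketch ignores the option of extending access.

```python
from datetime import datetime, timedelta, timezone

ACCESS_WINDOW = timedelta(hours=24)  # duration stated in the text above


def access_expires_at(granted_at):
    """Return the time at which a 24-hour access grant lapses."""
    return granted_at + ACCESS_WINDOW


def job_fits_window(granted_at, job_start, expected_runtime):
    """True if a job starting at job_start should finish before access lapses."""
    finishes_at = job_start + expected_runtime
    return job_start >= granted_at and finishes_at <= access_expires_at(granted_at)


# Made-up grant time, used only for illustration.
granted = datetime(2022, 6, 1, 9, 0, tzinfo=timezone.utc)
start = granted + timedelta(hours=2)

print(access_expires_at(granted))
print(job_fits_window(granted, start, timedelta(hours=20)))  # True
print(job_fits_window(granted, start, timedelta(hours=23)))  # False
```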


ISB-CGC Notebooks

What’s a notebook?

Notebooks provide an interface to an interactive analysis environment. They are a mix of code (usually R or Python), descriptive explanations, and visualizations. They’re often used to demonstrate an analysis in a step-by-step fashion. We provide a set of notebooks below as tutorials for several frequently run analyses. You can run these through JupyterLab, RStudio, or Google Colaboratory.

I’m a novice, how do I…

  • Get started fast? (Python, R)

  • Find GDC file locations? (Python, R)

  • Plot a BigQuery result? (Python, R)

  • Plot a heatmap using data in BigQuery? (Python, R)

  • Work with cloud storage? (Python)

  • Create cohorts of patients? (Python, R)

  • Use PyPika or dbplyr to build a query? (Python, R)

  • Create a complex cohort? (Python, R)

  • Join multiple tables? (Python)

  • Get started working with the COSMIC datasets? (Python)

  • Convert a .bam file to a .fastq file with samtools? (Python)

  • Find a GA4GH Tool Repository Service (TRS) tool? (Python)

  • Run workflow execution service (WES) tools? (Python)

  • Use the ISB-CGC APIs? (Python, R)

  • Explore CPTAC protein abundances? (Python)

  • Compare protein and gene expression in CPTAC? (Python)

I’m an advanced user, how do I…

  • Make a BigQuery table from an NCBI GEO data set? (Python)

  • Compare cohorts with survival analysis and feature comparison? (Python, R)

  • Run an ANOVA with BigQuery?* (Python, R)

  • Score gene sets in BigQuery?* (Python, R)

  • Correlate gene expression and copy number variation? (Python)

  • Compute gene-gene expression correlation using BigQuery? (Python)

  • Create randomized subsets of patients using BigQuery? (Python, R)

  • Convert a 10X scRNA-seq bam file to fastq with dsub? (Python)

  • Quantify 10X scRNA-seq gene expression with Kallisto and BUStools? (Python)

  • Compute Nearest Centroid Classification using BigQuery? (Python, R)

  • Analyze data in the COSMIC Cancer Gene Census dataset? (Python)

  • Use a BigQuery user defined function to perform k-means clustering? (Python)

  • Compute correlations of protein and gene expression in CPTAC? (Python)

  • Compare protein expression from different pipelines using CPTAC data? (Python)

  • Calculate associations between radiomics tumor imaging features and gene expression? (Python)

  • Analyze the correlation between gene mutations and tumor imaging features? (Python)

  • Compare gene expression in tumor against gene expression in normal tissue? (Python)

  • Identify cancer pathways from the Reactome database that are related to a set of genes? (Python)

  • Integrate the Targetome and Reactome datasets to identify pathways affected by cancer drugs? (Python)

*Notebook inspired by a Query of the Month Blog post



Statistical Notebooks

Integrated statistical analysis and exploration of multiple genomic and clinical data types gives researchers an opportunity to expand current knowledge of cancer. ISB-CGC hosts diverse data types, including gene expression, somatic mutations, and clinical data. We have developed a series of notebooks that use BigQuery to compute the statistical associations between different combinations of the data types available in ISB-CGC.

Bioinformatics notebooks

  • Significant correlations and their p-values using BigQuery (Python)

  • One-way ANOVA with BigQuery (Python, R)

  • Score gene sets in BigQuery (Python, R)

  • Nearest Centroid Classification using BigQuery (Python, R)

Standard pairwise statistics

The following table lists notebooks that compute associations between pairs of data types available in ISB-CGC. Each notebook assesses the statistical significance of an association using rank-ordered data and a statistical test chosen according to whether each data type in the pair is categorical or numerical. The Regulome Explorer inspired notebook is a special notebook that computes associations between all possible data types available in the TCGA dataset; more details are below.

Data type 1         Data type 2         Statistical test/notebook
Gene expression     Clinical            Kruskal-Wallis score
Gene expression     Somatic mutation    T-test score
Gene expression     Gene expression     Spearman Correlation
Somatic mutation    Clinical            Chi Square test
Somatic mutation    Somatic mutation    Fisher’s exact test
All types           All types           Regulome Explorer inspired notebook
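
The pairing of data types to tests can be expressed as a simple lookup. The sketch below is illustrative only: the pairs and test names are copied from the table above, while the dictionary, the function name, and the fallback behavior for unlisted pairs are assumptions made for this example.

```python
# Lookup of the statistical test listed for each pair of data types.
# Pairs and test names come from the table above; everything else
# (names, fallback rule) is illustrative.

PAIRWISE_TESTS = {
    frozenset(["gene expression", "clinical"]): "Kruskal-Wallis score",
    frozenset(["gene expression", "somatic mutation"]): "T-test score",
    frozenset(["gene expression"]): "Spearman Correlation",
    frozenset(["somatic mutation", "clinical"]): "Chi Square test",
    frozenset(["somatic mutation"]): "Fisher's exact test",
}


def choose_test(type_a, type_b):
    """Return the test listed for a pair of data types, in either order."""
    key = frozenset([type_a.strip().lower(), type_b.strip().lower()])
    # Pairs not listed individually fall back to the all-types notebook.
    return PAIRWISE_TESTS.get(key, "Regulome Explorer inspired notebook")


print(choose_test("Clinical", "Gene expression"))
print(choose_test("Somatic mutation", "Somatic mutation"))
```

Using a `frozenset` as the key makes the lookup independent of the order in which the two data types are given.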

Regulome Explorer Inspired Notebook

Regulome Explorer is a well-established web tool for the exploration and visualization of associations between clinical and molecular features of TCGA data. Regulome Explorer was developed in 2012 in close collaboration between the Institute for Systems Biology and the MD Anderson Cancer Center. It enables users to search and visualize precomputed statistical data filtered according to user-specified parameters. Although Regulome Explorer’s broad functionality and high-quality graphics make it a valuable tool for exploring and visualizing 20 of the 33 TCGA data sets, it does not yet contain analysis of recent releases of TCGA and cannot be easily applied to data sets other than TCGA.

We developed a more flexible version, replicating capabilities of Regulome Explorer, as a Python notebook that uses Google Cloud resources. Rather than working with precomputed, fixed cohorts and fixed results, statistical analyses are dynamically performed in the cloud, with user-defined patient cohorts. Moreover, the notebook can be extended so that users can analyze additional data sets available as part of the ‘ISB-CGC BigQuery ecosystem’ such as TCGA, TARGET, CCLE, COSMIC, and others. The notebook is available as the Regulome Explorer inspired notebook.



Machine Learning Notebooks

Machine learning methods have enabled researchers to leverage and integrate the vast amounts of diverse cancer data to reveal new insights, develop better diagnostics, and improve therapy. ISB-CGC offers examples of how to use Google Cloud resources to train and use machine learning models for a variety of cancer applications and datasets.

  • How to build an RNA-seq logistic regression classifier (Python, R)

  • How to build an RNA-seq logistic regression classifier with BigQuery ML (Python)

  • How to perform nearest centroid classification using BigQuery (Python, R)

  • How to predict cancer survival with BigQuery ML (Python)

  • How to predict cancer survival with TensorFlow (Python)



HTAN Notebooks

HTAN is a National Cancer Institute (NCI)-funded Cancer Moonshot initiative to construct 3-dimensional atlases of the dynamic cellular, morphological, and molecular features of human cancers as they evolve from precancerous lesions to advanced disease (Cell April 2020).

  • Investigating HTAN scRNA-seq with BigQuery (Python)

  • Explore HTAN single cell RNA seq data (Python)

  • Explore HTAN Clinical, Biospecimen, and Assay Metadata (Python, R)



Tutorials and How-To Guides

The links on this page connect to How-To guides, workshop materials, examples and other helpful tutorials. We encourage the community to provide feedback on these examples and also to add your own examples to enrich this public resource! Contact us at feedback@isb-cgc.org.

Video Tutorials

See the following page for ISB-CGC produced videos giving helpful tours through ISB-CGC:



Query of the Month

Welcome to the ‘Query of the Month’ where we’ll be creating a collection of new and interesting queries to demonstrate the powerful combination of BigData from the NCI cancer programs like TCGA, and BigQuery from Google.

NOTE! We mostly spend time producing notebooks for our community collection. Check it out: https://github.com/isb-cgc/Community-Notebooks ReadTheDocs: https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/HowTos.html

Query of the Month is produced by the ISB-CGC team, with special effort by:

  • David L Gibbs (david.gibbs ( ~ at ~ ) systemsbiology ( ~ dot ~ ) org)

  • Kawther Abdilleh (kawther.abdilleh ( ~ at ~ ) gdit (~ dot ~) com)

  • Sheila M Reynolds (sheila.reynolds ( ~ at ~ ) systemsbiology ( ~ dot ~ ) org)


Table of Contents

2019
  • July2019: New notebooks added, cohorts and GEO data

  • June2019: Community Notebooks launched!

  • February2019: BigQuery in R - a refresher

  • January2019: Bam slicing in a cloud hosted python notebook.

2018
  • December2018: BigQuery Tips & Tricks

  • November2018: Transform VCF (DNA variants) files to BigQuery.

  • October2018: Jupyter notebooks & Dataproc clusters … in the cloud.

  • September2018: R scripts in the cloud.

  • August2018: Using BigQuery ML in a shiny app.

  • July2018: First look: BigQuery ML.

  • June2018: Processing bam files using WDL ‘scatter and gather’.

  • May2018: Processing bam files using CWL ‘scatter and gather’.

  • April2018: Running CWL workflows in the cloud.

  • March2018: Machine learning classifier in BigQuery?! Top Scoring Pairs implementation.

  • February2018: BioCircos shiny app, showing pairwise correlations within a pathway.

  • January2018: Gene Set Scoring in BigQuery, using the new hg38 mutation tables.

2017
  • December2017: BigQuery comparing TCGA samples to GTEx tissues with Spearman correlation.

  • November2017: Run an R (or python) script in batch mode using dsub on the google cloud.

  • October2017: Using plotly for visualization in Shiny apps. We implement an interactive heatmap using heatmaply.

  • September2017: We implement a new statistical test in BigQuery: the one-way ANOVA.

  • August2017: A small demo application using BigQuery as the backend for a Shiny app.

  • July2017: Look at the BigQuery RECORD data type in methylation tables from the GDC.

  • May2017: Continued from April: estimating the distance between samples based on shared mutations in pathways.

  • April2017: BigQuery compute a similarity metric on overlapping mutations between samples. Uses MC3 mutation table and data from COSMIC.

  • March2017: BigQuery to compute a pairwise distance matrix and a heatmap in R

  • February2017: Using BigQuery, define K-means clustering as a user defined (javascript) function

  • January2017: Comparing Standard SQL and Legacy SQL.

2016
  • December2016: Spearman correlation in BigQuery to compare the new hg38 expression data to the hg19 data

Importing a GDC File Manifest into ISB-CGC

If you’ve been using the National Cancer Institute’s Genomic Data Commons Portal, you know that while you can identify interesting cases and files, you need to download files to your own system in order to perform unique analysis.

Since the ISB-CGC stores Google Cloud file references for the GDC data, you can do your analysis on the cloud without having to move data. This tutorial will show you how to take a downloaded file manifest from the GDC, and use ISB-CGC to find the file locations on the cloud, providing a useful analysis starting point.

Download the File Manifest from GDC

On the GDC Data Portal, first use the selection filters to create your cohort. In the example shown below, the filters of Program: TCGA, Primary Site: kidney, Vital Status: dead and Gender: female were set to produce a cohort of 84 cases with 2332 files.

To download a File Manifest, which we’ll use to find the file locations in ISB-CGC, click the Manifest button on the Repository screen.

_images/GDC-KidneyExample.png

Import the File Manifest into Google BigQuery

Importing a GDC file manifest into its own BigQuery table will enable you to join that table with an ISB-CGC BigQuery table containing the file locations on the Google Cloud. Here’s how to do it.

If you don’t already have a Google Cloud Project, please see the following ISB-CGC documentation pages for guidance:

One way of keeping your file manifests organized is to create a data set specifically for those tables. New data sets can be created by clicking on the Create Dataset button within your project in BigQuery.

Creating a table from a GDC file manifest is remarkably easy:

  • Click on the Create Table button while you are within your new data set.

  • In the resulting screen, for Create table from, select Upload. Select your manifest file and set the File format to CSV. (Tab delimited will work with this setting.)

  • Have BigQuery automatically create the schema by checking the Auto detect box for Schema.

  • Click on Advanced options. Select Tab for Field delimiter; enter 1 for Header rows to skip.

  • Click on the Create Table button.

_images/BQ-CreateKidneyManifestTable.png
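
Before uploading, it can be useful to confirm that the manifest parses with the same settings chosen above: tab as the field delimiter and one header row. The following sketch uses only the Python standard library. The sample rows are made up, and the column names other than `id` (which the query in the next section relies on) are assumptions about a typical manifest.

```python
import csv
import io


def read_manifest(handle):
    """Parse a tab-delimited GDC manifest; the first row is treated as the header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Made-up two-row manifest standing in for a real download.
sample = io.StringIO(
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\texample_1.bam\tabc123\t1024\treleased\n"
    "00000000-0000-0000-0000-000000000002\texample_2.bam\tdef456\t2048\treleased\n"
)

rows = read_manifest(sample)
file_ids = [row["id"] for row in rows]
print(len(rows), "files; columns:", list(rows[0].keys()))
```

For a real download, the same function can be given a file opened with `open(path, newline="")`, where `path` is wherever the manifest was saved.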

Find the file locations on the Google Cloud

Now that you have a table containing the GDC file identifiers, the next step is to find the locations for the Level 1 files on the Google Cloud. To help with that task, ISB-CGC maintains BigQuery tables that contain the GDC file identifier and the Google bucket location for the file in data set GDC_metadata. Adding the Google bucket location to our GDC information can be done via a simple SQL query:

SELECT gdc.*, isb.file_gdc_url
FROM `Your-project.GDC_Import.GDC_Kidney_File_manifest` AS gdc
JOIN `isb-cgc.GDC_metadata.rel22_GDCfileID_to_GCSurl` AS isb
  ON gdc.id = isb.file_gdc_id

Note that you’ll need to replace “Your-project.GDC_Import.GDC_Kidney_File_manifest” with your project and the data set and table that you created above.
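
The same join can also be issued from a script. The sketch below is one possible approach, assuming the `google-cloud-bigquery` Python client is installed and credentials are configured. The function names are hypothetical, and `your-project` is a placeholder for your own project, data set, and table.

```python
DEFAULT_URL_TABLE = "isb-cgc.GDC_metadata.rel22_GDCfileID_to_GCSurl"


def build_manifest_join_query(manifest_table, url_table=DEFAULT_URL_TABLE):
    """Return SQL that adds the Google bucket location to each manifest row."""
    return (
        "SELECT gdc.*, isb.file_gdc_url\n"
        f"FROM `{manifest_table}` AS gdc\n"
        f"JOIN `{url_table}` AS isb\n"
        "  ON gdc.id = isb.file_gdc_id"
    )


def run_query(project_id, sql):
    """Run a query in BigQuery and return the result rows as dictionaries."""
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    return [dict(row.items()) for row in client.query(sql).result()]


# Placeholder table name; replace with your own project, data set and table.
sql = build_manifest_join_query("your-project.GDC_Import.GDC_Kidney_File_manifest")
print(sql)
```

Calling `run_query("your-project", sql)` would then return one dictionary per file, including its `file_gdc_url`.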

This query will return the results shown below and, as with any BigQuery result, you can either export it as a file or save it as a new table in BigQuery.

_images/BQ-Results-KidneyManifestURLTable.png


Analysis Using BigQuery, R & Bioconductor

In this tutorial, we are interested in analyzing gene expression and protein abundance differences between two types of TCGA kidney cancers, Kidney Renal Clear Cell Carcinoma (KIRC) and Kidney Renal Papillary Carcinoma (KIRP). We will build a cohort of patients with these cancer types and extract their respective gene expression and protein abundance data using Google BigQuery.

Note

An interactive step-by-step version of this tutorial can also be found on our homepage here.

This tutorial demonstrates how to:

  • Identify tables of interest using the ISB-CGC BigQuery Table Search UI

  • Navigate to tables in the Google BigQuery Console directly from the ISB-CGC BigQuery Table Search

  • Build and run queries in the Google BigQuery Console

  • Link to R notebooks in the Google AI Platform for data interrogation and plot visualization

  • Use Bioconductor packages designed for TCGA data on ISB-CGC BigQuery tables

There are no prerequisites for using the ISB-CGC BigQuery Table Search, but in order to use the ISB-CGC tables in the Google Cloud BigQuery Console and within an R program, you’ll need to have a Google Cloud Platform project and have linked it to the ISB-CGC BigQuery tables. Please see these sections of the ISB-CGC documentation for guidance:


Using the Google Cloud BigQuery Console

On the GCP BigQuery Console we can preview the table, look at the schema, and perform queries. The image below shows the preview of the contents of the TCGA Clinical BigQuery table.

_images/BQConsole-TCGA.png

Here’s a short SQL query (that completes in 0.3 seconds) which lists the distinct patients (cases) with TCGA kidney cancers; the number of rows returned is the patient count. Enter this SQL query in the BigQuery Console and click Run:

SELECT distinct (case_barcode)
FROM `isb-cgc.TCGA_bioclin_v0.clinical_v1`
WHERE project_short_name LIKE "TCGA-KIR%"
_images/BQConsole-Barcodes.png

Using a Google Cloud AI Platform R Notebook and Bioconductor

From here, we can use either R or Python to perform higher level analyses. In this example, we will be running an R notebook in the Google Cloud AI Platform Notebooks environment. If you prefer, you can run this example in a local R environment instead.

To use Google Cloud AI Platform Notebooks, from the Google Cloud Platform Navigation menu (on the left), select AI Platform -> Notebooks under the Artificial Intelligence section.

_images/GCP-AI-Platform.png

Notebooks can be created in either R or Python. We’ll create our notebook in R.

_images/GCP-Notebooks.png

The Google Cloud AI platform R notebook environment looks very similar to other Jupyter notebook environments. Users can create interactive R notebooks or simpler R console notebooks.

_images/GCP-R-Notebook.png

Enter or copy each block into the R terminal. Click Run after each block to see the results.

install.packages("bigrquery")
library(bigrquery)
project <- "your project" #Replace with your project name
# Query the clinical table for our cohort.
# Retrieve Age at Diagnosis and Clinical Stage for Kidney Cancer data.
sql <- "Select case_barcode, age_at_diagnosis, project_short_name, clinical_stage
        from `isb-cgc.TCGA_bioclin_v0.Clinical` as clin
        where project_short_name like 'TCGA-KIR%'"

clinical_tbl <- bq_project_query (project, query = sql) #Put data in temporary BQ table
clinical_data <- bq_table_download(clinical_tbl) #Put data into a dataframe
head(clinical_data)
_images/Clinical-dataframe.png
# Plot two histograms of age of diagnosis data of our cohort.
layout(matrix(1:2, 2, 1))
hist(clinical_data[clinical_data$project_short_name == "TCGA-KIRP",]$age_at_diagnosis,
    xlim=c(15,100), ylim=c(0,40), breaks=seq(15,100,2),
    col="#FFCC66", main='TCGA-KIRP', xlab='Age at diagnosis (years)')

hist(clinical_data[clinical_data$project_short_name == "TCGA-KIRC",]$age_at_diagnosis,
    xlim=c(15,100), ylim=c(0,40), breaks=seq(15,100,2),
    col="#99CCFF", main='TCGA-KIRC', xlab='Age at diagnosis (years)')
_images/Clinical-histograms.png
# Create SQL query to retrieve the mean gene expression and mean protein expression per project/case.
# Load it into a dataframe.
sql_expression <- "with gexp as (
    select project_short_name, case_barcode, gene_name, avg(HTSeq__FPKM) as mean_gexp
    from `isb-cgc.TCGA_hg38_data_v0.RNAseq_Gene_Expression`
    where project_short_name like 'TCGA-KIR%' and gene_type = 'protein_coding'
    group by project_short_name, case_barcode, gene_name
), pexp as (
    select project_short_name, case_barcode, gene_name, avg(protein_expression) as mean_pexp
    from `isb-cgc.TCGA_hg38_data_v0.Protein_Expression`
    where project_short_name like 'TCGA-KIR%'
    group by project_short_name, case_barcode, gene_name
)
select gexp.project_short_name, gexp.case_barcode, gexp.gene_name, gexp.mean_gexp, pexp.mean_pexp
from gexp inner join pexp
on gexp.project_short_name = pexp.project_short_name
  and gexp.case_barcode = pexp.case_barcode
  and gexp.gene_name = pexp.gene_name"

expression_data <- bq_table_download(bq_project_query (project, query = sql_expression)) #Put data into a dataframe
head(expression_data)
_images/Expression-dataframe.png
# Determine the number of cases from each project.
length(unique(expression_data$case_barcode[expression_data$project_short_name == "TCGA-KIRP"]))
length(unique(expression_data$case_barcode[expression_data$project_short_name == "TCGA-KIRC"]))
_images/Num-cases.png
# Add a combined project/case id column and collect the unique case ids in a vector.
expression_data$id <- paste(expression_data$project_short_name, expression_data$case_barcode, sep='.')
cases <- unique(expression_data$id)

# Transform the expression_data data frame, so that columns are samples, rows are genes.
list_exp <- lapply(cases, function(case){
    temp <- expression_data[expression_data$id == case, c('gene_name', 'mean_gexp')]
    names(temp) <- c('gene_name', case)
    return(temp)
})

gene_exps <- Reduce(function(x, y) merge(x, y, all=TRUE, by="gene_name"), list_exp)
head(gene_exps)
dim(gene_exps)
_images/gene-exp-dataframe.png
# Perform the same transform for protein abundance.
list_abun <- lapply(cases, function(case){
    temp <- expression_data[expression_data$id == case, c('gene_name', 'mean_pexp')]
    names(temp) <- c('gene_name', case)
    return(temp)
})
pep_abun <- Reduce(function(x, y) merge(x, y, all=TRUE, by="gene_name"), list_abun)
head(pep_abun)
dim(pep_abun)
_images/pep-abun-dataframe.png
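
For readers following along in Python rather than R, the long-to-wide reshaping performed in the two steps above can be sketched with the standard library alone. The rows below are made up and only reuse the column names of expression_data. The function name is hypothetical, and this is not the code used to produce the screenshots.

```python
from collections import defaultdict


def long_to_wide(rows, value_key):
    """Pivot long-format rows into {gene_name: {case_id: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        # Same id convention as the R code: project and case joined by '.'.
        case_id = f"{row['project_short_name']}.{row['case_barcode']}"
        wide[row["gene_name"]][case_id] = row[value_key]
    return dict(wide)


# Made-up rows with the same column names as expression_data.
rows = [
    {"project_short_name": "TCGA-KIRC", "case_barcode": "CASE-1",
     "gene_name": "GENE_A", "mean_gexp": 1.5},
    {"project_short_name": "TCGA-KIRP", "case_barcode": "CASE-2",
     "gene_name": "GENE_A", "mean_gexp": 2.5},
    {"project_short_name": "TCGA-KIRC", "case_barcode": "CASE-1",
     "gene_name": "GENE_B", "mean_gexp": 0.5},
]

wide = long_to_wide(rows, "mean_gexp")
print(wide["GENE_A"])  # {'TCGA-KIRC.CASE-1': 1.5, 'TCGA-KIRP.CASE-2': 2.5}
```

A gene with no value for a case simply has no entry for that case, which plays the role of the NA values produced by merging with all=TRUE.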
# Separate the cohorts (types of kidney cancer) into two dataframes and
# generate a scatterplot of gene expression and protein abundance.
# Gene expression first.
exp_p <- gene_exps[,grep('KIRP', names(gene_exps))]
exp_c <- gene_exps[,grep('KIRC', names(gene_exps))]
plot(log(rowMeans(exp_p)), log(rowMeans(exp_c)),
    xlab='log(FPKM KIRP)', ylab='log(FPKM KIRC)',
    xlim=c(-3.5,7.5), ylim=c(-3.5,7.5), pch=19, cex=2,
    col=rgb(178,34,34,max=255,alpha=150))
_images/gene-scatterplot.png
# Peptide expression second.
abun_p <- pep_abun[,grep('KIRP', names(pep_abun))]
abun_c <- pep_abun[,grep('KIRC', names(pep_abun))]
plot(rowMeans(abun_p), rowMeans(abun_c),
   xlab='KIRP protein abundance', ylab="KIRC protein abundance",
   xlim=c(-0.25,0.3), ylim=c(-0.25,0.3), pch=19, cex=2,
   col=rgb(140,140,230,max=255,alpha=150))
_images/peptide-scatterplot.png
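# Load the Bioconductor package maftools, which has capabilities to summarize,
# analyze and visualize Mutation Annotation Format (MAF) data.
# maftools is distributed through Bioconductor, so install it with BiocManager.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("maftools")
library("maftools")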
# Use BigQuery to load TCGA somatic mutation data for our cancers of interest.
sql_kirc<-"SELECT Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele,
Tumor_Seq_Allele2, Variant_Classification, Variant_Type, sample_barcode_tumor FROM
`isb-cgc.TCGA_hg38_data_v0.Somatic_Mutation` WHERE project_short_name = 'TCGA-KIRC'"

sql_kirp<-"SELECT Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele,
Tumor_Seq_Allele2, Variant_Classification, Variant_Type, sample_barcode_tumor FROM
`isb-cgc.TCGA_hg38_data_v0.Somatic_Mutation` WHERE project_short_name = 'TCGA-KIRP'"

maf_kirc <- bq_table_download(bq_project_query (project, query = sql_kirc)) #Put data into a dataframe
maf_kirp <- bq_table_download(bq_project_query (project, query = sql_kirp)) #Put data into a dataframe

#Rename column 9 to the field name required by maftools.
colnames(maf_kirc)[9] <- "Tumor_Sample_Barcode"
colnames(maf_kirp)[9] <- "Tumor_Sample_Barcode"

head(maf_kirc)
head(maf_kirp)
_images/somatic-mutation-dataframes.png
# Convert data frames to maftools objects.
kirc <- read.maf(maf_kirc)
kirp <- read.maf(maf_kirp)
# Leverage maftools plotting functionality.
plotmafSummary(maf = kirp, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)
plotmafSummary(maf = kirc, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)

Here is the MAF Plot Summary for Kidney Renal Papillary Carcinoma.

_images/plotmafSummary-kirp.png
oncoplot(maf = kirp, top = 10)
oncoplot(maf = kirc, top = 10)

Here is the oncoplot for Kidney Renal Papillary Carcinoma.

_images/oncoplot-kirp.png

Have feedback or corrections? Please email us at feedback@isb-cgc.org.

Teaching Sessions and Workshops

NIH Library Session - October 14th, 2021

We offered a half-day online bioinformatics workshop in collaboration with the NIH Library on October 14th, 2021. This workshop included a two-hour interactive data science and bioinformatics component using the R statistical language and Google Cloud (BigQuery) to explore NCI genomic and proteomics (TCGA) datasets. The following outline of the interactive workshop links to Jupyter notebooks used during the training. These notebooks can be executed in Google Colab or other Jupyter environments.



Release Notes

The ISB-CGC has created documentation to inform researchers about major changes between the ISB-CGC Data Releases, ISB-CGC Table Search, and the ISB-CGC WebApp. For more information, please select one of the options below.

ISB-CGC Data Release Notes

June 23, 2022

HTAN data added

BigQuery tables created

  • isb-cgc-bq.HTAN_versioned.scRNAseq_CHOP_seurat_regrCycleHeatShockGenes_pool_18Infants_scRNA_VEG3000_updated_rename_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_CHOP_seurat_pool_logNorm_gini_FiveHD_10Xv3_downsample10000HSPC_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_CHOP_seurat_integrated_18MLLr_normal_final_rename_r2

  • isb-cgc-bq.HTAN_versioned.schema_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_VUMC_HTAN_VAL_EPI_V2_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_VUMC_HTAN_VAL_DIS_NONEPI_V2_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_VUMC_HTAN_DIS_EPI_V2_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_VUMC_ABNORMALS_EPI_V2_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier1_therapy_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_level4_metadata_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_level3_metadata_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_level2_metadata_r2

  • isb-cgc-bq.HTAN_versioned.scRNAseq_level1_metadata_r2

  • isb-cgc-bq.HTAN_versioned.scATACseq_level4_metadata_r2

  • isb-cgc-bq.HTAN_versioned.scATACseq_level3_metadata_r2

  • isb-cgc-bq.HTAN_versioned.scATACseq_level1_metadata_r2

  • isb-cgc-bq.HTAN_versioned.srrs_imaging_level2_metadata_r2

  • isb-cgc-bq.HTAN_versioned.srrs_clinical_tier2_r2

  • isb-cgc-bq.HTAN_versioned.srrs_biospecimen_r2

  • isb-cgc-bq.HTAN_versioned.proteomics_metadata_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier1_moleculartest_r2

  • isb-cgc-bq.HTAN_versioned.metabolomics_metadata_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier3_lung_r2

  • isb-cgc-bq.HTAN_versioned.lipidomics_metadata_r2

  • isb-cgc-bq.HTAN_versioned.imaging_level2_metadata_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier1_followup_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier1_familyhistory_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier1_exposure_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier1_diagnosis_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier1_demographics_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier2_r2

  • isb-cgc-bq.HTAN_versioned.bulkWES_level2_metadata_r2

  • isb-cgc-bq.HTAN_versioned.bulkWES_level1_metadata_r2

  • isb-cgc-bq.HTAN_versioned.bulkRNAseq_level3_metadata_r2

  • isb-cgc-bq.HTAN_versioned.bulkRNAseq_level2_metadata_r2

  • isb-cgc-bq.HTAN_versioned.bulkRNAseq_level1_metadata_r2

  • isb-cgc-bq.HTAN_versioned.clinical_tier3_breast_r2

  • isb-cgc-bq.HTAN_versioned.biospecimen_r2

  • isb-cgc-bq.HTAN.scRNAseq_CHOP_seurat_regrCycleHeatShockGenes_pool_18Infants_scRNA_VEG3000_updated_rename_current

  • isb-cgc-bq.HTAN.scRNAseq_CHOP_seurat_pool_logNorm_gini_FiveHD_10Xv3_downsample10000HSPC_current

  • isb-cgc-bq.HTAN.scRNAseq_CHOP_seurat_integrated_18MLLr_normal_final_rename_current

  • isb-cgc-bq.HTAN.schema_current

  • isb-cgc-bq.HTAN.scRNAseq_VUMC_HTAN_VAL_EPI_V2_current

  • isb-cgc-bq.HTAN.scRNAseq_VUMC_HTAN_VAL_DIS_NONEPI_V2_current

  • isb-cgc-bq.HTAN.scRNAseq_VUMC_HTAN_DIS_EPI_V2_current

  • isb-cgc-bq.HTAN.scRNAseq_VUMC_ABNORMALS_EPI_V2_current

  • isb-cgc-bq.HTAN.clinical_tier1_therapy_current

  • isb-cgc-bq.HTAN.scRNAseq_level4_metadata_current

  • isb-cgc-bq.HTAN.scRNAseq_level3_metadata_current

  • isb-cgc-bq.HTAN.scRNAseq_level2_metadata_current

  • isb-cgc-bq.HTAN.scRNAseq_level1_metadata_current

  • isb-cgc-bq.HTAN.scATACseq_level4_metadata_current

  • isb-cgc-bq.HTAN.scATACseq_level3_metadata_current

  • isb-cgc-bq.HTAN.scATACseq_level1_metadata_current

  • isb-cgc-bq.HTAN.srrs_imaging_level2_metadata_current

  • isb-cgc-bq.HTAN.srrs_clinical_tier2_current

  • isb-cgc-bq.HTAN.srrs_biospecimen_current

  • isb-cgc-bq.HTAN.proteomics_metadata_current

  • isb-cgc-bq.HTAN.clinical_tier1_moleculartest_current

  • isb-cgc-bq.HTAN.metabolomics_metadata_current

  • isb-cgc-bq.HTAN.clinical_tier3_lung_current

  • isb-cgc-bq.HTAN.lipidomics_metadata_current

  • isb-cgc-bq.HTAN.imaging_level2_metadata_current

  • isb-cgc-bq.HTAN.clinical_tier1_followup_current

  • isb-cgc-bq.HTAN.clinical_tier1_familyhistory_current

  • isb-cgc-bq.HTAN.clinical_tier1_exposure_current

  • isb-cgc-bq.HTAN.clinical_tier1_diagnosis_current

  • isb-cgc-bq.HTAN.clinical_tier1_demographics_current

  • isb-cgc-bq.HTAN.clinical_tier2_current

  • isb-cgc-bq.HTAN.bulkWES_level2_metadata_current

  • isb-cgc-bq.HTAN.bulkWES_level1_metadata_current

  • isb-cgc-bq.HTAN.bulkRNAseq_level3_metadata_current

  • isb-cgc-bq.HTAN.bulkRNAseq_level2_metadata_current

  • isb-cgc-bq.HTAN.bulkRNAseq_level1_metadata_current

  • isb-cgc-bq.HTAN.clinical_tier3_breast_current

  • isb-cgc-bq.HTAN.biospecimen_current

June 15, 2022

New clinical tables added to isb-cgc-bq for GDC release 33.

BigQuery tables created

  • isb-cgc-bq.TRIO.clinical_gdc_current

  • isb-cgc-bq.TRIO_versioned.clinical_gdc_r33

  • isb-cgc-bq.TARGET_versioned.clinical_gdc_r33

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r33

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_gdc_r33

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_treatments_gdc_r33

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_gdc_r33

  • isb-cgc-bq.CTSP_versioned.clinical_gdc_r33

  • isb-cgc-bq.CMI_versioned.clinical_gdc_r33

  • isb-cgc-bq.CGCI_versioned.clinical_follow_ups_molecular_tests_gdc_r33

  • isb-cgc-bq.CGCI_versioned.clinical_follow_ups_gdc_r33

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_treatments_gdc_r33

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_gdc_r33

  • isb-cgc-bq.CGCI_versioned.clinical_gdc_r33

  • isb-cgc-bq.MP2PRT.clinical_gdc_current

  • isb-cgc-bq.MP2PRT_versioned.clinical_gdc_r33

  • isb-cgc-bq.EXC_RESPONDERS.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.EXC_RESPONDERS_versioned.clinical_diagnoses_treatments_gdc_r33

  • isb-cgc-bq.EXC_RESPONDERS.clinical_diagnoses_gdc_current

  • isb-cgc-bq.EXC_RESPONDERS_versioned.clinical_diagnoses_gdc_r33

  • isb-cgc-bq.EXC_RESPONDERS.clinical_gdc_current

  • isb-cgc-bq.EXC_RESPONDERS_versioned.clinical_gdc_r33

BigQuery tables updated

  • isb-cgc-bq.TARGET.clinical_gdc_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.CTSP.clinical_gdc_current

  • isb-cgc-bq.CMI.clinical_gdc_current

  • isb-cgc-bq.CGCI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.CGCI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.CGCI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.CGCI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.CGCI.clinical_gdc_current

May 5, 2022

New file metadata tables added to isb-cgc-bq for GDC release 32.

BigQuery tables created

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r32

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r32

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r32

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r32

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r32

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

April 28, 2022

Clustered the following TCGA DNA methylation and TCGA RNAseq tables to improve query performance.

BigQuery tables created

  • isb-cgc-bq.TCGA.DNA_methylation_hg19_gdc_current

  • isb-cgc-bq.TCGA.DNA_methylation_hg38_gdc_current

  • isb-cgc-bq.TCGA.RNAseq_hg19_gdc_current

  • isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current

BigQuery tables updated

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg38_gdc_current
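
These notes do not say which columns the tables above are clustered on. As a minimal sketch, the helper below builds a BigQuery Standard SQL query against the dataset's INFORMATION_SCHEMA.COLUMNS view to list a table's clustering columns in order. The function name is hypothetical, and running the resulting query assumes your account is permitted to read the dataset's metadata.

```python
def clustering_columns_sql(table_id: str) -> str:
    """Build a query that lists the clustering columns of one table.

    `table_id` is a full ID of the form project.dataset.table, as written
    in the lists above. This helper is illustrative only; it is not part
    of any ISB-CGC tooling.
    """
    project, dataset, table = table_id.split(".")
    return (
        "SELECT column_name, clustering_ordinal_position\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        f"WHERE table_name = '{table}'\n"
        "  AND clustering_ordinal_position IS NOT NULL\n"
        "ORDER BY clustering_ordinal_position"
    )


# Print the query text; paste it into the BigQuery console to run it.
print(clustering_columns_sql("isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current"))
```

In general, clustering helps most when queries filter on the leading clustering columns, because BigQuery can then skip blocks that cannot match and scan fewer bytes.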

February 10, 2022

New clinical tables added to isb-cgc-bq for GDC release 31.

BigQuery tables created

  • isb-cgc-bq.REBC_versioned.clinical_gdc_r31

  • isb-cgc-bq.REBC.clinical_gdc_current

  • isb-cgc-bq.REBC_versioned.clinical_diagnoses_treatments_gdc_r31

  • isb-cgc-bq.REBC.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.TRIO_versioned.clinical_gdc_r31

  • isb-cgc-bq.TRIO.clinical_gdc_current

  • isb-cgc-bq.BEATAML1_0_versioned.clinical_gdc_r31

  • isb-cgc-bq.CGCI_versioned.clinical_gdc_r31

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_gdc_r31

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_treatments_gdc_r31

  • isb-cgc-bq.CGCI_versioned.clinical_follow_ups_gdc_r31

  • isb-cgc-bq.CPTAC_versioned.clinical_gdc_r31

  • isb-cgc-bq.CTSP_versioned.clinical_gdc_r31

  • isb-cgc-bq.FM_versioned.clinical_gdc_r31

  • isb-cgc-bq.GENIE_versioned.clinical_gdc_r31

  • isb-cgc-bq.HCMI_versioned.clinical_gdc_r31

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_gdc_r31

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_treatments_gdc_r31

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_gdc_r31

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r31

  • isb-cgc-bq.MMRF_versioned.clinical_gdc_r31

  • isb-cgc-bq.NCICCR_versioned.clinical_gdc_r31

  • isb-cgc-bq.OHSU_versioned.clinical_gdc_r31

  • isb-cgc-bq.ORGANOID_versioned.clinical_gdc_r31

  • isb-cgc-bq.TARGET_versioned.clinical_gdc_r31

  • isb-cgc-bq.TCGA_versioned.clinical_gdc_r31

  • isb-cgc-bq.VAREPOP_versioned.clinical_gdc_r31

  • isb-cgc-bq.WCDT_versioned.clinical_gdc_r31

Current clinical tables updated to GDC release 31.

BigQuery tables updated

  • isb-cgc-bq.BEATAML1_0.clinical_gdc_current

  • isb-cgc-bq.CGCI.clinical_gdc_current

  • isb-cgc-bq.CGCI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.CGCI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.CGCI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.CPTAC.clinical_gdc_current

  • isb-cgc-bq.CTSP.clinical_gdc_current

  • isb-cgc-bq.FM.clinical_gdc_current

  • isb-cgc-bq.GENIE.clinical_gdc_current

  • isb-cgc-bq.HCMI.clinical_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.MMRF.clinical_gdc_current

  • isb-cgc-bq.NCICCR.clinical_gdc_current

  • isb-cgc-bq.OHSU.clinical_gdc_current

  • isb-cgc-bq.ORGANOID.clinical_gdc_current

  • isb-cgc-bq.TARGET.clinical_gdc_current

  • isb-cgc-bq.TCGA.clinical_gdc_current

  • isb-cgc-bq.VAREPOP.clinical_gdc_current

  • isb-cgc-bq.WCDT.clinical_gdc_current
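
The entries in these notes pair each release-stamped table in a `*_versioned` dataset with a `_current` table in the dataset of the same name without that suffix. A minimal sketch of that mapping follows. The function name and the suffix pattern are assumptions inferred from the table IDs listed here, not an official naming specification, and only GDC (`_r31`) and PDC (`_V1_21`) style suffixes are handled.

```python
import re

# Version suffixes seen in these notes: GDC releases (_r31) and PDC
# versions (_V1_21). Other styles (for example _20Q3, _v77 or _r28_v2)
# are deliberately not handled in this sketch.
_VERSION_SUFFIX = re.compile(r"_(r\d+|V\d+_\d+)$")


def current_table_id(versioned_id: str) -> str:
    """Map project.DATASET_versioned.table_rNN to project.DATASET.table_current."""
    project, dataset, table = versioned_id.split(".")
    if not dataset.endswith("_versioned"):
        raise ValueError(f"not a versioned dataset: {dataset}")
    current_table, replaced = _VERSION_SUFFIX.subn("_current", table)
    if replaced == 0:
        raise ValueError(f"unrecognized version suffix: {table}")
    return ".".join([project, dataset[: -len("_versioned")], current_table])


print(current_table_id("isb-cgc-bq.TCGA_versioned.clinical_gdc_r31"))
```

Pinning an analysis to the versioned ID keeps results reproducible, because the `_current` tables are replaced as new releases arrive.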

February 2, 2022

New tables for Synthetic Lethality.

BigQuery tables created

  • isb-cgc-bq.annotations.gene_info_human_NCBI_current

  • isb-cgc-bq.annotations.gene2ensembl_human_NCBI_current

  • isb-cgc-bq.annotations.gene2refseq_human_NCBI_current

  • isb-cgc-bq.annotations.Human2Yeast_mapping_Alliance_for_Genome_Resources_current

  • isb-cgc-bq.annotations.Yeast2Human_mapping_Alliance_for_Genome_Resources_current

  • isb-cgc-bq.annotations_versioned.gene_info_human_NCBI_2020_07

  • isb-cgc-bq.annotations_versioned.gene2ensembl_human_NCBI_2020_07

  • isb-cgc-bq.annotations_versioned.gene2refseq_human_NCBI_2020_07

  • isb-cgc-bq.annotations_versioned.Human2Yeast_mapping_Alliance_for_Genome_Resources_R3_0_1

  • isb-cgc-bq.annotations_versioned.Yeast2Human_mapping_Alliance_for_Genome_Resources_R3_0_1

  • isb-cgc-bq.DEPMAP.Achilles_gene_effect_DepMapPublic_current

  • isb-cgc-bq.DEPMAP.CCLE_gene_cn_DepMapPublic_current

  • isb-cgc-bq.DEPMAP.CCLE_gene_expression_DepMapPublic_current

  • isb-cgc-bq.DEPMAP.CCLE_mutation_DepMapPublic_current

  • isb-cgc-bq.DEPMAP.CCLE_SomaticMutation_DEMETER2_current

  • isb-cgc-bq.DEPMAP.Combined_gene_dep_score_DEMETER2_current

  • isb-cgc-bq.DEPMAP.RNAseq_IRPKM_DEMETER2_current

  • isb-cgc-bq.DEPMAP.Sample_Info_DEMETER2_current

  • isb-cgc-bq.DEPMAP.sample_info_DepMapPublic_current

  • isb-cgc-bq.DEPMAP.WES_SNP_CN_DEMETER2_current

  • isb-cgc-bq.DEPMAP_versioned.Achilles_gene_effect_DepMapPublic_20Q3

  • isb-cgc-bq.DEPMAP_versioned.CCLE_gene_cn_DepMapPublic_20Q3

  • isb-cgc-bq.DEPMAP_versioned.CCLE_gene_expression_DepMapPublic_20Q3

  • isb-cgc-bq.DEPMAP_versioned.CCLE_mutation_DepMapPublic_20Q3

  • isb-cgc-bq.DEPMAP_versioned.CCLE_SomaticMutation_DEMETER2_v6

  • isb-cgc-bq.DEPMAP_versioned.Combined_gene_dep_score_DEMETER2_v6

  • isb-cgc-bq.DEPMAP_versioned.RNAseq_IRPKM_DEMETER2_v6

  • isb-cgc-bq.DEPMAP_versioned.Sample_Info_DEMETER2_v6

  • isb-cgc-bq.DEPMAP_versioned.sample_info_DepMapPublic_20Q3

  • isb-cgc-bq.DEPMAP_versioned.WES_SNP_CN_DEMETER2_v6

  • isb-cgc-bq.supplementary_tables.Bailey_etal_Cell_2018_cancer_driver_genes

  • isb-cgc-bq.supplementary_tables.Constanzo_etal_Science_2016_SGA_Genetic_Interactions

  • isb-cgc-bq.synthetic_lethality.gene_info_human_HGNC_NCBI_2020_07

  • isb-cgc-bq.synthetic_lethality.sample_info_TCGAlabels_DepMapPublic_20Q3

January 26, 2022

New GENCODE annotation tables.

BigQuery tables created

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v39

BigQuery tables updated

  • isb-cgc-bq.GENCODE.annotation_gtf_hg38_current

January 13, 2022

New TCGA Radiology Images tables.

BigQuery tables created

  • isb-cgc-bq.TCGA_versioned.radiology_images_tcia_2022_01

BigQuery tables updated

  • isb-cgc-bq.TCGA.radiology_images_tcia_current

December 7, 2021

New per sample file metadata added to isb-cgc-bq for GDC release 30.

BigQuery tables created

  • isb-cgc-bq.GENIE_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.CTSP_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.CGCI_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.HCMI_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.REBC.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.REBC_versioned.per_sample_file_metadata_hg38_gdc_r30

  • isb-cgc-bq.TRIO.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TRIO_versioned.per_sample_file_metadata_hg38_gdc_r30

Current per sample file metadata tables updated to GDC release 30.

BigQuery tables updated

  • isb-cgc-bq.GENIE.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CTSP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CGCI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.HCMI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TARGET.per_sample_file_metadata_hg38_gdc_current

New datasets REBC, REBC_versioned, TRIO, and TRIO_versioned were created.

November 3, 2021 and December 3, 2021

New file metadata tables added to isb-cgc-bq for GDC release 30.

BigQuery tables created

  • isb-cgc-bq.GDC_case_file_metadata_versioned.GDCfileID_to_GCSurl_r30

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r30

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r30

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r30

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r30

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r30

BigQuery tables updated

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

October 19, 2021

New Pan-Cancer Atlas Clinical and Survival Data

BigQuery table created

  • isb-cgc-bq.pancancer_atlas.TCGA_Clinical_Data_Resource_Extra

October 1, 2021

New Targetome datasets and tables added to isb-cgc-bq.

BigQuery tables created

  • isb-cgc-bq.targetome.drug_synonyms_current

  • isb-cgc-bq.targetome.experiments_current

  • isb-cgc-bq.targetome.interactions_current

  • isb-cgc-bq.targetome.sources_current

  • isb-cgc-bq.targetome.target_synonyms_current

  • isb-cgc-bq.targetome_versioned.drug_synonyms_v1

  • isb-cgc-bq.targetome_versioned.experiments_v1

  • isb-cgc-bq.targetome_versioned.interactions_v1

  • isb-cgc-bq.targetome_versioned.sources_v1

  • isb-cgc-bq.targetome_versioned.target_synonyms_v1

September 22, 2021

New Copy Number Segment tables added to isb-cgc-bq.

BigQuery tables created

  • isb-cgc-bq.CGCI.copy_number_segment_hg38_gdc_current

  • isb-cgc-bq.CGCI_versioned.copy_number_segment_hg38_gdc_r27

  • isb-cgc-bq.CPTAC.copy_number_segment_hg38_gdc_current

  • isb-cgc-bq.CPTAC_versioned.copy_number_segment_hg38_gdc_r28

  • isb-cgc-bq.HCMI.copy_number_segment_hg38_gdc_current

  • isb-cgc-bq.HCMI_versioned.copy_number_segment_hg38_gdc_r29

  • isb-cgc-bq.TARGET.copy_number_segment_allelic_hg38_gdc_current

  • isb-cgc-bq.TARGET_versioned.copy_number_segment_allelic_hg38_gdc_r23

  • isb-cgc-bq.TCGA.copy_number_segment_allelic_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.copy_number_segment_allelic_hg38_gdc_r23

September 3, 2021

BigQuery tables created

  • isb-cgc-bq.CPTAC_versioned.masked_somatic_mutation_hg38_gdc_r28

BigQuery tables updated

  • isb-cgc-bq.CPTAC.masked_somatic_mutation_hg38_gdc_current

September 1, 2021

New Reactome datasets and tables added to isb-cgc-bq.

BigQuery tables created

  • isb-cgc-bq.reactome.pathway_current

  • isb-cgc-bq.reactome.physical_entity_current

  • isb-cgc-bq.reactome.pe_to_pathway_current

  • isb-cgc-bq.reactome.pathway_hierarchy_current

  • isb-cgc-bq.reactome_versioned.pathway_v77

  • isb-cgc-bq.reactome_versioned.physical_entity_v77

  • isb-cgc-bq.reactome_versioned.pe_to_pathway_v77

  • isb-cgc-bq.reactome_versioned.pathway_hierarchy_v77

Added release 28 miRNAseq, miRNAseq isoform, and RNAseq tables for TCGA.

BigQuery tables created

  • isb-cgc-bq.TCGA_versioned.miRNAseq_isoform_hg38_gdc_r28

  • isb-cgc-bq.TCGA_versioned.miRNAseq_hg38_gdc_r28

  • isb-cgc-bq.TCGA_versioned.RNAseq_hg38_gdc_r28

BigQuery tables updated

  • isb-cgc-bq.TCGA.miRNAseq_isoform_hg38_gdc_current

  • isb-cgc-bq.TCGA.miRNAseq_hg38_gdc_current

  • isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current

August 2, 2021

New study, case metadata, file metadata, clinical, project-level per-sample file, and protein abundance log2ratio (quant) tables added to isb-cgc-bq for PDC V1.21.

BigQuery tables created

  • isb-cgc-bq.CBTTC_versioned.quant_phosphoproteome_pediatric_brain_cancer_pilot_study_pdc_V1_21

  • isb-cgc-bq.CBTTC_versioned.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC3_other_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.clinical_proteogenomic_translational_research_centers_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_CPTAC2_other_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_CPTAC3_other_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_proteogenomic_translational_research_centers_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_CPTAC_GBM_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_CPTAC_LUAD_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_CPTAC_UCEC_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_prospective_breast_BI_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_glycoproteome_prospective_ovarian_JHU_N_linked_glycosite_containing_peptide_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_CCRCC_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_GBM_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_HNSCC_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_LUAD_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_UCEC_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_prospective_breast_BI_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_prospective_colon_PNNL_lumos_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_prospective_ovarian_PNNL_lumos_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_GBM_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_HNSCC_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_LUAD_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_UCEC_discovery_study_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_breast_BI_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_colon_PNNL_qeplus_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_JHU_pdc_V1_21

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_V1_21

  • isb-cgc-bq.ICPC_versioned.quant_phosphoproteome_HBV_related_hepatocellular_carcinoma_pdc_V1_21

  • isb-cgc-bq.ICPC_versioned.quant_phosphoproteome_proteogenomics_of_gastric_cancer_pdc_V1_21

  • isb-cgc-bq.ICPC_versioned.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_V1_21

  • isb-cgc-bq.ICPC_versioned.quant_proteome_proteogenomics_of_gastric_cancer_pdc_V1_21

  • isb-cgc-bq.PDC_metadata_versioned.aliquot_to_case_mapping_V1_21

  • isb-cgc-bq.PDC_metadata_versioned.case_metadata_V1_21

  • isb-cgc-bq.PDC_metadata_versioned.file_associated_entity_mapping_V1_21

  • isb-cgc-bq.PDC_metadata_versioned.file_metadata_V1_21

  • isb-cgc-bq.PDC_metadata_versioned.gene_info_V1_21

  • isb-cgc-bq.PDC_metadata_versioned.studies_V1_21

  • isb-cgc-bq.TCGA_versioned.clinical_CPTAC_TCGA_pdc_V1_21

  • isb-cgc-bq.TCGA_versioned.quant_phosphoproteome_TCGA_breast_cancer_pdc_V1_21

  • isb-cgc-bq.TCGA_versioned.quant_phosphoproteome_TCGA_ovarian_PNNL_velos_qexactive_pdc_V1_21

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_breast_cancer_pdc_V1_21

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_JHU_pdc_V1_21

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_PNNL_pdc_V1_21

BigQuery tables updated

  • isb-cgc-bq.CBTTC.quant_phosphoproteome_pediatric_brain_cancer_pilot_study_pdc_current

  • isb-cgc-bq.CBTTC.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_current

  • isb-cgc-bq.CPTAC.clinical_CPTAC3_other_pdc_current

  • isb-cgc-bq.CPTAC.clinical_proteogenomic_translational_research_centers_pdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_CPTAC2_other_pdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_CPTAC3_other_pdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_proteogenomic_translational_research_centers_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_CPTAC_GBM_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC.quant_glycoproteome_prospective_ovarian_JHU_N_linked_glycosite_containing_peptide_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_CCRCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_GBM_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_HNSCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_prospective_colon_PNNL_lumos_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_prospective_ovarian_PNNL_lumos_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_GBM_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_HNSCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_colon_PNNL_qeplus_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_JHU_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_current

  • isb-cgc-bq.ICPC.quant_phosphoproteome_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.quant_phosphoproteome_proteogenomics_of_gastric_cancer_pdc_current

  • isb-cgc-bq.ICPC.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.quant_proteome_proteogenomics_of_gastric_cancer_pdc_current

  • isb-cgc-bq.PDC_metadata.aliquot_to_case_mapping_current

  • isb-cgc-bq.PDC_metadata.case_metadata_current

  • isb-cgc-bq.PDC_metadata.file_associated_entity_mapping_current

  • isb-cgc-bq.PDC_metadata.file_metadata_current

  • isb-cgc-bq.PDC_metadata.gene_info_current

  • isb-cgc-bq.PDC_metadata.studies_current

  • isb-cgc-bq.TCGA.clinical_CPTAC_TCGA_pdc_current

  • isb-cgc-bq.TCGA.quant_phosphoproteome_TCGA_breast_cancer_pdc_current

  • isb-cgc-bq.TCGA.quant_phosphoproteome_TCGA_ovarian_PNNL_velos_qexactive_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_breast_cancer_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_JHU_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_PNNL_pdc_current

July 14, 2021

Added release 28 miRNAseq isoform table for CPTAC

BigQuery tables created

  • isb-cgc-bq.CPTAC_versioned.miRNAseq_isoform_hg38_gdc_r28

  • isb-cgc-bq.CPTAC.miRNAseq_isoform_hg38_gdc_current

June 21, 2021

Updated the release 28 CPTAC miRNAseq tables to include the sample_type_name field

BigQuery tables created

  • isb-cgc-bq.CPTAC_versioned.miRNAseq_hg38_gdc_r28_v2

BigQuery tables updated

  • isb-cgc-bq.CPTAC.miRNAseq_hg38_gdc_current

June 18, 2021

New study, case metadata, file metadata, clinical, project-level per-sample file, and protein abundance log2ratio (quant) tables added to isb-cgc-bq for PDC V1.19.

BigQuery tables created

  • isb-cgc-bq.CBTTC_versioned.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC2_other_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC3_other_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_glycoproteome_prospective_ovarian_JHU_N_linked_glycosite_containing_peptide_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_GBM_discovery_study_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_HNSCC_discovery_study_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_LUAD_discovery_study_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_UCEC_discovery_study_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_breast_BI_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_colon_PNNL_qeplus_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_JHU_pdc_V1_19

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_V1_19

  • isb-cgc-bq.ICPC_versioned.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_V1_19

  • isb-cgc-bq.ICPC_versioned.quant_proteome_proteogenomics_of_gastric_cancer_pdc_V1_19

  • isb-cgc-bq.PDC_metadata_versioned.aliquot_to_case_mapping_V1_19

  • isb-cgc-bq.PDC_metadata_versioned.case_metadata_V1_19

  • isb-cgc-bq.PDC_metadata_versioned.file_associated_entity_mapping_V1_19

  • isb-cgc-bq.PDC_metadata_versioned.file_metadata_V1_19

  • isb-cgc-bq.PDC_metadata_versioned.gene_info_V1_19

  • isb-cgc-bq.PDC_metadata_versioned.refseq_mapping_2021_03

  • isb-cgc-bq.PDC_metadata_versioned.studies_V1_19

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_breast_cancer_pdc_V1_19

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_JHU_pdc_V1_19

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_PNNL_pdc_V1_19

BigQuery tables updated

  • isb-cgc-bq.CBTTC.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_current

  • isb-cgc-bq.CPTAC.clinical_CPTAC2_other_pdc_current

  • isb-cgc-bq.CPTAC.clinical_CPTAC3_other_pdc_current

  • isb-cgc-bq.CPTAC.quant_glycoproteome_prospective_ovarian_JHU_N_linked_glycosite_containing_peptide_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_GBM_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_HNSCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_colon_PNNL_qeplus_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_JHU_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_current

  • isb-cgc-bq.ICPC.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.quant_proteome_proteogenomics_of_gastric_cancer_pdc_current

  • isb-cgc-bq.PDC_metadata.aliquot_to_case_mapping_current

  • isb-cgc-bq.PDC_metadata.case_metadata_current

  • isb-cgc-bq.PDC_metadata.file_associated_entity_mapping_current

  • isb-cgc-bq.PDC_metadata.file_metadata_current

  • isb-cgc-bq.PDC_metadata.gene_info_current

  • isb-cgc-bq.PDC_metadata.refseq_mapping_current

  • isb-cgc-bq.PDC_metadata.studies_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_breast_cancer_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_JHU_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_PNNL_pdc_current

June 10, 2021

New study and project-level per sample file metadata tables added to isb-cgc-bq for PDC V1.17.

BigQuery tables created

  • isb-cgc-bq.PDC_metadata_versioned.studies_V1_17

  • isb-cgc-bq.CBTTC_versioned.per_sample_file_metadata_pediatric_brain_cancer_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_CPTAC_2_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_CPTAC2_other_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_CPTAC3_discovery_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_CPTAC3_other_pdc_V1_17

  • isb-cgc-bq.GPRP_versioned.per_sample_file_metadata_georgetown_lung_cancer_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.per_sample_file_metadata_academia_sinica_LUAD_100_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.per_sample_file_metadata_HBV_related_hepatocellular_carcinoma_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.per_sample_file_metadata_human_early_onset_gastric_cancer_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.per_sample_file_metadata_oral_squamous_cell_carcinoma_pdc_V1_17

  • isb-cgc-bq.Quant_Maps_Tissue_Biopsies_versioned.per_sample_file_metadata_pct_swath_kidney_pdc_V1_17

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_CPTAC_TCGA_pdc_V1_17

  • isb-cgc-bq.PDC_metadata.studies_current

  • isb-cgc-bq.CBTTC.per_sample_file_metadata_pediatric_brain_cancer_pdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_CPTAC_2_pdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_CPTAC2_other_pdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_CPTAC3_discovery_pdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_CPTAC3_other_pdc_current

  • isb-cgc-bq.GPRP.per_sample_file_metadata_georgetown_lung_cancer_pdc_current

  • isb-cgc-bq.ICPC.per_sample_file_metadata_academia_sinica_LUAD_100_pdc_current

  • isb-cgc-bq.ICPC.per_sample_file_metadata_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.per_sample_file_metadata_human_early_onset_gastric_cancer_pdc_current

  • isb-cgc-bq.ICPC.per_sample_file_metadata_oral_squamous_cell_carcinoma_pdc_current

  • isb-cgc-bq.Quant_Maps_Tissue_Biopsies.per_sample_file_metadata_pct_swath_kidney_pdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_CPTAC_TCGA_pdc_current

May 28, 2021

New per sample file metadata added to isb-cgc-bq for GDC release 29.

BigQuery tables created

  • isb-cgc-bq.CMI_versioned.per_sample_file_metadata_hg38_gdc_r29

  • isb-cgc-bq.CGCI_versioned.per_sample_file_metadata_hg38_gdc_r29

  • isb-cgc-bq.HCMI_versioned.per_sample_file_metadata_hg38_gdc_r29

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_hg38_gdc_r29

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg38_gdc_r29

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg19_gdc_r29

Current per sample file metadata tables updated to GDC release 29.

BigQuery tables updated

  • isb-cgc-bq.CMI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CGCI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.HCMI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg19_gdc_current

May 27, 2021

New controlled-access VCF tables.

BigQuery tables created

  • isb-cgc-cbq.VAREPOP_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.VAREPOP.vcf_hg38_gdc_current

  • isb-cgc-cbq.TCGA_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.TCGA.vcf_hg38_gdc_current

  • isb-cgc-cbq.ORGANOID_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.ORGANOID.vcf_hg38_gdc_current

  • isb-cgc-cbq.MMRF_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.MMRF.vcf_hg38_gdc_current

  • isb-cgc-cbq.HCMI_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.HCMI.vcf_hg38_gdc_current

  • isb-cgc-cbq.FM_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.FM.vcf_hg38_gdc_current

May 26, 2021

New case metadata, file metadata, clinical, and quant data (for acetylome, glycoproteome, phosphoproteome, and proteome) added to isb-cgc-bq from PDC V1.17.

BigQuery tables created

  • isb-cgc-bq.CBTTC_versioned.clinical_diagnoses_pediatric_brain_cancer_pdc_V1_17

  • isb-cgc-bq.CBTTC_versioned.clinical_pediatric_brain_cancer_pdc_V1_17

  • isb-cgc-bq.CBTTC_versioned.quant_phosphoproteome_pediatric_brain_cancer_pilot_study_pdc_V1_17

  • isb-cgc-bq.CBTTC_versioned.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC_2_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC2_other_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC3_discovery_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC3_other_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_CPTAC_GBM_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_CPTAC_LUAD_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_CPTAC_UCEC_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_acetylome_prospective_breast_BI_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_glycoproteome_prospective_ovarian_JHU_n_linked_glycosite_containing_peptide_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_CCRCC_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_GBM_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_HNSCC_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_LUAD_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_CPTAC_UCEC_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_prospective_breast_BI_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_prospective_colon_PNNL_lumos_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_phosphoproteome_prospective_ovarian_PNNL_lumos_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_GBM_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_HNSCC_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_LUAD_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_UCEC_discovery_study_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_breast_BI_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_colon_PNNL_qeplus_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_JHU_pdc_V1_17

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_V1_17

  • isb-cgc-bq.GPRP_versioned.clinical_georgetown_lung_cancer_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.clinical_academia_sinica_LUAD_100_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.clinical_HBV_related_hepatocellular_carcinoma_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.clinical_human_early_onset_gastric_cancer_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.clinical_oral_squamous_cell_carcinoma_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.quant_phosphoproteome_HBV_related_hepatocellular_carcinoma_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.quant_phosphoproteome_proteogenomics_of_gastric_cancer_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_V1_17

  • isb-cgc-bq.ICPC_versioned.quant_proteome_proteogenomics_of_gastric_cancer_pdc_V1_17

  • isb-cgc-bq.PDC_metadata_versioned.aliquot_to_case_mapping_V1_17

  • isb-cgc-bq.PDC_metadata_versioned.case_metadata_V1_17

  • isb-cgc-bq.PDC_metadata_versioned.file_associated_entity_mapping_V1_17

  • isb-cgc-bq.PDC_metadata_versioned.file_metadata_V1_17

  • isb-cgc-bq.PDC_metadata_versioned.gene_info_V1_17

  • isb-cgc-bq.PDC_metadata_versioned.refseq_mapping_2021_02

  • isb-cgc-bq.Quant_Maps_Tissue_Biopsies_versioned.clinical_pct_swath_kidney_pdc_V1_17

  • isb-cgc-bq.TCGA_versioned.clinical_CPTAC_TCGA_pdc_V1_17

  • isb-cgc-bq.TCGA_versioned.quant_phosphoproteome_TCGA_breast_cancer_pdc_V1_17

  • isb-cgc-bq.TCGA_versioned.quant_phosphoproteome_TCGA_ovarian_PNNL_velos_qexactive_pdc_V1_17

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_breast_cancer_pdc_V1_17

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_JHU_pdc_V1_17

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_PNNL_pdc_V1_17

  • isb-cgc-bq.CBTTC.quant_phosphoproteome_pediatric_brain_cancer_pilot_study_pdc_current

  • isb-cgc-bq.CPTAC.clinical_CPTAC2_other_pdc_current

  • isb-cgc-bq.CPTAC.clinical_CPTAC3_other_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_CPTAC_GBM_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_acetylome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC.quant_glycoproteome_prospective_ovarian_JHU_n_linked_glycosite_containing_peptide_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_CCRCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_GBM_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_HNSCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_prospective_colon_PNNL_lumos_pdc_current

  • isb-cgc-bq.CPTAC.quant_phosphoproteome_prospective_ovarian_PNNL_lumos_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_GBM_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_HNSCC_discovery_study_pdc_current

  • isb-cgc-bq.ICPC.quant_phosphoproteome_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.quant_phosphoproteome_proteogenomics_of_gastric_cancer_pdc_current

  • isb-cgc-bq.PDC_metadata.gene_info_current

  • isb-cgc-bq.PDC_metadata.refseq_mapping_current

  • isb-cgc-bq.TCGA.quant_phosphoproteome_TCGA_breast_cancer_pdc_current

  • isb-cgc-bq.TCGA.quant_phosphoproteome_TCGA_ovarian_PNNL_velos_qexactive_pdc_current

BigQuery tables updated

  • isb-cgc-bq.CBTTC.clinical_diagnoses_pediatric_brain_cancer_pdc_current

  • isb-cgc-bq.CBTTC.clinical_pediatric_brain_cancer_pdc_current

  • isb-cgc-bq.CBTTC.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_current

  • isb-cgc-bq.CPTAC.clinical_CPTAC_2_pdc_current

  • isb-cgc-bq.CPTAC.clinical_CPTAC3_discovery_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_colon_PNNL_qeplus_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_JHU_pdc_current

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_current

  • isb-cgc-bq.GPRP.clinical_georgetown_lung_cancer_pdc_current

  • isb-cgc-bq.ICPC.clinical_academia_sinica_LUAD_100_pdc_current

  • isb-cgc-bq.ICPC.clinical_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.clinical_human_early_onset_gastric_cancer_pdc_current

  • isb-cgc-bq.ICPC.clinical_oral_squamous_cell_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC.quant_proteome_proteogenomics_of_gastric_cancer_pdc_current

  • isb-cgc-bq.PDC_metadata.aliquot_to_case_mapping_current

  • isb-cgc-bq.PDC_metadata.case_metadata_current

  • isb-cgc-bq.PDC_metadata.file_associated_entity_mapping_current

  • isb-cgc-bq.PDC_metadata.file_metadata_current

  • isb-cgc-bq.Quant_Maps_Tissue_Biopsies.clinical_pct_swath_kidney_pdc_current

  • isb-cgc-bq.TCGA.clinical_CPTAC_TCGA_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_breast_cancer_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_JHU_pdc_current

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_PNNL_pdc_current

New CPTAC controlled-access VCF tables.

BigQuery tables created

  • isb-cgc-cbq.CPTAC3_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.CPTAC3.vcf_hg38_gdc_current

  • isb-cgc-cbq.CPTAC2_versioned.vcf_hg38_gdc_r24

  • isb-cgc-cbq.CPTAC2.vcf_hg38_gdc_current

May 24, 2021

New CPTAC RNA Seq table added to isb-cgc-bq for GDC release 28.

BigQuery tables created

  • isb-cgc-bq.CPTAC_versioned.RNAseq_hg38_gdc_r28

BigQuery tables updated

  • isb-cgc-bq.CPTAC.RNAseq_hg38_gdc_current
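
Many entries in these notes pair a newly created versioned table with an updated `_current` table. The sketch below derives the `_current` ID from a versioned ID, based only on the naming pattern visible in these notes: the dataset drops `_versioned`, and a trailing GDC release suffix such as `_r28` or PDC suffix such as `_V1_17` becomes `_current`. This is an observed pattern rather than a documented rule, and `current_table_for` is a hypothetical helper name.

```python
import re

# Assumption: release suffixes look like '_r28' (GDC) or '_V1_17' (PDC).
_RELEASE_SUFFIX = re.compile(r"_(r\d+|V\d+_\d+)$")
_VERSIONED = "_versioned"


def current_table_for(versioned_id: str) -> str:
    """Map a versioned table ID to the matching '_current' table ID."""
    project, dataset, table = versioned_id.split(".")
    if not dataset.endswith(_VERSIONED):
        raise ValueError(f"not a versioned dataset: {dataset!r}")
    new_table, replaced = _RELEASE_SUFFIX.subn("_current", table)
    if replaced == 0:
        raise ValueError(f"no recognised release suffix in {table!r}")
    return f"{project}.{dataset[:-len(_VERSIONED)]}.{new_table}"


print(current_table_for("isb-cgc-bq.CPTAC_versioned.RNAseq_hg38_gdc_r28"))
# isb-cgc-bq.CPTAC.RNAseq_hg38_gdc_current
```

Other suffix styles that appear in these notes (for example `_v38` for GENCODE or `_2020_09` for earlier PDC releases) are deliberately not handled here and would raise `ValueError`.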

May 21, 2021

New clinical tables added to isb-cgc-bq for GDC release 29.

BigQuery tables created

  • isb-cgc-bq.BEATAML1_0_versioned.clinical_gdc_r29

  • isb-cgc-bq.CGCI_versioned.clinical_gdc_r29

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_gdc_r29

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_treatments_gdc_r29

  • isb-cgc-bq.CGCI_versioned.clinical_follow_ups_gdc_r29

  • isb-cgc-bq.CGCI_versioned.clinical_follow_ups_molecular_tests_gdc_r29

  • isb-cgc-bq.CMI_versioned.clinical_gdc_r29

  • isb-cgc-bq.CPTAC_versioned.clinical_gdc_r29

  • isb-cgc-bq.CTSP_versioned.clinical_gdc_r29

  • isb-cgc-bq.FM_versioned.clinical_gdc_r29

  • isb-cgc-bq.GENIE_versioned.clinical_gdc_r29

  • isb-cgc-bq.HCMI_versioned.clinical_gdc_r29

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_gdc_r29

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_treatments_gdc_r29

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_gdc_r29

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r29

  • isb-cgc-bq.MMRF_versioned.clinical_gdc_r29

  • isb-cgc-bq.MMRF_versioned.clinical_diagnoses_treatments_gdc_r29

  • isb-cgc-bq.MMRF_versioned.clinical_family_histories_gdc_r29

  • isb-cgc-bq.MMRF_versioned.clinical_follow_ups_gdc_r29

  • isb-cgc-bq.MMRF_versioned.clinical_follow_ups_molecular_tests_gdc_r29

  • isb-cgc-bq.NCICCR_versioned.clinical_gdc_r29

  • isb-cgc-bq.OHSU_versioned.clinical_gdc_r29

  • isb-cgc-bq.ORGANOID_versioned.clinical_gdc_r29

  • isb-cgc-bq.TARGET_versioned.clinical_gdc_r29

  • isb-cgc-bq.TCGA_versioned.clinical_gdc_r29

  • isb-cgc-bq.TCGA_versioned.clinical_diagnoses_treatments_gdc_r29

  • isb-cgc-bq.VAREPOP_versioned.clinical_gdc_r29

  • isb-cgc-bq.VAREPOP_versioned.clinical_diagnoses_treatments_gdc_r29

  • isb-cgc-bq.VAREPOP_versioned.clinical_family_histories_gdc_r29

  • isb-cgc-bq.WCDT_versioned.clinical_gdc_r29

Current clinical tables updated to GDC release 29.

BigQuery tables updated

  • isb-cgc-bq.BEATAML1_0.clinical_gdc_current

  • isb-cgc-bq.CGCI.clinical_gdc_current

  • isb-cgc-bq.CGCI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.CGCI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.CGCI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.CGCI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.CMI.clinical_gdc_current

  • isb-cgc-bq.CPTAC.clinical_gdc_current

  • isb-cgc-bq.CTSP.clinical_gdc_current

  • isb-cgc-bq.FM.clinical_gdc_current

  • isb-cgc-bq.GENIE.clinical_gdc_current

  • isb-cgc-bq.HCMI.clinical_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.MMRF.clinical_gdc_current

  • isb-cgc-bq.MMRF.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.MMRF.clinical_family_histories_gdc_current

  • isb-cgc-bq.MMRF.clinical_follow_ups_gdc_current

  • isb-cgc-bq.MMRF.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.NCICCR.clinical_gdc_current

  • isb-cgc-bq.OHSU.clinical_gdc_current

  • isb-cgc-bq.ORGANOID.clinical_gdc_current

  • isb-cgc-bq.TARGET.clinical_gdc_current

  • isb-cgc-bq.TCGA.clinical_gdc_current

  • isb-cgc-bq.TCGA.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.VAREPOP.clinical_gdc_current

  • isb-cgc-bq.VAREPOP.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.VAREPOP.clinical_family_histories_gdc_current

  • isb-cgc-bq.WCDT.clinical_gdc_current

May 18, 2021

New file metadata tables added to isb-cgc-bq for GDC release 29, and new GENCODE annotation tables.

BigQuery tables created

  • isb-cgc-bq.GDC_case_file_metadata_versioned.GDCfileID_to_GCSurl_r29

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r29

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r29

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r29

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r29

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r29

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v38

BigQuery tables updated

  • isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

  • isb-cgc-bq.GENCODE.annotation_gtf_hg38_current

April 14, 2021

New PDC Aliquot and Case Metadata tables.

BigQuery tables created

  • isb-cgc-bq.PDC_metadata.aliquot_to_case_mapping_pdc_current

  • isb-cgc-bq.PDC_metadata_versioned.aliquot_to_case_mapping_pdc_V1_11

  • isb-cgc-bq.PDC_metadata.case_metadata_pdc_current

  • isb-cgc-bq.PDC_metadata_versioned.case_metadata_pdc_V1_11

April 2, 2021

New GENCODE annotation tables.

BigQuery tables created

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v36

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v37

BigQuery tables updated

  • isb-cgc-bq.GENCODE.annotation_gtf_hg38_current

March 30, 2021

New CPTAC miRNA expression tables.

BigQuery tables created

  • isb-cgc-bq.CPTAC.miRNAseq_hg38_gdc_current

  • isb-cgc-bq.CPTAC_versioned.miRNAseq_hg38_gdc_r28

March 22, 2021

New TARGET miRNA isoform expression tables.

BigQuery tables created

  • isb-cgc-bq.TARGET_versioned.miRNAseq_isoform_hg38_gdc_r25

BigQuery tables updated

  • isb-cgc-bq.TARGET.miRNAseq_isoform_hg38_gdc_current

March 17, 2021

New HCMI RNA Seq table

BigQuery tables created

  • isb-cgc-bq.HCMI_versioned.RNAseq_hg38_gdc_r28

BigQuery tables updated

  • isb-cgc-bq.HCMI.RNAseq_hg38_gdc_current

March 11, 2021

New HCMI Masked Somatic Mutation table

BigQuery tables created

  • isb-cgc-bq.HCMI_versioned.masked_somatic_mutation_hg38_gdc_r28

BigQuery tables updated

  • isb-cgc-bq.HCMI.masked_somatic_mutation_hg38_gdc_current

March 5, 2021

New file metadata, per sample file metadata, and clinical tables added to isb-cgc-bq for GDC release 28.

BigQuery tables created

  • isb-cgc-bq.CMI_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.WCDT_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.GENIE_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.OHSU_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.FM_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.VAREPOP_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.CTSP_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.NCICCR_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.ORGANOID_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.MMRF_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.CGCI_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.HCMI_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.BEATAML1_0_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg38_gdc_r28

  • isb-cgc-bq.CCLE_versioned.per_sample_file_metadata_hg19_gdc_r28

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg19_gdc_r28

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg19_gdc_r28

  • isb-cgc-bq.GDC_case_file_metadata_versioned.GDCfileID_to_GCSurl_r28

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r28

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r28

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r28

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r28

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r28

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r28

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_treatments_gdc_r28

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_gdc_r28

  • isb-cgc-bq.CPTAC_versioned.clinical_gdc_r28

  • isb-cgc-bq.HCMI_versioned.clinical_gdc_r28

  • isb-cgc-bq.CMI_versioned.clinical_gdc_r28

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_gdc_r28

Current file metadata, per sample file metadata, and clinical tables updated to GDC release 28.

BigQuery tables updated

  • isb-cgc-bq.CMI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.WCDT.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.GENIE.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.OHSU.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.FM.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.VAREPOP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CTSP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.NCICCR.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.ORGANOID.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.MMRF.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CGCI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.BEATAML1_0.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TARGET.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CCLE.per_sample_file_metadata_hg19_gdc_current

  • isb-cgc-bq.TARGET.per_sample_file_metadata_hg19_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg19_gdc_current

  • isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.HCMI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.CPTAC.clinical_gdc_current

  • isb-cgc-bq.HCMI.clinical_gdc_current

  • isb-cgc-bq.CMI.clinical_gdc_current

  • isb-cgc-bq.HCMI.clinical_follow_ups_gdc_current

March 3, 2021

PDC metadata

  • isb-cgc-bq.PDC_metadata.file_associated_entity_mapping_current

  • isb-cgc-bq.PDC_metadata_versioned.file_associated_entity_mapping_V1_9

  • isb-cgc-bq.PDC_metadata.file_metadata_current

  • isb-cgc-bq.PDC_metadata_versioned.file_metadata_V1_9

February 25, 2021

New TARGET miRNA-seq table

BigQuery tables created

  • isb-cgc-bq.TARGET_versioned.miRNAseq_hg38_gdc_r25

BigQuery tables updated

  • isb-cgc-bq.TARGET.miRNAseq_hg38_gdc_current

February 18, 2021

Pediatric Brain Cancer Pilot Study clinical data from PDC

  • isb-cgc-bq.CBTTC.clinical_pediatric_brain_cancer_pdc_current

  • isb-cgc-bq.CBTTC_versioned.clinical_pediatric_brain_cancer_pdc_V1_9

  • isb-cgc-bq.CBTTC.clinical_diagnoses_pediatric_brain_cancer_pdc_current

  • isb-cgc-bq.CBTTC_versioned.clinical_diagnoses_pediatric_brain_cancer_pdc_V1_9

Hepatitis B Virus (HBV) Related Hepatocellular Carcinoma clinical data from PDC

  • isb-cgc-bq.ICPC.clinical_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC_versioned.clinical_HBV_related_hepatocellular_carcinoma_pdc_V1_9

Proteogenomics of Gastric Cancer Proteome clinical data from PDC

  • isb-cgc-bq.ICPC.clinical_human_early_onset_gastric_cancer_pdc_current

  • isb-cgc-bq.ICPC_versioned.clinical_human_early_onset_gastric_cancer_pdc_V1_9

Oral Squamous Cell Carcinoma clinical data from PDC

  • isb-cgc-bq.ICPC.clinical_oral_squamous_cell_carcinoma_pdc_current

  • isb-cgc-bq.ICPC_versioned.clinical_oral_squamous_cell_carcinoma_pdc_V1_9

Academia Sinica LUAD-100 clinical data from PDC

  • isb-cgc-bq.ICPC.clinical_academia_sinica_LUAD_100_pdc_current

  • isb-cgc-bq.ICPC_versioned.clinical_academia_sinica_LUAD_100_pdc_V1_9

Georgetown Lung Cancer Proteomics Study clinical data from PDC

  • isb-cgc-bq.GPRP.clinical_georgetown_lung_cancer_pdc_current

  • isb-cgc-bq.GPRP_versioned.clinical_georgetown_lung_cancer_pdc_V1_9

Quantitative digital maps of tissue biopsies clinical data from PDC

  • isb-cgc-bq.Quant_Maps_Tissue_Biopsies.clinical_pct_swath_kidney_pdc_current

  • isb-cgc-bq.Quant_Maps_Tissue_Biopsies_versioned.clinical_pct_swath_kidney_pdc_V1_9

CPTAC clinical data from PDC

  • isb-cgc-bq.TCGA.clinical_CPTAC_TCGA_pdc_current

  • isb-cgc-bq.TCGA_versioned.clinical_CPTAC_TCGA_pdc_V1_9

  • isb-cgc-bq.CPTAC.clinical_CPTAC_2_pdc_current

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC_2_pdc_V1_9

  • isb-cgc-bq.CPTAC.clinical_CPTAC3_discovery_pdc_current

  • isb-cgc-bq.CPTAC_versioned.clinical_CPTAC3_discovery_pdc_V1_9

New CGCI and HCMI Masked Somatic Mutation tables

BigQuery tables created

  • isb-cgc-bq.CGCI.masked_somatic_mutation_hg38_gdc_current

  • isb-cgc-bq.CGCI_versioned.masked_somatic_mutation_hg38_gdc_r27

  • isb-cgc-bq.HCMI_versioned.masked_somatic_mutation_hg38_gdc_r27

BigQuery tables updated

  • isb-cgc-bq.HCMI.masked_somatic_mutation_hg38_gdc_current

February 1, 2021

New CTSP RNA Seq tables

BigQuery tables created

  • isb-cgc-bq.CTSP.RNAseq_hg38_gdc_current

  • isb-cgc-bq.CTSP_versioned.RNAseq_hg38_gdc_r23

January 12, 2021

New HCMI RNA Seq table

BigQuery tables created

  • isb-cgc-bq.HCMI_versioned.RNAseq_hg38_gdc_r27

BigQuery tables updated

  • isb-cgc-bq.HCMI.RNAseq_hg38_gdc_current

January 4, 2021

New TARGET RNA Seq tables

BigQuery tables created

  • isb-cgc-bq.TARGET.RNAseq_hg38_gdc_current

  • isb-cgc-bq.TARGET_versioned.RNAseq_hg38_gdc_r25

  • isb-cgc-bq.TARGET_versioned.RNAseq_hg38_gdc_r26

December 17, 2020

New CPTAC Masked Somatic Mutation (MAF) tables.

BigQuery tables created

  • isb-cgc-bq:CPTAC.masked_somatic_mutation_hg38_gdc_current

  • isb-cgc-bq:CPTAC_versioned.masked_somatic_mutation_hg38_gdc_r25

December 16, 2020

New per sample file metadata tables added to isb-cgc-bq for GDC release 27.

BigQuery tables created

  • isb-cgc-bq.CMI_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.WCDT_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.GENIE_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.OHSU_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.FM_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.VAREPOP_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.CTSP_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.NCICCR_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.ORGANOID_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.MMRF_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.CGCI_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.HCMI_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.BEATAML1_0_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg38_gdc_r27

  • isb-cgc-bq.CCLE_versioned.per_sample_file_metadata_hg19_gdc_r27

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg19_gdc_r27

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg19_gdc_r27

Current per sample file metadata tables updated to GDC release 27.

BigQuery tables updated

  • isb-cgc-bq.CMI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.WCDT.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.GENIE.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.OHSU.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.FM.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.VAREPOP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CTSP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.NCICCR.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.ORGANOID.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.MMRF.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CGCI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.BEATAML1_0.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TARGET.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CCLE.per_sample_file_metadata_hg19_gdc_current

  • isb-cgc-bq.TARGET.per_sample_file_metadata_hg19_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg19_gdc_current

December 14, 2020

New GDC release 27 file metadata tables.

BigQuery tables created

  • isb-cgc-bq.GDC_case_file_metadata_versioned.GDCfileID_to_GCSurl_r27

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r27

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r27

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r27

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r27

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r27

Current file metadata tables updated to GDC release 27.

BigQuery tables updated

  • isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

December 9, 2020

New CPTAC RNA Seq tables

BigQuery tables created

  • isb-cgc-bq.CPTAC.RNAseq_hg38_gdc_current

  • isb-cgc-bq.CPTAC_versioned.RNAseq_hg38_gdc_r25

December 8, 2020

CPTAC2, CPTAC3, TCGA quant proteome data from PDC, released Sept. 2020.

BigQuery tables created

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_PNNL_pdc_current

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_PNNL_pdc_2020_09

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_ovarian_JHU_pdc_current

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_ovarian_JHU_pdc_2020_09

  • isb-cgc-bq.TCGA.quant_proteome_TCGA_breast_cancer_pdc_current

  • isb-cgc-bq.TCGA_versioned.quant_proteome_TCGA_breast_cancer_pdc_2020_09

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_current

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_PNNL_qeplus_pdc_2020_09

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_ovarian_JHU_pdc_current

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_ovarian_JHU_pdc_2020_09

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_colon_PNNL_qeplus_pdc_current

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_colon_PNNL_qeplus_pdc_2020_09

  • isb-cgc-bq.CPTAC.quant_proteome_prospective_breast_BI_pdc_current

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_prospective_breast_BI_pdc_2020_09

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_UCEC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_UCEC_discovery_study_pdc_2020_09

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_LUAD_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_LUAD_discovery_study_pdc_2020_09

  • isb-cgc-bq.CPTAC.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current

  • isb-cgc-bq.CPTAC_versioned.quant_proteome_CPTAC_CCRCC_discovery_study_pdc_2020_09

Pediatric Brain Cancer Pilot proteome study from PDC, released Sept. 2020.

  • isb-cgc-bq.CBTTC.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_current

  • isb-cgc-bq.CBTTC_versioned.quant_proteome_pediatric_brain_cancer_pilot_study_pdc_2020_09

Hepatitis B Virus (HBV) Related Hepatocellular Carcinoma Proteome study, released Sept. 2020.

  • isb-cgc-bq.ICPC.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_current

  • isb-cgc-bq.ICPC_versioned.quant_proteome_HBV_related_hepatocellular_carcinoma_pdc_2020_09

Proteogenomics of Gastric Cancer Proteome study, released Sept. 2020.

  • isb-cgc-bq.ICPC.quant_proteome_proteogenomics_of_gastric_cancer_pdc_current

  • isb-cgc-bq.ICPC_versioned.quant_proteome_proteogenomics_of_gastric_cancer_pdc_2020_09

December 2, 2020

Clinical data tables released for GDC release 27. Current clinical tables were updated to GDC release 27.

BigQuery tables created and updated

  • isb-cgc-bq.MMRF.clinical_gdc_current

  • isb-cgc-bq.MMRF_versioned.clinical_gdc_r27

  • isb-cgc-bq.NCICCR.clinical_gdc_current

  • isb-cgc-bq.NCICCR_versioned.clinical_gdc_r27

  • isb-cgc-bq.OHSU.clinical_gdc_current

  • isb-cgc-bq.OHSU_versioned.clinical_gdc_r27

  • isb-cgc-bq.HCMI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r27

  • isb-cgc-bq.HCMI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_treatments_gdc_r27

  • isb-cgc-bq.ORGANOID.clinical_gdc_current

  • isb-cgc-bq.ORGANOID_versioned.clinical_gdc_r27

  • isb-cgc-bq.CGCI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_treatments_gdc_r27

  • isb-cgc-bq.MMRF.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.MMRF_versioned.clinical_diagnoses_treatments_gdc_r27

  • isb-cgc-bq.MMRF.clinical_follow_ups_gdc_current

  • isb-cgc-bq.MMRF_versioned.clinical_follow_ups_gdc_r27

  • isb-cgc-bq.TCGA.clinical_gdc_current

  • isb-cgc-bq.TCGA_versioned.clinical_gdc_r27

  • isb-cgc-bq.TARGET.clinical_gdc_current

  • isb-cgc-bq.TARGET_versioned.clinical_gdc_r27

  • isb-cgc-bq.MMRF.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.MMRF_versioned.clinical_follow_ups_molecular_tests_gdc_r27

  • isb-cgc-bq.GENIE.clinical_gdc_current

  • isb-cgc-bq.GENIE_versioned.clinical_gdc_r27

  • isb-cgc-bq.VAREPOP.clinical_gdc_current

  • isb-cgc-bq.VAREPOP_versioned.clinical_gdc_r27

  • isb-cgc-bq.CTSP.clinical_gdc_current

  • isb-cgc-bq.CTSP_versioned.clinical_gdc_r27

  • isb-cgc-bq.CGCI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq.CGCI_versioned.clinical_follow_ups_molecular_tests_gdc_r27

  • isb-cgc-bq.VAREPOP.clinical_family_histories_gdc_current

  • isb-cgc-bq.VAREPOP_versioned.clinical_family_histories_gdc_r27

  • isb-cgc-bq.BEATAML1_0.clinical_gdc_current

  • isb-cgc-bq.BEATAML1_0_versioned.clinical_gdc_r27

  • isb-cgc-bq.MMRF.clinical_family_histories_gdc_current

  • isb-cgc-bq.MMRF_versioned.clinical_family_histories_gdc_r27

  • isb-cgc-bq.WCDT.clinical_gdc_current

  • isb-cgc-bq.WCDT_versioned.clinical_gdc_r27

  • isb-cgc-bq.VAREPOP.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.VAREPOP_versioned.clinical_diagnoses_treatments_gdc_r27

  • isb-cgc-bq.HCMI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.HCMI_versioned.clinical_diagnoses_gdc_r27

  • isb-cgc-bq.CGCI.clinical_diagnoses_gdc_current

  • isb-cgc-bq.CGCI_versioned.clinical_diagnoses_gdc_r27

  • isb-cgc-bq.CGCI.clinical_gdc_current

  • isb-cgc-bq.CGCI_versioned.clinical_gdc_r27

  • isb-cgc-bq.CGCI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.CGCI_versioned.clinical_follow_ups_gdc_r27

  • isb-cgc-bq.TCGA.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq.TCGA_versioned.clinical_diagnoses_treatments_gdc_r27

  • isb-cgc-bq.CPTAC.clinical_gdc_current

  • isb-cgc-bq.CPTAC_versioned.clinical_gdc_r27

  • isb-cgc-bq.HCMI.clinical_gdc_current

  • isb-cgc-bq.HCMI_versioned.clinical_gdc_r27

  • isb-cgc-bq.CMI.clinical_gdc_current

  • isb-cgc-bq.CMI_versioned.clinical_gdc_r27

  • isb-cgc-bq.FM.clinical_gdc_current

  • isb-cgc-bq.FM_versioned.clinical_gdc_r27

  • isb-cgc-bq.HCMI.clinical_follow_ups_gdc_current

  • isb-cgc-bq.HCMI_versioned.clinical_follow_ups_gdc_r27

November 16, 2020

New TARGET controlled-access VCF tables.

BigQuery tables created

  • isb-cgc-cbq.TARGET.vcf_hg38_gdc_current

  • isb-cgc-cbq.TARGET_versioned.vcf_hg38_gdc_r22

October 30, 2020

RNA Seq data tables released for the WCDT program.

BigQuery tables created

  • isb-cgc-bq:WCDT.RNAseq_hg38_gdc_current

  • isb-cgc-bq:WCDT_versioned.RNAseq_hg38_gdc_r22
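
Some entries, including the two above, write table IDs in the `project:dataset.table` form used by BigQuery legacy SQL, while others use the dotted `project.dataset.table` form. As a hedged sketch, the helper below normalises either form into a backtick-quoted dotted reference suitable for a standard SQL `FROM` clause. `to_standard_sql` is a hypothetical name, and the query string is only an illustration; it is built here but not executed.

```python
def to_standard_sql(table_id: str) -> str:
    """Return a backtick-quoted 'project.dataset.table' reference."""
    project, sep, rest = table_id.partition(":")
    # If there is no colon, the ID is assumed to be dotted already.
    dotted = f"{project}.{rest}" if sep else table_id
    if dotted.count(".") != 2:
        raise ValueError(f"expected project, dataset and table in {table_id!r}")
    return f"`{dotted}`"


table = to_standard_sql("isb-cgc-bq:WCDT.RNAseq_hg38_gdc_current")
query = f"SELECT COUNT(*) AS n_rows FROM {table}"
print(query)
```

Backticks are used because the project name contains hyphens; quoting the full path is a safe default in standard SQL.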

October 23, 2020

Clinical data tables released for GDC release 25 and 26.

BigQuery tables created

  • isb-cgc-bq:BEATAML1_0_versioned.clinical_gdc_r25

  • isb-cgc-bq:CGCI_versioned.clinical_gdc_r25

  • isb-cgc-bq:CGCI_versioned.clinical_diagnoses_gdc_r25

  • isb-cgc-bq:CGCI_versioned.clinical_diagnoses_treatments_gdc_r25

  • isb-cgc-bq:CGCI_versioned.clinical_follow_ups_gdc_r25

  • isb-cgc-bq:CGCI_versioned.clinical_follow_ups_molecular_tests_gdc_r25

  • isb-cgc-bq:CPTAC_versioned.clinical_gdc_r25

  • isb-cgc-bq:CTSP_versioned.clinical_gdc_r25

  • isb-cgc-bq:FM_versioned.clinical_gdc_r25

  • isb-cgc-bq:GENIE_versioned.clinical_gdc_r25

  • isb-cgc-bq:HCMI_versioned.clinical_gdc_r25

  • isb-cgc-bq:HCMI_versioned.clinical_diagnoses_gdc_r25

  • isb-cgc-bq:HCMI_versioned.clinical_diagnoses_treatments_gdc_r25

  • isb-cgc-bq:HCMI_versioned.clinical_follow_ups_gdc_r25

  • isb-cgc-bq:HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r25

  • isb-cgc-bq:MMRF_versioned.clinical_gdc_r25

  • isb-cgc-bq:MMRF_versioned.clinical_diagnoses_treatments_gdc_r25

  • isb-cgc-bq:MMRF_versioned.clinical_family_histories_gdc_r25

  • isb-cgc-bq:MMRF_versioned.clinical_follow_ups_gdc_r25

  • isb-cgc-bq:MMRF_versioned.clinical_follow_ups_molecular_tests_gdc_r25

  • isb-cgc-bq:NCICCR_versioned.clinical_gdc_r25

  • isb-cgc-bq:OHSU_versioned.clinical_gdc_r25

  • isb-cgc-bq:ORGANOID_versioned.clinical_gdc_r25

  • isb-cgc-bq:TARGET_versioned.clinical_gdc_r25

  • isb-cgc-bq:TCGA_versioned.clinical_gdc_r25

  • isb-cgc-bq:TCGA_versioned.clinical_diagnoses_treatments_gdc_r25

  • isb-cgc-bq:VAREPOP_versioned.clinical_gdc_r25

  • isb-cgc-bq:VAREPOP_versioned.clinical_diagnoses_treatments_gdc_r25

  • isb-cgc-bq:VAREPOP_versioned.clinical_family_histories_gdc_r25

  • isb-cgc-bq:WCDT_versioned.clinical_gdc_r25

  • isb-cgc-bq:BEATAML1_0_versioned.clinical_gdc_r26

  • isb-cgc-bq:CGCI_versioned.clinical_gdc_r26

  • isb-cgc-bq:CGCI_versioned.clinical_diagnoses_gdc_r26

  • isb-cgc-bq:CGCI_versioned.clinical_diagnoses_treatments_gdc_r26

  • isb-cgc-bq:CGCI_versioned.clinical_follow_ups_gdc_r26

  • isb-cgc-bq:CGCI_versioned.clinical_follow_ups_molecular_tests_gdc_r26

  • isb-cgc-bq:CMI_versioned.clinical_gdc_r26

  • isb-cgc-bq:CPTAC_versioned.clinical_gdc_r26

  • isb-cgc-bq:CTSP_versioned.clinical_gdc_r26

  • isb-cgc-bq:FM_versioned.clinical_gdc_r26

  • isb-cgc-bq:GENIE_versioned.clinical_gdc_r26

  • isb-cgc-bq:HCMI_versioned.clinical_gdc_r26

  • isb-cgc-bq:HCMI_versioned.clinical_diagnoses_gdc_r26

  • isb-cgc-bq:HCMI_versioned.clinical_diagnoses_treatments_gdc_r26

  • isb-cgc-bq:HCMI_versioned.clinical_follow_ups_gdc_r26

  • isb-cgc-bq:HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r26

  • isb-cgc-bq:MMRF_versioned.clinical_gdc_r26

  • isb-cgc-bq:MMRF_versioned.clinical_diagnoses_treatments_gdc_r26

  • isb-cgc-bq:MMRF_versioned.clinical_family_histories_gdc_r26

  • isb-cgc-bq:MMRF_versioned.clinical_follow_ups_gdc_r26

  • isb-cgc-bq:MMRF_versioned.clinical_follow_ups_molecular_tests_gdc_r26

  • isb-cgc-bq:NCICCR_versioned.clinical_gdc_r26

  • isb-cgc-bq:NCICCR_versioned.clinical_family_histories_gdc_r26

  • isb-cgc-bq:CMI.clinical_gdc_current

Current clinical tables were updated to GDC release 26.

BigQuery tables updated

  • isb-cgc-bq:BEATAML1_0.clinical_gdc_current

  • isb-cgc-bq:CGCI.clinical_gdc_current

  • isb-cgc-bq:CGCI.clinical_diagnoses_gdc_current

  • isb-cgc-bq:CGCI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:CGCI.clinical_follow_ups_gdc_current

  • isb-cgc-bq:CGCI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq:CPTAC.clinical_gdc_current

  • isb-cgc-bq:CTSP.clinical_gdc_current

  • isb-cgc-bq:FM.clinical_gdc_current

  • isb-cgc-bq:GENIE.clinical_gdc_current

  • isb-cgc-bq:HCMI.clinical_gdc_current

  • isb-cgc-bq:HCMI.clinical_diagnoses_gdc_current

  • isb-cgc-bq:HCMI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:HCMI.clinical_follow_ups_gdc_current

  • isb-cgc-bq:HCMI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq:MMRF.clinical_gdc_current

  • isb-cgc-bq:MMRF.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:MMRF.clinical_family_histories_gdc_current

  • isb-cgc-bq:MMRF.clinical_follow_ups_gdc_current

  • isb-cgc-bq:MMRF.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq:NCICCR.clinical_gdc_current

  • isb-cgc-bq:NCICCR.clinical_family_histories_gdc_current

RNA Seq data tables released for the CMI program.

BigQuery tables created

  • isb-cgc-bq:CMI.RNAseq_hg38_gdc_current

  • isb-cgc-bq:CMI_versioned.RNAseq_hg38_gdc_r26

October 21, 2020

RNA Seq data tables released for the CGCI program.

BigQuery tables created

  • isb-cgc-bq:CGCI.RNAseq_hg38_gdc_current

  • isb-cgc-bq:CGCI_versioned.RNAseq_hg38_gdc_r24

October 15, 2020

Current file metadata tables updated to GDC release 26.

BigQuery tables updated

  • isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

October 14, 2020

New GDC release 26 file metadata tables.

BigQuery tables created

  • isb-cgc-bq.GDC_case_file_metadata_versioned.GDCfileID_to_GCSurl_r26

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r26

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r26

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r26

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r26

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r26

New per sample file metadata tables added to isb-cgc-bq for GDC release 26.

BigQuery tables created

  • isb-cgc-bq.WCDT_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.GENIE_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.OHSU_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.FM_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.VAREPOP_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.CTSP_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.NCICCR_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.ORGANOID_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.MMRF_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.CGCI_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.HCMI_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.BEATAML1_0_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.CCLE_versioned.per_sample_file_metadata_hg19_gdc_r26

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg19_gdc_r26

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg19_gdc_r26

  • isb-cgc-bq.CMI_versioned.per_sample_file_metadata_hg38_gdc_r26

  • isb-cgc-bq.CMI.per_sample_file_metadata_hg38_gdc_current

Current per sample file metadata tables updated to GDC release 26.

BigQuery tables updated

  • isb-cgc-bq.WCDT.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.GENIE.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.OHSU.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.FM.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.VAREPOP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CTSP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.NCICCR.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.ORGANOID.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.MMRF.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CGCI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.HCMI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.BEATAML1_0.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CPTAC.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TARGET.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq.CCLE.per_sample_file_metadata_hg19_gdc_current

  • isb-cgc-bq.TARGET.per_sample_file_metadata_hg19_gdc_current

  • isb-cgc-bq.TCGA.per_sample_file_metadata_hg19_gdc_current
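
Because a `_current` table is overwritten when a new release is loaded, analyses that need to be reproducible may prefer the matching table in the `_versioned` data set. The sketch below derives that name from the naming pattern visible in the lists above. The function `versioned_table` is a hypothetical helper, and the release tag has to be supplied by the caller, since these notes show that the current table for a given data type does not always correspond to the latest GDC release.

```python
def versioned_table(current_id: str, release: str) -> str:
    """Map a '<project>.<dataset>.<name>_current' ID to its versioned counterpart.

    The pattern (dataset + '_versioned', '_current' replaced by the release tag)
    is inferred from the table names listed in these release notes.
    """
    project, dataset, table = current_id.split(".")
    suffix = "_current"
    if not table.endswith(suffix):
        raise ValueError(f"not a *_current table: {current_id!r}")
    base = table[: -len(suffix)]
    return f"{project}.{dataset}_versioned.{base}_{release}"


if __name__ == "__main__":
    print(versioned_table(
        "isb-cgc-bq.TCGA.per_sample_file_metadata_hg19_gdc_current", "r26"
    ))
```

Whether a given versioned table actually exists should still be checked against the lists in these notes or in BigQuery itself.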

October 06, 2020

New per sample file metadata tables added to isb-cgc-bq for GDC release 25.

BigQuery tables created

  • isb-cgc-bq.WCDT_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.GENIE_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.OHSU_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.FM_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.VAREPOP_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.CTSP_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.NCICCR_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.ORGANOID_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.MMRF_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.CGCI_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.HCMI_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.BEATAML1_0_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.CPTAC_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg38_gdc_r25

  • isb-cgc-bq.CCLE_versioned.per_sample_file_metadata_hg19_gdc_r25

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg19_gdc_r25

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg19_gdc_r25

October 02, 2020

Open Somatic Mutation data tables released for the HCMI program.

BigQuery tables created

  • isb-cgc-bq.HCMI.masked_somatic_mutation_hg38_gdc_current

  • isb-cgc-bq.HCMI_versioned.masked_somatic_mutation_hg38_gdc_r23

The new COSMIC release v92 data is available in BigQuery.

BigQuery tables created

  • isb-cgc-bq.COSMIC.ASCAT_purity_ploidy_grch37_current

  • isb-cgc-bq.COSMIC.ASCAT_purity_ploidy_grch38_current

  • isb-cgc-bq.COSMIC.breakpoints_grch37_current

  • isb-cgc-bq.COSMIC.breakpoints_grch38_current

  • isb-cgc-bq.COSMIC.cancer_gene_census_grch37_current

  • isb-cgc-bq.COSMIC.cancer_gene_census_grch38_current

  • isb-cgc-bq.COSMIC.cancer_gene_census_hallmarks_of_cancer_grch37_current

  • isb-cgc-bq.COSMIC.cancer_gene_census_hallmarks_of_cancer_grch38_current

  • isb-cgc-bq.COSMIC.classification_grch37_current

  • isb-cgc-bq.COSMIC.classification_grch38_current

  • isb-cgc-bq.COSMIC.complete_CNA_grch37_current

  • isb-cgc-bq.COSMIC.complete_CNA_grch38_current

  • isb-cgc-bq.COSMIC.complete_differential_methylation_grch37_current

  • isb-cgc-bq.COSMIC.complete_differential_methylation_grch38_current

  • isb-cgc-bq.COSMIC.complete_gene_expression_grch37_current

  • isb-cgc-bq.COSMIC.complete_gene_expression_grch38_current

  • isb-cgc-bq.COSMIC.complete_targeted_screens_mutant_grch37_current

  • isb-cgc-bq.COSMIC.complete_targeted_screens_mutant_grch38_current

  • isb-cgc-bq.COSMIC.fusion_grch37_current

  • isb-cgc-bq.COSMIC.fusion_grch38_current

  • isb-cgc-bq.COSMIC.genome_screens_mutant_grch37_current

  • isb-cgc-bq.COSMIC.genome_screens_mutant_grch38_current

  • isb-cgc-bq.COSMIC.HGNC_grch37_current

  • isb-cgc-bq.COSMIC.HGNC_grch38_current

  • isb-cgc-bq.COSMIC.mutant_census_grch37_current

  • isb-cgc-bq.COSMIC.mutant_census_grch38_current

  • isb-cgc-bq.COSMIC.mutant_grch37_current

  • isb-cgc-bq.COSMIC.mutant_grch38_current

  • isb-cgc-bq.COSMIC.mutation_tracking_grch37_current

  • isb-cgc-bq.COSMIC.mutation_tracking_grch38_current

  • isb-cgc-bq.COSMIC.NCV_grch37_current

  • isb-cgc-bq.COSMIC.NCV_grch38_current

  • isb-cgc-bq.COSMIC.resistance_mutations_grch37_current

  • isb-cgc-bq.COSMIC.resistance_mutations_grch38_current

  • isb-cgc-bq.COSMIC.sample_grch37_current

  • isb-cgc-bq.COSMIC.sample_grch38_current

  • isb-cgc-bq.COSMIC.structural_variants_grch37_current

  • isb-cgc-bq.COSMIC.structural_variants_grch38_current

  • isb-cgc-bq.COSMIC.transcripts_grch37_current

  • isb-cgc-bq.COSMIC.transcripts_grch38_current

  • isb-cgc-bq.COSMIC_versioned.ASCAT_purity_ploidy_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.ASCAT_purity_ploidy_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.breakpoints_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.breakpoints_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.cancer_gene_census_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.cancer_gene_census_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.cancer_gene_census_hallmarks_of_cancer_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.cancer_gene_census_hallmarks_of_cancer_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.classification_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.classification_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.complete_CNA_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.complete_CNA_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.complete_differential_methylation_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.complete_differential_methylation_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.complete_gene_expression_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.complete_gene_expression_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.complete_targeted_screens_mutant_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.complete_targeted_screens_mutant_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.fusion_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.fusion_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.genome_screens_mutant_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.genome_screens_mutant_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.HGNC_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.HGNC_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.mutant_census_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.mutant_census_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.mutant_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.mutant_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.mutation_tracking_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.mutation_tracking_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.NCV_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.NCV_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.resistance_mutations_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.resistance_mutations_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.sample_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.sample_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.structural_variants_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.structural_variants_grch38_v92

  • isb-cgc-bq.COSMIC_versioned.transcripts_grch37_v92

  • isb-cgc-bq.COSMIC_versioned.transcripts_grch38_v92
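
The COSMIC v92 table names encode three things: the data type, the genome build (grch37 or grch38) and either `current` or a version tag. As a minimal sketch, assuming that naming pattern holds for every COSMIC table, the hypothetical parser below splits a table ID into those parts so that tables can be grouped or filtered by build.

```python
import re
from typing import NamedTuple


class CosmicTable(NamedTuple):
    name: str     # data type, e.g. "cancer_gene_census"
    build: str    # "grch37" or "grch38"
    version: str  # "current" or a tag such as "v92"


# <name>_<build>_<version>, as seen in the v92 table lists.
_COSMIC_NAME = re.compile(
    r"^(?P<name>\w+?)_(?P<build>grch3[78])_(?P<version>current|v\d+)$"
)


def parse_cosmic_table(table_id: str) -> CosmicTable:
    """Split a COSMIC table ID (with or without project/dataset) into its parts."""
    table = table_id.rsplit(".", 1)[-1]
    match = _COSMIC_NAME.match(table)
    if match is None:
        raise ValueError(f"unrecognized COSMIC table name: {table!r}")
    return CosmicTable(**match.groupdict())


if __name__ == "__main__":
    print(parse_cosmic_table("isb-cgc-bq.COSMIC_versioned.fusion_grch38_v92"))
```

This parser does not apply to the older v91 tables, which keep the version and build in the data set name instead of the table name.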

September 21, 2020

Current file metadata tables updated to GDC release 25.

BigQuery tables updated

  • isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

September 18, 2020

New GDC release 25 file metadata tables.

BigQuery tables created

  • isb-cgc-bq.GDC_case_file_metadata_versioned.GDCfileID_to_GCSurl_r25

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r25

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r25

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r25

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r25

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r25

September 8, 2020

Table generated as part of an analysis for a poster submitted to the ACM-BCB2020 conference.

BigQuery tables created

  • isb-cgc-bq.supplementary_tables.Abdilleh_etal_ACM_BCB_2020_TCGA_bioclin_v0_Clinical_UNPIVOT

September 2, 2020

New GENCODE data, version 34 and 35.

BigQuery tables created

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v34

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v35

  • isb-cgc-bq.GENCODE.annotation_gtf_hg38_current

August 28, 2020

New GDC release 24 clinical tables.

BigQuery tables created

  • isb-cgc-bq:BEATAML1_0.clinical_gdc_current

  • isb-cgc-bq:BEATAML1_0_versioned.clinical_gdc_r24

  • isb-cgc-bq:CGCI.clinical_diagnoses_gdc_current

  • isb-cgc-bq:CGCI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:CGCI.clinical_follow_ups_gdc_current

  • isb-cgc-bq:CGCI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq:CGCI.clinical_gdc_current

  • isb-cgc-bq:CGCI_versioned.clinical_diagnoses_gdc_r24

  • isb-cgc-bq:CGCI_versioned.clinical_diagnoses_treatments_gdc_r24

  • isb-cgc-bq:CGCI_versioned.clinical_follow_ups_gdc_r24

  • isb-cgc-bq:CGCI_versioned.clinical_follow_ups_molecular_tests_gdc_r24

  • isb-cgc-bq:CGCI_versioned.clinical_gdc_r24

  • isb-cgc-bq:CPTAC.clinical_gdc_current

  • isb-cgc-bq:CPTAC_versioned.clinical_gdc_r24

  • isb-cgc-bq:CTSP.clinical_gdc_current

  • isb-cgc-bq:CTSP_versioned.clinical_gdc_r24

  • isb-cgc-bq:FM.clinical_gdc_current

  • isb-cgc-bq:FM_versioned.clinical_gdc_r24

  • isb-cgc-bq:GENIE.clinical_gdc_current

  • isb-cgc-bq:GENIE_versioned.clinical_gdc_r24

  • isb-cgc-bq:HCMI.clinical_diagnoses_gdc_current

  • isb-cgc-bq:HCMI.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:HCMI.clinical_follow_ups_gdc_current

  • isb-cgc-bq:HCMI.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq:HCMI.clinical_gdc_current

  • isb-cgc-bq:HCMI_versioned.clinical_diagnoses_gdc_r24

  • isb-cgc-bq:HCMI_versioned.clinical_diagnoses_treatments_gdc_r24

  • isb-cgc-bq:HCMI_versioned.clinical_follow_ups_gdc_r24

  • isb-cgc-bq:HCMI_versioned.clinical_follow_ups_molecular_tests_gdc_r24

  • isb-cgc-bq:HCMI_versioned.clinical_gdc_r24

  • isb-cgc-bq:MMRF.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:MMRF.clinical_family_histories_gdc_current

  • isb-cgc-bq:MMRF.clinical_follow_ups_gdc_current

  • isb-cgc-bq:MMRF.clinical_follow_ups_molecular_tests_gdc_current

  • isb-cgc-bq:MMRF.clinical_gdc_current

  • isb-cgc-bq:MMRF_versioned.clinical_diagnoses_treatments_gdc_r24

  • isb-cgc-bq:MMRF_versioned.clinical_family_histories_gdc_r24

  • isb-cgc-bq:MMRF_versioned.clinical_follow_ups_gdc_r24

  • isb-cgc-bq:MMRF_versioned.clinical_follow_ups_molecular_tests_gdc_r24

  • isb-cgc-bq:MMRF_versioned.clinical_gdc_r24

  • isb-cgc-bq:NCICCR.clinical_gdc_current

  • isb-cgc-bq:NCICCR_versioned.clinical_gdc_r24

  • isb-cgc-bq:OHSU.clinical_gdc_current

  • isb-cgc-bq:OHSU_versioned.clinical_gdc_r24

  • isb-cgc-bq:ORGANOID.clinical_gdc_current

  • isb-cgc-bq:ORGANOID_versioned.clinical_gdc_r24

  • isb-cgc-bq:TARGET.clinical_gdc_current

  • isb-cgc-bq:TARGET_versioned.clinical_gdc_r24

  • isb-cgc-bq:TCGA.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:TCGA.clinical_gdc_current

  • isb-cgc-bq:TCGA_versioned.clinical_diagnoses_treatments_gdc_r24

  • isb-cgc-bq:TCGA_versioned.clinical_gdc_r24

  • isb-cgc-bq:VAREPOP.clinical_diagnoses_treatments_gdc_current

  • isb-cgc-bq:VAREPOP.clinical_family_histories_gdc_current

  • isb-cgc-bq:VAREPOP.clinical_gdc_current

  • isb-cgc-bq:VAREPOP_versioned.clinical_diagnoses_treatments_gdc_r24

  • isb-cgc-bq:VAREPOP_versioned.clinical_family_histories_gdc_r24

  • isb-cgc-bq:VAREPOP_versioned.clinical_gdc_r24

  • isb-cgc-bq:WCDT.clinical_gdc_current

  • isb-cgc-bq:WCDT_versioned.clinical_gdc_r24

July 23, 2020

New TCGA controlled-access MAF tables. New TARGET GDC release 22 RNAseq and miRNAseq tables.

BigQuery tables created

  • isb-cgc-cbq:TCGA.maf_hg38_gdc_current

  • isb-cgc-cbq:TCGA_versioned.maf_hg38_gdc_r14

  • isb-cgc-bq:TARGET_versioned.miRNAseq_hg38_gdc_r22

  • isb-cgc-bq:TARGET_versioned.RNAseq_hg38_gdc_r22

  • isb-cgc-bq:TARGET.miRNAseq_hg38_gdc_current

  • isb-cgc-bq:TARGET.RNAseq_hg38_gdc_current

July 21, 2020

New HCMI RNA seq table.

BigQuery tables created

  • isb-cgc.HCMI.RNAseq_hg38_gdc_r23

July 9, 2020

New per sample file metadata tables added to isb-cgc-bq for GDC release 24.

BigQuery tables created

  • isb-cgc-bq:BEATAML1_0.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:BEATAML1_0_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:TCGA.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:TCGA_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:TARGET.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:TARGET_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:GENIE.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:GENIE_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:CGCI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:CGCI_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:CPTAC.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:CPTAC_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:CTSP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:CTSP_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:FM.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:FM_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:HCMI.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:HCMI_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:MMRF.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:MMRF_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:NCICCR.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:NCICCR_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:OHSU.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:OHSU_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:ORGANOID.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:ORGANOID_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:VAREPOP.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:VAREPOP_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:WCDT.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:WCDT_versioned.per_sample_file_metadata_hg38_gdc_r24

  • isb-cgc-bq:CCLE.per_sample_file_metadata_hg38_gdc_current

  • isb-cgc-bq:CCLE_versioned.per_sample_file_metadata_hg38_gdc_r24

Existing GDC Release 24 file metadata tables in the isb-cgc project were copied to the isb-cgc-bq project.

BigQuery tables created

  • isb-cgc-bq.GDC_case_file_metadata_versioned.slide2caseIDmap_r24

  • isb-cgc-bq.GDC_case_file_metadata_versioned.GDCfileID_to_GCSurl_r24

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_legacy_r24

  • isb-cgc-bq.GDC_case_file_metadata_versioned.fileData_active_r24

  • isb-cgc-bq.GDC_case_file_metadata_versioned.caseData_r24

  • isb-cgc-bq.GDC_case_file_metadata_versioned.aliquot2caseIDmap_r24

  • isb-cgc-bq.GDC_case_file_metadata.slide2caseIDmap_current

  • isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_legacy_current

  • isb-cgc-bq.GDC_case_file_metadata.fileData_active_current

  • isb-cgc-bq.GDC_case_file_metadata.caseData_current

  • isb-cgc-bq.GDC_case_file_metadata.aliquot2caseIDmap_current
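
Queries written against the older `isb-cgc:GDC_metadata.relNN_*` tables need their table references updated to use the copies in isb-cgc-bq. The sketch below translates an old-style ID to the new naming pattern shown above. The function `new_metadata_table` is a hypothetical helper, and the mapping is only documented here for release 24, so results for other releases are an assumption that should be checked against the table lists.

```python
import re

# Old layout: isb-cgc:GDC_metadata.rel<NN>_<name>
_OLD_METADATA_ID = re.compile(
    r"^isb-cgc:GDC_metadata\.rel(?P<release>\d+)_(?P<name>\w+)$"
)


def new_metadata_table(old_id: str) -> str:
    """Translate an old GDC_metadata table ID to the isb-cgc-bq versioned name."""
    match = _OLD_METADATA_ID.match(old_id.strip())
    if match is None:
        raise ValueError(f"not an isb-cgc:GDC_metadata table: {old_id!r}")
    return (
        "isb-cgc-bq.GDC_case_file_metadata_versioned."
        f"{match['name']}_r{match['release']}"
    )


if __name__ == "__main__":
    print(new_metadata_table("isb-cgc:GDC_metadata.rel24_caseData"))
```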

June 16, 2020

The new COSMIC release v91 data is available in BigQuery.

BigQuery tables created

  • isb-cgc:COSMIC_v91_grch37.ASCAT_Purity_Ploidy

  • isb-cgc:COSMIC_v91_grch37.Breakpoints

  • isb-cgc:COSMIC_v91_grch37.Cancer_Gene_Census

  • isb-cgc:COSMIC_v91_grch37.Complete_CNA

  • isb-cgc:COSMIC_v91_grch37.Complete_Differential_Methylation

  • isb-cgc:COSMIC_v91_grch37.Complete_Gene_Expression

  • isb-cgc:COSMIC_v91_grch37.Complete_Targeted_Screens_Mutant

  • isb-cgc:COSMIC_v91_grch37.Fusion

  • isb-cgc:COSMIC_v91_grch37.Genome_Screens_Mutant

  • isb-cgc:COSMIC_v91_grch37.HGNC

  • isb-cgc:COSMIC_v91_grch37.Mutant

  • isb-cgc:COSMIC_v91_grch37.Mutant_Census

  • isb-cgc:COSMIC_v91_grch37.Mutation_Tracking

  • isb-cgc:COSMIC_v91_grch37.NCV

  • isb-cgc:COSMIC_v91_grch37.Resistance_Mutations

  • isb-cgc:COSMIC_v91_grch37.Sample

  • isb-cgc:COSMIC_v91_grch37.Structural_Variants

  • isb-cgc:COSMIC_v91_grch37.Transcripts

  • isb-cgc:COSMIC_v91_grch38.ASCAT_Purity_Ploidy

  • isb-cgc:COSMIC_v91_grch38.Breakpoints

  • isb-cgc:COSMIC_v91_grch38.Cancer_Gene_Census

  • isb-cgc:COSMIC_v91_grch38.Classification

  • isb-cgc:COSMIC_v91_grch38.Complete_CNA

  • isb-cgc:COSMIC_v91_grch38.Complete_Differential_Methylation

  • isb-cgc:COSMIC_v91_grch38.Complete_Gene_Expression

  • isb-cgc:COSMIC_v91_grch38.Complete_Targeted_Screens_Mutant

  • isb-cgc:COSMIC_v91_grch38.Fusion

  • isb-cgc:COSMIC_v91_grch38.Genome_Screens_Mutant

  • isb-cgc:COSMIC_v91_grch38.HGNC

  • isb-cgc:COSMIC_v91_grch38.Mutant

  • isb-cgc:COSMIC_v91_grch38.Mutant_Census

  • isb-cgc:COSMIC_v91_grch38.Mutation_Tracking

  • isb-cgc:COSMIC_v91_grch38.NCV

  • isb-cgc:COSMIC_v91_grch38.Resistance_Mutations

  • isb-cgc:COSMIC_v91_grch38.Sample

  • isb-cgc:COSMIC_v91_grch38.Structural_Variants

  • isb-cgc:COSMIC_v91_grch38.Transcripts

June 09, 2020

New GDC file ID to GCS url tables added to isb-cgc for GDC release 24.

BigQuery tables created

  • isb-cgc:GDC_metadata.rel24_GDCfileID_to_GCSurl

May 28, 2020

New data set and derived RNA-Seq data table added to isb-cgc.

BigQuery tables created

  • isb-cgc:TARGET.RNAseq_hg38_r22

May 27, 2020

PanCancer tables were added to the isb-cgc project. The Pan-Cancer Atlas tables include clinical, methylation, RPPA and copy number data.

BigQuery tables created

The following tables were created under the isb-cgc:pancancer_atlas data set:

  • BarcodeMap

  • clinical_PANCAN_patient_with_followup

  • EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp

  • Filtered_all_CNVR_data_by_gene

  • Filtered_clinical_PANCAN_patient_with_followup

  • Filtered_EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp

  • Filtered_jhu_usc_edu_PANCAN_HumanMethylation27_betaValue_whitelisted

  • Filtered_jhu_usc_edu_PANCAN_HumanMethylation450_betaValue_whitelisted

  • Filtered_jhu_usc_edu_PANCAN_merged_HumanMethylation27_HumanMethylation450_betaValue_whitelisted

  • Filtered_MC3_MAF_V5_one_per_tumor_sample

  • Filtered_pancanMiRs_EBadjOnProtocolPlatformWithoutRepsWithUnCorrectMiRs_08_04_16

  • Filtered_TCGA_RPPA_pancan_clean

  • jhu_usc_edu_PANCAN_HumanMethylation27_betaValue_whitelisted

  • jhu_usc_edu_PANCAN_HumanMethylation450_betaValue_whitelisted

  • jhu_usc_edu_PANCAN_merged_HumanMethylation27_HumanMethylation450_betaValue_whitelisted

  • merged_sample_quality_annotations

  • pancanMiRs_EBadjOnProtocolPlatformWithoutRepsWithUnCorrectMiRs_08_04_16

  • TCGA_CDR

  • TCGA_RPPA_pancan_clean

  • Whitelist_ParticipantBarcodes

GDC data release 24.0 was released on May 7, 2020.

Updates to existing programs and projects

  • 110 new cases were released from the HNSCC cohort of CPTAC-3. This includes WXS, WGS, RNA-Seq and miRNA-Seq data.

  • Aliquot-level WXS MAFs are now available from the following projects: CPTAC-2 and CPTAC-3

BigQuery tables created

  • isb-cgc:GDC_metadata.rel24_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel24_caseData

  • isb-cgc:GDC_metadata.rel24_fileData_active

  • isb-cgc:GDC_metadata.rel24_fileData_legacy

  • isb-cgc:GDC_metadata.rel24_slide2caseIDmap

New programs and projects available in Google Cloud Storage

  • New project released: CGCI-HTMCP-CC - HIV+ Tumor Molecular Characterization Project - Cervical Cancer

    • RNA-Seq: Alignments and gene expression levels

    • miRNA-Seq: Alignments and miRNA expression levels

    • WGS: Alignments

    • Targeted Sequencing: Alignments

New data sets and derived RNA-Seq data tables added to isb-cgc.

BigQuery tables created

  • isb-cgc:BEATAML1_0.RNA_hg38_r19

  • isb-cgc:ORGANOID.RNA_hg38_r18

May 8, 2020

GDC data release 23.0 was posted on April 7, 2020.

Updates to existing programs and projects

  • HCMI-CMDC Aliquot-level MAFs were released

  • TARGET-ALL-P2 Aliquot-level MAFs were released

  • TARGET-ALL-P3 Aliquot-level MAFs were released

  • TARGET-AML Aliquot-level MAFs were released

  • TARGET-NBL Aliquot-level MAFs were released

  • TARGET-OS Aliquot-level MAFs were released

  • TARGET-WT Aliquot-level MAFs were released

  • All TCGA Projects Copy number segment and estimate files from SNP6 ASCAT were released

  • TARGET-ALL-P2 Copy number segment and estimate files from SNP6 ASCAT were released

  • TARGET-AML Copy number segment and estimate files from SNP6 ASCAT were released

  • HCMI-CMDC RNA-seq data was released

  • CGCI-BLGSP clinical data was updated

  • HCMI-CMDC clinical data was updated

  • WCDT-MCRPC clinical data was updated

BigQuery tables created

  • isb-cgc:GDC_metadata.rel23_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel23_caseData

  • isb-cgc:GDC_metadata.rel23_fileData_active

  • isb-cgc:GDC_metadata.rel23_fileData_legacy

  • isb-cgc:GDC_metadata.rel23_slide2caseIDmap

  • isb-cgc:GDC_metadata.rel23_GDCfileID_to_GCSurl

March 16, 2020

GDC data release 22.0 was posted on January 16, 2020.

New programs and projects available in Google Cloud Storage

  • WCDT-MCRPC (Genomic Characterization of Metastatic Castration Resistant Prostate Cancer), RNA-Seq and WGS Data included

Updates to existing programs and projects

  • HCMI-CMDC new RNA-Seq, WXS, WGS data was released.

  • CPTAC-3 new WXS, WGS, RNA-Seq, and miRNA-Seq data for currently released cases was released.

BigQuery tables created

  • isb-cgc:GDC_metadata.rel22_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel22_caseData

  • isb-cgc:GDC_metadata.rel22_fileData_active

  • isb-cgc:GDC_metadata.rel22_fileData_legacy

  • isb-cgc:GDC_metadata.rel22_slide2caseIDmap

  • isb-cgc:GDC_metadata.rel22_GDCfileID_to_GCSurl

January 11, 2020

GDC data release 21.0 was posted on December 10, 2019.

New programs and projects available in Google Cloud Storage

  • GENIE-MDA

  • GENIE-VICC

  • GENIE-DFCI

  • GENIE-MSK

  • GENIE-UHN

  • GENIE-JHU

  • GENIE-GRCC

  • GENIE-NKI

BigQuery tables created

  • isb-cgc:GDC_metadata.rel21_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel21_caseData

  • isb-cgc:GDC_metadata.rel21_fileData_active

  • isb-cgc:GDC_metadata.rel21_fileData_legacy

  • isb-cgc:GDC_metadata.rel21_slide2caseIDmap

December 20, 2019

GDC data release 19.0 was posted on September 17, 2019.

GDC data release 19.1 was posted on November 6, 2019.

New programs and projects available in Google Cloud Storage

  • BEATAML1.0-COHORT (Functional Genomic Landscape of Acute Myeloid Leukemia) WXS and RNA-Seq data was included.

Updates to existing programs and projects

  • TARGET-ALL-P1 new RNA-Seq data was released.

  • TARGET-ALL-P2 new RNA-Seq, WXS, and miRNA-Seq data was released.

  • TARGET-ALL-P3 new miRNA-Seq data was released.

  • TARGET-AML new WXS and WGS data was released.

  • TARGET-NBL new WXS and RNA-Seq data was released.

  • TARGET-RT new WGS and RNA-Seq data was released.

  • TARGET-WT new WGS, WXS, and RNA-Seq data was released.

  • CGCI-BLGSP new WGS data was released.

  • TARGET-ALL-P3 new Pindel VCFs were released.

  • MMRF new Pindel VCFs were released.

  • HCMI new Pindel VCFs were released.

  • CPTAC-3 new Pindel VCFs were released.

  • Disease-specific staging properties for many projects released.

BigQuery tables created

  • isb-cgc:GDC_metadata.rel19_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel19_caseData

  • isb-cgc:GDC_metadata.rel19_fileData_active

  • isb-cgc:GDC_metadata.rel19_fileData_legacy

  • isb-cgc:GDC_metadata.rel19_slide2caseIDmap

GDC data release 18 was posted on July 8, 2019.

New programs and projects available in Google Cloud Storage

  • ORGANOID-PANCREATIC (Pancreas Cancer Organoid Profiling)

  • MMRF-COMMPASS (Multiple Myeloma CoMMpass Study)

  • CGCI-BLGSP (Burkitt Lymphoma Genome Sequencing Project)

  • TARGET-ALL-P1 (Acute Lymphoblastic Leukemia - Phase I)

  • TARGET-ALL-P2 (Acute Lymphoblastic Leukemia - Phase II)

Updates to existing programs and projects

  • TARGET-ALL-P3 new RNA-Seq data was released.

  • TARGET-CCSK new RNA-Seq data was released.

  • TARGET-OS new RNA-Seq data was released.

BigQuery tables created

  • isb-cgc:GDC_metadata.rel18_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel18_caseData

  • isb-cgc:GDC_metadata.rel18_fileData_active

  • isb-cgc:GDC_metadata.rel18_fileData_legacy

  • isb-cgc:GDC_metadata.rel18_slide2caseIDmap

September 29, 2019

GDC data release 17.0 was posted on June 5, 2019.

GDC data release 17.1 was posted on June 12, 2019.

New programs and projects available in Google Cloud Storage

  • HCMI-CMDC 500 files, 2.8 TB

  • BEATAML1.0-CRENOLANIB 700 files, 3.6 TB

Updates to existing programs and projects

  • CPTAC-3 RNA-Seq - 7400 files, 16.6 TB

  • TCGA ATAC-Seq - 820 files, 9.2 TB

  • NCICCR-DLBCL RNA-Seq - 2900 files, 11.9 TB

  • CTSP-DLBCL1 RNA-Seq - 250 files, 0.96 TB

  • Updates to TCGA clinical data

  • Migration of three properties (vital_status, days_to_birth, days_to_death) across all projects from diagnosis to demographic

BigQuery tables created

  • isb-cgc:GDC_metadata.rel17_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel17_caseData

  • isb-cgc:GDC_metadata.rel17_fileData_active

  • isb-cgc:GDC_metadata.rel17_fileData_legacy

  • isb-cgc:GDC_metadata.rel17_slide2caseIDmap

April 4, 2019

GDC data release 16 was posted on March 26, 2019.

New programs and projects available in Google Cloud Storage

  • CPTAC-3

BigQuery tables created

  • isb-cgc:GDC_metadata.rel16_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel16_caseData

  • isb-cgc:GDC_metadata.rel16_fileData_active

  • isb-cgc:GDC_metadata.rel16_fileData_legacy

  • isb-cgc:GDC_metadata.rel16_slide2caseIDmap

March 6, 2019

GDC data release 15 was posted on February 20, 2019.

New programs and projects available in Google Cloud Storage

  • TARGET-ALL-P3

BigQuery tables created

  • isb-cgc:GDC_metadata.rel15_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel15_caseData

  • isb-cgc:GDC_metadata.rel15_fileData_current

  • isb-cgc:GDC_metadata.rel15_fileData_legacy

  • isb-cgc:GDC_metadata.rel15_slide2caseIDmap

January 4, 2019

GDC data release 14 was posted on December 18, 2018.

New programs and projects available in Google Cloud Storage

  • FM-AD

BigQuery tables created

  • isb-cgc:GDC_metadata.rel14_aliquot2caseIDmap

  • isb-cgc:GDC_metadata.rel14_caseData

  • isb-cgc:GDC_metadata.rel14_fileData_current

  • isb-cgc:GDC_metadata.rel14_fileData_legacy

  • isb-cgc:GDC_metadata.rel14_GDCfileID_to_GCSurl

  • isb-cgc:GDC_metadata.rel14_GDCfileID_to_GCSurl_NEW

  • isb-cgc:GDC_metadata.rel14_slide2caseIDmap

  • isb-cgc:TCGA_hg38_data_v0.miRNAseq_Expression

  • isb-cgc:TCGA_hg38_data_v0.miRNAseq_Isoform_Expression

October 2, 2018

GDC data release 13 was posted on September 27, 2018.

New programs and projects available in Google Cloud Storage

  • VAREPOP-APOLLO

  • CTSP-DLBCL1

  • NCICCR-DLBCL

In DR13, the active archive contains 428,543 files (DR12 contained 356,381 files):

  • 116 files were removed: 88 VCF files, 24 BAM files, 2 miRNA “mirnas.quantification” files, and the 2 corresponding miRNA “isoforms.quantification” files.

  • 72,278 files were added: 47,248 BAI files, 23,203 TBI files, 1,287 BAM files, 504 SEG files, and 36 SVS files.
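
The DR13 figures above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch using the numbers quoted in this entry; the dictionary keys are informal labels, not GDC field names.

```python
# Per-type counts quoted in the DR13 entry above.
removed = {"VCF": 88, "BAM": 24, "mirnas.quantification": 2, "isoforms.quantification": 2}
added = {"BAI": 47248, "TBI": 23203, "BAM": 1287, "SEG": 504, "SVS": 36}

dr12_total = 356_381
# Expected DR13 total = DR12 total - removed files + added files.
dr13_total = dr12_total - sum(removed.values()) + sum(added.values())

print(sum(removed.values()))  # 116
print(sum(added.values()))    # 72278
print(dr13_total)             # 428543
```

Both breakdowns sum to the stated totals, and the result matches the 428,543 files reported for the DR13 active archive.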

June 25, 2018:

GDC data release 12 was posted on Wednesday, June 13, 2018.

  • There is no change in the legacy archive data between DR11 and DR12.

  • There is also no change in the total number of cases in either archive.

  • The number of files in the current archive has increased from 329,165 to 356,381:

    • 67,220 files were removed

    • 94,436 files were added

More details about the changes to the current archive of TCGA data:

Copy Number Variation | Genotyping Array | TXT files:

  • 22376 Copy Number Segment files replaced (i.e., removed and added)

  • 22376 Masked Copy Number Segment files replaced

Biospecimen | BCR XML files:

  • 11294 files replaced

Clinical | BCR XML files:

  • 11160 files removed / 11167 files added (i.e., 7 extra files)

Biospecimen | Diagnostic Slide | SVS files:

  • 11730 Slide Image files added

Biospecimen | BCR SSF XML files:

  • 10557 Biospecimen Supplement files added

Biospecimen | BCR Auxiliary XML files:

  • 2884 Biospecimen Supplement files added

Clinical | BCR OMF XML files:

  • 1051 Clinical Supplement files added

Biospecimen | BCR Biotab files:

  • 340 Biospecimen Supplement files added

Clinical | BCR Biotab files:

  • 226 Clinical Supplement files added

Simple Nucleotide Variation | WXS | VCF | Varscan2 files:

  • 1 Raw Simple Somatic Mutation file removed (2017-03-04)

  • 1 Annotated Somatic Mutation file removed (2017-06-17)

Both for ESCA samples:

TCGA-VR-A8ET-01A-11D-A403-09;TCGA-VR-A8ET-10B-01D-A403-09

For TARGET data:

RNA-Seq data:

  • 3 BAM files and 9 Gene Expression Quantification files removed

  • Sample barcodes: TARGET-30-PAKYZS-01A-01R, TARGET-30-PAMEZH-01A-01R, TARGET-30-PANRRW-01A-01R

Raw CGI Variant | WGS | Combined Nucleotide Variation | VCF files:

  • 435 files added

June 4, 2018:

The metadata tables for GDC data release 11 are now available in BigQuery.

May 8, 2018:

The gnomAD database (release 2.0.2, dated October 2017) is now available in BigQuery as isb-cgc:genome_reference.gnomAD_20171003_GRCh37.

April 30, 2018:

Recently released (2018-04-01) ClinVar VCFs are now available in BigQuery! Two new tables (ClinVar_20180401_GRCh37 and ClinVar_20180401_GRCh38) can be found in our genome_reference dataset; also available is dbSNP build 151 (announced 2018-04-24): isb-cgc:genome_reference.dbSNP_b151_GRCh37p13_All.

February 22, 2018:

A genenames_mapping table has been added to our numerous reference sources in BigQuery to simplify mapping between HGNC IDs, HGNC symbols, Entrez Gene IDs, Ensembl Gene IDs, PubMed IDs, and RefSeq IDs!

June 9, 2018:

The metadata tables for GDC data release 10 are now available in BigQuery.

May 8, 2018:

Release 85 of the COSMIC database is now available in BigQuery.

February 13, 2018:

Release 84 of the COSMIC database is now available in BigQuery.

December 19, 2017:

The ISB-CGC cohort metadata has been updated to reflect the new and updated TARGET gene expression data provided by the GDC in their data release 9.

December 6, 2017:

The GDC release 9 included some updated and new TARGET gene expression data. The BigQuery table isb-cgc:TARGET_hg38_data_v0.RNAseq_Gene_Expression has been updated to reflect this.

November 7, 2017:

Release 83 of the COSMIC database is now available in BigQuery.

November 3, 2017:

The metadata tables for GDC data release 9 are now available in BigQuery.

October 30, 2017:

The ‘harmonized’ hg38 TCGA VCF files (raw and annotated) are now available in the ISB-CGC controlled-data repository in Google Cloud Storage.

August 30, 2017:

The hg38 TARGET VCF files (raw and annotated) are now available in the ISB-CGC controlled-data repository in Google Cloud Storage.

August 3, 2017:

Release 82 of the COSMIC database is now available in BigQuery.

June 30, 2017:

The genome sequence hg19 and hg38 TARGET WXS, RNA-Seq, and miRNA-Seq BAM files are now available in the ISB-CGC controlled-data repository in Google Cloud Storage.

May 9, 2017:

Release 81 of the COSMIC database is now available in BigQuery.

May 5, 2017:

A table mapping between UniProtKB accessions and identifiers has been added to our reference dataset: isb-cgc:genome_reference.UniProtKB_idmapping.

April 10, 2017:

We have re-organized our TCGA clinical, biospecimen, and molecular data into new datasets in BigQuery.

Please find them below:

The hg19 data can also be found in the GDC’s legacy archive, while the hg38 data is available at the GDC data portal.

March 30, 2017:

The ‘harmonized’ hg38 TCGA miRNA-Seq BAM files from the initial GDC data release are now available in the ISB-CGC controlled-data repository in Google Cloud Storage.

February 20, 2017:

In collaboration with the Sanger Institute, the COSMIC database is now available in BigQuery (registered users only).

February 5, 2017:

Genomic coordinates (in GFF3 format) for human microRNAs added for miRBase v20 and v21 to the isb-cgc:genome_reference BigQuery dataset.

January 30, 2017:

The final, unified “MC3” TCGA somatic mutations call set is available in the BigQuery isb-cgc:hg19_data_previews dataset (also available on Synapse).

January 10, 2017:

miRBase_v20 table added to the isb-cgc:genome_reference BigQuery dataset.

January 4, 2017:

Ensembl gene-set releases 75 (GRCh37) and 87 (GRCh38) are now also available in the isb-cgc:genome_reference BigQuery dataset.

December 30, 2016:

The ‘harmonized’ hg38 TCGA WXS BAM files and RNA-Seq BAM files from the initial GDC data release (1.0), as well as the legacy hg19 TCGA ‘Level 2’ Genome-Wide SNP6 array genotype (‘birdseed’) files, are now available in the ISB-CGC controlled-data repository in Google Cloud Storage.

November 14, 2016:

TCGA radiology and tissue slide images are now available in Google Cloud Storage! This includes radiology images (DICOM files) from the Cancer Imaging Archive (TCIA) and tissue slide images from the NCI-GDC data portal (SVS files).

November 16, 2016:

TCGA proteomics data from the CPTAC (Phase II) is now available in Google Cloud Storage.

September 10, 2016:

GENCODE versions 19, 22, 23, and 24 are all now available in the isb-cgc:genome_reference BigQuery dataset, with an updated and more complete schema. Note that the naming convention is now GENCODE_v19 rather than GENCODE_r19, that v19 is the last version based on hg19/GRCh37, and that all subsequent versions are based on hg38/GRCh38.

August 31, 2016:

A table based on the latest liftOver hg19-to-hg38 chain files is available in the isb-cgc:tcga_genome_reference BigQuery dataset.

August 26, 2016:

A set of tables based on running Picard over ~67,000 TCGA BAM files in GCS has been added to the isb-cgc:tcga_seq_metadata BigQuery dataset. These tables include BAM-index stats, insert-size metrics, quality-distribution metrics, and quality-yield metrics, and can be used in conjunction with the FastQC-based tables to look for BAM and/or FASTQ data files that meet your analysis criteria.

August 21, 2016:

New miRBase_v21 table added to the isb-cgc:genome_reference BigQuery dataset.

August 20, 2016:

Updated hg19 and hg38 Kaviar tables added to the isb-cgc:genome_reference BigQuery dataset.

August 17, 2016:

New isb-cgc:GDC_metadata BigQuery dataset containing metadata for both legacy and current files hosted at the NCI-GDC.

July 28, 2016:

New isb-cgc:tcga_201607_beta BigQuery dataset based on the final TCGA data upload from the DCC. This dataset largely mirrors the previous isb-cgc:tcga_201510_alpha dataset and now also supports the ISB-CGC Web-App. The curated TCGA cohort tables in the isb-cgc:tcga_cohorts BigQuery dataset have also been updated.

June 24, 2016:

An updated listing of all ISB-CGC hosted data in Google Cloud Storage (GCS) is now available in the GCS_listing_24jun2016 table in the isb-cgc:tcga_seq_metadata dataset in BigQuery. In addition, the CGHub_Manifest_24jun2016 table contains the final CGHub Manifest prior to the transition of all data to the Genomic Data Commons.

June 18, 2016:

New GENCODE_r24 table added to the isb-cgc:genome_reference BigQuery dataset.

May 13, 2016:

New NCBI_Viral_Annotations_Taxid10239 table added to the isb-cgc:genome_reference BigQuery dataset.

May 9, 2016:

New Ensembl2Reactome and miRBase2Reactome tables added to the isb-cgc:genome_reference BigQuery dataset.

May 3, 2016:

New isb-cgc:tcga_seq_metadata BigQuery dataset contains metadata and FastQC metrics for thousands of TCGA DNA-seq and RNA-seq data files:

  • CGHub_Manifest table contains metadata for all TCGA files at CGHub as of April 27th, 2016

  • GCS_listing_27apr2016 table contains metadata for all TCGA files hosted by ISB-CGC in GCS

  • RNAseq_FastQC table contains metrics derived from FastQC runs on the RNAseq data files, including urls to the FastQC html reports that you can cut and paste directly into your browser

  • WXS_FastQC table contains metrics derived from FastQC runs on the exome DNAseq data files

April 28, 2016

GO_Ontology and GO_Annotations tables added to the isb-cgc:genome_reference BigQuery dataset.

March 14, 2016

With the release of our Web-App, controlled data is now accessible (programmatically) to users who have previously obtained dbGaP approval for TCGA data and go through the NIH authentication process built into the Web-App.

February 26, 2016

New CCLE dataset in BigQuery isb-cgc:ccle_201602_alpha includes sample metadata, mutation calls, copy-number segments, and expression data (metadata includes full cloud-storage-path for world-readable BAM and SNP CEL files, and Genomics dataset- and readgroupset-ids for sequence data imported into Google Genomics).

February 22, 2016

Kaviar database now available in the isb-cgc:genome_reference BigQuery dataset.

February 19, 2016

CCLE RNAseq and DNAseq bam files imported into Google Genomics.

January 10, 2016

GENCODE_r19 and miRBase_v20 tables added to the isb-cgc:genome_reference BigQuery dataset.

December 26, 2015

Public release of new isb-cgc:genome_reference BigQuery dataset: the first table is based on the just-published miRTarBase release 6.1.

December 12, 2015

Curated TCGA cohort lists available in isb-cgc:tcga_cohorts BigQuery dataset.

December 3, 2015

Version v0.1.

First tagged release of the web-app.

November 16, 2015

Initial upload of data from CGHub into Google Cloud Storage (GCS) complete (not publicly released).

November 2, 2015

First public release of TCGA open-access data in BigQuery tables.

  • isb-cgc:tcga_201510_alpha dataset contains updated set of BigQuery tables, based on data available at the TCGA DCC as of October 2015

  • Includes Annotations table with information about redacted samples, etc

  • isb-cgc:platform_reference contains annotation information for the Illumina DNA Methylation platform

October 4, 2015

Complete data upload from TCGA DCC, including controlled-access data

September 21, 2015

Draft set of BigQuery tables (not publicly released)

  • isb-cgc:tcga_201507_alpha dataset containing clinical, biospecimen, somatic mutation calls and Level-3 TCGA data available at the TCGA DCC as of July 2015


Have feedback or corrections? Please email us at feedback@isb-cgc.org.

ISB-CGC Migration to Project isb-cgc-bq Release Notes

December 15, 2020

Existing TCGA hg19 DNA Methylation tables in the isb-cgc project were copied to the isb-cgc-bq project, TCGA and TCGA_versioned data sets. Corresponding TCGA tables in the isb-cgc project were deprecated.

BigQuery tables created

  • isb-cgc-bq.TCGA.DNA_methylation_chrY_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chrY_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chrX_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chrX_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr22_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr22_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr21_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr21_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr20_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr20_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr19_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr19_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr18_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr18_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr17_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr17_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr16_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr16_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr15_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr15_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr14_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr14_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr13_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr13_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr12_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr12_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr11_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr11_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr10_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr10_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr9_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr9_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr8_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr8_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr7_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr7_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr6_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr6_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr5_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr5_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr4_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr4_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr3_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr3_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr2_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr2_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr1_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr1_hg19_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_hg19_gdc_2017_01
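
The per-chromosome tables listed above follow a regular naming pattern, so the full set can be generated rather than typed. The sketch below assumes only the pattern visible in this list; the function name methylation_tables is hypothetical. The hg38 tables in the December 11, 2020 entry follow the same pattern with build set to "hg38".

```python
# Chromosomes covered by the per-chromosome tables: 1-22, X and Y.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def methylation_tables(build="hg19", version="2017_01"):
    """Return (current, versioned) table-name pairs, one pair per chromosome."""
    pairs = []
    for chrom in CHROMOSOMES:
        current = f"isb-cgc-bq.TCGA.DNA_methylation_chr{chrom}_{build}_gdc_current"
        versioned = (
            f"isb-cgc-bq.TCGA_versioned.DNA_methylation_chr{chrom}_{build}_gdc_{version}"
        )
        pairs.append((current, versioned))
    return pairs


for current, versioned in methylation_tables()[:2]:
    print(current)
    print(versioned)
```

The two whole-genome tables (DNA_methylation_hg19_gdc_current and its versioned counterpart) are not per-chromosome and are therefore not produced by this helper.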

December 11, 2020

Existing TCGA hg38 DNA Methylation tables in the isb-cgc project were copied to the isb-cgc-bq project, TCGA and TCGA_versioned data sets. Corresponding TCGA tables in the isb-cgc project were deprecated.

BigQuery tables created

  • isb-cgc-bq.TCGA.DNA_methylation_chrY_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chrY_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chrX_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chrX_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr22_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr22_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr21_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr21_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr20_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr20_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr19_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr19_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr18_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr18_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr17_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr17_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr16_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr16_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr15_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr15_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr14_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr14_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr13_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr13_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr12_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr12_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr11_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr11_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr10_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr10_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr9_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr9_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr8_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr8_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr7_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr7_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr6_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr6_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr5_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr5_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr4_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr4_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr3_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr3_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr2_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr2_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_chr1_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_chr1_hg38_gdc_2017_01

  • isb-cgc-bq.TCGA.DNA_methylation_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.DNA_methylation_hg38_gdc_2017_01

December 4, 2020

Existing pancancer_atlas tables in the isb-cgc project were copied to the isb-cgc-bq project. Corresponding tables in the isb-cgc project were deprecated.

BigQuery tables created

The following tables were created under the isb-cgc-bq.pancancer_atlas data set:

  • Original_TCGA_RPPA_pancan_clean

  • Original_pancanMiRs_EBadjOnProtocolPlatformWithoutRepsWithUnCorrectMiRs_08_04_16

  • Original_jhu_usc_edu_PANCAN_merged_HumanMethylation27_HumanMethylation450_betaValue_whitelisted

  • Original_jhu_usc_edu_PANCAN_HumanMethylation450_betaValue_whitelisted

  • Original_jhu_usc_edu_PANCAN_HumanMethylation27_betaValue_whitelisted

  • Original_EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp

  • Original_clinical_PANCAN_patient_with_followup

  • Filtered_TCGA_RPPA_pancan_clean

  • Filtered_pancanMiRs_EBadjOnProtocolPlatformWithoutRepsWithUnCorrectMiRs_08_04_16

  • Filtered_MC3_MAF_V5_one_per_tumor_sample

  • Filtered_jhu_usc_edu_PANCAN_merged_HumanMethylation27_HumanMethylation450_betaValue_whitelisted

  • Filtered_jhu_usc_edu_PANCAN_HumanMethylation450_betaValue_whitelisted

  • Filtered_jhu_usc_edu_PANCAN_HumanMethylation27_betaValue_whitelisted

  • Filtered_EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp

  • Filtered_clinical_PANCAN_patient_with_followup

  • Filtered_all_CNVR_data_by_gene

  • merged_sample_quality_annotations

  • TCGA_CDR

  • Whitelist_ParticipantBarcodes

  • BarcodeMap

November 13, 2020

Existing methylation annotation and liftover tables in the isb-cgc project were copied to the isb-cgc-bq project. Corresponding tables in the isb-cgc project were deprecated.

BigQuery tables created

  • isb-cgc-bq.annotations.methylation_annotation_hg19_illumina_current

  • isb-cgc-bq.annotations_versioned.methylation_annotation_hg19_illumina_2015_06

  • isb-cgc-bq.annotations.methylation_annotation_hg38_gdc_current

  • isb-cgc-bq.annotations_versioned.methylation_annotation_hg38_gdc_2016_11

  • isb-cgc-bq.annotations.liftover_hg19_to_hg38_current

  • isb-cgc-bq.annotations_versioned.liftover_hg19_to_hg38_2016_08

  • isb-cgc-bq.annotations.methylation_liftover_hg19_illumina_to_hg38_current

  • isb-cgc-bq.annotations_versioned.methylation_liftover_hg19_illumina_to_hg38_2016_08

November 9, 2020

Existing GENCODE tables in the isb-cgc project were copied to the isb-cgc-bq project. Corresponding GENCODE tables in the isb-cgc project were deprecated.

BigQuery tables created

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg19_v19

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v22

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v23

  • isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v24
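
Each GENCODE table name encodes both the genome build and the GENCODE version. As a small sketch, the lookup below maps a version number to its table name, limited to the four versions listed above; the function name gencode_table is hypothetical.

```python
# Genome build for each GENCODE version listed above
# (v19 is built on hg19; v22 to v24 are built on hg38).
GENCODE_BUILDS = {19: "hg19", 22: "hg38", 23: "hg38", 24: "hg38"}


def gencode_table(version):
    """Return the isb-cgc-bq table name for one of the listed GENCODE versions."""
    try:
        build = GENCODE_BUILDS[version]
    except KeyError:
        raise ValueError(f"GENCODE v{version} is not among the tables listed") from None
    return f"isb-cgc-bq.GENCODE_versioned.annotation_gtf_{build}_v{version}"


print(gencode_table(19))  # isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg19_v19
print(gencode_table(24))  # isb-cgc-bq.GENCODE_versioned.annotation_gtf_hg38_v24
```

Restricting the lookup to known versions avoids guessing a table name that may not exist.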

October 22, 2020

Existing TARGET tables in the isb-cgc project (data sets TARGET_bioclin_v0 and TARGET_hg38_data_v0) were copied to the isb-cgc-bq project, TARGET and TARGET_versioned data sets. Corresponding TARGET tables in the isb-cgc project were deprecated.

BigQuery tables created

  • isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg38_gdc_r14

  • isb-cgc-bq.TARGET_versioned.miRNAseq_isoform_hg38_gdc_r11

  • isb-cgc-bq.TARGET.miRNAseq_isoform_hg38_gdc_current

  • isb-cgc-bq.TARGET_versioned.miRNAseq_isoform_hg38_gdc_r14

  • isb-cgc-bq.TARGET_versioned.miRNAseq_hg38_gdc_r11

  • isb-cgc-bq.TARGET_versioned.miRNAseq_hg38_gdc_r14

  • isb-cgc-bq.TARGET_versioned.RNAseq_hg38_gdc_2017_12

  • isb-cgc-bq.TARGET.biospecimen_gdc_current

  • isb-cgc-bq.TARGET_versioned.biospecimen_gdc_2017_04

  • isb-cgc-bq.TARGET_versioned.clinical_gdc_2019_06

  • isb-cgc-bq.TARGET_versioned.clinical_gdc_2017_04

October 7, 2020

Existing TCGA tables (except for DNA Methylation) in the isb-cgc project were copied to the isb-cgc-bq project, TCGA and TCGA_versioned data sets. Corresponding TCGA tables in the isb-cgc project were deprecated.

BigQuery tables created

  • isb-cgc-bq.TCGA.annotations_gdc_current

  • isb-cgc-bq.TCGA_versioned.annotations_gdc_2017_04

  • isb-cgc-bq.TCGA.biospecimen_gdc_current

  • isb-cgc-bq.TCGA_versioned.biospecimen_gdc_2017_02

  • isb-cgc-bq.TCGA_versioned.clinical_gdc_2018_06

  • isb-cgc-bq.TCGA_versioned.clinical_gdc_2019_06

  • isb-cgc-bq.TCGA.slide_images_gdc_current

  • isb-cgc-bq.TCGA_versioned.slide_images_gdc_r17

  • isb-cgc-bq.TCGA.radiology_images_tcia_current

  • isb-cgc-bq.TCGA_versioned.radiology_images_tcia_2018_06

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg19_gdc_r14

  • isb-cgc-bq.TCGA_versioned.per_sample_file_metadata_hg38_gdc_r14

  • isb-cgc-bq.TCGA_versioned.somatic_mutation_hg19_MC3_2017_02

  • isb-cgc-bq.TCGA_versioned.somatic_mutation_hg19_DCC_2017_02

  • isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r6

  • isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r7

  • isb-cgc-bq.TCGA.somatic_mutation_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r10

  • isb-cgc-bq.TCGA.miRNAseq_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.miRNAseq_hg19_gdc_2017_03

  • isb-cgc-bq.TCGA.miRNAseq_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.miRNAseq_hg38_gdc_r14

  • isb-cgc-bq.TCGA.protein_expression_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.protein_expression_hg19_gdc_2017_02

  • isb-cgc-bq.TCGA.protein_expression_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.protein_expression_hg38_gdc_2017_02

  • isb-cgc-bq.TCGA.miRNAseq_isoform_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.miRNAseq_isoform_hg19_gdc_2017_02

  • isb-cgc-bq.TCGA.miRNAseq_isoform_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.miRNAseq_isoform_hg38_gdc_r14

  • isb-cgc-bq.TCGA.RNAseq_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.RNAseq_hg19_gdc_2017_02

  • isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.RNAseq_hg38_gdc_2017_12

  • isb-cgc-bq.TCGA_versioned.copy_number_segment_masked_hg38_gdc_2017_02

  • isb-cgc-bq.TCGA.copy_number_segment_masked_hg19_gdc_current

  • isb-cgc-bq.TCGA_versioned.copy_number_segment_masked_hg19_gdc_2017_02

  • isb-cgc-bq.TCGA.copy_number_segment_masked_hg38_gdc_current

  • isb-cgc-bq.TCGA_versioned.copy_number_segment_masked_hg38_gdc_r14
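
The table names above end in one of three suffix styles: _current, _r followed by a number, or _YYYY_MM. The sketch below splits a name into its base and suffix, which can help pair a current table with its versioned counterparts. Reading the suffixes as "GDC release number" and "year and month" is an assumption drawn from the names themselves, and split_version is a hypothetical helper.

```python
import re

# Suffix styles seen in the table names listed above:
#   _current, _r<number>, _<YYYY>_<MM>
SUFFIX = re.compile(r"_(current|r\d+|\d{4}_\d{2})$")


def split_version(table_name):
    """Return (base_name, suffix), or (table_name, None) if no suffix matches."""
    match = SUFFIX.search(table_name)
    if match is None:
        return table_name, None
    return table_name[: match.start()], match.group(1)


print(split_version("isb-cgc-bq.TCGA.RNAseq_hg19_gdc_current"))
# ('isb-cgc-bq.TCGA.RNAseq_hg19_gdc', 'current')
print(split_version("isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc_r10"))
# ('isb-cgc-bq.TCGA_versioned.somatic_mutation_hg38_gdc', 'r10')
```

Names without one of these suffixes, such as the GENCODE tables ending in _v19, are returned unchanged with a suffix of None.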

September 3, 2020

Existing CCLE tables in the isb-cgc project were copied to the isb-cgc-bq project, CCLE and CCLE_versioned data sets. Corresponding CCLE tables in the isb-cgc project were deprecated.

BigQuery tables created

  • isb-cgc-bq.CCLE_versioned.clinical_2019_06

  • isb-cgc-bq.CCLE.clinical_current

  • isb-cgc-bq.CCLE_versioned.biospecimen_2019_04

  • isb-cgc-bq.CCLE.biospecimen_current

  • isb-cgc-bq.CCLE_versioned.sample_information_hg19_2016_02

  • isb-cgc-bq.CCLE.sample_information_hg19_current

  • isb-cgc-bq.CCLE_versioned.RMA_expression_hg19_2016_02

  • isb-cgc-bq.CCLE.RMA_expression_hg19_current

  • isb-cgc-bq.CCLE_versioned.copy_number_segment_hg19_2016_02

  • isb-cgc-bq.CCLE.copy_number_segment_hg19_current

  • isb-cgc-bq.CCLE_versioned.somatic_mutation_hg19_2016_02

  • isb-cgc-bq.CCLE.somatic_mutation_hg19_current

  • isb-cgc-bq.CCLE_versioned.file_metadata_hg19_2016_03

  • isb-cgc-bq.CCLE_versioned.fastqc_metrics_hg19_2016_03

  • isb-cgc-bq.CCLE.fastqc_metrics_hg19_current

  • isb-cgc-bq.CCLE_versioned.per_sample_file_metadata_hg19_gdc_r14


ISB-CGC BigQuery Table Search Release Notes

To learn about this discovery tool created by the ISB-CGC, please visit ISB-CGC BigQuery Table Search.

For more detailed information about the data stored in ISB-CGC BigQuery tables please visit ISB-CGC BigQuery Tables.

September 8, 2021

New Features

An “Example Joins” column has been added to the Search results. For the table on each row, this column shows the number of example join queries provided by the BigQuery Table Search.

Functionality includes:

  • Click the number in the “Example Joins” column to see a list of examples.

  • From there, click on View Details for a particular example to see the SQL Query and a longer description.

  • On the View Details screen, click on COPY to copy the query to your clipboard.

December 8, 2020 v1.04

New Features

  • On the Filter panel, under Show More Filters, a filter BQ Project has been added. It has also been added to the Column selection dropdown list.

Bug Fixes

  • On multi-select filters (Program, Source, Data Type, Experimental Strategy), the X button to delete the selected value did not completely display. It has been fixed so that the entire X button displays.

July 23, 2020 v1.03

New Features

  • The Access filter has been added, which has options of All, Open Access and Controlled Access. Controlled Access data cannot be previewed, but can be opened in the Google BigQuery Console, if the user has the required permissions.

March 11, 2020 v1.02

New Features

  • Users now have the ability to access, query and inspect in detail BigQuery tables in Google Cloud Platform’s BigQuery console directly from the Table Search UI. Every table has an “Open” option that, when clicked, sends the user to the table in the BigQuery console on the Google Cloud.

  • An Open button, with the same functionality as described above, has been added to the Schema Description section.

  • Program and Experimental Strategy filters were added.

  • Values for the Data Type and Source filters have been modified in order to align more closely with GDC naming conventions.

January 30, 2020 v1.01

New Features

  • A “Name” column consisting of user-friendly descriptive names for the BigQuery tables has been introduced.

  • The Name filter, a free-form text search field, is now available, allowing users to search for all or a portion of the user-friendly descriptive names.

  • Columns can now be added or removed from the display by using the Columns selector option.

  • By default, Dataset ID and Table ID are no longer initially displayed in the full column view, but can be added to the display using the columns selector.

  • The Full ID, which is denoted [projectID.datasetID.TableID] (concatenation of the project ID, dataset ID and the Table ID, each separated by a period symbol) is listed under the detailed table information section found after clicking on the blue plus sign.

  • A Copy button, found adjacent to the Full ID, has been added. The Full ID adheres to BigQuery Standard SQL format and contains the grave accents (`) required for executing SQL queries in BigQuery. When copied to the clipboard, the Full ID can be used directly to run queries in the BigQuery Query Editor without any further manual modification.
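
As an illustration of the Full ID format described above, the following minimal Python sketch assembles a Full ID from its three parts and wraps it in grave accents so that it can be pasted into a Standard SQL query. The function names and the example project, dataset and table names are hypothetical and are not part of the Table Search UI.

```python
def full_table_id(project_id: str, dataset_id: str, table_id: str) -> str:
    """Join the three IDs with periods and wrap the result in grave accents."""
    parts = (project_id, dataset_id, table_id)
    if not all(parts):
        raise ValueError("project, dataset and table IDs must all be non-empty")
    if any("`" in part for part in parts):
        raise ValueError("IDs must not contain grave accents")
    return "`" + ".".join(parts) + "`"


def preview_query(full_id: str, limit: int = 10) -> str:
    """Build a simple query string that uses the quoted Full ID as-is."""
    return f"SELECT * FROM {full_id} LIMIT {limit}"


# Hypothetical names, used only to show the resulting format.
example_id = full_table_id("example-project", "example_dataset", "example_table")
print(example_id)
print(preview_query(example_id))
```

The grave accents matter mainly when an ID contains characters such as hyphens, which is why the copied Full ID includes them by default.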

Enhancements

  • Individual table schemas captured by the “Fields” column in the CSV download now contain field information in comma-separated format.

November 26, 2019 v1.0

Initial Release

The ISB-CGC BigQuery Table Search UI is a discovery tool that allows users to explore and search for ISB-CGC hosted BigQuery tables. It can be accessed directly from the ISB-CGC homepage.

Major features in the initial release include:

  • The ability to search for BigQuery tables by multiple filters:

    • Status

    • Categories

    • Reference Genome Build

    • Source

    • Data Type

    • Dataset ID

    • Table ID

    • Table Description

    • Labels

    • Field Name

  • Display of search results in a tabular format, with the following information about BigQuery tables:

    • Dataset ID

    • Table ID

    • Status

    • Source

    • Data Type

    • Num Rows

    • Created Date

  • Detailed schema information for each table, including full table ID, table description, and field descriptions.

  • The ability to preview the first eight rows in the BigQuery table of choice.

  • The ability to download a CSV format file of search results.


Have feedback or corrections? Please email us at feedback@isb-cgc.org.

Mitelman Database Release Notes

To search this database hosted by the ISB-CGC, please visit Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer.

For more detailed information about the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer, see Mitelman Database.

July 27, 2022

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 73,930

  • Total number of unique gene fusions 33,457

  • Total number of genes involved 14,061

June 6, 2022

Enhancements and New Features

Mitelman Database Now Includes Genomic Coordinates

Until June 2022, genetic location information retrieved from the database was displayed only as karyotypes. Now, genomic coordinates are also displayed. Thanks to procedures incorporated from the web-based tool CytoConverter, karyotypes are converted to genomic coordinates, which the Mitelman Database user can optionally view.

The user has the option of viewing the genomic coordinate information for either individual karyotypes or for multiple karyotypes in a search result. For individual karyotypes, the corresponding chromosome and its start and end position are given. In addition, the type of imbalance (gain or loss) is noted. For multiple karyotypes in the search results, net imbalances across the selected group are displayed in chart, ideogram or tabular format; information includes the chromosome affected, start and end positions, and whether the segment has been lost or gained.

April 18, 2022

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 72,718

  • Total number of unique gene fusions 32,962

  • Total number of genes involved 14,016

January 18, 2022

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 72,421

  • Total number of unique gene fusions 32,855

  • Total number of genes involved 14,022

Enhancements and New Features

Following the recent recommendations of the HUGO Gene Nomenclature Committee (HGNC), the separator in all fusion gene designations has been changed from a forward slash (/) to a double colon (::). This affects the following searches: “Gene Fusions”, “Clinical Associations”, and “Recurrent Chromosome Aberrations”.
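
To illustrate the notation change, a fusion previously written as BCR/ABL1 is now written as BCR::ABL1. The short Python sketch below shows one way to convert older search terms to the new form. The function name is hypothetical, and this is not the database's own implementation.

```python
def to_double_colon(designation: str) -> str:
    """Rewrite 'GENE1/GENE2' as 'GENE1::GENE2'.

    Designations that already use the double colon are returned unchanged.
    """
    if "::" in designation:
        return designation
    # Split on the old separator and rejoin with the new one.
    return "::".join(part.strip() for part in designation.split("/"))


for old in ("BCR/ABL1", "BCR::ABL1"):
    print(old, "->", to_double_colon(old))
```

A helper of this kind may be useful when reusing saved search terms or gene fusion lists that predate the change.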

October 15, 2021

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 72,105

  • Total number of unique gene fusions 32,795

  • Total number of genes involved 14,023

Enhancements and New Features

  • Removed the size limit on the search results: users can perform blank searches to retrieve the full data set.

  • View SQL Statements: users can view and reuse the SQL statement that was used to perform the search.

July 15, 2021

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 71,734

  • Total number of unique gene fusions 32,721

  • Total number of genes involved 14,019

Enhancements

Security enhancement (including Data Tables package version update)

April 15, 2021

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 71,298

  • Total number of unique gene fusions 32,677

  • Total number of genes involved 14,020

Bug Fixes

Gene Fusion Search failed to return results when searching by a gene name containing a hyphen (‘-’), e.g. ARPC4-TTLL3. This has been fixed.

January 15, 2021

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 71,149

  • Total number of unique gene fusions 32,618

  • Total number of genes involved 14,016

October 26, 2020

Bug Fixes

Cases Cytogenetics Searcher: Using the ‘Sole Abnormality’ flag together with a ‘Breakpoint’ entry now returns cases whose karyotype has a sole abnormality with the specified breakpoint.

October 15, 2020

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 70,818

  • Total number of unique gene fusions 32,578

  • Total number of genes involved 14,014

July 15, 2020

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 70,469

  • Total number of unique gene fusions 32,551

  • Total number of genes involved 14,014

April 15, 2020

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer quarterly update.

Updated totals

  • Total number of cases 70,236

  • Total number of unique gene fusions 31,626

  • Total number of genes involved 13,913

Other changes

  • New Mitelman Database Logo

August 27, 2019

Initial Release

  • Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer released on the ISB-CGC platform.

The following searches are available:

  • Cases Cytogenetics Searcher

  • Gene Fusions Searcher

  • Clinical Associations Searcher

  • Recurrent Chromosome Aberrations Searcher

  • References Searcher


The TP53 Database Release Notes

The TP53 Database compiles various types of data and information from the literature and generalist databases on human TP53 gene variations related to cancer. ISB-CGC started hosting this database on October 25, 2021.

  • Database Release Notes are on The TP53 Database application About page.

  • Application Release Notes are on The TP53 Database application Release Notes page.

To search this database hosted by the ISB-CGC, please visit The TP53 Database at https://tp53.isb-cgc.org.

For more detailed information about The TP53 Database, see the documentation.


ISB-CGC Web App Release Notes

September 8, 2021

  • On the ISB-CGC home page, the banner with the link to the survey has been removed.

  • On the Tutorials for Workflow on Google Cloud page, a GeneFlow RNA-seq workflow has been added. (From the ISB-CGC home page, on the Pipelines and APIs box, click Launch. Then on the displayed Pipelines and APIs page, click the Tutorials for Workflow on Google Cloud box.)

  • On the Web App Programs page, the ability for users to upload their own program has been removed.

  • On the Create Cohort – Filters page:

    • A Clear All option has been added to the Cohort Filters panel.

    • Previously, if all the filters under a program name were removed, the program name remained in the Cohort Filters box. It has been changed so that if the last filter for the program is removed, the program name is also removed from the Cohort Filters panel.

  • In the ISB-CGC BigQuery Table Search results, an “Example Joins” column has been added. This column shows the number of example join queries that the BigQuery Table Search provides for the table on that row. Functionality includes:

    • Click the number in the “Example Joins” column to see a list of examples.

    • From there, click on View Details for a particular example to see the SQL Query and a longer description.

    • On the View Details screen, click on COPY to copy the query to your clipboard.

July 19, 2021

  • The Warning Notice about accessing a government website that should pop up when ISB-CGC is accessed was missing. It has been reinstated.

  • There was an issue loading the Variable selection page. This has been fixed.

  • On the Create and Edit Variable screens, the data set/programs tabs have been changed to a drop down.

  • In some cases when creating a cohort with multiple programs, the Data Set Clinical Features panel was not displaying the appropriate information. This has been corrected.

  • The system now logs when users refresh their 24-hour access at DCF so that this can be monitored.

June 24, 2021

New Features

  • The Citations link on the top-level menu has been replaced with a Publications link. The page now includes ISB-CGC publications and posters, as well as citations.

  • The Cohort Builder UI and Cohort Details UI have been enhanced so that additional node-based GDC and PDC Programs can be selected and edited. The Programs now available are:

    • GDC

      • TCGA

      • TARGET

      • CCLE

      • BEATAML1.0

      • FM

      • MMRF

      • OHSU

    • PDC

      • Georgetown Proteomics (GPRP)

  • On the Cohort Builder, the Selected Filters panel has been replaced with the Cohort Filters panel. This panel displays all selected filters for all selected Programs.

April 13, 2021

New Features

  • On the home page, the title in each box is now clickable and will bring the user to that function.

  • A Citations link has been added to the top-level menu. Papers which reference ISB-CGC are listed.

  • In the Cancer Data File Browser, when viewing a cohort that only has HG19 data, the Build filter will automatically be set to HG19.

Bug Fixes

  • Fixed the following issues in the Cancer Data File Browser:

    • When selecting Build of HG19 and Data Format of Zip, incorrect results were displayed.

    • When selecting Build of HG19 and Program of TCGA, the incorrect number of entries was displayed under the File Listing.

    • When selecting Build of HG19 and Data Format of Raw sequencing data, the number of entries displayed under the File Listing was off by one.

February 22, 2021

New Features

  • In the Cohort Builder (via filters), there is a new option to hide filter attributes with zero counts.

  • In the Cancer Data File Browser - Pathology Image Viewer, Pathology Report Viewer and Radiology Image Viewer, login is no longer required.

  • A survey link has been added to the ISB-CGC home page.

  • On the home page, the icons next to the title in each box are now clickable and will bring the user to that function.

  • Security update.

Bug Fixes

  • In the Cancer Data File Browser - Radiology Image Viewer, error handling for records with a missing disease code or project short name has been added.

  • In the Cancer Data File Browser, the display of the ‘Next’ button for pagination has been fixed.

  • In the Cohort Builder (via filters), inaccurate filter and sample counts for numeric range type filters (e.g. Age at diagnosis) were corrected.

December 8, 2020

New Features

  • In the Cancer Data File Browser, BEATAML1.0 has been added as a choice under filter Program Name. It has also been added to the Cohort Builder/Data Explorer.

  • On the ISB-CGC home page, a Contact Us page has been added under the Help dropdown menu item.

  • On the ISB-CGC home page, a step by step guide called How to Discover Cancer Data through ISB-CGC has been added.

  • A maximum of six programs/datasets can now be registered to a service account at one time, in line with the DCF implementation.

  • Bootstrap library upgraded from version 3.3.1 to 3.4.1.

Known Issues

  • Work is underway to rework our cohort creation page to better display images associated with samples.

  • The user data upload feature will return an error message stating, “Error submitting response : Could not connect to data upload server.”

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

August 19, 2020

New Features

  • ISB-CGC has a new home page, which prominently features ISB-CGC Data Browsers and Resources.

  • A Cancer Data File Browser is now available directly from the ISB-CGC home page. It is similar to the existing File Browser within the Web App, except:

    • Sign in is not needed.

    • It is not dependent on cohorts built through the Web App.

    • Output can be downloaded to CSV. To download to Google Storage Buckets or Google BigQuery tables, the user must sign in.

    • Program filter has been added.

  • The ISB-CGC home page includes a Programmatic API section. The Launch functionality includes links to:

    • ISB-CGC API;

    • Tutorials for Workflow on Google Cloud;

    • Comparison of Workflow Languages.

  • All NA filters have been renamed to None in both the File Browser page and the Cohort Builder page.

  • User details page has been modified to include more icons and buttons. The essential functionality and contents are not modified.

Known Issues

  • Work is underway to rework our cohort creation page to better display images associated with samples.

  • The user data upload feature will return an error message stating, “Error submitting response : Could not connect to data upload server.”

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

July 23, 2020

New Features

  • To increase system speed when filtering cohorts, metadata counting has been switched from MySQL to Apache Solr.

  • The WebApp is now performing its data retrieval and counts on ISB-CGC Google BigQuery tables which are based on the latest GDC data release. This means that you will see current data, but that the same queries in the WebApp could produce different results if they were run during different time periods, when the WebApp was based on different GDC data releases.

  • On the Create Cohorts – Filters page, the left-hand filter panel now displays the number of cases available for each filter, instead of the number of samples.

  • Within the Cohort Details page, on the Current Filters panel, when there are more filters than fit on the initial screen, the selected cohort filters are now displayed in a gradient (fade-away) overlay instead of a clipped design.

  • The video tutorials have been moved to the ISB-CGC YouTube channel.

Bug Fixes

  • Clicking on the X on an existing cohort filter token in the Selected Filter panel did not delete the existing cohort filter token. (This issue was caused by a jQuery update.) It has now been fixed.

  • When a user tried to register for controlled access for 12 or more programs, an error from the Data Commons Framework (DCF) occurred. This was fixed by limiting the number of controlled access programs that a user can register for at one time to six.

Known Issues

  • The Program filter is listing ‘NA’ as an option.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

  • The user data upload feature will return an error message stating, “Error submitting response : Could not connect to data upload server.”

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

May 27, 2020

New Features

  • The opt-in (subscription) form now has an “Ask me later” option.

  • A link (https://isb-cgc.appspot.com/opt_in/form_reg_user/) to the opt-in page is now provided. The link first prompts users to log in with their Google ID (if they are not already logged in). After login, the feedback page opens.

Bug Fixes

  • When writing and saving a comment in the cohort details or worksheet sections, the system displayed underlying code (such as escape characters) along with the text entered in the Comments panel. This has been corrected.

  • Some data results were not displaying when working with OncoGrid due to it being unable to handle the amount of data being processed. This has been fixed.

  • All plotting components under the Plot settings should be disabled when a user views a shared workbook; however, ‘Plot by’ and ‘Plot as Log’ were not. This has been fixed.

  • On analysis plots for workbooks, sometimes the y-axis tick marks would overlap the y-axis label when using the zoom out feature. This has been fixed.

  • On the Create Cohorts - Filters page, when using the program TARGET with the filter Days to Birth, the Total Number of Cases and Total Number of Samples were not displaying. Also, the Save As New Cohort button was disabled. This has been corrected.

Known Issues

  • Work is underway to rework our cohort creation page to better display images associated with samples.

  • The user data upload feature will return an error message stating, “Error submitting response : Could not connect to data upload server.”

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

April 16, 2020

New Features

  • The Cohort Creation by Filter builder is now accessible without having to log in to ISB-CGC.

  • The Cohort, Workbooks, and Gene and Variable Favorites lists are now paginated to display 10 to 15 records at a time.

  • The ‘To complete this analysis’ section on the workbook creation page has changed from a checklist to an interactive tool. After each step is completed, its icon changes from an orange arrow to a green checkmark.

  • A link ‘Learn more about our available Analyses’ was added next to the Analysis Type selection field. Clicking on this link opens up a screen with a detailed explanation of all the analysis options.

Bug Fixes

  • On the File Browser, the search by CASE filter on the Radiology Images tab has been fixed.

  • ‘How to Cite Us’ text on the Home page has been updated to reflect the entire ISB-CGC platform.

  • When using a workbook, if you completely zoomed out of a plot, the chart was being reduced to half of the screen. This has been corrected.

Known Issues

  • Work is underway to rework our cohort creation page to better display images associated with samples.

  • The workbook zoom-out feature will cause text overlap in the y-axis panel of analysis.

  • The user data upload feature will return an error message stating, “Error submitting response : Could not connect to data upload server.”

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

March 11, 2020

New Features

  • An Opt-in page was created for the user to sign up for ISB-CGC announcements.

Bug Fixes

  • The ISB-CGC API endpoint DELETE/cohorts/{cohort_id} now deletes only cohorts owned by the authenticated user.

Known Issues

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

January 30, 2020

The following datasets (open and controlled access) have been added to the ISB-CGC for service account registration:

  1. Genomics Evidence Neoplasia Information Exchange (GENIE)

  2. The Pancreas Cancer Organoid Profiling (ORGANOID)

  3. The Multiple Myeloma CoMMpass Study (MMRF)

  4. Burkitt Lymphoma Genome Sequencing Project (CGCI)

  5. Acute Lymphoblastic Leukemia - Phase I (TARGET-ALL-P1)

  6. Acute Lymphoblastic Leukemia - Phase II (TARGET-ALL-P2)

  7. Functional Genomic Landscape of Acute Myeloid Leukemia (BEATAML1.0-COHORT)

New Features

  • The File Browser now shows cancer names under the Disease Code filter in the left panel.

Bug Fixes

  • The Cohorts share button is now enabled from the cohorts list page.

  • In the Cohort Builder filters, the Pathologic Stage filter values now display in the correct format.

  • Adding a gene & miRNA variable favorite list from the menu bar selection is now enabled.

November 26, 2019 v1.21

New Features

APIs

  • Endpoint GET/data/available/registration lists all possible open and controlled programs available for registration with a service account.

  • Endpoint GET/data/available/cohorts lists all possible programs and projects available for creating a cohort.
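
The sketch below shows one way these two endpoints might be called from Python using only the standard library. The base URL is a placeholder rather than the real API root, and the helper names are hypothetical. Whether a request needs authentication and what the JSON response contains are not specified here, so the code makes no assumption about either.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder base URL (an assumption): replace with the actual ISB-CGC API root.
BASE_URL = "https://api.example.org/v4/"

# Endpoint paths as named in the release notes above.
ENDPOINTS = {
    "registration": "data/available/registration",
    "cohorts": "data/available/cohorts",
}


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Return the full URL for one of the two listing endpoints."""
    base = base_url.rstrip("/") + "/"  # make sure the base ends with one slash
    return urljoin(base, ENDPOINTS[name])


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON body of the response."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URLs does not touch the network.
print(endpoint_url("registration"))
print(endpoint_url("cohorts"))
```

Calling fetch_json(endpoint_url("cohorts")) against the real API root would then return the decoded response for inspection.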

Known Issues

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

August 27, 2019 v1.20

The following datasets (open and controlled access) have been added to the ISB-CGC for service account registration:

  1. The Human Cancer Models Initiative (HCMI)

  2. The Functional Genomic Landscape of Acute Myeloid Leukemia (BEATAML1.0)

New Features

  • ISB-CGC APIs have been updated to a Swagger user interface as well as Google Endpoints OpenAPI, now known as APIsv4.

Known Issues

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

July 18, 2019 v3.19

The following datasets (open and controlled access) have been added to the ISB-CGC for service account registration:

  1. The Clinical Proteomic Tumor Analysis Consortium (CPTAC)

New Features

Workbooks

  • The Edit plot settings feature provides the ability to plot by either case or sample barcode count for bar chart, histogram, scatter plot, violin plot, and cubby hole plot analyses.

  • Detailed information provided by dbGaP is now shown for every program available when registering a Google service account.

Known Issues

  • Analysis Type: SeqPeek formatting is elongated on occasion.

  • When a Cohort is shared, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this issue after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale does not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • Diagnostic images on the File Browser page have no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

April 25, 2019 v3.18

The following datasets (open and controlled access) have been added to the ISB-CGC for service account registration:

  1. The National Cancer Institute Center for Cancer Research (NCICCR)

  2. Foundation Medicine (FM)

  3. Clinical Trial Sequencing Project (CTSP)

  4. Veterans Research for Precision Oncology Program (VAREPOP)

  5. Acute Lymphoblastic Leukemia - Phase III (TARGET-ALL-P3)

Enhancements

  • When working with an OncoGrid, OncoPrint, or SeqPeek plot on a workbook, you will receive an automated list of genes ready for analysis.

  • When on an additional workbook, text has been added to guide the user to select edit plot settings to choose a gene/miRNA/variable filter and cohort to be used in the selected analysis.

  • The Workbook comments section has been reformatted to better align with analysis displayed.

  • On the cohort creation - filter page, the filters have been updated in the left filter panel to specify the count type displayed (samples).

Bug Fixes

  • Clicking on a legend entry to toggle the display of the data points on a scatter or violin plot will now work correctly, even if the legend text has a space.

  • Plotting with sample type filter on a workbook will now display counts correctly.

  • When working with the color by feature on either a Scatter plot or a Violin plot, the numerical values are now displayed as a color-gradient legend.

  • When using a workbook with OncoGrid analysis you are now able to plot using genomic build hg19.

  • When using a workbook with a Cubby Hole plot analysis, text is no longer cut off when using sample type or residual tumor as a filter.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • On the File Browser page for Diagnostic images, there is no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

March 8, 2019 v3.17

Enhancements

  • Workbooks have received many overall improvements to user functionality.

  • Cubby hole plot analysis has been reformatted to allow resizing and scrolling through the plot.

  • You are now able to work on a workbook in full-screen mode.

  • You are also now able to download plot data for Bar charts, Histogram charts, Scatter plots, Violin plot charts, and Cubby hole plots as a CSV file.

  • OncoGrid has been added as an analysis option when working with a workbook.

  • On the File Browser section you are now able to use full screen on all image viewers.

  • On the register/adjust a service account page, we’ve clarified the notification message shown if a key or role is found associated with a service account.

Bug Fixes

  • When using a workbook you will no longer see text overlap when working on a violin/scatter plot with the color by feature sample type as filter option.

  • When working on the Pathology images viewer you will no longer see text overlap on the top right hand side of viewer.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • On the File Browser page for Diagnostic images, there is no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

January 22, 2019 v3.16

Enhancements

  • On the Gene list creation page, you can now upload line separated and tab separated gene lists to be used for analysis.

  • We have made some updates to the workbooks plotting section.

  • You are now able to redraw to the original plot after any changes.

  • Plots are now able to be saved as a .SVG, .PNG, or .JSON file.

Bug Fixes

  • On the cohort creation page using the barcode upload feature, the table page list feature is now displayed properly.

  • If you have not linked to the Data Commons Framework at all you are able to unregister a Google Cloud Project. If you are not linked to the Data Commons Framework, but others in the Google Cloud project are, only they will be able to unregister the GCP.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • On the File Browser page for Diagnostic images, there is no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

December 5, 2018 v3.15

Enhancements

  • The ISB-CGC homepage has been updated to provide Funding and Partnership information, and the About Us section is now hidden by default.

  • An introduction video has been added to the video tutorials section. This video covers the user interface, BigQuery, and using the API endpoints.

  • Funding information has been updated on the ISB-CGC homepage.

  • On the Register/Adjust a service account page all spacing issues have been addressed.

  • On the Register/Adjust a service account pages you are now returned more detailed information. You will be returned verification results for all users on the Google Cloud Project, datasets permissions verification, registered service account verification results, and all service accounts verification results.

  • On the File Browser page, when working with a cohort that includes CCLE data and genomic build hg38 is selected, a notification message is now displayed for the CSV export button.

  • On the File Browser a new column has been added for File Size for all tabs.

  • When exporting a large cohort on the File Browser page, a notification message now states that the cohort export is underway and that you should check BigQuery in a few minutes.

  • On the File Browser you are now able to view/download/print Pathology Reports in pdf format.

  • On the Pathology Images viewer, the GDC has released multiple versions of slide barcodes. To handle this we now sort the pathology image files by UUID.

  • On the File Browser for Radiology Images, ISB-CGC has upgraded the viewer to run OHIF for better performance times and views.

Bug Fixes

  • When working on the File Browser export to BigQuery/Google Cloud Storage entering an invalid name will disable the export feature, even after toggling between datasets.

  • When on a Workbook, running an OncoPrint analysis with certain genes that have no gene positions will now return the correct error message stating that no internal feature ID was found.

  • Gene names that include the symbol ‘_’ will now return data points when working with a Workbook.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • On the File Browser page for Diagnostic images, there is no GDC file UUID associated with them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • Work is underway to rework our cohort creation page to better display images associated with samples.

September 20, 2018 v3.14

Enhancements

  • When on the File browser page, the case barcode column is included when downloading the file manifest CSV format option.

  • You will now need to log into the Data Commons Framework to be able to access controlled data.

Bug Fixes

  • API endpoint cohort.creation will no longer include NULL values in sample counts when cohort is created.

  • On the File Browser tab using filter option NA will now return all entries associated to it.

  • New miRNA data for the TCGA and TARGET programs, based on GDC release 11, is now available in Google BigQuery and for plotting.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • On the File Browser page for Diagnostic images there is no GDC file UUID associated to them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • When using a workbook, a gene with symbol “_” will produce an error message saying, “There was an error retrieving plot data. Please try again.”

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

July 31, 2018 v3.13

Enhancements

  • When working on the File Browser, you now have the ability to search by case barcode on all tabs (Pathology Images, Radiology Images, IGV Browser, All Files).

  • On the File Browser page for the Pathology Images tab, you can now also filter by Disease Code, Data Format, and Data Type. For the Radiology Images, a disease code was added.

  • On the File Browser page, you now have the ability to hide the filters and expand the file list to full width.

  • On the File Browser page, if you download the file manifest using the export CSV feature, you will see newly updated file paths. The older paths are still in existence but will be deleted within the next month.

  • On the File Browser page, if you use a cohort with CCLE data present, switch to build hg38, and attempt to export, you will receive a notification that no CCLE data will be present for build hg38.

  • On the homepage, we have added a carousel scrolling feature for all how-to videos for easy access.

  • A description has been added to all video tutorials.

  • The menu bar text variable favorites have been updated to be undifferentiated.

Bug Fixes

  • When creating a cohort using the filter selection option, if the filter options selected add up to zero the save cohort button will be disabled.

  • A workbook with both user-uploaded data and public data (e.g., TCGA data) will now plot any analysis.

  • For the export to GCS and BigQuery feature the export button will now disable when an invalid name is given.

  • On a registered Google Cloud Project detail page, datasets can no longer be duplicated within a project, and bucket names are globally unique (across all projects).

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • API endpoint cohort.creation will include NULL values in sample counts when the cohort is created.

  • On the File Browser page for Diagnostic images, there is no GDC file UUID associated to them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • When downloading the CSV file for the Radiology Images tab on the File Browser page, you will notice there are no sample barcodes associated with Radiology Images. ISB-CGC will add a case barcode to the CSV file export table in the next release.

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

June 18, 2018 v3.12

Enhancements

  • The ISB-CGC has enabled OncoPrint visualization tool for germline mutations (codebase obtained with permission from cBioPortal) as another Workbook analysis tool. For more information please go here.

  • You are now able to view Radiology Images from TCIA data through the File Browser using the Osimis viewer. For more information please go here.

  • Two new videos have been added to our video tutorials section. You can now learn how to sign up with a Google account and how to make a gene list easily. For more information please go here.

  • The Dashboard has been upgraded to include a collapse feature for all panels (workbooks and cohorts are opened by default) and a direct link to the File Browser has been added to the Cohorts panel.

  • Under cohort creation by filters, the Molecular tab for TCGA data has been upgraded to combine multiple gene mutation filters. Filters can be combined using AND (all filters must be met for the data to be displayed) or OR (at least one criterion must be met for the data to be displayed).

  • The CSV download, Export to BigQuery, and Export to GCS features have been added to the IGV Browser, Pathology Images, and Radiology Images tabs on the File Browser.

  • On the File Browser All files tab the clinical filter now displays the accurate count available for analysis.

  • The File Browser has been upgraded to now include the option of which columns to display and the ability to jump to any page.

  • The site menu has been improved to allow faster load times and better overall performance. Please Note that Workbooks must now be created from a data source (Cohorts, Variable lists, Gene & miRNA lists) or from the Workbook list page.
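
As a rough illustration of the AND/OR behavior described for the Molecular tab above, the following Python sketch combines several filter predicates. The record layout, the gene names, and the helper names (combine_filters, has_mutation) are hypothetical and are not part of the ISB-CGC Web Application or its API; the sketch only shows the logic of the two modes.

```python
def combine_filters(filters, mode="AND"):
    """Return a predicate that combines several filter predicates.

    AND: a record passes only if every filter matches.
    OR:  a record passes if at least one filter matches.
    """
    if mode == "AND":
        return lambda record: all(f(record) for f in filters)
    if mode == "OR":
        return lambda record: any(f(record) for f in filters)
    raise ValueError(f"Unknown mode: {mode!r}")


def has_mutation(gene):
    """Build a filter that matches records with a mutation in `gene`."""
    return lambda record: gene in record["mutated_genes"]


# Hypothetical records for illustration only (not real ISB-CGC data).
cases = [
    {"case": "case-1", "mutated_genes": {"TP53", "KRAS"}},
    {"case": "case-2", "mutated_genes": {"TP53"}},
    {"case": "case-3", "mutated_genes": {"EGFR"}},
]

filters = [has_mutation("TP53"), has_mutation("KRAS")]

match_all = combine_filters(filters, "AND")
match_any = combine_filters(filters, "OR")

both = [c["case"] for c in cases if match_all(c)]
either = [c["case"] for c in cases if match_any(c)]

print(both)    # ['case-1']
print(either)  # ['case-1', 'case-2']
```

With AND, only the case carrying both mutations is kept; with OR, any case carrying at least one of them is kept.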

Bug Fixes

  • When working on Firefox browser a violin plot will display the data plotted correctly when working on a Worksheet.

  • For a cohort with both user-uploaded data and public data in our system (e.g., TCGA data), the selected filters panel on the cohort details page will now sort the filters by their appropriate program.

  • On the cohort creation - barcode upload page the ‘Samples’ and ‘Cases’ column headers were sometimes swapped. This has been corrected.

  • When trying to reload a stored Seq-Peek plot from a Workbook the previous gene selection is stored and the plot will automatically be loaded.

  • On the File Browser IGV Browser tab when switching genomic builds the view column selection option will be disabled.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • API endpoint cohort.creation will include NULL values in sample counts when cohort is created.

  • Duplicate entries can be entered for the register a dataset and the register a bucket on the Google cloud project details page.

  • On the File Browser page for Diagnostic images there is no GDC file UUID associated to them.

  • Sharing a workbook with someone else will cause the analysis to reset.

  • A Workbook using a cohort that has user uploaded data and public TCGA data present will not return data for any analysis.

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

May 3, 2018 v3.11

Enhancements

  • The export to BigQuery feature has been enhanced to provide faster processing times for larger cohorts (e.g., more than 30,000 samples and more than 65,000 file records).

  • You are now able to export cohort and cohort file manifests to Google Cloud Storage using either .JSON or .CSV format from the cohort details page and from the File Browser page.

  • We have enhanced our instructions associated with buttons to further provide directions to the end-users.

  • On the File Browser page it is now possible to change how many entries are displayed at a time, as well as sort columns by clicking on the column header.

  • Google Cloud Project membership is now automatically updated every six hours. If you are adding someone new to the project they will be able to use the project after six hours maximum without someone having to log in and manually refresh the project.

Bug Fixes

  • You can no longer share a cohort with yourself (the email address you are currently logged in with), which previously caused the File Browser page to be disabled.

  • DNA methylation has been re-enabled to be used with hg38 and hg19 data when working with workbooks and plotting.

  • Sharing inputs have had their security restrictions tightened. This also includes the registering a service account page.

  • On the File Browser page when downloading the file manifest via the CSV button you are no longer able to re-select the CSV button while the file is building.

  • On the File Browser tab if you toggle between entries pages on the All Files tab it will not affect the IGV tab or Pathology Images tab entries counts display.

  • On the File Browser page you can now freely toggle between entries pages with no errors displayed.

  • On the File Browser page, selecting filters from the left-hand side while exploring pages will no longer crash the page and require you to go back or refresh to fix it.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • In the Firefox browser, a violin plot does not display the plotted data correctly when working on a Worksheet.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • API endpoint cohort.creation will include NULL values in sample counts when cohort is created.

  • Duplicate entries can be entered for the register a dataset and the register a bucket on the Google cloud project details page.

  • For a cohort with both user-uploaded data and public data in our system (e.g., TCGA data), the selected filters panel on the cohort details page does not properly display the filters selected.

  • On the File Browser page for Diagnostic images there is no GDC file UUID associated to them.

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

April 2, 2018 v3.10

Enhancements

  • When working with the File List table you can now Export the cohort file list to BigQuery for later analysis.

  • When registering or adjusting a service account to use controlled data, the page will no longer briefly appear as if no datasets had been selected. This should reduce confusion.

  • Selecting the refresh project button from a registered Google Cloud Project details page will leave you on the details page rather than redirecting you to the registered Google cloud project list table page.

  • On the cohort creation page, using the barcode upload page, the valid/invalid entries table can now be sorted on any column in either ascending or descending order.

  • Removing someone from the IAM and Admin list does not remove them from the web-app automatically. If the removed user still has the GCP present in their webapp interface attempting to register or refresh a service account will remove the GCP from the web app, and a display message informing them they are no longer a member of the project will be seen.

  • When working with any tables that can be sorted on smaller screens, there is no longer any text overlap in the table columns.

  • Character restrictions have been relaxed; you can now use characters such as []{}(); for entity names and descriptions.

Bug Fixes

  • SeqPeek and CNVR can only be plotted with TCGA data, but if a cohort contains no TCGA samples the SeqPeek analysis will now return an error message saying, “The chosen cohorts do not contain samples from programs with Gene Mutation data.”

  • API endpoint samples.get can now be used to return data for all three programs.

  • On the adjust service account page, when attempting to remove the service account from being able to access controlled data, and then immediately trying to add the service account back to controlled data, the system will require you to verify the service account’s users again.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • In the Firefox browser, a violin plot does not display the plotted data correctly when working on a Worksheet.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • On the cohort File List Browser page, while you are downloading CSV files, other filters can be selected.

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

February 28, 2018 v3.9

Enhancements

  • On the register a Google Cloud Project page, you can now register only the project ID. Registering the project name or project number will now result in an error message. Additionally, the GCP Project Name and ID will now both display on the GCP detail and list pages, and refreshing a GCP Project in the Web Application will update the Name if it was changed in the GCP console.

  • For cohort creation via sets of barcodes, the barcode set (pasted in the text box or uploaded as a file) can now be a simple list of sample or case barcodes separated by newlines, commas, or tabs; the program listing is no longer needed, and you don’t need to supply the barcodes in a distinct columnar format. The previous 3-column format will continue to work as well.

  • On a worksheet, if no table is being searched the BQ table(s) used panel becomes inactive.
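
As a minimal sketch of the relaxed barcode format described above, the following Python snippet splits pasted or uploaded text on newlines, commas, or tabs. The function name parse_barcode_set, the de-duplication step, and the example barcodes are assumptions made for illustration; they do not describe how the Web Application itself parses barcode sets.

```python
import re


def parse_barcode_set(text):
    """Split text into an ordered, de-duplicated list of barcodes.

    Barcodes may be separated by newlines, commas, or tabs (or any mix).
    Dropping duplicates is an assumption of this sketch.
    """
    # Any run of newline, carriage return, comma, or tab acts as one separator.
    tokens = re.split(r"[\n\r,\t]+", text)
    seen = set()
    barcodes = []
    for token in tokens:
        barcode = token.strip()
        if barcode and barcode not in seen:
            seen.add(barcode)
            barcodes.append(barcode)
    return barcodes


# Illustrative barcodes only; mixed separators and one repeated entry.
example = "TCGA-02-0001\nTCGA-02-0003,TCGA-02-0006\tTCGA-02-0001\n"
print(parse_barcode_set(example))
# ['TCGA-02-0001', 'TCGA-02-0003', 'TCGA-02-0006']
```

A single regular expression covers all three separators, so a list copied from a spreadsheet column, a comma-separated line, or a tab-delimited row is handled the same way.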

Bug Fixes

  • When editing the name of a cohort the cancel feature is now working properly.

  • When working on a worksheet the SeqPeek feature will now work with all genes.

  • All genes can be plotted on a worksheet when working with a histogram.

  • When registering Service Accounts for controlled data, the Adjust/Register button can only be clicked once.

  • When working with SeqPeek, the BQ table(s) used panel will now refresh every time even if no new data is plotted.

  • When a user is removed from their Google project the user interface doesn’t remove the project from their list. Instead, the individual removed will receive error messages saying they are no longer on the project if they try to refresh the project or register the service account.

  • On a registered Google Cloud Project page, the refresh button will now properly add and remove users from the project if they are added or removed from the IAM and Admin list on the Google console.

  • When working in Internet Explorer, you can again create a cohort using the filter creation page.

  • When using the dbGaP eRA authentication, you will now be logged out at 24 hours instead of 16 hours.

  • For cohort creation, when uploading a large set of barcodes you will no longer receive a 400 bad request error.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • In the Firefox browser, a violin plot does not display the plotted data correctly when working on a Worksheet.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • SeqPeek and CNVR can only be plotted with TCGA data, but if a cohort contains no TCGA samples the SeqPeek analysis will still search the TCGA BigQuery tables.

  • API endpoint samples.get is currently down and will return a 503 error for all three programs.

  • On the File Browser page, while you are downloading CSV files, other filters can be selected.

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

February 1, 2018 v3.8

Enhancements

  • We have enabled DNA methylation data to be used when plotting with genomic build hg38.

  • The cohort view files page has been updated to File Browser. The File Browser page also now has new filters data level, data type, disease code, data format, and experimental strategy. A time stamp has also been added to the CSV file that can be downloaded.

  • The IGV browser and caMicroscope are now more clearly defined and separated on the File Browser page.

  • When uploading a set of barcodes to create a cohort the error message has been redefined to direct someone to the instructions.

Bug Fixes

  • You can now plot DNA methylation data using genomic build hg19 when working on a worksheet.

  • When registering a service account to controlled data you will no longer receive an error message when certain Google managed service accounts are also on the IAM and Admin page.

  • On a worksheet, if you add new cohorts to a worksheet with pre-existing cohorts, both the older and the newly added cohorts are now present on the worksheet for analysis.

  • When working with a worksheet you are now able to plot gene names that contain periods.

Known Issues

  • You cannot make a cohort using the cohort creation filter option on an Internet Explorer browser.

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • If a user shares a Cohort, neither the owner nor the person granted access to the Cohort will receive a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, the log scale will not function properly.

  • The complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • In the Firefox browser, a violin plot does not display the plotted data correctly when working on a Worksheet.

  • When uploading TARGET files using the cohort barcode creation feature from the GDC, you may get an invalid barcode error message and be unable to upload all the barcodes.

  • SeqPeek can only be plotted with TCGA data, but if a cohort contains no TCGA samples the SeqPeek analysis will still search the TCGA BigQuery tables.

  • API endpoint samples.get is currently down and will return a 503 error for all three programs.

  • TARGET data currently cannot be used with the IGV browser to view .bam files.

  • When editing the name of a cohort the cancel feature is not working properly.

  • When working on a worksheet the SeqPeek feature is currently not working with certain genes.

  • Certain genes will produce a blank chart with no data on a worksheet when working with a histogram.

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

December 20, 2017 v3.7

Enhancements

  • Using the ‘View Files’ page you can now view TCGA pathology images using caMicroscope!

  • After logging into dbGaP you are now redirected to the user details page.

  • Due to recent updates with Google, we have implemented new security requirements when working with service accounts and attempting to access controlled data. For more information about the new requirements please go here.

Bug Fixes

  • You will no longer experience a 502 error when trying to create a new variable favorite list if you have uploaded a lot of your own data using the user data upload feature.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • When a user shares a Cohort, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, it does not function properly.

  • The Complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • In the Firefox browser, a violin plot on a Worksheet does not display the plotted data correctly.

  • When working on a workbook, if you add new cohorts to the worksheet, the pre-existing cohorts will be de-selected from the worksheet.

  • If you have uploaded a lot of data using the User Data Upload feature, you will likely see a 502 error page when attempting to create a new variable favorite list.

  • When uploading TARGET files from the GDC using the cohort barcode creation feature, you may get an invalid barcodes error message and be unable to upload all the barcodes.

  • Work is underway to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

November 20, 2017 v3.6

Enhancements

  • You can now send a cohort you have created in the web application to a new BigQuery dataset or append it to an existing table.

  • The feature for creating a cohort by uploading barcodes has been extended to include .JSON and .TSV files from the Genomic Data Commons data portal.

  • Created a new API endpoint that returns a GCS object URL given a GDC file identifier, also known as a UUID.

  • Updated the registered Google Cloud Project to clearly state whether the project’s service accounts are active.

  • You can now enter special characters into the comments section for workbooks and cohorts, e.g. a URL.

  • On the Register a Service Account page, the Compute Engine default service account is now automatically added to the enter service ID text box.

  • When creating a new cohort, the message “Creating cohort…” is now displayed for instances when creation takes a little longer than usual.

  • We have significantly sped up loading times for the cohort details and cohorts table list pages for users who have 50+ cohorts, which previously caused slow loading.

Bug Fixes

  • An exact duplicate of the cohort will no longer be created when you select the confirmation multiple times while the page is loading during Set Operations.

  • On the cohort details page, you can no longer select the clinical feature panel and edit filters without selecting the edit button first.

  • On the cohort creation page, you can now use the clinical feature panel to select filters when working with the User data upload tab.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • When a user shares a Cohort, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, it does not function properly.

  • The Complement set operation for existing Cohorts is exceptionally slow.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • In the Firefox browser, a violin plot on a Worksheet does not display the plotted data correctly.

  • When working on a workbook, if you add new cohorts to the worksheet, the pre-existing cohorts will be de-selected from the worksheet.

  • If you have uploaded a lot of data using the User Data Upload feature, you will likely see a 502 error page when attempting to create a new variable favorite list.

  • When working with the API endpoints, sample.get for all three programs will return a 503 internal server error.

October 13, 2017 v3.5

Enhancements

  • You can now upload sample and case identifiers from programs TCGA, CCLE and TARGET to create a cohort.

  • You can now add or remove a service account with a new button, instead of having to re-register the service account every time.

  • The Set Operations feature for cohorts has been enhanced and is now easier to work with.

  • The Set Operation Complement feature now creates a cohort faster than before.

  • Mouse-over text is now displayed for the New Workbook, Delete, Set Operations, and Share buttons on the Cohorts list details page.

  • The About Us link in the top left of the page has been re-named to Homepage.
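
The Set Operations on cohorts mentioned above, including Complement, can be pictured as ordinary set operations over sample barcodes. The following is only a minimal sketch under that assumption: the function names and the barcodes are hypothetical, Complement is assumed to mean "samples in the base cohort that are not in any subtracted cohort", and none of this is the Web Application's actual implementation.

```python
def union(*cohorts):
    """Samples that appear in at least one of the given cohorts."""
    return set().union(*cohorts)


def intersect(*cohorts):
    """Samples that appear in every one of the given cohorts."""
    if not cohorts:
        return set()
    return set.intersection(*(set(c) for c in cohorts))


def complement(base, *subtracted):
    """Samples in the base cohort that are in none of the subtracted cohorts (assumed meaning)."""
    return set(base).difference(*subtracted)


# Hypothetical cohorts, each represented as a set of sample barcodes.
cohort_a = {"S1", "S2", "S3"}
cohort_b = {"S2", "S4"}

combined = union(cohort_a, cohort_b)
shared = intersect(cohort_a, cohort_b)
only_in_a = complement(cohort_a, cohort_b)
```

Representing a cohort as a set makes the relationship between the three operations easy to check by hand before running the equivalent operation in the user interface.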

Bug Fixes

  • All bam files for the TARGET program are now available for use with the IGV browser.

  • On the Cohort creation page, you can now select a filter for your Cohort by selecting an option from the Clinical Feature graphs using Histological Type for program CCLE.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • When a user shares a Cohort, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, it does not function properly.

  • The Complement set operation for existing Cohorts is exceptionally slow.

  • An exact duplicate of the cohort is created when you select the confirmation multiple times while the page is loading during Set Operations.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • In the Firefox browser, a violin plot on a Worksheet does not display the plotted data correctly.

  • We need to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

September 21, 2017 v3.4

Enhancements

  • When plotting, certain values are now displayed as categorical where they were previously displayed as numerical, e.g. Tobacco Smoking History.

  • The Homepage has been updated to incorporate links for TARGET and CCLE programs.

  • The extended list of programs and projects on the new User Uploaded Data creation page is now displayed in alphabetical order.

  • On the user details page you are now shown a confirmation box when you attempt to unlink the NIH identity account associated to the Google Identity you originally logged in with.

  • When working with Workbooks, a table at the top right-hand side of the Worksheet now shows which BigQuery tables the displayed information comes from.

  • On the Cohort creation page you can now select a filter for your Cohort by selecting an option from the Clinical Features graphs.

  • On the user details page, if you attempt to associate your Google Identity with an NIH Identity that is already registered in the system to another Google Account, you are given a yellow error message stating which email the NIH Identity is already associated with.

Bug Fixes

  • When working with Workbooks, the log scale graphing option is now saved when a user comes back to the Worksheet at another time.

  • On the existing Cohorts table list page, the confirmation delete ‘blue x’ button will now remove a selected Cohort if you select another option, e.g. Set Operation.

  • The Google Cloud Project details page refresh wheel and delete icon are now working properly for service accounts.

  • The Cloud Project details page now lists the authorized datasets active with an associated service account.

  • When you delete a User Uploaded program, you are now sent to the existing programs list page. When you delete a project, you stay on the program details page.

  • The ownership of Variable lists, Gene and miRNA lists, and User Uploaded Programs is now verified. This means you can no longer view any of these existing in the system if you are not the original creator.

  • A confirmation on the Register a Service Account page has been implemented for service accounts when the user attempts to register.

  • On the Cohort creation page, when toggling between the tabs for the different programs, you can no longer switch tabs until the tab on display has loaded.

  • We need to rework our cohort creation page to better differentiate between samples which are from image data vs. those which are not.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • When a user shares a Cohort, neither the owner nor the person granted access to the Cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a Worksheet and then tries to apply the log scale, it does not function properly.

  • The Complement set operation for existing Cohorts is exceptionally slow.

  • An exact duplicate of the cohort is created when you select the confirmation multiple times while the page is loading during Set Operations.

  • The mouse-over feature is currently disabled for program TARGET with disease code ALL.

  • A very small number of bam files for program TARGET currently have the wrong file name and cannot be used with the IGV browser.

  • In the Firefox browser, a violin plot on a Worksheet does not display the plotted data correctly.

August 23, 2017 v3.3

Enhancements

  • Users with NIH-approved access can now view and analyze TARGET (Therapeutically Applicable Research To Generate Effective Treatments) controlled data using service accounts and also on the IGV browser.

  • A more detailed error message is now returned when invalid characters are used in user data upload titles.

  • On the File list page you can now select only one genomic build at a time, for clarity on which build will be used by the IGV browser.

  • When attempting to duplicate the registration of your Google Cloud Project you are given an error message saying, “A Google Cloud Project with the id xxx-xxx-xxxx already exists.”

  • If you attempt to register a service account with the same datasets it already has activated, you will be given an error message saying, “Service account xxxxxxxxxxxx-compute@developer.gserviceaccount.com already exists with these datasets, and so does not need to be registered.”

  • The Data Use Certification and Agreement covering your access to all controlled data has been added to the user details page in the interface.

  • The CCLE user.get API endpoint has been removed from the system because we do not currently host any controlled CCLE data.

  • The format of the CSV file downloaded with the Download IDs button on the cohort details page has been changed to display the case and sample barcodes as two separate columns.

  • From the User uploaded program detail page, you can now edit the project name and description by selecting the gear option.
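
With case and sample barcodes in two separate columns, the CSV from the Download IDs button can be grouped by case with a few lines of Python. This is only a sketch: the column headers `case_barcode` and `sample_barcode` and the example rows are assumptions made for illustration, so check the header row of your own downloaded file and adjust the parameters accordingly.

```python
import csv
import io
from collections import defaultdict


def group_samples_by_case(csv_file, case_col="case_barcode", sample_col="sample_barcode"):
    """Return a dict mapping each case barcode to the list of its sample barcodes.

    The default column names are hypothetical; pass the real header names
    found in the downloaded file.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row[case_col]].append(row[sample_col])
    return dict(grouped)


# Invented example content standing in for a downloaded file.
example = io.StringIO(
    "case_barcode,sample_barcode\n"
    "CASE-A,CASE-A-01\n"
    "CASE-A,CASE-A-02\n"
    "CASE-B,CASE-B-01\n"
)
samples_by_case = group_samples_by_case(example)
```

For a real download, open the file with `open(path, newline="")` and pass the file object in place of the `io.StringIO` example.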

Bug Fixes

  • When creating a large cohort you are no longer returned a red error message.

  • The sharing feature for Workbooks, Cohorts, and User Uploaded Programs has been re-activated. You must enter a valid email address that is present in the system to share the workbook, cohort, or user uploaded program. If the person is not present in our system, please feel free to invite them to the ISB-CGC website.

  • When working with a new or duplicated worksheet for categorical features, e.g. a bar chart, you can no longer select the log option. The log option only applies to numerical options.

  • When working with workbooks, selecting the Delete button multiple times will no longer result in an error, and instead return you to the Workbooks list page after successful deletion of the Workbook.

  • Users can now plot user uploaded data in workbooks when using variables and cohorts from the same uploaded files.

  • The cohort.list API endpoint now displays the correct case count for the cohorts listed.

  • The Download File List as CSV on the File List page will download the correct information when genomic build hg38 is selected.

  • You are no longer able to add XSS-vulnerable characters to the edit section for user uploaded data.

  • An improved error message is displayed when attempting to register a Google Project you are not associated with.

  • Making a new Gene and miRNA set from a Workbook will no longer result in lowercase gene and miRNA names.

  • The TCGA Sample.get API endpoint will no longer return a response with sample ID duplicates.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • When a user shares a cohort, neither the owner nor the person granted access to the cohort receives a confirmation email.

  • CCLE data cannot be plotted when working with workbooks. ISB-CGC will resolve this functionality after the GDC formally releases CCLE data.

  • When a user duplicates a worksheet and then tries to apply the log scale, it does not function properly.

  • On the existing cohorts table list page, the confirmation delete ‘blue x’ button does not remove the selected cohort if you select another option, e.g. Set Operation. The same issue occurs in reverse: if you select the ‘blue x’ on the confirmation page for a set operation, you can then select the delete button and still see the cohort on the confirmation panel.

  • When working with workbooks, the log option does not work properly for the plot settings.

  • The Complement set operation for existing cohorts is exceptionally slow.

  • An exact duplicate of the cohort is created when you select the confirmation multiple times while the page is loading during Set Operations.

  • When plotting, certain values are displayed as numerical when they should be categorical, e.g. Tobacco Smoking History.

  • The mouse over feature is currently disabled for program TARGET with disease code ALL.

July 31, 2017 v3.2

Enhancements

  • A more detailed error message is now returned when invalid characters are used in user data upload titles.

  • On the File list page you can now select only one genomic build at a time, for clarity on which build you will view in the IGV browser.

Bug Fixes

  • When using the Swap Values button on a worksheet, the log option selected for either axis is now carried over as well.

  • On the IGV browser when working with TCGA data build hg38, the interface will no longer return a No feature found with name “efgr” message at the bottom of the IGV browser page.

  • When working with the cohort.create API endpoint, you can now create a large cohort with the barcode filter without a timeout error.

  • When creating a cohort with the cohort.create API endpoint, you can view the list of barcodes from the cohort details page in the ISB-CGC user interface regardless of size.

  • When working with the create a new variable favorites list page, you can now create a variable list using the USER DATA tab.

Known Issues

  • The sharing feature for Workbooks, Cohorts, and User Uploaded Programs is currently disabled.

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • The CCLE data in the GUI does not exactly match the CCLE data in BigQuery.

  • You cannot plot any data if you use a CCLE data cohort on a worksheet.

  • On the existing cohorts table list page, the confirmation delete ‘blue x’ button does not remove the selected cohort if you select another option, e.g. Set Operation. The same issue occurs in reverse: if you select the ‘blue x’ on the confirmation page for a set operation, you can then select the delete button and still see the cohort on the confirmation panel.

  • The Complement set operation for existing cohorts is exceptionally slow.

  • An exact duplicate of the cohort is created when you select the confirmation multiple times while the page is loading during Set Operations.

  • When working with a new or duplicated worksheet for categorical features, e.g. a bar chart, you can select the log option, although the log option only applies to numerical options.

  • When working with workbooks, if you select the delete confirmation button multiple times while the page is loading you will be sent to an error page.

  • You currently cannot plot user uploaded data when working with workbooks.

  • When plotting, certain values are displayed as numerical when they should be categorical, e.g. Tobacco Smoking History.

  • The mouse over feature is currently disabled for program TARGET with disease code ALL.

  • The cohort.list API endpoint displays an incorrect case count for the cohorts listed.

  • The Download File List as CSV on the File List page downloads the wrong information when genomic build hg38 is selected.

  • You are currently able to add non-whitelist characters to the edit section for user uploaded data.

  • You are returned a vague error message on the register a Google Cloud Project page when attempting to register a Google Project you are not associated with.

  • The samples and cases filters have not been removed from the cohort.list API endpoint and are visible as a possible filter.

  • The user.get CCLE program API endpoint will return a 503 internal server error.

  • When creating a large cohort you will be given a red error message saying, “There was an error saving your cohort; it may not have been saved correctly.”

June 14, 2017 v3.1

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • The CCLE data in the Webapp is not exactly the same as the CCLE data in BigQuery.

  • Users cannot plot any data from a CCLE cohort on a worksheet.

  • In the Webapp, the log scale on graphs does not function properly for duplicated worksheets.

  • On the existing cohorts table list page, the confirmation delete ‘blue x’ button does not remove the selected cohort if you select another option, e.g. Set Operation. The same issue occurs in reverse: if you select the ‘blue x’ on the confirmation page for a set operation, you can then select the delete button and still see the cohort on the confirmation panel.

  • Swap values is not working properly for the plot settings.

  • The Complement set operation for existing cohorts is exceptionally slow.

  • An exact duplicate of the cohort is created when you select the confirmation multiple times while the page is loading during Set Operations.

  • When working with a new or duplicated worksheet for categorical features, e.g. a bar chart, you can select the log option, although the log option only applies to numerical options.

  • When working with workbooks, if you select the delete confirmation button multiple times while the page is loading you will be sent to an error page.

  • You currently cannot plot user uploaded data when working with workbooks.

  • When plotting, certain values are displayed as numerical when they should be categorical, e.g. Tobacco Smoking History.

  • On the IGV browser when working with TCGA data build hg38, you get a No feature found with name “efgr” message at the bottom of the IGV browser page.

  • On the cohort creation page for TCGA data, the disease code and project short name filters offer NA as an option, which is not a valid disease.

  • The mouse over feature is currently disabled for program TARGET with disease code ALL.

  • The sharing feature for Workbooks, Cohorts, and User Uploaded Programs is currently disabled.

  • The TCGA and CCLE case IDs shown below have been removed from all cohorts since they are no longer available from NCI’s Genomic Data Commons, and ISB-CGC is trying to mirror that data as closely as possible.

  • TCGA cases:

TCGA-33-4579, TCGA-35-3621, TCGA-66-2746, TCGA-66-2747, TCGA-66-2750, TCGA-66-2751, TCGA-66-2752, TCGA-AN-A0FE, TCGA-AN-A0FG, TCGA-BH-A0B2, TCGA-BR-4186, TCGA-BR-4190, TCGA-BR-4194, TCGA-BR-4195, TCGA-BR-4196, TCGA-BR-4197, TCGA-BR-4199, TCGA-BR-4200, TCGA-BR-4205, TCGA-BR-4259, TCGA-BR-4260, TCGA-BR-4261, TCGA-BR-4263, TCGA-BR-4264, TCGA-BR-4265, TCGA-BR-4266, TCGA-BR-4270, TCGA-BR-4271, TCGA-BR-4272, TCGA-BR-4273, TCGA-BR-4274, TCGA-BR-4276, TCGA-BR-4277, TCGA-BR-4278, TCGA-BR-4281, TCGA-BR-4282, TCGA-BR-4283, TCGA-BR-4284, TCGA-BR-4285, TCGA-BR-4286, TCGA-BR-4288, TCGA-BR-4291, TCGA-BR-4298, TCGA-BR-4375, TCGA-BR-4376, TCGA-DM-A286, TCGA-E2-A1IP, TCGA-F4-6857, TCGA-GN-A261, TCGA-O2-A5IC, TCGA-PN-A8M9

  • CCLE cases:

LS123, LS1034

  • The number of cases and samples when viewed in the User Interface as compared to the BigQuery tables vary across all three projects (TCGA, TARGET, and CCLE). This is because the user interface reflects the data available at the Genomic Data Commons, whereas data in BigQuery reflects either data at the original TCGA data coordinating center supplemented with Genomic Data Commons Data (for TCGA and CCLE), or for TARGET, data received from the TARGET data coordinating center, not the Genomic Data Commons.

  • We have removed Google Genomics functionality from the user interface. You will still be able to access CCLE open access data in Google Genomics from the command line. We are open to adding Google Genomics controlled data back into the user interface if you have a use case for it. Also we are restructuring the handling of multiple Programs of data. Please feel free to provide feedback.

  • For TARGET data the clinical and Gene Expression files themselves are available in the system.
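
If you keep your own lists of barcodes, the removed case IDs listed above can be used to check which entries no longer map to an available case. The sketch below is illustrative only: `REMOVED_CASES` holds just a small subset of the IDs from the list, the helper names are hypothetical, and it assumes a TCGA sample barcode begins with its case barcode (the first three dash-separated fields). The sample barcodes in the example are invented.

```python
# Small subset of the removed case IDs listed above; extend with the full list as needed.
REMOVED_CASES = {"TCGA-33-4579", "TCGA-35-3621", "LS123", "LS1034"}


def case_id(barcode):
    """Derive the case ID from a barcode.

    Assumption: a TCGA sample barcode starts with the case barcode
    (e.g. TCGA-33-4579-01A -> TCGA-33-4579). Other barcodes are compared as-is.
    """
    if barcode.startswith("TCGA-"):
        return "-".join(barcode.split("-")[:3])
    return barcode


def split_removed(barcodes, removed=REMOVED_CASES):
    """Split barcodes into (kept, dropped) according to the removed case IDs."""
    kept, dropped = [], []
    for barcode in barcodes:
        if case_id(barcode) in removed:
            dropped.append(barcode)
        else:
            kept.append(barcode)
    return kept, dropped


# Invented barcodes for illustration.
kept, dropped = split_removed(["TCGA-33-4579-01A", "TCGA-ZZ-0000-01A", "LS123"])
```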

Enhancements

  • You will be returned a more detailed error message when uploading your own user data.

  • The Data Availability section on the cohort details page now displays the HG38 somatic mutation information for program TCGA.

Bug Fixes

  • There is now a 2000 character limit for the workbook title section.

  • Selecting the cohort link in the To Complete Analysis section on a worksheet now sends you to the existing cohort list table page.

  • Latency issues when working with the cohort creation page have been resolved.

  • When working with TCGA data, the IGV browser will no longer give you a 401 or a 404 error.

  • The mouse over feature will display the long name for disease code and project short name for all programs.

  • On the cohort creation page you can now filter with the HG38 somatic mutation data by gene for program TCGA using the Molecular tab.

  • On the IGV Browser when working with TCGA genomic build hg38 you will no longer get a 404 error.

  • On the cohort creation page when working with the User Data tab, the left filter panel sorts the other filter.

  • Cohorts created with API specific filters are now accessible from their cohort details page.

  • You are now able to plot miRNA data with genomic build hg38 for TARGET data.

May 25, 2017 v3.0

In collaboration with the GDC we now have TARGET pediatric cancer data available for analysis in the user interface. You are now able to create cohorts and plot analysis with information from TARGET, TCGA, and CCLE data.

In addition, we have replaced the previous APIs with a new version that supports the new user interface.

We have also released the analyzed data types that are based on genome build GRCh38 for TCGA and TARGET data. GRCh37 (HG19) is also still available for TCGA, TARGET, and CCLE datasets.

Workbooks, cohorts, and variable favorites lists created before the data structure migration will still be available for analysis and have been labeled as legacy and version 1. If you have difficulty using version 1 workbooks, please contact us.

Known Issues

  • Analysis Type: SeqPeek formatting is occasionally elongated.

  • The CCLE data in the GUI does not exactly match the CCLE data in BigQuery.

  • When a user shares a cohort, neither the owner nor the person granted access to the cohort receives a confirmation email.

  • You cannot plot any data if you use a CCLE data cohort on a worksheet.

  • When a user duplicates a worksheet and then tries to apply the log scale, it does not function properly.

  • On the existing cohorts table list page, the confirmation delete ‘blue x’ button does not remove the selected cohort if you select another option, e.g. Set Operation. The same issue occurs in reverse: if you select the ‘blue x’ on the confirmation page for a set operation, you can then select the delete button and still see the cohort on the confirmation panel.

  • On the cohort view files page there are capitalization bugs on the Platform filter.

  • Swap values is not working properly for the plot settings.

  • The Complement set operation for existing cohorts is exceptionally slow.

  • An exact duplicate of the cohort is created when you select the confirmation multiple times while the page is loading during Set Operations.

  • When working with a new or duplicated worksheet for categorical features, e.g. a bar chart, you can select the log option, although the log option only applies to numerical options.

  • When working with workbooks, if you select the delete confirmation button multiple times while the page is loading you will be sent to an error page.

  • On a scatter plot, Tobacco Smoking used as the Legend is displayed as numerical values when it should be displayed as categorical values.

  • The character limit for a workbook title name is currently inactive; if you exceed the possible limit you will be sent to an error page.

  • You currently cannot plot user uploaded data when working with workbooks.

  • Selecting a cohort from the worksheet “To Complete Analysis” section results in a 400 Bad Request error.

  • You will experience latency issues when working with the create a new cohort page.

  • When plotting, certain values are displayed as numerical when they should be categorical, e.g. Tobacco Smoking History.

  • The Data File Availability Panel for program CCLE is currently inactive on the cohort details page and also when editing a cohort with CCLE data.

  • On the File List page you are currently unable to access the bam files for the IGV Browser associated with build hg38 when working with TCGA data.

  • The TCGA and CCLE case IDs shown below have been removed from all cohorts since they are no longer available from NCI’s Genomic Data Commons, and ISB-CGC is trying to mirror that data as much as possible.

  • TCGA cases:

TCGA-33-4579, TCGA-35-3621, TCGA-66-2746, TCGA-66-2747, TCGA-66-2750, TCGA-66-2751, TCGA-66-2752, TCGA-AN-A0FE, TCGA-AN-A0FG, TCGA-BH-A0B2, TCGA-BR-4186, TCGA-BR-4190, TCGA-BR-4194, TCGA-BR-4195, TCGA-BR-4196, TCGA-BR-4197, TCGA-BR-4199, TCGA-BR-4200, TCGA-BR-4205, TCGA-BR-4259, TCGA-BR-4260, TCGA-BR-4261, TCGA-BR-4263, TCGA-BR-4264, TCGA-BR-4265, TCGA-BR-4266, TCGA-BR-4270, TCGA-BR-4271, TCGA-BR-4272, TCGA-BR-4273, TCGA-BR-4274, TCGA-BR-4276, TCGA-BR-4277, TCGA-BR-4278, TCGA-BR-4281, TCGA-BR-4282, TCGA-BR-4283, TCGA-BR-4284, TCGA-BR-4285, TCGA-BR-4286, TCGA-BR-4288, TCGA-BR-4291, TCGA-BR-4298, TCGA-BR-4375, TCGA-BR-4376, TCGA-DM-A286, TCGA-E2-A1IP, TCGA-F4-6857, TCGA-GN-A261, TCGA-O2-A5IC, TCGA-PN-A8M9
  • CCLE cases:

LS123, LS1034

  • The number of cases and samples when viewed in the User Interface as compared to the BigQuery tables vary across all three projects (TCGA, TARGET, and CCLE). This is because the user interface reflects the data available at the Genomic Data Commons, whereas data in BigQuery reflects either (for TCGA and CCLE) data at the original TCGA data coordinating center supplemented with Genomic Data Commons Data, or for TARGET, data received from the TARGET data coordinating center, not the Genomic Data Commons.

  • We have removed Google Genomics functionality from the user interface. You will still be able to access CCLE open access data in Google Genomics from the command line. We are open to adding Google Genomics controlled data back into the user interface if you have a use case for it. Also we are restructuring the handling of multiple Programs of data. Please feel free to provide feedback.

  • For TARGET data the clinical and Gene Expression files themselves are available in the system. The bam files will be available soon!

Enhancements

  • You will be returned a more detailed error message when uploading your own user data.

  • The user interface now displays the same nomenclature as the Genomic Data Commons (GDC).

Bug Fixes

  • The user data upload is enabled: users can now upload their own datasets and create cohorts using existing programs and their newly uploaded data.

  • You can now have multiple Google Cloud Projects associated to your account and use only one bucket and dataset on one project with no interference.

April 12, 2017 v1.15

Known Issues

  • We are currently having issues viewing bam files using the IGV browser for TCGA and CCLE data. We are working to fix the issue and it should be resolved as soon as possible.

February 26, 2017 v1.14

Known Issues

  • Analysis Type: SeqPeek formatting is elongated.

  • The CCLE data in the GUI does not exactly match the CCLE data in BigQuery.

  • When a user shares a cohort, neither the owner nor the person granted access to the cohort receives a confirmation email.

  • You cannot plot any data if you use a CCLE data cohort on a worksheet.

  • When a user duplicates a worksheet and then tries to apply the log scale, it does not function properly.

  • On the existing cohorts table list page, the confirmation delete ‘blue x’ button does not remove the selected cohort if you select another option, e.g. Set Operation. The same issue occurs in reverse: if you select the ‘blue x’ on the confirmation page for a set operation, you can then select the delete button and still see the cohort on the confirmation panel.

  • On the cohort view files page there are capitalization bugs on the Platform filter.

  • Swap values is not working properly for the plot settings.

  • The Complement set operation for existing cohorts is exceptionally slow.

  • An exact duplicate of the cohort is created when you select the confirmation multiple times while the page is loading during Set Operations.

  • When working with a new or duplicated worksheet for categorical features, e.g. a bar chart, you can select the log option, although the log option only applies to numerical options.

  • If multiple Google Cloud Projects are registered through the user interface, it is currently advised to add Google buckets and BigQuery datasets to both projects.

  • When working with workbooks, if you select the delete confirmation button multiple times while the page is loading you will be sent to an error page.

  • On a scatter plot, Tobacco Smoking used as the Legend is displayed as numerical values when it should be displayed as categorical values.

  • The character limit for a workbook title name is currently inactive; if you exceed the possible limit you will be sent to an error page.

  • We have removed Google Genomics functionality from the user interface. You will still be able to access CCLE open access data in Google Genomics from the command line. We are open to adding Google Genomics controlled data back into the user interface if you have a use case for it. Also we are restructuring the handling of multiple Programs of data. Please feel free to provide feedback.

  • There will be a reduced number of releases and features over the next month (or so) while we do some rework required for enabling the distribution of additional data sets and types copied from the NCI-GDC. The new data type is TARGET data, and different analyzed data types are based on the hg38 genome builds. Stay tuned for these, likely in the early part of 2017.

  • User data uploads are currently disabled. Any projects you have previously uploaded will continue to be available in your Saved Projects list, and you can continue to work with them, but new data cannot be added at this time. We are working on bringing this function back. Please stay tuned.

Bug Fixes

  • User will no longer be sent to the Social Network Login page when trying to log in. If this occurs, please feel free to send ISB-CGC feedback using the feedback link.

November 30, 2016 v1.13

Known Issues

  • Analysis Type : Seq peek Formatting Elongated

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • User will occasionally be sent to the Social Network Login page when trying to log in. If this occurs, please go to the home page of the Web Application and try again.

  • If the user shares a cohort they do not receive a confirmation email.

  • Cannot plot any data if you use CCLE data cohort on a worksheet.

  • When a user duplicates a worksheet and then tries to apply the log scale, it will not function properly.

  • If a researcher leaves the workbooks inactive the page freezes.

  • On the existing cohort list page for the delete button, selecting the blue x does nothing. It should be disabled.

  • On the cohort view files page there are capitalization bugs on the Platform filter.

  • Swap values is not working properly for the plot settings.

  • Some plot settings are saved or retrieved when working with worksheets.

  • The intersection set operation for existing cohorts is exceptionally slow.

  • We have removed Google Genomics functionality from the user interface. You will still be able to access CCLE open access data in Google Genomics from the command line. We are open to adding Google Genomics controlled data back into the user interface if you have a use case for it. Also we are restructuring the handling of multiple Programs of data. Please feel free to provide feedback.

  • There will be a reduced number of releases and features over the next month (or so) while we do some rework required for enabling the distribution of additional data sets and types copied from the NCI-GDC. The new data type is TARGET data, and different analyzed data types are based on the hg38 genome builds. Stay tuned for these, likely in the early part of 2017.

Bug Fixes

  • The user can no longer see BCGSC expression as an option when plotting genes if the user does not select a center filter on the worksheet.

  • Worksheets added to an existing workbook now behave the same as the original worksheet.

  • Cohort set operations no longer perform exceptionally slowly.

November 16, 2016 v1.12

Known Issues

  • Analysis Type : Seq peek Formatting is Elongated

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • User will occasionally be sent to the Social Network Login page when trying to log in. If this occurs, please go to the home page of the Web Application and try again.

  • If the user shares a cohort they do not receive a confirmation email.

  • Cannot plot any data if you use CCLE data cohort on a worksheet.

  • When a user duplicates a worksheet and then tries to apply the log scale, it will not function properly.

  • If a researcher leaves the workbooks inactive the page freezes.

  • On the existing cohort list page for the delete button, selecting the blue x does nothing. It will be disabled in a future release.

  • On the cohort view files page there are capitalization bugs on the Platform filter.

  • Swap values is not working properly for the plot settings.

  • Some plot settings are saved or retrieved when working with worksheets.

  • Worksheets added to an existing workbook behave differently than the original worksheet.

  • The user can see BCGSC expression as an option when plotting genes if the user does not select a center filter on the worksheet.

  • The intersection set operation for existing cohorts is exceptionally slow.

  • We are removing Google Genomics from the user interface. You will still be able to access CCLE open access data in Google Genomics from the command line. We are open to adding Google Genomics controlled data back into the user interface if you have a use case for it. Please feel free to provide feedback.

Enhancements

  • A warning will be displayed if the user is trying to plot with required data missing e.g. must select an analysis, gene or variable, and a cohort to create a plot.

  • On the project details page, the user is sent to the upload new study in existing project tab when they select upload data.

  • When the user plots a graph with NA values, a notification is returned stating that no valid data was found.

  • There is no longer text overlapping on the Cloud Hosted Datasets readthedocs page in the documentation.

Bug Fixes

  • The user can no longer add the same gene symbol twice to the same worksheet, even if the gene lists containing it have different names.

  • When the user selects multiple cohorts for color by feature for a scatter plot, all selected cohorts are displayed on the graph.

  • On the existing cohorts table for public cohorts, the new workbook and set operations buttons are now active.

  • For all analysis types, the x-axis and y-axis text for certain variables will no longer overlap and is displayed clearly.

  • The upload data button is disabled on the review files page when no buckets or datasets are associated.

  • Someone with multiple eRA accounts will no longer have issues when trying to access controlled data.

November 2, 2016 v1.11

Known Issues

  • The user can add the same gene twice to the same worksheet if the gene lists containing it have different names.

  • Analysis Type : Seq peek Formatting Elongated

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • If a user creates a cohort with sample type filter Cell Lines and CCLE, the total sample count is off by one.

  • User will occasionally be sent to the Social Network Login page when trying to log in. If this occurs, please go to the home page of the Web Application and try again.

  • If the user shares a cohort they do not receive a confirmation email.

  • When the user selects multiple cohorts for color by feature for scatter plot they do not display in chart.

  • Cannot plot any data if you use CCLE data cohort on a worksheet.

  • When the user plots a graph with NA values the UI returns a blank graph.

  • When a user duplicates a worksheet and then tries to apply the log scale, it will not function properly.

  • If a researcher leaves the workbooks inactive the page freezes.

  • On the existing cohort list page for the delete button, selecting the blue x does nothing. It should be disabled.

  • On the cohort view files page there are capitalization bugs on the Platform filter.

  • Swap values is not working properly for the plot settings.

  • Some plot settings are saved or retrieved when working with worksheets.

  • On the existing cohorts table for public cohorts, the new workbook and set operations buttons are currently inactive.

  • Worksheets added to an existing workbook behave differently than the original worksheet.

Enhancements

  • Introduced user data upload functionality (see the documentation).

  • More fluid zoom feature when working with analysis worksheets.

  • Case Sensitivity is now maintained in creating and displaying Workbook names throughout the entire User Interface.

  • You can now create a new cohort from the menu bar.

  • Variables menu bar is displayed similar to the rest of the favorites variables.

  • On the dashboard, all create new buttons/links are identical.

  • The owner of a shared workbook or cohort is able to remove multiple viewers. Viewers are also able to remove themselves.

  • Removed BCGSC gene expression from the UI gene specification selection for plot analysis.

Bug Fixes

  • X-axis and y-axis text no longer overlaps on a worksheet for any analysis type, except for the violin plot.

  • The Legend is no longer displayed elongated when you use multiple cohorts for color by feature for a violin plot.

  • The miRNA_expression_values_fixed table in dataset 2016_07_09_tcga_data_open now reflects only hg19.mirbase20 files.

  • You are now able to duplicate a workbook that has been shared with you by someone else.

  • Added pseudo-counts to the mosaic plots on the create new cohort page. This allows you to be sure of always being able to see (and select) the smallest contributors in these mosaics.

  • Removing a filter from the filter confirmation on the create new cohort page now also removes it from the rest of the filter selections.

  • Selecting the “check-all” feature on the create new cohort page will no longer cause duplicates on the selected filters panel.

  • Create cohort from plot selection now works with all analysis types.

  • Data inconsistencies between the create new cohort histogram filter and the most recent BigQuery datasets have been resolved.
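
The pseudo-count fix for the mosaic plots listed above can be sketched as follows. This is a minimal illustration, not the Web Application's actual code: the function name `mosaic_proportions`, the pseudo-count value, and the sample counts are all hypothetical. It only shows why adding a small constant to every category keeps the smallest contributors visible and selectable.

```python
def mosaic_proportions(counts, pseudo_count=1.0):
    """Return each category's share of the mosaic after adding a pseudo-count.

    `counts` maps a category name to its sample count. The pseudo-count is
    added to every category, so even a category with zero samples receives
    a small, non-zero tile area.
    """
    adjusted = {name: value + pseudo_count for name, value in counts.items()}
    total = sum(adjusted.values())
    if total == 0:
        # Nothing to draw: no samples and no pseudo-count.
        return {name: 0.0 for name in adjusted}
    return {name: value / total for name, value in adjusted.items()}


# Hypothetical sample counts, chosen only to show the effect.
counts = {
    "Primary Tumor": 9000,
    "Solid Tissue Normal": 990,
    "Metastatic": 10,
    "Recurrent Tumor": 0,
}

raw = mosaic_proportions(counts, pseudo_count=0.0)      # no padding
padded = mosaic_proportions(counts, pseudo_count=50.0)  # assumed padding value
```

Without padding, the empty category has zero area and cannot be selected. With padding it receives a small share, while the large categories still dominate the plot.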

September 21, 2016 v1.10

Enhancements

  • Text in confirmation box of a duplication of a workbook has been enhanced.

  • On the registered Google Cloud Projects page, icon has been added for the user to go directly to the Google Cloud Console page if desired.

  • When a Service Account is removed from the Access Control List, the project owner is sent an email with an explanation as to why the account was removed.

  • The IGV File List page displays which page the user is browsing.

Bug Fixes

  • For a Cubby Hole plot, the x-axis name can now be seen clearly.

  • On a duplicate worksheet when working with gene specifications, the user is able to select between all options multiple times.

  • The page no longer becomes elongated when the user builds a Cubby Hole plot.

  • The selected variables for the plot setting on a worksheet are saved after the user leaves the workbook.

  • When registering a Google Cloud Project, the list of emails associated with the GCP is displayed to the user only once.

Known Issues

  • The user can add the same gene twice to the same worksheet if the gene lists containing it have different names.

  • The Bar chart on the worksheet panel renders overlapping text.

  • Analysis Type : Seq peek Formatting Elongated

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • If a user creates a cohort with sample type filter Cell Lines and CCLE, the total sample count is off by one.

  • User will occasionally be sent to the Social Network Login page when trying to log in. If this occurs, please go to the home page of the Web Application and try again.

  • If the user shares a cohort they do not receive a confirmation email.

  • The Legend is displayed elongated when you use multiple cohorts for color by feature for a violin plot.

  • When the user selects multiple cohorts for color by feature for scatter plot they do not display in chart.

  • Cannot plot any data if you use CCLE data cohort on a worksheet.

  • When the user plots a graph with NA values the UI returns a blank graph.

  • When a user duplicates a worksheet and then tries to apply the log scale, it will not function properly.

  • There are duplicate rows in the molecular data table in BigQuery.

September 7, 2016 v1.9

Enhancements

  • Dictionary mapping feature types to units for use in plot displays added to worksheets.

  • The user now has the option to make the axis logarithmic if the plot can display continuous numerical data, e.g. mRNA expression levels.

  • The NIH username entry is now case insensitive for dbGaP authorization.

  • The mouse over feature works when the user has created a long workbook name on the existing workbooks table page.

  • The mouse over functionality was added to the worksheet name within a workbook.

Bug Fixes

  • The order by ascending or descending feature is now working properly for the existing workbooks table page.

  • Tobacco Smoking History filter in the create cohort page displays the filters in descriptive values.

  • The user can now select all existing cohorts when on the add cohort(s) to worksheet page.

  • The gene specification selection on the worksheet page is now working properly.

  • When a user shares a workbook, the person who received viewer access is sent a confirmation email. If the person who shared the workbook deletes it before it is opened and the recipient then clicks the invitation link, the recipient is sent to the unknown invitation page. The button on that page to go back to the Dashboard page now appears correctly as “Your Dashboard”.

  • The user is sent an email when the Service Account is removed from the Access Control List because a user associated with the project is not dbGaP authorized.

Known Issues

  • The user can add the same gene twice to the same worksheet if the gene lists containing it have different names.

  • The Bar chart on the worksheet panel renders overlapping text.

  • Analysis Type : Seq peek Formatting Elongated

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • If a user creates a cohort with sample type filter Cell Lines and CCLE, the total sample count is off by one.

  • User will occasionally be sent to the Social Network Login page when trying to log in. If this occurs, please go to the home page of the Web Application and try again.

  • Page becomes elongated when the user builds a Cubby Hole plot.

  • X-axis name cut off for cubby hole plot when x-axis has only 3 criteria.

  • If the user shares a cohort they do not receive a confirmation email.

  • The Legend is displayed elongated when you use multiple cohorts for color by feature for a violin plot.

  • When the user selects multiple cohorts for color by feature for scatter plot they do not display in chart.

  • When the user creates a duplicate worksheet, a bar chart for a gene with the protein specification can freeze when selecting an option for the Select Feature.

  • Cannot plot any data if you use CCLE data cohort on a worksheet.

  • When the user plots a graph with NA values the UI returns a blank graph.

  • When a user duplicates a worksheet, some functionality related to plotting will not function properly on the duplicate worksheet.

August 24, 2016 v1.8

Known Issues

  • The user can add the same gene twice to the same worksheet if the gene lists containing it have different names.

  • The Bar chart on the worksheet panel renders overlapping text.

  • Analysis Type : Seq peek Formatting Elongated.

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • If a user creates a cohort with sample type filter Cell Lines and CCLE, the total sample count is off by one.

  • User will occasionally be sent to the Social Network Login page when trying to log in. If this occurs, please go to the home page of the Web Application and try again.

  • Page becomes elongated when the user builds a Cubby Hole plot.

  • X-axis name cut off for cubby hole plot when x-axis has only 3 criteria.

  • When the user shares a cohort they do not receive a confirmation email.

  • User will be spammed with an email every minute when their service account is removed from the ACL. To stop this, please either delete your service account from the ISB-CGC interface, or remove the GCP project member(s) who is (are) not authorized to access the controlled data set (see the documentation). We are planning to reduce the frequency of the notification emails to once per day.

  • The Legend is displayed elongated when you use multiple cohorts for color by feature for a violin plot.

  • When the user selects multiple cohorts for color by feature for scatter plot they do not display in chart.

  • When the user creates a duplicate worksheet, a bar chart for a gene with the protein specification can freeze when selecting an option for the Select Feature.

  • When a user shares a workbook, the person who received viewer access is sent a confirmation email. If the person who shared the workbook deletes it before it is opened and the recipient then clicks the invitation link, the recipient is sent to the unknown invitation page. The button on that page to go back to the Dashboard page appears like this: “Your Dashboard{”

  • Cannot plot any data if you use CCLE data cohort on a worksheet.

Enhancements

  • On the Register Service Account page, after the researcher has submitted the Service Account associated with their Google Cloud Project, a table showing who is authorized is displayed.

  • The column that previously said “Has eRA Commons” now says “Has NIH Identity”.

  • When the researcher creates a new cohort with more than 20 filters chosen, the URL exceeds the limit of 2K characters, which affects the count for the Details panel. When this happens, the user is now prompted with an alert box that says, “You have selected too many filters. The current counts shown will not be accurate until one or more filter options are removed.”

  • In the user details page, if the researcher has not registered a Google Cloud Project it will say, “Register a Google Cloud Project” on the link.
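
The filter-count alert described above amounts to checking the length of the request URL before trusting the counts. The sketch below is a hedged illustration only: the base URL, the filter names, and the reading of “2K characters” as 2000 are assumptions, and the function names are hypothetical rather than taken from the ISB-CGC code base.

```python
from urllib.parse import urlencode

# Assumption: "2K characters" is read here as 2000; the real limit may differ.
MAX_URL_CHARS = 2000

# Alert text quoted from the release note above.
TOO_MANY_FILTERS_MESSAGE = (
    "You have selected too many filters. The current counts shown will not "
    "be accurate until one or more filter options are removed."
)


def build_count_url(base_url, filters):
    """Build the count request URL from a list of (name, value) filter pairs."""
    query = urlencode(filters)
    return f"{base_url}?{query}" if query else base_url


def filter_alert(base_url, filters, max_chars=MAX_URL_CHARS):
    """Return the alert message if the URL is too long, otherwise None."""
    if len(build_count_url(base_url, filters)) > max_chars:
        return TOO_MANY_FILTERS_MESSAGE
    return None


# Hypothetical endpoint and filters, for illustration only.
BASE_URL = "https://example.org/cohorts/filter_counts"
few_filters = [("sample_type", "Primary Tumor"), ("gender", "FEMALE")]
many_filters = [(f"filter_{i}", "x" * 100) for i in range(25)]

few_alert = filter_alert(BASE_URL, few_filters)    # short URL, no alert
many_alert = filter_alert(BASE_URL, many_filters)  # long URL, alert text
```

Checking the encoded URL rather than the number of filters is a deliberate choice in this sketch, since the length depends on the filter names and values as well as on how many are selected.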

Bug Fixes

  • The researcher can now remove people they have shared a cohort with from the existing cohorts table.

  • After 24-hours of use, a dbGaP authorized user can re-authenticate through the link provided in the user details page.

  • The variable favorites list table page can now support a long title for the variable list.

  • The filter name now appears aligned in the verification panel when the filter name is too long for the filter confirmation selection on the create new cohort page.

  • Grouped Data Type filter counts (Methylation, RNA Seq, miRNA Seq) now behave like the other count groups. The counts will behave as grouped values.

  • The user can no longer select a categorical variable for a Histogram plot.

  • The Filter token displays are now shown in ‘readable’ names when working with cohort filters.

  • Controlled access BAM files are now viewable in the IGV browser after the user has authorized their credentials.

  • The user can now unlink an eRA commons account from their Google Identity in the user detail page.

  • The violin plot was inconsistently failing. We have updated the JavaScript, so the violin plot no longer fails.

August 10, 2016 v1.7

New Features

  • The researcher can now create a cohort of participants and samples based on the presence of a gene mutation in a specified gene. Look for the new “Molecular” tab when you are creating a cohort.

  • The bioinformatics programmer now has the ability to associate their Google Cloud Project’s Service Account. This allows the researcher to run computational pipelines from Google Virtual Machines using TCGA Controlled data (e.g. BAM files) for seven days before they have to reauthorize. For more information, please see the documentation.

Known Issues

  • The user can add the same gene twice to the same worksheet if the gene lists containing it have different names.

  • The Bar chart on the worksheet panel renders overlapping text.

  • Cannot remove people you have shared a cohort with from the existing cohorts table.

  • Analysis Type : Seq peek Formatting Elongated

  • The CCLE data in GUI is not exactly coordinated with the CCLE data in BigQuery.

  • If a user creates a cohort with sample type filter Cell Lines and CCLE the total number of samples count is off by one.

  • After 24-hours of use, a dbGaP authorized user has to logout and then log back in to be prompted with NIH login link to re-access controlled data.

  • User will occasionally be sent to the Social Network Login page when trying to log in. If this occurs, please go to the home page of the Web Application and try again.

  • Page becomes elongated when the user builds a Cubby Hole plot.

  • X-axis name cut off for Cubby Hole plot when x-axis has only 3 criteria.

  • When the user shares a cohort they do not receive a confirmation email.

  • When a name is too long for the variable favorites list table, the “Last Updated” column will appear cut off.

  • The filter name will appear outside the verification panel when the filter name is too long for the create cohort filter selection.

  • Grouped Data Type filter counts (Methylation, RNA Seq, miRNA Seq) don’t behave like other count groups. The counts behave as though the values were for distinct categories.

  • User will be spammed with an email every minute when their service account is removed from the ACL. To stop this, please either delete your service account from the ISB-CGC interface, or remove the GCP project member(s) who is (are) not authorized to access the controlled data set (see the documentation). We are planning to reduce the frequency of the notification emails to once per day.

  • The user can select a categorical variable for a Histogram plot, which returns a graph with no data.

  • The Legend is displayed elongated when you use multiple cohorts for color by feature for a violin plot.

  • When the user selects multiple cohorts for color by feature for scatter plot they do not display in chart.

  • When the user creates a duplicate worksheet, a bar chart for a gene with the protein specification can freeze when selecting an option for the Select Feature.

Enhancements

  • The user now has the option to select all or deselect all possible filters for any tab that has more than 10 possible options in the create new cohort page.

  • The user can now sort all existing tables in either ascending or descending order.

  • The cohort_id has been added to the detail cohort page. This allows the user to reference a desired cohort with ease in the API endpoints.

  • When creating a new cohort, the user is given the full description for sample type in the selected filters panel.

Bug Fixes

  • Histological Type entries in create new cohort page on the user interface now match the Google BigQuery entries in terms of capitalization.

  • Filters for data type counts in the left panel are now working properly.

  • When a user sets a cohort as the color by feature for a violin plot, the legend is set to cohort. When the user then sets another color by feature, the legend is updated.

  • The user can no longer make a gene list without selecting a gene first.

  • The user can now sort the Last Modified column of the existing cohort table in either ascending or descending order.

  • In the create new cohort page for the data type tab, the user can now select either True or False for DNA Sequencing, Protein, and SNP Copy Number filters.

  • When the user edits a new cohort and sets the edited cohort to return zero samples, the user will be prompted to select a different set of filters.

July 20, 2016 v1.6

Known Issues

  • The user can add the same gene twice if two identical worksheets with different names are uploaded.

  • The Bar chart on the worksheet panel renders overlapping text.

  • User cannot remove people they have shared a cohort with from the existing cohorts table.

  • Analysis Type : Seq peek Formatting Elongated.

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • If a user creates a cohort with sample type filter Cell Lines and CCLE, the total sample count is off by one.

  • Histological Type entries in create new cohort page on the user interface should match the Google BigQuery entries in terms of capitalization.

  • When a user sets a cohort as the color by feature for a violin plot, the legend remains set to cohort.

  • After the 24-hour dbGaP authorization runs out, the user is unable to re-authenticate. (If you have this issue, please log out and log back in to be prompted with the login link for dbGaP authorization.)

Enhancements

  • Created ability in GUI to make cohorts based on presence of an HPV status.

  • Created ability in GUI to make cohorts based on BMI value.

  • The details panel for an existing cohort now has a section that shows the ISB-CGC cohort_id.

  • Enhancements of GUI to view submenu item in different screen sizes and resolutions.

  • New version of IGV javascript installed.

Bug Fixes

  • User can no longer add same filter to existing cohorts.

  • Optimized Security in the user interface.

  • If a user opens a shared cohort it will appear once on the dashboard.

  • Pathologic State Filter in create cohort Stage is displayed capitalized.

  • Filter counts with a value of 0 are now listed when editing a pre-existing cohort.

  • Filters for data type counting in the left panel are working properly.

  • After the 24-hour dbGaP authorization runs out, the user is able to re-authenticate.

  • User can no longer create a new gene list without giving the gene list a name.

July 6, 2016 v1.5

Known Issues

  • The user can add the same gene twice to the same worksheet if the gene lists containing it have different names.

  • The user can add same filter to existing cohorts.

  • The Bar chart on the worksheet panel renders overlapping text.

  • Cannot remove people you have shared a cohort with from the existing cohorts table.

  • Analysis Type : Seqpeek Formatting Elongated.

  • The CCLE data in GUI is not parallel to the CCLE data in BigQuery.

  • If a user opens a shared cohort it will appear twice on the dashboard.

  • If a user creates a cohort with sample type filter Cell Lines and CCLE, the total sample count is off by one.

  • Pathologic State Filter in create cohort Stage should be displayed capitalized.

  • Histological Type entries in create new cohort page on the user interface should match the Google BigQuery entries in terms of capitalization.

  • Filter counts with 0 value don’t list when editing a pre-existing cohort.

  • Filters for data type counting in the left panel are currently not working properly.

Enhancements

  • A user can only select the cloud storage checkbox if he or she has been authenticated and authorized through the user details page. Otherwise the user can view the cloud storage checkbox but there will be a disabled cursor icon when the user hovers over in an attempt to select the checkbox.

  • The counts for the queries were refactored to match what was done for the APIs.

  • The Download File List as CSV was refactored to allow a maximum of 65,000 files at once.

  • Date formats on Workbooks, Cohort, Gene, and Variables list pages all reflect the same format.

  • Last Updated columns for the variable and gene lists were added to the user Dashboard.
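
The 65,000-file cap on the CSV download listed above can be illustrated with a short sketch. The function name, column headers, and bucket paths below are hypothetical and are not taken from the ISB-CGC implementation. The example only shows one way to cap a file-list export at a fixed number of rows.

```python
import csv
import io
from itertools import islice

# Cap taken from the release note above; everything else is illustrative.
MAX_FILES_PER_DOWNLOAD = 65000


def file_list_csv(rows, header, limit=MAX_FILES_PER_DOWNLOAD):
    """Write at most `limit` rows to CSV text and report how many were written."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    written = 0
    # islice stops after `limit` rows without reading the rest of the iterable.
    for row in islice(rows, limit):
        writer.writerow(row)
        written += 1
    return buffer.getvalue(), written


# Hypothetical file list with more entries than the cap allows.
header = ["sample", "cloud_storage_path"]
rows = ((f"sample-{i}", f"gs://example-bucket/file-{i}.bam") for i in range(70000))

csv_text, written = file_list_csv(rows, header)
```

Because the rows are consumed lazily, the export stops reading as soon as the cap is reached instead of materializing the full file list first.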

Bug Fixes

  • The user can now select a cohort in the color by feature section for the violin and the scatter plots in the worksheet section.

  • The gene list variable used for analysis in the worksheet plot settings section now matches the exact gene, rather than any gene whose name contains the string.

  • When the user clicks the Comments button multiple times within one second, in either the workbook or the cohort section, the user interface will not post duplicate comments in the comments section.

  • The user can now select gene HP in the Create Gene list favorite page to be used for analysis. For worksheet analysis, the user now has the ability to select a different gene once one has already been selected and used for analysis.

  • In the variable favorites table, the menu for a specific variable will no longer be cut off once a certain number of variable lists is exceeded.

  • A 400 Error pop up window will no longer appear as the user transitions from the File List page to IGV browser page.

  • The Public Data Availability section will no longer display any cut off if the user drags data type to the left of the page away from the panel itself, in detail page of existing cohort or the create new cohort page.

  • When the user edits a cohort, details section will display which filter(s) were applied for each update.

  • Cloud storage path in CSV file download for GA/BCGSC and GA/UNC V2 platforms can now be viewed.

  • The menu bar will display existing list for variable favorites list, gene favorites list, cohorts, and workbooks with no cut off.

  • When the user has selected a variable for the y-axis, the chart will display the selected variable in the charts.

  • When the user clicks Save Changes multiple times within one second while modifying an existing cohort, multiple cohorts are no longer created at once.

  • The Save cohort Endpoint default example for v1 now works properly.

  • The cohort_list API endpoint v1 will now pull only the cohort_id you specified.
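
The gene list fix in the bug fixes above (matching the exact gene rather than any gene that contains the string) comes down to the difference between substring and equality matching. The sketch below is illustrative only: the function names are hypothetical and the feature list is a made-up example, not the application's actual lookup.

```python
def substring_matches(symbol, features):
    """Old behaviour (as described): any feature whose name contains the symbol."""
    needle = symbol.upper()
    return [name for name in features if needle in name.upper()]


def exact_matches(symbol, features):
    """Fixed behaviour (as described): only the feature equal to the symbol."""
    needle = symbol.upper()
    return [name for name in features if name.upper() == needle]


# Example symbols chosen because they all contain the string "HP".
features = ["HP", "HPD", "HPX", "CHP1"]

loose = substring_matches("HP", features)
strict = exact_matches("HP", features)
```

Here `loose` contains all four symbols, while `strict` contains only `HP`. Short gene symbols are the ones most affected by substring matching.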

June 8, 2016 v1.4

Known Issues

  • The user can add the same gene twice if the lists have different names.

  • The user can add same filter to existing cohorts.

  • In the Create new Cohort page, the left filter counts (#) do not re-populate as you select filters to match the sample number in the clinical feature panel.

  • The bar chart renders overlapping text in the x-axis and y-axis for certain variables.

  • A user cannot remove people they have shared a cohort with from the existing cohorts table.

  • On a worksheet with the Analysis Type : Seq peek, the formatting will display Elongated when the user selects a certain gene.

  • CCLE data in GUI is currently not parallel to the CCLE data in BigQuery.

  • User currently cannot select a cohort in the color by feature section in a worksheet.

  • In the plot settings menu, the gene list used for analysis currently includes genes with names similar to the original gene as well as the specific gene added to the list.

  • If the user clicks the comments button for a workbook or cohort multiple times within one second, a duplicate comment will be posted.

  • User currently cannot select gene HP or genes with only two letters in the Create Gene list favorite page.

  • In the violin plot, the user has no ability to select a different gene once one is already selected.

  • In the variable favorites table, the menu for a specific variable will be cut off once a certain number of variable lists is exceeded.

  • A 400 Error pop up window will appear as the user transitions from the File List page to IGV browser page.

  • The Public Data Availability section will be cut off if the user drags the data type title to the left of the page away from the panel itself, in the detail page of an existing cohort.

Enhancements

  • Upgraded system from using Django 1.8 to Django 1.9.

  • A link to the google cloud platform has been added to the user details page.

  • The TCGA filter is selected as the default project when creating a new cohort.

  • When the user clicks on the browser back button, the user will remain on the same worksheet that they were previously on.

  • When the user adds a new gene list, variable favorites list, and/or cohort from the worksheet data type panel, the button will display “Apply to Worksheet”.

  • The feedback/help section has been moved to the top of the page to provide the user a more convenient way to send us feedback.

Bug Fixes

  • User can no longer add a duplicate gene to same gene favorites list.

  • To edit a gene name the user must now delete and re-type the desired gene name.

  • The functionality of a duplicate worksheet drop down menu reflects the same functionality of the original worksheet.

  • The Last Updated section reflects any changes made to the variable list, cohort list, and gene list in their corresponding tables.

  • The File list page now allows the user to add a maximum of five files to use in the IGV browser between all the pages in the file list table.

  • When a user hovers over the top row of the clinical feature panel for Sample Type and Tumor Tissue Type, the name is displayed clearly.

  • Order by Ascending/Descending now works properly on the Existing Cohorts table page.

  • The user is now able to plot genes with a hyphen (-) in the gene name.

  • The user is now able to download a maximum of 85,000 files at a time, in the File List page for a selected cohort.

May 10, 2016 v1.3

Known Issues

  • A user can add the same gene twice if identical gene lists have different names.

  • The user can add a filter that is already selected to an existing cohort.

  • On the Create New Cohort page, the counts next to the left-hand filters do not re-populate as you select filters to match the sample counts in the clinical feature panel.

  • When a Bar chart renders, overlapping text is displayed on the x-axis of the plot.

  • Users with whom a cohort is shared cannot be removed from the existing cohorts table, only from the details page of the cohort.

  • Analysis Type: SeqPeek formatting is elongated when a user selects certain genes for analysis. Using the gene TP53 reproduces this issue.

  • The CCLE data in the GUI currently does not parallel the CCLE data in BigQuery.

  • A user can add a duplicate gene to the same gene favorites list on the create new gene list page.

  • By double clicking a gene name in the create new gene list page, the gene will expand but display a blank space.

  • A duplicate worksheet will display the color by feature variables twice in the drop down list.

  • A user currently cannot select a cohort in the color by feature section.

  • The gene list drop-down used for analysis should include the exact gene only.

  • In both the workbook and cohort comments sections, clicking the comment button multiple times within one second posts a duplicate comment.

  • The Last Updated section should reflect any changes made to the variable list, cohort, and gene list in their corresponding tables.

  • The user cannot select the gene HP in the Create Gene list favorite page.

Enhancements

  • The Data Use Certification Agreement link was updated and the help link was removed.

  • In the Data Type section of the Create New Cohort page, MIRNA Sequencing was renamed to miRNA Sequencing and SNP CN to SNP Copy-Number.

  • The number of patients is now dynamically displayed in the create new cohort page when selecting filters in the details panel.

  • The number of samples is now dynamically displayed in the create new cohort page when selecting filters in the details panel.

  • By default in the create new cohort page, you will have the TCGA data filter selected.

  • When creating a cohort, checking feature boxes will be throttled to avoid misrepresented data.

  • Tooltips were added to the Sample Type section in the clinical features panel.

  • Minor changes were made in personal details page.

Bug Fixes

  • The Clinical Features Panel in the create new cohort page will no longer display BRCA even if unselected.

  • The Last Updated section in the existing workbooks panel now updates when changes are made to an existing workbook.

  • Set operation Union patient number is working correctly.

  • Upon duplicating a cohort it will duplicate the selected filter(s) as well.

  • User is able to download file list as csv for any cohort with any filter selected.

  • There is no legend cut off for violin plot or any other analysis type when the color by feature is set to Prior Diagnosis or any other variable.

  • When user switches gene in plot settings the feature choices for that specification will refresh.

  • The clinical variable search feature works properly when the user searches for clinical variables that are then used for analysis.

April 27, 2016 v1.2

Known Issues

  • Can add same gene twice if list has different names.

  • User can add same filter to existing cohorts.

  • Create new Cohort left filters (#) does not re-populate as you select filters to match sample # in clinical feature panel.

  • Clinical Features Panel in create new cohort page will still display BRCA even if unselected.

  • Last updated section in existing workbooks panel does not update when changes are made to existing workbook.

  • Bar chart renders overlapping text.

  • Set operation Union patient # off by one.

  • Legend Name cut off when name is too long.

  • Upon duplicating a cohort it duplicates the selected filter as well.

  • Cannot delete whom you share cohort with from existing cohorts table.

  • Unable to download the file list as csv for any cohort except those with the CCLE filter selected.

  • Legend Cut Off for violin plot when color by feature set to Prior Diagnosis.

  • When user switches gene in plot settings the feature choices for that specification disappears.

Enhancements

  • The comments section now has a 1000-character limit.

  • A link was created to extend the controlled access period to 24 hours from the moment the link is clicked.

Bug Fixes

  • A user can now click new worksheet multiple times within a few seconds and only produce one sheet.

  • The user must now add a new filter in an existing cohort to edit the cohort.

  • The duplicate button for an existing cohort will only make one duplicate at a time.

  • Clicking 150+ selected filters will not create an error page.

  • Cancel button on Create new gene list page will send you to Gene list favorites table menu.

  • Violin plot: User cannot add a categorical value to the y-axis.

  • If user edits an existing cohort, the old filter(s) will not be removed.

  • If a new worksheet is generated, the worksheet functionality is working properly.

  • User will get the ‘500: There was an error while handling your request. If you are trying to access a cohort please log out - and log back in. Sorry for the inconvenience.’ message if the user is inactive for more than 15 minutes when trying to create or use an existing cohort.

  • Clinical Feature Panel is displayed properly and reacts to filters being added/removed quickly.

  • The user must have text to add a comment.

  • All columns in file list table will be transferred/displayed when exported as csv file.

April 14, 2016 v1.1

Known Issues

  • Clicking create on a new worksheet too many times within a few seconds will create duplicate worksheets

  • Can add same gene twice if list has different names

  • Apply filters button works when no filter is selected in edit cohorts page

  • Clicking create on a new cohort too many times within a few seconds will create duplicate cohorts

  • User can add same filter to existing cohorts

  • Clicking 150+ selected filters will create error page

  • Create new Cohort left filters (#) does not re-populate as you select filters to match sample # in clinical feature panel

  • Clinical Features Panel in create new cohort page will still display BRCA even if unselected

  • Cancel button on Create new gene list page will send you to Data Source | Gene Favorites page

  • Violin plot: User can add a categorical value to the y-axis

  • Last updated section in existing workbooks panel does not update when changes are made to existing workbook

  • If user edits an existing cohort the old filter(s) will be removed

Enhancements

  • Tool tips added for disease code in create new cohort page

  • The first letter of the disease long name in tool tips is now capitalized

Bug Fixes

  • The user detail page will now display the correct date

  • The plot settings for a new worksheet are now working properly

  • Plot settings for duplicate worksheets are now working properly

  • The plot settings will now match the analysis type for existing worksheet plot

  • The user can now edit existing cohort name

  • Set Operations : Intersection working properly

  • Set Operations : Union working properly

  • Set Operations : Complement is now working properly

  • User is now able to delete selected filters from selected filter panel in new cohort page using the blue X

  • Editing an existing variable favorites list will display previously selected variables

  • Green checkmark will appear for IGV link

  • Update plot button will now work on a duplicate worksheet

  • User can now delete all cohorts with the select all feature

  • Fixed bugs with Data Type Create new cohort generating errors

  • The user can now search for variable favorite with the miRNA feature

  • The user can now search for a variable favorite through the clinical search feature

March 14, 2016 v1.0

  • When working with a worksheet two plots will be generated occasionally.

  • Axis labels and tick values sometimes overlap and get cut off.

  • The page is elongated when a Cubby Hole plot is generated and there are many values on the y axis.

December 23, 2015 v0.2

  • A treemap graph in the cohort details and cohort creation pages will not apply its own filters to itself. For example, if you select a study, the study treemap graph will not update.

  • Cohort file list download not working.

December 3, 2015 v0.1

  • First tagged release of the web-app


Have feedback or corrections? Please email us at feedback@isb-cgc.org.


Frequently Asked Questions (FAQ)

ISB-CGC Accounts and Cloud Projects

Do I have to request an ISB-CGC account before I can try the web app?

No, you can just “sign in” to the Web App using your Google identity.

I want to be able to run big jobs using Google Compute Engine on the TCGA data hosted by the ISB-CGC. What should I do?

You will need to request a Google Cloud Platform (GCP) project. Please see How to Request Cloud Credits for more details about requesting a project.

Can I use any email address as a Google identity?

Yes, you can. If your email address is not already linked to a Google account, you can create a Google account with your current email address. Please note, however, that although these two accounts will then share the same name, they will still be two separate accounts, with two separate passwords, etc. (It is also possible that your institutional email address is already a Google account, if your institution uses Google Apps. This is how to find out.)

How do I connect my Google Cloud Project to the ISB-CGC?

Your Google Cloud Project gives you access to all of the technologies that make up the Google Cloud Platform. These technologies include BigQuery, Cloud Storage, Compute Engine, etc. The ISB-CGC makes use of a variety of these technologies to provide access to the TCGA data, as well as many other data sets. Please see the Google Cloud Project Setup and Data Access section in the Quick Start Guide.

The connection between your Google Cloud Project (whether it is an ISB-CGC sponsored and funded project or your own personal project) and the ISB-CGC is your Google identity (also referred to as your “user credentials”).

Access to all ISB-CGC hosted data is controlled using the Data Commons Framework Gen3 which defines the permissions attached to each data set, bucket, or object.

What project information do I input on the Register a Google Cloud Project page?

You will need to input the Google Cloud Project ID which can be found on the Dashboard page of the Google Console under Project info.

_images/project_info.PNG
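
The Project info card lists a project name, a project number, and a Project ID, and it is easy to paste the wrong one. As a minimal sketch, the check below tests whether a string has the shape Google Cloud uses for Project IDs (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen). The function name is hypothetical and is not part of ISB-CGC; the check only looks at the format and does not confirm that the project exists.

```python
import re

# Shape of a Google Cloud Project ID: 6-30 characters, lowercase letters,
# digits, and hyphens, starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def looks_like_project_id(value: str) -> bool:
    """Return True if value is shaped like a Project ID (hypothetical helper)."""
    return _PROJECT_ID_RE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    # A display name or a numeric project number should be rejected.
    for candidate in ["my-research-project", "My Research Project", "123456789012"]:
        print(candidate, "->", looks_like_project_id(candidate))
```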

Why do I add the service account 907668440978-oskt05du3ao083cke14641u35deokgjj@developer.gserviceaccount.com to my Google Cloud Project?

This service account is needed in your Google Cloud Project IAM page for the ISB-CGC project to be able to automatically verify that all users of your Google Cloud Project have the same appropriate access rights to the protected data that has been requested for the project.

What service account do I use on the Register a Service Account page to be able to gain access to protected data?

On the Register a Service Account page you are asked to input a service account ID. To find the correct service account, go to the IAM and Admin page in the console for your Google Cloud Project. The service account you should use is named “Compute Engine default service account”. This service account is the default option on the Register a Service Account page. Please DO NOT use the service account 144657163696-utjumdn9c03fof16ig7bjak44hfj53o6@developer.gserviceaccount.com (our software will prevent you from using this account and will send an error message indicating this).
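
As a rough aid for picking the right entry on the IAM and Admin page, the sketch below checks whether an email address has the usual shape of a Compute Engine default service account, which Google Cloud names PROJECT_NUMBER-compute@developer.gserviceaccount.com. That naming pattern comes from Google Cloud conventions rather than from this FAQ, and the function name is hypothetical. The check is purely textual and does not confirm that the account exists or is registered.

```python
import re

# Compute Engine default service accounts are named
# <project number>-compute@developer.gserviceaccount.com
_COMPUTE_DEFAULT_RE = re.compile(r"^\d+-compute@developer\.gserviceaccount\.com$")


def is_compute_default_service_account(email: str) -> bool:
    """Return True if email is shaped like a Compute Engine default service account."""
    return _COMPUTE_DEFAULT_RE.fullmatch(email.strip().lower()) is not None


if __name__ == "__main__":
    # The project number below is a placeholder, not a real project.
    print(is_compute_default_service_account(
        "123456789012-compute@developer.gserviceaccount.com"))
```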

Why can’t I reauthorize my Service Account on my Google Cloud Project?

Your service account may have had its permissions revoked. This can happen, for example, because the 7-day limit has been reached, or because you have added a member to the Google Cloud Project who is not authorized to use the controlled data the service account is linked with, or who has not logged into the ISB-CGC UI and authenticated using their dbGaP credentials. If permissions were revoked because an unauthorized user was added to the project, the Google Cloud Project owner will be sent an email specifying the Service Account and Google Cloud Project that resulted in the access being revoked. If the user has not logged into the ISB-CGC Web App and/or has not authenticated, you will be given a red error message saying, “There was an error in processing your service account. Please try again.” when attempting to refresh using the refresh wheel. To see which new user has not logged in or authenticated, go to either the Register a Service Account page or the Adjust a Service Account page and look in the table for the user for whom the data set is not selected and who has X’s in the Registered and Has NIH Identity columns.

_images/authorizedtable.PNG

Ensure that the user has 1) logged into the ISB-CGC Web App and 2) registered their NIH Identity with their user interface identity.

To reauthorize the service account, 1) remedy the problem that resulted in access being denied, and 2) select the “Adjust A Service Account” icon (plus sign) next to Current Access Expires.

Another possible reason is that some users are marked as unable to access data sets they should have access to. In that case, make sure they have logged into the system and linked their eRA Commons/NIH Identity to their Google Identity.

What happens if I accidentally delete the default service account from a Google Cloud Project?

If you accidentally delete the default service account associated with the Google Cloud Project you are working in, you can no longer authorize the service account during instance creation or associate the service account with controlled access data, and many other functions will no longer work.

If you then try to add the service account back to the Google Cloud Project, this error occurs:

ERROR: (gcloud.compute.instances.create) Some requests did not succeed: - The resource ‘xx…@project.gserviceaccount.com’ of type ‘serviceAccount’ was not found.

Unfortunately at this time, there is no direct way to recover the default service account.

One workaround to recreate the Google Compute Engine default service account is to disable and re-enable the Google Compute Engine API in your project. This will only work if you have no Google Compute Engine resources (e.g., VMs, disks, snapshots) in your project; otherwise, you will get a “Backend Provisioning Error” when you try to disable the Compute Engine API.
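
The disable and re-enable workaround can be sketched with the gcloud command-line tool. The helper below only prints the commands so they can be reviewed and run by hand. The function name and the example Project ID are hypothetical; the API is addressed by its service name, compute.googleapis.com. The three list commands come first because the workaround only applies when they return no resources.

```python
import shlex
from typing import List

COMPUTE_API = "compute.googleapis.com"


def recreate_default_sa_commands(project_id: str) -> List[List[str]]:
    """Build the gcloud commands for the disable/re-enable workaround."""
    project_flag = f"--project={project_id}"
    return [
        # Check these first: the workaround only applies if all three are empty.
        ["gcloud", "compute", "instances", "list", project_flag],
        ["gcloud", "compute", "disks", "list", project_flag],
        ["gcloud", "compute", "snapshots", "list", project_flag],
        # Disable, then re-enable, the Compute Engine API.
        ["gcloud", "services", "disable", COMPUTE_API, project_flag],
        ["gcloud", "services", "enable", COMPUTE_API, project_flag],
    ]


if __name__ == "__main__":
    # "my-research-project" is a placeholder Project ID.
    for command in recreate_default_sa_commands("my-research-project"):
        print("$", shlex.join(command))
```

Printing rather than executing is deliberate: disabling an API affects the whole project, so each step is worth confirming before it is run.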

Another solution would be creating a new project and redeploying your instances there.

Google has an internal feature request to prevent accidental deletion of default service accounts.

There is a Google forum discussion that can be found here with more details and explanation.

ISB-CGC Web Interface

I ran the same query in the Web App that I’ve run before, but the results were different. Why is that?

The Web App performs its data retrieval and counts on ISB-CGC Google BigQuery tables which are based on the latest GDC data release. So, it’s possible that a new GDC release occurred since you last performed that query.

Why do I sometimes get a “Do you want to leave this site?” pop-up box when leaving a page or canceling a workflow edit?

This is a security feature when working with forms found in most web browsers; it lets you know that you may have made some changes which will be lost when you navigate away from the page. If you intend to cancel what you were doing, you can safely ignore it.

Why did I get a 401 error on the IGV Browser?

You will see the 401 error only if your pop-up blocker is enabled for the ISB-CGC website. Please disable the pop-up blocker on the top right-hand side of the screen by selecting to always allow pop-ups from ISB-CGC.

_images/401ErrorIGVBrowser.PNG

Why does the web browser crash if too many IGV Browser tabs are opened at once?

The web browser may crash when too many IGV Browser tabs are open because loading BAM files is memory intensive. When working with the IGV Browser, please limit the number of IGV Browser tabs you have open at once.

_images/IGVBrowserCrash.png

Do SeqPeek and CNVR plotting only work with TCGA data?

We currently have no data associated with CNVR or SeqPeek for TARGET or CCLE. Therefore, SeqPeek and CNVR will only work with TCGA data.

Data Access

Does all TCGA data require dbGaP authorization prior to access?

No, generally only the low-level sequence (DNA and RNA) and SNP-array data (CEL files) require dbGaP authorization. All of the “high-level” molecular data, as well as the clinical data are open-access and much of this has been made available in a convenient set of BigQuery tables.

Where can I find the TCGA data that ISB-CGC has made publicly available in BigQuery tables?

The BigQuery web interface can be accessed at https://console.cloud.google.com/bigquery. If you have not already added the ISB-CGC data sets to your BigQuery “view”, click on the blue arrow next to your project name at the top of the left side-bar, select “Switch to Project”, then “Display Project…”, and enter “isb-cgc-bq” (without quotes) in the text box labeled “Project ID”. For older ISB-CGC data sets, repeat and enter “isb-cgc”. All ISB-CGC public BigQuery data sets and tables will now be visible in the left side-bar of the BigQuery web interface. Note that in order to use BigQuery, you need to be a member of a Google Cloud Project.
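
The same data sets can also be listed from Python instead of the web interface. The following is a minimal sketch, assuming the google-cloud-bigquery client library is installed, that application default credentials are configured, and that your credentials are allowed to list data sets in the ISB-CGC projects. The function names are hypothetical; the two project IDs are the ones named above.

```python
from typing import List, Optional

CURRENT_DATA_PROJECT = "isb-cgc-bq"  # current ISB-CGC public data sets
OLDER_DATA_PROJECT = "isb-cgc"       # older ISB-CGC data sets


def data_projects(include_older: bool = False) -> List[str]:
    """Return the ISB-CGC project IDs to browse."""
    projects = [CURRENT_DATA_PROJECT]
    if include_older:
        projects.append(OLDER_DATA_PROJECT)
    return projects


def list_dataset_ids(data_project: str, billing_project: Optional[str] = None) -> List[str]:
    """List the data set IDs visible to you in data_project."""
    # Imported here so data_projects() works without the client library.
    from google.cloud import bigquery

    # billing_project is your own Google Cloud Project; None uses the default
    # project from your environment.
    client = bigquery.Client(project=billing_project)
    return sorted(item.dataset_id for item in client.list_datasets(project=data_project))


# Example (requires credentials and network access):
#   for project in data_projects(include_older=True):
#       print(project, list_dataset_ids(project, billing_project="my-research-project"))
```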

How can I apply for access to low-level DNA and RNA sequence data?

To access the TCGA or any other available controlled-access data, you will need to apply to dbGaP. Please also review our section on Understanding Data Security.

I have dbGaP authorization. How do I provide this information to the ISB-CGC platform?

In order for us to verify your dbGaP authorization, you first need to associate your Google Identity (used to sign in to the Web App) with a valid NIH login (e.g., your eRA Commons ID). After you have signed in, click on your avatar (next to your name in the upper-right corner) and you will be taken to your account details page where you can verify your dbGaP authorization. You will be redirected to the NIH iTrust login page and, after you successfully authenticate, you will be brought back to the ISB-CGC Web App. We will then verify that you also have dbGaP authorization for the TCGA controlled-access data and any other programs you have dbGaP access to.

We also ask that you review our section on Understanding Data Security.

My professor has dbGaP authorization. Do I have to have my own authorization too?

Yes, your professor will need to add you as a “data downloader” to his/her dbGaP application so that you have your own dbGaP authorization associated with your own eRA Commons ID. (This video explains how an authorized user of controlled-access data can assign a downloader role to someone in his/her institution.)

I already authenticated using my eRA Commons ID but now I want to use a different Google identity to access the ISB-CGC Web App. Can I reauthenticate using the same eRA Commons ID?

Yes, but you will first need to sign in using your previous Google identity and “unlink” your eRA Commons ID from that one before you can link it with your new Google Identity. An eRA Commons ID cannot be associated with more than one Google Identity within the ISB-CGC platform at any one time.

Can I authenticate to NIH programmatically?

No, the current NIH authentication flow requires web-based authentication and must therefore be done from within the ISB-CGC Web App. Once you have authenticated to NIH via the Web App, and your dbGaP authorization has been verified, the Google identity associated with your account will have access to the controlled data for 24 hours.
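
Because access lasts 24 hours from authentication, long-running scripts may want to check how much of that window remains before starting. The sketch below is only local bookkeeping under that assumption. It does not query ISB-CGC or NIH, and the function names are hypothetical; you record the authentication time yourself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ACCESS_WINDOW = timedelta(hours=24)  # controlled-data access period stated above


def access_expires_at(authenticated_at: datetime) -> datetime:
    """Return the time at which the 24-hour access window ends."""
    return authenticated_at + ACCESS_WINDOW


def time_remaining(authenticated_at: datetime, now: Optional[datetime] = None) -> timedelta:
    """Return how much of the window is left (zero once it has expired)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return max(access_expires_at(authenticated_at) - now, timedelta(0))


if __name__ == "__main__":
    # Placeholder: pretend authentication happened 20 hours ago.
    signed_in = datetime.now(timezone.utc) - timedelta(hours=20)
    print("Expires at:", access_expires_at(signed_in).isoformat())
    print("Remaining:", time_remaining(signed_in))
```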

Data Content

I get a different number of samples in BigQuery than I do with the same query in the Web App. Why?

Older programs like TCGA have both legacy data (data from the original program) and harmonized data (data run through the Genomics Data Commons). The Web App primarily uses harmonized data whereas BigQuery contains both legacy and harmonized data. In addition, some cases and samples have been removed from the Web App if annotation suggests the data from those cases or samples are incorrect, misleading or from cases of uncertain origin. Most of these cases and samples are still in BigQuery and users are encouraged to check the annotations tables.

Python Users

I want to write Python scripts that access the TCGA data hosted by the ISB-CGC. Do you have some examples that can get me started?

Yes, of course! The best place to start is with our Community Notebooks or our repository on GitHub. You can run any of these examples yourself. These resources include an introduction explaining what Notebooks are, how to get started as a novice user, and how to run more advanced analyses once you are comfortable.
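
Before opening the notebooks, a short script can show which tables are available to query. The following is a minimal sketch, not taken from the Community Notebooks. It assumes the google-cloud-bigquery client library is installed, that you belong to a Google Cloud Project that can be billed for queries, and that a data set named TCGA exists in the isb-cgc-bq project (the data set name is an assumption). It lists table names through BigQuery's INFORMATION_SCHEMA.TABLES view, so no table or column names have to be guessed.

```python
from typing import Dict, List


def table_listing_sql(project: str = "isb-cgc-bq", dataset: str = "TCGA") -> str:
    """Build a query that lists the tables in one data set.

    The default data set name "TCGA" is an assumption for illustration.
    """
    return (
        "SELECT table_name "
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.TABLES "
        "ORDER BY table_name"
    )


def run_query(sql: str, billing_project: str) -> List[Dict[str, object]]:
    """Run sql in BigQuery and return the rows as plain dictionaries."""
    # Imported here so table_listing_sql() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return [dict(row.items()) for row in client.query(sql).result()]


# Example (requires credentials and network access; the project ID is a placeholder):
#   for row in run_query(table_listing_sql(), billing_project="my-research-project"):
#       print(row["table_name"])
```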

R Users

I want to use R and Bioconductor packages to work with the TCGA data. How can I do that?

You can run RStudio locally or deploy a dockerized version on a Google Compute Engine VM. You can find some great examples to get you started in our Community Notebooks or our Community Notebooks repository on GitHub.

For an example on how to use Bioconductor packages with TCGA data in BigQuery, please check out our interactive tutorial found here.

Regulome Explorer Users

Can I run Regulome Explorer Analyses using TCGA tables of heterogeneous data in BigQuery?

Yes, of course! A series of Python Notebooks has been created to replicate Regulome Explorer and includes detailed information on the statistical methods implemented. To get started, please visit our Regulome Explorer page on readthedocs or our Regulome Explorer repository on GitHub.


Have feedback or corrections? Please email us at feedback@isb-cgc.org.

Contact Us

For general information about the ISB-CGC please contact us at feedback@isb-cgc.org. We are especially keen on learning about your particular use-cases, and how we can help you take advantage of the latest in cloud-computing technologies to answer your research questions.

For feature requests or bug reports, please send e-mail to feedback@isb-cgc.org.

We have virtual Office Hours on Tuesdays and Thursdays for any questions on ISB-CGC functionality or data that you may have. We look forward to speaking with you.


Have feedback or corrections? Please email us at feedback@isb-cgc.org.
