*****************************
Getting Started with Analysis
*****************************

ISB-CGC enables people to analyze cloud-based cancer data. Learn more about the different analytical methods ISB-CGC users can employ.

Google Cloud Project Setup and Data Access
##########################################################

A Google Cloud Project (GCP) is required to make use of all of the data, tools, and Google Cloud functionality.

**Obtain a Google identity**

- Do you or your institution already have a Google identity, such as a Gmail account? If so, you can proceed to the next step.
- If not, it only takes a minute to `create a Google identity `_. You can even link a non-Gmail account (e.g., scientist@nih.gov) as a Google identity by `this `_ method.

**Request Google Cloud Credits**

- Take advantage of a one-time `$300 Google Credit `_.
- If you have already used this one-time offer (or there is some other reason you cannot use it), see this information about how to request `ISB-CGC Cloud Credits `_.

**Set up a Google Cloud Project**

- See Google's documentation about how to `create a Google Cloud Project `_.
- Learn about how to `add members and roles to a project `_.
- `Enable Required Google Cloud APIs `_

**Connect to ISB-CGC's cancer data tables in Google BigQuery**

- To obtain access to the ISB-CGC open-access project tables in BigQuery, users can link these tables to their GCP project as described `here `_.

**Access open-access data**

- All individual processed data files are accessible through GDC Google Cloud Storage buckets; ISB-CGC provides pointers to these files. Example SQL queries for finding these URLs are in `this section `_, on each Program's documentation page; these queries can also be incorporated into notebooks or workflows.

**Getting Started with Analysis**

Now you're ready to perform analysis. ISB-CGC offers analysis with Google BigQuery and analysis using APIs and VMs.
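As a minimal sketch of how a SQL query might be embedded in a Python notebook or workflow once your project is set up: the example below builds a simple row-count query and, optionally, runs it with the ``google-cloud-bigquery`` client library. The table ID and column name are hypothetical placeholders (look up real ones in the ISB-CGC BigQuery tables), the helper names ``build_count_query`` and ``run_query`` are illustrative only, and running the query assumes the client library is installed and that you have authenticated against your own Google Cloud project for billing.

```python
import re

# Hypothetical placeholders, not real ISB-CGC identifiers: replace them with
# a table ID and column name taken from the ISB-CGC BigQuery tables.
EXAMPLE_TABLE = "your-data-project.your_dataset.your_table"
EXAMPLE_COLUMN = "your_column"

# Identifiers are interpolated into the SQL text, so validate them first.
_TABLE_RE = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+){2}")
_COLUMN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_count_query(table, column, limit=10):
    """Return SQL that counts rows for each distinct value of `column`."""
    if not _TABLE_RE.fullmatch(table):
        raise ValueError(f"expected project.dataset.table, got {table!r}")
    if not _COLUMN_RE.fullmatch(column):
        raise ValueError(f"invalid column name: {column!r}")
    return (
        f"SELECT {column}, COUNT(*) AS n\n"
        f"FROM `{table}`\n"
        f"GROUP BY {column}\n"
        f"ORDER BY n DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql, billing_project):
    """Run `sql`, billing the query to your own Google Cloud project."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [dict(row.items()) for row in rows]


# Building and printing the query needs no credentials or network access.
print(build_count_query(EXAMPLE_TABLE, EXAMPLE_COLUMN))
```

To execute the query, call ``run_query(sql, "your-billing-project")`` with the ID of the Google Cloud project you created above; query costs are charged to that project.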
Interactive web-based Cancer Data Analysis & Exploration
##########################################################

Explore and analyze ISB-CGC cancer data through a suite of graphical user interfaces (GUIs) that let you select and filter data from one or more public data sets (such as TCGA, CCLE, and TARGET), combine them with your own uploaded data, and analyze the results using a variety of built-in visualization tools.

.. list-table::
   :widths: 60, 40
   :header-rows: 0

   * - Integrative Genomics Viewer (IGV)
         | *Explore and visualize genomic data. IGV is no longer integrated with ISB-CGC*
     - * `Integrative Genomics Viewer (IGV) website `_
   * - Mitelman Database for Chromosome Aberrations and Gene Fusions in Cancer
         | *Explore relationships between chromosomal changes and cancer*
     - * `ISB-CGC Mitelman Database Documentation `_
       * `ISB-CGC Mitelman Database `_
   * - The *TP53* Database
         | *The TP53 Database is no longer hosted by ISB-CGC. Explore TP53 variant data that have been reported in the published literature or are available in other public databases.*
     - * The *TP53* `Database `_

Cancer data analysis using Google BigQuery
##########################################################

Processed data are consolidated by data type (e.g., Clinical, DNA Methylation, RNAseq, Somatic Mutation, Protein Expression) from sources including the Genomic Data Commons (GDC) and Proteomic Data Commons (PDC) and transformed into ISB-CGC Google BigQuery tables. This allows users to quickly analyze information from thousands of patients in curated BigQuery tables using Structured Query Language (SQL). SQL can be run from the Google BigQuery Console, and it can also be embedded within Python, R, and complex workflows, giving users flexibility. BigQuery's easy and cost-effective “burstability” lets you calculate statistical correlations across millions of combinations of data points within minutes, compared with days or weeks on a non-cloud-based system.

.. list-table::
   :widths: 60, 40
   :header-rows: 0

   * - **BigQuery Table Search User Interface**
         | *Learn more about ISB-CGC hosted BigQuery tables*
     - * `ISB-CGC BigQuery Table Search Documentation `_
       * `ISB-CGC BigQuery Table Search `_
   * - **Google BigQuery Console**
         | *Use SQL to analyze and query ISB-CGC cancer data stored in Google’s cloud-based data warehouse*
     - * `ISB-CGC BigQuery Documentation `_
       * `Google BigQuery Documentation `_
       * `Google Cloud BigQuery Console `_
   * - **Notebooks**
         | *Seamlessly integrate ISB-CGC tables with R and Python to conduct robust analyses*
     - * `ISB-CGC Notebook Documentation `_
       * `ISB-CGC Statistical Notebook Documentation `_
       * `ISB-CGC Machine Learning Notebook Documentation `_
       * `ISB-CGC HTAN Notebook Documentation `_
       * `ISB-CGC Mitelman Database Notebook Documentation `_

Cancer data analysis using APIs & Google Cloud Virtual Machines
#################################################################

ISB-CGC enables the use of as many workflow technologies as possible through documentation, support, and necessary infrastructure.

.. list-table::
   :widths: 60, 40
   :header-rows: 0

   * - **ISB-CGC APIs**
         | *Programmatically access data and user-generated cancer patient cohort information*
     - * `ISB-CGC API Documentation `_
       * `ISB-CGC API `_
   * - **Connecting to GA4GH:**
         | *Easily connect to APIs from ISB-CGC*
     - * `How to find a tool using GA4GH TRS Notebook `_
       * `How to use a GA4GH tool using WES Notebook `_
   * - **Running workflows on ISB-CGC**
         | *Execute open-source and custom pipelines/algorithms on scalable virtual machines*
     - * `ISB-CGC Workflow Documentation `_
       * We recommend tools such as the `Google Cloud SDK `_, `Google Compute Engine `_, `Virtual Machines `_ and `Docker `_ to assist your analyses.