Accessing Controlled Data¶
You can gain access to controlled data by two different methods via ISB-CGC. The methods can be used simultaneously if needed.
Select this method for controlled access via personal user credentials:
- Provides access to controlled data for 24 hours at a time;
- Uses your personal credentials;
- Example uses: the ISB-CGC Web App, RStudio, or short jobs on Google Compute Engine that complete in under 24 hours.
Select this method for controlled access via service account credentials:
- Provides access to controlled data for seven days at a time;
- Uses the credentials of a service account acting on your behalf (to learn about service accounts, refer to the Google documentation);
- Example uses: using a Google Cloud Project; running a program from a Google Compute Engine (GCE) Virtual Machine (VM) that takes longer than 24 hours to complete
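The choice between the two methods comes down mostly to how long the work needs controlled access. The following is a minimal sketch of that rule of thumb, assuming only the 24-hour window stated above. The function name and the returned labels are hypothetical and are not part of any ISB-CGC API.

```python
from datetime import timedelta

# Access window for personal user credentials, as stated in this document.
PERSONAL_WINDOW = timedelta(hours=24)


def suggest_access_method(expected_runtime: timedelta) -> str:
    """Suggest a credential type for a job of the given expected duration.

    Hypothetical helper for illustration only; the returned labels are
    plain descriptions, not ISB-CGC identifiers.
    """
    # Jobs that finish in under 24 hours fit within personal access.
    if expected_runtime < PERSONAL_WINDOW:
        return "personal user credentials"
    # Anything at or beyond 24 hours is treated conservatively here as
    # needing a service account (seven days of access at a time).
    return "service account credentials"
```

For example, `suggest_access_method(timedelta(hours=36))` returns the service account label, since a 36-hour job would outlast a single 24-hour personal access window.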
If you are looking to gain access to COSMIC data, please see the COSMIC documentation.
You’ll need the following before requesting controlled access via ISB-CGC:
- A Google identity;
- An NIH or electronic Research Administration (eRA) account;
- Database of Genotypes and Phenotypes (dbGaP) permission for each type of controlled access data of interest, linked to your NIH or eRA account;
- Your Google identity linked to your NIH/eRA account via the ISB-CGC Web App.
1) Google identity¶
If you don’t have a Google identity yet, please see the ISB-CGC Quick-Start Guide.
2) NIH or eRA account¶
Intramural researchers can use their NIH log-in account, and extramural researchers will need to have a personal eRA account. Either way, the user’s NIH/eRA account needs to be affiliated with their institution’s eRA account. Your principal investigator (PI) or other authorized person can create your personal eRA account and link it to your institution’s eRA account.
If you already have an NIH/eRA account, you can log into eRA at https://public.era.nih.gov/commons.
- If the Institution listed for you is not your current one, ask your PI to change it for you.
- If you are the PI or other authorized person, you can create, link and update accounts from here.
Visit electronic Research Administration (eRA) for more information on registering for an NIH eRA account.
Controlled Access Via Personal User Credentials¶
The first time that you perform the above steps, you are automatically granted controlled access via your personal user credentials. This access lasts for 24 hours, though it can be extended. To obtain access again later, sign in to the Web App and click on your persona (or on Account Details in the drop-down menu next to your name), then click the Get Controlled Access button below Obtain controlled access for 24 hours.
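When planning a job around the 24-hour window, it can help to work out when the current access will lapse. The sketch below does that arithmetic, assuming the window is counted from the moment access was granted (this document does not state the exact starting point). The function names are hypothetical, and the code does not query ISB-CGC in any way.

```python
from datetime import datetime, timedelta, timezone

# Length of personal controlled access, as stated in this document.
ACCESS_WINDOW = timedelta(hours=24)


def access_expiry(granted_at: datetime) -> datetime:
    """Return the time at which access granted at `granted_at` would lapse."""
    return granted_at + ACCESS_WINDOW


def time_remaining(granted_at: datetime, now: datetime) -> timedelta:
    """Return how much of the window is left, never less than zero."""
    remaining = access_expiry(granted_at) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    granted = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(access_expiry(granted).isoformat())
    print(time_remaining(granted, granted + timedelta(hours=20)))
```

If the remaining time is shorter than the expected runtime of a job, that is a signal to extend access first or to use service account credentials instead.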
Controlled Access Via Service Account Credentials¶
To access controlled data programmatically, such as through Google Cloud or when running a VM, you’ll need to register a Google Cloud Project (GCP) and a service account. Follow these steps:
Controlled Access in the Google BigQuery Console¶
The BigQuery project “isb-cgc-cbq” contains the ISB-CGC controlled access data, which is stored in BigQuery tables. To work with these tables in the Google BigQuery Console, you must first link to the project from within the console. Before doing so, you must have completed all the prerequisites above, including linking your Google identity to your NIH/eRA account via the ISB-CGC Web App.
When you access BigQuery from your Google Cloud Platform Console (see here for more information on this), you will be presented with the following page:
Click the blue arrow to open a drop-down list; select ‘Switch to Project’, then click ‘display project…’
You will then be presented with the following page:
As shown in the image below, type “isb-cgc-cbq” into the project ID field and then click OK.
Once this has been completed, you will be able to see the appropriate controlled access ISB-CGC BigQuery datasets on the left-hand side (see screenshot below).
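The same project can also be reached from code rather than the console. The following is a rough sketch, not an official ISB-CGC procedure: it assumes the third-party `google-cloud-bigquery` package is installed and that credentials with controlled access are already configured in the environment. The helper names, the `billing_project` parameter, and any dataset or table names passed in are hypothetical; only the project ID “isb-cgc-cbq” comes from this page.

```python
from typing import List

# Project ID given in this document.
CONTROLLED_PROJECT = "isb-cgc-cbq"


def qualified_table(dataset: str, table: str) -> str:
    """Build a backtick-quoted table reference inside the controlled project."""
    return f"`{CONTROLLED_PROJECT}.{dataset}.{table}`"


def list_controlled_datasets(billing_project: str) -> List[str]:
    """Return the dataset IDs visible to the current credentials.

    Assumption: google-cloud-bigquery is installed and default credentials
    are set up. `billing_project` is your own Google Cloud project.
    """
    # Imported lazily so the pure helper above works without the package.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return [ds.dataset_id for ds in client.list_datasets(project=CONTROLLED_PROJECT)]
```

A call such as `qualified_table("some_dataset", "some_table")` (placeholder names) yields a reference that can be pasted into a standard SQL query, while `list_controlled_datasets` should return nothing useful, or raise a permissions error, if controlled access has lapsed.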