Setting up GCSFuse

When you run a workflow on a virtual machine (VM), your input files are often stored in Google Cloud Storage buckets. One way to access them is to mount the buckets as file systems on your VM. Google Cloud Storage FUSE (GCSFuse) lets you mount Cloud Storage buckets so that you can read and write bucket contents directly from your VM. More detailed information can be found on the Google Cloud documentation page.

How-to video | Installing Page

Step 1: create a Virtual Machine (VM) instance big enough to hold your data

This guide recommends creating your VM with Ubuntu 16.04 LTS and with the Allow full access to all Cloud APIs option selected.

Note

Make sure your VM is big enough for your data; otherwise, gcsfuse will not mount properly.
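
One way to check the "big enough" requirement is to compare the VM's free disk space with the size of the data you plan to work with. The following is a minimal sketch and not part of the original guide; the function name has_free_space and the 50 GB figure in the usage comment are illustrative assumptions.

```shell
# Hypothetical helper (not part of gcsfuse): succeeds when the file system
# holding DIR has at least NEED_GB gigabytes available.
has_free_space() {
  dir="$1"
  need_gb="$2"
  # df -P prints one POSIX-format line per file system; with -k, column 4
  # is the available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example use (50 GB is an assumption; substitute the size of your data):
# has_free_space "$HOME" 50 || echo "Not enough free disk space" >&2
```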

Step 2: installing gcsfuse

The following commands can be used to install gcsfuse:

$ sudo -i
$ cd /opt
### add the gcsfuse package repository for your Ubuntu release ###
$ export GCSFUSE_REPO=gcsfuse-$(lsb_release -c -s)
$ echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
### import the Google Cloud package signing key ###
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install gcsfuse
### close the VM console and reopen it ###
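
After reopening the console, it can help to confirm that the gcsfuse binary is available before trying to mount anything. This is a minimal sketch; check_gcsfuse is a hypothetical helper name, not a command provided by gcsfuse.

```shell
# Hypothetical helper: report whether the gcsfuse binary is on the PATH.
check_gcsfuse() {
  if command -v gcsfuse >/dev/null 2>&1; then
    echo "gcsfuse found at $(command -v gcsfuse)"
  else
    echo "gcsfuse not found; repeat the installation commands" >&2
    return 1
  fi
}
```

If the helper reports that gcsfuse was found, running gcsfuse --version should print the installed version.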

Step 3: mounting/linking the target bucket to your VM directory

Make a directory to hold your bucket:

### run this in your home directory, not in /opt ###
### in this tutorial, yourNewDirectory is testGcsfuse ###

$ mkdir <yourNewDirectory>

Note

To access restricted data with your Google credentials, run this command before going further: $ gcloud auth application-default login

Mount a bucket to your folder:

$ gcsfuse <bucketName> <myFolder/to/mount>

Example: gs://gdc-ccle-open/ is the bucket you want to mount to your VM:

$ gcsfuse gdc-ccle-open testGcsfuse

Mount a subdirectory from your bucket to your VM folder:

$ gcsfuse --only-dir <subdirectory> <bucketName> <myFolder/to/mount>

Example: you have a BAM file at gs://gdc-ccle-open/692a845c-7957-41f2-b679-5434c69ba25b/G27328.Calu-6.1.bam, and you only need to mount the directory that holds that file:

$ gcsfuse --only-dir 692a845c-7957-41f2-b679-5434c69ba25b gdc-ccle-open testGcsfuse

You should see something like this:

[Image: WorkflowWithGCSFUSE.jpg]

[Image: WorkflowWithGCSFUSE_2.jpg]
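
The mount commands in this step can be wrapped in a small function that creates the directory, skips the mount if the directory is already a mount point, and optionally passes --only-dir. This is a sketch under stated assumptions: mount_bucket is a hypothetical name, and the check relies on the mountpoint utility from util-linux being installed on the VM.

```shell
# Hypothetical wrapper around the gcsfuse mount commands from this step.
# Usage: mount_bucket BUCKET MOUNT_DIR [SUBDIRECTORY]
mount_bucket() {
  bucket="$1"
  mount_dir="$2"
  only_dir="${3:-}"

  # Create the mount directory if it does not exist yet.
  mkdir -p "$mount_dir"

  # Skip the mount if something is already mounted at this directory.
  if mountpoint -q "$mount_dir"; then
    echo "$mount_dir is already a mount point"
    return 0
  fi

  if [ -n "$only_dir" ]; then
    # Mount only one subdirectory of the bucket.
    gcsfuse --only-dir "$only_dir" "$bucket" "$mount_dir"
  else
    # Mount the whole bucket.
    gcsfuse "$bucket" "$mount_dir"
  fi
}

# Example use with the bucket from this tutorial:
# mount_bucket gdc-ccle-open "$HOME/testGcsfuse"
# mount_bucket gdc-ccle-open "$HOME/testGcsfuse" 692a845c-7957-41f2-b679-5434c69ba25b
```

When you are finished, the bucket can be unmounted on Linux with fusermount -u followed by the mount directory.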

Step 4: running your workflow with a local VM directory

Write your workflow with the input pointing to that directory, as follows:

[Image: WorkflowWithGCSFUSE_1.jpg]
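
Because the mounted bucket behaves like a local directory, a workflow input can be given as an ordinary file path. The sketch below checks that an input file is readable before the workflow starts; require_input and my_workflow are hypothetical names used only for illustration, and the example path assumes the --only-dir mount from Step 3 at ~/testGcsfuse.

```shell
# Hypothetical helper: fail early if an input file under the mount
# directory is missing or unreadable; otherwise print its path.
require_input() {
  if [ ! -r "$1" ]; then
    echo "Input not readable: $1" >&2
    return 1
  fi
  echo "$1"
}

# Example use (my_workflow is a placeholder for your own workflow command):
# INPUT_BAM=$(require_input "$HOME/testGcsfuse/G27328.Calu-6.1.bam") || exit 1
# my_workflow --input "$INPUT_BAM"
```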
