Setting up GCSFuse
When you run a workflow on a virtual machine (VM), your input files are often stored in Google Cloud Storage buckets. One way to access them is to mount the buckets as file systems on your VM. Google Cloud Storage FUSE (GCSFuse) lets you mount Cloud Storage buckets so that you can read and write their contents directly from your VM. More detailed information can be found on the Google Cloud documentation page.
Step 1: create a Virtual Machine (VM) instance big enough to hold your data
This guide recommends creating your VM with Ubuntu 16.04 LTS and with the "Allow full access to all Cloud APIs" option selected.
It is very important that the VM is big enough to hold your data; otherwise gcsfuse will not mount properly.
Step 2: installing gcsfuse
The following commands can be used to install gcsfuse:
$ sudo -i
$ cd /
$ cd opt
$ export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
$ echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install gcsfuse
#### Close the VM console and reopen #####
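Once gcsfuse is installed, a bucket is made available by mounting it onto a local directory on the VM. The following is a minimal sketch of that step, not a definitive procedure. The bucket name `my-example-bucket` and the mount directory `gcs-bucket` are hypothetical placeholders, and the script assumes the VM's service account is allowed to read the bucket.

```shell
#!/usr/bin/env bash
# Hypothetical names: replace with your own bucket and mount directory.
BUCKET_NAME="${BUCKET_NAME:-my-example-bucket}"
MOUNT_DIR="${MOUNT_DIR:-${HOME:-/tmp}/gcs-bucket}"

# Create the local directory that will act as the mount point.
mkdir -p "$MOUNT_DIR"

if command -v gcsfuse >/dev/null 2>&1; then
    # Mount the bucket; its objects then appear as files under $MOUNT_DIR.
    gcsfuse "$BUCKET_NAME" "$MOUNT_DIR" \
        || echo "Mount failed; check the bucket name and the VM's API access." >&2
else
    echo "gcsfuse was not found on PATH; install it first." >&2
fi
```

When the bucket is no longer needed, it can be unmounted with `fusermount -u "$MOUNT_DIR"`.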
Step 4: running your workflow with a local VM directory
Write your workflow with the input pointing to the local directory where the bucket is mounted, as follows:
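As a minimal sketch, assuming the bucket is mounted at a hypothetical directory `gcs-bucket` and the inputs live in a hypothetical `inputs` folder inside it, the files can be enumerated like any other local files and their paths handed to the workflow. The helper `list_inputs` is illustrative and not part of gcsfuse or any workflow tool.

```shell
#!/usr/bin/env bash
# Hypothetical locations: adjust to your own mount directory and folder layout.
MOUNT_DIR="${MOUNT_DIR:-${HOME:-/tmp}/gcs-bucket}"
INPUT_DIR="${INPUT_DIR:-$MOUNT_DIR/inputs}"

# list_inputs DIR: print the path of every regular file directly under DIR,
# one per line, in sorted order.
list_inputs() {
    find "$1" -maxdepth 1 -type f | sort
}

if [ -d "$INPUT_DIR" ]; then
    # These paths are what the workflow's input settings would point to.
    list_inputs "$INPUT_DIR"
else
    echo "Input directory $INPUT_DIR does not exist; is the bucket mounted?" >&2
fi
```

Because the mounted bucket behaves like a directory, the workflow itself needs no Cloud Storage specific code; it only needs the local paths printed above.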