Performant Uploads with TUS
Overview
Objectives:
- Set up TUSd
- Configure Galaxy to use it to process uploads
Requirements:
- Galaxy Server administration
- Ansible: slides, hands-on tutorial
- Galaxy Installation with Ansible: slides, hands-on tutorial
Time estimation: 30 minutes
Last modification: Apr 18, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
Short Link: https://gxy.io/GTN:T00024
Here you'll learn to set up TUS, an open source protocol for resumable file uploads, together with its server tusd, to process uploads for Galaxy. We run the uploads through an external process so that the main Galaxy processes stay free for more important work, and periods of heavy uploading do not impact the entire system.
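To make "resumable" concrete, here is a minimal in-memory sketch of the idea behind the TUS protocol: the server remembers how many bytes of an upload it has stored (the offset), and after an interruption the client asks for that offset and continues from there instead of starting over. The class and function names are hypothetical, and nothing here talks to a real tusd. In the actual protocol the offset travels in the Upload-Offset header of HEAD and PATCH requests.

```python
class ToyUploadServer:
    """Hypothetical stand-in for a TUS server; stores one upload in memory."""

    def __init__(self, total_length):
        self.total_length = total_length
        self.data = bytearray()

    def head(self):
        # Like a TUS HEAD request: report how many bytes are already stored.
        return len(self.data)

    def patch(self, offset, chunk):
        # Like a TUS PATCH request: only accept a chunk that starts exactly
        # where the stored data ends (a real server answers 409 Conflict otherwise).
        if offset != len(self.data):
            raise ValueError("offset mismatch")
        if offset + len(chunk) > self.total_length:
            raise ValueError("chunk exceeds declared upload length")
        self.data.extend(chunk)
        return len(self.data)


def upload(server, payload, chunk_size, fail_after_chunks=None):
    """Send payload in chunks, starting from whatever the server already has."""
    offset = server.head()
    sent = 0
    while offset < len(payload):
        if fail_after_chunks is not None and sent == fail_after_chunks:
            raise ConnectionError("simulated network drop")
        offset = server.patch(offset, payload[offset:offset + chunk_size])
        sent += 1
    return offset


payload = bytes(range(256)) * 4          # 1024 bytes of example data
server = ToyUploadServer(len(payload))

try:
    upload(server, payload, chunk_size=100, fail_after_chunks=3)
except ConnectionError:
    pass                                 # the first attempt dies part-way

resumed_from = server.head()             # bytes that survived the interruption
upload(server, payload, chunk_size=100)  # second attempt resumes at that offset
```

The second call to upload does not resend the bytes the server already holds, which is the property that makes large uploads over unreliable connections practical.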
Agenda
Comment: Galaxy Admin Training Path
The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.
- Step 1: ansible-galaxy
- Step 2: backup-cleanup
- Step 3: customization
- Step 4: tus
- Step 5: cvmfs
- Step 6: apptainer
- Step 7: tool-management
- Step 8: reference-genomes
- Step 9: data-library
- Step 10: dev/bioblend-api
- Step 11: connect-to-compute-cluster
- Step 12: job-destinations
- Step 13: pulsar
- Step 14: celery
- Step 15: gxadmin
- Step 16: reports
- Step 17: monitoring
- Step 18: tiaas
- Step 19: sentry
- Step 20: ftp
- Step 21: beacon
TUS and Galaxy
To allow your users to upload via TUS, you will need to:
- configure Galaxy to know where the files are uploaded.
- install TUSd
- configure Nginx to proxy TUSd
Installing and Configuring
Hands-on: Setting up TUS upload with Ansible
In your playbook directory, add the galaxyproject.tusd role to your requirements.yml:
--- a/requirements.yml
+++ b/requirements.yml
@@ -14,3 +14,6 @@
 # gxadmin (used in cleanup, and later monitoring.)
 - src: galaxyproject.gxadmin
   version: 0.0.12
+# TUS (uploads)
+- name: galaxyproject.tusd
+  version: 0.0.1
If you haven't worked with diffs before, this can be something quite new or different.
Say we have two versions of a grocery list, stored in two files. We'll call them "old" and "new".
Input: Old
$ cat old
🍎
🍐
🍊
🍋
🍒
🥑
Output: New
$ cat new
🍎
🍐
🍊
🍋
🍍
🥑
We can see that they have some different entries. We've removed 🍒 because they're awful, and replaced them with a 🍍.
Diff lets us compare these files:
$ diff old new
5c5
< 🍒
---
> 🍍
Here we see that 🍒 is only in old, and 🍍 is only in new. But otherwise the files are identical.
There are a couple of different formats for diffs; one is the "unified diff":
$ diff -U2 old new
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:06:36.340962616 +0100
@@ -3,4 +3,4 @@
 🍊
 🍋
-🍒
+🍍
 🥑
This is basically what you see in the training materials, and it gives you a lot of context about the changes:
- --- old is the "old" file in our view
- +++ new is the "new" file
- @@ these lines tell us where the change occurs and how many lines are added or removed.
- Lines starting with a - are removed from our "new" file
- Lines with a + have been added.
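The same unified diff can also be produced programmatically, which may help when checking your reading of the format. This sketch uses difflib from Python's standard library; the item names are hypothetical plain-text stand-ins for the grocery list entries above.

```python
import difflib

# Hypothetical stand-ins for the six grocery items; only the fifth differs.
old = ["apple", "pear", "orange", "lemon", "cherries", "avocado"]
new = ["apple", "pear", "orange", "lemon", "pineapple", "avocado"]

# n=2 asks for two lines of context around each change, like `diff -U2`.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old", tofile="new", n=2, lineterm="")
)

print("\n".join(diff_lines))
```

The printed hunk starts with the same @@ -3,4 +3,4 @@ header as the diff command above, because the change sits on line 5 and two context lines are shown before it.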
So when you go to apply these diffs to your files in the training:
- Ignore the header
- Remove lines starting with - from your file
- Add lines starting with + to your file
The other lines (🍊/🍋 and 🥑) above just provide "context": they help you know where a change belongs in a file, but should not be edited when you're making the above change. Given the above diff, you would find a line with a 🍒, and replace it with a 🍍.
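The three rules above can be sketched as a small function. This is a simplified illustration, assuming a single well-formed hunk and ignoring edge cases that real tools such as patch handle; apply_hunk and the item names are hypothetical.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_hunk(original, diff_lines):
    """Apply a one-hunk unified diff (a list of lines) to a list of lines."""
    # Rule 1: ignore the file header; everything before the "@@" line is skipped.
    start = next(i for i, line in enumerate(diff_lines) if line.startswith("@@"))
    match = HUNK_HEADER.match(diff_lines[start])
    old_start = int(match.group(1))
    old_count = int(match.group(2)) if match.group(2) is not None else 1

    # 0-based index where the hunk begins; a count of 0 means "insert after old_start".
    position = old_start - 1 if old_count else old_start
    result = list(original[:position])

    for line in diff_lines[start + 1:]:
        marker, text = line[:1], line[1:]
        if marker == "-":
            # Rule 2: removed lines are dropped, after checking that they match.
            if original[position] != text:
                raise ValueError(f"expected to remove {text!r}")
            position += 1
        elif marker == "+":
            # Rule 3: added lines are inserted.
            result.append(text)
        else:
            # Context lines are copied unchanged; they only anchor the change.
            if original[position] != text:
                raise ValueError(f"context mismatch at line {position + 1}")
            result.append(text)
            position += 1

    result.extend(original[position:])
    return result


old = ["apple", "pear", "orange", "lemon", "cherries", "avocado"]
diff = [
    "--- old",
    "+++ new",
    "@@ -3,4 +3,4 @@",
    " orange",
    " lemon",
    "-cherries",
    "+pineapple",
    " avocado",
]
new = apply_hunk(old, diff)
```

When you edit files by hand during the training you are doing the same thing: using the context lines to find the right place, then deleting and adding lines there.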
Added & Removed Lines
Removals are very easy to spot, we just have removed lines:
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:10:14.370722802 +0100
@@ -4,3 +4,2 @@
 🍋
 🍒
-🥑
And additions likewise are very easy: just add a new line between the other lines in your file.
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:11:11.422135393 +0100
@@ -1,3 +1,4 @@
 🍎
+🍍
 🍐
 🍊
Completely new files
Completely new files look a bit different: there the "old" file is /dev/null, the empty file on a Linux machine.
$ diff -U2 /dev/null old
--- /dev/null 2022-02-15 11:47:16.100000270 +0100
+++ old 2022-02-16 14:06:19.697132568 +0100
@@ -0,0 +1,6 @@
+🍎
+🍐
+🍊
+🍋
+🍒
+🥑
And removed files are similar, except with the new file being /dev/null:
--- old 2022-02-16 14:06:19.697132568 +0100
+++ /dev/null 2022-02-15 11:47:16.100000270 +0100
@@ -1,6 +0,0 @@
-🍎
-🍐
-🍊
-🍋
-🍒
-🥑
Install the role with:
Input: Bash
ansible-galaxy install -p roles -r requirements.yml
Configure it in your group variables
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -67,6 +67,9 @@ galaxy_config:
     # Tool security
     outputs_to_working_directory: true
     new_user_dataset_access_role_default_private: true # Make datasets private by default
+    # TUS
+    galaxy_infrastructure_url: "https://{{ inventory_hostname }}"
+    tus_upload_store: "{{ galaxy_tus_upload_store }}"
   gravity:
     process_manager: systemd
     galaxy_root: "{{ galaxy_root }}/server"
@@ -87,6 +90,10 @@ galaxy_config:
     celery:
       concurrency: 2
       loglevel: DEBUG
+    tusd:
+      enable: true
+      tusd_path: /usr/local/sbin/tusd
+      upload_dir: "{{ galaxy_tus_upload_store }}"
     handlers:
       handler:
         processes: 2
@@ -156,3 +163,7 @@ nginx_conf_http:
 nginx_ssl_role: usegalaxy_eu.certbot
 nginx_conf_ssl_certificate: /etc/ssl/certs/fullchain.pem
 nginx_conf_ssl_certificate_key: /etc/ssl/user/privkey-www-data.pem
+
+# TUS
+galaxy_tusd_port: 1080
+galaxy_tus_upload_store: /data/tus
Next, we proxy the tusd service alongside Galaxy in Nginx. Because the upload endpoint resides "under" the Galaxy path, clients send their cookies and authentication headers along with upload requests; tusd can use these to process the uploads before telling Galaxy when they're done.
--- a/templates/nginx/galaxy.j2
+++ b/templates/nginx/galaxy.j2
@@ -28,6 +28,22 @@ server {
         proxy_set_header Upgrade $http_upgrade;
     }
 
+    location /api/upload/resumable_upload {
+        # Disable request and response buffering
+        proxy_request_buffering off;
+        proxy_buffering off;
+        proxy_http_version 1.1;
+
+        # Add X-Forwarded-* headers
+        proxy_set_header X-Forwarded-Host $host;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+        client_max_body_size 0;
+        proxy_pass http://localhost:{{ galaxy_tusd_port }}/files;
+    }
+
     # Static files can be more efficiently served by Nginx. Why send the
     # request to Gunicorn which should be spending its time doing more useful
     # things like serving Galaxy!
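Because proxy_pass is given with a path (/files), Nginx replaces the part of the request path that matched the location with that path before forwarding the request. The function below is only a sketch of that prefix substitution, not Nginx's implementation; rewrite_for_tusd and the upload ID in the usage lines are hypothetical.

```python
LOCATION_PREFIX = "/api/upload/resumable_upload"  # from the location block
UPSTREAM_PATH = "/files"                          # from the proxy_pass URL


def rewrite_for_tusd(request_path):
    """Return the path the upstream would see, or None if the location does not match."""
    if not request_path.startswith(LOCATION_PREFIX):
        return None
    # Swap the matched prefix for the upstream path, keeping the remainder.
    return UPSTREAM_PATH + request_path[len(LOCATION_PREFIX):]


creation_path = rewrite_for_tusd("/api/upload/resumable_upload")
chunk_path = rewrite_for_tusd("/api/upload/resumable_upload/abc123")
other_path = rewrite_for_tusd("/api/histories")
```

Requests outside the prefix, such as the last one, are not handled by this location and continue to Galaxy as before.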
Add the role to your Galaxy playbook, ahead of the galaxyproject.galaxy role:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -30,6 +30,7 @@
         name: ['tmpreaper']
       when: ansible_os_family == 'Debian'
   roles:
+    - galaxyproject.tusd
     - galaxyproject.galaxy
     - role: galaxyproject.miniconda
       become: true
Run the playbook
Input: Bash
ansible-playbook galaxy.yml
Congratulations, you've set up TUS for Galaxy.
Check it works
Hands-on: Check that it works.
- SSH into your machine
- Check the active status of tusd with systemctl status galaxy-tusd.
- Upload a small file! (Pasted text will not pass via TUS)
- Check that the directory /data/tus/ has been created, and look at its contents:
  Input: Bash
  sudo tree /data/tus/
- You'll see files in that directory: the file that's been uploaded, and an "info" file which contains metadata about the upload.
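If tree is not installed, a few lines of Python can pair each upload with its metadata file. This sketch assumes a layout of one data file plus a sibling file with an .info suffix holding JSON; the function name is hypothetical, and the keys inside the JSON are best checked against your own files rather than taken from here.

```python
import json
from pathlib import Path


def list_uploads(upload_dir):
    """Return {upload file name: parsed .info metadata} for an upload directory."""
    uploads = {}
    for info_path in sorted(Path(upload_dir).glob("*.info")):
        data_path = info_path.with_suffix("")  # strip the trailing ".info"
        if not data_path.is_file():
            continue                           # metadata without a data file; skip it
        with info_path.open() as handle:
            uploads[data_path.name] = json.load(handle)
    return uploads
```

Calling list_uploads("/data/tus") as a user that can read the directory returns one entry per upload, which you can print or inspect further.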
Hands-on: Time to git commit
It's time to commit your work! Check the status with
git status
Add your changed files with
git add ... # any files you see that are changed
And then commit it!
git commit -m 'Finished Performant Uploads with TUS'
Comment: Got lost along the way?
If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.
If you're using git to track your progress, remember to add your changes and commit with a good commit message!