Uploading data

Note

If you are using the Bianca cluster at UPPMAX to analyse your data, there are some additional steps. These are clearly marked in the guide.

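This guide contains instructions on how to upload data to FEGA Sweden. We will take you through the process step by step. The guide comprises four major steps, with two additional steps for Bianca users:

  1. Install the sda-cli tool
  2. Download the encryption key
  3. Encrypt the files
  4. Submit the files

Bianca users also need to move files to Bianca (between steps 2 and 3) and move files from Bianca (between steps 3 and 4).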

Install the sda-cli tool

Note

This guide assumes that you perform the following steps on the system where you keep the data you intend to encrypt and submit. Some systems, such as Bianca, do not allow internet access. In those cases, start on a system that does allow internet access (for Bianca, another UPPMAX system).

  1. Download the sda-cli executable for your operating system from the sda-cli releases page on GitHub.

    If you are on Linux, the command might look something like this:

    wget https://github.com/NBISweden/sda-cli/releases/download/v0.0.2/sda-cli_0.0.2_Linux_x86_64.tar.gz
  2. Extract the binary by using the tar command:

    tar -xvzf sda-cli_0.0.2_Linux_x86_64.tar.gz

    The sda-cli executable should now be in the same directory as the downloaded file.

    Note: Full documentation for the sda-cli tool is available online, but this guide includes everything you need to encrypt and upload data to FEGA.
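The download and extraction steps above can be wrapped in two small shell functions so that the version number is written only once. This is a minimal sketch, assuming Linux x86_64 and that later releases follow the same asset naming pattern as the v0.0.2 example URL; `sda_cli_url` and `fetch_sda_cli` are hypothetical helper names, and the version you pass should be taken from the releases page.

```shell
# Hypothetical helpers. The URL pattern is copied from the v0.0.2 example
# above and assumes the same naming scheme holds for the release you choose.

# Print the download URL of the Linux x86_64 archive for a given version.
sda_cli_url() {
  version="$1"
  echo "https://github.com/NBISweden/sda-cli/releases/download/v${version}/sda-cli_${version}_Linux_x86_64.tar.gz"
}

# Download and unpack the archive, then make sure the binary is executable.
fetch_sda_cli() {
  version="$1"
  wget "$(sda_cli_url "$version")" &&
    tar -xvzf "sda-cli_${version}_Linux_x86_64.tar.gz" &&
    chmod +x sda-cli
}
```

For example, `fetch_sda_cli 0.0.2` followed by `./sda-cli help` should download, unpack and run the same release as the commands above.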

Download the encryption key

  1. Download the crypt4gh public key

    For FEGA Sweden to be able to read the uploaded files, they need to be encrypted with the correct public key. This key can be downloaded from the EGA-SE-user-docs repository with the following command:

    wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub
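Before encrypting anything, it can be worth checking that the download produced a key file rather than, for example, an empty file or an HTML error page. The sketch below assumes the file uses the usual Crypt4GH public-key armour, whose first line is `-----BEGIN CRYPT4GH PUBLIC KEY-----`; `check_pubkey` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the file is non-empty and its first
# line is the Crypt4GH public-key header (assumed format, see above).
check_pubkey() {
  [ -s "$1" ] &&
    head -n 1 "$1" | grep -qx -- '-----BEGIN CRYPT4GH PUBLIC KEY-----'
}

# Usage:
# check_pubkey crypt4gh_key.pub && echo "key file looks valid"
```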
Bianca

Move files to bianca

  1. Copy sda-cli and crypt4gh_key.pub to the wharf

    Bianca has no direct connection to the internet, so it uses a system called the wharf to transfer files. You can upload files to the wharf from the outside using SFTP, or from UPPMAX by simply mounting the wharf drives. You can read more about this in the transit user guide.

    From a terminal window, run the following commands to copy the sda-cli binary and the public key to the wharf:

    Log in to the transit server by running:

    ssh [username]@transit.uppmax.uu.se

    Mount the wharf for your Bianca project with the following command:

    mount_wharf [project_name]

    where [project_name] is the name of your Bianca project, formatted like ‘sens20XXX’.

    Note

    Mounting the wharf follows the same procedure as logging in to Bianca, so you will need to provide your password along with your two-factor authentication code.

    Copy the sda-cli binary to the project drive in the wharf using the cp command:

    cp sda-cli ~/[project_name]/

    Copy the crypt4gh_key.pub file to the project drive in the wharf:

    cp crypt4gh_key.pub ~/[project_name]/
  2. Copy sda-cli and crypt4gh_key.pub from the wharf to Bianca

    You can now log out from transit and log in to Bianca to transfer the files out of the wharf and into your project.

    Log in to Bianca by running the following command:

    ssh -A [username]-[project_name]@bianca.uppmax.uu.se

    (Remember to use your password together with your two-factor authentication code.)

    Copy the files from /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name] to the folder where you want to store them, using the cp command:

    cp /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]/sda-cli .
    cp /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]/crypt4gh_key.pub .

    If you run the ls command in this folder, you should see the sda-cli and crypt4gh_key.pub files.
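The wharf path on Bianca is long and easy to mistype. A minimal sketch for building it once, assuming the `/proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]` layout given above; `wharf_dir` is a hypothetical helper name, and `myuser` and `sens2023000` are placeholder values to replace with your own.

```shell
# Hypothetical helper: print the wharf directory for a user and project,
# following the path layout described in this guide.
wharf_dir() {
  user="$1"
  project="$2"
  echo "/proj/${project}/nobackup/wharf/${user}/${user}-${project}"
}

# Usage on Bianca (placeholder username and project):
# WHARF="$(wharf_dir myuser sens2023000)"
# cp "$WHARF/sda-cli" "$WHARF/crypt4gh_key.pub" .
# chmod +x sda-cli    # in case the executable bit was lost in transfer
# ls -l sda-cli crypt4gh_key.pub
```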

Encrypt the files

  1. Encrypt the files

    Now that you have the public key and the tools you need, you can encrypt the files you want to submit. The tool creates an encryption key automatically.

    ./sda-cli encrypt -key crypt4gh_key.pub -f <file_1_to_encrypt> <file_2_to_encrypt> ...

    The tool will automatically create checksum files called:

    checksum_encrypted.md5
    checksum_encrypted.sha256
    checksum_unencrypted.md5
    checksum_unencrypted.sha256

    Make sure to save these files; you will need them during submission.
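One way to make sure the checksum files are kept is to copy them into a separate directory right after encryption. This is a minimal sketch; `save_checksums` and the destination directory name are hypothetical, while the four file names are the ones listed above.

```shell
# Hypothetical helper: copy the four checksum files written by
# `sda-cli encrypt` into a destination directory, failing if any is missing.
save_checksums() {
  dest="$1"
  mkdir -p "$dest" || return 1
  for f in checksum_encrypted.md5 checksum_encrypted.sha256 \
           checksum_unencrypted.md5 checksum_unencrypted.sha256; do
    if [ ! -f "$f" ]; then
      echo "missing $f" >&2
      return 1
    fi
    cp -- "$f" "$dest"/ || return 1
  done
}

# Usage, in the directory where you ran the encrypt command:
# save_checksums submission_checksums
```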

Bianca

Move files from bianca

  1. Move encrypted files to the wharf for upload to FEGA Sweden

    On Bianca, move the encrypted files to your wharf directory:

    mv *.c4gh /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]

    Log back in to UPPMAX transit and mount the wharf drive:

    ssh [username]@transit.uppmax.uu.se
    mount_wharf [project_name]

    (Remember to use your password together with your two-factor authentication code.)

    The encrypted files should be secure enough to keep outside Bianca while they are uploaded to the submission system. If you prefer to upload from a system other than UPPMAX, see the Bianca user guide for how to download the encrypted files from the wharf using SFTP.
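A slightly more defensive version of the `mv` command above first checks that the destination exists and that there is at least one `.c4gh` file to move, instead of passing an unmatched `*.c4gh` pattern to `mv`. This is a minimal POSIX sh sketch; `move_encrypted` is a hypothetical helper name.

```shell
# Hypothetical helper: move every *.c4gh file in the current directory to
# the given destination, refusing to run if there is nothing to move.
move_encrypted() {
  dest="$1"
  if [ ! -d "$dest" ]; then
    echo "no such directory: $dest" >&2
    return 1
  fi
  set -- ./*.c4gh
  # If nothing matched, the shell leaves the pattern unexpanded.
  if [ ! -e "$1" ]; then
    echo "no .c4gh files found" >&2
    return 1
  fi
  mv -- "$@" "$dest"/
}

# Usage on Bianca:
# move_encrypted /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]
```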

Submit the files

Once your files are encrypted, you are ready to start uploading them.

  1. Obtain the configuration file

    Uploading requires a configuration file (in s3cmd format) with the relevant settings. You can get the configuration file by logging in to our service with your Elixir ID.

    Note

    If you choose not to use the downloaded configuration file, we recommend setting the multipart chunk size significantly higher than the default of 5 MB. It can be set as high as 2 GB, but values above 100 MB will probably have little effect on throughput. If you upload with s3cmd, the commands require your [username], which is the value of secret_key in the downloaded configuration file. Make sure to take it from the configuration file and use it every time you issue an s3cmd command.

  2. Upload the files

    Files can be uploaded individually or as whole folders. To upload individual files, use:

    ./sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload> <encrypted_file_2_to_upload> ...

    The folder structure of the uploaded files will be preserved in the remote archive.

    It is often easier to upload a whole directory instead. This can be done with the -r flag:

    ./sda-cli upload -config <configuration_file> -r <folder_to_upload>

    More information on the capabilities of sda-cli can be found by running:

    ./sda-cli help
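The note in step 1 says that your [username] is the value of secret_key in the downloaded configuration file, and it also discusses the multipart chunk size. A minimal sketch for reading such values, assuming the file uses the s3cmd `key = value` layout; `cfg_get` is a hypothetical helper, and `multipart_chunk_size_mb` is the s3cmd name for the chunk size setting, which may be absent from your file.

```shell
# Hypothetical helper: print the first value of a key from an s3cmd-style
# configuration file (lines of the form "key = value").
cfg_get() {
  file="$1"
  key="$2"
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Usage (replace s3cmd.conf with the name of your downloaded file):
# cfg_get s3cmd.conf secret_key                 # your [username]
# cfg_get s3cmd.conf multipart_chunk_size_mb    # chunk size in MB, if set
```

An empty result means the key is not present in the file.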