Uploading data

Note

If you are using the Bianca Cluster at UPPMAX for analysing your data, there are some additional steps. These will be clearly marked in the guide.

This guide contains instructions on how to upload data to FEGA Sweden. We will take you through the process step by step. The guide comprises six major steps, of which two are only relevant for Bianca users:

  1. Install the sda-cli tool
  2. Download the encryption key
  3. Move files to Bianca (Bianca only)
  4. Encrypt the files
  5. Move files from Bianca (Bianca only)
  6. Submit your files

Install the sda-cli tool

Note

This guide expects you to perform the following steps on the system where you keep the data you intend to encrypt and submit. Some systems (like Bianca) do not allow internet access, though. In those cases, start in a part of the system that allows internet access (UPPMAX in the case of Bianca).

  1. Download the sda-cli executable that matches your system from the GitHub repository.

  2. Extract the binary by using the tar command:

    Linux:

    tar -xvzf sda-cli_.vX_Linux_x86_64.tar.gz

    macOS:

    tar -xvzf sda-cli_.vX_Darwin_x86_64.tar.gz

    Windows:

    tar -xvzf sda-cli_.vX_Windows_x86_64.zip

    The sda-cli executable should now be in the same directory as the downloaded file.
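
If you are unsure which archive matches your system, the choice can be sketched in shell. This is only an illustration: sda_cli_archive_suffix is a hypothetical helper, not part of sda-cli, and it covers only the x86_64 archive names shown in this guide.

```shell
# Hypothetical helper (not part of sda-cli): map the output of `uname -s`
# to the archive suffix used in the file names shown in this guide.
sda_cli_archive_suffix() {
    case "${1:-}" in
        Linux)                echo "Linux_x86_64.tar.gz" ;;
        Darwin)               echo "Darwin_x86_64.tar.gz" ;;
        MINGW*|MSYS*|CYGWIN*) echo "Windows_x86_64.zip" ;;
        *) echo "unsupported system: ${1:-}" >&2; return 1 ;;
    esac
}

# Print the suffix to look for on the current system, if it is recognised.
if suffix=$(sda_cli_archive_suffix "$(uname -s)" 2>/dev/null); then
    echo "Look for an archive ending in: $suffix"
fi
```

The MINGW, MSYS and CYGWIN patterns assume a Unix-like shell on Windows, such as Git Bash.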

Note

User documentation for sda-cli is available in the GitHub repository. This guide should however include the information needed to encrypt and upload data to FEGA Sweden.

Download the encryption key

For FEGA Sweden to be able to read the uploaded files, they need to be encrypted with the correct public key. This key can be downloaded with either of the following commands:

wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub

or, if wget is not available on your system:

curl -OL https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub
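
A failed download can leave an error page or an empty file under the expected name. The sketch below checks that the file at least starts with the header line used by Crypt4GH public keys. is_crypt4gh_public_key is a hypothetical helper, and the check only looks at the first line; it does not validate the key itself.

```shell
# Hypothetical helper: succeed if the first line of the file is the
# Crypt4GH public key header. This is a sanity check, not a validation.
is_crypt4gh_public_key() {
    [ -f "${1:-}" ] && head -n 1 "$1" | grep -q -- '^-----BEGIN CRYPT4GH PUBLIC KEY-----'
}

# Check the downloaded key if it is present in the current directory.
if [ -f crypt4gh_key.pub ]; then
    if is_crypt4gh_public_key crypt4gh_key.pub; then
        echo "crypt4gh_key.pub looks like a Crypt4GH public key"
    else
        echo "crypt4gh_key.pub does not look like a Crypt4GH public key" >&2
    fi
fi
```
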
Bianca

Move files to Bianca

  1. Copy sda-cli and crypt4gh_key.pub to the wharf

    Bianca uses a system called the wharf to transfer files without having a direct connection to the internet. You can upload files to this system from the outside using SFTP, or from UPPMAX by simply mounting the wharf drives. You can read more about this in the transit user guide.

    From a terminal window, run the following commands to upload your sda-cli binary:

    Log in to the transit area by running:

    ssh [username]@transit.uppmax.uu.se

    Mount the Bianca project folder by using the following command:

    mount_wharf [project_name]

    where [project_name] is your Bianca project, formatted like ‘sens20XXX’.

    Note

    Mounting the wharf follows the same procedure as logging in to Bianca, so you will need to provide your password along with your two-factor authentication code.

    Copy the sda-cli binary to the project drive in the wharf using the cp command:

    cp sda-cli ~/[project_name]/

    Copy the crypt4gh_key.pub file to the project drive in the wharf:

    cp crypt4gh_key.pub ~/[project_name]/
  2. Copy sda-cli and crypt4gh_key.pub from the wharf to Bianca

    You can now log out from transit and log in to Bianca to transfer the files out of the wharf and into your project.

    Log in to Bianca by running the following command:

    ssh -A [username]-[project_name]@bianca.uppmax.uu.se

    (Remember to use your two-factor authenticator along with your password)

    Copy the files from /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name] to the folder where you want to store them, using the cp command:

    cp /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]/sda-cli .
    cp /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]/crypt4gh_key.pub .

    Running the ls command in this folder, you should be able to see the sda-cli and crypt4gh_key.pub files.
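
The wharf path repeats the username and project name several times, which makes it easy to mistype. As a small sketch, the path can be built once and reused. wharf_dir is a hypothetical helper; it only assembles the path pattern given in this guide and does not check that the directory exists.

```shell
# Hypothetical helper: build the wharf directory path used in this guide.
# $1 = username, $2 = project name (formatted like sens20XXX)
wharf_dir() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "usage: wharf_dir <username> <project_name>" >&2
        return 1
    fi
    printf '/proj/%s/nobackup/wharf/%s/%s-%s\n' "$2" "$1" "$1" "$2"
}
```

On Bianca, the two copy commands could then be written as cp "$(wharf_dir [username] [project_name])/sda-cli" . and likewise for crypt4gh_key.pub.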

Encrypt the files

  1. Encrypt the files

    Now that you have the public key, and the tools you need, you can encrypt the submission files. An encryption key will be created automatically by the tool.

    Linux and macOS:

    ./sda-cli encrypt -key crypt4gh_key.pub <file_1_to_encrypt> <file_2_to_encrypt> ...

    Windows:

    sda-cli encrypt -key crypt4gh_key.pub <file_1_to_encrypt> <file_2_to_encrypt> ...

    The tool will automatically create checksum files called:

    checksum_encrypted.md5
    checksum_encrypted.sha256
    checksum_unencrypted.md5
    checksum_unencrypted.sha256

    Make sure to save these files, as you will need them during submission.
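
Before moving on, you may want to confirm that an encrypted file still matches what was recorded. The sketch below recomputes a file's SHA-256 digest and searches for it in a checksum list. It assumes that the checksum files contain hexadecimal digests as plain text and that the sha256sum command is available (it is not installed by default on every system). checksum_listed is a hypothetical helper, not part of sda-cli.

```shell
# Hypothetical helper: succeed if the SHA-256 digest of a file appears
# anywhere in a checksum list.
# $1 = file to hash, $2 = checksum list to search
checksum_listed() {
    cl_digest=$(sha256sum "$1" 2>/dev/null | cut -d ' ' -f 1)
    # An empty digest means hashing failed; do not let grep match everything.
    [ -n "$cl_digest" ] && grep -qF -- "$cl_digest" "$2"
}

# Check every encrypted file in the current directory, if a list exists.
if [ -f checksum_encrypted.sha256 ]; then
    for cl_file in *.c4gh; do
        [ -e "$cl_file" ] || continue
        checksum_listed "$cl_file" checksum_encrypted.sha256 ||
            echo "digest not found for: $cl_file" >&2
    done
fi
```

Searching for the digest rather than parsing the list keeps the sketch independent of the exact line layout of the checksum files.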

Bianca

Move files from Bianca

  1. Move encrypted files to the wharf for upload to FEGA Sweden

    On Bianca, move the encrypted files to your wharf directory:

    mv *.c4gh /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]

    Log back in to UPPMAX transit and mount the wharf drive:

    ssh [username]@transit.uppmax.uu.se
    mount_wharf [project_name]

    (remember to use password + two-factor authentication code).

    The encrypted files should be secure enough to keep outside of Bianca while they are uploaded to the submission system. If you prefer to make the upload from a different system than UPPMAX, see the Bianca user guide on how to download the encrypted files from the wharf using SFTP.
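
Since only encrypted files should leave Bianca, it can be worth checking the files before moving them to the wharf. Files in the Crypt4GH format start with the eight bytes "crypt4gh", so a minimal sketch is to inspect the start of each file. is_crypt4gh_file is a hypothetical helper, and the check only confirms the file header, not that the content can be decrypted.

```shell
# Hypothetical helper: succeed if the file starts with the Crypt4GH
# magic bytes "crypt4gh". NUL bytes are dropped to keep the shell quiet
# when a file starts with binary data.
is_crypt4gh_file() {
    [ "$(head -c 8 "$1" 2>/dev/null | tr -d '\000')" = "crypt4gh" ]
}

# Report any *.c4gh file in the current directory that fails the check.
for c4_file in *.c4gh; do
    [ -e "$c4_file" ] || continue
    is_crypt4gh_file "$c4_file" || echo "not a Crypt4GH file: $c4_file" >&2
done
```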

Submit your files

Once your files are encrypted, you are ready to start uploading them.

  1. Obtain the configuration file

    The sda-cli tool requires a configuration file with the relevant settings. You can get the configuration file by logging in to our service.

    Note

    The sda-cli tool builds on S3 technology for storing data. If you choose not to use the provided configuration file, we recommend setting the multipart chunk size significantly higher than the default 5 Mbyte. It can be set as high as 2 Gbytes, but values above 100 Mbyte will probably have little effect on throughput.

    The following section requires [username] when uploading files. The username refers to the value of the secret_key in the downloaded configuration file. Make sure to get it from the configuration file and use it every time the sda-cli command is issued.

  2. Upload the files

    Files can be uploaded with or without folders. Files can be uploaded individually using:

    Linux and macOS:

    ./sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload> <encrypted_file_2_to_upload> ...

    Windows:

    sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload> <encrypted_file_2_to_upload> ...

    The folder structure of the uploaded files will be preserved in the remote archive.

    It is often easier to upload a whole directory, though. This can be done with the -r flag:

    Linux and macOS:

    ./sda-cli upload -config <configuration_file> -r <folder_to_upload>

    Windows:

    sda-cli upload -config <configuration_file> -r <folder_to_upload>

    More information on the capabilities of sda-cli can be found using the tool’s built-in help:

    Linux and macOS:

    ./sda-cli help

    Windows:

    sda-cli help
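
Before starting an upload, a short pre-flight check can catch a missing configuration file or an unencrypted file on the command line. The sketch below is one way to do that. upload_preflight is a hypothetical helper, not an sda-cli feature, and it assumes the encrypted files carry the .c4gh extension used earlier in this guide.

```shell
# Hypothetical helper: check the arguments intended for `sda-cli upload`.
# $1 = configuration file, remaining arguments = encrypted files to upload
upload_preflight() {
    if [ "$#" -lt 1 ] || [ ! -r "$1" ]; then
        echo "cannot read configuration file: ${1:-<none>}" >&2
        return 1
    fi
    shift
    for up_file in "$@"; do
        case "$up_file" in
            *.c4gh)
                [ -f "$up_file" ] || { echo "missing file: $up_file" >&2; return 1; }
                ;;
            *)
                echo "not a .c4gh file: $up_file" >&2
                return 1
                ;;
        esac
    done
}
```

The helper can be chained in front of the upload command, so that the upload only starts when the check succeeds: upload_preflight <configuration_file> <encrypted_file_1_to_upload> && ./sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload>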