Uploading data
If you are using the Bianca Cluster at UPPMAX for analysing your data, there are some additional steps. These will be clearly marked in the guide.
This guide contains instructions on how to upload data to FEGA Sweden. We will take you through the process step by step. The guide comprises six major steps, two of which are only relevant for Bianca users:
- Install the sda-cli tool
- Download the encryption key
- Move files to Bianca (Bianca only)
- Encrypt the files
- Move files from Bianca (Bianca only)
- Submit your files
Install the sda-cli tool
This guide expects you to perform the following steps on the system where you keep the data you intend to encrypt and submit. Some systems (such as Bianca) do not allow internet access, so in those cases start in a part of the system that does allow internet access (UPPMAX in the case of Bianca).
Download the latest release of sda-cli from GitHub:
https://github.com/NBISweden/sda-cli/releases/latest
Choose the asset that matches your operating system (Linux, macOS, or Windows) and system architecture (arm64 or x86_64). For macOS, the release asset names use Darwin.
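If you are unsure which asset applies to your machine, the following sketch derives the operating system and architecture part of the asset name from `uname`. The function name `asset_suffix` is made up for illustration, and the mapping of `aarch64` to `arm64` is an assumption based on the architectures listed above; it applies to Linux and macOS only.

```shell
# Map `uname` output to the OS_arch part of a release asset name,
# e.g. Linux_x86_64 or Darwin_arm64.
# asset_suffix is a hypothetical helper, not part of sda-cli.
asset_suffix() {
  os="$1"
  arch="$2"
  case "$arch" in
    aarch64 | arm64) arch="arm64" ;;   # assumption: ARM assets are named arm64
    x86_64 | amd64) arch="x86_64" ;;
  esac
  printf '%s_%s\n' "$os" "$arch"
}

# `uname -s` prints Linux on Linux and Darwin on macOS.
asset_suffix "$(uname -s)" "$(uname -m)"
```

Look for the asset whose name contains the printed value.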
Linux
On the release page, download the .tar.gz file that matches your Linux system and architecture.
Open a terminal, go to the directory where the file was downloaded, and extract it. For example:
tar -xvzf sda-cli_vX.Y.Z_Linux_x86_64.tar.gz
The extracted file will be the sda-cli binary.
You may want to move the sda-cli binary to a directory in your PATH such as /usr/local/bin to make it easier to run.
macOS
On the release page, download the .tar.gz file that matches your macOS system and architecture.
Open a terminal, go to the directory where the file was downloaded, and extract it. For example:
tar -xvzf sda-cli_vX.Y.Z_Darwin_x86_64.tar.gz
The extracted file will be the sda-cli binary.
You may want to move the sda-cli binary to a directory in your PATH such as /usr/local/bin to make it easier to run.
Windows
On the release page, download the .zip file that matches your Windows system and architecture.
Open Command Prompt, go to the directory where the file was downloaded, and extract it. For example:
tar -xf sda-cli_vX.Y.Z_Windows_x86_64.zip
The extracted folder will contain the sda-cli.exe binary. Change into that folder before running the commands in the next step.
You can also extract the .zip file by right-clicking it in File Explorer and selecting Extract All….
User documentation for sda-cli is available in the GitHub repository. However, this guide should contain all the information you need to encrypt and upload data to FEGA Sweden.
Download the encryption key
For FEGA Sweden to be able to read the uploaded files, they need to be encrypted with the correct public key. This key can be downloaded with the following command:
Linux:
wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub
macOS:
curl -OL https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub
Windows:
curl.exe -OL https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub
Move files to Bianca
Copy sda-cli and crypt4gh_key.pub to the wharf
Bianca uses a system called the wharf to transfer files without having a direct connection to the internet. You can upload files to this system from the outside using SFTP, or from UPPMAX by simply mounting the wharf drives. You can read more about this in the transit user guide.
From a terminal window, run the following commands to upload your sda-cli binary and the encryption key:
Log in to the transit area by running:
ssh [username]@transit.uppmax.uu.se
Mount the Bianca project folder by using the following command:
mount_wharf [project_name]
where [project_name] is your Bianca project, formatted as 'sens20XXX'.
Note: Mounting the wharf follows the same procedure as logging in to Bianca, so you will need to provide your password along with your two-factor authentication code.
Copy the sda-cli binary to the project drive in the wharf using the cp command:
cp sda-cli ~/[project_name]/
Copy the crypt4gh_key.pub file to the project drive in the wharf:
cp crypt4gh_key.pub ~/[project_name]/
Copy sda-cli and crypt4gh_key.pub from the wharf to Bianca
You can now log out from transit and log in to Bianca to transfer the files out of the wharf and into your project.
Log in to Bianca by running the following command:
ssh -A [username]-[project_name]@bianca.uppmax.uu.se
(Remember to use your two-factor authenticator along with your password.)
Copy the files from /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name] to the folder where you want to store them using the cp command:
cp /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]/sda-cli .
cp /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]/crypt4gh_key.pub .
If you run the ls command in this folder, you should see the sda-cli and crypt4gh_key.pub files.
Encrypt the files
Now that you have the public key and the tool, you can encrypt the submission files. The tool creates an encryption key automatically.
Linux and macOS:
./sda-cli encrypt -key crypt4gh_key.pub <file_1_to_encrypt> <file_2_to_encrypt> ...
Windows:
sda-cli encrypt -key crypt4gh_key.pub <file_1_to_encrypt> <file_2_to_encrypt> ...
Note: Encryption time depends on the size and number of files. It can be quick, but it may also take several hours.
When encrypting files, you can specify an output directory where the encrypted files will be saved. This helps keep both the original and encrypted files organized and avoids the need to move them manually afterward.
The tool will automatically create checksum files called:
- checksum_encrypted.md5
- checksum_encrypted.sha256
- checksum_unencrypted.md5
- checksum_unencrypted.sha256
We recommend that you keep these files with your project documentation, as they can be useful for verifying data integrity later.
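As an illustration of how a recorded checksum can be used later, the sketch below compares a file's SHA-256 digest with an expected value. The helper name `verify_sha256` is hypothetical, and the exact layout of the checksum files is not described in this guide, so inspect them and copy the digest for the file you want to check. The sketch assumes `sha256sum` from GNU coreutils; on macOS, `shasum -a 256` is the usual equivalent.

```shell
# Compare a file's SHA-256 digest with an expected hex digest.
# verify_sha256 is a hypothetical helper, not part of sda-cli.
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file name>"; keep only the digest.
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Hypothetical usage, with the digest copied from checksum_encrypted.sha256:
# verify_sha256 <encrypted_file> <digest_from_checksum_file>
```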
Move files from Bianca
Move encrypted files to the wharf for upload to FEGA Sweden
On Bianca, move the encrypted files to your wharf directory:
mv *.c4gh /proj/[project_name]/nobackup/wharf/[username]/[username]-[project_name]
Log back in to UPPMAX transit and mount the wharf drive:
ssh [username]@transit.uppmax.uu.se
mount_wharf [project_name]
(Remember to use your password and two-factor authentication code.)
The encrypted files should be secure enough to keep outside of Bianca while they are uploaded to the submission system. If you prefer to make the upload from a different system than UPPMAX, see the Bianca user guide on how to download the encrypted files from the wharf using SFTP.
Submit your files
Once your files are encrypted, you are ready to start uploading them.
Obtain the configuration file
The sda-cli tool requires a configuration file with the relevant settings. You can get the configuration file by logging in to our service.
Note: The sda-cli tool builds on S3 technology for storing data. If you choose not to use the provided configuration file, we recommend setting the multipart chunk size significantly higher than the default of 5 MB. It can be set as high as 2 GB, but values above 100 MB will probably have little effect on throughput.
The following section requires [username] when uploading files. The username refers to the value of secret_key in the downloaded configuration file. Take it from the configuration file and use it every time you run the sda-cli command.
Upload the files
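To see why the chunk size mentioned in the note above matters, the following sketch counts how many parts a multipart upload needs for a given file size. The helper name `parts_needed` is hypothetical. For comparison, AWS S3 allows at most 10,000 parts per upload; whether the FEGA Sweden storage enforces the same limit is an assumption and is not stated in this guide.

```shell
# Number of parts needed to upload size_mb megabytes in chunks of chunk_mb
# megabytes (integer ceiling division).
# parts_needed is a hypothetical helper, not part of sda-cli.
parts_needed() {
  size_mb="$1"
  chunk_mb="$2"
  echo $(( (size_mb + chunk_mb - 1) / chunk_mb ))
}

parts_needed 102400 5     # 100 GB file with 5 MB chunks, prints 20480
parts_needed 102400 100   # same file with 100 MB chunks, prints 1024
```

With the default chunk size, a large file is split into many thousands of parts, while a larger chunk size keeps the count small.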
Files can be uploaded with or without folders. Files can be uploaded individually using:
Linux and macOS:
./sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload> <encrypted_file_2_to_upload> ...
Windows:
sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload> <encrypted_file_2_to_upload> ...
The folder structure of the uploaded files will be preserved in the remote archive.
It is often easier to upload a whole directory. This can be done with the -r flag:
Linux and macOS:
./sda-cli upload -config <configuration_file> -r <folder_to_upload>
Windows:
sda-cli upload -config <configuration_file> -r <folder_to_upload>
More information on the capabilities of sda-cli can be found using the tool's built-in help:
Linux and macOS:
./sda-cli help
Windows:
sda-cli help
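Before uploading a whole folder with the -r flag, it can be worth checking that the folder holds only encrypted files. The sketch below lists every file that does not end in .c4gh. The helper name `list_unencrypted` is hypothetical, and the check assumes that encrypted files carry the .c4gh suffix, as in the Bianca step earlier in this guide.

```shell
# Print every regular file under a folder whose name does not end in .c4gh.
# list_unencrypted is a hypothetical helper, not part of sda-cli.
list_unencrypted() {
  find "$1" -type f ! -name '*.c4gh'
}

# Hypothetical usage before running the upload with -r:
# list_unencrypted <folder_to_upload>
```

If the command prints nothing, every file in the folder has the .c4gh suffix.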
After completing this step, verify that your files have been uploaded by logging in to the FEGA Sweden Submitter Portal.
In the upper-right corner of the page, click on the menu icon (three horizontal lines) and select Files. This will take you to your Inbox, where you can see all the files that have been uploaded. Note that it may take some time before the files show up in the portal.
To complete your submission, you need to associate the uploaded files with the appropriate sections of the metadata in the Submitter Portal.