Upload Agent: Advanced

Introduction

The DNAnexus Upload Agent is a fast and convenient command-line client that can be used to upload files to DNAnexus. For uploading multiple or large files, Upload Agent is particularly recommended due to its ability to resume previously interrupted uploads.

Installing Upload Agent: Follow the instructions on the Upload Agent download page to download the Upload Agent executable.

Note: The rest of this document uses ua to refer to the Upload Agent executable. Either add the executable to your system's PATH, or replace ua with the path to where you saved the executable on your local file system.

This page is an advanced guide to Upload Agent. For a general overview, see the Basic Guide to Upload Agent.

Advanced Usage

Uploading data from stdin

Upload Agent can upload data from stdin directly into a file. Note that when this option is used, only one file can be created. This option is useful for piping the output of a program straight into a file on the platform.

$ my_application | ua --read-from-stdin my_file.txt
File "my_file.txt" was uploaded successfully. Closing...
file-xxxx

The following command reads data interactively from the terminal until the stream is terminated with <CTRL>+D, which signals the end of the file (EOF).

$ ./ua --read-from-stdin my_file.txt
> hello
> world
> <CTRL>+D # EOF

Redirecting uploaded files

Redirecting to a folder

You can change the final path of the file in the project with the --folder and --name flags. The following command uploads my_file_1.txt into the folder called oldData and behaves as if the file had been called file_1 (the new file name is file_1.gz, after compression). If you don't want your files to be automatically compressed, see the next section, Turning off automatic compression.

$ ./ua --folder "/oldData" --name "file_1" my_file_1.txt
Uploading file my_file_1.txt to file object file-xxxx
File "my_file_1.txt" was uploaded successfully. Closing...
file-xxxx

Turning off automatic compression

By default, Upload Agent compresses any file that is not already compressed before uploading it and appends .gz to the end of the file's name. You can override this behavior with the --do-not-compress flag.

$ ./ua --do-not-compress uncompressed.txt
Uploading file uncompressed.txt to file object file-xxxx
File "uncompressed.txt" was uploaded successfully. Closing...
file-xxxx

Do not resume previous uploads

By default, Upload Agent attempts to resume all the uploads it can. If you would like to upload the same file twice, you can override this behavior with the --do-not-resume flag.

$ ./ua --do-not-resume dont_resume.txt
Uploading file dont_resume.txt to file object file-xxxx
File "dont_resume.txt" was uploaded successfully. Closing...
file-xxxx

If Upload Agent fails to upload a file, or uploads it only partially, you can resume the upload by running the same command again. When resuming an upload, Upload Agent generates a file signature using the following information:

  • size
  • modifiedTimestamp
  • toCompress (boolean reflecting whether the file was originally uploaded with --do-not-compress)
  • chunkSize
  • the canonical path to the file

This information is summarized as a metadata field on the file object. For more information about metadata on DNAnexus, please see the Manage Metadata page. When you upload a file, Upload Agent calculates this file signature and searches your current project for any file with the same signature. If it finds such an object and the file upload is incomplete, it tries to resume the upload. If the file upload is complete, then the file signature is added as a property.

Waiting for a file to close

When scripting, you can make the ua command wait until the uploaded files are in the closed state before proceeding to the next command by using the --wait-on-close flag. You do not have to wait for a file to be closed before giving it as input to an app or applet, as the platform automatically waits for the file to be closed before starting the job. However, if you would like to copy a file between projects, you must wait for it to be in the closed state.

$ ./ua --wait-on-close close_me.txt
Uploading file close_me.txt to file object file-xxxx
File "close_me.txt" was uploaded successfully. Closing...
file-xxxx
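
In a script, the wait-then-copy pattern described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name upload_and_copy and the destination project-yyyy:/ are hypothetical, and it assumes that ua writes only the file ID to standard output and that dx cp accepts a file ID as its source.

```shell
# Upload one local file, wait for it to close, then copy it to another project.
# Arguments: $1 = local file, $2 = destination (hypothetical, e.g. "project-yyyy:/")
upload_and_copy() {
    local_file=$1
    dest=$2

    # Assumption: with --wait-on-close, ua returns only after the file is closed
    # and prints the new file ID (or "Failed") on standard output.
    file_id=$(ua --wait-on-close "$local_file") || {
        echo "upload of $local_file failed" >&2
        return 1
    }

    # Assumption: dx cp accepts a file ID as the source of the copy.
    dx cp "$file_id" "$dest"
}
```

For example, upload_and_copy close_me.txt "project-yyyy:/" would start the copy only after the upload has finished and the file is closed.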

Show upload progress

You can turn on progress reporting (printed to stderr) with the --progress flag.

$ ./ua --progress large_file.txt
Uploading file large_file.txt to file object file-xxxx
large_file.txt 75.12% complete ... Average transfer speed = 3.58 MB/sec ... Instantaneous transfer speed = 3.77 MB/sec

Upload with Metadata

Files on the DNAnexus platform can have associated metadata. For more information about metadata on DNAnexus, please see the Manage Metadata page.

Details

Assigning file details

Upload Agent can set details for a file using the --details flag. The details must be passed as a valid JSON string. For more information about JSON, please see the Wikipedia page on JSON (external link).

$ ./ua myfile.txt --details '{"Field 1": [1,2,3], "Field 2": "content"}'
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...
file-xxxx

Assigning details to multiple files

The following command sets the same details on all the files being uploaded.

$ ./ua first_file.txt second_file.txt --details '{"Field 1": [1,2,3], "Field 2": "content"}'
Uploading file first_file.txt to file object file-xxxx
Uploading file second_file.txt to file object file-yyyy
File "first_file.txt" was uploaded successfully. Closing...
File "second_file.txt" was uploaded successfully. Closing...
file-xxxx
file-yyyy

Assigning different details to multiple files

$ ./ua first_file.txt second_file.txt --details '{"Field 1": [1,2,3], "Field 2": "content"}' --details '{"Field 3": ["A","B","C"], "Field 4": "content"}'
Uploading file first_file.txt to file object file-xxxx
Uploading file second_file.txt to file object file-yyyy
File "first_file.txt" was uploaded successfully. Closing...
File "second_file.txt" was uploaded successfully. Closing...
file-xxxx
file-yyyy

Note: Provide the --details flag once for each file being uploaded.

Properties

Upload Agent can assign properties to a file during upload using the --property flag. For more information about properties, see the Manage Metadata page.

Assigning a property to a single file

$ ./ua myfile.txt --property key=value
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...
file-xxxx

Assigning multiple properties to a single file

$ ./ua myfile.txt --property property1=my_property --property property2="another property"
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...
file-xxxx
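
When the set of properties is only known at run time, the repeated --property flag can be assembled in a script. The helper below is a sketch: upload_with_properties is a hypothetical name, and the only Upload Agent behavior it relies on is the --property key=value form shown above.

```shell
# Upload a file with any number of key=value properties.
# Usage: upload_with_properties FILE [key=value ...]
upload_with_properties() {
    file=$1
    shift

    # Rewrite the positional parameters from "k1=v1 k2=v2" into
    # "--property k1=v1 --property k2=v2". Quoting keeps values with spaces intact.
    for kv in "$@"; do
        set -- "$@" --property "$kv"
        shift
    done

    ua "$file" "$@"
}
```

For example, upload_with_properties myfile.txt property1=my_property "property2=another property" runs the same command as the example above.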

Advanced Usage

Changing the number of threads

You can specify a different number of threads for compression and a different number of outgoing HTTPS connections that will be opened to upload the file chunks by using the flags --compress-threads and --upload-threads, respectively. The number of threads used to read the input files can be changed by the --read-threads flag.

For example, if you are uploading some files from an eight-core machine, we recommend that you limit the usage to 75% of the machine's capabilities as a safety measure and divide the usage evenly among the three options. That leaves six cores, so the number of threads for reading the input data (--read-threads), compressing (--compress-threads) and uploading (--upload-threads) the files would be two each. The command would look something like this:

$ ./ua --compress-threads 2 --upload-threads 2 --read-threads 2 file.txt
Uploading file file.txt to file object file-xxxx
File "file.txt" was uploaded successfully. Closing...
file-xxxx
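
The 75% rule of thumb above can be computed in a script instead of by hand. The helper below is a sketch: threads_per_stage is a hypothetical name, and how you obtain the core count (for example, nproc on Linux) is an assumption about your system.

```shell
# Print the thread count to use for each of the three stages (read, compress,
# upload) on a machine with the given number of cores: 75% of the cores,
# divided evenly by three, with a floor of one thread per stage.
threads_per_stage() {
    cores=$1
    n=$(( cores * 3 / 4 / 3 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Example (not run here; nproc is an assumption about the local system):
#   n=$(threads_per_stage "$(nproc)")
#   ua --compress-threads "$n" --upload-threads "$n" --read-threads "$n" file.txt
```

For eight cores the helper prints 2, matching the command above.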

Using a different chunk size

You can change the size of the chunks uploaded by each thread using the --chunk-size flag. By default, the chunk size is ~95MB (100,000,000 bytes). There is a maximum limit of 10,000 chunks.

This parameter is dependent on the memory available on the machine. We highly recommend that you keep the default value. However, if your network connection is particularly slow, we suggest that you use a smaller chunk size. The following command splits up large-file.txt into chunks of size 200MB (209,715,200 bytes) each to be uploaded.

$ ./ua --chunk-size 209715200 large-file.txt
Uploading file large-file.txt to file object file-xxxx
File "large-file.txt" was uploaded successfully. Closing...
file-xxxx
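
Because the number of chunks is capped at 10,000, the chunk size also limits how large a file can be: at the default of 100,000,000 bytes, 10,000 chunks cover 1,000,000,000,000 bytes. The sketch below does this arithmetic. The helper names are hypothetical, and it assumes that the limit applies per file and that the shell has 64-bit integer arithmetic.

```shell
# Maximum number of chunks, as stated in this guide.
MAX_CHUNKS=10000

# Convert a size in MB (units of 1,048,576 bytes, as used in this guide) to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print the smallest chunk size, in bytes, that keeps a file of the given size
# within MAX_CHUNKS chunks (ceiling division).
min_chunk_size() {
    size=$1
    echo $(( (size + MAX_CHUNKS - 1) / MAX_CHUNKS ))
}

# Example (not run here):
#   ua --chunk-size "$(mb_to_bytes 200)" large-file.txt
```

Here mb_to_bytes 200 prints 209715200, the value passed to --chunk-size above, and min_chunk_size 2000000000000 prints 200000000, meaning that a 2,000,000,000,000-byte file would need chunks of at least 200,000,000 bytes.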

Setting files as hidden

By default, Upload Agent sets all files as visible. You can override this behavior with the --visibility flag.

$ ./ua myfile.txt --visibility hidden
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...
file-xxxx
$ dx ls
$ dx ls -a # the -a flag shows hidden files
myfile.txt

Specification

Output

Upon successful completion, the file IDs of the newly created remote files are printed to standard output (each on a new line). If a particular file upload was unsuccessful, then the string "Failed" is printed instead of the file ID. The lines are printed in the same order as the files specified on the command line for upload.
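
Since the output lines follow the order of the files on the command line, a script can pair each line with its file name. The following is a minimal sketch: report_uploads is a hypothetical helper, it assumes that standard output contains only the ID lines described above, and it treats a missing line as a failure.

```shell
# Read ua's standard output on stdin and pair each line with the file names
# given as arguments, in order. Prints "OK name id" or "FAILED name" per file
# and returns non-zero if any upload failed.
report_uploads() {
    failed=0
    for name in "$@"; do
        # A missing line (stdin exhausted) is treated as a failed upload.
        IFS= read -r id || [ -n "$id" ] || id="Failed"
        if [ "$id" = "Failed" ]; then
            echo "FAILED $name"
            failed=$(( failed + 1 ))
        else
            echo "OK $name $id"
        fi
    done
    [ "$failed" -eq 0 ]
}

# Example (not run here):
#   ua first_file.txt second_file.txt | report_uploads first_file.txt second_file.txt
```

Note that in a pipeline like the one in the comment, the exit status seen by the script is that of report_uploads, not of ua.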

Errors

If an error occurs, Upload Agent does not exit immediately. Instead, all other files are still uploaded and the program exits with a non-zero status code, printing "Failed" instead of the file ID of the failed upload(s). See the Output section above for more information.

Non-zero error code

The program exits with a non-zero error code if any of the following errors occur:

  • A valid authentication token was not provided.
  • A connection to the API server could not be made.
  • A file to be uploaded does not exist or is not accessible.
  • The user tries to upload the same file to a project more than once and --do-not-resume is not set.
  • An unknown command line option or illegal value for an option is provided.
  • The project is not specified; the specified project does not exist; or the authentication token provided does not permit CONTRIBUTE access to the specified project.
  • The project specifier cannot be unambiguously resolved, e.g. if two or more projects match the given project name.
  • A folder or file object could not be created.
  • A file could not be closed (i.e., the /file-xxxx/close API call failed).
  • An error occurs while compressing a chunk, e.g., the machine ran out of memory.

File not fully uploaded

A file may not be fully uploaded if any of the following errors occur:

  • There is an ambiguity in resolving a resume target of a local file and --do-not-resume is not set, e.g. if a local file has been uploaded more than once to a project (partially or fully) and it cannot be unambiguously determined which remote file upload should be resumed.
  • A chunk could not be uploaded in the specified number of tries.
  • A file could not be closed because one of the chunks was compressed below the 5MB limit. In this case, you should try uploading the failed file with either the --do-not-compress option, or by setting a larger --chunk-size.

Last edited by Samantha Zarate, 2017-11-09 21:18:03
