Upload Agent

Introduction

The DNAnexus Upload Agent is a command-line client that can be used to upload files to DNAnexus. It is a fast and convenient alternative to using the web interface for creating and uploading files. Here are some of the advantages of using the DNAnexus Upload Agent:

  • Data is uploaded using multiple parallel connections, leading to faster upload speeds.
  • Text files (such as FASTA, FASTQ, qseq and others) are automatically compressed before upload, for better efficiency.
  • Multiple files can be created and uploaded in a single execution.
  • Previously interrupted uploads can be automatically resumed.
  • No user interaction is required; the client can be used to automate uploads of sequencing runs.

Quickstart

  1. Download and extract the upload agent. Currently, the DNAnexus Upload Agent is available for 64-bit Linux, Mac OS X, and Windows architectures. For the rest of this document, we will use ua to represent the executable's name, but you should replace it with the name of the executable you've extracted.

  2. Get an API token:

    1. Log in to the DNAnexus website.
    2. Click on your name in the top-right corner to get a drop-down menu, and click on "Profile".
    3. Click on the "API Tokens" tab and then on the green button labeled "+ New Token".
    4. Fill out the dialog box with the desired settings, and then click on "Generate Token".
    5. You should now see a long alphanumeric string; copy it for use with the Upload Agent.
      Note: You may notice that the Upload Agent will "just work" if you have already logged in via the command-line client dx. However, you may still want to use the method described above to acquire an API token, as your login token has expiration and inactivity timeout restrictions which can disrupt long-running or unattended uploads. You can set the expiration date of your API token in the New Token dialog.
  3. Try some of the examples below to get started.

General usage

Synopsis

./ua [options] <file> [...]

Options

Option Description
General
[--help | -h] Prints a help message.
[--env | -e] Prints environment information, e.g., Apiserver Host, Port, Protocol, etc.
File parameter
file Path of the local file or directory to be uploaded. This should be specified once for each file to be uploaded. If file is a directory, all files in the directory will be uploaded. For recursive uploads see the --recursive flag.
Connection parameters
[--auth-token | -a] auth-token Specifies the authentication token to be used while contacting the server. If not present, information is obtained from the DX_SECURITY_CONTEXT environment variable.
File creation parameters
[--project | -p] project-spec Specifies the project in which a file should be created. project-spec may be the name or ID of a project. If this option is not specified, the value of the environment variable DX_WORKSPACE_ID (if present) is used when the program is run from within a job; otherwise, the value of the environment variable DX_PROJECT_CONTEXT_ID (if present) is used.

If specified once, all files will be uploaded into the same project. If specified more than once, the number of projects specified must be equal to the number of files, and the ith file will be created in the project identified by the ith occurrence of this argument.
[--folder | -f] folder-path Specifies the complete path of the folder in which the file should be created. folder-path may be any valid folder path. If not present, the root folder (the folder named “/”) will be used. If specified once, all files will be uploaded into the same folder. If specified more than once, the number of folders specified must be equal to the number of files, and the ith file will be uploaded into the folder named by the ith occurrence. Any folder(s) that do not exist will be created, including any parent folders.
[--name | -n] remote-file-name Specifies the name of the remote file that will be created. remote-file-name may be any valid DNAnexus file name. If not present, the name of each file object created will be the filename of the local file being uploaded. If present, the number of occurrences of this parameter must be equal to the number of files being uploaded, and the ith file will be created with the name given by the ith occurrence of this parameter.
Note: If the file is compressed while uploading (the default behavior for uncompressed files), the extension .gz will be appended to remote-file-name (or to the local filename, if --name was not used).
[--visibility] hidden Sets the visibility of the file. Files are visible by default. If multiple files are specified, the same visibility is applied to all the files.
[--property] property-name=property-value Sets a property for this file. It can be used multiple times to set as many properties as needed. If multiple files are specified, the same properties are applied to all the files.
[--type] file-type Sets the type of data for the file. It can be used multiple times to set as many types as needed. If multiple files are specified, the same types are applied to all the files.
[--tag] tag Sets a tag on the file. It can be used multiple times to set as many tags as needed. If multiple files are specified, the same tags are applied to all the files.
[--details] JSON-data Sets the details for the file. JSON-data must be a valid JSON-formatted string containing the details. If multiple files are specified, the same details are applied to all the files.
[--recursive] When this option is specified the upload agent will upload all the contents of the directory recursively.
[--read-from-stdin | -i] This option can be used to read data from stdin and upload it directly into a file.
Configuration parameters
[--compress-threads | -c] num-threads Specifies the number of parallel compress threads that should be used. num-threads must be a positive integer. The default is 7.
[--upload-threads | -u] num-threads Specifies the number of parallel upload threads that should be used. num-threads must be a positive integer. The default is 8 (as reported by the --help output at the end of this page).
--read-threads num-threads Specifies the number of parallel disk read threads. num-threads must be a positive integer. The default is 2 (as reported by the --help output).
[--chunk-size | -s] chunk-bytes Specifies the size of the chunks into which files are divided for upload. chunk-bytes must be a positive integer number of bytes, optionally followed by a unit (B, K, M, G); e.g., 50M sets the chunk size to 50 megabytes. The default is 75M (as reported by the --help output).
[--progress | -g] If set, the progress of individual file uploads and the data transfer rates are printed periodically on stderr. By default, progress is not reported.
[--tries | -r] num-tries Specifies the maximum number of times the program should try to upload each chunk. If a chunk cannot be uploaded successfully in num-tries tries, the rest of the file is still uploaded, but the file is not closed. The upload (i.e., the missing chunk) can then be resumed in a later run. The default is 3.
--do-not-compress If set, file(s) are not compressed before uploading. By default, every uncompressed file is compressed with gzip before being uploaded to DNAnexus.
--do-not-resume If set, the upload agent will not try to resume any file upload. By default, if the upload agent finds the same file (partially or fully uploaded), the upload is always resumed. Note: To resume a file upload, the same values for --project, --chunk-size, and --do-not-compress should be used in the next call.
--wait-on-close If set, the upload agent will not exit until all files have been successfully closed (i.e., the close operation has been performed on each file object, and each file has moved from the closing state to the closed state). This is useful to guarantee that the files are available for further operations (e.g., invoking an analysis). By default, the upload agent does not wait for files to be closed before exiting (i.e., it may exit while some files are still in the closing state).
[--verbose | -v] If set, prints a detailed log of operations being performed. Intended for developer use. By default, verbose mode is off.
[--test] This can be used to test the current configuration of the Upload Agent. It prints information regarding the current version, environment info, proxy settings, operating system, certificates and internet connectivity.

Specification

Output

Upon successful completion, the file IDs of the newly created remote files are printed to standard output (each on a new line). If a particular file upload was unsuccessful, the string failed is printed instead of the file ID. The lines are printed in the same order as the files specified on the command line.
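Because the output lines follow the order of the files on the command line, a script can pair each line with its local file. The following is a minimal sketch under that assumption; the function name pair_results is hypothetical and not part of the Upload Agent.

```shell
# Pair each line of UA stdout (read from stdin) with the local file names
# given as arguments, in order. Prints "<local file><TAB><file ID or failed>".
# Returns non-zero if any line is not a file ID (e.g. the string "failed").
pair_results() {
    rc=0
    for local_file in "$@"; do
        # Read one output line; a missing line is reported as "missing".
        IFS= read -r result || [ -n "$result" ] || result="missing"
        printf '%s\t%s\n' "$local_file" "$result"
        case "$result" in
            file-*) ;;
            *) rc=1 ;;
        esac
    done
    return "$rc"
}

# Example usage (file and project names are placeholders):
#   ./ua --project my-project a.txt b.txt | pair_results a.txt b.txt
```

The file names must be passed to the function in the same order as they were given to the Upload Agent.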

Errors

The program exits with a non-zero error code if any of the following errors occur:

  • A valid authentication token was not provided.
  • A connection to the API server could not be made.
  • A file to be uploaded does not exist, or is not accessible.
  • More than 1000 files are specified for upload.
  • The same file is uploaded to the same project more than once while --do-not-resume is not set.
  • An unknown command-line option or an illegal value for an option is provided.
  • The project is not specified, the specified project does not exist, or the authentication token provided does not permit CONTRIBUTE access to the specified project.
  • The project specifier cannot be unambiguously resolved, e.g., if two or more projects match the given project name.
  • The folder could not be created, or a file object could not be created.
  • A file could not be closed (i.e., the /file-xxxx/close API call failed).
  • An error occurs while compressing a chunk, e.g., out of memory, etc.
  • Internal error.

A file may not be fully uploaded if any of the following errors occur:

  • If there is an ambiguity in resolving a resume target of a local file (and --do-not-resume is not set), e.g., if a local file has been uploaded more than once to a project (partially or fully), we cannot unambiguously determine which remote file upload should be resumed.
  • A chunk could not be uploaded in the specified number of tries.
  • A file could not be closed, because one of the chunks was compressed below the 5MB limit. In this case the user should try uploading the failed file with either --do-not-compress option, or by setting a larger --chunk-size.

Note: When one of the errors above occurs, Upload Agent does not exit immediately. Instead, the remaining file(s) are still uploaded, and the program then exits with a non-zero status code, printing failed instead of the file ID for each failed upload (see Output).
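Since a file whose chunk could not be uploaded is left unclosed and can be resumed, one way to automate recovery is to rerun the same command until it exits with status zero. The sketch below is only an illustration: the retry function and the RETRY_DELAY variable are hypothetical names, and it assumes that rerunning with identical --project, --chunk-size, and --do-not-compress values resumes the upload as described in the options above.

```shell
# Run a command up to MAX_ATTEMPTS times, stopping at the first success.
# Usage: retry MAX_ATTEMPTS command [args...]
retry() {
    max_attempts=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Hypothetical pause between attempts, in seconds (default 30).
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example usage (file and project names are placeholders):
#   retry 5 ./ua --project my-project large-file.txt
```

Errors that do not go away on their own, such as an invalid authentication token, will simply fail on every attempt, so the attempt limit should be kept small.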

Examples

In the examples below, we assume that you have set your environment variables appropriately; specifically, that the authentication token and the current workspace (project) are set. Remember that you can always override these environment variables with the --auth-token and --project options on the command line (see the Options section above for details).

To see the current environment variables being used by UA, run: $ ./ua --env
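As a sketch of how these variables might be set by hand, the helper below builds a value for DX_SECURITY_CONTEXT from an API token. It rests on an assumption not stated on this page: that the variable holds a JSON object with the keys auth_token_type and auth_token. The function name security_context, the MY_API_TOKEN variable, and the project ID are placeholders.

```shell
# Build the JSON value for DX_SECURITY_CONTEXT from an API token.
# Assumption: the expected layout is
#   {"auth_token_type": "Bearer", "auth_token": "<token>"}
security_context() {
    printf '{"auth_token_type": "Bearer", "auth_token": "%s"}' "$1"
}

# Example usage (token and project ID are placeholders):
#   export DX_SECURITY_CONTEXT="$(security_context "$MY_API_TOKEN")"
#   export DX_PROJECT_CONTEXT_ID="project-xxxx"
#   ./ua --env
```

Running ./ua --env afterwards is a quick way to confirm that the values were picked up.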

Upload a single file

Let's upload a local file named "my-file.txt" to the project called "my-project".

$ ./ua --auth-token PgBLmyBweT2lTNr1zLk76Zd08rx5146m --project my-project my-file.txt
Output (stdout):
file-B1qXbP800007Z7f6k54Q0001
As seen in the output above, only the file ID is printed on stdout. More information is printed on stderr. For example, in the case above, the stderr output would be:
Uploading file my-file.txt to file object file-B1qXbP800007Z7f6k54Q0001
File "my-file.txt" was uploaded successfully. Closing...

Notice that if you now log in to the website or use dx to look at the list of files in your project, you will find a file with the name my-file.txt.gz. Your file was automatically compressed using gzip and renamed to have the .gz extension.

Upload multiple files to the same project

Let's upload two local files "my-file-1.txt" and "my-file-2.txt" to the project "my-project".

$ ./ua --project my-project my-file-1.txt my-file-2.txt
Output (stdout):
file-B1qXj800000K15z4jx4Q0003
file-B1qXj8000008jV0Xjb4Q0001
The file IDs above are in the same order as the list of files on the command line, i.e., the first and second lines are the new file IDs from uploading "my-file-1.txt" and "my-file-2.txt", respectively.

As usual, more information is printed on stderr. Output (stderr):

Uploading file my-file-1.txt to file object file-B1qXj800000K15z4jx4Q0003
Uploading file my-file-2.txt to file object file-B1qXj8000008jV0Xjb4Q0001
File "my-file-1.txt" was uploaded successfully. Closing...
File "my-file-2.txt" was uploaded successfully. Closing...

Upload a file to a particular folder and under a different name

You can change the final path that the file will have in the specified project via the flags --folder and --name. The following command uploads my-file-1.txt into the folder called oldData and behaves as if the file had been called File-1 (the new name is File-1.gz, after compression). If you don't want your files to be automatically compressed, see the next example.

$ ./ua --folder "/oldData" --name "File-1" my-file-1.txt

Output (stdout):

file-B1qY2xQ0000K15z4jx4Q0005

Turn off automatic compression

By default, Upload Agent compresses every file that is not already compressed before uploading it (and appends .gz to the file name). You can override this behavior with the --do-not-compress flag.

$ ./ua --do-not-compress uncompressed.txt

Do not attempt resuming a previous upload

By default, Upload Agent attempts to resume all the uploads it can. You can override this behavior with the --do-not-resume flag.

$ ./ua --do-not-resume dont_resume.txt

Wait for a file to reach the closed state before exiting

If you want to call Upload Agent in a script and make it wait until the file is in the closed state, you can do so by adding the --wait-on-close flag. You do not have to wait for a file to be closed before giving it as input to an app or applet (the platform will automatically wait for the file to be closed before starting the job), but if you would like to copy the file between projects, you must wait for it to be in the closed state.

$ ./ua --wait-on-close close_me.txt

Show upload progress

You can turn on progress reporting (printed on stderr) with the --progress flag.

$ ./ua --progress large-file.txt

Output (stderr):

Uploading file large-file.txt to file object file-B1qYK3Q00009jPJ1kk4Q0001
large-file.txt 75.12% complete ... Average transfer speed =   3.58 MB/sec ... Instantaneous transfer speed =   3.77 MB/sec

Change the number of threads

You can specify a different number of threads for compression and a different number of outgoing HTTPS connections that will be opened to upload the file chunks by using the flags --compress-threads and --upload-threads, respectively. The following command uses two parallel threads to compress the data, and ten parallel http/https connections will be opened to upload the file chunks.

$ ./ua --compress-threads 2 --upload-threads 10 file.txt 

Use a different chunk size

You can change the size of the chunks uploaded by each thread with the --chunk-size flag. The following command splits large-file.txt into chunks of 200MB (209,715,200 bytes) each. The default chunk size is 75M according to the --help output at the end of this page. It is highly recommended that you keep the default value.

$ ./ua --chunk-size 209715200 large-file.txt
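The byte count in the command above is 200 × 1024 × 1024. The two helper functions below are a small sketch of that arithmetic; their names are hypothetical and they are not part of the Upload Agent.

```shell
# Convert a size in mebibytes (1024 * 1024 bytes) to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Convert a size in bytes to whole mebibytes (integer division, rounds down).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Example usage (file name is a placeholder):
#   ./ua --chunk-size "$(mib_to_bytes 200)" large-file.txt
```

For example, mib_to_bytes 200 prints 209715200, and bytes_to_mib 100000000 prints 95.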

Set files as hidden

By default, Upload Agent sets all files as visible. You can override this behavior with the --visibility flag.

$ ./ua myfile.txt --visibility hidden

Set file properties

Upload Agent can set multiple properties for a file using the --property flag.

$ ./ua myfile.txt --property property1=my_property --property property2="another property"

Set file type

Upload Agent can set multiple file types for a file using the --type flag.

$ ./ua myfile.txt --type typeOne --type typeTwo

Set file details

Upload Agent can set details for a file using the --details flag. The details must be passed as a valid JSON string.

$ ./ua myfile.txt --details '{"Field 1": [1,2,3], "Field 2": "content"}'

Upload directories

Upload Agent can upload all the files in a directory. The destination of the files depends on the directory name given as input. If the name contains a trailing slash, the destination of the files is the current remote directory.

$ ./ua dir_name/
Without a trailing slash a new remote directory will be created and the files will be uploaded to that new directory (dir_name).
$ ./ua dir_name
Note that there is a limit of 1000 files to upload in a single operation.

Upload directories recursively

Upload Agent can upload a directory recursively by using the --recursive flag as an argument. The destination directory follows the same rules as above.

$ ./ua dir_name --recursive
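Given the limit of 1000 files per operation, it can help to count the files in a directory before starting an upload. The sketch below uses find for this; the function name count_files is hypothetical, and it assumes that only regular files count toward the limit.

```shell
# Count the regular files in a directory.
# Usage: count_files DIR            (top level only)
#        count_files DIR recursive  (include all subdirectories)
# Note: file names containing newlines would be counted more than once.
count_files() {
    dir=$1
    if [ "${2:-}" = "recursive" ]; then
        find "$dir" -type f | wc -l | tr -d ' '
    else
        find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
    fi
}

# Example usage (directory name is a placeholder):
#   if [ "$(count_files dir_name recursive)" -le 1000 ]; then
#       ./ua dir_name --recursive
#   fi
```

A directory that exceeds the limit can be split across several invocations, for example one per subdirectory.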

Upload data from stdin

Upload Agent can upload data from stdin directly into a file. Note that when this option is used only one file can be created. This option is very useful to pipe output from one program and upload it as a file.

$ my_application | ./ua --read-from-stdin my_file.txt
The following command reads data interactively from the terminal until the stream is terminated with Ctrl+D.
$ ./ua --read-from-stdin my_file.txt

Running a simple diagnostic test

Running the Upload Agent with the --test flag runs a simple test to verify that UA is correctly configured. For a correctly configured installation, the output will look similar to the output below. Any errors detected will be printed to the screen.

$ ./ua --test
Upload Agent Version: 1.5.19
  git version: v0.201.0+g8dd2d2f
  libboost version: 1.55.0
  libcurl version: 7.45.0
Upload Agent v1.5.19, environment info:
  API server protocol: https
  API server host:     api.dnanexus.com
  API server port:     443
  Current User: user-<username>
  Current Project: <project name> (project-id)
Proxy Settings:
  No proxy set in environment.
Operating System:
  Name:    Linux
  Release: 4.4.0-59-generic
  Version: #80~14.04.1-Ubuntu SMP <Date>
  Machine: x86_64
CA Certificate: <path to certificate file>
Resolving Amazon S3:
  Resolved to <amazon s3 ip address>
Testing connection:
  Sucessfully contacted google.com over http: (200)
  Sucessfully contacted google.com over https: (200)

Print help message

$ ./ua --help
Output (stderr):
Usage: ./ua [options] <file> [...]

Available options:
  -h [ --help ]                      Produce a help message
  --version                          Print the version
  -e [ --env ]                       Print environment information
  -a [ --auth-token ] arg            Specify the authentication token
  -p [ --project ] arg               Name or ID of the destination project
  -f [ --folder ] arg (=/)           Name of the destination folder
  -n [ --name ] arg                  Name of the remote file (Note: Extension 
                                     ".gz" will be appended if the file is 
                                     compressed before uploading)
  --visibility arg (=visible)        Use "--visibility hidden" to set the 
                                     file's visibility as hidden.
  --property arg                     Key-value pair to add as a property; 
                                     repeat as necessary, e.g. "--property 
                                     key1=val1 --property key2=val2"
  --type arg                         Type of the data object; repeat as 
                                     necessary, e.g. "--type type1 --type 
                                     type2"
  --tag arg                          Tag of the data object; repeat as 
                                     necessary, e.g. "--tag tag1 --tag tag2"
  --details arg                      JSON to store as details
  --recursive                        Recursively upload the directories
  --read-threads arg (=2)            Number of parallel disk read threads
  -c [ --compress-threads ] arg (=7) Number of parallel compression threads
  -u [ --upload-threads ] arg (=8)   Number of parallel upload threads
  -s [ --chunk-size ] arg (=75M)     Size of chunks in which the file should be
                                     uploaded. Specify an integer size in bytes
                                     or append optional units (B, K, M, G). 
                                     E.g., '50M' sets chunk size to 50 
                                     megabytes.
  --throttle arg                     Limit maximum upload speed. Specify an 
                                     integer to set speed in bytes/second or 
                                     append optional units (B, K, M, G). E.g., 
                                     '3M' limits upload speed to 3 
                                     megabytes/second. If not set, uploads are 
                                     not throttled.
  -r [ --tries ] arg (=3)            Number of tries to upload each chunk
  --do-not-compress                  Do not compress file(s) before upload
  -g [ --progress ]                  Report upload progress
  -v [ --verbose ]                   Verbose logging
  --wait-on-close                    Wait for file objects to be closed before 
                                     exiting
  --do-not-resume                    Do not attempt to resume any incomplete 
                                     uploads
  --test                             Test upload agent settings
  -i [ --read-from-stdin ]           Read file content from stdin
                   
Note that the help message is printed on standard error (stderr), and not standard out (stdout).

Last edited by nicolasbockg, 2017-03-01 17:44:21
