Running Workflows

You can run workflows from the command line using the command dx run. The inputs to these workflows can come from any project for which you have VIEW access.

We will be using the publicly available Exome Analysis Workflow as an example (platform login required to access this link). For help with viewing public projects, see Change to a public project. For a tutorial on how to build and run a workflow using the DNAnexus web UI, visit the Running a Workflow UI page.

NOTE: This page is exclusively dedicated to running workflows. For information about running apps and applets, please see the Running Apps and Applets page.

Running in Interactive Mode

If dx run is run without specifying an input, interactive mode will be launched. You will be prompted to enter each required input, after which you can select from a list of optional parameters to modify. The list includes every optional parameter that can be modified for each stage of the workflow. The interface then prints the input JSON detailing the inputs specified and generates an analysis ID of the form analysis-xxxx unique to this particular run of the workflow.

Below is an example of running the Exome Analysis Workflow (platform login required to access this link) interactively.

$ dx run Exome\ Analysis\ Workflow
Entering interactive mode for input selection.
Input:   Reads (stage-xxxx.reads_fastqgz)
Class:   file
Enter file ID or path (<TAB> twice for compatible files in current directory, '?' for more options)
stage-xxxx.reads_fastqgz: Input/SRR504516_1.fastq.gz
Input:   BWA reference genome index (stage-xxxx.genomeindex_targz)
Class:   file
Suggestions:
    project-BQpp3Y804Y0xbyG4GJPQ01xv://file-* (DNAnexus Reference Genomes)
Enter file ID or path (<TAB> twice for compatible files in current directory, '?' for more options)
stage-xxxx.genomeindex_targz: Input/human_g1k_v37.bwa-index.tar.gz
Select an optional parameter to set by its # (^D or <ENTER> to finish):
 [0] Reads (right mates) (stage-xxxx.reads2_fastqgz)
 [1] Add read group information to the mappings (required by downstream GATK)? (stage-xxxx.add_read_group) [default=true]
 [2] Read group id (stage-xxxx.read_group_id) [default={"$dnanexus_link": {"input": "reads_fastqgz", "metadata": "name"}}]
 [3] Read group platform (stage-xxxx.read_group_platform) [default="ILLUMINA"]
 ...
 [29] Vendor exome (stage-xxxx.vendor_exome) [default="agilent_sureselect_human_all_exon_50mb"]
 [30] Validation stringency (stage-xxxx.validation_stringency) [default="SILENT"]
 [31] Advanced command line options (stage-xxxx.advanced_options)
Optional param #: 0
Input:   Reads (right mates) (stage-xxxx.reads2_fastqgz)
Class:   file
Enter file ID or path (<TAB> twice for compatible files in current directory, '?' for more options)
stage-xxxx.reads2_fastqgz: Input/SRR504516_2.fastq.gz
Select an optional parameter to set by its # (^D or <ENTER> to finish):
 [0] Reads (right mates) (stage-xxxx.reads2_fastqgz) [={"$dnanexus_link": {"project": "project-xxxx", "id": "file-xxxx"}}]
 [1] Add read group information to the mappings (required by downstream GATK)? (stage-xxxx.add_read_group) [default=true]
 [2] Read group id (stage-xxxx.read_group_id) [default={"$dnanexus_link": {"input": "reads_fastqgz", "metadata": "name"}}]
 ...
 [29] Vendor exome (stage-xxxx.vendor_exome) [default="agilent_sureselect_human_all_exon_50mb"]
 [30] Validation stringency (stage-xxxx.validation_stringency) [default="SILENT"]
 [31] Advanced command line options (stage-xxxx.advanced_options)
Optional param #: <ENTER>
Using input JSON:
{
    "stage-xxxx.reads_fastqgz": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-xxxx"
        }
    },
    "stage-xxxx.genomeindex_targz": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-yyyy"
        }
    },
    "stage-xxxx.reads2_fastqgz": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-zzzz"
        }
    }
}
Confirm running the executable with this input [Y/n]: <ENTER>
Calling workflow-xxxx with output destination project-xxxx:/
Analysis ID: analysis-xxxx

Running in Non-Interactive Mode

You can specify each input on the command line with the -i or --input flag, using the syntax -i<stage ID>.<input name>=<input value>. <input value> must be a DNAnexus object ID or the name of a file in the currently selected project. You can also give the number of a stage in place of its stage ID, where stages are indexed starting at zero. The inputs in the following example are specified for the first stage of the workflow only to illustrate this point. Note that the parentheses around the <input value> in the help string are omitted when entering input.

Possible values for the input name field can be found by running the command dx run workflow-xxxx -h, as shown below using the Exome Analysis Workflow (platform login required to access this link).

$ dx run Exome\ Analysis\ Workflow -h
usage: dx run Exome Analysis Workflow [-iINPUT_NAME=VALUE ...]
Workflow: Exome Analysis Workflow
Inputs:
 stage-xxxx
  Reads: -istage-xxxx.reads_fastqgz=(file)
        A file, in gzipped FASTQ format, with the reads to be mapped (or the left reads, for paired pairs).
  Reads (right mates): -istage-xxxx.reads2_fastqgz=(file)
        (Optional) A file, in gzipped FASTQ format, with the right reads to be mapped (for paired reads).
  BWA reference genome index: -istage-xxxx.genomeindex_targz=(file)
        A file, in gzipped tar archive format, with the reference genome sequence already indexed with BWA.
        Suggestions:
          project-BQpp3Y804Y0xbyG4GJPQ01xv://file-* (DNAnexus Reference Genomes)
 stage-xxxx: Common
  Add read group information to the mappings (required by downstream GATK)?: ...
 stage-xxxx: Advanced
  Advanced command line options: ...
Outputs:
  Sorted mappings: stage-xxxx.sorted_bam (file)
        A coordinate-sorted BAM file with the resulting mappings.
  Sorted mappings index: stage-xxxx.sorted_bai (file)
        The associated BAM index file.
  ...
  Per-target coverage file: stage-xxxx.pertarget_coverage_tsv (file)
        A text file containing per-target coverage information. See the app help for more information.

This help message describes the inputs for each stage of the workflow in the order they are specified. For each stage of the workflow, the help message will first list the required inputs for that stage, specifying the requisite type in the <input-value> field. Next, the message describes common options for that stage (as seen in that stage's corresponding UI on the platform). Lastly, it will list advanced command-line options for that stage. If any stage's input is linked to the output of a prior stage, the help message shows the default value for that stage as a DNAnexus link of the form {"$dnanexus_link": {"outputField": "<prior stage output name>", "stage": "stage-xxxx" }}.

Similarly, this link format can be used to specify output from any prior stage in the workflow as input for the current stage. The help message shows that the Exome Analysis Workflow has two required file inputs in addition to those already specified by default: -i<stage ID>.reads_fastqgz and -i<stage ID>.genomeindex_targz. As these inputs belong to the first stage of the Exome Analysis Workflow, <stage ID> can be replaced with 0.

NOTE: Workflow stages are zero-indexed; the first stage of a workflow is denoted as stage 0.

The example below shows how to run the same Exome Analysis Workflow on a FASTQ file containing reads, as well as a BWA reference genome, using the default parameters for each subsequent stage.

$ dx run Exome\ Analysis\ Workflow \
    -i0.reads_fastqgz=Input/SRR504516_1.fastq.gz \
    -istage-xxxx.genomeindex_targz=Input/human_g1k_v37.bwa-index.tar.gz -y
Using input JSON:
{
    "stage-xxxx.reads_fastqgz": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-xxxx"
        }
    },
    "stage-xxxx.genomeindex_targz": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-xxxx"
        }
    }
}
Calling workflow-xxxx with output destination project-xxxx:/
Analysis ID: analysis-xxxx
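The -i syntax above can be assembled by a small helper. The following is a minimal sketch, not part of the dx toolkit: stage_input_flag is a hypothetical function name, and the check it performs (a stage-xxxx ID or a non-negative integer index) only reflects the two stage forms described on this page.

```shell
# Hypothetical helper: format one input flag as -i<stage>.<input name>=<value>.
# Accepts either a full stage ID (stage-xxxx) or a zero-based stage index.
stage_input_flag() {
    stage=$1 name=$2 value=$3
    case $stage in
        stage-*) ;;                      # full stage ID, accepted as is
        ''|*[!0-9]*)                     # empty, or contains a non-digit
            echo "bad stage selector: $stage" >&2
            return 1 ;;
    esac
    printf '%s%s.%s=%s\n' '-i' "$stage" "$name" "$value"
}

stage_input_flag 0 reads_fastqgz Input/SRR504516_1.fastq.gz
stage_input_flag stage-xxxx genomeindex_targz Input/human_g1k_v37.bwa-index.tar.gz
```

The helper only prints the flags; passing them to dx run is left to the caller, so the sketch can be tried without a platform login.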

Specifying array input

Array input can be given by passing the same parameter of a stage multiple times. For example, the following flags add three files to the file_inputs parameter of stage-xxxx, the first stage of a workflow named workflow:

$ dx run workflow \
    -i0.file_inputs=project-xxxx:file-xxxx \
    -istage-xxxx.file_inputs=project-xxxx:file-xxxx \
    -istage-xxxx.file_inputs=project-xxxx:file-xxxx
Using input JSON:
{
    "stage-xxxx.file_inputs": [
        {
            "$dnanexus_link": {
                "project": "project-xxxx",
                "id": "file-xxxx"
            }
        },
        {
            "$dnanexus_link": {
                "project": "project-xxxx",
                "id": "file-xxxx"
            }
        },
        {
            "$dnanexus_link": {
                "project": "project-xxxx",
                "id": "file-xxxx"
            }
        }
    ]
}
Confirm running the executable with this input [Y/n]: Y
Calling workflow-xxxx with output destination project-xxxx:/
Analysis ID: analysis-xxxx

If no project is selected, or if the file is in another project, the project containing the files you wish to use must be specified as follows: -i<stage ID>.<input name>=<project id>:<file id>. See the Project and Data Object Navigation page for details on how to find this information for each file.
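When the list of files is long, the repeated flags for an array input can be generated in a loop. This is a minimal sketch: array_input_flags is a hypothetical helper, and the file IDs are placeholders in the <project id>:<file id> form described above.

```shell
# Hypothetical helper: print one -i flag per file for the same stage input,
# which is how an array input is expressed on the command line.
array_input_flags() {
    stage=$1 name=$2
    shift 2
    for file_id in "$@"; do
        printf '%s%s.%s=%s\n' '-i' "$stage" "$name" "$file_id"
    done
}

array_input_flags 0 file_inputs project-xxxx:file-xxxx project-xxxx:file-yyyy
```

The printed flags can then be appended to a dx run command. Relying on unquoted command substitution for that step assumes the IDs contain no whitespace.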

Job-based object references

The -i flag can also be used to specify job-based object references with the syntax -i<stage ID or number>.<input name>=<job id>:<output name>. When used with dx run, the --brief flag makes the command print only the ID of the execution it starts, and the -y flag skips the interactive prompt that confirms the execution. Together they let the output of one dx run call be substituted directly into another command.

The example below calls the BWA-MEM FASTQ Read Mapper app (platform login required to access this link) to produce the sorted_bam output described in the help string produced by running dx run app-bwa_mem_fastq_read_mapper -h. This output is then used as input to the first stage of the Parliament Workflow featured on the DNAnexus platform (platform login required to access this link).

$ dx run Parliament \
    -i0.illumina_bam=$(dx run bwa_mem_fastq_read_mapper \
        -ireads_fastqgz=file-xxxx \
        -ireads2_fastqgz=file-xxxx \
        -igenomeindex_targz=project-BQpp3Y804Y0xbyG4GJPQ01xv:file-B6qq53v2J35Qyg04XxG0000V \
        -y --brief):sorted_bam \
    -i0.ref_fasta=project-BQpp3Y804Y0xbyG4GJPQ01xv:file-B6qq53v2J35Qyg04XxG0000V -y
Using input JSON:
{
    "stage-F14F5qQ0Jz1gfpjX8y1JxG3y.illumina_bam": {
        "$dnanexus_link": {
            "field": "sorted_bam",
            "job": "job-xxxx"
        }
    },
    "stage-F14F5qQ0Jz1gfpjX8y1JxG3y.ref_fasta": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-B6qq53v2J35Qyg04XxG0000V"
        }
    }
}
Calling workflow-xxxx with output destination project-xxxx:/
Analysis ID: analysis-xxxx
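The nested command substitution can be easier to read when the job ID is captured first and the reference is built in a second step. The sketch below shows only the string handling; job_output_ref is a hypothetical helper, and the dx calls are left as comments because they need a logged-in session.

```shell
# Hypothetical helper: join a job ID and an output name into the
# <job id>:<output name> form accepted by the -i flag.
job_output_ref() {
    job_id=$1 output_name=$2
    case $job_id in
        job-?*) printf '%s:%s\n' "$job_id" "$output_name" ;;
        *) echo "not a job ID: $job_id" >&2
           return 1 ;;
    esac
}

job_output_ref job-xxxx sorted_bam

# Usage sketch (commented out; requires the dx toolkit and a platform login):
#   mapper_job=$(dx run app-bwa_mem_fastq_read_mapper ... -y --brief)
#   dx run Parliament -i0.illumina_bam="$(job_output_ref "$mapper_job" sorted_bam)" -y
```

Checking the prefix before building the reference catches the case where the first command failed and printed something other than a job ID.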

Advanced Options

Quiet output

Using the --brief flag at the end of a dx run command will cause the command line to print the execution's analysis ID ("analysis-xxxx") instead of the input JSON for the execution. This ID can be saved for later reference.

$ dx run Exome\ Analysis\ Workflow \
    -i0.reads_fastqgz=Input/SRR504516_1.fastq.gz \
    -istage-xxxx.genomeindex_targz=Input/human_g1k_v37.bwa-index.tar.gz -y --brief
analysis-xxxx
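Because --brief prints only the ID, the output can be captured in a shell variable and checked before it is stored. This is a minimal sketch under that assumption: is_analysis_id is a hypothetical helper and analyses.txt is a hypothetical log file.

```shell
# Hypothetical helper: succeed only if the argument has the form analysis-xxxx.
is_analysis_id() {
    case $1 in
        analysis-?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch (commented out; requires the dx toolkit and a platform login):
#   analysis_id=$(dx run Exome\ Analysis\ Workflow ... -y --brief)
#   is_analysis_id "$analysis_id" && echo "$analysis_id" >> analyses.txt

if is_analysis_id analysis-xxxx; then
    echo "looks like an analysis ID"
fi
```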

Rerunning an analysis with the same settings

If you are debugging an analysis and wish to rerun it with the exact settings of a previous run (including all inputs, parameters, and output destinations), but with a different version of the executable, you can use the --clone flag to copy the input JSON of the analysis you wish to rerun.

$ dx run Exome\ Analysis\ Workflow --clone analysis-xxxx -y
Using input JSON:
{}
The following 4 stage(s) will reuse results from a previous analysis:
  Stage 0: BWA-MEM FASTQ Read Mapper (job-xxxx)
  Stage 1: FastQC Reads Quality Control (job-xxxx)
  Stage 2: Vendor Human Exome GATK-Lite Pipeline (job-xxxx)
  Stage 3: Vendor Human Exome Selection Metrics (job-xxxx)

The --clone flag takes the inputs from analysis-xxxx and reruns only the stages whose executables were replaced in the new workflow. In the example above, no executables were replaced, so the results of every stage were reused and nothing was run again.

To modify specific settings from the previous analysis, you can run the command dx run <analysis executable> --clone analysis-xxxx [options]. The [options] parameters will override anything set by the --clone flag, and they take the form of options passed as input from the command line (see Running in Non-Interactive Mode).

Note that the --clone flag will not copy the usage of the --allow-ssh or --debug-on flags, which must be set with the new execution; only the applet, instance type, and input spec are copied. See the Connecting to Jobs page for more information on the usage of these flags.

For example, the command below redirects the output of the analysis to the /output folder of project-xxxx and reruns all stages.

$ dx run Exome\ Analysis\ Workflow \
    --clone analysis-xxxx --rerun-stage "*" --destination project-xxxx:/output -y
analysis-xxxx

Note: Only the outputs of stages rerun are placed in the destination specified.

Rerunning a specific stage

When rerunning workflows, if a stage is run identically to how it was run in a previous analysis, the stage itself will not be rerun, and the outputs of that stage will not be copied or rewritten in a new location. To force a stage to run again, use the option --rerun-stage STAGE_ID, where STAGE_ID is an ID of the form stage-xxxx, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0). To rerun all stages of an analysis, use --rerun-stage "*". The asterisk is enclosed in quotes to prevent the shell from expanding it, via globbing, into the names of the files and folders in your current directory.

The command below reruns the fourth and final stage of the Exome Analysis Workflow (platform login required to access this link), which has index 3:

$ dx run Exome\ Analysis\ Workflow --clone analysis-xxxx --rerun-stage 3 \
    --brief -y
analysis-xxxx
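The reason for quoting the asterisk in --rerun-stage "*" can be seen without calling dx at all. The sketch below counts the arguments a command would receive in a scratch directory containing two files; count_args and the file names are made up for the demonstration.

```shell
# Create a scratch directory with two files so the glob has something to match.
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt"

# Print how many arguments were received.
count_args() { echo "$#"; }

# Unquoted: the shell replaces * with a.txt and b.txt before the command runs.
unquoted=$(cd "$scratch" && count_args --rerun-stage *)
# Quoted: the literal asterisk is passed through as a single argument.
quoted=$(cd "$scratch" && count_args --rerun-stage "*")

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
rm -rf "$scratch"
```

In the unquoted case the command receives the file names instead of the asterisk, so dx run would never see the "all stages" value.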

Specifying an analysis' output folder

The --destination flag allows you to specify the path of the output of a workflow. (See Running Apps and Applets for more information.) Each output of each stage will be written to the destination specified by default.

Specifying a stage's output folder

You can use the --stage-output-folder <stage_ID> <folder> flag to specify the output destination of a particular stage in the analysis being run. Here stage_ID is an ID of the form stage-xxxx, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0), and folder is the project and path to which the stage should write, using the syntax project-xxxx:/PATH, where PATH is the path to the output folder in project-xxxx.

The following command re-runs all stages of the Exome Analysis Workflow and sets the output destination of the first step of the workflow (BWA) to "mappings" in the current project:

$ dx run Exome\ Analysis\ Workflow --clone analysis-xxxx --rerun-stage "*" \
    --stage-output-folder 0 "mappings" --brief -y

Specifying a stage's relative output folder

If you want to specify the output folder of a stage within the current output folder of the entire analysis, you can use the flag --stage-relative-output-folder <stage_id> <folder>, where stage_id is an ID of the form stage-xxxx, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0). For the folder argument, specify a path (in quotes), relative to the output folder of the analysis, to which the output of that stage should be written.

The following command re-runs all stages of a cloned Exome Analysis Workflow (platform login required to access this link), sets the output destination of the analysis to "/exome_run", and sets the output destination of stage 0 to "/exome_run/mappings" in the current project:

$ dx run Exome\ Analysis\ Workflow --clone analysis-xxxx --rerun-stage "*" \
    --destination "/exome_run" \
    --stage-relative-output-folder 0 "mappings" --brief -y
analysis-xxxx

Specifying a different instance type

If you wish to specify the instance type of all stages in your analysis or a specific set of stages in your analysis, you can do so with the flag --instance-type '{"STAGE_ID": "INSTANCE_TYPE"}'. Here STAGE_ID is an ID of the form stage-xxxx, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0). For information on specifying instance types from the command line, see Specifying a different instance type (apps and applets).

The example below re-runs all stages of the Exome Analysis Workflow (platform login required to access this link), and specifies that the first and second stages should be run on mem1_hdd2_x8 and mem1_ssd2_x4 instances respectively:

$ dx run Exome\ Analysis\ Workflow --clone analysis-F691FxQ0k4QJ5bZk7z185K8x \
    --rerun-stage "*" --instance-type '{"0": "mem1_hdd2_x8", "1": "mem1_ssd2_x4"}' \
    --brief -y
analysis-F6932xj0k4Q6GqPgJ2J6Z0QX
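The JSON mapping passed to --instance-type can be generated from pairs of stage and instance type rather than typed by hand. This is a minimal sketch: instance_type_json is a hypothetical helper, and it assumes that stage identifiers and instance type names contain no quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper: turn "stage instance_type" argument pairs into the
# {"STAGE_ID": "INSTANCE_TYPE", ...} mapping. No JSON escaping is performed.
instance_type_json() {
    json='{'
    sep=''
    while [ "$#" -ge 2 ]; do
        json="$json$sep\"$1\": \"$2\""
        sep=', '
        shift 2
    done
    printf '%s}\n' "$json"
}

instance_type_json 0 mem1_hdd2_x8 1 mem1_ssd2_x4
# prints {"0": "mem1_hdd2_x8", "1": "mem1_ssd2_x4"}
```

When passing the result to dx run, wrap the substitution in double quotes so the shell keeps the mapping as a single argument.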

Adding metadata to an analysis

This is identical to adding metadata to a job; see Adding metadata to a job for details.

Watching an analysis

Currently, it is not possible to watch an analysis from the command line. For information about watching a job from the command line, see Watching a job.

Providing input JSON

This is identical to providing an input JSON to a job; see Providing input JSON for details.

Note that as in running a workflow in non-interactive mode, inputs to a workflow must be specified as STAGE_ID.<input>, where STAGE_ID is either an ID of the form stage-xxxx or the index of that stage in the workflow (starting with the first stage at index 0).
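As an illustration of the STAGE_ID.<input> key format, the sketch below writes a small input JSON file for the first stage of a workflow. The file contents reuse the placeholder IDs from this page, and the -f flag in the commented command is an assumption about the dx toolkit; check dx run -h for the flag your version accepts.

```shell
# Write the inputs to a temporary file, keyed as <stage index>.<input name>.
# The quoted 'EOF' keeps the shell from expanding $dnanexus_link.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
{
  "0.reads_fastqgz": {"$dnanexus_link": {"project": "project-xxxx", "id": "file-xxxx"}},
  "0.genomeindex_targz": {"$dnanexus_link": {"project": "project-xxxx", "id": "file-yyyy"}}
}
EOF

cat "$input_file"

# Assumed invocation (commented out; requires the dx toolkit and a platform login):
#   dx run Exome\ Analysis\ Workflow -f "$input_file" -y
```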

Last edited by Samantha Zarate, 2017-12-19 19:21:22
