Command Line Client/Quickstart

Welcome to the quick tour of the command-line tool dx! You can use this tool to log in, upload data, navigate and organize your data, and launch analyses. The rest of this guide will walk you through a simple data analysis and then provide you with some more sophisticated scripts that you may want to run.

Step 0: Install the SDK

To use the command-line tool, make sure you have installed the DNAnexus SDK. In the examples that follow, text is sometimes shown in bold or blue to indicate when and what you will need to enter at the bash prompt, or at an interactive prompt of dx.

Upgrading the SDK

To update your version of the command-line tool, you can run the command dx upgrade. For more information about upgrading your SDK, please see the Downloads page.

Step 1: Log in

The first thing you'll need to do is to log in. If you haven't created a DNAnexus account yet, please visit the website and sign up. User sign-up is not supported on the command line.

$ dx login
Acquiring credentials from
Username: <your username>
Password: <your password>
No projects to choose from. You can create one with the command "dx new project".
To pick from projects for which you only have VIEW permissions, use "dx select --level VIEW" or "dx select --public".

Your authentication token and your current project settings should now be saved in a local configuration file, and you're ready to start accessing your project.

Command-line help

If you ever have questions about what a dx command does, you can use the dx help command or add the -h flag. For example, to get the full details on what dx ls can do, run dx help ls or dx ls -h.

$ dx help ls
usage: dx ls [-h] [--color {off,on,auto}] [--delimiter [DELIMITER]]
             [--env-help] [--brief | --summary | --verbose] [-a] [-l] [--obj]
             [--folders] [--full]
             [path]
List folders and/or objects in a folder
...

You can also use the help command to explore the available commands. Try running dx help to see a list of categories, and you can run dx help <category> to see the related commands. Here's an example.

$ dx help fs
dx fs: Navigate and organize your projects and files
Commands:
  ls          List folders and/or objects in a folder
  tree        List folders and objects in a tree
  pwd         Print current working directory
  select      List and select a project to switch to
  cd          Change the current working directory
  cp          Copy objects and/or folders between different projects
  mv          Move or rename objects and/or folders inside a project
  mkdir       Create a new folder
  rmdir       Remove a folder
  rm          Remove data objects and folders
  rmproject   Delete projects
  help        Displays help messages and dx commands by category

Tab completion

In addition to the help, tab completion is available on the command line for the dx tool (when using the bash shell). Simply press Tab twice to see available completions for the current command.

Step 2: Look around

Public projects

Let's look inside some of the public projects that have already been set up.

$ dx select --public --name "Reference Genome Files"
Available public projects:
0) Reference Genome Files (VIEW)
Pick a numbered choice: 0
Setting current project to: Reference Genome Files

By running the dx select command and picking a project, you've now done the command-line equivalent of going to the project page for Reference Genome Files on the website. This is a DNAnexus-sponsored project containing popular genomes for use with running analyses on your own data.

TIP: For more information on the arguments --public and --name given to dx select, please see our documentation on dx select or run dx select -h on the command line.
Note: You will never be charged for DNAnexus-sponsored data, so you can copy data from this project however many times you'd like, free of charge.

Now, you can list all of the data in the top-level directory of the project you've just picked by running the command dx ls, or pass a folder name as an argument to list the contents of that folder.

$ dx ls
C. Elegans - Ce10/
D. melanogaster - Dm3/
H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/
H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/
H. Sapiens - GRCh38/
H. Sapiens - hg19 (Ion Torrent)/
H. Sapiens - hg19 (UCSC)/
M. musculus - mm10/
M. musculus - mm9/
$ dx ls "C. Elegans - Ce10/"
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
...
TIP: To avoid typing out the full folder name (with escaped characters), simply type dx ls C then press Tab twice for tab completion.
TIP: Use the -l flag to list more details, such as the time a file was last modified, its size (if applicable), and its full DNAnexus ID.
TIP: In our example, we picked "Reference Genome Files", so that project's contents are listed by default. You can look into another project, and a folder within the project, by giving the project name, followed by a colon (":") and the folder path.
$ dx ls -l "Demo Data:/SRR100022/"
Project: Demo Data (project-BQbJpBj0bvygyQxgQ1800Jkk)
Folder : /SRR100022
State   Last modified   Size   Name (ID)
...
TIP: For more information on the path syntax, see Path Resolution.
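
In scripts it is sometimes useful to take a "project:path" string apart. The following is only a rough sketch using plain shell string operations: the function names dx_path_project and dx_path_folder are hypothetical (they are not part of dx), and the code assumes the first colon separates the project from the path, ignoring the escaping rules described in Path Resolution.

```shell
# Hypothetical helpers (not part of dx): split "<project>:<path>" at the first colon.
# Assumption: the project name itself contains no colon.

# Print the project part, or an empty line when the argument has no colon.
dx_path_project() {
    case "$1" in
        *:*) printf '%s\n' "${1%%:*}" ;;
        *)   printf '\n' ;;
    esac
}

# Print the folder or object part (everything after the first colon).
dx_path_folder() {
    case "$1" in
        *:*) printf '%s\n' "${1#*:}" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}
```

For example, dx_path_project "Demo Data:/SRR100022/" prints Demo Data, and dx_path_folder on the same argument prints /SRR100022/.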

Getting more details

Your key to getting more details about anything that exists on the DNAnexus platform is the dx describe command. It will take in DNAnexus IDs or filenames and return details on all results it finds.

Example: Describing a data object

$ dx describe "C. Elegans - Ce10/ce10.fasta.gz"
Result 1:
ID              file-BQbY9Bj015pB7JJVX0vQ7vj5
Class           file
Project         project-BQpp3Y804Y0xbyG4GJPQ01xv
Folder          /C. Elegans - Ce10
Name            ce10.fasta.gz
State           closed
Visibility      visible
Types           -
Properties      Origin=, Assembly=UCSC ce10, Species=Caenorhabditis elegans, Taxonomy ID=6239
Tags            -
Outgoing links  -
Created         Tue Sep 30 15:54:35 2014
Last modified   Thu Oct 9 09:48:55 2014
archivalState   null
Media type      application/x-gzip
Size            29.21 MB

Example: Describing a project

$ dx describe "Reference Genome Files:"
Result 1:
ID              project-BQpp3Y804Y0xbyG4GJPQ01xv
Class           project
Name            Reference Genome Files
Summary
Billed to       org-dnanexus
Protected       false
Restricted      false
Created         Wed Oct 8 13:42:53 2014
Last modified   Thu Oct 9 15:38:01 2014
Data usage      96.38 GB
Sponsored data  0.00 GB
# Files         47
Tags            -
Access level    VIEW
Properties      -
containsPHI     false

Besides describing data and projects, you can also describe apps, jobs, and users. We will discuss these more later.

Step 3: Create your own project

Now we'll use the command dx new project to create a new project:

$ dx new project "My First Project"
Created new project called "My First Project" (project-B3X8bjBqqBk1y7bVPkvQ0001)
Switch to new project now? [y/N]: y

You're now ready to start uploading your data and running your own analyses.

NOTE: Your project will have its own unique ID in place of project-B3X8bjBqqBk1y7bVPkvQ0001; the system uses this ID to identify your specific project.
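
If you want to create a project from a script, the interactive "Switch to new project now?" prompt gets in the way. The sketch below is one possible approach, assuming you are already logged in and that dx new project accepts the --brief flag to print only the new project ID. The function name new_project_and_select is made up for this example.

```shell
# Hypothetical helper: create a project, switch to it, and print its ID.
new_project_and_select() {
    # $1 = project name; --brief asks dx to print only the new project's ID
    project_id=$(dx new project "$1" --brief) || return 1
    # Passing the ID to dx select avoids the numbered-choice prompt
    dx select "$project_id" >/dev/null || return 1
    printf '%s\n' "$project_id"
}
```

A call such as my_project=$(new_project_and_select "My First Project") would then leave the project ID in a variable for later commands.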

Step 4: Upload and manage your data


If you have a sample that you would like to analyze, you can use the dx upload command or the Upload Agent if you have installed it. You can also download and use the following file, which contains the first 25000 C. elegans reads from SRR070372: small-celegans-sample.fastq. We will use this file again later to run through a sample analysis.

TIP: For uploading multiple or large files, it is highly recommended to use the Upload Agent because it will compress your files and upload them in parallel over multiple HTTP connections. It also has other features such as resumable uploads. See the Upload Agent Guide for how to upload files using that tool.

The following command uploads our file into the current directory of the current project. The --wait flag tells dx to wait until it has finished uploading the data before returning the prompt and describing the result.

$ dx upload --wait small-celegans-sample.fastq
[===========================================================>] Uploaded (16801690 of 16801690 bytes) 100% small-celegans-sample.fastq
ID              file-B4240xpqqBk1G8x53590004P
Class           file
Project         project-B3X8bjBqqBk1y7bVPkvQ0001
Folder          /
Name            small-celegans-sample.fastq
State           closed
Visibility      visible
Types           -
Properties      -
Tags            -
Details         {}
Outgoing links  -
Created         Sat Jan 26 12:45:43 2013
Created by      alice
Last modified   Sat Jan 26 12:45:52 2013
Media type      text/plain
Size            16.02 MB
TIP: If you run the same command but add --brief, only the file ID (in this case "file-B4240xpqqBk1G8x53590004P") will be printed to the terminal. Other dx commands like this one also accept the --brief flag and will do something similar.
TIP: You may also notice that there are fields for different metadata for the file, such as "types", "properties", "tags", and "details". Run dx help upload to see how you can set some of those fields. Some of them can also be set after you've uploaded the file, but metadata such as visibility, types, and details must be specified when the file is first created and can never be changed again.
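
As a small illustration of setting metadata at upload time, the sketch below wraps dx upload so that each file is labelled with a property and a tag. It assumes the --property KEY=VALUE and --tag options listed by dx help upload. The function name upload_sample, the property name sample_id, and the tag raw_reads are arbitrary choices for this example.

```shell
# Hypothetical helper: upload a file, label it, and print only the new file ID.
upload_sample() {
    # $1 = local file, $2 = sample ID to store in the "sample_id" property
    dx upload "$1" --property "sample_id=$2" --tag raw_reads --wait --brief
}
```

For instance, upload_sample small-celegans-sample.fastq id12345 would upload the file and attach sample_id=id12345, which can later be used as a search criterion.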


To take a quick look at the first few lines of the file you just uploaded, use the dx head command. By default, it prints the first 10 lines if given a file. (You can also use the command to look at part of a table, but we'll get to that later.)

Let's run it on the file we just uploaded and use the -n flag to ask for the first 12 lines (and hence the first 3 reads) of the FASTQ file.

$ dx head -n 12 small-celegans-sample.fastq

If you'd like to download a file back out of the platform, just use the dx download command. This command will use the name of the file for the filename unless you specify your own with the -o/--output flag.

$ dx download small-celegans-sample.fastq

Step 5: Analyze a sample

For the next few steps, you will need a FASTQ file of C. elegans reads if you would like to follow along exactly (we will map them against the ce10 genome). If you haven't already, you can download and use the following file which represents the first 25000 reads from SRR070372: small-celegans-sample.fastq.

TIP: If you'd like, you can also substitute your own reads file for a different species (though it may take longer to run through the example). For your convenience, DNAnexus has already imported the reference genomes b37, ce10, dm3, grch38, hg19, hs37d5, mm9, and mm10 to the platform. If you have a FASTA file of your own genome that you would like to use, you can upload the file and create genome indices for either BWA or Bowtie2 using the BWA FASTA Indexer app or the Bowtie2 FASTA Indexer app.

The following walkthrough explains what each command does and shows which apps you're running. If you only want to convert a FASTQ file to a VCF file via BWA and the FreeBayes variant caller, you can skip ahead to Step 5.4, which lists all the commands you need to run.

Step 5.1: Upload reads

If you did not do so already, upload a FASTQ file for analysis.

$ dx upload small-celegans-sample.fastq --wait

Step 5.2: Map reads

Next, use the BWA-MEM app to map the uploaded reads file to a reference genome.

Find the app name

First of all, if you don't know the command-line name of the app you would like to run, you have two options:

1. Navigate to its webpage from the Apps page on the website. The app's page will tell you how to run it from the command line. Try looking for it on the BWA-MEM FASTQ Read Mapper page.
2. Alternatively, search for apps from the command line by running dx find apps. The name to use on the command line appears in parentheses, as in the example below.

$ dx find apps
...
x BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper), v1.4.0
...

Install and run the app

Now let's install the app using dx install and check that it has been installed. While you do not always need to install an app to run it, you may find it useful as a bookmarking tool.

$ dx install bwa_mem_fastq_read_mapper
Installed the bwa_mem_fastq_read_mapper app
$ dx find apps --installed
BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper), v1.4.0

Run the app using dx run. We will run it without any arguments; it will then prompt us for required and then optional arguments.

$ dx run bwa_mem_fastq_read_mapper
Entering interactive mode for input selection.

Input: Reads (reads_fastqgz) Class: file
Enter file ID or path (<TAB> twice for compatible files in current directory,'?' for help)
reads_fastqgz: small-celegans-sample.fastq

Input: BWA reference genome index (genomeindex_targz) Class: file
Suggestions: project-BQpp3Y804Y0xbyG4GJPQ01xv://file-* (DNAnexus Reference Genomes)
Enter file ID or path (<TAB> twice for compatible files in current directory,'?' for more options)
genomeindex_targz: "Reference Genome Files:/C. Elegans - Ce10/ce10.bwa-index.tar.gz"

Select an optional parameter to set by its # (^D or <ENTER> to finish):

[0] Reads (right mates) (reads2_fastqgz)
[1] Add read group information to the mappings (required by downstream GATK)? (add_read_group) [default=true]
[2] Read group id (read_group_id) [default={"$dnanexus_link": {"input": "reads_fastqgz", "metadata": "name"}}]
[3] Read group platform (read_group_platform) [default="ILLUMINA"]
[4] Read group platform unit (read_group_platform_unit) [default="None"]
[5] Read group library (read_group_library) [default="1"]
[6] Read group sample (read_group_sample) [default="1"]
[7] Output all alignments for single/unpaired reads? (all_alignments)
[8] Mark shorter split hits as secondary? (mark_as_secondary) [default=true]
[9] Advanced command line options (advanced_options)

Optional param #:

Using input JSON:
{
    "reads_fastqgz": {
        "$dnanexus_link": {
            "project": "project-B3X8bjBqqBk1y7bVPkvQ0001",
            "id": "file-B3P6v02KZbFFkQ2xj0JQ005Y"
        }
    },
    "genomeindex_targz": {
        "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-BQbYJpQ09j3x9Fj30kf003JG"
        }
    }
}
Confirm running the applet/app with this input [Y/n]: <ENTER>
Calling app-BP2xVx80fVy0z92VYVXQ009j with output destination project-B3X8bjBqqBk1y7bVPkvQ0001:/
Job ID: job-B3P6v5BKZbFFvfQ9gjJQ004K

Note: there are additional ways to specify the inputs when you run an app. We'll go through some of the other ones by the time we're done.
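
One of those other ways is to prepare the input JSON in a file and hand it to dx run. The following sketch generates a file in the same shape as the "Using input JSON" output above, using the short link form that carries only an object ID. The function name bwa_input_json is hypothetical, and the two IDs are placeholders that you would supply yourself.

```shell
# Hypothetical helper: print an input JSON document for bwa_mem_fastq_read_mapper.
# $1 = reads file ID, $2 = BWA genome index file ID (placeholders you supply).
bwa_input_json() {
    printf '{\n'
    # Single quotes keep "$dnanexus_link" literal instead of expanding a variable
    printf '  "reads_fastqgz": {"$dnanexus_link": "%s"},\n' "$1"
    printf '  "genomeindex_targz": {"$dnanexus_link": "%s"}\n' "$2"
    printf '}\n'
}
```

You could then write the result to a file with bwa_input_json <reads file ID> <index file ID> > input.json and start the app with dx run bwa_mem_fastq_read_mapper -f input.json -y, where -f is the input JSON file option shown by dx run -h.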

Monitor your job

If you'd like to view the job's log stream and tty output, you can use dx watch:

$ dx watch job-B3P6v5BKZbFFvfQ9gjJQ004K

... or just come back later to check up on it. You can use dx describe <jobID> if you still have the job ID handy. If you don't, you can use dx find jobs to list all the jobs run in the current project, along with their status and when they were run.

$ dx find jobs
* BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper:main) (done) job-B3P6v5BKZbFFvfQ9gjJQ004K
  bhannigan 2014-10-01 15:20:31 (runtime 0:00:27)
$ dx describe job-B3P6v5BKZbFFvfQ9gjJQ004K
...
TIP: There are additional options by which you can restrict your search of previous jobs, e.g. by name. Run dx help find jobs for more details.

After your job finishes

You should now see two new files in your project: the mapped reads in a BAM file, and an index of that BAM file with a .bai extension. You can refer to the output file by name or by the job that produced it using the syntax "<job ID>:<output field>". Try it yourself with the job ID you got from calling the BWA-MEM app!

$ dx ls
small-celegans-sample.bam
small-celegans-sample.bam.bai
small-celegans-sample.fastq
$ dx describe small-celegans-sample.bam
...
$ dx describe job-B3P6v5BKZbFFvfQ9gjJQ004K:sorted_bam
...

Step 5.3: Call variants

You can use the FreeBayes Variant Caller app to call variants on your bam file.

This time, we won't rely on the interactive mode to enter our inputs. We will provide them directly. But first, let's look up the app's spec so we know what the inputs are called. For this, let's run the command dx run freebayes -h.

$ dx run freebayes -h
usage: dx run freebayes [-iINPUT_NAME=VALUE ...]
App: FreeBayes Variant Caller
Calls variants (SNPs, indels, and other events) using FreeBayes
See the app page for more information:
Inputs:
  Sorted mappings: -isorted_bams=(file) [-isorted_bams=... [...]]
        One or more coordinate-sorted BAM files containing mappings to call variants for.
  Genome: -igenome_fastagz=(file)
        A file, in gzipped FASTA format, with the reference genome that the reads were mapped against.
...

Optional inputs are shown using square brackets ("[ ]") around the command-line syntax for each input. Thus you'll notice that there are two required inputs that must be specified:

1. Sorted mappings (sorted_bams): A list of files with a .bam extension.
2. Genome (genome_fastagz): A reference genome in FASTA format that has been gzipped.

TIP: You can also run dx describe freebayes for a more compact view of the input and output specifications. By default, it will hide the advanced input options, but you can view them as well with the --verbose flag.

Run the app with a one-liner using a job-based object reference

It's faster and sometimes more convenient to run the app in a single one-line command. You can do this by specifying all the necessary inputs either on the command line or in a prepared file. We will use the -i flag to specify inputs as suggested by the output of dx run freebayes -h:

  • sorted_bams: we will set this to the output of the previous BWA step
  • genome_fastagz: the ce10 genome in the Reference Genome Files project.

To specify new job input using the output of a previous job, we'll use a job-based object reference via the "<job ID>:<output field>" syntax which we saw earlier.

TIP: You can use job-based object references as input even before the referenced jobs have finished; the system will simply queue up the new job until the input is ready.

The job ID used below is from the main BWA job you created in the previous step (change it to match your own job). The -y flag skips the input confirmation.

$ dx run freebayes -y \
    -igenome_fastagz=Reference\ Genome\ Files:/C.\ Elegans\ -\ Ce10/ce10.fasta.gz \
    -isorted_bams=job-B3P6v5BKZbFFvfQ9gjJQ004K:sorted_bam
Using input JSON:
{
    "genome_fastagz": {
        "$dnanexus_link": {
            "project": "project-BQbJ0480KKgB5KpXbQ0Q1bpJ",
            "id": "file-BQbY9Bj015pB7JJVX0vQ7vj5"
        }
    },
    "sorted_bams": {
        "field": "sorted_bam",
        "job": "job-B3P6v5BKZbFFvfQ9gjJQ004K"
    }
}
Calling app-BFG5k2009PxyvYXBBJY00BK1 with output destination project-B3X8bjBqqBk1y7bVPkvQ0001:/
Job ID: job-B3P6vJXKZbF18Vf81vBQ005p

Automatically run a command after a job finishes

You can use the command dx wait to wait for a job to finish. If you run the following command right after launching the FreeBayes app, it will list the recent jobs only after that job has finished.

$ dx wait job-B3P6vJXKZbF18Vf81vBQ005p && dx find jobs
Waiting for job-B3P6vJXKZbF18Vf81vBQ005p to finish running...
Done
* FreeBayes Variant Caller (done) job-B3P6vJXKZbF18Vf81vBQ005p
  bhannigan 2014-10-01 16:07:57 (runtime 0:05:24)
...

Congratulations! You have now called variants on a reads sample, and you did it all on the command-line. Now let's look at how you can automate this process.

Step 5.4: Automate it!

The beauty of the command-line interface is of course the ability to automate things like all those steps we just went through. The following script assumes you've already logged in and is hardcoded to use the ce10 genome and takes in a local FASTQ file as its command-line argument.

#!/bin/bash
# Usage: <this_script> local_fastq_filename.fastq

reference="Reference Genome Files:/C. Elegans - Ce10/ce10.fasta.gz"
bwa_indexed_reference="Reference Genome Files:/C. Elegans - Ce10/ce10.bwa-index.tar.gz"
local_reads_file="$1"

# Upload the reads; --brief prints only the new file ID
reads_file_id=$(dx upload "$local_reads_file" --brief)
# Map the reads with BWA-MEM; --brief prints only the job ID
bwa_job=$(dx run bwa_mem_fastq_read_mapper -ireads_fastqgz="$reads_file_id" -igenomeindex_targz="$bwa_indexed_reference" -y --brief)
# Call variants on the BWA output, referenced as <job ID>:<output field>
freebayes_job=$(dx run freebayes -isorted_bams="$bwa_job":sorted_bam -igenome_fastagz="$reference" -y --brief)

# Wait for the variant caller to finish, then download its VCF output
dx wait "$freebayes_job"
dx download "$freebayes_job":variants_vcfgz -o "$local_reads_file".vcf
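
Because a job that takes a job-based object reference as input is simply queued until that input is ready, the two dx run calls can be issued back to back for many samples without waiting in between. The sketch below wraps them in a function. The name start_variant_calling is made up for this example, and it assumes the reads file is already on the platform and that the current project is the one where you want the results.

```shell
# Hypothetical helper: launch BWA-MEM and FreeBayes for one reads file and
# print the FreeBayes job ID. It does not wait for either job to finish.
start_variant_calling() {
    # $1 = ID or platform path of an uploaded FASTQ file
    reference="Reference Genome Files:/C. Elegans - Ce10/ce10.fasta.gz"
    bwa_index="Reference Genome Files:/C. Elegans - Ce10/ce10.bwa-index.tar.gz"
    bwa_job=$(dx run bwa_mem_fastq_read_mapper -ireads_fastqgz="$1" \
        -igenomeindex_targz="$bwa_index" -y --brief) || return 1
    # The BWA output is referenced as <job ID>:<output field>
    dx run freebayes -isorted_bams="$bwa_job:sorted_bam" \
        -igenome_fastagz="$reference" -y --brief
}
```

A loop such as for f in *.fastq; do start_variant_calling "$(dx upload "$f" --brief)"; done would then start one pair of jobs per local file. The printed job IDs can be passed to dx wait later.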

Step 6: Finding data and scripting with dx

You can use the command dx find data with various flags to look for files of particular types, tags, etc. The following examples give a flavor of the kind of scripting you can do with it.

Example: Removing old data

The following line will remove all data objects in the current project that have not been modified in the last 6 months and have been marked for deletion with the tag "purge".

$ dx rm $(dx find data --mod-before=-6M --tag purge --brief)
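
Before deleting anything in bulk, it can help to preview what the search matches. The following sketch lists the matching objects and only removes them when called with --yes. The function name purge_old_data and its --yes argument are inventions for this example, not dx features, and the code relies on --brief printing one dx-compatible path per line.

```shell
# Hypothetical helper: preview objects tagged "purge" and untouched for 6 months;
# delete them only when the first argument is --yes.
purge_old_data() {
    ids=$(dx find data --mod-before=-6M --tag purge --brief) || return 1
    if [ -z "$ids" ]; then
        echo "Nothing to remove."
        return 0
    fi
    # Always show what matched
    printf '%s\n' "$ids"
    if [ "${1:-}" = "--yes" ]; then
        # Unquoted on purpose: each path becomes a separate argument to dx rm
        dx rm $ids
    fi
}
```

Running purge_old_data on its own acts as a dry run, while purge_old_data --yes performs the removal. The empty-result check also avoids calling dx rm with no arguments.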

Example: Restricting by date

You can also restrict the search by date and list files uploaded in a particular month.

$ dx find data --created-after 2013-05-01 --created-before 2013-06-01 \
    --class file

Example: Finding a particular sample

Suppose you are trying to find a particular sample, and you have labelled all of your samples with their ID as one of their properties. You can then find it using dx find data and use the --brief flag to get the dx-compatible path:

$ dx find data --property sample_id=id12345 --brief
project-9zVpbQf4Zg2641v5BGY00001:file-9zbpq72y8x6F0xPzKZB00003

An example of using it as input to another dx command:

$ dx describe $(dx find data --property sample_id=id12345 --brief)
Result 1:
ID              file-9zbpq72y8x6F0xPzKZB00003
Class           file
Project         project-9zVpbQf4Zg2641v5BGY00001
Folder          /
Name            sample
State           closed
Visibility      visible
Types           -
Properties      sample_id=id12345
Tags            -
Outgoing links  -
Created         Mon Jul 30 12:09:03 2012
Created by      alice
Last modified   Mon Jul 30 12:54:38 2012

Scripting with dx

As shown in some of the examples above, the --brief flag can come in handy for scripting, and you may find some other common flags useful as well.

What's next

For more detailed information about running apps and applets from the Command Line Client, see Running Apps and Applets.

For a comprehensive guide to the DNAnexus Command Line Client, see the Command Line Client Reference and Index of dx Commands.

Want to start writing your own apps? Check out the Developer Portal for some useful tutorials.

Last edited by Samantha Zarate, 2017-08-10 23:02:14