Quickstart

Welcome to the dx command-line tool quickstart! You can use this tool to log in, upload data, navigate and organize your data, and launch analyses on the DNAnexus platform. This guide will introduce you to the platform, walk you through a simple data analysis, and provide you with some more sophisticated scripts that you may want to run.

Introduction

The DNAnexus SDK (dx-toolkit) provides the dx command-line client used throughout this guide. The Command-Line Help and Tab-Completion sections below are worth returning to whenever you get stuck during the walkthrough.

All of the projects and data that we use in this Quickstart are publicly available, so you can follow along step-by-step.

In the examples that follow, lines that begin with $ are commands you enter at the bash prompt. When dx asks for input interactively, the value you type appears after the prompt text, for example after "Pick a numbered choice:".

Command-Line Help

If you ever have questions about what a dx command does, you can use the dx help command or add the -h/--help flag to any dx command. For example, to learn more about what the command dx ls can do, you can run dx help ls, dx ls -h, or dx ls --help.

$ dx help ls
usage: dx ls [-h] [--color {off,on,auto}] [--delimiter [DELIMITER]]
             [--env-help] [--brief | --summary | --verbose] [-a] [-l]
             [--obj] [--folders] [--full]
             [path]
List folders and/or objects in a folder
…
# output truncated for brevity

You can also use the dx help command to explore the available commands. Running dx help by itself will show you the different categories of dx commands, and you can run dx help <category> to see the related commands.

$ dx help
usage: dx help [-h] [command_or_category] [subcommand]
Displays the help message for the given command (and subcommand if given), or
displays the list of all commands in the given category.
CATEGORIES
  all       All commands
  session   Manage your login session
  fs        Navigate and organize your projects and files
  data      View, download, and upload data
  metadata  View and modify metadata for projects, data, and executions
  workflow  View and modify workflows
  exec      Manage and run apps, applets, and workflows
  org       Administer and operate on orgs
  other     Miscellaneous advanced utilities
…
# output truncated for brevity
$ dx help fs
dx fs: Navigate and organize your projects and files
Commands:
  ls           List folders and/or objects in a folder
  tree         List folders and objects in a tree
  pwd          Print current working directory
  select       List and select a project to switch to
  cd           Change the current working directory
  cp           Copy objects and/or folders between different projects
  mv           Move or rename objects and/or folders inside a project
  mkdir        Create a new folder
  rmdir        Remove a folder
  rm           Remove data objects and folders
  rmproject    Delete a project
  new project  Create a new project
  new record   Create a new record
  help         Display help messages and dx commands by category

Tab-Completion

In addition to dx help, tab-completion is available for the dx tool in the bash shell. Press <TAB> once to complete the current command or argument. When there are multiple possible completions, press <TAB> twice to list them all.

Step 0: Install the SDK

To use the command-line interface, make sure you have installed the DNAnexus Software Development Kit (SDK).

Upgrading the SDK

To update your version of the command-line tool, you can run the command dx upgrade. For more information about upgrading your SDK, please see the Downloads page.

Step 1: Log In

The first thing you'll need to do is to log in. If you haven't created a DNAnexus account yet, please visit the website and sign up. User sign-up is not supported on the command line.

$ dx login
Acquiring credentials from https://auth.dnanexus.com
Username: <your username>
Password: <your password>
No projects to choose from. You can create one with the command "dx new project".
To pick from projects for which you only have VIEW permissions, use "dx select --level VIEW" or "dx select --public".

Your authentication token and your current project settings have now been saved in a local configuration file, and you're ready to start accessing your project.

For more information about logging in to the platform, including a brief guide to authentication tokens, please see the Login and Logout page.
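Scripts that call dx, such as the one in the Automate It section, assume that you are already logged in. The following is a minimal sketch of a guard you could place at the top of such a script. The function name require_dx_login is hypothetical. The sketch also assumes that dx whoami prints your username and exits with a nonzero status when no valid token is configured.

```shell
# Hypothetical guard: stop early if dx has no usable login.
# Assumption: `dx whoami` exits nonzero when you are not logged in.
require_dx_login() {
    local user
    if user=$(dx whoami 2>/dev/null); then
        echo "Logged in as $user" >&2
        return 0
    fi
    echo "Not logged in; run 'dx login' first." >&2
    return 1
}
```

Call it as require_dx_login || exit 1 before the first dx command that needs credentials.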

Step 2: Look Around

Public Projects

Let's look inside some of the public projects that have already been set up.

$ dx select --public --name "Reference Genome Files"
Available public projects:
0) Reference Genome Files (VIEW)
Pick a numbered choice: 0
Setting current project to: Reference Genome Files

By running the dx select command and picking a project, you've now done the command-line equivalent of going to the project page for Reference Genome Files (platform login required to access this link) on the website. This is a DNAnexus-sponsored project containing popular genomes for you to use when running analyses with your own data.

For more information about the dx select command, please see the Changing Your Current Project page.

NOTE: You are never charged for DNAnexus-sponsored data, so you can copy data from this project as many times as you like.

Now, you can list all of the data in the top-level directory of the project you've just selected by running the command dx ls. You can also see the contents of a folder by running the command dx ls <folder_name>.

$ dx ls
C. Elegans - Ce10/
D. melanogaster - Dm3/
H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/
H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/
H. Sapiens - GRCh38/
H. Sapiens - hg19 (Ion Torrent)/
H. Sapiens - hg19 (UCSC)/
M. musculus - mm10/
M. musculus - mm9/
$ dx ls "C. Elegans - Ce10/"
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
…
# output truncated for brevity

You can avoid typing out the full name of the folder by typing in dx ls C and then pressing <TAB>. The folder name will auto-complete from there. For more information about tab-completion using dx, see the Tab Completion section above.

You don't have to be in a project to inspect its contents. You can also look into another project, and a folder within the project, by giving the project name or ID, followed by a colon (:) and the folder path. Here, we list the contents of the publicly available project "Demo Data" using both its name and ID.

$ dx ls "Demo Data:/SRR100022/"
SRR100022_1.filt.fastq.gz
SRR100022_2.filt.fastq.gz
$ dx ls -l "project-BQbJpBj0bvygyQxgQ1800Jkk:/SRR100022/"
Project: Demo Data (project-BQbJpBj0bvygyQxgQ1800Jkk)
Folder : /SRR100022
State   Last modified   Size   Name (ID)
# output truncated for brevity

As shown above, you can use the -l flag in conjunction with dx ls to list more details about files, such as the time a file was last modified, its size (if applicable), and its full DNAnexus ID.

For more information on path syntax, see the Path Resolution page.
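Project and folder names such as "Demo Data" and "C. Elegans - Ce10" contain spaces, so a script that builds these paths must keep them quoted. The following is a minimal sketch of a hypothetical helper, dx_path, that joins a project name or ID to a folder using the colon syntax shown above.

```shell
# Hypothetical helper: print "PROJECT:/FOLDER" for use as a dx path argument.
# Pass the result to dx in double quotes so spaces in names survive.
dx_path() {
    local project="$1" folder="$2"
    # Normalize to an absolute folder path (the examples here use both forms).
    case $folder in
        /*) ;;
        *) folder="/$folder" ;;
    esac
    printf '%s:%s\n' "$project" "$folder"
}
```

For example, dx ls "$(dx_path "Demo Data" SRR100022/)" lists the same folder as the first command above.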

Describe DNAnexus Objects

You can use the dx describe command to learn more about files and other objects on the platform. Given a DNAnexus object ID or name, dx describe will return detailed information about the object in question. dx describe will only return results for data objects to which you have access.

Besides describing data and projects (examples for which are shown below), you can also describe apps, jobs, and users. For more information about the dx describe command, please see the Describing Data Objects page.
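For scripting, dx describe also accepts a --json flag that prints the full description as JSON instead of the tables in the examples below. The following is a minimal sketch of a hypothetical helper, describe_field, that prints a single top-level field. It assumes two things: that python3 is available for JSON parsing, and that the requested key (for example name, class, or, for files, size) exists in the description.

```shell
# Hypothetical helper: print one top-level field of `dx describe --json`.
# Usage: describe_field OBJECT KEY
describe_field() {
    dx describe --json "$1" |
        python3 -c 'import json, sys; print(json.load(sys.stdin)[sys.argv[1]])' "$2"
}
```

For example, describe_field "Reference Genome Files:C. Elegans - Ce10/ce10.fasta.gz" size should print the file's size in bytes.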

Example: Describing a file

Below, we describe the reference genome file for C. elegans, which lives in the "Reference Genome Files" project that we've been using.

$ dx describe "Reference Genome Files:C. Elegans - Ce10/ce10.fasta.gz"
Result 1:
ID                file-BQbY9Bj015pB7JJVX0vQ7vj5
Class             file
Project           project-BQpp3Y804Y0xbyG4GJPQ01xv
Folder            /C. Elegans - Ce10
Name              ce10.fasta.gz
State             closed
Visibility        visible
Types             -
Properties        Origin=http://hgdownload.cse.ucsc.edu/goldenPath/ce10/bigZips/ce10.2bit,
                  Assembly=UCSC ce10, Species=Caenorhabditis elegans,
                  Taxonomy ID=6239
Tags              -
Outgoing links    -
Created           Tue Sep 30 15:54:35 2014
Last modified     Thu Mar 2 09:17:27 2017
archivalState     "live"
Media type        application/x-gzip
Size              29.21 MB, sponsored by DNAnexus

Example: Describing a project

Below, we describe the publicly available "Reference Genome Files" project that we've been using.

$ dx describe "Reference Genome Files:"
Result 1:
ID                  project-BQpp3Y804Y0xbyG4GJPQ01xv
Class               project
Name                Reference Genome Files
Summary
Billed to           org-dnanexus
Access level        VIEW
Region              aws:us-east-1
Protected           true
Restricted          false
Contains PHI        false
Created             Wed Oct 8 13:42:53 2014
Last modified       Thu Mar 2 09:17:27 2017
Data usage          0.00 GB
Sponsored data      301.84 GB
Sponsored egress    0.00 GB used of 0.00 GB total
Tags                -
Properties          -
defaultInstanceType "mem2_hdd2_x2"

Step 3: Create Your Own Project

Now, we'll use the command dx new project to create a new project.

$ dx new project "My First Project"
Created new project called "My First Project" (project-xxxx)
Switch to new project now? [y/N]: y

The text project-xxxx denotes a placeholder for a unique, immutable project ID. For more information about object IDs, please see the Entity IDs page.
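Every real ID on this page, such as project-BQpp3Y804Y0xbyG4GJPQ01xv or file-BQbY9Bj015pB7JJVX0vQ7vj5, consists of a class prefix, a hyphen, and 24 alphanumeric characters. Placeholders such as project-xxxx do not match that form. The following is a minimal sketch of a hypothetical check, is_dx_id, that catches a placeholder accidentally left in a script. The 24-character pattern is inferred from the examples here, so consult the Entity IDs page for the authoritative format.

```shell
# Hypothetical check: succeed only if $1 looks like a real ID of class $2
# (for example project, file, job, app). The "<class>-<24 alphanumerics>"
# pattern is inferred from the IDs on this page; job-xxxx-style placeholders fail.
is_dx_id() {
    printf '%s\n' "$1" | grep -Eqx "$2-[0-9A-Za-z]{24}"
}
```

For example, is_dx_id "$job" job || echo "Replace the placeholder job ID" >&2 flags a leftover job-xxxx.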

You're now ready to start uploading your data and running your own analyses!

Step 4: Upload and Manage Your Data

Upload Data

If you have a sample you would like to analyze, you can use the dx upload command or the Upload Agent if you have installed it. You can also download the file small-celegans-sample.fastq, which represents the first 25000 C. elegans reads from SRR070372. We will use this file again later to run through a sample analysis.

NOTE: For uploading multiple or large files, we strongly recommend the Upload Agent. It compresses your files, uploads them in parallel over multiple HTTP connections, and offers other features such as resumable uploads. For a general overview of when to use the Upload Agent and when to use dx upload, please see the page about uploading to DNAnexus.

The following command uploads the small-celegans-sample.fastq file into the current directory of the current project. The --wait flag tells dx upload to wait until it has finished uploading the data before returning the prompt and describing the result.

$ dx upload --wait small-celegans-sample.fastq
[===========================================================>] Uploaded (16801690 of 16801690 bytes) 100% small-celegans-sample.fastq
ID                file-xxxx
Class             file
Project           project-xxxx
Folder            /
Name              small-celegans-sample.fastq
State             closed
Visibility        visible
Types             -
Properties        -
Tags              -
Details           {}
Outgoing links    -
Created           Sun Jan 1 09:00:00 2017
Created by        amy
Last modified     Sun Jan 1 09:00:00 2017
Media type        text/plain
Size              16.02 MB

NOTE: If you add the --brief flag to the same command, dx upload prints only the file ID (in the form file-xxxx). Many other dx commands also accept --brief and likewise report only object IDs.
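Because --brief prints only the new file ID, it combines well with a shell loop. The following is a minimal sketch of a hypothetical helper, upload_all_fastq, that uploads every .fastq file in the current local directory. It prints a tab-separated table of local names and file IDs.

```shell
# Hypothetical helper: upload each local *.fastq file and print "name<TAB>file-ID".
upload_all_fastq() {
    local f id
    for f in ./*.fastq; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        id=$(dx upload --wait --brief "$f") || return 1
        printf '%s\t%s\n' "$(basename "$f")" "$id"
    done
}
```

This loop uploads one file at a time. For many or large files, the Upload Agent recommended above remains the better tool.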

Examine Data

To take a quick look at the first few lines of the file you just uploaded, use the dx head command. By default, it prints the first 10 lines if given a file.

Let's run it on the file we just uploaded and use the -n flag to ask for the first 12 lines (the first 3 reads) of the FASTQ file.

$ dx head -n 12 small-celegans-sample.fastq
@SRR070372.1 FV5358E02GLGSF length=78
TTTTTTTTTTTTTTTTTTTTTTTTTTTNTTTNTTTNTTTNTTTATTTATTTATTTATTATTATATATATATATATATA
+SRR070372.1 FV5358E02GLGSF length=78
...000//////999999<<<=<<666!602!777!922!688:669A9=<=122569AAA?>@BBBBAA?=<96632
@SRR070372.2 FV5358E02FQJUJ length=177
TTTCTTGTAATTTGTTGGAATACGAGAACATCGTCAATAATATATCGTATGAATTGAACCACACGGCACATATTTGAACTTGTTCGTGAAATTTAGCGAACCTGGCAGGACTCGAACCTCCAATCTTCGGATCCGAAGTCCGACGCCCCCGCGTCGGATGCGTTGTTACCACTGCTT
+SRR070372.2 FV5358E02FQJUJ length=177
222@99912088>C<?7779@<GIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIC;6666IIIIIIIIIIII;;;HHIIE>944=>=;22499;CIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIH?;;;?IIEEEEEEEEIIII77777I7EEIIEEHHHHHIIIIIIIIIIIIII
@SRR070372.3 FV5358E02GYL4S length=70
TTGGTATCATTGATATTCATTCTGGAGAACGATGGAACATACAAGAATTGTGTTAAGACCTGCATAAGGG
+SRR070372.3 FV5358E02GYL4S length=70
@@@@@DFFFFFHHHHHHHFBB@FDDBBBB=?::5555BBBBD??@?DFFHHFDDDDFFFDDBBBB<<410
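The -n 12 above corresponds to three reads because each FASTQ record spans exactly four lines: the @ header, the sequence, the + separator, and the quality string. Counting lines that start with @ would be unreliable, because a quality string can also begin with @, as the third read above shows. The following is a minimal sketch of both calculations. It assumes unwrapped four-line records like those above, and the function names are hypothetical.

```shell
# Each FASTQ record is four lines: @header, sequence, +separator, qualities.
# Assumes unwrapped records (one line per field), as in the sample above.
fastq_lines_for_reads() {
    echo $(( $1 * 4 ))
}

# Count reads in a local FASTQ file by dividing its line count by four.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}
```

For example, dx head -n "$(fastq_lines_for_reads 3)" small-celegans-sample.fastq reproduces the command above. Run on a local copy of the sample file, count_fastq_reads should print 25000, the number of reads the file is described as containing.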

Download Data

If you'd like to download a file from the platform, just use the dx download command. This command will use the name of the file for the filename unless you specify your own with the -o/--output flag.

$ dx download small-celegans-sample.fastq
[===========================================================>] Completed 16.02 of 16.02 bytes (100%) small-celegans-sample.fastq
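In a script, you usually want to control the local filename. The following is a minimal sketch of a hypothetical wrapper, download_to, that downloads a platform file to a chosen local path with -o. It then confirms that a non-empty file arrived.

```shell
# Hypothetical wrapper: download $1 (a platform path or ID) to local path $2.
download_to() {
    dx download "$1" -o "$2" || return 1
    # -s is true only if the file exists and is larger than zero bytes.
    if [ ! -s "$2" ]; then
        echo "Download of $1 produced no data at $2" >&2
        return 1
    fi
}
```

For example, download_to small-celegans-sample.fastq reads-copy.fastq saves the file under a new local name.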

For more information about downloading files, please see the Downloading Files page.

Metadata

When inspecting files, you may notice fields for different kinds of metadata, such as "properties", "tags", and "details". For more information about metadata, please see the Managing Metadata page.
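You can also set tags and properties from the command line with dx tag and dx set_properties. The following is a minimal sketch of a hypothetical helper that labels the uploaded reads file. The tag raw-reads and the property values are illustrative only; the Species property mirrors the one on the ce10.fasta.gz file described earlier.

```shell
# Hypothetical helper: tag a file and attach key=value properties.
# The tag and property values below are illustrative only.
label_reads_file() {
    local file="$1"
    dx tag "$file" raw-reads || return 1
    dx set_properties "$file" "Species=Caenorhabditis elegans" "Source=SRR070372"
}
```

Running label_reads_file small-celegans-sample.fastq and then dx describe small-celegans-sample.fastq should show the new tag and properties.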

Step 5: Analyze a Sample

For the next few steps, if you would like to follow along, you will need a C. elegans FASTQ file. We will map the reads against the ce10 genome. If you haven't already, you can download and use the following FASTQ file, which contains the first 25000 reads from SRR070372: small-celegans-sample.fastq.

NOTE: If you'd like, you can also substitute your own reads file for a different species (though it may take longer to run through the example). For your convenience, DNAnexus has already imported a variety of reference genomes to the platform. If you have your own FASTA file that you would like to use, you can upload the file and create genome indices for either BWA or Bowtie2 using the BWA FASTA Indexer app or the Bowtie2 FASTA Indexer app (platform login required to access these links).

The following walkthrough explains what each command does and which apps you are running. If you only want to convert a FASTQ file to a VCF file using BWA and the FreeBayes variant caller, skip ahead to the Automate It section, which collects all the commands needed to run the apps.

Upload Reads

If you have not yet done so, you can upload a FASTQ file for analysis.

$ dx upload small-celegans-sample.fastq --wait

For more information about using the command dx upload, please see the dx upload page.

Map Reads

Next, use the BWA-MEM app (platform login required to access this link) to map the uploaded reads file to a reference genome.

Find the app name

First of all, if you don't know the command-line name of the app you would like to run, you have two options:

  1. You can navigate to its web page from the Apps page (platform login required to access this link) on the platform. The app's page will tell you how to run it from the command-line. You can find more information about the app we're running on the BWA-MEM FASTQ Read Mapper page (platform login required to access this link).

  2. Alternatively, you can search for apps from the command line by running the command dx find apps. The name to use on the command line appears in parentheses after the app's display name (bwa_mem_fastq_read_mapper in the output below).

    $ dx find apps
    ...
    x BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper), v1.4.0
    ...

Install and run the app

Now, let's install the app using dx install and check that it has been installed. While you do not always need to install an app to run it, you may find it useful as a bookmarking tool.

$ dx install bwa_mem_fastq_read_mapper
Installed the bwa_mem_fastq_read_mapper app
$ dx find apps --installed
BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper), v1.4.0
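A script that depends on an app can install the app only when it is missing. The following is a minimal sketch of a hypothetical helper, ensure_app_installed. It relies on the output format shown above, where each installed app's command-line name appears in parentheses.

```shell
# Hypothetical helper: install an app unless `dx find apps --installed` lists it.
# Matches the "(app_name)" pattern in the output shown above. grep reads all of
# its input (no -q), so dx is not cut off mid-output.
ensure_app_installed() {
    if dx find apps --installed | grep -F "($1)" >/dev/null; then
        return 0
    fi
    dx install "$1"
}
```

For example, ensure_app_installed freebayes installs the FreeBayes app used below only if it is not already installed.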

We can now run the app using dx run. We will run it without any arguments; it will then prompt us for required and then optional arguments.

$ dx run bwa_mem_fastq_read_mapper
Entering interactive mode for input selection.

Input:   Reads (reads_fastqgz)
Class:   file
Enter file ID or path (<TAB> twice for compatible files in current directory, '?' for help)
reads_fastqgz: small-celegans-sample.fastq

Input:   BWA reference genome index (genomeindex_targz)
Class:   file
Suggestions: project-BQpp3Y804Y0xbyG4GJPQ01xv://file-* (DNAnexus Reference Genomes)
Enter file ID or path (<TAB> twice for compatible files in current directory, '?' for more options)
genomeindex_targz: "Reference Genome Files:/C. Elegans - Ce10/ce10.bwa-index.tar.gz"

Select an optional parameter to set by its # (^D or <ENTER> to finish):

 [0] Reads (right mates) (reads2_fastqgz)
 [1] Add read group information to the mappings (required by downstream GATK)? (add_read_group) [default=true]
 [2] Read group id (read_group_id) [default={"$dnanexus_link": {"input": "reads_fastqgz", "metadata": "name"}}]
 [3] Read group platform (read_group_platform) [default="ILLUMINA"]
 [4] Read group platform unit (read_group_platform_unit) [default="None"]
 [5] Read group library (read_group_library) [default="1"]
 [6] Read group sample (read_group_sample) [default="1"]
 [7] Output all alignments for single/unpaired reads? (all_alignments)
 [8] Mark shorter split hits as secondary? (mark_as_secondary) [default=true]
 [9] Advanced command line options (advanced_options)

Optional param #: <ENTER>

Using input JSON:
{
    "reads_fastqgz": {
        "$dnanexus_link": {
            "project": "project-B3X8bjBqqBk1y7bVPkvQ0001",
            "id": "file-B3P6v02KZbFFkQ2xj0JQ005Y"
        }
    },
    "genomeindex_targz": {
        "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-BQbYJpQ09j3x9Fj30kf003JG"
        }
    }
}

Confirm running the applet/app with this input [Y/n]: <ENTER>
Calling app-BP2xVx80fVy0z92VYVXQ009j with output destination project-xxxx:/

Job ID: job-xxxx

For more information about creating jobs, please see the Running Apps and Applets page.
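You can launch the same job without any prompts, as the Automate It section does: pass each input with -i, skip confirmation with -y, and add --brief so that dx run prints only the job ID. The following is a minimal sketch of a hypothetical wrapper, run_bwa.

```shell
# Hypothetical wrapper: non-interactive equivalent of the session above.
# Prints only the new job ID (via --brief) so callers can capture it.
run_bwa() {
    local reads="$1"
    local index="Reference Genome Files:/C. Elegans - Ce10/ce10.bwa-index.tar.gz"
    dx run bwa_mem_fastq_read_mapper \
        -ireads_fastqgz="$reads" \
        -igenomeindex_targz="$index" \
        -y --brief
}
```

For example, bwa_job=$(run_bwa small-celegans-sample.fastq) stores the job ID for later steps, such as the job-xxxx:sorted_bam references described below.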

Monitor your job

You can use the command dx watch to monitor jobs. The command will print out the log file of the job, including the STDOUT, STDERR, and INFO printouts.

You can also use the command dx describe job-xxxx to learn more about your job. If you don't know the job's ID, you can use the command dx find jobs to list all the jobs run in the current project, along with their status and when they began.

$ dx find jobs
* BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper:main) (done) job-xxxx
  amy 2017-01-01 09:00:00 (runtime 0:00:27)
$ dx describe job-xxxx
...

Please note that there are additional options that you can use to restrict your search of previous jobs, such as by their names or when they were run. See the Search Your Data page for more details.

After your job finishes

You should now see two new files in your project: the mapped reads in a BAM file, and an index of that BAM file with a .bai extension. You can refer to the output file by name or by the job that produced it using the syntax job-xxxx:<output field>. Try it yourself with the job ID you got from calling the BWA-MEM app!

$ dx ls
small-celegans-sample.bam
small-celegans-sample.bam.bai
small-celegans-sample.fastq
$ dx describe small-celegans-sample.bam
...
$ dx describe job-xxxx:sorted_bam
...

Call Variants

You can use the FreeBayes Variant Caller app (platform login required to access this link) to call variants on your BAM file.

This time, we won't rely on the interactive mode to enter our inputs. Instead, we will provide them directly. But first, let's look up the app's spec so we know what the inputs are called. For this, let's run the command dx run freebayes -h.

$ dx run freebayes -h
usage: dx run freebayes [-iINPUT_NAME=VALUE ...]
App: FreeBayes Variant Caller
Calls variants (SNPs, indels, and other events) using FreeBayes
See the app page for more information:
  https://platform.dnanexus.com/app/freebayes
Inputs:
  Sorted mappings: -isorted_bams=(file) [-isorted_bams=... [...]]
        One or more coordinate-sorted BAM files containing mappings to call
        variants for.
  Genome: -igenome_fastagz=(file)
        A file, in gzipped FASTA format, with the reference genome that the
        reads were mapped against.
...

Optional inputs are shown using square brackets ([]) around the command-line syntax for each input. You'll notice that there are two required inputs that must be specified:

  1. Sorted mappings (sorted_bams): A list of files with a .bam extension.

  2. Genome (genome_fastagz): A reference genome in FASTA format that has been gzipped.

NOTE: You can also run dx describe freebayes for a more compact view of the input and output specifications. By default, it will hide the advanced input options, but you can view them using the --verbose flag.
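If you want a script to list an app's input names, you can scrape them from the help text. The following is a rough sketch of a hypothetical helper, list_app_inputs, that extracts every -iNAME= flag from dx run <app> -h. It assumes lowercase input names, which also skips the -iINPUT_NAME=VALUE placeholder in the usage line. Because it parses human-readable output, treat it as a convenience rather than a stable interface.

```shell
# Rough sketch: list distinct -iNAME= input flags from an app's help text.
# Scrapes human-readable output and assumes lowercase input names.
list_app_inputs() {
    dx run "$1" -h |
        grep -oE -- '-i[a-z0-9_]+=' |
        sed 's/^-i//; s/=$//' |
        sort -u
}
```

For example, list_app_inputs freebayes should print genome_fastagz and sorted_bams, along with any inputs in the part of the help text truncated above.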

Run the app with a one-liner using a job-based object reference

It is sometimes more convenient to run apps using a single one-line command. You can do this by specifying all the necessary inputs either via the command line or in a prepared file. We will use the -i flag to specify inputs as suggested by the output of dx run freebayes -h:

  • sorted_bams: The output of the previous BWA step (see the Map Reads section for more information).
  • genome_fastagz: The ce10 genome in the Reference Genome Files project.

To use the output of a previous job as input to a new job, we'll use a job-based object reference via the job-xxxx:<output field> syntax we used earlier.

NOTE: You can use job-based object references as input even before the referenced jobs have finished. The system will simply wait until the input is ready to begin the new job.

Replace the dummy job ID below with that generated by the BWA app you ran earlier. The -y flag skips the input confirmation.

$ dx run freebayes -y \
    -igenome_fastagz=Reference\ Genome\ Files:/C.\ Elegans\ -\ Ce10/ce10.fasta.gz \
    -isorted_bams=job-xxxx:sorted_bam
Using input JSON:
{
    "genome_fastagz": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-xxxx"
        }
    },
    "sorted_bams": {
        "field": "sorted_bam",
        "job": "job-xxxx"
    }
}
Calling app-BFG5k2009PxyvYXBBJY00BK1 with output destination project-xxxx:/
Job ID: job-xxxx

For more information about running jobs, please see the Running Apps and Applets page.
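The prepared-file route mentioned above can look like the following minimal sketch. A hypothetical helper, write_freebayes_inputs, writes an input JSON file that uses $dnanexus_link objects for both inputs. The ce10.fasta.gz file ID comes from the dx describe output earlier on this page, and the job-based reference names the job and the output field. The sketch rests on two assumptions: that dx run reads such a file with its -f (--input-json-file) option, and that sorted_bams, as a list input, takes a JSON array. Check dx run -h on your installation before relying on either.

```shell
# Hypothetical helper: write FreeBayes inputs that reference a BWA job's output.
# $1 = BWA job ID (replace job-xxxx with your own), $2 = output JSON path.
write_freebayes_inputs() {
    cat > "$2" <<EOF
{
  "genome_fastagz": {
    "\$dnanexus_link": {
      "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
      "id": "file-BQbY9Bj015pB7JJVX0vQ7vj5"
    }
  },
  "sorted_bams": [
    {"\$dnanexus_link": {"job": "$1", "field": "sorted_bam"}}
  ]
}
EOF
}
```

Usage, under the assumptions above: write_freebayes_inputs job-xxxx freebayes-inputs.json && dx run freebayes -f freebayes-inputs.json -y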

Automatically run a command after a job finishes

You can use the command dx wait to wait for a job to finish. If you run the following command right after launching the FreeBayes app, it lists your recent jobs only once the FreeBayes job has finished.

$ dx wait job-xxxx && dx find jobs
Waiting for job-xxxx to finish running...
Done
* FreeBayes Variant Caller (done) job-xxxx
  amy 2017-01-01 09:00:00 (runtime 0:05:24)
...
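dx wait returns once the job stops running, but a script usually also needs to know whether the job succeeded. The following is a minimal sketch of a hypothetical helper, wait_for_success, that waits with dx wait and then reads the job's state field from dx describe --json. It assumes python3 is available for JSON parsing. It also assumes that a successful job reports the state done, as in the dx find jobs output above.

```shell
# Hypothetical helper: wait for a job, then fail unless its final state is "done".
wait_for_success() {
    local state
    dx wait "$1" || return 1
    state=$(dx describe --json "$1" |
        python3 -c 'import json, sys; print(json.load(sys.stdin)["state"])') || return 1
    if [ "$state" != done ]; then
        echo "$1 finished in state: $state" >&2
        return 1
    fi
}
```

For example, wait_for_success job-xxxx && dx find jobs lists your jobs only if the FreeBayes job completed successfully.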

Congratulations! You have now called variants on a reads sample, and you did it all on the command line. Now let's look at how you can automate this process.

Automate It

The beauty of the command-line interface is the ability to automate processes. In fact, we can automate everything we just did. The following script assumes that you have already logged in. It is hardcoded to use the ce10 genome and takes a local FASTQ file as its only command-line argument.

#!/bin/bash
# Usage: <script_name.sh> local_fastq_filename.fastq
set -euo pipefail  # stop at the first failing command

reference="Reference Genome Files:/C. Elegans - Ce10/ce10.fasta.gz"
bwa_indexed_reference="Reference Genome Files:/C. Elegans - Ce10/ce10.bwa-index.tar.gz"
local_reads_file="$1"

# --brief makes dx print only the new object's ID
reads_file_id=$(dx upload "$local_reads_file" --brief)
bwa_job=$(dx run bwa_mem_fastq_read_mapper -ireads_fastqgz="$reads_file_id" -igenomeindex_targz="$bwa_indexed_reference" -y --brief)
# Job-based object reference: FreeBayes begins once the BWA output is ready
freebayes_job=$(dx run freebayes -isorted_bams="$bwa_job:sorted_bam" -igenome_fastagz="$reference" -y --brief)

dx wait "$freebayes_job"
# The variants_vcfgz output is a gzipped VCF, so name the local copy accordingly
dx download "$freebayes_job:variants_vcfgz" -o "$local_reads_file.vcf.gz"

What's Next

You're now ready to start scripting using dx. As shown in the examples above, the --brief flag (print only object IDs) and the -y flag (skip confirmation prompts) come in handy for scripting, and you may find some other common flags useful as well.

For more information about scripting, please see the Advanced Scripting page.

For more detailed information about running apps and applets from the command line, see the Running Apps and Applets page.

For a comprehensive guide to the DNAnexus command line client, see the Command Line Client Reference and Index of dx Commands pages.

Want to start writing your own apps? Check out the Developer Portal for some useful tutorials.

Last edited by Samantha Zarate, 2017-10-12 07:20:29
