Developer Tutorials/Advanced App Tutorial

In this tutorial, we will be introducing some advanced features for writing DNAnexus apps/applets in bash. We will be writing a simple bash applet to take an arbitrary number of BAM files, use Sambamba to merge them, then output the resulting merged BAM.

Sambamba is an open source toolkit for efficiently working with BAM data. For more information, please visit the Sambamba website and documentation (external link).

Note: Before we begin, you should first download the DNAnexus SDK and run through the Command-Line Quickstart if you haven't already. If this is your first time writing a DNAnexus app, we recommend you first go through the Intro to Writing Apps Tutorial before diving into this one.

Creating the app directory

When you create a DNAnexus app, you must first create the local directory for the app source code and resources. We will be creating the following local directory structure:

sambamba_merge_applet/
├── dxapp.json
├── Readme.md
├── resources/
│   └── usr/
│       └── bin/
│
└── src/
    └── script.sh

If you are unfamiliar with bash, you can use the following commands to set up the directory and subdirectories:

$ mkdir -p sambamba_merge_applet/resources/usr/bin
$ mkdir sambamba_merge_applet/src

The dxapp.json

Next, use your favorite text editor to write the dxapp.json. This file should be located in the root directory of the app directory as shown in the structure above.

The dxapp.json is a DNAnexus application metadata file. Its presence in a directory tells DNAnexus tools that it contains DNAnexus applet source code. We explain selected fields of this file below.

In this file, we specify that our applet will be named sambamba_merge_applet (field: name). Under the inputSpec field, we specify that the app will take in 2 inputs:

  1. sorted_bams: an array of BAM files
  2. advanced_options: an optional string of advanced command line options to be passed to the Sambamba merge command.

Under the outputSpec field, we specify that the app will always return 1 output:

  1. merged_bam: a single merged BAM file

Additionally, we specify that the sorted_bams input and merged_bam output should contain filenames that match the pattern "*.bam". This specification tells the web UI to show only files that match this pattern when selecting input files.

Next, we specify under runSpec that this is a bash script (field: interpreter) and that the worker running the applet should execute the executable located in the applet directory at src/script.sh (field: file).

Finally, under runSpec, systemRequirements, *, instanceType, we specify that all entry points of the applet should be run with the mem2_hdd2_x2 instance type.

For more information about the fields and options for the dxapp.json, please visit the App Metadata Reference page.

{
  "name": "sambamba_merge_applet",
  "title": "Sambamba Mappings Merger",
  "summary": "Uses Sambamba to merge multiple sorted BAM files into a single BAM file",
  "version": "0.0.1",
  "inputSpec": [
    {
      "name": "sorted_bams",
      "label": "Sorted mappings",
      "help": "A set of coordinate-sorted BAM files to be merged.",
      "class": "array:file",
      "patterns": ["*.bam"]
    },
    {
      "name": "advanced_options",
      "label": "Advanced command line options",
      "help": "Advanced command line options that will be supplied directly to the Sambamba merge execution.",
      "class": "string",
      "optional": true
    }
  ],
  "outputSpec": [
    {
      "name": "merged_bam",
      "label": "Merged sorted mappings",
      "help": "A BAM file with the merged mappings.",
      "class": "file",
      "patterns": ["*.bam"]
    }
  ],
  "runSpec": {
    "interpreter": "bash",
    "file": "src/script.sh",
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_hdd2_x2"
      }
    },
    "execDepends": []
  },
  "openSource": true
}

Packaging the executable with the applet

In this applet, we will be using the Sambamba binary, which you can download from the Sambamba releases page. You can download this binary, uncompress the executable, and place it in the resources/usr/bin/ directory of your app directory.

After downloading the binary, run the following commands:

# Navigate to your applet root directory
cd /path/to/app/directory

# Untar the downloaded executable
tar -xzf /path/to/downloaded/sambamba_executable

# Rename and move the executable to the correct directory
# Note: if you don't rename the executable, make sure the 
#       app source code uses the full name of the downloaded
#       sambamba executable.
mv sambamba_* resources/usr/bin/sambamba

For a refresher on why this resource needs to be packaged with the app, please see this section in the Intro to Building Apps or see the Execution Environment Reference.

Writing the applet script

Next, we will write the script that the worker will execute when the applet is invoked. This file will be named script.sh and located in the applet directory at the path src/script.sh. This location is important because it is the location specified in the dxapp.json above.

Script setup

The first few lines of the bash script specify where the bash interpreter can be found on the system and set a few options for the execution of the script. The -e flag causes bash to exit as soon as any command fails, the -o pipefail option makes a pipeline fail if any command within it fails (rather than only the last one), and the -x flag causes bash to print each command as it is executed, which is useful for debugging.

#!/bin/bash
set -e -x -o pipefail
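
The effect of -o pipefail is easiest to see in isolation. The sketch below is illustrative only and is not part of the applet script; it runs the same failing pipeline in two child shells and records the exit status reported by each.

```shell
#!/bin/bash
# Illustrative only: not part of script.sh.
# "false | true" fails in its first stage and succeeds in its last.

# Default behavior: the pipeline reports the status of the last command.
status_default=$(bash -c 'false | true; echo $?')

# With pipefail: the pipeline reports the status of the failing stage.
status_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

echo "without pipefail: $status_default"
echo "with pipefail:    $status_pipefail"
```

Combined with -e, a non-zero pipeline status stops the script, so a failure in an early stage of a pipeline is not silently ignored.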

Downloading inputs using dx-download-all-inputs

You can easily download all file inputs to your applet with the dx-download-all-inputs command-line utility. Add this line to your script.sh:

dx-download-all-inputs

This utility will automatically download all the files supplied as input to the applet into the path $HOME/in/. Each file input parameter specified under inputSpec in the dxapp.json will have its own folder under the $HOME/in/ directory.

In the case of this applet, there will be one folder for the sorted_bams input on the path $HOME/in/sorted_bams/. Since sorted_bams is an array of files, these files will be placed into numbered subdirectories under a parent directory $HOME/in/sorted_bams/.

For example, if the user supplied the following 3 files to the applet, SRR100022_chrom20_mapped_to_b37.bam, SRR100022_chrom21_mapped_to_b37.bam, SRR100022_chrom22_mapped_to_b37.bam, in that order, the files would be downloaded into the following paths respectively:

  • $HOME/in/sorted_bams/0/SRR100022_chrom20_mapped_to_b37.bam
  • $HOME/in/sorted_bams/1/SRR100022_chrom21_mapped_to_b37.bam
  • $HOME/in/sorted_bams/2/SRR100022_chrom22_mapped_to_b37.bam

The following is a visualization of this example structure:

$HOME
├── in
│   └── sorted_bams
│       ├── 0
│       │   └── SRR100022_chrom20_mapped_to_b37.bam
│       ├── 1
│       │   └── SRR100022_chrom21_mapped_to_b37.bam
│       └── 2
│           └── SRR100022_chrom22_mapped_to_b37.bam...
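
To make the layout concrete, the following sketch recreates it in a temporary directory and lists the resulting paths. It only simulates what the text describes and does not call any DNAnexus utility; demo_home and inputs are names made up for this illustration.

```shell
#!/bin/bash
set -e
# Illustrative simulation only: no DNAnexus tools are called here.
demo_home=$(mktemp -d)
inputs=(
    SRR100022_chrom20_mapped_to_b37.bam
    SRR100022_chrom21_mapped_to_b37.bam
    SRR100022_chrom22_mapped_to_b37.bam
)

# Recreate the layout: one numbered subdirectory per array element.
for i in "${!inputs[@]}"; do
    mkdir -p "$demo_home/in/sorted_bams/$i"
    : > "$demo_home/in/sorted_bams/$i/${inputs[$i]}"
done

# List the files relative to the simulated home directory.
listing=$(cd "$demo_home" && find in -type f | sort)
echo "$listing"
```

The numbered subdirectory is what keeps files apart when two inputs happen to share a filename.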

Prep your output folder

Next, create a folder for your output file.

mkdir -p out/merged_bam

We just made a directory with the path $HOME/out/merged_bam, which corresponds to the merged_bam output parameter in the dxapp.json. Later, we will place the output of Sambamba merge, a merged BAM file, into this subdirectory.

Later, at the end of the bash script, we will call dx-upload-all-outputs. This utility will automatically upload all files found under the path $HOME/out/ and link the files to the appropriate output parameters (the outputs specified under outputSpec in the dxapp.json).

By convention, only directories with names equal to output parameter names are expected to be found in the output directory, and any file(s) found in those subdirectories will be uploaded as the corresponding outputs.

In our case, the merged BAM file placed into the path $HOME/out/merged_bam/ will be uploaded as the merged_bam output parameter of the job.

Note: The execution of an applet on a worker starts inside $HOME, so in this tutorial $HOME/out, ~/out, and out/ all refer to the same directory since we have not changed directories.
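
The note about paths can be checked with a small simulation. The sketch below uses a throwaway directory as $HOME (an assumption made purely for the demonstration) and compares the absolute, tilde, and relative forms of the output directory.

```shell
#!/bin/bash
set -e
# Simulate a worker: a throwaway directory acts as $HOME and as the start directory.
export HOME="$(mktemp -d)"
cd "$HOME"
mkdir -p in out/merged_bam

# -ef is true when two paths resolve to the same directory.
if [ "$HOME/out" -ef ~/out ] && [ ~/out -ef out ]; then
    same_dir="yes"
else
    same_dir="no"
fi
echo "same directory: $same_dir"

# After a cd, the relative form no longer points at $HOME/out.
cd in
if [ -d out ]; then relative_ok="yes"; else relative_ok="no"; fi
echo "relative path valid after cd: $relative_ok"
```

If a script does change directories, the absolute form $HOME/out is the one that keeps working.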

Set output name using bash app helper variables

DNAnexus has provided some environment variables to make it even simpler to write bash apps. Here, we will use the $sorted_bams_prefix variable to help us name our output file. This variable is provided for every file or array:file input parameter specified in the applet's dxapp.json.

In this case, our only file input parameter is sorted_bams, an array:file. The variable $sorted_bams_prefix is a bash array of filenames of every file in the file array with the extension stripped off, as well as any .gz extension (if applicable).

For example, if the input files were named NA12878.chr1.bam, NA12878.chr2.bam, and so on, the first item in the bash array, ${sorted_bams_prefix[0]}, would be NA12878.chr1, the second item, ${sorted_bams_prefix[1]}, would be NA12878.chr2, etc.

We will use the prefix of the first file in the array to name our output file.

output_name="${sorted_bams_prefix[0]}_merged.bam"

For more information about bash app helper variables, see some sample code here.
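
As a rough illustration of the naming rule described above, the sketch below derives a prefix by hand. On the platform the prefix variable is set for you; file_prefix is a hypothetical helper written for this example, and it only approximates the rule as stated in the text (drop a trailing .gz, then drop the last extension).

```shell
#!/bin/bash
# Illustrative approximation of the prefix rule described in the text.
# file_prefix is a hypothetical helper, not a DNAnexus utility.
file_prefix() {
    local name="$1"
    name="${name%.gz}"   # drop a trailing .gz, if present
    name="${name%.*}"    # drop the last remaining extension
    printf '%s\n' "$name"
}

prefix_bam=$(file_prefix "NA12878.chr1.bam")
prefix_gz=$(file_prefix "reads.fastq.gz")

# Same construction as in the applet script, using the derived prefix.
output_name="${prefix_bam}_merged.bam"
echo "$prefix_bam $prefix_gz $output_name"
```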

Set input files using bash app helper variables

Next, we will run Sambamba merge. The syntax for Sambamba merge is as follows:

sambamba merge [OPTIONS] <output.bam> <input1.bam> <input2.bam> [...]

Add the following line to your script.sh:

sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"

[OPTIONS] We pass any advanced_options string the user may have entered as input to the app. The string was stored as the variable $advanced_options during app initialization. It is left unquoted so that bash splits it into separate command-line options.

<output.bam> We will name the output file according to the $output_name bash variable set in the section above.

<input1.bam> <input2.bam> [...] Finally, we can use the $sorted_bams_path variable to help us pass the input files to the executable.

Similar to the prefix variable explained above, a path bash app helper variable is provided for every file or array:file input parameter specified in the applet's dxapp.json. This bash variable stores the full path of each input file, assuming that the file was downloaded using dx-download-all-inputs.

Since our sorted_bams input is of type array:file, the $sorted_bams_path variable is a bash array containing the paths of the files given as input to sorted_bams, in the order they were given to the app. The quoted expansion "${sorted_bams_path[@]}" passes each path as a separate argument, so a path containing whitespace is not split.
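
The quoting used in the merge command can be checked without running Sambamba. In the sketch below, count_args is a hypothetical helper that reports how many arguments it received, and the option string and paths are made-up example values.

```shell
#!/bin/bash
# Illustrative only: count how many arguments a command would receive.
count_args() { echo "$#"; }

# Hypothetical values; the option string is only an example of user input.
advanced_options="-t 4 -l 5"
sorted_bams_path=("/tmp/in/sorted_bams/0/sample one.bam" "/tmp/in/sorted_bams/1/sample_two.bam")

# Unquoted: the option string is split on whitespace into four words.
n_options=$(count_args $advanced_options)

# Quoted [@]: one word per array element, even when a path contains a space.
n_quoted=$(count_args "${sorted_bams_path[@]}")

# Unquoted [@]: the path containing a space is split in two.
n_unquoted=$(count_args ${sorted_bams_path[@]})

echo "options: $n_options, quoted paths: $n_quoted, unquoted paths: $n_unquoted"
```

This is why the script leaves $advanced_options unquoted but quotes the array expansion.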

When the shell script is run, bash expands all the variables in the command. For example, if we have 3 input files named NA12878.chr1.bam, NA12878.chr2.bam, and NA12878.chr3.bam, and no advanced options were supplied, the expanded sambamba merge command will look like this:

sambamba merge NA12878.chr1_merged.bam $HOME/in/sorted_bams/0/NA12878.chr1.bam $HOME/in/sorted_bams/1/NA12878.chr2.bam $HOME/in/sorted_bams/2/NA12878.chr3.bam

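Alternatively, the following command also works, because the shell glob matches every file in the numbered subdirectories. Note that a glob lists the subdirectories in lexical order, which is not necessarily the order in which the inputs were supplied.

sambamba merge $advanced_options "$output_name" "$HOME"/in/sorted_bams/*/*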
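
One caveat with the glob form is ordering. The sketch below simulates eleven inputs in a temporary directory (no DNAnexus tools involved; part_N.bam is a made-up naming scheme) and shows the order in which the glob returns them.

```shell
#!/bin/bash
set -e
export LC_ALL=C   # fix the collation so the glob order is reproducible

demo_home=$(mktemp -d)
# Simulate eleven inputs in numbered subdirectories 0..10.
for i in 0 1 2 3 4 5 6 7 8 9 10; do
    mkdir -p "$demo_home/in/sorted_bams/$i"
    : > "$demo_home/in/sorted_bams/$i/part_$i.bam"
done

cd "$demo_home/in/sorted_bams"
globbed=(*/*)

# Directory names are compared as text, so "10" is listed before "2"
# and the numerically largest index is no longer last.
echo "${globbed[@]}"
last="${globbed[${#globbed[@]}-1]}"
echo "last: $last"
```

When the input order matters, the path helper array is the safer choice because it preserves the order in which the files were given to the app.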

Upload results with dx-upload-all-outputs

Finally, after Sambamba merge is finished, we move the merged BAM file into the out/merged_bam/ folder and upload it using dx-upload-all-outputs. This utility will upload the contents of the subdirectories under the path $HOME/out/.

mv "$output_name" out/merged_bam/
dx-upload-all-outputs

For more information on how this utility works, please go back to the Prep Your Output Folder section above or visit the dx-upload-all-outputs page.

Build and run the applet

At this point, you are done with your app source code. The final script should look like this:

#!/bin/bash
set -e -x -o pipefail

dx-download-all-inputs
mkdir -p out/merged_bam
output_name="${sorted_bams_prefix[0]}_merged.bam"
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"
mv "$output_name" out/merged_bam/
dx-upload-all-outputs
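
Before building, the control flow of the script can be exercised locally with stand-ins for the platform tools. The following is a hypothetical dry run, not an official testing method: the three shell functions replace dx-download-all-inputs, sambamba, and dx-upload-all-outputs with stubs, the helper variables are set by hand, and the "merge" merely concatenates placeholder text files.

```shell
#!/bin/bash
set -e -o pipefail
# Hypothetical local dry run: every platform tool is replaced by a stub.
export HOME="$(mktemp -d)"
cd "$HOME"

# Stub: pretend two inputs were downloaded into numbered subdirectories.
dx-download-all-inputs() {
    mkdir -p in/sorted_bams/0 in/sorted_bams/1
    echo "a" > in/sorted_bams/0/NA12878.chr1.bam
    echo "b" > in/sorted_bams/1/NA12878.chr2.bam
}

# Stub: concatenate the inputs instead of performing a real BAM merge.
# It assumes no options are passed before the output name.
sambamba() {
    shift                 # drop the "merge" subcommand
    local out="$1"
    shift
    cat "$@" > "$out"
}

# Stub: list what would be uploaded.
dx-upload-all-outputs() {
    find out -type f | sort
}

# Helper variables that the platform would normally set.
sorted_bams_prefix=("NA12878.chr1" "NA12878.chr2")
sorted_bams_path=("$HOME/in/sorted_bams/0/NA12878.chr1.bam" "$HOME/in/sorted_bams/1/NA12878.chr2.bam")
advanced_options=""

# The applet logic from the tutorial.
dx-download-all-inputs
mkdir -p out/merged_bam
output_name="${sorted_bams_prefix[0]}_merged.bam"
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"
mv "$output_name" out/merged_bam/
uploaded=$(dx-upload-all-outputs)
echo "$uploaded"
```

A dry run like this only checks naming and file movement; it says nothing about whether the real Sambamba binary or the platform utilities behave as expected.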

You are ready to build and run your applet using the following commands. If you have not done so already, log in to DNAnexus in your terminal and select a project you wish to work in.

You can either upload your own BAM files to merge, or use the example BAM files available in the Demo Data public project (Developer Quickstart folder). If you upload your own data, we recommend you first test your app with small files.

$ dx build path/to/app/directory
$ dx run sambamba_merge_applet

Congratulations! You are done with the advanced bash app tutorial.

The dx-app-wizard

For the sake of this tutorial we manually created the applet local directory, dxapp.json, and shell script (src/script.sh). However, you can automate this step by using the dx-app-wizard as explained in the Intro to Building Apps tutorial.

The dx-app-wizard prompts you for inputs and automatically creates the dxapp.json based on your answers, along with a template file for your shell script. However, the app wizard is not intended to be a tool for the advanced developer. Thus, it does not prompt you for more advanced fields in the applet specification, such as patterns and instanceType. Additionally, it does not use the dx-download-all-inputs or dx-upload-all-outputs utilities.

We have found it useful for advanced developers to use the app wizard to create an app directory template, and basic dxapp.json. Afterwards, you can then go back in and add additional fields to the dxapp.json and replace the template bash script with your own.

Next Steps

  • See the sidebar at the top of this page for language-specific tutorials in a variety of programming languages. These will walk you through writing more complex apps in the language of your choice.
  • If you would like to see more example code, you can use the dx get command to reconstruct and download the source directory for applets within the Developer Applets project.
  • Explore the Developer Portal for additional resources

Last edited by Maria Nattestad, 2017-09-13 22:42:21
