Developer Tutorials/Intro to Building Apps

Ready to start running your own custom analyses in the cloud? We'll walk you through the process of packaging an existing executable as an applet that you can run from the website or from the command line to process a file in your project.

Applets vs Apps

Applets and apps are both executables that can be run in the platform. The main difference is that applets are data objects, which live in projects, while apps do not live in a project and can be published to allow other users to run them.

The procedure for creating applets or apps is essentially the same, with the exception of the final build step where you specify whether it should be created as an applet or an app. (You can also convert an existing applet to an app.) As a result, we may use the two terms interchangeably in tutorials like this one.

Overview

In this tutorial, we will upload the fastq_quality_trimmer program from the FASTX-Toolkit to the platform as an applet called "mytrimmer". If we did not upload the program, we would have to run the analysis on a local machine by downloading the file to our local hard drive (1), running the executable locally (2), and then uploading the result back into our project (3), where a new DNAnexus file object is created in our project (4).

Of course, it would be much better not to tie up our own computer's CPU and network bandwidth, and to perform this analysis in the cloud instead. The next steps cover how to run the equivalent commands there.

There are two key CLI tools that will be helpful: dx-app-wizard and dx build. We will run dx-app-wizard to create a skeleton directory for our applet, after which we can fill in the code (1). Next, we will run dx build (2), which will take the generated files and package them as an applet on the platform where it will reside as a data object in your project (3).

Step 0: Download sample files

Note: Before we begin, you should first download the DNAnexus SDK and run through the Command-Line Quickstart if you haven't already.

To follow along exactly, you should download the following two files.

We will be using the fastq_quality_trimmer found in the excellent FASTX-Toolkit (more details on installation at http://hannonlab.cshl.edu/fastx_toolkit/download.html; the link above is the "Linux (64bit)" binary). You can substitute it with any executable of your own as long as it can run on 64-bit Ubuntu 12.04, and it takes in a local file and outputs another local file.

The FASTQ file provided above contains reads from the small C. elegans sample (the first 25000 reads from SRR070372) also used in the Command-Line Quickstart. You can also substitute your own input file in FASTQ format if you wish.

You can test out the fastq_quality_trimmer on the sample on your local machine. Note that if you are not running 64-bit Linux, you should download the appropriate distribution from the FASTX-Toolkit download page to run it locally. The command we will run is:

$ fastq_quality_trimmer -t 20 -Q 33 -i small-celegans-sample.fastq \
    -o trimmed_reads.fastq
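
One way to sanity-check the trimmer's output locally is to compare read counts before and after trimming. The sketch below assumes plain four-line FASTQ records (header, sequence, "+", qualities); count_reads and the tiny demo file are hypothetical and made up for illustration, not part of the FASTX-Toolkit.

```shell
# count_reads is a hypothetical helper: it assumes every FASTQ record
# occupies exactly four lines, so reads = lines / 4.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Build a tiny two-read FASTQ file to demonstrate (made-up data).
demo_dir=$(mktemp -d)
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGACCAA\n+\nIIII####\n' \
    > "$demo_dir/demo.fastq"

count_reads "$demo_dir/demo.fastq"
```

On the real files you would run count_reads on small-celegans-sample.fastq and on trimmed_reads.fastq and compare the two numbers.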

Step 1: Create the applet source code

Step 1.1: Run the app wizard

The app wizard is a tool that generates a local directory and a source code template for your applet after prompting you with questions about the applet you would like to build (e.g. the name, which we will set to "mytrimmer"). We will create a simple applet that just executes a few shell commands, so we will pick the bash scripting language. For now, don't worry about the rest of the parameters; they are either not required or have safe default values. Just press <ENTER> for the remaining prompts; the values you should type are shown after each prompt below.

$ dx-app-wizard
DNAnexus App Wizard, API v1.0.0
[...]
The name of your app must be unique on the DNAnexus platform. After
creating your app for the first time, you will be able to publish new
versions using the same app name. App names are restricted to
alphanumeric characters (a-z, A-Z, 0-9), and the characters ".", "_",
and "-".

App Name: mytrimmer
[...]
(Press <ENTER> to accept defaults)

Input Specification

You will now be prompted for each input parameter to your app. Each
parameter should have a unique name that uses only the underscore "_"
and alphanumeric characters, and does not start with a number.

1st input name (<ENTER> to finish): input_file
Label (optional human-readable name) []: <ENTER>
Your input variable must be of one of the following classes:
applet          array:file      array:record    file            int
array:applet    array:float     array:string    float           record
array:boolean   array:int       boolean         hash            string

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n

2nd input name (<ENTER> to finish): <ENTER>

Output Specification

You will now be prompted for each output parameter of your app. Each
parameter should have a unique name that uses only the underscore "_"
and alphanumeric characters, and does not start with a number.

1st output name (<ENTER> to finish): output_file
Label (optional) []: <ENTER>
Choose a class (<TAB> twice for choices): file

2nd output name (<ENTER> to finish): <ENTER>

Template Options

You can write your app in any programming language, but we provide
templates for the following supported languages: Python, bash
Programming language [Python]: bash

The following common execution patterns are currently available for
your programming language:

  basic
    Your app will run on a single machine from beginning to end.

  parallelized
    Your app will subdivide a large chunk of work into multiple pieces
    that can be processed in parallel and independently of each other,
    followed by a final stage that will merge and process the results
    as necessary.

  scatter-process-gather
    Similar to parallelized but with the addition of a "scatter" entry
    point. This allows you to break out the execution for splitting up
    the input, or you can call a separate app/applet to perform the
    splitting.

Execution pattern [basic]: <ENTER>

*** Generating DNAnexus App Template... ***
[...]
Created files:
    mytrimmer/Readme.developer.md
    mytrimmer/Readme.md
    mytrimmer/dxapp.json
    mytrimmer/resources/
    mytrimmer/src/
    mytrimmer/src/mytrimmer.sh
    mytrimmer/test/

App directory created! See http://wiki.dnanexus.com/Developer-Portal
for tutorials on how to modify these files, or run "dx build mytrimmer"
or "dx build --create-app mytrimmer" while logged in with dx.

Running the DNAnexus build utility will create an executable on the
DNAnexus platform. Any files found in the resources directory will be
uploaded so that they will be present in the root directory when the
executable is run.
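
The naming rules quoted in the wizard's prompts can be checked before you start it. This is a minimal sketch, assuming grep -E is available; is_valid_app_name and is_valid_param_name are hypothetical helpers that encode only the rules stated in the prompts above, and the platform may enforce further constraints (such as uniqueness) that this does not cover.

```shell
# App names: alphanumeric characters plus ".", "_" and "-".
is_valid_app_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._-]+$'
}

# Parameter names: alphanumeric characters and "_", not starting with a digit.
is_valid_param_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*$'
}

# Try a few candidate names (the rejected ones are made-up examples).
for name in mytrimmer "my trimmer"; do
    if is_valid_app_name "$name"; then
        echo "app name ok: $name"
    else
        echo "app name rejected: $name"
    fi
done

for name in input_file 1st_input; do
    if is_valid_param_name "$name"; then
        echo "parameter name ok: $name"
    else
        echo "parameter name rejected: $name"
    fi
done
```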

Step 1.2: Add your executable

Within the platform, executables run on a virtual Linux PC with a stock Ubuntu 12.04 environment. Thus, the FASTX-Toolkit (or your custom executable) is not available there by default. To make it available to your applet, put a copy of the executable inside mytrimmer/resources/usr/bin/.

$ mkdir -p mytrimmer/resources/usr/bin/
$ cp /path/to/fastq_quality_trimmer mytrimmer/resources/usr/bin/

When you use the DNAnexus build utility to build your applet, any files in the mytrimmer/resources directory will be packaged as part of your applet and will be placed in the root directory of the virtual Linux PC whenever your applet is run in the cloud. Files placed in mytrimmer/resources/usr/bin will therefore be put in /usr/bin in the container and be available in the default path at runtime. (Note: while the mytrimmer/resources subdirectory is unpacked into the root of the virtual filesystem, your applet's executable will later start in /home/dnanexus as its current working directory.)
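
The mapping from mytrimmer/resources to the root of the filesystem can be imitated locally. The sketch below is only an illustration under assumptions: it uses a temporary directory as a stand-in for "/", a stub script in place of the real binary, and plain tar in place of whatever the build utility does internally.

```shell
# Sketch only: mimic unpacking resources/ into a root directory, with a
# temporary directory standing in for "/" on the worker.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/mytrimmer/resources/usr/bin" "$sandbox/fake_root"

# Placeholder for the real binary (hypothetical stub script).
printf '#!/bin/sh\necho "stub trimmer"\n' \
    > "$sandbox/mytrimmer/resources/usr/bin/fastq_quality_trimmer"
chmod +x "$sandbox/mytrimmer/resources/usr/bin/fastq_quality_trimmer"

# Pack the contents of resources/ and unpack them into the fake root.
tar -C "$sandbox/mytrimmer/resources" -czf "$sandbox/resources.tar.gz" .
tar -C "$sandbox/fake_root" -xzf "$sandbox/resources.tar.gz"

# The file now sits under usr/bin/ relative to the fake root.
ls "$sandbox/fake_root/usr/bin"
```

Because the archive is created from inside resources/, the usr/bin/ prefix is preserved relative to the extraction directory, which is why a file placed in resources/usr/bin ends up on the default path.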

Step 1.3: Fill in the shell script

If you look inside the main() block in mytrimmer/src/mytrimmer.sh, you'll see that some of the code has been filled in for you. In the code block below, we highlight the crucial lines with pared-down comments changed for the purposes of this tutorial. You should insert the line to run your executable on input_file once it has been downloaded and write your output to output_file.

#!/bin/bash

main() {

    # When the applet is run, the variable "input_file" is already set
    # to the DNAnexus link to the file object. Here, we download it to the
    # job's scratch space

    dx download "$input_file" -o input_file

    # Insert the following line between the download and upload lines

    fastq_quality_trimmer -t 20 -Q 33 -i input_file -o output_file

    # Here, we set the variable "output_file" to be the ID of the
    # uploaded file.

    output_file=$(dx upload output_file --brief)

    # This line reports the uploaded file ID under the output field
    # called "output_file".

    dx-jobutil-add-output output_file "$output_file" --class=file
}
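
The four steps in main() can be exercised offline before building. The following is a rough smoke test under stated assumptions, not how the platform runs the script: dx, fastq_quality_trimmer, and dx-jobutil-add-output are replaced by hypothetical stub scripts placed first on PATH, the file ID is fake, and input_file is a local path rather than a DNAnexus link.

```shell
# Offline smoke test of the script's data flow. All three commands are
# hypothetical stubs; nothing here talks to the platform.
work=$(mktemp -d)
mkdir "$work/bin" "$work/job"

# Stub "dx": "download" copies a local file, "upload" prints a fake ID.
cat > "$work/bin/dx" <<'EOF'
#!/bin/sh
case "$1" in
    download) cp "$2" "$4" ;;           # dx download SRC -o DEST
    upload)   echo "file-FAKE0001" ;;   # dx upload PATH --brief
esac
EOF

# Stub trimmer: copies the -i file to the -o file unchanged.
cat > "$work/bin/fastq_quality_trimmer" <<'EOF'
#!/bin/sh
while [ $# -gt 0 ]; do
    case "$1" in
        -i) in_path=$2; shift ;;
        -o) out_path=$2; shift ;;
    esac
    shift
done
cp "$in_path" "$out_path"
EOF

# Stub output helper: records "name=value" instead of reporting to a job.
cat > "$work/bin/dx-jobutil-add-output" <<'EOF'
#!/bin/sh
echo "$1=$2" >> job_output.txt
EOF

chmod +x "$work/bin/dx" "$work/bin/fastq_quality_trimmer" \
    "$work/bin/dx-jobutil-add-output"
PATH="$work/bin:$PATH"

# Same four steps as main(), run in a scratch directory.
printf '@read1\nACGT\n+\nIIII\n' > "$work/reads.fastq"
input_file="$work/reads.fastq"
(
    cd "$work/job" || exit 1
    dx download "$input_file" -o input_file
    fastq_quality_trimmer -t 20 -Q 33 -i input_file -o output_file
    output_file=$(dx upload output_file --brief)
    dx-jobutil-add-output output_file "$output_file" --class=file
    cat job_output.txt
)
```

A test like this only confirms that the file names line up between the download, trimming, and upload steps; it says nothing about the behavior of the real tools.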

Step 2: Build the applet

Now it's time to build your applet and upload it to the platform. Run "dx build" on your mytrimmer directory:

$ dx build mytrimmer
[...]
Created applet applet-B30XFpZfK1p8gFPPqjbQ005P successfully

If you run dx describe on the applet, you will see that it has been created with the name "mytrimmer" and has input and output specifications, each expecting one file. It also has a file in its bundledDepends, which contains the fastq_quality_trimmer executable that you previously placed in the mytrimmer/resources directory.

$ dx describe mytrimmer
Result 1:
...
Name            mytrimmer
...
Input Spec      input_file (file)
Output Spec     output_file (file)
Interpreter     bash
bundledDepends  resources.tar.gz (file-B42KQ3pqqBkGJz8B3J900049)
...

You can also run dx run mytrimmer -h to see automatically generated help for the applet you've just created.

$ dx run mytrimmer -h
...

Congratulations! You have created your first applet and are ready to run it in the DNAnexus Platform.

Step 3: Run your applet

You are now ready to launch the analysis in the cloud, using the "dx run" command (4). Notice that the "mytrimmer" applet resides inside your project in the cloud, so you can really launch it from anywhere. When you launch the analysis, DNAnexus will bring up a new virtual Linux PC to run your code (5).

Upload the sample input file if you haven't already, and then run the applet.

$ dx upload small-celegans-sample.fastq
...
$ dx run mytrimmer -iinput_file=small-celegans-sample.fastq -y

That's it! You've now created your first applet and run it on the platform.
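
The build, upload, and run commands from this tutorial can be chained in a small wrapper. This is a sketch under assumptions rather than a recommended workflow: run and build_and_run are hypothetical helpers, and DRY_RUN defaults to 1 so that the commands are only printed unless you opt in.

```shell
# DRY_RUN defaults to 1 so this sketch only prints the commands; set
# DRY_RUN=0 (with dx installed and logged in) to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run is a hypothetical helper: echo the command in dry-run mode,
# otherwise execute it and pass its exit status back to the caller.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Chain the tutorial's three commands, stopping at the first failure.
build_and_run() {
    run dx build mytrimmer &&
    run dx upload small-celegans-sample.fastq &&
    run dx run mytrimmer -iinput_file=small-celegans-sample.fastq -y
}

build_and_run
```

If an applet named mytrimmer already exists in the project, dx build may refuse to create another one; check dx build --help for the options that control this before running with DRY_RUN=0.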

Behind the scenes...

Once DNAnexus instantiates the virtual execution environment, it will automatically run your code, executing the commands in the shell script you provided when building the applet. The script will run just like on your local computer, downloading the reads to the hard drive of that virtual computer (6), running the FASTX-Toolkit executable (7), then uploading the resulting file (8) back to the project (9).

Next Steps

Advanced applet options

If you look inside the file mytrimmer/dxapp.json, you will find a JSON hash containing the metadata that the build utility uses to package the applet for the platform. If you wish to change the inputs or outputs of your applet, or request additional execution resources (e.g., network access or more CPU or memory), this is the file to edit before running "dx build". The platform documentation has more information on how to request additional resources. The following shows the parameters automatically set by the app wizard.

{
  "name": "mytrimmer",
  "dxapi": "1.0.0",
  "version": "0.0.1",
  "resources": [],
  "inputSpec": [
    {
      "name": "input_file",
      "class": "file",
      "optional": false
    }
  ],
  "outputSpec": [
    {
      "name": "output_file",
      "class": "file"
    }
  ],
  "runSpec": {
    "interpreter": "bash",
    "file": "src/mytrimmer.sh"
  }
}
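
Since runSpec.file names the script that will be executed, one cheap check before running "dx build" is to confirm that the file exists. The sketch below is an assumption-laden illustration: it works on a trimmed demo copy of dxapp.json in a temporary directory, and the sed pattern is a deliberately naive stand-in for a real JSON parser that only handles the simple layout shown above.

```shell
# Create a throwaway app directory with a trimmed copy of dxapp.json.
app_dir=$(mktemp -d)/mytrimmer
mkdir -p "$app_dir/src"
cat > "$app_dir/dxapp.json" <<'EOF'
{
  "name": "mytrimmer",
  "inputSpec": [{"name": "input_file", "class": "file", "optional": false}],
  "outputSpec": [{"name": "output_file", "class": "file"}],
  "runSpec": {
    "interpreter": "bash",
    "file": "src/mytrimmer.sh"
  }
}
EOF
: > "$app_dir/src/mytrimmer.sh"   # empty placeholder script

# Naive extraction of the "file" key (not a general JSON parser).
run_spec_file=$(sed -n 's/.*"file": *"\([^"]*\)".*/\1/p' "$app_dir/dxapp.json")

if [ -f "$app_dir/$run_spec_file" ]; then
    echo "found $run_spec_file"
else
    echo "missing $run_spec_file"
fi
```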

Other app wizard templates

When running dx-app-wizard above, you selected the "basic" execution template. Try the other available templates to see simple examples of how to parallelize your execution over multiple machines in the cloud by using additional entry points. You can also choose other programming languages where you can start using some of the DNAnexus client libraries. While the command-line client dx provides quite a bit of functionality, the client libraries can provide a richer experience for programmatically accessing and modifying data on the platform directly in the programming language of your choice.

Next steps

See the Advanced App Tutorial to get a better understanding of the app directory structure and how to manually modify app inputs, outputs, and metadata.

If you're curious about what happens in the cloud, the Execution Environment Reference is a complete reference to what happens when a job is run.

The Job Lifecycle page describes the progression of a job's states and discusses the reasons a job may fail.

The Developer Portal contains additional resources, including discussion of more advanced concepts (such as parallelizing your workload across multiple instances) and links to example source code for your reference and inspiration.

Last edited by Thanh-Ha Nguyen, 2016-06-24 23:25:08
