Python

This tutorial walks you through building a simple Python app on the DNAnexus Platform. It demonstrates:

  • Writing, deploying, running, and monitoring apps in Python
  • Using the DNAnexus Platform APIs to represent and store your data

This tutorial assumes that you have already installed the DNAnexus SDK and worked through Intro to Building Apps. Refer back to that tutorial as necessary.

Before you begin

To initialize the SDK environment, open your command line terminal, navigate to the directory where you extracted the SDK (for example, /home/Bart/Downloads/dx-toolkit), and type:

$ source environment

This will place the DNAnexus client scripts in your executable PATH, and the Python DNAnexus libraries in the Python library path.
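If you'd like to confirm from Python that source environment took effect, a check like the following is one option. It is a minimal sketch rather than part of the SDK: the function name sdk_environment_ready is hypothetical, and it only tests that a dx executable is on your PATH and that the dxpy package can be imported. It does not check whether you are logged in.

```python
import importlib.util
import shutil


def sdk_environment_ready():
    """Hypothetical helper: report whether the SDK looks usable from this shell.

    Checks that the `dx` client script is on PATH and that the `dxpy`
    package is importable. It does not check login state.
    """
    has_dx = shutil.which("dx") is not None
    has_dxpy = importlib.util.find_spec("dxpy") is not None
    return has_dx and has_dxpy


if __name__ == "__main__":
    print("SDK environment ready:", sdk_environment_ready())
```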

Next, type:

$ dx login

to log in to the platform and select a project to work in.

Source code for the example apps used in this tutorial can be found in the doc/examples/dx-apps directory of the SDK. You can also browse the example programs on GitHub.

Revisiting the quality trimmer

We'll start by recreating the quality trimmer app from Intro to Building Apps as a more idiomatic Python app.

Run the command-line DNAnexus App Wizard (dx-app-wizard). The App Wizard can assist you in creating Python apps, so we'll use it here:

$ dx-app-wizard
⋮
App Name: python_trimmer_example
⋮
(<ENTER> to accept defaults)

Input Specification

You will now be prompted for each input parameter to your app. Each parameter should have a unique name that uses only the underscore "_" and alphanumeric characters, and does not start with a number.

1st input name (<ENTER> to finish): input_name
Label (optional human-readable name) []: Input file
Your input parameter must be of one of the following classes:
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n
2nd input name (<ENTER> to finish): <ENTER>

Output Specification

You will now be prompted for each output parameter of your app. Each parameter should have a unique name that uses only the underscore "_" and alphanumeric characters, and does not start with a number.

1st output name (<ENTER> to finish): output_file
Label (optional human-readable name) []: Output file
Choose a class (<TAB> twice for choices): file
2nd output name (<ENTER> to finish): <ENTER>

Template Options

You can write your app in any programming language, but we provide templates for the following supported languages: Python, bash
Programming language [Python]: <ENTER>
⋮
Execution pattern [basic]: <ENTER>
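The wizard's naming rule for parameters can be expressed as a short check, which may be handy if you later edit dxapp.json by hand. This is a sketch under our own reading of the rule (ASCII letters, digits, and underscores; no leading digit; no duplicates within one list of parameters). is_valid_param_name and check_param_names are hypothetical helpers, not part of dx-toolkit.

```python
import re

# Our reading of the wizard's rule: only underscores and alphanumeric
# characters, and the name must not start with a digit (ASCII assumed).
_PARAM_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_param_name(name):
    """Return True if `name` satisfies the naming rule above."""
    return _PARAM_NAME.fullmatch(name) is not None


def check_param_names(names):
    """Raise ValueError on the first invalid or duplicate name in one list."""
    seen = set()
    for name in names:
        if not is_valid_param_name(name):
            raise ValueError("invalid parameter name: %r" % name)
        if name in seen:
            raise ValueError("duplicate parameter name: %r" % name)
        seen.add(name)


# Input names from the wizard session above; passes silently.
check_param_names(["input_name"])
```

Check inputs and outputs as separate lists, mirroring how the wizard prompts for them.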

Take a look at the generated files

Let's open up the generated metadata file, dxapp.json. Its run specification (runSpec) determines what code your app runs and how that code is invoked. In this case, the runSpec.file field refers to a file in the src/ directory, and that file is executed whenever you run your app.

This file was automatically generated by dx-app-wizard; you can see that it includes a skeleton that handles retrieving your input files from the platform to the local filesystem, and uploading the output files after your analysis has run.
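To see which file runSpec.file points to in your own app, you can read dxapp.json directly. The sketch below is a hypothetical helper (run_spec_summary is our name, not an SDK function). It assumes the runSpec object has a file field, as described above, and treats runSpec.interpreter, which wizard-generated files typically include, as optional.

```python
import json


def run_spec_summary(dxapp_json_text):
    """Return (file, interpreter) from the runSpec section of a dxapp.json string.

    `interpreter` is None if that field is absent.
    """
    spec = json.loads(dxapp_json_text)
    run_spec = spec["runSpec"]
    return run_spec["file"], run_spec.get("interpreter")
```

For example, from your app directory: run_spec_summary(open("dxapp.json").read()).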

Under the line that says "Fill in your application code here", add the following line to do your analysis:

subprocess.check_call("fastq_quality_trimmer -t 20 -Q 33 -i input_name -o output_file", shell=True)

Also import the subprocess module (just add import subprocess underneath the other imports at the top of the file).
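The check_call line above runs the trimmer through a shell. If you prefer to avoid shell=True, one option is to pass the arguments as a list, as sketched below. The helper names and the threshold and phred_offset parameters are hypothetical; the flags are the ones used in the command above, where -t is the quality threshold and -Q 33 sets the quality-score offset.

```python
import subprocess


def trimmer_command(input_path, output_path, threshold=20, phred_offset=33):
    """Build the fastq_quality_trimmer argument list used in this tutorial."""
    return [
        "fastq_quality_trimmer",
        "-t", str(threshold),     # quality threshold
        "-Q", str(phred_offset),  # quality-score offset
        "-i", input_path,
        "-o", output_path,
    ]


def run_trimmer(input_path, output_path):
    # A list argument runs the tool directly (no shell), so file names are
    # passed through verbatim; check_call raises CalledProcessError if the
    # tool exits with a non-zero status.
    subprocess.check_call(trimmer_command(input_path, output_path))
```

Inside the generated main(), the equivalent call would be run_trimmer("input_name", "output_file").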

Your file will now look like the following:

#!/usr/bin/env python
# python_trimmer_example 1.0.0
# Some comments have been abbreviated here; create an app using dx-app-wizard
# or look in dx-toolkit/doc/examples/dx-apps/python_trimmer_example to read
# the comments in full.

import os
import dxpy
import subprocess

# The entry_point decorator registers main() as the app's entry point;
# dxpy.run() at the bottom of the file dispatches to it when the job starts.
@dxpy.entry_point('main')
def main(input_name):

    # Create DXDataObject handlers for the input object(s).

    input_name = dxpy.DXFile(input_name)

    # Download the file to the local filesystem.

    dxpy.download_dxfile(input_name.get_id(), "input_name")

    # Fill in your application code here.

    subprocess.check_call("fastq_quality_trimmer -t 20 -Q 33 -i input_name -o output_file", shell=True)

    # Upload the output file (presumed to now exist at the path output_file)
    # back to the platform.

    output_file = dxpy.upload_local_file("output_file")

    # Return a reference (DNAnexus link) to the file object we just created.

    output = {}
    output["output_file"] = dxpy.dxlink(output_file)

    return output

dxpy.run()

The app inputs are passed as keyword arguments to the main entry point function, which is executed when you run the app. The function should return a dict that maps each of your app's output parameter names to its value.

  • Inputs that are DNAnexus data objects are represented as dicts containing DNAnexus links. You can pass these to a handler class to construct a handler object (such as dxpy.DXFile(input_name) above, or with dxpy.get_handler()), or reduce them to the string containing the object ID: input_name['$dnanexus_link'].
  • Inputs of primitive classes (int, float, string, boolean, or hash) are given directly as the corresponding Python data types.
  • Outputs that are data objects should be given as DNAnexus links, which can be constructed from handler objects or ID strings using dxpy.dxlink(). Outputs of primitive classes should be given using their Python data types.
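The link conventions above can be illustrated without the platform. The two functions below are hypothetical pure-Python stand-ins: make_link produces the simple link shape that dxpy.dxlink() returns for a bare object ID, and link_to_id reverses it. We assume that a link may also name a project, in the form {"$dnanexus_link": {"project": ..., "id": ...}}, which is why link_to_id does not simply index input_name['$dnanexus_link']. In a real app, prefer the dxpy functions.

```python
def make_link(object_id):
    """Build a simple DNAnexus link dict for an object ID."""
    return {"$dnanexus_link": object_id}


def link_to_id(link):
    """Return the object ID from a DNAnexus link.

    Handles the simple form {"$dnanexus_link": "file-..."} and the form
    that also names a project, {"$dnanexus_link": {"project": ..., "id": ...}}.
    """
    value = link["$dnanexus_link"]
    if isinstance(value, dict):
        return value["id"]
    return value


# Shape of an output dict as main() would return it (hypothetical ID).
output = {"output_file": make_link("file-xxxx")}
```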

As in Intro to Building Apps, complete your app by downloading the FASTX-Toolkit (courtesy of the Hannon Lab), extracting it, and putting the fastq_quality_trimmer executable into the resources/usr/bin subdirectory of your app directory. Also, if you haven't already, download the sample reads file we've provided and upload it to a project:

$ dx upload small-celegans-sample.fastq
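If you rebuild the app often, you may want to script the resources step. The sketch below copies an executable into resources/usr/bin and sets its execute bits. install_resource is a hypothetical helper; as we understand it, the contents of resources/ are unpacked relative to the root of the job's filesystem, so the trimmer ends up in /usr/bin at run time.

```python
import os
import shutil
import stat


def install_resource(app_dir, executable_path):
    """Copy an executable into <app_dir>/resources/usr/bin and make it executable.

    Returns the destination path.
    """
    bin_dir = os.path.join(app_dir, "resources", "usr", "bin")
    os.makedirs(bin_dir, exist_ok=True)
    dest = os.path.join(bin_dir, os.path.basename(executable_path))
    shutil.copy2(executable_path, dest)  # copies contents and permission bits
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```

For example, from your app directory: install_resource(".", "/path/to/fastq_quality_trimmer").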

Building and running on the Platform

Now let's upload your app to the DNAnexus Platform. In the app directory, run:

$ dx build .

When building your app a second or subsequent time, also pass the --overwrite (or -f) flag to remove old versions of your app.
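If you drive builds from a Python script, the command can be assembled as a list and passed to subprocess. This is a sketch: build_command and applet_id_from_build_output are hypothetical names, and the parser assumes that dx build prints a JSON object with an "id" field on success. That is what we have seen, but check it against your SDK version.

```python
import json


def build_command(app_dir=".", overwrite=False):
    """Return the dx build argument list; add --overwrite on rebuilds."""
    cmd = ["dx", "build", app_dir]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def applet_id_from_build_output(stdout_text):
    """ASSUMES dx build printed a JSON object with an "id" key."""
    return json.loads(stdout_text)["id"]
```

For example: applet_id_from_build_output(subprocess.check_output(build_command(".", overwrite=True))). On Python 3.6 and later, json.loads accepts the bytes that check_output returns.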

Now we'll run the app on the platform, instantiating a new job. When your job has successfully been enqueued, dx run prints out a job ID you can use to track the progress of your job.

$ dx run python_trimmer_example -iinput_name=small-celegans-sample.fastq
# Inspect the input parameters and press ENTER to confirm...
⋮
Calling applet-121212121212 with output destination project-343434343434:/
Job ID: job-123456654321
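The dx run call above can likewise be scripted. The sketch below turns a dict of inputs into -iNAME=VALUE arguments; run_command is a hypothetical helper. It adds -y, which skips the confirmation prompt, and --brief, which makes dx run print just the job ID. Both are dx run options, but check dx run --help for your SDK version.

```python
def run_command(executable, inputs, skip_confirmation=True, brief=True):
    """Build a dx run argument list from a dict of input names to values."""
    cmd = ["dx", "run", executable]
    for name, value in inputs.items():
        cmd.append("-i{}={}".format(name, value))  # same form as -iinput_name=...
    if skip_confirmation:
        cmd.append("-y")
    if brief:
        cmd.append("--brief")
    return cmd
```

For example: job_id = subprocess.check_output(run_command("python_trimmer_example", {"input_name": "small-celegans-sample.fastq"}), universal_newlines=True).strip().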

During or after the execution of your job, you can check its status with dx describe JOB_ID. Once the job has finished successfully, the description also lists the job's outputs.

$ dx describe job-123456654321
Result 1:
ID                  job-123456654321
⋮
State               running
⋮
Input               input_name = project-343434343434:file-567856785678
Output              -
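Rather than rerunning dx describe by hand, you can poll the job until it stops changing. In this sketch, wait_for_job and describe_with_cli are hypothetical helpers: describe_with_cli parses the JSON form of dx describe (dx describe --json), and we assume that done, failed, and terminated are the final job states. The describe and sleep parameters can be swapped out, which lets you test the loop without the platform.

```python
import json
import subprocess
import time

# States after which, we assume, a job will not change any further.
TERMINAL_STATES = {"done", "failed", "terminated"}


def describe_with_cli(job_id):
    """Return a job's description by parsing `dx describe --json JOB_ID`."""
    out = subprocess.check_output(["dx", "describe", "--json", job_id])
    return json.loads(out)


def wait_for_job(job_id, describe=describe_with_cli, poll_seconds=30, sleep=time.sleep):
    """Poll describe(job_id) until the job reaches a terminal state.

    Returns the final description dict; inspect its "state" (and, for a
    finished job, its "output") to see how the job ended.
    """
    while True:
        desc = describe(job_id)
        if desc["state"] in TERMINAL_STATES:
            return desc
        sleep(poll_seconds)
```

If you are already working in Python with dxpy installed, its job handler, dxpy.DXJob(job_id), offers a wait_on_done() method as a ready-made alternative.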

Congratulations! You've run your first app on the DNAnexus platform.

Many common bioinformatics pipelines can be broken into steps that each follow the pattern illustrated above. This pattern is generally the easiest way to make a preexisting analysis run as a DNAnexus app or applet:

  • Download inputs from the platform using the API bindings and save them to local files in your execution container.
  • Shell out to a subprocess to run whatever analysis you like, producing local files as output.
  • Upload outputs from the local files you've produced back into the platform, again using the API bindings.
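The three-step pattern can be captured in one function. This is a minimal sketch, not the wizard's generated code: run_step is a hypothetical name, the output parameter is assumed to be called output_file as in this tutorial, and the input is assumed to be a simple link whose value is the file ID. By default it uses the same dxpy calls as the generated main() (download_dxfile, upload_local_file, and dxlink), but you can pass stand-in download and upload functions to exercise the analysis locally.

```python
import os
import tempfile


def run_step(input_link, analyze, download=None, upload=None):
    """Download, analyze, and upload: the three-step pattern described above.

    download(file_id, local_path) and upload(local_path) -> link default to
    dxpy calls; analyze(input_path, output_path) does the actual work.
    """
    if download is None or upload is None:
        import dxpy  # only needed when running against the platform
        download = download or dxpy.download_dxfile
        upload = upload or (lambda path: dxpy.dxlink(dxpy.upload_local_file(path)))

    workdir = tempfile.mkdtemp()
    input_path = os.path.join(workdir, "input")
    output_path = os.path.join(workdir, "output")

    # 1. Download the input (simple-form link assumed) to a local file.
    download(input_link["$dnanexus_link"], input_path)
    # 2. Run the analysis, producing a local output file.
    analyze(input_path, output_path)
    # 3. Upload the result and return it under the app's output name.
    return {"output_file": upload(output_path)}
```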

Next Steps

The Developer Portal contains additional resources, including discussion of more advanced concepts such as parallelizing your workload across multiple instances or requesting additional computational resources.

Last edited by Thanh-Ha Nguyen, 2016-06-24 23:25:08