Sample Code


The snippets that follow are given in Python and in bash (using the dx command-line client). The Python snippets assume that the standard client library has been imported:

import dxpy

For more details, see the full documentation on the dxpy library.

For bash, nothing special is required if running as a job in the cloud. If running client-side, don't forget to source the file dx-toolkit/environment.

Code Snippets

Handler Classes

Data object handlers

Data object handlers are Python classes that wrap a single data object on the platform: they store its ID (and, optionally, its project) and provide methods for describing and modifying it.

# You can initialize a handler for an existing data object using
# either a string ID or a DNAnexus link.  Note: no error will be
# thrown at initialization time for malformed IDs or
# nonexistent/inaccessible data objects.
dxfile = dxpy.DXFile("file-xxxx")
dxfile2 = dxpy.DXFile({"$dnanexus_link": "file-xxxx"})

dxrecord = dxpy.DXRecord(id_or_link)
dxapplet = dxpy.DXApplet(id_or_link)

# There is also a function that guesses the handler class from the
# ID prefix; for example, a "record-xxxx" ID or link yields a
# dxpy.DXRecord object
dxhandler = dxpy.get_handler(id_or_link)
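Handlers accept either a bare ID string or a DNAnexus link, which is a dict of the form {"$dnanexus_link": "file-xxxx"} (or, with a project, {"$dnanexus_link": {"project": "project-xxxx", "id": "file-xxxx"}}). The standard-library sketch below shows how the two forms relate and how a handler class can be guessed from the ID prefix. The helper names (make_link, link_to_id, guess_handler_name) and the partial prefix table are hypothetical illustrations, not dxpy's implementation; in real code, dxpy.dxlink and dxpy.get_handler do this for you.

```python
# Hypothetical helpers illustrating the DNAnexus link format; not part of dxpy.

# Partial, illustrative table mapping an ID prefix to a dxpy handler class name.
HANDLER_BY_PREFIX = {
    "file": "DXFile",
    "record": "DXRecord",
    "applet": "DXApplet",
    "workflow": "DXWorkflow",
    "project": "DXProject",
    "job": "DXJob",
    "app": "DXApp",
}


def make_link(object_id):
    """Wrap a string ID in a simple DNAnexus link dict."""
    return {"$dnanexus_link": object_id}


def link_to_id(id_or_link):
    """Accept a bare ID, a simple link, or a project-qualified link; return the ID."""
    if isinstance(id_or_link, dict):
        value = id_or_link["$dnanexus_link"]
        # Project-qualified links nest the object ID under "id".
        return value["id"] if isinstance(value, dict) else value
    return id_or_link


def guess_handler_name(id_or_link):
    """Guess the handler class name from the part of the ID before the first '-'."""
    prefix = link_to_id(id_or_link).split("-", 1)[0]
    return HANDLER_BY_PREFIX.get(prefix)


print(make_link("file-xxxx"))                          # {'$dnanexus_link': 'file-xxxx'}
print(link_to_id({"$dnanexus_link": "record-xxxx"}))   # record-xxxx
print(guess_handler_name("record-xxxx"))               # DXRecord
```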

Other handlers

There are also other handlers for data containers, apps, and jobs.

# There are handlers for both projects and containers; DXProject
# inherits from DXContainer.
dxproject = dxpy.DXProject("project-xxxx")
workspace_handler = dxpy.DXContainer(dxpy.WORKSPACE_ID)

# App handlers can be initialized in the following ways
#  by ID:
dxapp = dxpy.DXApp(dxid="app-xxxx")
#  by name (default alias is then "default"):
dxapp2 = dxpy.DXApp(name="bwa")
#  by name and version or tag:
dxapp3 = dxpy.DXApp(name="bwa", alias="1.0.0")
dxapp4 = dxpy.DXApp(name="bwa", alias="testing")

# Job handlers are usually created when you run an app or call an
# entry point, but they can also be initialized via ID.
dxjob = dxpy.DXJob("job-xxxx")

Refer to data by name

# The following raises dxpy.exceptions.DXSearchError if no data object
# named "filename" in the current project can be found and otherwise
# returns a data object handler; no error is raised if there are
# actually more results
dxhandler = dxpy.find_one_data_object(name="filename",
                                      project=dxpy.WORKSPACE_ID,
                                      return_handler=True)

# The following will return None if no results are found
dxproject = dxpy.find_one_project(name="projectname",
                                  return_handler=True,
                                  zero_ok=True)

# The following raises an error if more than one result is found
dxpy.find_one_project(name="projectname", more_ok=False)

The dx client accepts data object paths in any of the following forms; the full path syntax used by dx is described in the dx documentation.
dx ls file-xxxx
dx ls folder/path/to/filename
dx ls project-xxxx:file-xxxx
dx ls project-name:folder/path/to/filename
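The zero_ok and more_ok flags of the find_one_* functions decide what happens when a search returns no results or several. The sketch below models those rules on a plain list of results; pick_one is a hypothetical helper, and it raises ValueError where dxpy raises dxpy.exceptions.DXSearchError. Its defaults (zero_ok=False, more_ok=True) follow the behavior described above.

```python
def pick_one(results, zero_ok=False, more_ok=True):
    """Return the first result, applying find_one-style rules (hypothetical helper)."""
    if not results:
        if zero_ok:
            return None
        raise ValueError("Expected one result, but found none")
    if len(results) > 1 and not more_ok:
        raise ValueError("Expected one result, but found %d" % len(results))
    return results[0]


print(pick_one(["file-aaaa", "file-bbbb"]))   # file-aaaa (extra results are ignored)
print(pick_one([], zero_ok=True))             # None
```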

General Data Object Handling

Retrieve metadata

# The following methods are available without creating an object
# handler first.
dxpy.describe(id_or_link)
dxpy.get_details(id_or_link)

# Otherwise, the following methods are available for object handlers
dxhandler.describe()
dxhandler.get_id()
dxhandler.get_proj_id()
dxhandler.get_details()
dxhandler.get_properties()

# Print human-readable output
dx describe file-xxxx
dx describe project-name-or-id:file-xxxx
dx describe project-name-or-id:/path/to/filename

# Print output in JSON
dx describe PATH --json

# Print just the JSON *details*
dx get_details PATH

Modify metadata

# The following methods can only be called when the object is in the
# "open" state.
dxhandler.add_types(["UploadSentinel", "BAM"])
dxhandler.remove_types(["BAM"])
dxhandler.set_details({"mappings": {"$dnanexus_link": "file-xxxx"}})
dxhandler.hide()
dxhandler.unhide()

# The following methods can be called in any state.
dxhandler.rename("newname")
dxhandler.set_properties({"propkey": "propval", "2nd-key": "2nd-val"})
# Unset a property by setting it to None
dxhandler.set_properties({"propkey": None})
dxhandler.add_tags(["tag0", "anothertag", "foo"])
dxhandler.remove_tags(["anothertag", "foo"])

# The following commands can only be used when the object is in the
# "open" state
dx add_types path Reads LetterReads
dx remove_types path Reads
dx set_details path '{"mappings": {"$dnanexus_link": "file-xxxx"}}'
dx set_visibility path hidden
dx set_visibility path visible

# The following commands can be used in any state.
dx tag path tag0 anothertag foo
dx untag path anothertag foo
dx rename path newname
dx set_properties path propkey=propval 2nd-key=2nd-val
dx unset_properties path propkey
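Passing None as a property value removes that property rather than storing it. The sketch below applies the same rule to a local dict so the effect of a sequence of set_properties calls is easy to see; apply_property_update is a hypothetical helper, and on the platform the update itself happens server-side.

```python
def apply_property_update(current, update):
    """Return a new properties dict after a set_properties-style update.

    Keys mapped to None are removed; all other keys are added or overwritten.
    """
    result = dict(current)
    for key, value in update.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


props = apply_property_update({}, {"propkey": "propval", "2nd-key": "2nd-val"})
props = apply_property_update(props, {"propkey": None})
print(props)  # {'2nd-key': '2nd-val'}
```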

Close a data object

You must close a data object before it can be copied between projects or exported as part of the final output of an app or applet.

dxhandler.close()

dx close file-id-or-path
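Closing is not always instantaneous; a file, for example, passes through a "closing" state before it reaches "closed". The sketch below polls a describe function until the state is "closed". wait_until_closed and fake_describe are hypothetical; for files, dxpy's DXFile.close also accepts a block argument that waits for the closed state.

```python
import time


def wait_until_closed(describe, poll_seconds=2.0, timeout_seconds=600.0):
    """Poll describe() until the object's "state" field is "closed".

    describe is any zero-argument callable returning a dict with a "state"
    key, such as a bound dxhandler.describe in a real app.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = describe()["state"]
        if state == "closed":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("Object still in state %r after timeout" % state)
        time.sleep(poll_seconds)


# Simulated describe() that reports "closing" twice, then "closed".
states = iter(["closing", "closing", "closed"])


def fake_describe():
    return {"state": next(states)}


wait_until_closed(fake_describe, poll_seconds=0.01)
print("closed")
```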

Move data objects between folders

Moving data objects is only possible within a single project or container. To copy objects into a different project, clone them instead (see Clone data objects below).

# Move an individual data object using its handler method
dxhandler.move(folder="/path/to/destination/folder")

# Move multiple objects and/or folders in the same container using the
# container handler
dxcontainer.move(destination="/path/to/destination/folder",
                 objects=["record-xxxx", "file-xxxx"],
                 folders=["/path/to/some/other/folder"])

# Move multiple data objects and/or folders into another folder
dx mv record-id-or-path file-id-or-path /path/to/some/other/folder \
  /path/to/destination/folder

# Like the Unix mv command, dx mv can also be used for renaming
dx mv object-name newname

Delete data objects

# Without an existing handler:
dxpy.remove(id_or_link)

# With an existing handler:
dxhandler.remove()

dx rm path
dx rm -r folder/

Clone data objects

As mentioned previously, data objects must be in the closed state before they can be cloned (copied) between projects or exported as part of the final output of an app or applet. By default, hidden objects linked from the cloned objects are cloned as well. Cloning cannot be used to make a second copy of a data object in the same project or container.

dxcontainer.clone(container="project-xxxx",
                  destination="/destfolder",
                  objects=["file-xxxx", "workflow-xxxx"],
                  folders=["/some/folder/to/clone"],
                  include_hidden_links=True)

dx cp file-id-or-path \
  /some/folder/to/clone project-xxxx:/destfolder

Files

Upload a local file

Use the following commands to upload a local file. When run inside a job, uploaded files are not automatically added to the parent project (the project from which the app or applet was run). To have a file appear in the parent project, return it as an output of the app or applet; see the Handle job input and output section below for how outputs are handled.

dxpy.upload_local_file("local_filename.fastq")

Uploading a single file

You can upload a local file using dx upload:

# To upload a local file.
dx upload local_filename.fastq

Uploading all your outputs

Alternatively, if you place the files you wish to return as app outputs into subdirectories of $HOME/out, as described below, you can use dx-upload-all-outputs to upload all of them at once.

# If you have the following output fields in 
# your outputSpec of the app/applet's dxapp.json: 
#     app_output_1 (file)
#     app_output_2 (file)
#
# You can create a $HOME/out folder on the worker running 
# your app. 
mkdir $HOME/out
mkdir $HOME/out/app_output_1
mkdir $HOME/out/app_output_2

# Then place your output files in following paths: 
#   $HOME/out/app_output_1/output1_file.txt
#   $HOME/out/app_output_2/output2_file.txt
# 
# When you run the following commands, output1_file.txt and
# output2_file.txt will be uploaded and packaged into DNAnexus
# links for app_output_1 and app_output_2, respectively
mv path/to/output1_file.txt $HOME/out/app_output_1/
mv path/to/output2_file.txt $HOME/out/app_output_2/
dx-upload-all-outputs

For more information about dx upload and dx-upload-all-outputs, please visit their documentation pages at Index of dx commands and Helpstrings of SDK Command Line Utilities.
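The $HOME/out convention maps each subdirectory name to an output field. To illustrate that convention only, the sketch below walks an output directory and groups the files it finds by field name. collect_outputs is hypothetical and uploads nothing; dx-upload-all-outputs does the actual uploading and packaging for you.

```python
import os
import tempfile


def collect_outputs(out_dir):
    """Group files under out_dir by their top-level subdirectory (the output field name)."""
    outputs = {}
    for field in sorted(os.listdir(out_dir)):
        field_dir = os.path.join(out_dir, field)
        if not os.path.isdir(field_dir):
            continue  # stray files directly under out_dir belong to no field
        files = []
        for root, _dirs, names in os.walk(field_dir):
            files.extend(os.path.join(root, name) for name in names)
        outputs[field] = sorted(files)
    return outputs


# Build a throwaway directory that mimics $HOME/out for two file outputs.
with tempfile.TemporaryDirectory() as out_dir:
    for field, filename in [("app_output_1", "output1_file.txt"),
                            ("app_output_2", "output2_file.txt")]:
        os.makedirs(os.path.join(out_dir, field))
        open(os.path.join(out_dir, field, filename), "w").close()
    for field, paths in collect_outputs(out_dir).items():
        print(field, [os.path.basename(p) for p in paths])
```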

Download a remote file

These are the commands used to download remote files. For more information on input and output handling, see the Jobs section below.

dxpy.download_dxfile("file-xxxx", "desired_local_filename")

Downloading a single file

# To download a specific file (e.g. a file given as input to the app)
dx download remote_filename -o desired_local_filename
dx download file-xxxx -o desired_local_filename

Downloading all input files

Instead of downloading input files one-by-one, you can use dx-download-all-inputs. These files will be downloaded into subdirectories of the path $HOME/in/ on the worker.

See the bash app helper variables section below for more information on handling inputs downloaded via this utility.

# If you have the following input fields in 
# your inputSpec of the app/applet's dxapp.json: 
#     input_file (file)
#     input_file_array (array:file)
# 
# The files will be downloaded into the following folders on
# the worker: 
#     $HOME/in/input_file/
#     $HOME/in/input_file_array/
#
# where $HOME/in/input_file_array/ is a parent directory with a
# numbered subdirectory for each file in the array (the numbers
# are zero-padded so that the subdirectories sort in array order)
dx-download-all-inputs

For more information about dx download and dx-download-all-inputs, please visit their documentation pages at Index of dx commands and Helpstrings of SDK Command Line Utilities.
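Zero-padding matters because directory names are usually listed in lexicographic order, where "10" sorts before "2". The sketch below generates padded names for an array of a given length and checks that a plain sort preserves array order. padded_names is hypothetical, and its padding width (the number of digits in the largest index) is an assumption, not necessarily what dx-download-all-inputs uses.

```python
def padded_names(count):
    """Return zero-padded directory names for an array of count items."""
    width = len(str(max(count - 1, 0)))
    return [str(i).zfill(width) for i in range(count)]


names = padded_names(12)
print(names[:3], names[-1])          # ['00', '01', '02'] 11
print(sorted(names) == names)        # True

unpadded = [str(i) for i in range(12)]
print(sorted(unpadded) == unpadded)  # False, because "10" sorts before "2"
```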

Jobs

Handle job input and output

The input hash can always be found in the file ~/job_input.json.

Input values are passed to the entry point function as keyword arguments, so you can declare required inputs as parameters without defaults, optional inputs as parameters with defaults, or simply accept **kwargs and parse them yourself.

Return outputs as a hash (a dict); if the function returns nothing, the output is treated as an empty hash.

@dxpy.entry_point("main")
def main(named_required_input,
         optional_input=None,
         **kwargs):
    print("Hello world")
    # If you omit the next line, it will act as if you returned {}
    return {"myoutput": "Hello", "mysecond_output": "world"}

# Runs the entry point requested for the current job
dxpy.run()

Input variables are set by name; array inputs are set as bash arrays. Helper utilities are available for reporting individual output values.

# Single input field
echo "$input_field_name"

# Array input field
echo "${array_input_field_name[@]}"

# Report single output value using dx-jobutil-add-output
# The utility will package up IDs into a DNAnexus link for you
dx-jobutil-add-output file_output_field_name file-xxxx --class=file
# Or it can keep it as a string if you specify "string" as the class
dx-jobutil-add-output string_output_field_name file-xxxx --class=string

# For array output, add each element of the array, one at a time
dx-jobutil-add-output array_output_field_name 32 --class=int --array
dx-jobutil-add-output array_output_field_name 64 --class=int --array
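Because the input hash is plain JSON in ~/job_input.json, any code running in the job can read it directly. The sketch below loads that file and pulls the ID out of a file input given as a DNAnexus link. load_job_input is hypothetical, the field names input_file and input_count are made up for illustration, and the example writes its own stand-in file so it runs anywhere.

```python
import json
import os
import tempfile


def load_job_input(path=os.path.expanduser("~/job_input.json")):
    """Load the job's input hash from disk and return it as a dict."""
    with open(path) as f:
        return json.load(f)


# Write a stand-in input file with hypothetical field names.
sample = {"input_file": {"$dnanexus_link": "file-xxxx"}, "input_count": 3}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)

job_input = load_job_input(f.name)
os.remove(f.name)
print(job_input["input_file"]["$dnanexus_link"])  # file-xxxx
print(job_input["input_count"])                   # 3
```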

Bash app helper variables

For file inputs, bash apps also get the following helper variables. The path variables assume that the inputs were downloaded with dx-download-all-inputs (see the note below).

# If an input field is named 'input_file' in the dxapp.json
# and the input is a file, the following environment
# variables will be set in the worker running the bash app

# Input filename (e.g. 'my_file.txt')
echo "$input_file_name"

# Input file prefix (e.g. 'my_file' for a file named
# 'my_file.txt'). If a file is gzipped, the prefix also
# drops the '.gz' along with the extension (e.g.
# 'my_compressed_file.fq.gz' has the prefix
# 'my_compressed_file')
echo "$input_file_prefix"

# Input file path, assuming dx-download-all-inputs was used
# to download inputs to the worker. For a file 'my_file.txt' 
# given as input to the field 'input_file', this will 
# return '$HOME/in/input_file/my_file.txt'
echo "$input_file_path"

# These environment variables work with file array inputs 
# (class array:file) as well.

# If an input field is named 'input_array' in the
# dxapp.json, the same helper variables are set as
# bash arrays:
echo "${input_array_name[@]}"
echo "${input_array_prefix[@]}"
echo "${input_array_path[@]}"

Note: While the variables above are set for all bash apps and applets, the path variables (e.g. $input_file_path) are only accurate if you use the dx-download-all-inputs utility. If you download input files one by one, they are not automatically placed in the $HOME/in/ subdirectories.

For more information about dx-download-all-inputs, see the Download a remote file section above.
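The prefix rule described above (drop a trailing .gz, then drop one more extension) can be approximated in a few lines. This is a sketch of that simple rule only; file_prefix is hypothetical, and the actual helper in dx-toolkit may treat other compressed or multi-part extensions differently.

```python
def file_prefix(filename):
    """Approximate the <field>_prefix rule: strip '.gz', then one extension."""
    name = filename
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    stem, dot, _ext = name.rpartition(".")
    # rpartition returns ('', '', name) when there is no dot at all;
    # a leading dot alone (e.g. '.bashrc') is not treated as an extension.
    return stem if dot and stem else name


print(file_prefix("my_file.txt"))               # my_file
print(file_prefix("my_compressed_file.fq.gz"))  # my_compressed_file
print(file_prefix("README"))                    # README
```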

Run an app or applet

# Run an app or applet (either an app or applet handler will work)
# with some input
application_input = { "input_field": "input_value" }
dxjob = dxapp_or_applet.run(application_input)

# Schedule the job to start after a different job finishes
dxjob2 = dxapp_or_applet.run(application_input,
                             depends_on=[dxjob])

# Name the job (by default, the platform will pick something that
# makes sense like the name of the app or applet)
dxjob3 = dxapp_or_applet.run(application_input,
                             name="jobname")

# Run an applet/app with some input; usual dx paths are accepted.  Add
# --brief to output only the job ID, -y to skip confirmation.

dx run executable_name -iinput_field_name=value -ifile=filename --brief -y
# Result: job-xxxx

# Schedule the job to start after a different job finishes
dx run executable_name --depends-on job-xxxx -iinput_field_name=value

# Optionally name the job (by default, the platform will pick
# something that makes sense like the name of the app or applet)
dx run executable_name --name my_job_name -iinput_field_name=value

Run a workflow

There is some limited support for running workflows from the command line and from Python. In the example below, the workflow's first (0th) stage is the URL Fetcher app, which has an input field called url. This input field is therefore referred to as 0.url. If the second stage is also a URL Fetcher app, its url field can be set using the field name 1.url.

# First create a handler for the workflow using its workflow ID.
dxworkflow = dxpy.DXWorkflow("workflow-xxxx")

# Run the workflow using the run method. Its first argument is a dict
# whose keys have the form "<stage index>.<input field name>".
dxanalysis = dxworkflow.run({"0.url": "http://mydomain.com/myfastqfile.fastq"})

# run returns a dxpy.DXAnalysis handler for the analysis that runs
# the workflow's stages
print(dxanalysis.describe())

# Additional options can also be set when the workflow is run
dxworkflow.run(workflow_input=myinputdict,
               project="project-xxxx",     # output project
               folder="/myoutputfolder",
               name="suffix to job names (default is workflow name)")
# Run a workflow, adding --brief to only print job IDs and -y to skip
# confirmation.
dx run workflowname-or-id -i0.url=http://mydomain.com/myfastqfile.fastq --brief -y
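Workflow inputs are flat keys of the form <stage index>.<field name>, as with 0.url above. The hypothetical helper below builds such a dict from per-stage inputs, which can be convenient when several stages take the same field; the result is the kind of dict passed to dxworkflow.run. The second URL is a made-up placeholder.

```python
def workflow_input(stage_inputs):
    """Flatten {stage_index: {field: value}} into {"<index>.<field>": value}."""
    flat = {}
    for stage_index, fields in stage_inputs.items():
        for field, value in fields.items():
            flat["%d.%s" % (stage_index, field)] = value
    return flat


inputs = workflow_input({
    0: {"url": "http://mydomain.com/myfastqfile.fastq"},
    1: {"url": "http://mydomain.com/secondfile.fastq"},
})
print(inputs)
# {'0.url': 'http://mydomain.com/myfastqfile.fastq', '1.url': 'http://mydomain.com/secondfile.fastq'}
```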

Add an entry point (function)

By creating a new entry point to your application, you can parallelize your application by distributing work across multiple machines in the cloud.

# Input/Output work the same way as for the main entry point
@dxpy.entry_point("myfunc")
def myfunc(myinput, myinput2):
    # Your entry point code goes here
    pass

myfunc() {
        # Your entry point code goes here
}
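A common way to use extra entry points is to split the input into chunks and start one subjob per chunk. The chunking step is plain Python and is sketched below; split_into_chunks and the placeholder IDs are hypothetical, and the commented loop shows where dxpy.new_dxjob (covered in the next section) would be called for each chunk inside a running job.

```python
def split_into_chunks(items, chunk_size):
    """Split items into consecutive lists of at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


file_ids = ["file-%04d" % i for i in range(7)]  # placeholder IDs
chunks = split_into_chunks(file_ids, 3)
print([len(c) for c in chunks])                 # [3, 3, 1]

# Inside a job, each chunk would typically become one subjob, e.g.:
# subjobs = [dxpy.new_dxjob(fn_input={"myinput": chunk, "myinput2": "bar"},
#                           fn_name="myfunc")
#            for chunk in chunks]
```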

Call an entry point (function)

Calling an entry point will create a new job on another machine in the cloud that will be run with the specified input.

# Call "myfunc" with two inputs.
job_input = { "myinput": "foo", "myinput2": "bar" }
dxjob = dxpy.new_dxjob(fn_input=job_input, fn_name="myfunc")

# Schedule the job to start after a different job finishes
dxjob2 = dxpy.new_dxjob(fn_input=job_input, fn_name="myfunc",
                        depends_on=[dxjob])

# Request a particular instance type for your entry point; note that
# this allows you to adjust the instance type used based on factors
# known only at runtime, e.g. how much data needs to be processed.
dxjob3 = dxpy.new_dxjob(fn_input=job_input, fn_name="myfunc",
                        instance_type="mem2_hdd2_x4")

The job input options follow the same syntax and conventions as those used by dx run.

# Call "myfunc" with two inputs
dx-jobutil-new-job myfunc -imyinput:string=foo -imyinput2:string=bar

# Schedule the job to start after a different job finishes
dx-jobutil-new-job myfunc --depends-on job-xxxx -imyinput:int=32
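Each -i argument has the form name=value or name:class=value (written here without the leading -i). The sketch below parses a few of those forms into an input hash, to show how the class annotation changes the resulting JSON value. parse_input_arg is hypothetical and handles only int and file (as a DNAnexus link), keeping everything else as a string; dx itself supports more classes and also resolves paths and names.

```python
import json


def parse_input_arg(arg):
    """Parse 'name[:class]=value' into (name, JSON-ready value).

    Only int and file are converted; any other value stays a string.
    """
    key, _, value = arg.partition("=")
    name, _, cls = key.partition(":")
    if cls == "int":
        return name, int(value)
    if cls == "file":
        return name, {"$dnanexus_link": value}
    return name, value


args = ["myinput:string=foo", "myinput2:int=32", "reads:file=file-xxxx"]
job_input = dict(parse_input_arg(a) for a in args)
print(json.dumps(job_input, sort_keys=True))
# {"myinput": "foo", "myinput2": 32, "reads": {"$dnanexus_link": "file-xxxx"}}
```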

Use DNAnexus links

Use job-based object references (JBORs)

You can use job-based object references as part of the input to new jobs or as output of a job.

# Job input
dx-jobutil-new-job function_name -iother_input=job-xxxx:ref_output_field

# Referencing a member of an output array (3rd element)
dx-jobutil-new-job function_name -iother_input=job-xxxx:ref_output_field.2

# Job output
dx-jobutil-add-output my_output job-xxxx:ref_output_field

# Job input
input_hash = {
    "other_input":
        dxpy.DXJob("job-xxxx").get_output_ref("ref_output_field")
    }
dxpy.new_dxjob(fn_name="function_name",
               fn_input=input_hash)

# Job output (returned from an entry point function)
@dxpy.entry_point("main")
def main():
    output_hash = {
        "my_output":
            dxpy.DXJob("job-xxxx").get_output_ref("ref_output_field")
        }
    return output_hash
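A job-based object reference is itself a DNAnexus link whose value names a job and one of its output fields. The sketch below builds that structure by hand to make its shape visible. make_jbor is hypothetical, and the optional "index" key for selecting an array element is an assumption modeled on the job-xxxx:ref_output_field.2 form shown above; in real code, prefer DXJob.get_output_ref.

```python
def make_jbor(job_id, field, index=None):
    """Build a job-based object reference as a plain dict (hypothetical helper).

    The optional "index" key (selecting one element of an array output) is an
    assumption based on the job-xxxx:field.N command-line form.
    """
    ref = {"job": job_id, "field": field}
    if index is not None:
        ref["index"] = index
    return {"$dnanexus_link": ref}


print(make_jbor("job-xxxx", "ref_output_field"))
print(make_jbor("job-xxxx", "ref_output_field", index=2))
```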

Make API calls directly

See the Directory of API methods for a complete list of API methods that can be called and links to their specifications.

Wrapper functions are available for most of the API, and there is also a utility function for contacting the API server directly using the network and authentication configuration options found in the environment.
# The wrappers usually take in an object ID and an input hash
dxpy.api.file_describe(object_id="file-xxxx",
                       input_params={"properties": True})

# By default, *input_params* is the empty hash
dxpy.api.file_describe(object_id="file-xxxx")

# Some wrappers do not require an object ID
dxpy.api.file_new(input_params={"project": "project-xxxx"})
dxpy.api.system_find_apps(input_params={"name": "bwa"})

# Because app names belong to a global namespace, app API requests can
# be made using their names and optionally a version or tag
dxpy.api.app_describe(app_name_or_id="app-bwa")
dxpy.api.app_describe(app_name_or_id="app-xxxx")
dxpy.api.app_describe(app_name_or_id="app-bwa", alias="1.0.0")
dxpy.api.app_describe(app_name_or_id="app-bwa", alias="testing")

# The DXHTTPRequest method allows you to specify the API route and
# request body directly
dxpy.DXHTTPRequest(resource="/file/new", data={"project": "project-xxxx"})

The dx client has a command for making API calls directly.
# Usage: dx api class/ID method input_json (default input_json is '{}')
dx api file-xxxx describe '{"properties": true}'
dx api file new '{"project": "project-xxxx"}'
dx api system findApps
dx api app-bwa describe
dx api app-xxxx describe
dx api app-bwa/1.0.0 describe
dx api app-bwa/testing describe
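Each API call is an HTTP POST of a JSON body to a route such as /file-xxxx/describe or /file/new on the API server. The sketch below only builds (and never sends) such a request with Python's urllib, to show the pieces involved. build_api_request and the host api.example.com are hypothetical placeholders, and the "Bearer <token>" Authorization header is an assumption about how the token is presented; use dxpy.DXHTTPRequest or dx api in practice, since they also handle configuration and retries.

```python
import json
import urllib.request


def build_api_request(resource, data, host, port=443, protocol="https", token=None):
    """Build (but do not send) a POST request for an API route such as /file/new."""
    url = "%s://%s:%d%s" % (protocol, host, port, resource)
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = "Bearer " + token
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_api_request("/file-xxxx/describe", {"properties": True},
                        host="api.example.com", token="PLACEHOLDER")
print(req.get_method(), req.full_url)
# POST https://api.example.com:443/file-xxxx/describe
print(req.data)  # b'{"properties": true}'
```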

Accessing the environment

The following environment variables are always set inside a running job:

DX_SECURITY_CONTEXT
DX_APISERVER_HOST
DX_APISERVER_PORT
DX_JOB_ID
DX_PROJECT_CONTEXT_ID
DX_WORKSPACE_ID

# Additional variables for apps (but not applets)
DX_RESOURCES_ID
DX_PROJECT_CACHE_ID

dxpy reads these values when the library is imported and exposes them as the following module-level variables. For more details on their values, see the Execution Environment Reference.

dxpy.APISERVER_PROTOCOL
dxpy.APISERVER_HOST
dxpy.APISERVER_PORT
dxpy.JOB_ID
dxpy.WORKSPACE_ID
dxpy.PROJECT_CONTEXT_ID
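Because these values arrive as ordinary environment variables, code that does not use dxpy can also read them directly. The sketch below collects several of them from a mapping (os.environ in a real job) and extracts the token from DX_SECURITY_CONTEXT. read_dx_environment and all values in fake_env are hypothetical placeholders, and the assumption that DX_SECURITY_CONTEXT holds a JSON object with auth_token_type and auth_token keys should be checked against the Execution Environment Reference.

```python
import json
import os


def read_dx_environment(env=None):
    """Collect selected DX_* variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    context = json.loads(env.get("DX_SECURITY_CONTEXT", "{}"))
    return {
        "host": env.get("DX_APISERVER_HOST"),
        "port": env.get("DX_APISERVER_PORT"),
        "job_id": env.get("DX_JOB_ID"),
        "project_context_id": env.get("DX_PROJECT_CONTEXT_ID"),
        "workspace_id": env.get("DX_WORKSPACE_ID"),
        "token_type": context.get("auth_token_type"),
        "token": context.get("auth_token"),
    }


# Simulated job environment with placeholder values.
fake_env = {
    "DX_SECURITY_CONTEXT": '{"auth_token_type": "Bearer", "auth_token": "PLACEHOLDER"}',
    "DX_APISERVER_HOST": "api.example.com",
    "DX_APISERVER_PORT": "443",
    "DX_JOB_ID": "job-xxxx",
    "DX_PROJECT_CONTEXT_ID": "project-xxxx",
    "DX_WORKSPACE_ID": "container-xxxx",
}
print(read_dx_environment(fake_env)["token_type"])  # Bearer
```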

Last edited by Thanh-Ha Nguyen, 2016-06-24 23:25:08
