The most basic unit of execution on the DNAnexus platform is the
job. For every job, a fresh worker in the cloud is prepared, and
the job's code is downloaded to the machine and executed. The job may
make API calls, perform computations, or spawn other jobs. For more
details on the lifecycle of DNAnexus executions, see the
Job and Analysis Lifecycles page. In this section,
we cover the API for creating the different executables that can
be run to create these jobs, and how you can interact with the
On the DNAnexus platform, there are three different types of
executables that can be run.
Applets are data objects which can be thought of as the
fundamental building block of any executable on the platform. They
contain all the data and metadata required to run a job.
Apps wrap applets into a packaged product; they provide an additional
layer on top of applets to facilitate versioning and sharing the
executable with a wider audience. Like applets, they produce a
job when run, but they are not data objects and therefore do
not reside in projects.
Workflows are data objects which contain metadata for setting up a
pipeline of one or more apps or applets together as a single unit of
analysis. In contrast with the other types of executables, workflows
do not produce a single job but produce an analysis which itself
contains a list of all the jobs created for each step in the
Components of an Applet
Whenever a job runs on a worker, it is running an applet (or an app,
which is just a packaged applet). An applet can be described as
having the following components, some of which are optional:
- Input specification:
if provided, this specifies the named inputs that the executable
expects and any constraints that they may have. For example, an applet
may expect to be given a File for the input field called "reads".
- Output specification:
if provided, this specifies the named outputs that the executable
guarantees that it will provide. For example, an applet may provide an
output field called "mappings" that will be a File.
- Code: the code
that is actually run on the worker; this code must be in bash or
Python 2.7 and can consist of multiple functions, or entry points
for more information on writing entry points for the different
- Bundled files:
if an applet requires additional files or programs that a developer
has compiled (perhaps written in a different language such as C++),
these can be bundled with the applet and made available when it is
- Additional resource requirements: additional resources in the
form of additional computational power or memory, software packages,
additional network access, special project permissions, etc. can
also be requested (see the
sections for how to specify these)
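As a rough illustration of the components listed above, the sketch below assembles a request body of the kind that might be passed to the applet-creation API call. The field names `inputSpec`, `outputSpec`, and `runSpec` are assumed to match the applet API; the project ID, applet name, and code string are hypothetical placeholders, and a real request may require further fields (bundled files, resource requirements) that are omitted here.

```python
import json


def build_applet_request(project_id, name, code):
    """Assemble a dict describing an applet's main components.

    All values are illustrative; only the overall shape is the point.
    """
    return {
        "project": project_id,
        "name": name,
        # Named inputs the executable expects, with their classes.
        "inputSpec": [{"name": "reads", "class": "file"}],
        # Named outputs the executable guarantees to provide.
        "outputSpec": [{"name": "mappings", "class": "file"}],
        # The code run on the worker and the interpreter it is written for.
        "runSpec": {"interpreter": "bash", "code": code},
    }


# Hypothetical bash entry point named "main".
example_code = 'main() {\n  echo "mapping reads"\n}\n'
request = build_applet_request("project-xxxx", "example_mapper", example_code)
print(json.dumps(request, indent=2))
```

The input and output specifications mirror the "reads" and "mappings" examples from the list above.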
When an applet or app is run, a job is created, and the "main"
function, or entry point, of the applet's code is executed on
a worker node in the DNAnexus cloud. The code that is run must be
written in one of the languages supported by DNAnexus (currently,
these are bash and Python 2.7); however, that code can spawn
other Linux processes to perform its tasks, including compiled
The job runs in the
Execution Environment, a fully
capable, isolated Linux environment. The API server is always
available to the job.
Like data objects, jobs can be tagged with (and searched by their)
A job can launch other jobs by running an executable directly
(e.g. via the API calls
or /app-xxxx/run), or by
calling another entry point in its own executable (the API call
/job/new). The jobs
launched are called child jobs and the job that launched them is
called the parent job. The original job created when the user
runs an applet or app is called an origin job, while a job created
when a job runs an applet or app is called a master job. Jobs can
depend on each other or on data objects, so that a job does not start
until other jobs are finished or certain data objects are closed.
These dependencies can be implicit via
Job-based Object References
provided in the input, or via the
dependsOn field in the API call
when creating the new job.
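The two kinds of dependency described above can be sketched as plain request bodies. This is a minimal illustration, assuming that a job-based object reference is written as a `$dnanexus_link` hash containing `job` and `field` keys and that the job-creation call accepts `function`, `input`, and `dependsOn`; the helper names, the entry point name "postprocess", and all IDs are hypothetical.

```python
def job_output_ref(job_id, field):
    """Build a job-based object reference to a named output of another job."""
    return {"$dnanexus_link": {"job": job_id, "field": field}}


def build_subjob_request(function, job_input, depends_on=None):
    """Build a request body for launching another entry point as a child job."""
    request = {"function": function, "input": job_input}
    if depends_on:
        # Explicit dependencies: IDs of jobs or data objects to wait for.
        request["dependsOn"] = list(depends_on)
    return request


# Implicit dependency: the input refers to the "mappings" output of job-aaaa,
# so the new job cannot start until that output is available.
implicit = build_subjob_request(
    "postprocess", {"mappings": job_output_ref("job-aaaa", "mappings")}
)

# Explicit dependency: no data is passed, but the new job still waits for job-bbbb.
explicit = build_subjob_request("postprocess", {}, depends_on=["job-bbbb"])

print(implicit)
print(explicit)
```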
The Project Context and Temporary Workspace
An executable is always launched from a particular project, and any
jobs descended from the resulting origin job inherit this
attribute, which we call the project context. This project will be
charged for the execution and data usage of all these jobs. It is also
the project into which any data objects output by the origin job will
be placed. For these reasons, the user running the executable must
have CONTRIBUTE access to the project.
For applets, jobs will receive VIEW permission to the project by
default. For apps, jobs will not automatically receive any
permissions into the project. An applet or app can, however, specify
that it requires certain permissions into the running user's projects
in order to function correctly.
Jobs running as part of the same executable (either an origin or
master job and all its descendants created from calling
always share the same temporary workspace, which is a container
for objects that the executable can read from and write to on the
platform. This is distinct from the local disk that each job will
receive on its machine; jobs must upload any data to the platform in
order to share it with other jobs or provide output. Jobs' temporary
workspaces are containers for data which behave like projects, except
they cannot be explicitly created or destroyed, and their permissions
are fixed. See Data Containers for more generic
details about containers and
Containers for Execution for specifics
on the types of containers involved in app and applet execution.
Jobs always receive CONTRIBUTE permission to their temporary
workspace. The data objects given as inputs to an executable (and all
hidden objects that they link to) are cloned into the workspace before
the executable begins running. If any objects given as inputs are
not from the project context but from other projects, the user or job
that launches an executable must have VIEW access to those projects
and those projects must not have the RESTRICTED flag set. Upon
completion of the job, objects given as outputs are cloned into the
project from which the executable was launched, and the workspace is
Example: Inputs from Different Projects
Example: Chained Execution
If an applet, while running, launches another applet, the project context is carried forward, but a new workspace is created for the launched applet. The launched applet has VIEW access to the original project and CONTRIBUTE access to its own workspace, but no access to the workspace of the applet that launched it. When the launched applet is done, any objects output by its job are cloned back into the workspace of the parent applet.
In the figure below, Applet1 calls Applet2 and receives its output (Object C) in its workspace. Because Applet1 was launched with the project context of Project A, both jobs have VIEW access (the black arrows) into Project A. The jobs of Applet1, however, do not have any permissions into the temporary workspace of Applet2's jobs, and vice versa.
Other than the original project (to which applet jobs have VIEW access) and the intermediate chain of workspaces (to which jobs have CONTRIBUTE access), jobs do not have access to any other projects except for public projects.
Since the system needs to clone objects, a job will start only once all of its data object inputs are in the closed state. Likewise, if an origin or master job outputs any objects, they must be in either the "closing" or "closed" state. Even after all jobs have finished running, the origin or master job remains in the "waiting_on_output" state until all output objects have transitioned to the "closed" state.
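The gating rules in the preceding paragraph can be expressed as a small model. This is only a sketch of the logic as described, not platform code: the function names are hypothetical, object states are represented as plain strings, and the "done" state returned once all outputs are closed is an assumption not stated in this section.

```python
def can_start(input_states):
    """A job may start only when every data object input is closed."""
    return all(state == "closed" for state in input_states)


def outputs_acceptable(output_states):
    """Outputs of an origin or master job must be closing or closed."""
    return all(state in ("closing", "closed") for state in output_states)


def state_after_running(output_states):
    """State of an origin or master job once all of its jobs have finished."""
    if not outputs_acceptable(output_states):
        raise ValueError("outputs must be in the closing or closed state")
    if all(state == "closed" for state in output_states):
        return "done"  # assumed name for the finished state
    return "waiting_on_output"


print(can_start(["closed", "open"]))
print(state_after_running(["closed", "closing"]))
print(state_after_running(["closed", "closed"]))
```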
Note that if Applet2 is an app and not an applet, its jobs will again have CONTRIBUTE permissions to their own temporary workspace, but they will have no access into the project context. In the case where an app calls an applet, the app's permissions are the maximum that the applet can be run with. Thus the applet will have no access into the project context either.
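The default permission rules for chained execution can be summarized in a short model. It is a sketch under stated assumptions: the function and constant names are hypothetical, only the access levels mentioned in this section are ranked, and any extra permissions that an applet or app explicitly requests are ignored.

```python
# Access levels mentioned in the text, ordered from weakest to strongest.
RANK = {"NONE": 0, "VIEW": 1, "CONTRIBUTE": 2}

# Default access into the project context, by executable type.
DEFAULT_PROJECT_ACCESS = {"applet": "VIEW", "app": "NONE"}


def project_context_access(kind, launcher_access=None):
    """Access a job gets into the project context.

    kind is "applet" or "app". launcher_access is the project access of the
    job that launched it, or None when a user launched it directly.
    """
    default = DEFAULT_PROJECT_ACCESS[kind]
    if launcher_access is None:
        return default
    # The launching job's access is the maximum the launched job can have.
    return min(default, launcher_access, key=RANK.__getitem__)


def workspace_access():
    """Jobs always receive CONTRIBUTE access to their own temporary workspace."""
    return "CONTRIBUTE"


app_access = project_context_access("app")
print(project_context_access("applet"))              # applet run by a user
print(app_access)                                    # app run by a user
print(project_context_access("applet", app_access))  # applet called by that app
```

Capping the launched job's access at the launcher's level reproduces the case above in which an applet called by an app ends up with no access into the project context.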