This page describes the various states that a
job or analysis may
go through in its lifecycle.
Jobs can be monitored using the Monitor tab
on the website, or with the corresponding
command in the CLI.
For reference, we use the following terminology when discussing jobs:
- execution: an analysis or job
- root execution: the analysis or job that was created
immediately as a result of a user making the API call to run a
workflow, app, or applet
- execution tree: the set of all jobs and/or analyses that
were created as a result of running a root execution
- analysis: an analysis is created when a workflow is run; it
consists of some number of stages, each of which is
represented by either another analysis (if running a workflow) or
a job (if running an app or applet)
- parent analysis: each analysis is the parent analysis to
each of the jobs that were created to run its stages
- job: a job is created when an app or applet is run, or
when a job spawns another job; a job is always a unit of execution
that is run on a worker in the cloud
- origin job: the job created when an app or applet is run by
either a user or an analysis; this job always executes the
"main" entry point
- master job: the job created when an app or applet is run by
a user, job, or analysis; this job always executes the "main"
entry point; all origin jobs are also master jobs
- parent job: the job which creates another job via the /job/new API call
- subjob or child job: a job created by a job via the
/job/new API call; this job executes the entry point specified
in the API call which created it and runs the same executable as
its parent job
- job tree: the set of all jobs which share the same origin job
- job-based object reference: a hash containing a job ID and an
output field name; this hash is given in the input or output of a
job and will be replaced with the specified job's output field by
the DNAnexus platform once the specified job has transitioned to the
done state
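As a concrete illustration, a job-based object reference can be modeled as a small hash naming a job and one of its output fields. The sketch below is hypothetical (the job IDs, field names, and resolver function are illustrative, not the platform's implementation); it shows the key property that the reference resolves only once the referenced job is done:

```python
# Sketch: resolving a job-based object reference (JBOR).
# The hash names a job and one of its output fields; once that job
# reaches the "done" state, the reference is replaced by the field's
# value. All IDs and outputs below are made up for illustration.

def resolve_jbor(ref, jobs):
    """Return the referenced output field, or None if not yet resolvable.

    ref  -- {"job": <job id>, "field": <output field name>}
    jobs -- mapping of job id -> {"state": ..., "output": {...}}
    """
    job = jobs[ref["job"]]
    if job["state"] != "done":       # referenced job not finished yet
        return None
    return job["output"][ref["field"]]

jobs = {
    "job-0001": {"state": "done", "output": {"reads": "file-aaaa"}},
    "job-0002": {"state": "running", "output": {}},
}
```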
Example Execution Tree
In the following example, we have a workflow that has two stages, one
of which is an applet, and the other of which is an app.
If the workflow is run, it will generate an analysis with an attached
workspace for storing intermediate output from its stages. Jobs are
also created to run the two stages. These jobs in turn can spawn more
jobs, either to run another function in the same executable or to run
an executable. The blue labels indicate which jobs or analyses can be
described using a particular term (as defined above).
Note that the subjob or child job of stage 1's origin job shares
the same temporary workspace as its parent job. Any calls to run a
new applet or app (using the API methods /applet-xxxx/run or
/app-xxxx/run) will launch a master job that has its own separate
workspace, and (by default) no visibility into its parent job's
workspace.
Every successful job goes through at least the following four states:
- idle: initial state of every new job, regardless of what API call was made to create it.
- runnable: the job's inputs are ready, and it is not waiting for any other job to finish or data object to finish closing.
- running: the job has been assigned to and is being run on a worker in the cloud.
- done: the job has completed, and it is not waiting for any subjob to finish or data object to finish closing. This is a terminal state, so no job will transition to a different state after reaching done.
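The happy path above can be sketched as a simple ordered sequence (a minimal model, not platform code), with done as the terminal stop:

```python
# Sketch of the successful job lifecycle: every successful job passes
# through these four states in order, and "done" is terminal.

SUCCESS_PATH = ["idle", "runnable", "running", "done"]

def next_state(state):
    """Next state on the happy path, or None once the job is done."""
    if state == "done":              # terminal: no further transitions
        return None
    return SUCCESS_PATH[SUCCESS_PATH.index(state) + 1]
```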
Jobs may also pass through the following transitional states as part of more complicated execution patterns:
- waiting_on_input (between idle and runnable): a job enters and stays in this state if at least one of the following is true:
- it has an unresolved job-based object reference in its input
- it has a data object input that cannot be cloned yet because it is not in the closed state or a linked hidden object is not in the closed state
- it was created to wait on a list of jobs or data objects that must enter the done or closed states, respectively (see the
dependsOn field of any API call that creates a job); linked hidden objects are implicitly included in this list
- waiting_on_output (between running and done): a job enters and stays in this state if at least one of the following is true:
- it has a descendant job that has not been moved to the done state
- it has an unresolved job-based object reference in its output
- it is an origin or master job which has a data object (or linked hidden data object) output in the closing state
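The waiting_on_input conditions can be summarized as a readiness check. The sketch below is a simplified model: dependsOn is the API field named above, but the inputObjects bookkeeping and all IDs are made up for illustration:

```python
# Sketch: a job may leave waiting_on_input only when everything in
# dependsOn is done (jobs) or closed (data objects), and every data
# object input is closed so it can be cloned. Simplified stand-ins.

def ready_to_run(job, jobs, objects):
    """Return True if the job's inputs are ready (runnable)."""
    # everything in dependsOn must be done (jobs) or closed (objects)
    for dep in job.get("dependsOn", []):
        if dep in jobs and jobs[dep]["state"] != "done":
            return False
        if dep in objects and objects[dep]["state"] != "closed":
            return False
    # data object inputs cannot be cloned until they are closed
    for obj_id in job.get("inputObjects", []):
        if objects[obj_id]["state"] != "closed":
            return False
    return True

jobs = {"job-a": {"state": "done"}}
objects = {"file-x": {"state": "closed"}, "file-y": {"state": "open"}}
```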
There are two terminal job states other than the done state, terminated and failed, and a job can enter either of these states from any other state except another terminal state.
The terminated state is entered when a user has requested that the job (or another job that shares the same origin job) be terminated. For all terminated jobs, the
failureReason in their describe hash will be set to "Terminated", and the
failureMessage will indicate the user responsible for terminating the job. Only the user who launched the job or administrators of the job's project context can terminate the job.
Jobs can fail for a variety of reasons, and once a job fails, this triggers
failure for all other jobs that share the same origin job. If an unrelated
job (i.e. one that is not in the same job tree) has a job-based object reference to
or otherwise depends on a failed job, then it will also fail. See
Errors for the types of errors that may occur (reported in the
failureReason field and visible in the Monitor tab).
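The failure-propagation rule can be sketched as a small graph computation. The origin/dependsOn bookkeeping and all job IDs below are illustrative, not the platform's data model:

```python
# Sketch: when a job fails, every job sharing its origin job fails
# with it, and jobs in other trees that depend on a failed job fail
# too (transitively). IDs are made up for illustration.

def propagate_failure(failed_job, jobs):
    """Return the set of job IDs that fail as a consequence."""
    origin = jobs[failed_job]["origin"]
    # the whole job tree of the failed job goes down
    doomed = {j for j, info in jobs.items() if info["origin"] == origin}
    # unrelated jobs depending on any doomed job fail as well
    changed = True
    while changed:
        changed = False
        for j, info in jobs.items():
            if j not in doomed and doomed & set(info.get("dependsOn", [])):
                doomed.add(j)
                changed = True
    return doomed

jobs = {
    "job-1": {"origin": "job-1"},
    "job-2": {"origin": "job-1"},
    "job-3": {"origin": "job-3", "dependsOn": ["job-2"]},
    "job-4": {"origin": "job-4"},
}
```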
For unsuccessful jobs, there are two more states that a job may enter between the running state and its eventual terminal state of terminated or failed; unsuccessful jobs starting in any other non-terminal state transition directly to the appropriate terminal state.
- terminating: the transitional state when the worker in the cloud
has begun terminating the job and tearing down the execution
environment. Once the worker in the cloud has reported that it has
terminated the job or otherwise becomes unresponsive, the job
will transition to its terminal state.
- debug_hold: a job has been run with debugging options and has
failed for an applicable reason, and is being held for debugging by
the user. See
Connecting to Jobs for
more information on how to trigger this.
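One way to summarize all of the job states above is an allowed-transition table. This is a hedged reading of the text, not an official state machine; in particular, it encodes "failed and terminated are reachable from any non-terminal state" as a special case rather than listing them per state:

```python
# Sketch: allowed forward transitions between job states, as described
# in the text. Terminal states admit no further transitions.

FORWARD = {
    "idle": {"runnable", "waiting_on_input"},
    "waiting_on_input": {"runnable"},
    "runnable": {"running"},
    "running": {"waiting_on_output", "done", "terminating", "debug_hold"},
    "waiting_on_output": {"done"},
    "terminating": set(),
    "debug_hold": set(),
}
TERMINAL = {"done", "failed", "terminated"}

def can_transition(src, dst):
    if src in TERMINAL:
        return False                 # terminal states are final
    if dst in {"failed", "terminated"}:
        return True                  # reachable from any non-terminal state
    return dst in FORWARD.get(src, set())
```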
All analyses start in the state in_progress, and like jobs, will
end up in one of the terminal states done, failed, or
terminated. The following diagram shows the state transitions for
all successful analyses.
If an analysis is unsuccessful, it may transition through one or more
intermediate states before it reaches its terminal state:
- partially_failed: this state indicates that one or more stages
in the analysis have not finished successfully, and there is at
least one stage which has not transitioned to a terminal state. In
this state, some stages may have already finished successfully (and
entered the done state), and the remaining stages will also be
allowed to finish successfully if they can.
- terminating: an analysis may enter this state either via an API
call in which a user has terminated the analysis, or because some
failure condition has caused the analysis to terminate its
remaining stages. This may happen if the
executionPolicy for the
analysis (or a stage of an analysis) had the
onNonRestartableFailure value set to "failAllStages".
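The failAllStages behavior can be sketched as follows. The onNonRestartableFailure value is the API setting named above, but the stage bookkeeping is illustrative:

```python
# Sketch: when a stage fails non-restartably under "failAllStages",
# the analysis terminates every stage that has not already finished;
# otherwise, unfinished stages are allowed to run to completion
# (yielding a partially_failed analysis).

def on_stage_failure(stages, failed, policy):
    """Return the new state of each stage after `failed` fails."""
    result = dict(stages)
    result[failed] = "failed"
    if policy.get("onNonRestartableFailure") == "failAllStages":
        for name, state in result.items():
            if state not in {"done", "failed"}:
                result[name] = "terminated"
    return result
```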
In general, compute and data storage costs due to jobs that end up
failing because of user error (e.g. InputError, OutputError) and
terminated jobs are still charged to the project in which the jobs
were run. For internal errors of the DNAnexus platform, such costs
will not be billed.
The costs for each stage in an analysis are determined independently.
If the first stage finishes successfully while a second stage fails
with a system error, the first stage will still be billed, and the
second will not.
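Under these rules, per-stage billing might be modeled like this. The outcome labels reuse the error names mentioned above, but the categorization itself is an illustrative simplification:

```python
# Sketch: each stage is billed independently. Successful stages,
# user-error failures (e.g. InputError, OutputError), and terminated
# stages are charged; internal platform errors are not.

BILLED_OUTCOMES = {"done", "InputError", "OutputError", "Terminated"}

def billed_stages(stage_outcomes):
    """Return the stages whose compute costs are charged to the project."""
    return [stage for stage, outcome in stage_outcomes.items()
            if outcome in BILLED_OUTCOMES]
```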