Running Analyses

The most basic unit of execution on the DNAnexus platform is the job. For every job, a fresh worker in the cloud is prepared, and the job's code is downloaded to that machine and executed. The job may make API calls, perform computations, or spawn other jobs. For more details on the lifecycle of DNAnexus executions, see the Job and Analysis Lifecycles page. In this section, we cover the API for creating the different executables that can be run to produce these jobs, and how you can interact with their results.

The Executables

On the DNAnexus platform, there are three different types of executables that can be run.

Applets

Applets are data objects which can be thought of as the fundamental building block of any executable on the platform. They contain all the data and metadata required to run a job.

Apps

Apps wrap applets into a packaged product; they provide an additional layer on top of applets to facilitate versioning and sharing the executable with a wider audience. Like applets, they produce a job when run, but they are not data objects and therefore do not reside in projects.

Workflows

Workflows are data objects that contain the metadata for linking one or more apps or applets into a pipeline that runs as a single unit of analysis. Unlike the other types of executables, a workflow does not produce a single job when run. Instead, it produces an analysis, which contains all the jobs created for the steps of the pipeline.
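To make the distinction concrete, here is a toy Python model of what each executable type produces when run. It is a sketch under simplifying assumptions: the classes `Applet`, `App`, `Workflow`, `Job`, `Analysis` and the function `run` are hypothetical illustrations, not part of any DNAnexus SDK.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical toy model (not a DNAnexus SDK) of the three executable types.

@dataclass
class Applet:
    name: str
    is_data_object: bool = True   # applets reside in projects

@dataclass
class App:
    name: str
    is_data_object: bool = False  # apps are not data objects

@dataclass
class Workflow:
    name: str
    steps: List[Union[Applet, App]]
    is_data_object: bool = True   # workflows reside in projects

@dataclass
class Job:
    executable: str

@dataclass
class Analysis:
    workflow: str
    jobs: List[Job] = field(default_factory=list)

def run(executable: Union[Applet, App, Workflow]) -> Union[Job, Analysis]:
    """Applets and apps produce a single job; a workflow produces an
    analysis that lists the job created for each of its steps."""
    if isinstance(executable, (Applet, App)):
        return Job(executable.name)
    if isinstance(executable, Workflow):
        return Analysis(executable.name, [Job(step.name) for step in executable.steps])
    raise TypeError(f"not an executable: {executable!r}")

analysis = run(Workflow("pipeline", [Applet("mapper"), App("caller")]))
print([job.executable for job in analysis.jobs])  # ['mapper', 'caller']
```

The only point of the model is the return type: one job for an applet or app, one analysis (holding a job per step) for a workflow.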

Components of an Applet

Whenever a job runs on a worker, it is running an applet (or an app, which is just a packaged applet). An applet can be described as having the following components, some of which are optional:

  • Input specification: if provided, this specifies the named inputs that the executable expects and any constraints that they may have. For example, an applet may expect to be given a File for the input field called "reads".
  • Output specification: if provided, this specifies the named outputs that the executable guarantees that it will provide. For example, an applet may provide an output field called "mappings" that will be a File.
  • Code: the code that is actually run on the worker. This code must be written in bash or Python 2.7 and can consist of multiple functions, or entry points (see Code Interpreters for more information on writing entry points in the different languages).
  • Bundled files: if an applet requires additional files or programs that a developer has compiled (perhaps written in a different language such as C++), these can be bundled with the applet and made available when it is run.
  • Additional resource requirements: extra computational power or memory, software packages, network access, special project permissions, etc. can also be requested (see the Run Specification and Access Requirements sections for how to specify these).
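The components above can be pictured as a simple description plus a check that supplied inputs match the input specification. The dictionary layout and the `check_inputs` helper below are hypothetical and only loosely mirror the listed components; they are not the exact DNAnexus metadata format. The "reads" and "mappings" fields come from the examples above.

```python
from typing import Any, Dict, List

# Hypothetical applet description; keys mirror the components listed above,
# not an exact DNAnexus file format.
APPLET: Dict[str, Any] = {
    "input_spec": [{"name": "reads", "class": "file"}],
    "output_spec": [{"name": "mappings", "class": "file"}],
    "code": {"interpreter": "python2.7", "entry_points": ["main"]},
    "bundled_files": ["bin/aligner"],          # e.g. a compiled C++ program
    "resources": {"memory_gb": 16, "network": False},
}

def check_inputs(applet: Dict[str, Any], inputs: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the inputs satisfy
    the applet's input specification."""
    problems = []
    spec = {param["name"]: param for param in applet["input_spec"]}
    for name, param in spec.items():
        if name not in inputs:
            if not param.get("optional", False):
                problems.append(f"missing required input {name!r}")
        elif inputs[name].get("class") != param["class"]:
            problems.append(f"input {name!r} must be a {param['class']}")
    for name in inputs:
        if name not in spec:
            problems.append(f"unexpected input {name!r}")
    return problems

print(check_inputs(APPLET, {"reads": {"class": "file", "id": "file-xxxx"}}))  # []
```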

Jobs

When an applet or app is run, a job is created, and the "main" function, or entry point, of the applet's code is executed on a worker node in the DNAnexus cloud. The code that is run must be written in one of the languages supported by DNAnexus (currently, these are bash and Python 2.7); however, that code can spawn other Linux processes to perform its tasks, including compiled programs.

The job runs in the Execution Environment, a fully capable, isolated Linux environment. The API server is always available to the job.

Like data objects, jobs can be tagged with metadata and searched by that metadata.

Job Hierarchy

A job can launch other jobs either by running an executable directly (e.g. via the API calls /applet-xxxx/run or /app-xxxx/run) or by calling another entry point in its own executable (via the API call /job/new). The launched jobs are called child jobs, and the job that launched them is called the parent job. The job created when a user runs an applet or app is called an origin job, while a job created when a job runs an applet or app is called a master job.

Jobs can depend on other jobs or on data objects, so that a job does not start until those jobs have finished or those data objects have been closed. These dependencies can be implicit, through Job-based Object References provided in the input, or explicit, through the dependsOn field of the API call that creates the new job.
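A toy Python model can make the hierarchy terms and the start condition concrete. Everything here is a hypothetical simplification: `launched_via="run"` stands for /applet-xxxx/run or /app-xxxx/run, `launched_via="new"` stands for /job/new, and `can_start` folds implicit and explicit (dependsOn) dependencies into one list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical model of the job hierarchy; not a DNAnexus API.

@dataclass
class Job:
    name: str
    parent: Optional["Job"] = None
    launched_via: str = "run"        # "run": applet/app run; "new": /job/new
    depends_on: List["Job"] = field(default_factory=list)
    depends_on_objects: List[str] = field(default_factory=list)
    finished: bool = False

    @property
    def is_origin(self) -> bool:
        # Created when a user runs an applet or app.
        return self.parent is None

    @property
    def is_child(self) -> bool:
        # Any job launched by another job (its parent).
        return self.parent is not None

    @property
    def is_master(self) -> bool:
        # Created when a job runs an applet or app.
        return self.is_child and self.launched_via == "run"

def can_start(job: Job, closed_objects: Set[str]) -> bool:
    """A job waits until every job it depends on has finished and every
    data object it depends on is closed."""
    return all(dep.finished for dep in job.depends_on) and all(
        obj in closed_objects for obj in job.depends_on_objects
    )

origin = Job("origin")
helper = Job("helper", parent=origin, launched_via="new")
master = Job("master", parent=helper, depends_on=[helper], depends_on_objects=["file-1"])
print(master.is_master, can_start(master, {"file-1"}))  # True False
```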

The Project Context and Temporary Workspace

An executable is always launched from a particular project, and all jobs descended from the resulting origin job inherit this project, which we call the project context. This project is charged for the execution and data usage of all these jobs, and it is also the project into which any data objects output by the origin job are placed. For these reasons, the user running the executable must have CONTRIBUTE access to the project.

Jobs of an applet receive VIEW permission to the project context by default; jobs of an app do not automatically receive any permissions to it. An applet or app can, however, specify in its access requirements that it needs certain permissions to the running user's projects in order to function correctly.
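The launch rule and default permissions above can be summarized in a few lines. This is a hypothetical sketch: the permission ordering NONE < VIEW < CONTRIBUTE and the `requested` parameter (standing in for an access requirement) are assumptions, and requested access is modeled as granted as-is.

```python
from typing import Optional

# Assumed ordering of permission levels, lowest first (illustrative only).
LEVELS = ("NONE", "VIEW", "CONTRIBUTE")

def can_launch(user_access: str) -> bool:
    """The project context is billed and receives outputs, so the
    launching user needs CONTRIBUTE access to it."""
    return LEVELS.index(user_access) >= LEVELS.index("CONTRIBUTE")

def job_project_access(kind: str, requested: Optional[str] = None) -> str:
    """Access a job receives to its project context: VIEW by default for
    applets, nothing for apps, raised by any access requirement."""
    default = "VIEW" if kind == "applet" else "NONE"
    if requested is None:
        return default
    return max(default, requested, key=LEVELS.index)

print(can_launch("VIEW"), job_project_access("app"))  # False NONE
```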

Jobs running as part of the same executable (either an origin or master job and all its descendants created from calling /job/new) will always share the same temporary workspace, which is a container for objects that the executable can read from and write to on the platform. This is distinct from the local disk that each job will receive on its machine; jobs must upload any data to the platform in order to share it with other jobs or provide output. Jobs' temporary workspaces are containers for data which behave like projects, except they cannot be explicitly created or destroyed, and their permissions are fixed. See Data Containers for more generic details about containers and Containers for Execution for specifics on the types of containers involved in app and applet execution.
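Which jobs share a temporary workspace can be computed by walking up the /job/new links to the owning origin or master job. The `Job` class and the helper names below are hypothetical; this is an illustrative model, not platform code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model: "run" = job created by running an applet/app,
# "new" = job created by calling /job/new.

@dataclass
class Job:
    name: str
    parent: Optional["Job"] = None
    launched_via: str = "run"

def workspace_owner(job: Job) -> Job:
    """Return the origin or master job whose temporary workspace `job` uses."""
    while job.launched_via == "new" and job.parent is not None:
        job = job.parent
    return job

def share_workspace(a: Job, b: Job) -> bool:
    return workspace_owner(a) is workspace_owner(b)

origin = Job("origin")
step = Job("step", origin, "new")        # same executable: same workspace
master = Job("master", step, "run")      # runs an applet: new workspace
print(share_workspace(origin, step), share_workspace(step, master))  # True False
```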

Jobs always receive CONTRIBUTE permission to their temporary workspace. The data objects given as inputs to an executable (and all hidden objects that they link to) are cloned into the workspace before the executable begins running. If any input objects come from projects other than the project context, the user or job launching the executable must have VIEW access to those projects, and those projects must not have the RESTRICTED flag set. When the job completes, objects given as outputs are cloned into the project from which the executable was launched, and the workspace is destroyed.
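The cross-project rule for inputs can be expressed as a pre-launch check. The function name, the ID strings, and the NONE/VIEW/CONTRIBUTE ordering are hypothetical; the check only encodes the two conditions stated above.

```python
from typing import Dict, List, Set, Tuple

# Assumed ordering of permission levels, lowest first (illustrative only).
LEVELS = ("NONE", "VIEW", "CONTRIBUTE")

def input_access_problems(
    project_context: str,
    inputs: List[Tuple[str, str]],        # (object id, id of project holding it)
    launcher_access: Dict[str, str],      # project id -> launcher's access level
    restricted_projects: Set[str],
) -> List[str]:
    """List reasons why the inputs could not be cloned into the workspace."""
    problems = []
    for obj, project in inputs:
        if project == project_context:
            continue  # the extra conditions apply only to other projects
        level = launcher_access.get(project, "NONE")
        if LEVELS.index(level) < LEVELS.index("VIEW"):
            problems.append(f"{obj}: no VIEW access to {project}")
        if project in restricted_projects:
            problems.append(f"{obj}: {project} is RESTRICTED")
    return problems

print(input_access_problems(
    "project-A",
    [("file-1", "project-A"), ("file-2", "project-B")],
    {"project-B": "VIEW"},
    {"project-B"},
))  # ['file-2: project-B is RESTRICTED']
```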

Example: Inputs from Different Projects

Example: Chained Execution

If an applet, while running, launches another applet, the project context is carried forward, but a new workspace is created for the launched applet. The launched applet has VIEW access to the original project and CONTRIBUTE access to its own workspace, but no access to the workspace of the applet that launched it. When the launched applet is done, any objects output by its job are cloned back into the workspace of the parent applet.

In the figure below, Applet1 calls Applet2 and receives its output (Object C) in its workspace. Because Applet1 was launched with Project A as its project context, both jobs have VIEW access (the black arrows) to Project A. The jobs of Applet1, however, do not have any permissions to the temporary workspace of Applet2's jobs, and vice versa.

Jobs have no access to any projects other than the original project (to which applet jobs have VIEW access), the intermediate chain of workspaces (to which jobs have CONTRIBUTE access), and public projects.
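The Applet1/Applet2 example can be replayed in a small model. It is a hypothetical sketch (class, method, and ID names are invented for illustration) that encodes the access rules of the figure: VIEW on the project context, CONTRIBUTE on a job's own workspace, no access to another job's workspace, and outputs cloned back to the launcher. Modeling public projects as VIEW is an assumption.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

# Hypothetical model of chained applet execution; not a DNAnexus API.

@dataclass
class AppletJob:
    name: str
    project_context: str
    workspace_id: str
    workspace: Set[str] = field(default_factory=set)  # objects in its workspace

    def access_to(self, container: str, public: FrozenSet[str] = frozenset()) -> str:
        if container == self.workspace_id:
            return "CONTRIBUTE"
        if container == self.project_context or container in public:
            return "VIEW"
        return "NONE"  # including any other job's workspace

    def launch_applet(self, name: str, workspace_id: str) -> "AppletJob":
        # The project context carries forward; the workspace is new.
        return AppletJob(name, self.project_context, workspace_id)

def finish(child: AppletJob, parent: AppletJob, outputs: List[str]) -> None:
    """Clone the child's outputs into the parent's workspace, then
    discard the child's workspace."""
    missing = set(outputs) - child.workspace
    if missing:
        raise ValueError(f"outputs not in child workspace: {sorted(missing)}")
    parent.workspace.update(outputs)
    child.workspace.clear()

applet1 = AppletJob("Applet1", "project-A", "workspace-1")
applet2 = applet1.launch_applet("Applet2", "workspace-2")
applet2.workspace.add("object-C")
finish(applet2, applet1, ["object-C"])
print(applet1.workspace, applet1.access_to("workspace-2"))  # {'object-C'} NONE
```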

Since the system needs to clone objects, a job starts only once all of its data object inputs are in the "closed" state. Likewise, any objects output by an origin or master job must be in either the "closing" or "closed" state. Even after all jobs have finished running, the origin or master job remains in the "waiting_on_output" state until all output objects have transitioned to the "closed" state.
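The state rules above can be sketched as two small functions. This is a simplified, hypothetical model: only "closing", "closed", and "waiting_on_output" come from the text; rejecting any other output state and naming the final job state "done" are assumptions.

```python
from typing import Dict

def can_start(input_states: Dict[str, str]) -> bool:
    """Inputs must be closed before they can be cloned into the workspace."""
    return all(state == "closed" for state in input_states.values())

def state_after_jobs_finish(output_states: Dict[str, str]) -> str:
    """State of an origin or master job once all jobs have finished running,
    given the states of its output objects."""
    invalid = sorted(o for o, s in output_states.items() if s not in ("closing", "closed"))
    if invalid:
        raise ValueError(f"outputs must be closing or closed: {invalid}")
    if all(state == "closed" for state in output_states.values()):
        return "done"  # assumed name for the final state
    return "waiting_on_output"

print(state_after_jobs_finish({"file-1": "closed", "file-2": "closing"}))  # waiting_on_output
```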

Note that if Applet2 is an app rather than an applet, its jobs again have CONTRIBUTE permission to their own temporary workspace, but no access to the project context. When an app calls an applet, the app's permissions are the maximum that the applet can be run with. Since the app itself has no access to the project context by default, the applet it calls has no access to the project context either.
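The capping rule for an app that calls an applet can be sketched as taking the lower of two permission levels. The function, its parameters, and the NONE < VIEW < CONTRIBUTE ordering are hypothetical; the cap is applied only in the app-calls-applet case described above.

```python
from typing import Optional

# Assumed ordering of permission levels, lowest first (illustrative only).
LEVELS = ("NONE", "VIEW", "CONTRIBUTE")

def project_access(
    callee_kind: str,
    caller_kind: Optional[str] = None,   # None when a user launches directly
    caller_access: str = "NONE",         # caller's access to the project context
    requested: str = "NONE",             # callee's access requirement
) -> str:
    """Access a newly launched job receives to the project context."""
    default = "VIEW" if callee_kind == "applet" else "NONE"
    access = max(default, requested, key=LEVELS.index)
    if caller_kind == "app" and callee_kind == "applet":
        # An applet called by an app runs with at most the app's permissions.
        access = min(access, caller_access, key=LEVELS.index)
    return access

print(project_access("applet", caller_kind="app"))  # NONE
```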

Last edited by Thanh-Ha Nguyen, 2016-06-24 20:03:40
