In this tutorial we look at the flow of creating and managing workflows:
from a local, project-based workflow to a versioned, global workflow that
can be published to authorized users and listed in the DNAnexus tools library.
Workflows created on the DNAnexus platform have an ID of the form workflow-xxxx and are data
objects stored in a project. Because they are stored in a project, it is easy to share
the workflow with other users by adding them to the project. Such "local" workflows
are also well suited to fast iteration during development and testing, since we can
always delete the workflow and create a new one in its place.
However, for users and organizations collaborating on multiple private or public
projects, these local workflows may be less suitable for long-term maintenance
and collaboration. For these use cases, DNAnexus provides an additional
suite of release management features. For example,
an organization administrator or workflow developer may want to restrict execution
of a particular workflow to specific users and organizations, or they may want
to allow execution of this workflow across cloud regions and providers. These
features are applied to objects we call "global workflows".
A global workflow is an executable that can be versioned and published to other
users. It is implemented as a wrapper around an existing, project-based workflow.
Examples of when it is useful to make a local workflow global:
- version management: it is easy to maintain the source code of the workflow across
- provenance: global workflows maintain an explicit history of changes for the workflow
name and its associated ID. The user can always revert to a previous version and
versions are immutable
- sharing: a global workflow can be maintained by a set of developers across projects
and organizations and this workflow can be shared to a separate set of users who
cannot modify the workflow but can run it
- multi-region support: it is convenient to maintain one executable that can be
run across multiple regions and cloud providers
- immutability guarantees: it is not possible to change or override an existing version
of a given global workflow
- ability to list the workflow in the DNAnexus tools library: the authorized users
and developers can then discover global workflows shared with them
Step 1: Build a local workflow
The easiest way to build a workflow is by using the web interface, as detailed
in the Building Workflows in the UI guide. We can also use
the DNAnexus SDK, dx-toolkit. For details on how to create a workflow
using dx-toolkit, see the
Intro to Building Workflows guide.
In both tutorials we built a similar workflow named "BWA MEM + GATK Exome Workflow"
in a project. The creator and all users who have at least VIEW permission
to this project can run it in that project.
We will now create a global workflow based on the "BWA MEM + GATK Exome Workflow".
Please make sure your installed dx-toolkit version is at least 0.253 (you
can upgrade it using
Step 2: Make the workflow global
We can use the existing, project-based workflow to create a new object on the DNAnexus
platform - a global workflow. Functionally, the global workflow will be identical to
the original workflow.
The basic procedure is the following:
- use the dx get command to download a JSON representation (dxworkflow.json) of a chosen workflow on the platform
- set a unique name (in the global namespace) and version in the downloaded JSON file
- use the dx build --globalworkflow command to build the global workflow on the platform
We'll go over each step in more detail below.
Download the original workflow
First, let's download the JSON source code of the workflow, as follows:
dx get "BWA MEM + GATK Exome Workflow"
This will create a local directory named "BWA MEM + GATK Exome Workflow" containing
a dxworkflow.json file.
Alternatively, you can skip this step, since the next section
provides an example dxworkflow.json that you can use.
Build a global workflow on the platform
We will use the downloaded dxworkflow.json to create a new global workflow.
The only additional fields required for the global workflow are name
and version. The name has to be unique in the global namespace shared by apps
and global workflows; that is, no other app or global workflow (whether or not it is
available to us) may have the same name. We can leave the "stages"
and any other fields unchanged.
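The edit described above can also be scripted. The following is a minimal sketch, assuming the directory produced by dx get contains a dxworkflow.json file as described earlier; the helper name set_name_and_version is hypothetical and not part of dx-toolkit. It sets only the two required fields and leaves "stages" and everything else untouched.

```python
import json
from pathlib import Path


def set_name_and_version(workflow_dir, name, version):
    """Set the "name" and "version" fields in <workflow_dir>/dxworkflow.json.

    Hypothetical helper for illustration; it does not check that the name
    is actually free in the global namespace (only the platform can do that).
    """
    spec_path = Path(workflow_dir) / "dxworkflow.json"
    spec = json.loads(spec_path.read_text())
    # Only these two fields change; "stages" and all other fields stay as downloaded.
    spec["name"] = name
    spec["version"] = version
    spec_path.write_text(json.dumps(spec, indent=2) + "\n")
    return spec
```

For the workflow in this tutorial, the call would be set_name_and_version("BWA MEM + GATK Exome Workflow", "bwa_mem_gatk_exome_demo", "0.0.1"), after which the directory is ready for dx build --globalworkflow.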
Note: We recommend following the Semantic Versioning conventions
for numbering the versions of your global workflow.
dxworkflow.json can be found
We can add fields such as summary to make our workflow more
user-friendly. It is also good practice to include documentation, which can
be placed in the Readme.md file in the same directory.
We can now generate our first global workflow as follows:
dx build --globalworkflow "BWA MEM + GATK Exome Workflow"
Here, "BWA MEM + GATK Exome Workflow" is the name of the directory storing
dxworkflow.json; it can be changed freely.
Describe and run the workflow
The dx build command should return the unique ID ('globalworkflow-xxxx') of
the created version. We can use it to refer to the global workflow from now on, for example:
dx describe globalworkflow-xxxx
: All the valid identifiers of the global workflow are listed
Running a global workflow is the same as running any other workflow.
For example:
dx run -ialign_reads.reads_fastqgz=myreads.fastq.gz \
Step 3: Add authorized users
We can now specify a list of users with whom we want to share our workflow by using
dx add users. We will prepare the list now, but the users will be able to find and
access the global workflow only after the workflow is published. Developers can update
the list of users before or after the workflow is published, and the list applies to
all versions of the workflow, past and future.
For example, to share our workflow with a user and an organization we'll run:
dx add users bwa_mem_gatk_exome_demo user-bob org-partnerorg
Then, we can view who is on the access list for the workflow:
dx list users bwa_mem_gatk_exome_demo
and remove a user or organization by running:
dx remove users bwa_mem_gatk_exome_demo org-partnerorg
The authorized user permissions do not propagate to apps automatically, so if the
workflow contains any apps, the users also need access to those apps in order to run the
workflow. To add users to an app, the same dx add users command can be used.
Step 4: Release (publish) a version to users
Once we have tested our workflow version, we can release it to the authorized
users by executing:
dx publish bwa_mem_gatk_exome_demo/0.0.1
Publishing a workflow version has the following effects:
- the authorized users can discover the version via dx find globalworkflows
- the authorized users can describe and run the version
- the authorized users can download the workflow's source code with dx get (as well as the dependencies, e.g. applets), though they cannot build new versions with the same global workflow name
Developers can add and remove users and other developers at any time. Adding users to
the workflow gives them access to all the published versions but has
no effect on the unpublished versions. Unpublished versions are accessible only to
the people listed as developers of the workflow.
The "default" alias
The dx publish command will make the published version the default
one by adding a "default" alias to this version (and removing it from whichever
workflow version was previously marked as "default"). The alias indicates that
this version will be invoked whenever the workflow name is used without the
version. For example, users can run the workflow:
dx run bwa_mem_gatk_exome_demo # equivalent to "dx run bwa_mem_gatk_exome_demo/0.0.1"
The first created version of a global workflow gets this alias automatically.
See dx publish --help for more options for this command.
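To make the alias behaviour concrete, here is a toy in-memory model of it. It is only a sketch of the rules stated above (the first built version gets the alias, publishing moves it, and a bare name resolves to the aliased version); the class and method names are hypothetical and this is not how the platform is implemented.

```python
class VersionRegistry:
    """Toy model of one global workflow's versions and its "default" alias."""

    def __init__(self, name):
        self.name = name
        self.published = {}  # version string -> True once published
        self.default = None  # version currently carrying the "default" alias

    def build(self, version):
        # Existing versions are immutable, so rebuilding one is an error.
        if version in self.published:
            raise ValueError(f"{self.name}/{version} already exists")
        self.published[version] = False
        # The first created version gets the alias automatically.
        if self.default is None:
            self.default = version

    def publish(self, version):
        if version not in self.published:
            raise KeyError(f"{self.name}/{version} has not been built")
        self.published[version] = True
        # Publishing moves the alias away from the previous default.
        self.default = version

    def resolve(self, reference):
        # "name" resolves to the default version; "name/x.y.z" is explicit.
        name, _, version = reference.partition("/")
        if name != self.name:
            raise KeyError(f"unknown workflow: {name}")
        version = version or self.default
        if version not in self.published:
            raise KeyError(f"unknown version: {reference}")
        return f"{name}/{version}"
```

In this model, building 0.0.2 does not change what the bare name resolves to; only publishing 0.0.2 moves the alias.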
We can enable multiple users to update current or build new versions of our global
workflow, for example:
dx add developers bwa_mem_gatk_exome_demo user-eve
Either specific users or whole orgs can be developers of a global workflow.
We can list who is on the developer list as follows:
dx list developers bwa_mem_gatk_exome_demo
To remove developers:
dx remove developers bwa_mem_gatk_exome_demo user-bob
Create a new version
We can create a new version of the workflow by updating the dxworkflow.json file
and running dx build --globalworkflow again. In order to create the new version, we
need to explicitly update the version string, e.g. to "0.0.2".
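If version strings follow the plain MAJOR.MINOR.PATCH form of Semantic Versioning, the next version string can be computed rather than typed by hand. The function below is a small sketch under that assumption; bump_version is a hypothetical helper, and it does not handle pre-release or build-metadata suffixes.

```python
def bump_version(version, part="patch"):
    """Return the next version string for a plain MAJOR.MINOR.PATCH version."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

For example, bump_version("0.0.1") yields "0.0.2", which is the value to write into the version field before building again.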
If we lose the source dxworkflow.json, we can download it at any time with dx get,
for example to get version 0.0.2:
dx get globalworkflow-bwa_mem_gatk_exome_demo/0.0.2
Any developer of the workflow can download its source code and build new versions.
Authorized users who have access to published versions can only download or run them.
We can mark a global workflow version as deleted, which will make the version
unrunnable. It will still be possible to describe the workflow for provenance
purposes. For example:
dx api globalworkflow-xxxx delete
Please use this route with caution, as deleting a global executable can break
users' reproducibility requirements.
Deleting all workflow versions will not release the workflow name. That is, it
will not be possible to reuse the name for a different global workflow
by another user.
Search and discover global workflows
Search all available versions
The dx find globalworkflows command is useful for browsing global workflows
that are available to us.
The above command will print a list of published global workflows. To list
the workflows that are not published:
dx find globalworkflows --unpublished
Note: By default, "dx find globalworkflows" lists one version for each available workflow, namely the version marked as "default". To print the whole version history for a workflow, add the "--all" flag, for example:
dx find globalworkflows --unpublished --all
Search by category
The --category parameter can be used to restrict the search to workflows
from a specific category. Common categories are available as tab completions.
dx find globalworkflows --category Variation\ Calling
To view all available categories that we can search by:
dx find globalworkflows --category-help
The table below summarizes the main stages we went through to create and publish our workflow.
| Stage | Definition and purpose |
|---|---|
| Local workflow (access based on project permissions) | A file-like workflow object stored in a project; used for private, lightweight development |
| Global workflow, unpublished | A development version of a workflow that is visible to developers but not to users; used for development and testing in multiple regions, release management, and preparation for publishing to users |
| Global workflow, published | A version of a workflow that is available to authorized users |
| Global workflow, deleted | A version (previously published or unpublished) that is not runnable but can be described |
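The global workflow rows of this summary can be expressed as a small lookup. This is only an illustrative sketch: the role and state names are made up for the example, it is not the platform's access-control logic, and it assumes that developers can also run published versions, which the text above does not state explicitly.

```python
def can_run(role, state):
    """Toy check of who may run a global workflow version in a given state.

    role:  "developer" or "authorized_user" (hypothetical labels)
    state: "unpublished", "published", or "deleted" (hypothetical labels)
    """
    if state == "deleted":
        # Deleted versions can still be described but are not runnable.
        return False
    if state == "unpublished":
        # Unpublished versions are accessible to developers only.
        return role == "developer"
    if state == "published":
        # Assumption: developers keep access once a version is published.
        return role in ("developer", "authorized_user")
    raise ValueError(f"unknown state: {state!r}")
```

Local, project-based workflows are left out on purpose, since their access is governed by project permissions rather than by the developer and authorized-user lists.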
- A detailed description of the global workflow object is included in the API documentation.