dx-docker

Run Docker containers within DNAnexus applications

Overview

Docker images provide a way to encapsulate the files, configurations, and dependencies of a particular application (typically a command-line tool) in a single place. These images can be shared via registries in the cloud, and this approach to sharing applications and dependencies has become increasingly popular over the last few years.

dx-docker is a utility included in the DNAnexus toolkit (dx-toolkit) that provides the ability to obtain these images, run them, and cache them within projects and applets on the platform. We developed this utility to allow users to run images within the security confines of our execution environment. dx-docker uses the open App Container specification format by CoreOS to obtain Docker images and PRoot to execute commands within Docker containers. The motivation behind using the open specification is to more easily allow support of alternative and emerging containerization technologies.

Using a Docker image allows the user to develop certain core features of their application independent from DNAnexus and subsequently import/wrap their image with additional features and metadata associated with the platform.

Quickstart

This simple application illustrates how you would run dx-docker within a DNAnexus applet much like you would run the Docker CLI. We will run a quay.io Samtools image in a DNAnexus applet.

dx-docker run pulls the Docker image from quay.io in this example. If you would like to cache this image (or a privately developed image) in your applet resources or a DNAnexus project, please see the section on storing images on the platform below.

src/docker-samtools-example.sh

#!/bin/bash
# Very simple example of using Docker in a DNAnexus applet

main() {
    set -e -x -o pipefail

    echo "Value of fastafile: '$fastafile'"
    echo "Value of indexname: '$indexname'"

    # Make a data directory to mount into the Docker container
    mkdir -p /data/

    # Download the file into the data directory
    dx download "$fastafile" -o /data/fastafile.fasta

    # Mount the /data/ directory to /data in the container and run
    dx-docker run -v /data/:/data quay.io/ucsc_cgl/samtools faidx /data/fastafile.fasta

    # Recommended alternative (pull from immutable image ID) when possible
    dx-docker run -v /data/:/data quay.io/ucsc_cgl/samtools@sha256:3b9ab37fbe2da27bb8067be122fc072d99e35406f293923a30d627477e0770b0 faidx /data/fastafile.fasta

    # Rename the index file to user input
    mv /data/fastafile.fasta.fai "$indexname"

    fastaidx=$(dx upload "$indexname" --brief)
    dx-jobutil-add-output fastaidx "$fastaidx" --class=file
}

dxapp.json

There are no substantial changes from what would be generated by dx-app-wizard. Note that we are using Ubuntu 14.04 in this example as this is the target of dx-docker development. We also enable network access to pull the image from quay.io.

{
  "name": "docker-samtools-example",
  "title": "Simple example of using dx-docker with Samtools",
  "summary": "docker-samtools-example",
  "dxapi": "1.0.0",
  "version": "0.0.1",
  "inputSpec": [
    {
      "name": "fastafile",
      "label": "FASTA file for input",
      "class": "file",
      "optional": false,
      "patterns": [
        "*"
      ],
      "help": ""
    },
    {
      "name": "indexname",
      "label": "Name of index file",
      "class": "string",
      "optional": true,
      "default": "fastaidx.fai",
      "patterns": [
        "*"
      ],
      "help": ""
    }
  ],
  "outputSpec": [
    {
      "name": "fastaidx",
      "label": "Samtools index of FASTA file",
      "class": "file",
      "patterns": [
        "*"
      ],
      "help": ""
    }
  ],
  "timeoutPolicy": {
    "*": {
      "hours": 48
    }
  },
  "runSpec": {
    "interpreter": "bash",
    "file": "src/docker-samtools-example.sh",
    "systemRequirements": {
      "*": {
        "instanceType": "mem1_ssd1_x4"
      }
    },
    "distribution": "Ubuntu",
    "release": "14.04"
  },
  "access": {
    "network": [
      "*"
    ]
  }
}

Additional help for the command and its sub-commands can be obtained with:

$ dx-docker --help
$ dx-docker run --help

Use cases

Pull an image

Basics

To pull a public Docker image from a standard repository such as Docker Hub or quay.io within a DNAnexus applet, call dx-docker pull. This command obtains the different image layers and merges them into an image that can be executed within the platform. It works like docker pull, but within the security confines of our execution environment.

Docker images can be pulled and executed in multiple ways. The most basic is to pull by repository name:

$ dx-docker pull ubuntu
sha256:fc19d60a83f: 824 B/824 B
sha256:2879a7ad314: 678 B/678 B
sha256:de413bb911f: 444 B/444 B
sha256:668604fde02: 160 B/160 B
sha256:6bbedd9b76a: 2.65 MB/49.9 MB
sha256:6bbedd9b76a: 9.71 MB/49.9 MB
sha256:6bbedd9b76a: 12.8 MB/49.9 MB
sha256:6bbedd9b76a: 19.6 MB/49.9 MB
sha256:6bbedd9b76a: 23.5 MB/49.9 MB
sha256:6bbedd9b76a: 30 MB/49.9 MB
sha256:6bbedd9b76a: 33.5 MB/49.9 MB
sha256:6bbedd9b76a: 38.7 MB/49.9 MB
sha256:6bbedd9b76a: 42.5 MB/49.9 MB
sha256:6bbedd9b76a: 48 MB/49.9 MB

The command above will download (from Docker Hub) the latest image that contains the utilities included in a basic Ubuntu distribution. The sha256: lines correspond to the download progress of the various components (layers) of the image, which are downloaded in parallel. If certain layers take longer, the sha256: line will be repeated with an update on the download progress. Updates for each layer are provided when another 10% of the layer is downloaded. We provide this output for debugging reasons (e.g. to diagnose whether a failed pull occurred due to a network failure). This output can be suppressed with dx-docker pull -q ....
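When reading a long job log, the repeated progress lines can be condensed. The following is a minimal bash sketch, not part of dx-docker, that keeps only the last progress line seen for each layer. It assumes the "sha256:<layer>: <done>/<total>" line format shown above; the function name summarize_pull_progress is hypothetical.

```shell
# Keep only the most recent progress line per layer, in first-seen order.
# Assumes lines of the form "sha256:<layer>: <done>/<total>" on stdin;
# any other lines are ignored.
summarize_pull_progress() {
    awk -F': ' '
        /^sha256:/ {
            if (!($1 in last)) order[++n] = $1   # remember first-seen order
            last[$1] = $2                        # overwrite with latest progress
        }
        END {
            for (i = 1; i <= n; i++) print order[i] ": " last[order[i]]
        }'
}
```

For example, with the output above saved to a hypothetical file pull.log, running summarize_pull_progress < pull.log would print one line per layer. Because updates are only emitted in roughly 10% steps, the last line for a layer may not show the full size even when the download succeeded.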

Pull an image with a specific tag/version (common use case)

It is also common to pull a specific tagged release of an image:

$ dx-docker pull ubuntu:14.04

Pull public image from immutable image digest (recommended when possible)

Pulling a Docker image by tag does not guarantee that the image will always be the same. To ensure immutability of an image, Docker supports pulling via a digest, and dx-docker works analogously:

dx-docker pull quay.io/ucsc_cgl/samtools@sha256:3b9ab37fbe2da27bb8067be122fc072d99e35406f293923a30d627477e0770b0

Important note: we have noticed that there are cases where pulling from an immutable digest does not work even for the Docker CLI (e.g. images that are part of the library collection, such as ubuntu, have this problem, while images associated with a username, such as armbuild or the ucsc_cgl example above, do not). We will update this documentation as we find out more information on this issue.

An alternative to pulling an image by immutable digest is to use one of our dx-docker utilities to cache an image within a project or applet on the platform (this works for private images as well). The images will then be associated with an immutable ID to ensure provenance of data used in your applications and workflows. See the section below on storing Docker images on the platform for how to use this feature.
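An applet script can check whether an image reference is pinned to a digest before using it. This is a minimal sketch under the assumption that a pinned reference ends in @sha256: followed by 64 lowercase hexadecimal characters, as in the samtools example above; is_digest_pinned is a hypothetical helper, not a dx-docker feature.

```shell
# Succeed (exit status 0) if the image reference ends in a sha256 digest,
# fail (exit status 1) otherwise. Only the shape of the string is checked;
# nothing is pulled and the digest is not verified against a registry.
is_digest_pinned() {
    printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}
```

A script could then warn when a mutable tag is used, for example: is_digest_pinned "$image" || echo "warning: $image is not pinned to a digest" >&2.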

Running Docker images

A command within the image can be run analogously to the Docker CLI:

$ dx-docker run ubuntu cat /etc/issue
Image Repo  library/ubuntu
Image Tag   latest
Image Size  122M
Image ID    sha256:668604fde02e75dddb4b44c80d4ce20baaac4832c41c3a945f4a236cd7d2f164
Last Updated    2016-10-13T21:13:21.562394737Z (1d 1h ago)
Registry    registry-1.docker.io
Default CMD ["/bin/bash"]

Ubuntu 16.04.1 LTS \n \l

Note here that the image is a 16.04 image that is tagged with 'latest'. Metadata associated with the image is also printed by default (included because it may be useful for inspection in DNAnexus job logs). This output can be suppressed with dx-docker run -q .... If an image has not been pulled, dx-docker run will fetch it for you like the Docker CLI does.

dx-docker supports typical CMD, ENTRYPOINT, and ENV declarations in Dockerfiles to allow images to be run like applications. For example:

$ dx-docker run quay.io/ucsc_cgl/samtools

should return the samtools help since the image has metadata associated with it about where the executable is located in the image.

Environment variables can be set in the container when using dx-docker run by adding one or more -e or --env flags. By default, the home directory environment variable is set to /root.
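As an illustration of passing several variables, the bash sketch below builds one -e flag per KEY=VALUE argument. It assumes dx-docker accepts the same KEY=VALUE form as the Docker CLI, which this page does not spell out. The run_with_env function and the DX_DOCKER override (used only so the command line can be dry-run with echo) are hypothetical.

```shell
# Usage: run_with_env IMAGE [KEY=VALUE ...] -- COMMAND [ARGS ...]
# Assumption: -e takes KEY=VALUE, as in the Docker CLI.
run_with_env() {
    local image="$1"; shift
    local -a args=(run)
    # Every argument before "--" becomes its own -e flag.
    while [[ $# -gt 0 && "$1" != "--" ]]; do
        args+=(-e "$1")
        shift
    done
    if [[ "${1:-}" == "--" ]]; then shift; fi
    # The image and the command to run inside the container come last.
    args+=("$image" "$@")
    # DX_DOCKER is a hypothetical override; by default dx-docker is called.
    "${DX_DOCKER:-dx-docker}" "${args[@]}"
}
```

Setting DX_DOCKER=echo prints the command line that would be run instead of executing it, which is a convenient way to check the flags before submitting a job.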

Mounting volumes

dx-docker also supports mounting data volumes into the container, which is crucial for applications that require and produce data that will be used outside the image, e.g.:

$ mkdir -p data
$ dx-docker run -v data:/data-host ubuntu:14.04 touch /data-host/newfile.txt
$ test -e data/newfile.txt

Please see dx-docker run --help for more information about how a Docker image can be run within the platform.

Storing images on the platform

These subcommands allow a user to create a private image or simply store a public image within a DNAnexus applet or project.

Some advantages to storing images within projects on the platform:

  • The applet does not need network access to obtain the image, which avoids networking and other maintenance issues that can occur when relying on a third-party repository
  • Images can be obtained much more efficiently within the cloud due to fast network speeds and region locality of the project
  • Private images stored within applets or projects naturally inherit the access/permissions of the project

Container clean up (--rm)

By default, a container's file system is removed after the container exits. This prevents container accumulation on the platform. Volumes associated with the container are not removed. Adding the --rm flag to the run command is therefore redundant (only a message is printed).

add-to-applet

The most straightforward case is to cache a developed or pulled Docker image within an applet that you are developing:

$ docker pull busybox
$ dx-docker add-to-applet busybox applet/
$ dx build applet/

applet/ in the command above is a directory corresponding to a DNAnexus applet. The command simply places the image data in the local cache so a network connection is not required.

This applet can be built in different projects and regions (e.g. China). After building the applet, you will be able to efficiently obtain the image within the DNAnexus cloud without pulling from a public repository. Private images remain within the applet, inside the project, in this case.

create-asset (experimental and subject to change)

We also provide experimental support for storing a Docker image on the platform as a DNAnexus asset. The use case here is a single large image used by multiple applets: it can be stored once in a project and reused by applets by referencing the asset in the dxapp.json. Note that the "name" field of the asset is identical to the name of the Docker image.

An example:

$ docker pull busybox
$ dx-docker create-asset busybox

Exporting Docker image busybox
Extracting a61cd723bcf2
Extracting 9967c5ad88de
Building DNAnexus asset for busybox
Image busybox successfully cached in DNAnexus platform.
To include this cached image in an application, please include the following within the runspec/assetDepends list in your dxapp.json.
    {
        "project": "project-BzzxKXj07V9qg02vx4K4K4vJ",
        "folder": "/",
        "name": "busybox",
        "version": "0.0.1"
    }
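To avoid copying the entry by hand, it can be generated from the project ID and image name. This is a minimal sketch: asset_depends_entry is a hypothetical helper, and the default folder "/" and version "0.0.1" simply mirror the output above rather than any documented default. Values are inserted verbatim, so names containing double quotes or backslashes would need escaping.

```shell
# Print one entry for the assetDepends list in the runSpec of dxapp.json.
# Arguments: project ID, asset name, optional folder, optional version.
asset_depends_entry() {
    local project="$1" name="$2" folder="${3:-/}" version="${4:-0.0.1}"
    printf '{\n  "project": "%s",\n  "folder": "%s",\n  "name": "%s",\n  "version": "%s"\n}\n' \
        "$project" "$folder" "$name" "$version"
}
```

For the example above, asset_depends_entry project-BzzxKXj07V9qg02vx4K4K4vJ busybox prints the same four fields that dx-docker create-asset reported.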

Using Local Image IDs and Digests

dx-docker create-asset and dx-docker add-to-applet also support using image IDs or digests. Due to the nature of how dx-docker functions, these IDs are not interchangeable. For example, if you have a local ubuntu:14.04 Docker image with a short ID, a long sha256 ID, and a digest, all three values can be used with dx-docker create-asset and dx-docker add-to-applet. However, when executing dx-docker run on the DNAnexus platform, the exact same ID must be used as when the dx-docker compatible image was generated. For this reason, using the repo:tag format is preferred when working with create-asset and add-to-applet.

For example on your local machine:

$ docker pull busybox
$ docker images busybox
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
busybox             latest              00f017a8c2a6        6 days ago          1.11 MB

$ dx-docker create-asset 00f017a8c2a6

Exporting Docker image 00f017a8c2a6
Image ID: 00f017a8c2a6 found in local docker cache
Extracting 628d8a43e932
Building DNAnexus asset for 00f017a8c2a6
Uploading DNAnexus asset for 00f017a8c2a6
Image 00f017a8c2a6 successfully cached in DNAnexus platform.
To include this cached image in an application, please include the following within the runspec/assetDepends list in your dxapp.json.
    {
        "project": "project-F33Pv680JzBXy7b5B7p1pqQF",
        "folder": "/",
        "name": "00f017a8c2a6",
        "version": "0.0.1"
    }

Subsequently on the DNAnexus platform:

$ dx-docker run 00f017a8c2a6 ls

Installation

dx-docker should be available within the dx-toolkit, so no additional installation is required.

If you are aware of a very recent feature added to dx-docker, it may not be available in DNAnexus applets for approximately two weeks. This is because applets use dx-toolkit releases maintained in our Debian repository and these releases go through our staging environment before reaching our production environment. If you want to use the latest commits to the dx-toolkit source, you can install dx-toolkit from source within a DNAnexus application as follows:

$ apt-get install python-virtualenv
$ git clone --recursive https://github.com/dnanexus/dx-toolkit.git
$ make -C dx-toolkit python dx-docker && source dx-toolkit/environment

Notes and issues

  • Tests have been performed using Ubuntu 14.04. Please use this (as opposed to 12.04) in your DNAnexus applet configuration.
  • Not necessarily an issue for dx-docker directly, but even for the Docker CLI, pull by digest sometimes does not work for images in library/.

Last edited by kpjensen, 2017-03-23 17:40:19
