Using dx ssh to set up an interactive workstation

On this page, we will take you through the cloud workstation app. This app sets up a job where you can SSH into the worker running the job and use the worker as a "workstation" to explore and manipulate data stored on DNAnexus as you would on a local Linux machine.

The benefit of using this workstation app rather than your local machine is that, in the workstation, you can access data stored on DNAnexus without downloading the files to your local machine and being constrained by your local internet bandwidth. In addition, you can configure the app to launch on more powerful instance types (virtual computer configurations) available to DNAnexus users. Any files or results you want to keep from your workstation session can be uploaded back into the project from which you launched the app.

Run the workstation app

Below, we walk you through running the cloud workstation app. The app can be run as is and provides basic functionality, such as access to all your data and network access for downloading public tools.

If you would like to customize your cloud workstation experience, we also provide the source code of the app so you can build your own version of the workstation. See the customizing the workstation section for more information.

Note: You can only grant SSH access permissions and access the interactive worker via the command-line client. Please download and install it if you have not done so already. If you are not familiar with the DNAnexus command-line tool, you may find the Command-Line Quickstart useful.

Step 1: Configure SSH for your account

If you haven't already, you will need to configure your account to allow use of SSH connections using dx ssh_config. For more information on configuring your account and connecting to jobs, click here.

Step 2: Run the app

To run the workstation app and SSH into the terminal, navigate to the project you would like to work in. You will need CONTRIBUTE or ADMINISTER access to run the app in that project.

$ dx select "my-working-project"

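Run the dx command shown below. The --ssh flag automatically configures the job to allow SSH access and connects to it after launch. The app takes as input a maximum session length, entered as a number with a unit suffix (for example, 3h for three hours; the default is 1h), and an optional list of files to download to the workstation on startup.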

$ dx run app-cloud_workstation --ssh
Select an optional parameter to set by its # (^D or <ENTER> to finish):
 [0] Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y) (max_session_length) [default="1h"]
 [1] Files (fids)
Optional param #: 0
Input:   Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y) (max_session_length)
Class:   string
Enter string value ('?' for more options)
max_session_length: 3h
Select an optional parameter to set by its # (^D or <ENTER> to finish):
 [0] Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y) (max_session_length) [="3h"]
 [1] Files (fids)
Optional param #: <ENTER>
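The interactive prompts can also be skipped by passing the inputs on the command line. The following is a minimal Python sketch, not part of the app, that assembles such an invocation as an argument list. It assumes the -i<name>=<value> input syntax and the -y (skip confirmation) flag of dx run, with an array input given by repeating -i once per element. The helper name build_workstation_command and the file ID file-xxxx are hypothetical.

```python
def build_workstation_command(max_session_length="1h", file_ids=()):
    """Return an argv list that launches the cloud workstation over SSH.

    Hypothetical helper for illustration only; it builds the command
    without running it.
    """
    # --ssh configures SSH access and connects after launch;
    # -y skips the interactive confirmation (assumed dx run flag).
    argv = ["dx", "run", "app-cloud_workstation", "--ssh", "-y"]
    # Inputs are passed as -i<name>=<value>.
    argv.append("-imax_session_length={0}".format(max_session_length))
    # The optional "fids" input is an array: one -i flag per file.
    for fid in file_ids:
        argv.append("-ifids={0}".format(fid))
    return argv


if __name__ == "__main__":
    # Print the command line for a three-hour session with one file.
    print(" ".join(build_workstation_command("3h", ["file-xxxx"])))
```

The resulting list can be handed to subprocess.call on a machine where the dx client is installed and logged in.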

Upon confirmation of input, you will be connected to the worker running the cloud workstation app and shown the following message:

Calling app-cloud_workstation with output destination project-xxxx:/
Job ID: job-xxxx
Waiting for job-xxxx to start......
Resolving job hostname and SSH host key...........................
Checking connectivity to ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com...OK
Connecting to ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com
Welcome to DNAnexus!
This is the DNAnexus Execution Environment, running job-xxxx.
Job: Cloud Workstation
App: cloud_workstation:main
Instance type: mem2_hdd2_x2
Project: Cloud Workstation Project (project-xxxx)
Workspace: container-xxxx
Running since: Fri Oct 31 17:45:26 UTC 2014
Running for: 0:00:30
The public address of this instance is ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com.
You are running byobu, a terminal session manager. If you get disconnected from this instance, you can log in again and your work will be saved as long as the job is running.
For more information on byobu, press F1.
The job is running in terminal 1. To switch to it, use the F4 key (press F4 again to switch back to this terminal).
Use sudo to run administrative commands.
From this window, you can:
 - Use the DNAnexus API with dx
 - Monitor processes on the worker with htop
 - Install packages with apt-get install
 - Use this instance as a general-purpose Linux workstation
OS version: Ubuntu 12.04.5 LTS (GNU/Linux 3.2.0-70-virtual x86_64)
dnanexus@job-xxxx:~$

Step 3: Set up the workspace

Due to differences in the execution environment explained below, in order to upload or download files from your parent projects, you must first run the following two commands in your workstation:

$ unset DX_WORKSPACE_ID
$ dx cd $DX_PROJECT_CONTEXT_ID:

The first command unsets an environment variable which is set when the applet is launched and allows you to navigate into any of the projects you have access to. The second command is an invocation of dx cd to change the working directory of your workstation to the parent project (the only project your workstation has CONTRIBUTE access to). For more information about the environment variables in the job container, please visit the Execution Environment Reference.

The workstation should now be ready to use.

Downloading files saved on DNAnexus

This app is configured to have VIEW access to all projects that the user running the app can access. This means that you can download any file you have access to on DNAnexus using the dx download command.

To download a file named my-file.txt from the parent project:

$ dx download my-file.txt

To download one set of reads from the SRR100022 exome from the public Demo Data project:

$ dx download project-BQbJpBj0bvygyQxgQ1800Jkk:/SRR100022/SRR100022_1.filt.fastq.gz

To navigate to a project other than the parent project and download a file from it, do the following:

$ dx select --level=VIEW
Available projects (VIEW or higher):
0) Working Project (CONTRIBUTE)
1) Research Project (VIEW)
2) Production Project (VIEW)
[...]
Pick a numbered choice or "m" for more options [0]: 1
Setting current project to: Research Project
$ dx ls
my-file-1.txt
$ dx download my-file-1.txt

Downloading additional tools

This app has network access so you will be able to download any tool you may need during your session as you would on a Linux workstation. After downloading your tools, you can use the worker as a general purpose workstation to manipulate and explore your data as needed.

If you would like to have your tools packaged into your workstation as it is launched, see the section below about customizing your own workstation applet.

Uploading files back to the parent project

If you wish to save any files or results from your workstation session, you must upload the files back into the project from which the cloud workstation app was launched (the "parent project"). To allow you to do this, the cloud workstation app is given CONTRIBUTE access to the parent project.

If you have been navigating around your projects, downloading files, you should use the --path option with dx upload to ensure that the files you created are uploaded to the correct project.

$ dx upload --path "$DX_PROJECT_CONTEXT_ID:" <FILE>

To perform a test upload, do the following:

$ dx ls
$ echo "This is a test file" > test_file_from_workstation.txt
$ dx upload --path "$DX_PROJECT_CONTEXT_ID:" test_file_from_workstation.txt
$ dx ls

You should see the contents of your project change between the first and second invocations of dx ls.
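If you script your uploads from inside the workstation, the same command can be assembled programmatically. The sketch below is an illustration under stated assumptions rather than part of the app: it reads DX_PROJECT_CONTEXT_ID from the environment, as the shell commands above do, and builds the dx upload argument list without running it. The helper name build_upload_command is hypothetical.

```python
import os


def build_upload_command(local_path, environ=None):
    """Return an argv list that uploads local_path to the parent project.

    Hypothetical helper for illustration only. The destination is the
    root of the project named by DX_PROJECT_CONTEXT_ID.
    """
    environ = os.environ if environ is None else environ
    project_id = environ.get("DX_PROJECT_CONTEXT_ID")
    if not project_id:
        raise RuntimeError("DX_PROJECT_CONTEXT_ID is not set; "
                           "run this inside the workstation job")
    # The trailing colon makes the path refer to the project root.
    return ["dx", "upload", "--path", project_id + ":", local_path]
```

Inside the workstation, the returned list can be passed to subprocess.check_call to perform the upload.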

Terminating the session

By default, your workstation will automatically shut down after the maximum session length. However, if you wish to terminate the workstation app before the end of the session, simply use the dx terminate command with the job-ID of this instance of the cloud workstation app, or terminate the job from the web platform.

$ dx terminate $DX_JOB_ID
Note: The contents of your workstation will be destroyed upon termination (either manual termination or after the workstation has run for the maximum session length). Please remember to upload any files you wish to save before the end of your session.

Cloud Workstation Execution Environment

Instance type

By default, the cloud workstation app launches on a mem1_ssd1_x4 instance, which has 4 cores, 7.5 GB of memory, and 80 GB of storage. To run the app on a different instance type, use the --instance-type flag for dx run:

$ dx run --instance-type mem1_ssd1_x32 --ssh app-cloud_workstation
Note: Since the Cloud Workstation app runs an Ubuntu 14.04 image, it is only compatible with ssd1 and ssd2 instances. It will not run if given mem*_hdd2_* instance types.
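The compatibility rule in the note above can be checked before launching a job. The following Python sketch is an illustration only. It assumes that instance type names follow the pattern seen in the examples on this page (mem<N>_<ssd|hdd><N>_x<cores>), which may not cover every instance type DNAnexus offers. The function name is_workstation_compatible is hypothetical.

```python
import re

# Assumed naming pattern, inferred from the examples on this page
# (mem1_ssd1_x4, mem1_ssd1_x32, mem2_hdd2_x2).
_INSTANCE_RE = re.compile(r"^mem\d+_(ssd|hdd)(\d+)_x\d+$")


def is_workstation_compatible(instance_type):
    """Return True if the instance type uses ssd1 or ssd2 storage."""
    match = _INSTANCE_RE.match(instance_type)
    if match is None:
        raise ValueError("unrecognized instance type: {0!r}".format(instance_type))
    storage = match.group(1) + match.group(2)
    # Per the note above, only ssd1 and ssd2 instances are supported.
    return storage in ("ssd1", "ssd2")
```

For example, is_workstation_compatible("mem1_ssd1_x32") returns True, while is_workstation_compatible("mem2_hdd2_x2") returns False.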

Operating system

The cloud workstation is set up to use Ubuntu 14.04. If you would like to use Ubuntu 12.04, you must run your own custom version of the cloud workstation.

Job execution environment vs. local environment

When connecting to the execution environment, you are using the job's credentials to interact with the DNAnexus API. The job has a limited subset of your user's permissions; by default, jobs running the cloud workstation app have VIEW permissions to all projects in which you have VIEW permissions or greater.

Please note that the dx select command by default hides projects to which you only have VIEW permissions, so you will want to run dx select --level=VIEW in the execution environment to see those projects.

See the Running Analyses API specifications and Execution Environment Reference for more information about the execution environment within DNAnexus workers.

Customizing the workstation

The provided cloud workstation app provides the minimum functionality for an interactive workstation. Below is the source code for the Cloud Workstation App.

To make your own version of the applet, you can use dx-app-wizard to set up a source code template for your applet. You can then use the following files as an example to build the dxapp.json and script for your applet. This section also links to some resources which may help you customize or build your own interactive workstation.

If you are unfamiliar with the DNAnexus app development process, you may find the Intro to Building Apps tutorial useful.

dxapp.json

{
  "name": "cloud_workstation",
  "title": "Cloud Workstation",
  "summary": "Sets up a cloud workstation which you can access by running with the --ssh or --allow-ssh flags ",
  "tags": [
    "Debugging"
  ],
  "dxapi": "1.0.0",
  "inputSpec": [
    {
      "help": "The maximum length of time to keep the workstation running. Value should include units of either s, m, h, d, w, M, y for seconds, minutes, hours, days, weeks, months, or years respectively.",
      "default": "1h",
      "label": "Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y)",
      "optional": false,
      "class": "string",
      "name": "max_session_length"
    },
    {
      "optional": true,
      "label": "Files",
      "help": "An optional list of files to download to the cloud workstation on startup.",
      "name": "fids",
      "class": "array:file"
    }
  ],
  "outputSpec": [],
  "runSpec": {
    "execDepends": [
      {
        "name": "ipython"
      }
    ],
    "systemRequirements": {
      "*": {
        "instanceType": "mem1_ssd1_x4"
      }
    },
    "file": "src/code.py",
    "release": "14.04",
    "interpreter": "python2.7",
    "distribution": "Ubuntu"
  },
  "access": {
    "project": "CONTRIBUTE",
    "allProjects": "VIEW",
    "network": ["*"]
  }
}

src/code.py

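#!/usr/bin/env python
# cloud_workstation 1.0.0

import os
import re
import subprocess

import dxpy


def _run_cmd(cmd):
    # Echo the command to the job log, then run it through the shell.
    # (Python 2 print statement; the app runs under python2.7.)
    print cmd
    subprocess.check_call(cmd, shell=True)


@dxpy.entry_point('main')
def main(**job_inputs):
    # Download any files requested through the optional "fids" input.
    if 'fids' in job_inputs:
        for fid in job_inputs['fids']:
            cmd = 'dx download {0}'.format(fid['$dnanexus_link'])
            _run_cmd(cmd)

    # normalize_timedelta returns milliseconds; divide by 1000 to get seconds.
    seconds_to_sleep = dxpy.utils.normalize_timedelta(job_inputs['max_session_length'])/1000
    # Sleeping keeps the job, and therefore the SSH session, alive.
    cmd = 'sleep {0}'.format(seconds_to_sleep)
    _run_cmd(cmd)

    output = {}
    return output


dxpy.run()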
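To make the session-length handling in the script above concrete, here is a standalone sketch of how a value such as 3h maps to a number of seconds. It is a simplified stand-in and not the implementation of dxpy.utils.normalize_timedelta, which may accept forms this sketch rejects. The unit meanings come from the help text in dxapp.json; treating a month as 30 days and a year as 365 days is an assumption. The function name session_length_to_seconds is hypothetical.

```python
import re

# Seconds per unit suffix. Month = 30 days and year = 365 days are
# assumptions made for this sketch.
_SECONDS_PER_UNIT = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "M": 30 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def session_length_to_seconds(value):
    """Convert a string such as '3h' or '90m' to a number of seconds."""
    match = re.match(r"^(\d+)([smhdwMy])$", value.strip())
    if match is None:
        raise ValueError("expected <integer><suffix>, got {0!r}".format(value))
    amount, unit = match.groups()
    return int(amount) * _SECONDS_PER_UNIT[unit]
```

With this sketch, the default 1h corresponds to 3600 seconds and 3h to 10800 seconds, which is the length of time the workstation job would sleep before exiting.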

Some example customizations:

  • Specifying different inputs
  • Prepackaging external utilities for use within the worker
  • Changing the instance type of the worker
  • Changing the access permissions

For more information on how to do the above, please reference these pages:

Last edited by Thanh-Ha Nguyen (tnguyen), 2016-03-02 20:01:24
