Website Quickstart

Welcome to the DNAnexus Platform, the complete cloud-based solution for next-generation sequence analysis and visualization. The purpose of this guide is to help you get started quickly. You will find instructions for uploading data, running analyses, and building your own customized analysis workflows.

Step 1: Make your first project

Before uploading or using any data, you will need to create a project. Each piece of data in the Platform lives inside a project. You can create a project by clicking on the Add Project button on the home page after logging in.

Add data

Click on the Add Data button within your newly created project.

This opens a dialog box, which gives you three options for uploading your data:

  1. Your Computer: Use your web browser to upload data from your computer. You will have to stay logged in and keep the browser open until the upload has completed.
  2. Server: Specify the URL of a globally accessible server from which the file will be uploaded to the platform.
  3. Other Project: Copy data from another project on the platform. Some data is provided free of charge by DNAnexus.

Note: For very large files, the Upload Agent, available as a command-line tool, is the fastest and most reliable way to transfer data to the platform.

Example: Importing data into a project from another DNAnexus project

  1. Click on Other Project (1).
  2. Select Demo Data (2).
  3. Select the Quickstart folder (3). This folder contains 2 files with the paired-end sequencing reads from chromosome 20 of exome SRR100022 (in fq.gz format) from the 1000 Genomes project. The full exome data is available in the SRR100022 folder, but for the sake of this demo, we will use the smaller dataset.
  4. Select all of the data in the Quickstart folder by checking the box next to the "Name" header (4).
  5. Finally, click the Add button to add the data to your project.

TIP: For more information about adding data to your DNAnexus project, see the Add Data page.

Step 2: Share your project with collaborators

You can collaborate within the platform by sharing projects with other DNAnexus users at various access levels. To share a project with a collaborator, click Share in the upper right corner of the project page.

The resulting dialog box shows the current list of project members and lets you add a new member as follows:

  1. Type the username or the email address of an existing DNAnexus user (1).
  2. Pick an access level (2).
  3. Click Add Member.
  4. Repeat the steps above for each additional member you want to add.
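
The same sharing steps can also be scripted. The following is a minimal sketch, assuming the dx command-line client from the DNAnexus SDK is installed and logged in, and that its `dx invite` command takes an invitee, a project, and one of the access levels VIEW, UPLOAD, CONTRIBUTE, or ADMINISTER. The helper names, the email addresses, and the project ID `project-xxxx` are placeholders invented for illustration. The code only builds the commands; it does not run them.

```python
import shlex

# Access levels assumed to be accepted by `dx invite`,
# from least to most permissive.
ACCESS_LEVELS = ("VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER")


def invite_command(invitee, project, level="VIEW"):
    """Build (but do not run) a `dx invite` command as an argument list.

    invitee: username or email address of an existing DNAnexus user.
    project: project name or ID (placeholder values in this sketch).
    """
    level = level.upper()
    if level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {level!r}")
    return ["dx", "invite", invitee, project, level]


def invite_many(invitees, project, level="VIEW"):
    """One command per collaborator, all at the same access level."""
    return [invite_command(invitee, project, level) for invitee in invitees]


# Hypothetical collaborators and project ID, for illustration only.
for command in invite_many(
    ["alice@example.com", "bob@example.com"], "project-xxxx", "CONTRIBUTE"
):
    # Print the command line; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Building the argument list separately from running it makes it easy to review exactly who will receive which access level before anything is sent.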

Step 3: Run an analysis

You can analyze the Quickstart data we imported earlier by running DNAnexus-provided apps. Click on the Apps link in the upper right-hand corner to see a list of what is available.

Install apps

On the Apps page, you will see a list of apps, each with a grey Install button. Installing an app "bookmarks" the app for your later reference and makes it easy to add that app to workflows. Let's install the following two apps:

  • BWA-MEM FASTQ Read Mapper
  • Vendor Human Exome GATK-Lite Pipeline

Run the apps

Apps and workflows are run from a particular project and will put any output objects in the same project. Let's return to your project page to run the apps you've just installed.

  1. Click the Projects link in the upper right-hand corner.
  2. Click on the card of your project.

Back in the Manage view for a project, you can run an app by clicking on one of the two buttons:

  • Run: Lets you pick a single app to run, or create a single-use, disposable workflow.
  • New Workflow: Creates a new workflow capable of running multiple stages of apps. This allows you to string together multiple analysis steps that depend on each other.

Build a workflow

Let's build a workflow using the two apps we've installed.

  • Click on the large green New Workflow button.

  • Click the Add a Step button in the view that opens.

  • Click on BWA-MEM FASTQ Read Mapper and Vendor Human Exome GATK-Lite Pipeline, in that order. (See (3) in the screenshot below.)

    Note: the workflow progresses from top to bottom. If you added apps in the wrong order, you can click and drag on steps in the workflow to change their order.
  • Let's set the inputs for the BWA-MEM FASTQ Read Mapper step.

  • Click on the box for Reads (*.fq.gz). Add the SRR100022_20_1.fq.gz file. Note that the workflow only allows you to add files which match the file extensions specified by the input. (4)

  • Since the SRR100022 exome was sequenced using paired-end sequencing, we will need to provide the right-mates of the first set of reads. Click on the box for Reads (right mates) (*.fq.gz). Add the SRR100022_20_2.fq.gz file. If you are using your own data and your data is from a single-end sequencing experiment, this input is optional.

  • Click on the box for BWA reference genome index (5). Note that the data selector that pops up has a box labeled Suggestions (6). Select the Reference Genome Files from the Suggestions and navigate to the folder named H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I) (7). Select the human_g1k_v37.bwa-index.tar.gz file (8).

  • Note that the BWA-MEM FASTQ Read Mapper app only accepts an input file with the extension *.bwa-index.tar.gz, a TAR archive containing all the sequence index files previously output by the BWA indexer. (Indexing is a one-time operation that must be performed on a reference genome sequence before BWA can use it.)
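
The extension filtering and the left/right mate pairing described in the steps above can be illustrated with a short sketch. It is not how the Platform implements its data selector. It assumes that input patterns behave like shell-style globs and that paired-end files follow the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming seen in the Quickstart folder. The function names are invented for this example.

```python
import re
from fnmatch import fnmatch


def matches_input(filename, patterns):
    """True if the file name matches any of an input's glob patterns."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Assumed naming convention: <sample>_1.fq.gz holds the left mates,
# <sample>_2.fq.gz holds the right mates.
_MATE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fq\.gz$")


def pair_mates(filenames):
    """Map each sample to a (left, right) tuple of read files.

    A missing mate is reported as None, which corresponds to leaving
    the optional right-mates input empty for single-end data.
    """
    mates = {}
    for name in filenames:
        match = _MATE.match(name)
        if match:
            mates.setdefault(match["sample"], {})[match["mate"]] = name
    return {
        sample: (found.get("1"), found.get("2"))
        for sample, found in mates.items()
    }


files = [
    "SRR100022_20_1.fq.gz",
    "SRR100022_20_2.fq.gz",
    "human_g1k_v37.bwa-index.tar.gz",
]

# Only the two read files qualify for the Reads (*.fq.gz) inputs.
reads = [f for f in files if matches_input(f, ["*.fq.gz"])]
print(reads)
print(pair_mates(reads))
```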

TIP: The data selector for app inputs can contain a Suggestions section (6). For inputs that may use public reference data (e.g., reference genomes, indices for DNAnexus-provided read mappers, gene annotations), DNAnexus-provided apps often provide a path to a folder containing this data for your convenience.
TIP: You can find more information about the BWA-MEM FASTQ Read Mapper app, or configure it further, by clicking on the black box with the icon for the app, which will bring up a dialog box of parameters (9). In this case, the parameters correspond to options that you could supply if you were to run the BWA-MEM program locally.

  • Now let's set the input for the Vendor Human Exome GATK-Lite Pipeline step.
  • Drag the output of the BWA-MEM FASTQ Read Mapper app to the Sorted Mappings (*.bam) input of the Vendor Human Exome GATK-Lite Pipeline app to indicate that the two apps should be connected in series (10).
  • The Vendor Human Exome GATK-Lite Pipeline app has an additional required configuration to specify which vendor exome kit was used to sequence the reads. You will not be able to run the analysis without setting this configuration. Click on the black box with the icon for the app, which will bring up a dialog box of parameters. Note that the field Vendor Exome is in bold with a * on the right side, indicating a required input (11). The SRR100022 exome originally used the Agilent SureSelect Human All Exon V2 kit, which is an option in the dropdown menu (agilent_sureselect_human_all_exon_v2).
  • Press the Save button to save your configurations and close the dialog box.

  • You're now ready to run the workflow: click Start Analysis! This will launch both stages of the workflow. In this example, the BWA-MEM FASTQ Read Mapper app will start executing immediately, and the Vendor Human Exome GATK-Lite Pipeline app will start after the first job has finished.
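
The run order described above follows from the connection between the two stages: a stage can start only once every stage it takes input from has finished. The sketch below illustrates that ordering with Python's standard `graphlib` module (Python 3.9 or later). It is an illustration of dependency ordering only, not the Platform's scheduler, and the `execution_waves` helper is a name invented for this example.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages whose output it consumes.
stages = {
    "BWA-MEM FASTQ Read Mapper": set(),
    "Vendor Human Exome GATK-Lite Pipeline": {"BWA-MEM FASTQ Read Mapper"},
}


def execution_waves(dependencies):
    """Group stages into waves; each wave can start once the previous one is done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if stages depend on each other
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages with all inputs available
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so dependents become ready
    return waves


for number, wave in enumerate(execution_waves(stages), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With the two stages connected in series, the mapper lands in the first wave and the GATK-Lite pipeline in the second. Two stages with no connection between them would share a wave.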

TIP: For more information about running workflows, see the Workflows page.

Monitor your jobs

After starting the job, you can monitor its progress by selecting the Monitor tab. This page lists all the jobs that have been launched in your project. For more information, please see the Jobs and Job Details pages.
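
If you prefer to follow a job from a script rather than from the Monitor tab, the usual pattern is to poll its state until it reaches a final one. The sketch below shows that pattern in isolation. The state names `done`, `failed`, and `terminated` are assumptions about how the Platform reports finished jobs, and `get_state` is a hypothetical callable you would supply (for example, one that looks up the job's state through the DNAnexus Python bindings).

```python
import time

# Assumed final job states; check the Jobs documentation for the full list.
TERMINAL_STATES = {"done", "failed", "terminated"}


def wait_for_job(get_state, poll_seconds=30.0, sleep=time.sleep):
    """Poll get_state() until the job reaches a terminal state.

    Returns the distinct states observed, in order, ending with the
    terminal one. `sleep` is injectable so the loop can be tested
    without real delays.
    """
    seen = []
    while True:
        state = get_state()
        if not seen or seen[-1] != state:
            seen.append(state)  # record each change of state once
        if state in TERMINAL_STATES:
            return seen
        sleep(poll_seconds)


# Simulated job, for illustration: the states are replayed from a list.
simulated = iter(["runnable", "running", "running", "done"])
history = wait_for_job(lambda: next(simulated), sleep=lambda seconds: None)
print(" -> ".join(history))
```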

When the workflow completes, the outputs will be deposited into a new folder in your project with the same name as your workflow. (You can also select a different folder for the outputs to go into by clicking on the app in the workflow and setting the "Output folder" field.)

If you wish, you can run this workflow on the full SRR100022 exome, available in the SRR100022 folder in the Demo Data project. It will take a proportionately longer amount of time to map the reads and call variants using the same workflow on this larger data set.

What's next

See the list of "Other Guides" at the top of this page for more walkthroughs of things you can do with the DNAnexus Platform, or start exploring the site and the wiki!

Last edited by pmurray, 2017-06-29 21:00:14