DNAnexus provided the computation resources and implemented the analysis pipeline used to process these 3000 rice genomes.
DNAnexus is pleased to provide easy access to the genomes hosted on Amazon, and to the analytical tools and pipelines used to produce the results. These are found in a featured project on DNAnexus called "[3000 Rice Genomes](https://platform.dnanexus.com/projects/BfQyb8007q34bv8vQGKzF4bb/data/)", which is available to any user who signs up for DNAnexus.
All of these can be copied into a DNAnexus project that you create; please see the Quickstart for instructions on how to do this.
Reference Data Used:
The reference data used in the analyses are present in the References folder within the project. All 3000 strains were mapped against the rice reference genomes Nipponbare (IRGSP1.0), DJ123, Kasalath, and 9311. The references and BWA index used for each genome can be found within this folder.
The applets used in this analysis are present in the Applets folder. The analytical pipeline uses BWA-MEM to map reads to a reference genome, then uses GATK both to realign these reads and to call variants. GATK is configured to output a call at every site in the genome. In addition, we have included a LumpyExpress applet as part of a demonstration of how to easily run data from Amazon’s repository on DNAnexus.
Fully configured pipelines for each of the four reference genomes are available in the Pipelines folder.
Using data from the Amazon public resources
Data hosted by Amazon for this project can be accessed quickly and without charge with our URL transfer app. You can give this app the S3 path for any hosted Amazon data and it will be pulled into DNAnexus.
In the Pipelines folder, there are two different transfer pipelines for bringing in data. Transfer-Temporary must be connected to other apps in a workflow and brings in the data only for the duration of processing, so you won’t be charged for storing the data.
Transfer-Copy pulls a copy of the data into DNAnexus as its own object. You can use this like any file object on DNAnexus, but you will be charged for storage. You can delete this data at any time.
To demonstrate this, we have created a pipeline, Demonstrate-Transfer-Lumpy, which pulls Nipponbare data from the B001 strain into DNAnexus and uses the structural variant caller Lumpy to call structural variants in the data without keeping a copy of the data. To change the URL, edit the workflow, click on the black box for URL Fetcher, and change the URL to the S3 URL you would like to pull in.
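When preparing a URL for the URL Fetcher step, it can help to check that an S3 path is complete and, where an HTTPS address is preferred, to rewrite it in Amazon's virtual-hosted style. The following is a minimal sketch in Python. The function name `s3_to_https` and the bucket and key in the usage note are hypothetical placeholders, and whether the app expects the `s3://` form or the HTTPS form is an assumption to verify against the app itself.

```python
from urllib.parse import quote, urlparse


def s3_to_https(s3_path: str) -> str:
    """Rewrite s3://bucket/key as https://bucket.s3.amazonaws.com/key.

    Assumes a publicly readable bucket addressed in virtual-hosted style.
    """
    parsed = urlparse(s3_path)
    key = parsed.path.lstrip("/")
    # Reject anything that is not a full s3://bucket/key path.
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a complete S3 path: {s3_path!r}")
    # Percent-encode unsafe characters in the key but keep the slashes.
    return f"https://{parsed.netloc}.s3.amazonaws.com/{quote(key)}"
```

For example, `s3_to_https("s3://example-bucket/B001/sample.bam")` yields an HTTPS URL for the same object; `example-bucket` here is a placeholder, not the name of the real bucket.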
Batch Analyses on Data
All operations discussed can be launched or automated from our command line tools. Please see the CLI Quickstart and Developer Portal for more information. Please also feel free to contact us at DNAnexus to discuss using the data.
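As an illustration of batch automation, the sketch below builds one `dx run` command per S3 URL without executing anything, so the commands can be reviewed first. Several details are assumptions rather than facts about the featured project: that the URL Fetcher is the first stage of the workflow (stage index 0), that its input is named `url`, and that results should go to a `/results/<sample>` folder. The helper names, project name, and S3 paths are hypothetical.

```python
import shlex
from typing import List


def build_dx_run(workflow: str, s3_url: str, destination: str) -> List[str]:
    """Return the argument list for one `dx run` invocation."""
    return [
        "dx", "run", workflow,
        # Assumption: stage 0 is the URL Fetcher and its input is named "url".
        f"-i0.url={s3_url}",
        "--destination", destination,
        "--yes",    # do not prompt for confirmation
        "--brief",  # print only the ID of the launched analysis
    ]


def batch_commands(workflow: str, s3_urls: List[str], project: str) -> List[List[str]]:
    """Build one command per URL, with a separate output folder per sample."""
    commands = []
    for url in s3_urls:
        # Derive a sample label from the file name, e.g. "B001" from "B001.bam".
        sample = url.rstrip("/").rsplit("/", 1)[-1].split(".")[0]
        commands.append(build_dx_run(workflow, url, f"{project}:/results/{sample}"))
    return commands


if __name__ == "__main__":
    # Hypothetical inputs; replace them with real S3 paths and your own project.
    urls = ["s3://example-bucket/B001/B001.bam", "s3://example-bucket/B002/B002.bam"]
    for cmd in batch_commands("Demonstrate-Transfer-Lumpy", urls, "my-project"):
        # Print a shell-safe command line for review instead of running it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands rather than running them keeps the sketch safe to try; once the stage and input names have been checked against the actual workflow, each argument list could be passed to `subprocess.run`.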