Projects and Data Objects Explained

The DNAnexus Platform has been designed not just to store files and run analyses in the cloud, but also to make it easy to collaborate on, annotate, and visualize your data. This page explains the details of how files are stored as objects, uniquely identified, and referenced on the platform.

Platform IDs

All entities on the DNAnexus Platform, whether projects, data objects, apps, or jobs, have unique IDs (e.g. project-BQbJpBj0bvygyQxgQ1800Jkk). These IDs must be used when accessing or modifying DNAnexus entities via our API. To make entities easier to work with, the website and our client libraries and tools also let you refer to many of them, such as projects and data objects, by name. A key advantage of this system is that there is always a fixed machine-readable ID by which you can refer to an entity, even if its name changes.
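As a rough illustration of telling an ID apart from a name, the sketch below splits an ID string into its class prefix and opaque suffix. The pattern (a lowercase class, a hyphen, then 24 alphanumeric characters) is an assumption inferred from the example ID above, and the helper names parse_platform_id and looks_like_id are hypothetical; this is not an official validator.

```python
import re
from typing import NamedTuple

# Assumed ID shape: "<class>-<24 alphanumeric characters>", inferred from the
# example ID in the text. Treat this pattern as an illustration only.
_ID_PATTERN = re.compile(r"([a-z]+)-([0-9A-Za-z]{24})")


class PlatformId(NamedTuple):
    entity_class: str  # e.g. "project"
    suffix: str        # the opaque part after the hyphen


def parse_platform_id(value: str) -> PlatformId:
    """Split an ID string into its class prefix and opaque suffix."""
    match = _ID_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognized platform ID: {value!r}")
    return PlatformId(match.group(1), match.group(2))


def looks_like_id(value: str) -> bool:
    """Return True if the string has the assumed ID shape, False for names."""
    return _ID_PATTERN.fullmatch(value) is not None


example = parse_platform_id("project-BQbJpBj0bvygyQxgQ1800Jkk")
print(example.entity_class)          # the class prefix of the example ID
print(looks_like_id("My Project"))   # a human-readable name is not an ID
```

A check like this can help a script decide whether a user-supplied string should be resolved by name or passed through as an ID.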

Projects

You can collaborate with other users by making a project, adding data to the project, and sharing it with other users.

When you add data by copying (or "cloning") it from a different project, the underlying data objects actually keep the same IDs. To refer to a particular copy of a data object, you have to specify both the project and the data object.
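The project-plus-object pairing described above can be sketched as a small value type. The ObjectRef class is hypothetical, and the IDs are placeholders. The "project:object" string and the "$dnanexus_link" dictionary are modeled on conventions used by the platform's client tools; treat both as assumptions here and check the API documentation before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRef:
    """A reference to one particular copy of a data object (hypothetical type)."""
    project_id: str
    object_id: str

    def qualified(self) -> str:
        # Assumed "project:object" notation for naming a specific copy.
        return f"{self.project_id}:{self.object_id}"

    def as_link(self) -> dict:
        # Assumed link structure carrying both the project and the object ID.
        return {"$dnanexus_link": {"project": self.project_id, "id": self.object_id}}


# Placeholder IDs for illustration only.
original = ObjectRef("project-xxxx", "file-zzzz")
cloned = ObjectRef("project-yyyy", "file-zzzz")

# Same underlying data object, but two distinct copies.
print(original.object_id == cloned.object_id)
print(original == cloned)
print(cloned.qualified())
```

Keeping the two IDs together in one value makes it harder to pass an object ID around without saying which project's copy is meant.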

Members of each project are free to rename their own copy and annotate it with different tags and properties. However, the underlying data is immutable, so all viewers see the same data.

Keeping the same object ID across projects has a number of advantages. It makes it easy for us to charge you only once for the same data you might have in multiple projects, and it makes it easy for you to tell when two data objects are actually the same even if they have different names in different projects.
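To make the two advantages concrete, here is a toy in-memory model of cloning. It is only a sketch of the idea: the ToyPlatform and CopyMetadata classes, the placeholder IDs, and the size-based billing figure are all assumptions for illustration, not a description of how the platform is implemented or how charges are actually computed.

```python
from dataclasses import dataclass, field


@dataclass
class CopyMetadata:
    """Per-project, mutable annotations on one copy of a data object."""
    name: str
    tags: set = field(default_factory=set)


class ToyPlatform:
    """Illustrative model only: data is keyed by object ID, metadata by project."""

    def __init__(self):
        self.sizes = {}     # object ID -> size in bytes, stored once
        self.projects = {}  # project ID -> {object ID: CopyMetadata}

    def add_object(self, project_id, object_id, name, size):
        self.sizes[object_id] = size
        self.projects.setdefault(project_id, {})[object_id] = CopyMetadata(name)

    def clone(self, object_id, source_project, dest_project):
        """Copy an object into another project; the object ID is unchanged."""
        source = self.projects[source_project][object_id]
        copy = CopyMetadata(source.name, set(source.tags))
        self.projects.setdefault(dest_project, {})[object_id] = copy

    def shared_objects(self, project_a, project_b):
        """Object IDs present in both projects, regardless of their names."""
        ids_a = set(self.projects.get(project_a, {}))
        ids_b = set(self.projects.get(project_b, {}))
        return ids_a & ids_b

    def billable_bytes(self, project_ids):
        """Total size, counting each distinct object ID only once."""
        unique_ids = set()
        for project_id in project_ids:
            unique_ids.update(self.projects.get(project_id, {}))
        return sum(self.sizes[object_id] for object_id in unique_ids)


platform = ToyPlatform()
platform.add_object("project-A", "file-1", "reads.fastq", size=1000)
platform.clone("file-1", "project-A", "project-B")
# The copy in project-B is renamed; the object ID and the data stay the same.
platform.projects["project-B"]["file-1"].name = "sample7_reads.fastq"

print(platform.shared_objects("project-A", "project-B"))
print(platform.billable_bytes(["project-A", "project-B"]))
```

Because the data is keyed by object ID and only the annotations are keyed by project, both the "same data" check and the count-once total reduce to set operations on IDs.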

Note: Some metadata is locked down and can never be changed once the data object is closed. These fields are considered an integral part of the data object itself. This is discussed more in the next section.

Data Objects

When you upload a file, the platform does more than just save the file to disk. It also creates a DNAnexus entity with a unique ID that can be annotated with additional metadata such as a name, the fact that it is a data object of class file, any tags or properties you might have specified, and so on.

Data Object Lifecycle

In general, a data object starts out in the open state and cannot be used on the platform until it is finalized, or closed. Below is a summary of what can be done to or accessed from a data object before and after closing it.

Action                                      | Before closing | Once closed
access underlying data                      | write-only     | read-only
copy to another project or use as job input | no             | yes
modify name, tags, or properties            | yes            | yes
modify types, details, or visibility        | yes            | no

A key consequence of this model is that you should be able to specify any copy of a particular data object (that is, any copy with the same data object ID) as input to an analysis and expect the same results every time.
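The lifecycle rules in this section can be sketched as a small state model. This is a simplified illustration rather than the platform's implementation: the ToyDataObject class and its method names are hypothetical, and it assumes that name, tags, and properties stay editable after closing while types, details, and visibility are the fields that become locked.

```python
class ToyDataObject:
    """Simplified open/closed lifecycle for a data object (illustrative only)."""

    # Assumption: which metadata fields remain editable after closing.
    EDITABLE_ALWAYS = {"name", "tags", "properties"}
    EDITABLE_WHILE_OPEN = {"types", "details", "visibility"}

    def __init__(self, object_id):
        self.object_id = object_id
        self.state = "open"
        self.metadata = {}
        self._data = bytearray()

    def write(self, chunk: bytes) -> None:
        if self.state != "open":
            raise PermissionError("data is read-only once the object is closed")
        self._data.extend(chunk)

    def read(self) -> bytes:
        if self.state != "closed":
            raise PermissionError("data is write-only until the object is closed")
        return bytes(self._data)

    def close(self) -> None:
        """Finalize the object; its data and locked fields can no longer change."""
        self.state = "closed"

    def set_metadata(self, field_name: str, value) -> None:
        if field_name in self.EDITABLE_WHILE_OPEN:
            if self.state != "open":
                raise PermissionError(f"{field_name} is locked once the object is closed")
        elif field_name not in self.EDITABLE_ALWAYS:
            raise KeyError(f"unknown metadata field: {field_name}")
        self.metadata[field_name] = value

    def usable_as_input(self) -> bool:
        """Only closed objects can be copied or used as job input."""
        return self.state == "closed"


obj = ToyDataObject("file-zzzz")  # placeholder ID
obj.write(b"ACGT")
obj.set_metadata("types", ["Reads"])
obj.close()
obj.set_metadata("name", "renamed.txt")  # still allowed after closing
print(obj.read())
print(obj.usable_as_input())
```

Since a closed object's data can no longer change, any job that receives it as input sees the same bytes, which is what makes the repeatable-results expectation reasonable.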

Last edited by Thanh-Ha Nguyen, 2016-01-31 23:12:48
