The DNAnexus Platform has been designed not just to store files
and run analyses in the cloud, but also to make it easy to collaborate
on, annotate, and visualize your data. This page explains the details
of how files are stored as objects, uniquely identified, and
referenced on the platform.
All entities on the DNAnexus Platform, whether projects, data
objects, apps, or jobs, have unique IDs (e.g.
project-BQbJpBj0bvygyQxgQ1800Jkk). These IDs must be used
when accessing or modifying DNAnexus entities via our API. To make
entities easier to work with, the website and our client libraries and
tools also let you refer to many of them, such as projects and data
objects, by name. A key advantage of this system is that there is
always a fixed machine-readable ID by which you can refer to
something, even if its name changes.
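
The relationship between fixed IDs and changeable names can be sketched in a few lines of Python. This is only an illustrative model, not the platform's client library: the `NameIndex` class and `is_entity_id` helper are hypothetical, and the ID pattern (a lowercase class prefix, a hyphen, and 24 alphanumeric characters) is an assumption inferred from the example ID above.

```python
import re

# Assumed ID shape, inferred from the example ID in the text:
# "<class>-<24 alphanumeric characters>".
ID_PATTERN = re.compile(r"^[a-z]+-[0-9A-Za-z]{24}$")


def is_entity_id(text):
    """Return True if text looks like an entity ID rather than a name."""
    return ID_PATTERN.match(text) is not None


class NameIndex:
    """Hypothetical in-memory index mapping fixed IDs to mutable names."""

    def __init__(self):
        self._name_by_id = {}

    def register(self, entity_id, name):
        if not is_entity_id(entity_id):
            raise ValueError(f"not an entity ID: {entity_id!r}")
        self._name_by_id[entity_id] = name

    def rename(self, entity_id, new_name):
        # Only the name changes; the ID used as the key stays fixed.
        if entity_id not in self._name_by_id:
            raise KeyError(entity_id)
        self._name_by_id[entity_id] = new_name

    def resolve(self, name_or_id):
        """Return the sorted list of IDs matching an ID or a name.

        This sketch allows several entities to share a name, so a name
        may resolve to more than one ID.
        """
        if is_entity_id(name_or_id):
            return [name_or_id] if name_or_id in self._name_by_id else []
        return sorted(
            entity_id
            for entity_id, name in self._name_by_id.items()
            if name == name_or_id
        )
```

A script that stores the ID keeps working after a rename, whereas one that stores only the name has to resolve it again.
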
You can collaborate with other users by creating a project, adding
data to it, and sharing it with them.
When you add data by copying (or "cloning") it from a different
project, the underlying data objects actually keep the same IDs.
To refer to a particular copy of a data object, you have to specify
both the project and the data object.
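
As a rough illustration of such a reference, the sketch below builds a JSON-style link that names both the project and the data object. It assumes the `$dnanexus_link` convention used for object references in API inputs. The `make_link` helper is hypothetical, and the `project-xxxx` and `file-xxxx` values are placeholders, not real IDs.

```python
def make_link(object_id, project_id=None):
    """Build a reference to a data object.

    With project_id, the link pins one particular copy; without it,
    the link names only the object and leaves the project unspecified.
    """
    if project_id is None:
        return {"$dnanexus_link": object_id}
    return {"$dnanexus_link": {"project": project_id, "id": object_id}}


# Two copies of the same object in different projects (placeholder IDs).
copy_a = make_link("file-xxxx", "project-aaaa")
copy_b = make_link("file-xxxx", "project-bbbb")
```

Here `copy_a` and `copy_b` carry the same object ID but differ in the project, which is what tells the two copies apart.
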
Members of the other project are free to rename their copy and
annotate it with different tags and properties. However, the
underlying data is immutable, so all viewers see the same data.
Keeping the same object ID across projects has a number of advantages.
It makes it easy for us to charge you only once for the same data you
might have in multiple projects, and it makes it easy for you to tell
when two data objects are actually the same even if they have
different names in different projects.
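
The toy model below sketches why a shared object ID gives both advantages. It is not how the platform is implemented: the `Platform` and `Copy` classes, the byte-count billing rule, and the short placeholder IDs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    """Per-project, editable metadata for one copy of a data object."""
    name: str
    tags: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


class Platform:
    """Toy model: payload sizes keyed by object ID, metadata keyed by
    (project ID, object ID)."""

    def __init__(self):
        self.payload_size = {}  # object ID -> size in bytes (shared)
        self.copies = {}        # (project ID, object ID) -> Copy

    def add(self, project_id, object_id, name, size):
        self.payload_size[object_id] = size
        self.copies[(project_id, object_id)] = Copy(name)

    def clone(self, object_id, src_project, dst_project):
        src = self.copies[(src_project, object_id)]
        # The clone keeps the same object ID; only metadata is duplicated,
        # so later edits to one copy do not affect the other.
        self.copies[(dst_project, object_id)] = Copy(
            src.name, set(src.tags), dict(src.properties)
        )

    def billable_bytes(self):
        # Each distinct object ID is counted once, however many
        # projects hold a copy of it.
        distinct_ids = {object_id for _, object_id in self.copies}
        return sum(self.payload_size[object_id] for object_id in distinct_ids)

    @staticmethod
    def same_data(ref_a, ref_b):
        """Two (project, object ID) references share data if the IDs match."""
        return ref_a[1] == ref_b[1]
```

Comparing object IDs is enough to detect identical data, so the names of the two copies never need to be consulted.
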
Note: Some metadata is locked down and can never be changed
once the data object is closed. These fields are considered an
integral part of the data object itself. This is discussed more in
the next section.
When you upload a file, the platform does more than just save
the file to disk. It also creates a DNAnexus entity with a unique ID
and associated metadata, such as its name, its class (a data object
of class file), and any tags or properties you specified.
Data Object Lifecycle
In general, a data object starts out in the open state and cannot
be used on the platform until it is finalized, or closed. Below
is a summary of what can be done to or accessed from a data object
before and after closing it.
| access underlying data |
| copy to another project or use as job input |
| modify name, tags, or properties |
| modify types, details, or visibility |
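
The open-to-closed lifecycle can be sketched as a small state machine. This is a simplified model, not the platform's API: the `DataObject` class and `InvalidState` exception are hypothetical, and the code assumes that names and tags stay editable after closing, while details are locked on close and the object can be used as job input only once closed.

```python
class InvalidState(Exception):
    """Raised when an operation is not allowed in the current state."""


class DataObject:
    """Toy data object with the two states described in the text."""

    def __init__(self, object_id, name):
        self.id = object_id
        self.name = name
        self.tags = set()
        self.details = {}
        self.state = "open"  # every object starts out open

    def close(self):
        # Closing finalizes the object; this sketch has no way to reopen it.
        self.state = "closed"

    def rename(self, new_name):
        # Assumed to be allowed both before and after closing.
        self.name = new_name

    def add_tag(self, tag):
        # Assumed to be allowed both before and after closing.
        self.tags.add(tag)

    def set_details(self, details):
        # Assumed to be part of the metadata that is locked on close.
        if self.state != "open":
            raise InvalidState("details cannot change once the object is closed")
        self.details = dict(details)

    def as_job_input(self):
        # The object cannot be used until it has been closed.
        if self.state != "closed":
            raise InvalidState("object must be closed before it can be used")
        return self.id
```

Calling `as_job_input()` on an open object, or `set_details()` on a closed one, raises `InvalidState` in this model.
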
A key consequence of this model is that you can specify any copy of
a particular data object (that is, any copy with the same data object
ID) as input to an analysis and expect the same results