Describing Data Objects

You can describe objects (files, app(let)s, and workflows) on the DNAnexus platform using the command dx describe.

Describing an Object by Name

Objects can be described by their DNAnexus platform name by passing a path to dx describe on the command line interface (CLI). For additional information on using paths via the CLI, see the Path Resolution page.

Describe an object with a relative path

Objects can be described relative to the user's current directory on the DNAnexus platform. In the following example, we describe the indexed reference genome file human_g1k_v37.bwa-index.tar.gz.

File object we will describe from the CLI.

$ dx describe "Original files/human_g1k_v37.bwa-index.tar.gz"
Result 1:
ID              file-xxxx
Class           file
Project         project-xxxx
Folder          /Original files
Name            human_g1k_v37.bwa-index.tar.gz
State           closed
Visibility      visible
Types           -
Properties      -
Tags            -
Outgoing links  -
Created         ----
Created by      Amy
 via the job    job-xxxx
Last modified   ----
archivalState   "live"
Size            3.21 GB

Note: The entire path is enclosed in quotes due to the space in the folder name Original files. Instead of quotes, you can escape special characters with the `\` character:

$ dx describe Original\ files/human_g1k_v37.bwa-index.tar.gz
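The same quoting concern applies when a script assembles the dx describe command line instead of a person typing it. The following is a minimal Python sketch, assuming only the standard library's shlex.quote; the helper name describe_command is hypothetical and is not part of the DNAnexus toolkit. It handles shell quoting only and does not interpret the platform's own path syntax.

```python
import shlex


def describe_command(path: str) -> str:
    """Return a `dx describe` command line with the path quoted for a POSIX shell.

    `describe_command` is a hypothetical helper used for illustration only.
    """
    # shlex.quote wraps the path in single quotes when it contains characters
    # (such as spaces) that the shell would otherwise split or interpret.
    return "dx describe " + shlex.quote(path)


print(describe_command("Original files/human_g1k_v37.bwa-index.tar.gz"))
# prints: dx describe 'Original files/human_g1k_v37.bwa-index.tar.gz'
```

Single quotes have the same effect as the double quotes or backslash escapes shown above: the shell passes the whole path to dx describe as one argument.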

Describe an object in a different project using an absolute path

Objects can also be described using an absolute path, which names the project, followed by a colon and the path within that project. This allows us to describe objects outside the current project context. In the following example, we use dx select to switch to My Research Project and then use dx describe on the file human_g1k_v37.fa.gz in the Reference Genome Files project.

Reference Genome project file object we will describe from CLI.

$ dx select "My Research Project"
$ dx describe Reference\ Genome\ Files:H.\ Sapiens\ -\ GRCh37\ -\ b37\ \(1000\ Genomes\ Phase\ I\)/human_g1k_v37.fa.gz
Result 1:
ID              file-xxxx
Class           file
Project         project-xxxx
Folder          /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)
Name            human_g1k_v37.fa.gz
State           closed
Visibility      visible
Types           -
Properties      -
Tags            -
Outgoing links  -
Created         ----
Created by      Amy
 via the job    job-xxxx
Last modified   ----
archivalState   "live"
Size            810.45 MB

Describe an Object Using Object ID

Objects can be described using a unique object ID. For more information about object IDs, see the Entity IDs page.

Workflow object we will describe from the CLI.

In this example, we describe the publicly available workflow object Exome Analysis Workflow using its ID.

$ dx describe workflow-BVgqYJQ0bZ474q9XQGgQZ3J5
# See the Manipulating output from dx describe page for a
# description of the output for a workflow from dx describe.

Due to the amount of information contained in a workflow (including multiple component app(let)s, inputs/outputs, and default parameters), the dx describe output can seem overwhelming. The Manipulating Outputs section lists some suggestions for parsing dx describe output.

Manipulating Outputs

The output from a dx describe command can be used for various purposes. The optional argument --json will convert the output from dx describe into JSON format for advanced scripting and command line use.

In this example, we will describe the publicly available workflow object Exome Analysis Workflow and return the output in JSON format.

$ dx describe workflow-BVgqYJQ0bZ474q9XQGgQZ3J5 --json
{
  "links": [
    "file-BgXJ1qj0Qz3XPY0jgxYQb1kK"
  ],
  "inputSpec": [
    {
      "group": "stage-BVgqYQQ0bZ41QJz3V1Jz3Kjj",
      "label": "Split BNX files",
      "name": "split_bnx_links",
      "class": "array:file"
    },
    ...
  "id": "workflow-BVgqYJQ0bZ474q9XQGgQZ3J5",
  "title": "Exome Analysis Workflow",
  ...
  "stages": [
    {
      "accessible": true,
      "executable": "app-bwa_mem_fastq_read_mapper/1.5.0",
      "name": null,
      "systemRequirements": {},
      "executionPolicy": {},
      "input": {},
      "folder": null,
      "id": "stage-BVgqYQQ0bZ41QJz3V1Jz3Kjj"
    },
    {
      "accessible": true,
      "executable": "app-fastqc/2.2.0",
      "name": null,
    }
  ]
  ...
  ]
}
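For scripting outside the shell, the --json output can be loaded directly into a program. The sketch below is one possible approach in Python, assuming the dx client is installed, on the PATH, and logged in; the function name describe_json and its run parameter are hypothetical and exist only so the command runner can be swapped out when testing.

```python
import json
import subprocess


def describe_json(object_ref, run=subprocess.run):
    """Run `dx describe <object_ref> --json` and return the parsed JSON as a dict.

    `describe_json` and its `run` parameter are hypothetical names for this sketch.
    """
    result = run(
        ["dx", "describe", object_ref, "--json"],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout to str
        check=True,           # raise CalledProcessError if dx exits non-zero
    )
    return json.loads(result.stdout)


# Example use (requires a working dx installation and login):
#   description = describe_json("workflow-BVgqYJQ0bZ474q9XQGgQZ3J5")
#   print(description["title"])
```

Passing the arguments as a list avoids shell quoting issues, since no shell is involved in launching dx.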

We can parse, process, and query the JSON output using jq. Below, we process the dx describe --json output to list the executable used by each stage of the Exome Analysis Workflow.

$ dx describe workflow-BVgqYJQ0bZ474q9XQGgQZ3J5 --json
"stages": [
  {
    "accessible": true,
    "executable": "app-bwa_mem_fastq_read_mapper/1.5.0",  # Field we want to output
    ...
    "id": "stage-BVgqYQQ0bZ41QJz3V1Jz3Kjj"
  },
  {
    "accessible": true,
    "executable": "app-fastqc/2.2.0",  # Field we want to output
    "name": null,
    ...
  }
]

We want to output the "executable" value of each stage present in the "stages" value of our dx describe output, as seen above.

$ dx describe workflow-BVgqYJQ0bZ474q9XQGgQZ3J5 --json | jq '.stages | map(.executable) | .[]'
"app-bwa_mem_fastq_read_mapper/1.5.0"
"app-fastqc/2.2.0"
"app-vendor_human_exome_gatk_lite_pipeline/1.1.5"
"app-vendor_human_exome_selection_metrics/1.0.0"

Above, the dx describe response sent to stdout is piped into a jq expression that returns the "executable" value for each stage in the "stages" section of the JSON.
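The same extraction can be done without jq. The following Python sketch mirrors the jq expression using only the standard library; the sample JSON is a trimmed stand-in containing just the two stages shown earlier, and the function name stage_executables is hypothetical.

```python
import json

# Trimmed stand-in for `dx describe ... --json` output; a real description
# contains many more keys and stages.
SAMPLE = """
{
  "id": "workflow-BVgqYJQ0bZ474q9XQGgQZ3J5",
  "stages": [
    {"id": "stage-BVgqYQQ0bZ41QJz3V1Jz3Kjj",
     "executable": "app-bwa_mem_fastq_read_mapper/1.5.0"},
    {"executable": "app-fastqc/2.2.0", "name": null}
  ]
}
"""


def stage_executables(description):
    """Return the "executable" value of every stage, in stage order.

    Equivalent in spirit to the jq filter `.stages | map(.executable) | .[]`.
    """
    return [stage["executable"] for stage in description["stages"]]


for executable in stage_executables(json.loads(SAMPLE)):
    print(executable)
# prints:
# app-bwa_mem_fastq_read_mapper/1.5.0
# app-fastqc/2.2.0
```

In a real script, the description would come from the output of dx describe --json rather than from an embedded string.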

General Response Fields Overview

Field name | Objects | Description
--- | --- | ---
ID | All | Unique ID assigned to a DNAnexus object.
Class | All | DNAnexus object type.
Project | All | Container where the object is stored.
Folder | All | Objects inside a container (project) can be organized into folders. An object can exist at only one path within a project.
Name | All | Object name on the platform.
State | All | Status of the object on the platform.
Visibility | All | Whether or not the object is visible to the user through the platform web interface.
Tags | All | Set of tags associated with an object. Tags are strings used to organize or annotate objects.
Properties | All | Key/value pairs attached to the object.
Outgoing links | All | JSON references to other objects on the platform. Linked objects will be copied along with the object if the object is cloned to another project.
Created | All | Date and time the object was created.
Created by | All | DNAnexus user who created the object. Contains the subfield “via the job” if the object was created as a result of an app or applet.
Last modified | All | Date and time the object was last modified.
Input Spec | App(let)s and Workflows | App(let) or workflow input names and classes. With workflows, the corresponding stage ID is also provided.
Output Spec | App(let)s and Workflows | App(let) or workflow output names and classes. With workflows, the corresponding stage ID is also provided.

Last edited by Steve Osazuwa, 2017-03-06 21:58:10
