Searching Data Objects

You can use the dx ls command to list the objects in your current project. You can learn which project and folder you are currently in by using the command dx pwd. Using glob patterns, you can broaden your search for objects by specifying filenames with wildcard characters such as * and ?. An asterisk (*) is used to represent zero or more characters in a string, and a question mark (?) represents exactly one character.

Searching Objects with Glob Patterns

Searching objects in your current folder

To search your current folder by filename, pass dx ls a glob pattern containing the wildcard characters * and ?. The examples below use the folder "C. Elegans - Ce10/" in the public project "Reference Genome Files" (platform login required to access this link):

Print the current working directory

$ dx select "Reference Genome Files"
$ dx cd "C. Elegans - Ce10/"
$ dx pwd # Print current working directory
Reference Genome Files:/C. Elegans - Ce10

List folders and/or objects in a folder

$ dx ls
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
ce10.cw2-index.tar.gz
ce10.fasta.fai
ce10.fasta.gz
ce10.hisat2-index.tar.gz
ce10.star-index.tar.gz
ce10.tmap-index.tar.gz

List objects named using a pattern

$ dx ls '*.fa*' # List objects with filenames of the pattern "*.fa*"
ce10.fasta.fai
ce10.fasta.gz
$ dx ls 'ce10.???-index.tar.gz' # List objects with filenames of the pattern "ce10.???-index.tar.gz"
ce10.cw2-index.tar.gz
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
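
To get a feel for how * and ? behave before running a search, the two patterns can be tried locally. The sketch below does not call dx at all: it uses the shell's own case pattern matching on the object names copied from the dx ls listing above, on the assumption that the platform's glob matching agrees with it for these simple patterns. The helper name glob_filter is hypothetical.

```shell
#!/bin/sh
# Object names copied from the `dx ls` listing above.
names="ce10.bt2-index.tar.gz ce10.bwa-index.tar.gz ce10.cw2-index.tar.gz ce10.fasta.fai ce10.fasta.gz ce10.hisat2-index.tar.gz ce10.star-index.tar.gz ce10.tmap-index.tar.gz"

# glob_filter PATTERN: print every name that matches PATTERN.
# Hypothetical helper; it only illustrates glob semantics locally.
glob_filter() {
    for name in $names; do
        # An unquoted $1 in a case pattern is interpreted as a glob:
        # "*" matches zero or more characters, "?" exactly one.
        case "$name" in
            $1) printf '%s\n' "$name" ;;
        esac
    done
}

glob_filter '*.fa*'
glob_filter 'ce10.???-index.tar.gz'
```

The patterns are single-quoted for the same reason they should be quoted on a dx command line: an unquoted pattern may be expanded by your local shell against files in your local directory before the command ever sees it.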

Searching over all objects in the current project

To search the entire project with a filename pattern, use the command dx find data --name with the wildcard characters. Unless --path or --all-projects is specified, dx find data searches only the current project. Below, we run dx find data in the public project "Reference Genome Files" (platform login required to access this link), using the --name option to specify the filename pattern of the objects we're searching for.

$ dx find data --name "*.fa*.gz"
closed 2014-10-09 09:50:51 776.72 MB /M. musculus - mm10/mm10.fasta.gz (file-BQbYQPj0Z05ZzPpb1xf000Xy)
closed 2014-10-09 09:50:30 767.47 MB /M. musculus - mm9/mm9.fasta.gz (file-BQbYK6801fFJ9Fj30kf003PB)
closed 2014-10-09 09:49:27 49.04 MB /D. melanogaster - Dm3/dm3.fasta.gz (file-BQbYVf80yf3J9Fj30kf00PPk)
closed 2014-10-09 09:48:55 29.21 MB /C. Elegans - Ce10/ce10.fasta.gz (file-BQbY9Bj015pB7JJVX0vQ7vj5)
closed 2014-10-08 13:52:26 818.96 MB /H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.fa.gz (file-B6ZY7VG2J35Vfvpkj8y0KZ01)
closed 2014-10-08 13:51:31 876.79 MB /H. Sapiens - hg19 (UCSC)/ucsc_hg19.fa.gz (file-B6qq93v2J35fB53gZ5G0007K)
closed 2014-10-08 13:50:53 827.95 MB /H. Sapiens - hg19 (Ion Torrent)/ion_hg19.fa.gz (file-B6ZYPQv2J35xX095VZyQBq2j)
closed 2014-10-08 13:50:17 818.88 MB /H. Sapiens - GRCh38/GRCh38.no_alt_analysis_set.fa.gz (file-BFBv6J80634gkvZ6z100VGpp)
closed 2014-10-08 13:49:53 810.45 MB /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/human_g1k_v37.fa.gz (file-B6ZXxfG2J35Vfvpkj8y0KXF5)

Escape special characters

If an object's name contains special characters, such as the wildcard characters * and ? described above, escape them when searching. Because the platform uses a colon (:) to denote project names and a slash (/) to separate folder names, these two characters are also special and must be escaped when they appear in a data object's name. To escape a special character, precede it with a backslash (\). For a more detailed discussion of escaping special characters, refer to the Path Resolution page.
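
As a rough illustration of the escaping rule, the sketch below prefixes each colon, slash, asterisk, and question mark in a name with a backslash. The helper name dx_escape_name and the sample object name are hypothetical, and the set of characters handled is an assumption based only on the characters named in this section; the Path Resolution page remains the authority on what must be escaped.

```shell
#!/bin/sh
# dx_escape_name NAME: print NAME with ":", "/", "*" and "?" each preceded
# by a backslash. Hypothetical helper; the character set is an assumption
# drawn from the special characters this section mentions.
dx_escape_name() {
    # "|" is used as the sed delimiter so that "/" needs no escaping;
    # "\\&" in the replacement means a literal backslash plus the match.
    printf '%s\n' "$1" | sed 's|[/:*?]|\\&|g'
}

# Hypothetical object name containing a colon, a slash and a question mark.
dx_escape_name 'sample:1/rep?.txt'
```

Keep in mind that your local shell also treats the backslash as special, so the escaped name should be passed inside quotes, for example as "$(dx_escape_name 'sample:1.txt')" when supplying it as an argument to a command such as dx ls.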

Searching Objects with Other Criteria

dx find data also allows you to search data using metadata fields, such as when the data was created, the data's tags, or the project the data exists in. For more information about metadata, please see the Manage Metadata page.

Searching objects created within a certain period of time

You can use the flags --created-after and --created-before to search for data objects created within a given period of time.

$ dx find data --created-after 2017-02-22 --created-before 2017-02-25
closed 2017-02-27 19:14:51 3.90 GB /H. Sapiens - hg19 (UCSC)/ucsc_hg19.hisat2-index.tar.gz (file-F2pJvF80Vzx54f69K4J8K5xy)
closed 2017-02-27 19:14:21 3.55 GB /M. musculus - mm10/mm10.hisat2-index.tar.gz (file-F2pJqk00Vq161bzq44Vjvpf5)
closed 2017-02-27 19:13:57 3.51 GB /M. musculus - mm9/mm9.hisat2-index.tar.gz (file-F2pJpKj0G0JxZxBZ4KJq0Q6B)
closed 2017-02-27 19:13:41 3.85 GB /H. Sapiens - hg19 (Ion Torrent)/ion_hg19.hisat2-index.tar.gz (file-F2pJkp00BjBk99xz4Jk74V0y)
closed 2017-02-27 19:13:28 3.85 GB /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/human_g1k_v37.hisat2-index.tar.gz (file-F2pJpy007bGBzj7X446PzxJJ)
closed 2017-02-27 19:13:02 3.90 GB /H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.hisat2-index.tar.gz (file-F2pJpb000vFpzj7X446PzxF0)
closed 2017-02-27 19:12:31 3.91 GB /H. Sapiens - GRCh38/GRCh38.no_alt_analysis_set.hisat2-index.tar.gz (file-F2pK5y00F8Bp9BYk4KX7Qb4P)
closed 2017-02-27 19:12:18 224.54 MB /D. melanogaster - Dm3/dm3.hisat2-index.tar.gz (file-F2pJP7j0QkbQ3ZqG269589pj)
closed 2017-02-27 19:11:56 139.76 MB /C. Elegans - Ce10/ce10.hisat2-index.tar.gz (file-F2pJK300KKz8bx1126Ky5b3P)

Searching objects by their metadata

You can search for objects based on their metadata. To set an object's metadata, use the command dx tag to tag the object, or dx set_properties to attach key-value pairs that describe it. You can also set metadata while uploading data to the platform (for more information about uploading to the platform, please see the Upload and Download Files page). To search by object tags, use the option --tag. This option can be repeated if the search requires multiple tags.

$ dx find data --tag sampleABC --tag batch123
closed 2017-01-01 09:00:00 6.08 GB /Input/SRR504516_1.fastq.gz (file-xxxx)
closed 2017-01-01 09:00:00 5.82 GB /Input/SRR504516_2.fastq.gz (file-wwww)

To search by object properties, use the option --property. This option can be repeated if the search requires multiple properties.

$ dx find data --property sequencing_providor=CRO_XYZ
closed 2017-01-01 09:00:00 8.06 GB /Input/SRR504555_1.fastq.gz (file-qqqq)
closed 2017-01-01 09:00:00 8.52 GB /Input/SRR504555_2.fastq.gz (file-rrrr)
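
Because --tag and --property can each be repeated, a script that takes its tags from a variable has to add one flag per tag. The sketch below shows one way to assemble such a command line. It is a dry run that only prints the command, the helper name build_find_cmd is hypothetical, and it assumes tags contain no spaces. The tag and property values are the placeholders from the examples above.

```shell
#!/bin/sh
# build_find_cmd "TAG TAG ..." KEY=VALUE
# Hypothetical helper: assembles a `dx find data` command line with one
# --tag flag per tag and a single --property flag, then prints it.
build_find_cmd() {
    tags=$1
    prop=$2
    # Rebuild the positional parameters as the command's argument list.
    set -- dx find data
    for t in $tags; do
        set -- "$@" --tag "$t"
    done
    set -- "$@" --property "$prop"
    # Dry run: print the assembled command instead of executing it.
    echo "$*"
}

build_find_cmd "sampleABC batch123" "sequencing_providor=CRO_XYZ"
```

To run the search rather than print it, the final echo "$*" could be replaced with "$@", which executes the assembled arguments as a command while keeping each argument intact.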

Searching objects in another project

To search for objects in a project other than your current working project, specify a project and folder path with the flag --path. Below, we specify the project ID (project-BQfgzV80bZ46kf6pBGy00J38) of the public project "Exome Analysis Demo" (platform login required to access this link) as an example.

$ dx find data --name "*.fastq.gz" --path project-BQfgzV80bZ46kf6pBGy00J38:/Input
closed 2014-10-03 12:04:16 6.08 GB /Input/SRR504516_1.fastq.gz (file-B40jg7v8KfPy38kjz1vQ001y)
closed 2014-10-03 12:04:16 5.82 GB /Input/SRR504516_2.fastq.gz (file-B40jgYG8KfPy38kjz1vQ0020)

Searching objects from all projects with VIEW and above permissions

To search for data objects across all projects in which you have VIEW or higher permissions, use the --all-projects flag. Public projects are not shown in this search.

$ dx find data --name "SRR*_1.fastq.gz" --all-projects
closed 2017-01-01 09:00:00 6.08 GB /Exome Analysis Demo/Input/SRR504516_1.fastq.gz (project-xxxx:file-xxxx)
closed 2017-07-01 10:00:00 343.58 MB /input/SRR064287_1.fastq.gz (project-yyyy:file-yyyy)
closed 2017-01-01 09:00:00 6.08 GB /data/exome_analysis_demo/SRR504516_1.fastq.gz (project-zzzz:file-xxxx)

Last edited by Yih-Chii Hwang, 2017-12-05 22:38:58
