Apollo through the Command Line

On this page, we walk you through the dx commands that will help you find and manage databases, find and manage cohorts, and run and build Spark apps.

Find Databases

You can find all accessible databases by using the findDataObjects API method:

dx api system findDataObjects '{"class": "database"}'
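If needed, the same call can be narrowed to a single project through the API's optional scope field; a sketch, where project-xxxx is a placeholder for a real project ID:

```shell
# Restrict the database search to one project via the optional "scope" field
dx api system findDataObjects '{"class": "database", "scope": {"project": "project-xxxx"}}'
```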

You can search all accessible databases by Spark database name by using the findDatabases API method:

dx api system findDatabases '{"databaseName": "genomics"}'

You can use a regular expression for your search:

dx api system findDatabases '{"uniqueDatabaseName": {"regexp": "__genomics$", "flags": "i"}}'

You can use a wildcard pattern for your search:

dx api system findDatabases '{"uniqueDatabaseName": {"glob": "*genomics"}}'

Manage Databases

You can find more information about a database object, including databaseName and uniqueDatabaseName (the Spark database name), by using the describe command with either:

  • database object name

    dx describe genomics
  • database object id

    dx describe database-xxxx

You can add one or more tags to a database object in a project context by using the tag command:

dx tag ./genomics tag1 tag2 tag3

You can remove specified tags from a database object in a project context by using the untag command:

dx untag ./genomics tag2 tag3

You can relocate a database object from the current project to another project by using the relocate API method:

dx api database-xxxx relocate '{"project": "project-xxxx"}'

You can choose a destination folder and specify whether the destination folder and/or its parent folders should be created if they do not exist:

dx api database-xxxx relocate '{"project": "project-xxxx", "destination": "/databases", "parents": true}'

Find Cohorts

You can get a list of all cohorts in the current project by using the find data command:

dx find data --class record --type DatabaseQuery

You can find cohorts across all projects by adding the --all-projects flag. Please refer to the help for more details about all search parameters:

dx find data --help
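For example, the --all-projects flag mentioned above combines with the same class and type filters; a sketch of the command described in the text:

```shell
# Search every project you can access, not just the current one
dx find data --class record --type DatabaseQuery --all-projects
```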

Manage Cohorts

You can create a cohort (a record of type DatabaseQuery) by using the new record command:

dx new record my_record

You can attach types to a record by using the add_types command:

dx add_types my_record DatabaseQuery some_type

The DatabaseQuery type defines my_record as a cohort.

You can remove types from a record object by using the remove_types command:

dx remove_types my_record some_type

You can set or change object details by using the set_details command:

dx set_details my_record '{"database": "database-xxxx",
                           "databaseExplorer": "record-xxxx",
                           "dashboard": "phenotypequery",
                           "filters": {"Sex": "M"}}'

You can retrieve details set on a record in JSON format by using the get_details command:

dx get_details my_record
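Because the details are returned as plain JSON, you can post-process them with standard tools; a sketch, assuming python3 is available and the details contain the filters key from the set_details example above (jq would work equally well):

```shell
# Extract the "filters" object from the cohort details
# (python3 is used here only as a generic JSON processor)
dx get_details my_record | python3 -c 'import json, sys; print(json.load(sys.stdin)["filters"])'
```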

You can rename a record object in a project context by using the rename command:

dx rename ./my_record my_cohort

You can use the close command to make the cohort immutable and prevent any further changes to its details or types. The close operation is irreversible, i.e., the record cannot be reopened after closing.

dx close my_cohort

A closed record object can be copied to a different project by using the cp command:

dx cp ./my_cohort project-yyyy:/Cohorts/new_cohort

You need access to the referenced database in order to access the data defined by the SQL query stored in the cohort's details.

You can create a cohort in multiple steps as described above, or with a single command, as shown in the following example for a complex cohort:

dx new record --type DatabaseQuery --tag tag1 --tag tag2 --close \
              --details '{"database": "database-xxxx",
                          "databaseExplorer": "record-xxxx",
                          "dashboard": "phenotypequery",
              "baseSql": "SELECT * FROM genomics.sample"}'

Run Apps

You can find Spark apps by supplying the dedicated "Translational Informatics" category in the find apps command:

dx find apps --category "Translational Informatics"

If you would like to search only for Spark apps suitable for data analysis, use the additional category "Apollo Analysis":

dx find apps --category "Translational Informatics" --category "Apollo Analysis"

You will need to install the app before running it by using the install command:

dx install allele_frequency_calculator

After the app is installed, you can run it using the run command:

dx run allele_frequency_calculator

If you don't specify any arguments, you will enter interactive mode for input selection.
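To skip interactive mode, you can name the inputs on the command line; a sketch, where vcf_file is a hypothetical input name and file-xxxx a placeholder file ID (run the app with -h to list its real inputs):

```shell
# Provide inputs with -i and confirm with -y to run non-interactively
# vcf_file is a hypothetical input name; file-xxxx is a placeholder file ID
dx run allele_frequency_calculator -i vcf_file=file-xxxx -y
```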

You can monitor the created job using the watch command:

dx watch job-xxxx

Build Apps

You can create your own Spark app using the general app build process, with the additional cluster configuration required in the 'systemRequirements' section of the dxapp.json file:

"systemRequirements": {
  "*": {
    "instanceType": "mem1_ssd1_x4",
    "clusterSpec": {
      "type": "spark",
      "version": "2.4.0",
      "initialInstanceCount": 2
    }
  }
}

Please refer to the Developing Spark Apps page for more details.

Last edited by Peter Murray, 2019-01-22 20:00:06