DNAnexus Thrift Server

DNAnexus Spark Thrift server is a service that allows JDBC and ODBC clients to run Spark SQL queries. These queries are run against databases created on the DNAnexus platform. Thrift server is a high availability Apache Spark cluster integrated with the platform. It leverages the security, permissions and sharing features built into DNAnexus.

Connecting to Thrift Server

In order to connect to thrift server we need these two bits of information

  1. JDBC url
    jdbc:hive2://THRIFT-SERVER:10000/;ssl=true
    
    These are the different thrift servers for various regions
    • us-east-1 : query.us-east-1.translational.dnanexus.com
    • us-west-1 : Not Available
  2. username

    We support two formats of username

    • TOKEN : Used when the user wants to just run queries and not for creating databases
    • TOKEN__PROJECTID : Used when the user wants run queries as well as creating databases

    TOKEN is DNAnexus user generated token and PROJECTID is DNAnexus project id

Generate token for DNAnexus account

  1. Navigate to https://platform.dnanexus.com and login using your username and password.
  2. Click on your username in the upper right corner and choose Profile > API Tokens
  3. Generate a token, select the token scope and give it an expiry date in the future. Copy it to a safe place.

Getting the ProjectId

  1. Navigate to https://platform.dnanexus.com and login using your username and password.
  2. Go to Projects -> your project -> Settings -> ProjectID and click on Copy to Clipboard.

Using Beeline

Beeline is a JDBC client bundled with Apache Spark and can be used to run queries on the command line.

Installing Apache Spark

Download Apache Spark from link

$ tar -zxvf spark-2.2.0-bin-hadoop2.7.tar
Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). You need to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.

Running Beeline

The beeline client is located under $SPARK_HOME/bin/

$ cd spark-2.2.0-bin-hadoop2.7/bin
$ ./beeline
Connect to beeline using the jdbc url
beeline> !connect jdbc:hive2://query.us-east-1.translational.dnanexus.com:10000/;ssl=true
Enter username of format TOKEN or TOKEN__PROJECTID. There is no password, so just enter return.
Enter username: username <format TOKEN or TOKEN__PROJECTID>
Enter password: <empty - press RETURN>

Once successfully connected you should see the message

Connected to: Spark SQL (version 2.2.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ

Now you are connected to the thrift server using your credentials and will be able to see all databases you have access to.

0: jdbc:hive2://query.us-east-1.translati> show databases;
+---------------------------------------------------------+--+
|                      databaseName                       |
+---------------------------------------------------------+--+
| database_fj7q18009xxzzzx0gjfk6vfz__genomics_180718_01   |
| database_fj8gygj0v10vj50j0gyfqk1x__af_result_180719_01  |
| database_fj96qx00v10vj50j0gyfv00z__af_result2           |
| database_fjf3y28066y5jxj2b0gz4g85__metabric_data        |
| database_fjj1jkj0v10p8pvx78vkkpz3__pchr1_test           |
| database_fjpz6fj0v10fjy3fjy282ybz__af_result1           |
+---------------------------------------------------------+--+

You can use the whole database name database_fjf3y28066y5jxj2b0gz4g85__metabric_data or just the suffix metabric_data. If there are more than one database with suffix metabric_data then you will need to use the whole database name database_fjf3y28066y5jxj2b0gz4g85__metabric_data

0: jdbc:hive2://query.us-east-1.translati> use metabric_data;

Now you can run SQL queries.

0: jdbc:hive2://query.us-east-1.translati> select * from cna limit 10;
+--------------+-----------------+------------+--------+--+
| hugo_symbol  | entrez_gene_id  | sample_id  | value  |
+--------------+-----------------+------------+--------+--+
| MIR3675      | NULL            | MB-6179    | -1     |
| MIR3675      | NULL            | MB-6181    | 0      |
| MIR3675      | NULL            | MB-6182    | 0      |
| MIR3675      | NULL            | MB-6183    | 0      |
| MIR3675      | NULL            | MB-6184    | 0      |
| MIR3675      | NULL            | MB-6185    | -1     |
| MIR3675      | NULL            | MB-6187    | 0      |
| MIR3675      | NULL            | MB-6188    | 0      |
| MIR3675      | NULL            | MB-6189    | 0      |
| MIR3675      | NULL            | MB-6190    | 0      |
+--------------+-----------------+------------+--------+--+

Last edited by Elena Duranova, 2018-10-24 22:03:29

 Feedback