The GWAS app calculates genome-wide association statistics for selected cohorts and phenotypes and presents the results as Spark SQL databases. It is a database-to-database app: all inputs and outputs are databases. Based on the chosen cohorts and phenotypes, the relevant data points are extracted from the source databases and used to calculate genome-wide association statistics, which are then loaded into target databases that can be queried using Spark SQL.

How to run the GWAS app

Before running the GWAS app, decide on the following:

  • Cohort inputs
    • What type of GWAS analysis are you interested in?
      • Case-control analysis? Have you created your case and control cohorts? Make sure the source database has the complete genomic data that you want to include in the GWAS analysis.
      • Categorical trait analysis? You can treat this like a case-control analysis, where the case cohort consists of the samples with the categorical trait of interest.
      • Quantitative trait analysis? Have you created your cohort of interest? A quantitative trait analysis does not use a control cohort.
  • Phenotype options
    • If you are interested in a quantitative trait analysis, then what is your phenotype of interest? Make sure the source database has the complete phenotypic data that you want to include in the GWAS analysis.
  • Target database
  • Advanced options
    • PLINK options

How to run the GWAS app

Inputs:

  • cohort -> record -- cohort record containing the genotypes and phenotypes to analyze.
  • control_cohort -> record [optional] -- cohort record acting as the control.
  • plink_snp_only -> boolean, default false -- if true, only convert SNPs to PLINK format, skipping indels and complex variants.
  • out_db_name -> string -- GWAS results database name, e.g. "assoc_vitamind_1"

Other Options:

  • pheno_key -> string -- ID or unique name of a phenotype in the cohort's database, e.g. "X100021.Vitamin.D"
  • pheno_value -> string -- phenotype value to treat as 'case'; all other values are treated as 'control'.
  • plink_sel_args -> string -- PLINK marker selection arguments.


Outputs:

  • output_files -> array:file -- extracted genotypes, markers, and phenotypes in PLINK binary format.
  • gwas_database -> string -- GWAS association test results database.
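
Because this page does not list the table names inside the results database, a reasonable first step after a run is to list them. The sketch below only prints the Spark SQL statement for a given database name. The helper name gwas_list_tables_sql is hypothetical, and the commented spark-sql call assumes a shell where the Spark SQL command-line client is installed and connected to the metastore that holds the GWAS databases, which may not match your environment.

```shell
# Hypothetical helper: print a Spark SQL statement that lists the tables
# in a GWAS results database (the value passed as out_db_name).
gwas_list_tables_sql() {
  printf 'SHOW TABLES IN %s;\n' "$1"
}

# Preview the statement for the example database name used on this page.
gwas_list_tables_sql assoc_vitamind_1

# Assumption: a spark-sql client is available and configured. If so, the
# statement could be executed with:
#   spark-sql -e "$(gwas_list_tables_sql assoc_vitamind_1)"
```

Once the table names are known, they can be queried with ordinary Spark SQL SELECT statements.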

Basic Run

dx run tis-gwas \
  -i cohort=record-xxxx \
  -i control_cohort=record-yyyy \
  -i out_db_name="assoc_vitamind_1"
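
Phenotype-based Run (sketch)

The basic run above uses separate case and control cohorts. For an analysis driven by a phenotype in a single cohort, the pheno_key and pheno_value options listed under Other Options can be passed in the same way. The sketch below assembles such a command and prints it for review. The function name gwas_command and the GWAS_EXECUTE switch are hypothetical, the record ID is a placeholder, and it is an assumption that omitting pheno_value gives a quantitative trait analysis; the value "Deficient" is purely illustrative.

```shell
# Hypothetical helper: build a `dx run tis-gwas` command from the inputs
# documented on this page. Prints the command by default; set
# GWAS_EXECUTE=1 to run it instead (requires the dx command-line client).
gwas_command() {
  cohort="$1"            # cohort record, e.g. record-xxxx (placeholder)
  out_db="$2"            # target database name (out_db_name)
  pheno_key="$3"         # phenotype ID or unique name (pheno_key)
  pheno_value="${4:-}"   # optional value to treat as 'case' (pheno_value)

  # Reuse the function's positional parameters to hold the command words.
  set -- dx run tis-gwas \
    -i "cohort=${cohort}" \
    -i "out_db_name=${out_db}" \
    -i "pheno_key=${pheno_key}"

  # Only add pheno_value when one was given.
  if [ -n "${pheno_value}" ]; then
    set -- "$@" -i "pheno_value=${pheno_value}"
  fi

  if [ "${GWAS_EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Phenotype key only (assumed here to mean a quantitative trait analysis).
gwas_command record-xxxx assoc_vitamind_1 X100021.Vitamin.D

# Phenotype key plus a case value (illustrative value, not from real data).
gwas_command record-xxxx assoc_vitamind_2 X100021.Vitamin.D Deficient
```

The printed preview does not re-quote values, so a value containing spaces should be checked by hand before copying the command into a terminal.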

Last edited by Peter Murray, 2019-02-07 20:21:42