Analyzing Genomic Data for Whole Populations: How AWS Enables Analysis of Large Cohorts

It is no surprise that rapid increases in genomic sequencing capacity, coupled with a lower price per gigabase, have led biomedical research and clinical institutions to seek to sequence their entire patient populations. Current algorithms and tools have fallen short of meeting this demand, often requiring ad hoc parallelization techniques that can analyze only tens of samples at a time. To scale to the needed cohort sizes of hundreds to thousands of genomic samples, new methods must be developed that are scalable, highly parallel, accurate, and reproducible. AWS offers a unique set of scalable services that can enable the development of such tools.

In this presentation, Peter White from Nationwide Children’s Hospital talks about the development and optimization of the Churchill sequence analysis pipeline, which his team used to analyze more than 2,500 whole genomes and exomes from the 1000 Genomes Project in just seven days.

You will also hear from Abhi Nellore of Johns Hopkins University about recent progress in developing software tools (Myrna and Rail) on top of Amazon Elastic MapReduce (EMR) and EMRFS. These tools can analyze many samples' worth of mRNA sequencing data at a time, applying a uniform analysis method across all samples.
Angel Pizarro
Technical Business Development Manager, Amazon Web Services

Peter White
Director of the Biomedical Genomics Core and Molecular Bioinformatics, Assistant Professor of Pediatrics, The Research Institute at Nationwide Children’s Hospital and The Ohio State University

Abhinav Nellore
Johns Hopkins University

Duration: 37:57
Publisher: Amazon Web Services
You can also watch this video at the source.