On this episode of This Is My Architecture, Mary Goldman, Design and Outreach Engineer at the UC Santa Cruz Genomics Institute, explains how they process genomic sequencing data on AWS. With a need to crunch data measured in petabytes, they designed a low-cost solution using a combination of Docker containers and EC2 Spot instances. TOIL, the pipeline management system they built, is open source (link: https://github.com/BD2KGenomics/toil) and was recently published (link: http://dx.doi.org/10.1038/nbt.3772) in Nature Biotechnology.
Learn more about This Is My Architecture at http://amzn.to/2qfaOQc.
Publisher: Amazon Web Services