Big Data Discount: How UC Santa Cruz Uses Mesos & Amazon EC2 Spot to Enable Low-Cost Cancer Research

On this episode of This is My Architecture, Mary Goldman, Design and Outreach Engineer at the UC Santa Cruz Genomics Institute, explains how the institute processes genomic sequencing data on AWS. Because they need to crunch data measured in petabytes, they designed a low-cost solution that combines Docker containers with Amazon EC2 Spot Instances. TOIL, the pipeline management system they built, is open source and was recently published in Nature Biotechnology.

Duration: 6:20
Publisher: Amazon Web Services
