AWS re:Invent 2014 | (BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS & On-demand Clusters

Bit Ninja


Not only did the 156,000+ core run (nicknamed the MegaRun) on Amazon EC2 break industry records for size, scale, and power, but it also delivered real-world results.

The University of Southern California ran the high-performance computing job in the cloud to evaluate over 220,000 compounds and build a better organic solar cell. In this session, USC provides an update on the six promising compounds that it has found and is now synthesizing in laboratories for a clean energy project. We discuss the implementation of, and lessons learned from, running a cluster in eight AWS regions worldwide, with highlights from Cycle Computing’s project Jupiter, a low-overhead cloud scheduler and workload manager. This session also looks at how the MegaRun was financially achievable using the Amazon EC2 Spot Instance market, including an in-depth discussion on leveraging Spot Instances, a strategy to deal with the variability of Spot pricing, and a template to avoid compromising workflow integrity, security, or management.

After a year of production workloads on AWS, HGST, a Western Digital Company, has zeroed in on understanding how to create on-demand clusters to maximize value on AWS. HGST will outline its successes in adapting its operations, culture, and behavior to this new vision of on-demand clusters. In addition, the session will provide insights into leveraging Amazon EC2 Spot Instances to reduce costs and maximize value while maintaining the flexibility and agility that AWS is known for.


Duration: 40:58
Publisher: Amazon Web Services