Hearst Corporation monitors trending content on 250+ sites worldwide, providing metrics to editors and promoting cross-platform content sharing. To facilitate this, Hearst built a clickstream analytics platform on AWS that transmits and processes over 30 TB of data a day using AWS resources such as AWS Elastic Beanstalk, Amazon Kinesis, Spark on Amazon EMR, Amazon S3, Amazon Redshift, and Amazon Elasticsearch. In this session, learn how Hearst designed their clickstream analytics application and how you can use the same architecture to build your own and be ready to handle the changing world of clickstream data. Dive into how to do Spark streaming from an Amazon Kinesis stream, use timestamps to cleanse and validate data coming from diverse sources, and see how the system has evolved as data types have changed from HTTP GET to RESTful JSON requests. Finally, see how Hearst’s data scientists interact with and use cleansed data provided by the platform to perform ad hoc analyses, develop home-grown algorithms, and create visualizations and dashboards that support Hearst business stakeholders.
Publisher: Amazon Web Services