dunnhumby uses Dataproc as a data platform on which our data science and product teams run ETL and machine learning routines. We encourage product teams to spin up clusters autonomously, only when they need them, and to use Apache Airflow to coordinate workloads. We share a Hive metastore across those many short-lived clusters and isolate workloads following the principle of least privilege. We provide JupyterLab and other utilities for data engineers and scientists to work with. Come and learn how we do it.
Next ’19 Data Analytics Sessions here → https://bit.ly/Next19DataAnalytics
Next ’19 All Sessions playlist → https://bit.ly/Next19AllSessions
Subscribe to the GCP Channel → https://bit.ly/GCloudPlatform
Speaker(s): Jamie Thomson
Session ID: DA210
Publisher: Google Cloud