Democratizing Dataproc (Cloud Next '19)


dunnhumby uses Dataproc as a data platform on which our data science and product teams run ETL and machine learning routines. We encourage product teams to spin up clusters autonomously, only when they need them, and to use Apache Airflow to coordinate workloads. We share a Hive metastore across those many short-lived clusters and isolate workloads following the principle of least privilege. We provide JupyterLab and other utilities for data engineers and scientists to work with. Come and learn how we do it.
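
As a rough illustration of the pattern described above (not the configuration dunnhumby actually uses), the following sketch builds a cluster config dictionary in the shape of the Dataproc ClusterConfig resource. It points a short-lived cluster at a shared Hive metastore, runs it under a workload-specific service account, and lets it delete itself when idle. The function name, the service account, the metastore URI, and the 30-minute idle timeout are all hypothetical values chosen for the example.

```python
from typing import Any, Dict


def ephemeral_cluster_config(
    service_account: str,
    metastore_uri: str,
    idle_ttl_seconds: int = 1800,
) -> Dict[str, Any]:
    """Build a ClusterConfig-style dict for one short-lived cluster.

    Hypothetical helper: the field names follow the Dataproc API, but the
    values and the overall shape of the platform are assumptions.
    """
    return {
        # Least privilege: each workload runs as its own service account.
        "gce_cluster_config": {
            "service_account": service_account,
        },
        # The "hive:" prefix writes the property into hive-site.xml, so every
        # cluster talks to the same external metastore.
        "software_config": {
            "properties": {
                "hive:hive.metastore.uris": metastore_uri,
            },
        },
        # Short-lived: the cluster is deleted after sitting idle this long.
        "lifecycle_config": {
            "idle_delete_ttl": {"seconds": idle_ttl_seconds},
        },
    }


# Example values only; none of these names come from the talk.
config = ephemeral_cluster_config(
    service_account="team-a-etl@example-project.iam.gserviceaccount.com",
    metastore_uri="thrift://metastore.example.internal:9083",
)
```

A dictionary like this could be handed to whatever creates the cluster, for example a cluster-creation task in an Airflow DAG, so that each product team only supplies its own service account and reuses the shared metastore setting.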

Watch more:
Next ’19 Data Analytics Sessions here → https://bit.ly/Next19DataAnalytics
Next ’19 All Sessions playlist → https://bit.ly/Next19AllSessions

Subscribe to the GCP Channel → https://bit.ly/GCloudPlatform

Speaker(s): Jamie Thomson

Session ID: DA210
Product: Cloud Dataproc


Duration: 40:44
Publisher: Google Cloud