Transforming to a Data Business – Dow Jones DNA (Cloud Next '19)


We hope the lessons we learned will spare other engineers restless nights spent tossing and turning over infrastructure considerations when transitioning to a data business. The Dow Jones DNA (Data, News, and Analytics) platform was our answer to evolving customer requests to leverage premium news for text mining, machine learning, and AI solutions.

This talk details the decisions we made as we migrated our 30-year archive of big data to Google Cloud Storage and BigQuery. We will discuss the architecture trade-offs we faced in migrating 50 TB of historic data to the cloud, as well as in supporting our ever-growing corpus, which ingests 1.3 million articles daily. Dow Jones DNA required data migration, ongoing data processing at scale, and responsive query performance. The DNA platform started with a team of two data engineers and grew to five. Managed services were key to making the platform possible with a small team. We will detail how we balanced real-world constraints such as small team size, performance versus cost, and the upper limits of quotas as our data continues to expand.

Build with Google Cloud → https://bit.ly/2KaUXgA

Watch more:
Next ’19 Architecture Sessions here → https://bit.ly/Next19Architecture
Next ’19 All Sessions playlist → https://bit.ly/Next19AllSessions

Subscribe to the GCP Channel → https://bit.ly/GCloudPlatform

Speaker(s): Patricia Walsh, Dylan Roy

Session ID: ARC204


Duration: 23:57
Publisher: Google Cloud

