We hope the lessons we learned will spare other engineers restless nights spent worrying over infrastructure decisions when transitioning to a data business. The Dow Jones DNA (Data, News, and Analytics) platform was our answer to evolving customer requests to leverage premium news for text mining, machine learning, and AI solutions.
This talk details the decisions we made as we migrated our 30-year archive of data to Google Cloud Storage and BigQuery. We will discuss the architecture trade-offs we faced in migrating 50 TB of historic data to the cloud while supporting an ever-growing corpus that ingests 1.3 million articles daily. Dow Jones DNA required a large data migration, ongoing data processing at scale, and fast query responses. The DNA platform started with a team of two data engineers and grew to five. Managed services were key to making the platform possible with a small team. We will detail how we balanced real-world constraints such as small team size, performance versus cost, and the upper limits of quotas as our data keeps expanding.
Build with Google Cloud → https://bit.ly/2KaUXgA
Next ’19 Architecture Sessions here → https://bit.ly/Next19Architecture
Next ’19 All Sessions playlist → https://bit.ly/Next19AllSessions
Subscribe to the GCP Channel → https://bit.ly/GCloudPlatform
Speaker(s): Patricia Walsh, Dylan Roy
Session ID: ARC204
Publisher: Google Cloud