AWS re:Invent 2017: Tensors for Large-scale Topic Modeling and Deep Learning (MCL337)

Tensors are higher-order extensions of matrices that can incorporate multiple modalities and encode higher-order relationships in data. This session presents recently developed tensor algorithms for topic modeling and deep learning with vastly improved performance over existing methods.

Topic models enable automated categorization of large document corpora without requiring labeled data for training. They go beyond simple clustering because they allow a document to have multiple topics. Tensor methods provide a fast training procedure with guarantees for these models, and they incorporate co-occurrence statistics of triplets of words in documents. We are releasing a fast and robust implementation that vastly outperforms existing solutions, with significantly faster training times and better topic quality. Moreover, training and inference are decoupled in our algorithm, so users can select the part relevant to their requirements. We will present benchmarks across multiple datasets of different sizes and AWS instance types, and provide notebook examples.
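
To make "co-occurrence statistics of triplets of words" concrete, here is a minimal sketch in Python. It is an illustration only, not the implementation announced in the session: the function name triple_cooccurrence, the toy corpus, and the choice to store the tensor as a dictionary keyed by word triples are all assumptions made for this example. For each document it counts every ordered triple of words taken from three distinct positions, normalizes within the document, and averages across documents.

```python
from collections import defaultdict
from itertools import permutations


def triple_cooccurrence(docs):
    """Average, over documents, the empirical distribution of ordered
    word triples drawn from three distinct positions in a document.

    Hypothetical helper for illustration; returns a sparse tensor as a
    dict mapping (word1, word2, word3) -> weight.
    """
    # Documents with fewer than three words contain no triples.
    usable = [doc for doc in docs if len(doc) >= 3]
    if not usable:
        raise ValueError("need at least one document with three or more words")

    tensor = defaultdict(float)
    for doc in usable:
        n = len(doc)
        # n * (n - 1) * (n - 2) ordered triples of distinct positions per
        # document; dividing by len(usable) averages across documents.
        weight = 1.0 / (n * (n - 1) * (n - 2) * len(usable))
        # permutations() treats repeated words at different positions as
        # distinct items, which is what we want here.
        for triple in permutations(doc, 3):
            tensor[triple] += weight
    return dict(tensor)


# Toy corpus (made up for this example).
docs = [
    ["tensor", "topic", "model", "topic"],
    ["deep", "learning", "tensor"],
    ["topic", "model"],  # too short to contain a triple, so it is skipped
]
T = triple_cooccurrence(docs)
print(len(T), "distinct word triples; total weight", round(sum(T.values()), 6))
```

The resulting tensor is symmetric under any reordering of its three indices, and its entries sum to one. This brute-force enumeration is cubic in document length and is meant only to show what is being counted; an implementation aimed at large corpora would presumably avoid materializing the full tensor.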

Duration: 57:36
Publisher: Amazon Web Services
You can also watch this video at the source.