BuzzFeed's large social, search, and organic traffic footprint can be attributed to both its content and its content curation strategies. Recirculation on the website is powered by feeds optimized for signals such as click-through rate (CTR), recency, and user preference.
Surprisingly, one avenue of recirculation that had not been explored on the website until recently was identifying and displaying related content. The hypothesis was that surfacing related content would both increase pageviews per session and improve SEO performance.
This talk will detail how word and sentence embedding models were used to vectorize BuzzFeed's content, how the model was validated and brought to production using AWS EMR and NSQ, and the impact the new unit has had on the website.
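As a rough illustration of the general idea of vectorizing content and surfacing related items, the sketch below averages word vectors into a title embedding and ranks other titles by cosine similarity. It is not the model described in the talk: the word vectors, titles, and the function names `embed` and `related` are all hypothetical, and a production system would load a trained embedding model rather than a hand-written table.

```python
import numpy as np

# Toy word vectors with made-up values (an assumption for illustration only);
# a real system would load vectors from a trained embedding model.
WORD_VECTORS = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "kitten": np.array([0.9, 0.1, 0.0]),
    "recipe": np.array([0.0, 1.0, 0.0]),
    "pasta": np.array([0.0, 0.9, 0.1]),
    "quiz": np.array([0.0, 0.0, 1.0]),
}
DIM = 3


def embed(title):
    """Average the vectors of known words in the title, then L2-normalize."""
    vecs = [WORD_VECTORS[w] for w in title.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return np.zeros(DIM)
    mean = np.mean(vecs, axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean


def related(query_idx, titles, k=2):
    """Return the k titles most similar to titles[query_idx] by cosine similarity."""
    matrix = np.stack([embed(t) for t in titles])
    # Rows are unit length, so the dot product equals cosine similarity.
    scores = matrix @ matrix[query_idx]
    scores[query_idx] = -np.inf  # never recommend the item itself
    top = np.argsort(-scores)[:k]
    return [titles[i] for i in top]


# Hypothetical titles, not real BuzzFeed content.
TITLES = ["cat kitten", "kitten quiz", "pasta recipe", "recipe quiz"]
print(related(0, TITLES, k=1))
```

Normalizing each embedding up front means a single matrix-vector product scores every candidate at once; at larger scale, an approximate nearest-neighbor index would typically replace the brute-force scan.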
Carolyn Huangci is a data scientist at BuzzFeed, a media and tech company. She works on the Network Growth team focused on growing content views across BuzzFeed destinations. Prior to BuzzFeed, she worked at Uber. She received her bachelor's degree in Applied Mathematics from the University of California, Berkeley.