Content Deep Dive

Parent-child joins and tensors for content recommendation

Blog post from Vespa

Post Details
Company: Vespa
Author: Aaron Nagao
Word Count: 1,190
Language: English
Summary

Verizon Media uses Vespa, an open-source big data serving engine, for content recommendation, and its setup shows how parent-child joins and tensor functions can model topic popularity. Each time a user visits Yahoo.com, the system selects the best news articles from a large pool of candidates. A key ranking feature is the click-through rate (CTR) of an article's topics: it helps with the cold-start problem, because a new article with no click history can still be scored by the CTRs of its topics, and it turns categorical topics into a numerical feature. Vespa stores these CTRs in a single global document that every article references, which keeps updates simple and minimizes data duplication. At ranking time, Vespa joins each article with the global document in real time and uses its Tensor API to compute features such as the average and maximum topic CTR, which then feed machine-learned ranking models. Because the global document is co-located on the content nodes, the join reduces network load and system complexity, and Vespa ranks 10,000 articles in 17.5 milliseconds, which shows that it can serve large-scale content recommendations in real time.
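
The average and maximum topic CTR features can be illustrated with a minimal Python sketch. This is plain Python, not Vespa's Tensor API or schema language, and every name and number in it is hypothetical: `GLOBAL_TOPIC_CTR` stands in for the single shared global document, `topic_ctr_features` for the per-article join, and the CTR values are made up for illustration. Returning zeros for an article with no known topics is also an assumption, not something the post specifies.

```python
from typing import Dict, List, Tuple

# Hypothetical topic CTRs, standing in for the one shared "global document".
# The values are invented for illustration only.
GLOBAL_TOPIC_CTR: Dict[str, float] = {
    "sports": 0.25,
    "finance": 0.5,
    "politics": 0.125,
}


def topic_ctr_features(
    article_topics: List[str], topic_ctr: Dict[str, float]
) -> Tuple[float, float]:
    """Return (average CTR, maximum CTR) over the article's known topics."""
    # "Join" the article with the shared table: look up each topic's CTR,
    # skipping topics that have no CTR recorded yet.
    ctrs = [topic_ctr[t] for t in article_topics if t in topic_ctr]
    if not ctrs:
        # Assumed fallback: no known topics means no popularity signal.
        return 0.0, 0.0
    return sum(ctrs) / len(ctrs), max(ctrs)


# An article stores only its topic names, never the CTR values themselves.
avg_ctr, max_ctr = topic_ctr_features(["sports", "finance"], GLOBAL_TOPIC_CTR)
```

Because articles hold only topic names, changing one entry in `GLOBAL_TOPIC_CTR` changes the features of every article that references that topic without rewriting any article, which is the motivation the summary gives for keeping the CTRs in one global document.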