Company
Date Published
Author
Giulia Carella
Word count
1617
Language
English
Hacker News points
None

Summary

The article describes building a recommendation system for site planning, which involves combining data from different sources to derive insights at specific locations and comparing geographic locations to find those with similar characteristics. The authors developed a method that computes, for each target location, a similarity score with respect to an existing site, taking into account the various data points measured at that location, including traditional sources like the census and modern sources like mobile financial transactions and point of interest (POI) data. The score is based on the distance between locations across the variables of interest, computed on a lower-dimensional representation of the data. To handle missing data, the authors used probabilistic principal component analysis (PPCA), fitted with an expectation-maximization (EM) algorithm that alternates between computing the expected complete-data log-likelihood and updating the maximum-likelihood estimates of the parameters. This approach can be faster than traditional PCA because it does not require computing the eigen-decomposition of the data covariance matrix. The target locations with the smallest distances to the existing site are then selected as the best candidates for opening or relocating offices, taking into account the uncertainty in the distance computed for each target location.
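The pipeline summarized above (fit PPCA by EM on data with gaps, then rank locations by distance to a reference site) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `e_step`, `fit_ppca`, and `rank_candidates` are hypothetical, the data are synthetic, the variable means are held fixed at the observed column means as a simplification, and plain Euclidean distance in the latent space stands in for whatever distance the article actually uses. The posterior covariances `C` are returned but not propagated into the ranking, so the uncertainty step is left out.

```python
import numpy as np

def e_step(Xc, obs, W, sigma2):
    """Posterior mean and covariance of each site's latent vector,
    using only the variables observed at that site."""
    n, q = Xc.shape[0], W.shape[1]
    Z, C = np.zeros((n, q)), np.zeros((n, q, q))
    for i in range(n):
        Wo = W[obs[i]]  # loadings of the variables observed at site i
        Minv = np.linalg.inv(Wo.T @ Wo + sigma2 * np.eye(q))
        Z[i] = Minv @ Wo.T @ Xc[i, obs[i]]
        C[i] = sigma2 * Minv
    return Z, C

def fit_ppca(X, q, n_iter=100, seed=0):
    """EM for PPCA on X (sites x variables) with NaN marking missing values.
    No eigen-decomposition of the covariance matrix is computed."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(X)
    mu = np.nanmean(X, axis=0)  # assumption: mean fixed at observed column means
    Xc = np.where(obs, X - mu, 0.0)
    W, sigma2 = rng.normal(size=(X.shape[1], q)), 1.0
    for _ in range(n_iter):
        Z, C = e_step(Xc, obs, W, sigma2)
        S = C + Z[:, :, None] * Z[:, None, :]  # E[z z^T] for every site
        # M-step: update each variable's loadings from the sites where it is observed
        for j in range(X.shape[1]):
            rows = obs[:, j]
            W[j] = np.linalg.solve(S[rows].sum(axis=0), Z[rows].T @ Xc[rows, j])
        # M-step: noise variance from observed residuals plus posterior spread
        resid = np.where(obs, Xc - Z @ W.T, 0.0)
        extra = np.where(obs, np.einsum("jk,ikl,jl->ij", W, C, W), 0.0)
        sigma2 = float((resid ** 2 + extra).sum() / obs.sum())
    Z, C = e_step(Xc, obs, W, sigma2)  # latent estimates under the final parameters
    return Z, C, W, sigma2

def rank_candidates(Z, ref):
    """Order all other sites by Euclidean distance to the reference site in latent space."""
    dist = np.linalg.norm(Z - Z[ref], axis=1)
    order = np.argsort(dist)
    return order[order != ref], dist

# Synthetic example: 60 sites, 6 variables driven by 2 latent factors
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(60, 6))
X[1] = X[0]                        # site 1 is an exact twin of the reference site 0
mask = rng.random(X.shape) < 0.15  # drop roughly 15% of the measurements
mask[:2] = False                   # keep the reference and its twin fully observed
X[mask] = np.nan

Z, C, W, sigma2 = fit_ppca(X, q=2)
ranking, dist = rank_candidates(Z, ref=0)
print("closest candidates to site 0:", ranking[:5])
```

Because each site's latent vector is estimated only from the variables observed there, sites with gaps still receive a position in the latent space and can be ranked alongside fully observed ones.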