Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
- Suprovat Ghoshal ,
- Aadirupa Saha
AISTATS 2022 |
We introduce the Correlated Preference Bandits problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of \(n\) items through online subsetwise preference feedback. We investigate whether models with a simple correlation structure, e.g., low rank, can result in faster learning rates. While we show that the problem can be impossible to solve for the general `low rank’ choice models, faster learning rates can be attained assuming more structured item correlations. In particular, we introduce a new class of Block-Rank based RUM model, where the best item is shown to be \((\epsilon ,\delta )\)-PAC learnable with only \(O(r{\epsilon }_{-2}log(n/\delta ))\) samples. This improves on the standard sample complexity bound of \(O_~(n{\epsilon }_{-2}log(1/\delta ))\) known for the usual learning algorithms which might not exploit the item-correlations (\(r\ll n\)). We complement the above sample complexity with a matching lower bound (up to logarithmic factors), justifying the tightness of our analysis. Surprisingly, we also show a lower bound of \(\Omega (n{\epsilon }_{-2}log(1/\delta ))\) when the learner is forced to play just duels instead of larger subsetwise queries. Further, we extend the results to a more general `noisy Block-Rank‘ model, which ensures robustness of our techniques. Overall, our results justify the advantage of playing subsetwise queries over pairwise preferences \((k=2)\), we show the latter provably fails to exploit correlation.