nestorD on May 27, 2019 | on: The multi-armed bandit problem (2012)

And, for not much more effort (computing the variance of your samples), you can use UCB1-tuned [0], which gets rid of the 'c' parameter and tends to perform even better. I personally think that it should replace UCB1 as the baseline when trying bandit algorithms.

[0]: https://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf
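For anyone who wants to try it, here is a rough sketch of the UCB1-tuned index as I understand it from the linked paper: the exploration bonus is scaled by min(1/4, empirical variance + sqrt(2 ln n / n_j)) instead of a hand-picked constant. The Bernoulli arms, the function names (`ucb1_tuned_index`, `run_ucb1_tuned`) and the seed are assumptions made up for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def ucb1_tuned_index(mean, mean_sq, pulls, total_pulls):
    """UCB1-tuned index for one arm; rewards are assumed to lie in [0, 1]."""
    log_n = math.log(total_pulls)
    # Empirical variance plus a confidence term for the variance estimate.
    variance_bound = mean_sq - mean ** 2 + math.sqrt(2.0 * log_n / pulls)
    # Guard against tiny negative values caused by floating-point rounding.
    variance_bound = max(0.0, variance_bound)
    # 1/4 is the largest variance a [0, 1]-valued reward can have.
    return mean + math.sqrt((log_n / pulls) * min(0.25, variance_bound))


def run_ucb1_tuned(arm_probs, horizon, seed=0):
    """Play hypothetical Bernoulli arms and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k
    sums = [0.0] * k      # running sum of rewards per arm
    sq_sums = [0.0] * k   # running sum of squared rewards per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            arm = max(
                range(k),
                key=lambda j: ucb1_tuned_index(
                    sums[j] / pulls[j], sq_sums[j] / pulls[j], pulls[j], t - 1
                ),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return pulls


if __name__ == "__main__":
    print(run_ucb1_tuned([0.1, 0.9], horizon=5000))
```

The only extra state compared to plain UCB1 is the running sum of squared rewards per arm, which is the "not much more effort" part.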

j2kun on June 10, 2019 | [–]

It's funny, I had read that paper a few times while learning about bandits and never noticed their tuned version, which outperforms vanilla UCB1 in all of their tests!

60654 on May 27, 2019 | [–]

That's a nice paper, thanks for posting this!
