
And, for not much more effort (computing the variance of your samples), you can use UCB1-tuned [0] which gets rid of the 'c' parameter and tends to be even better.

I personally think it should replace UCB1 as the baseline when trying out bandit algorithms.

[0]: https://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf
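For anyone who wants to try it, here is a rough Python sketch of the UCB1-tuned index as I understand it from [0]. It assumes rewards in [0, 1]; the function names and the Bernoulli test arms are made up for illustration and are not from the paper.

```python
import math
import random


def ucb1_tuned_index(count, total, total_sq, n):
    """Index of one arm: `count` pulls so far, `n` pulls over all arms.

    Assumes rewards lie in [0, 1] (hence the 1/4 cap on the variance term).
    """
    mean = total / count
    # Sample variance, clamped at 0 to guard against rounding error.
    variance = max(0.0, total_sq / count - mean * mean)
    # Upper bound on the variance, then capped at 1/4.
    v = variance + math.sqrt(2.0 * math.log(n) / count)
    return mean + math.sqrt((math.log(n) / count) * min(0.25, v))


def run_ucb1_tuned(arm_probs, horizon, seed=0):
    """Play hypothetical Bernoulli arms; return the pull count per arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise
        else:
            n = t - 1  # total pulls made so far
            arm = max(
                range(k),
                key=lambda j: ucb1_tuned_index(counts[j], sums[j], sq_sums[j], n),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return counts


if __name__ == "__main__":
    print(run_ucb1_tuned([0.2, 0.5, 0.8], horizon=2000))
```

The only extra state compared with UCB1 is the running sum of squared rewards per arm, and there is no 'c' to tune.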



It's funny, I had read that paper a few times while learning about bandit algorithms, and I never noticed their version, which outperforms vanilla UCB1 in all of their tests!


That's a nice paper, thanks for posting this!



