And, for not much more effort (computing the variance of your samples), you can use UCB1-tuned [0] which gets rid of the 'c' parameter and tends to be even better.
I personally think that it should replace UCB1 as a baseline when trying bandit algorithms.
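For anyone who wants to try it, here is a rough sketch of how I understand the UCB1-tuned index from [0]: the exploration bonus is sqrt((ln t / n_j) * min(1/4, V_j)), where V_j is the arm's sample variance plus sqrt(2 ln t / n_j). The class name, method names, and the `simulate` helper are made up for illustration, and the code assumes rewards lie in [0, 1] (which is where the 1/4 cap comes from). Treat it as a starting point, not a reference implementation.

```python
import math
import random


class UCB1Tuned:
    """Hypothetical minimal UCB1-tuned bandit; assumes rewards in [0, 1]."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # pulls per arm
        self.sums = [0.0] * n_arms      # sum of rewards per arm
        self.sq_sums = [0.0] * n_arms   # sum of squared rewards per arm
        self.total = 0                  # total pulls so far

    def select(self):
        # Play every arm once before using the index.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        log_t = math.log(self.total)
        best_arm, best_index = 0, -math.inf
        for arm, count in enumerate(self.counts):
            mean = self.sums[arm] / count
            # Empirical variance, clamped at 0 to absorb rounding error.
            variance = max(self.sq_sums[arm] / count - mean * mean, 0.0)
            # Upper estimate of the variance used by UCB1-tuned.
            v = variance + math.sqrt(2.0 * log_t / count)
            # 1/4 is the largest possible variance of a [0, 1] reward.
            index = mean + math.sqrt((log_t / count) * min(0.25, v))
            if index > best_index:
                best_arm, best_index = arm, index
        return best_arm

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.sq_sums[arm] += reward * reward
        self.total += 1


def simulate(probs, steps, seed=0):
    """Run the bandit on Bernoulli arms and return the pull counts."""
    rng = random.Random(seed)
    bandit = UCB1Tuned(len(probs))
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts
```

Something like `simulate([0.2, 0.8], 2000)` should show the second arm getting the large majority of pulls. The only extra bookkeeping compared to plain UCB1 is the running sum of squared rewards.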
It's funny, I read that paper a few times while learning about bandits and never noticed their tuned version, which outperforms vanilla UCB1 in all of their tests!
[0]: https://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf