change my mind: bayes in practice is just a way to regularize your model. the language of bayes makes it seem principled, but really you could use literally any regularizer and it would work almost just as well. i believe this because ultimately you're always going to minimize the negative log-likelihood anyway (and so the prior just becomes the regularization term).


Counterpoint: regularization is just a way of specifying a Bayesian prior for maximum a posteriori estimation.
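
To make that concrete, here's a minimal sketch of the standard identity (data, sigma2, tau2, and lam are all invented for illustration): the L2-regularized least-squares loss is, up to scaling, the negative log posterior of a linear model with Gaussian noise and a zero-mean Gaussian prior on the weights.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + rng.normal(scale=0.5, size=50)

    sigma2 = 0.25   # known noise variance (assumed)
    tau2 = 1.0      # prior variance on each weight (assumed)

    def neg_log_posterior(w):
        # -log p(y|X,w) - log p(w), dropping w-independent constants
        nll = np.sum((y - X @ w) ** 2) / (2 * sigma2)
        nlp = np.sum(w ** 2) / (2 * tau2)
        return nll + nlp

    def ridge_loss(w, lam):
        return 0.5 * np.sum((y - X @ w) ** 2) + 0.5 * lam * np.sum(w ** 2)

    # with lam = sigma2 / tau2, the ridge loss divided by sigma2 equals the
    # negative log posterior exactly, so both have the same minimizer:
    # the MAP estimate *is* the ridge solution
    lam = sigma2 / tau2
    w = rng.normal(size=3)
    print(neg_log_posterior(w), ridge_loss(w, lam) / sigma2)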


but what value does that perspective add? how do i use it to actually fit a model?


Well, you could draw samples of the parameters via MCMC, or get a distribution over them via a variational approximation, rather than solving an optimization problem for some single maximum-likelihood (or MAP) value of the parameters. That seems much more general (and practically useful). So I think it's the other way around: regularizers are just priors (that you arrived at somehow).
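
E.g., a minimal sketch of getting a whole distribution over a parameter instead of one optimized value (toy coin-flip model, random-walk Metropolis; all numbers invented):

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.random(40) < 0.7          # 40 coin flips, true bias 0.7
    heads, n = data.sum(), data.size

    def log_posterior(p):
        # Bernoulli likelihood + Beta(2, 2) prior, up to a constant
        if not 0 < p < 1:
            return -np.inf
        return (heads * np.log(p) + (n - heads) * np.log(1 - p)
                + np.log(p) + np.log(1 - p))

    # random-walk Metropolis: accept/reject proposed moves
    samples, p = [], 0.5
    for _ in range(20000):
        prop = p + rng.normal(scale=0.05)
        if np.log(rng.random()) < log_posterior(prop) - log_posterior(p):
            p = prop
        samples.append(p)

    samples = np.array(samples[5000:])   # drop burn-in
    # a posterior mean *and* a credible interval, from the same samples
    print(samples.mean(), np.quantile(samples, [0.025, 0.975]))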


A bottling company is interested in determining the accuracy with which their equipment is filling bottles of water. One answer would be "95% of the bottles contain between 11.9 and 12.1 ounces". A different way of answering the question would be to estimate the actual distribution of water amounts.

The difference here is that knowing a distribution is often more useful than just knowing the mean, the variance, or some confidence intervals. Bayesian methods tend to be useful when you want this sort of information, which is often the case when you are using it for decision making (or something like game theory).
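
For the bottling example, a minimal sketch of getting that whole distribution, assuming known measurement noise and a conjugate normal prior on the mean fill (all numbers invented):

    import numpy as np

    # measured fill amounts in ounces (invented data)
    x = np.array([12.02, 11.97, 12.05, 11.99, 12.01, 11.96, 12.04])

    sigma2 = 0.03**2           # known measurement variance (assumed)
    mu0, tau2 = 12.0, 0.1**2   # prior: mean fill ~ Normal(12.0, 0.1^2)

    # conjugate normal-normal update: posterior over the mean fill is also normal
    post_var = 1.0 / (1.0 / tau2 + len(x) / sigma2)
    post_mean = post_var * (mu0 / tau2 + x.sum() / sigma2)

    print(post_mean, post_var**0.5)   # a full distribution, not just an interval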

Another use case is when you are making decisions that require combining multiple pieces of information that don't neatly fit together. A simple example is cancer screening. A rational decision about the proper threshold requires you to combine (1) the accuracy of your test and (2) the prevalence of the cancer in the population.
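
A quick worked version of that (test characteristics and prevalence invented for illustration):

    sens = 0.90    # P(positive | cancer), assumed
    spec = 0.95    # P(negative | no cancer), assumed
    prev = 0.005   # prevalence in the screened population, assumed

    # Bayes' theorem: P(cancer | positive test)
    p_pos = sens * prev + (1 - spec) * (1 - prev)
    p_cancer_given_pos = sens * prev / p_pos
    print(p_cancer_given_pos)   # ~0.083: most positives are false positives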

I will also add that the formula presented in the article is the simple case with discrete distributions. The more general version of the formula can also handle continuous distributions.
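
For reference (the article isn't quoted here, so this is just the standard continuous form, with an integral replacing the sum in the denominator):

    p(theta | x) = p(x | theta) p(theta) / ∫ p(x | t) p(t) dt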


lol, is this copypasta? i'm quite familiar with all of these toy examples of doing inference instead of point estimation. i'm talking about fitting models, not descriptive statistics (or decision theory).


Commonly a model is being used primarily to make better decisions. Specifically in the context of fitting models, Bayesian methods are really popular for hyperparameter tuning (Bayesian optimization).

I guess my main point is that at least one reason people use Bayesian methods is that they are dealing with problems that are qualitatively different from more prototypical prediction problems.


You can't use just any prior, let alone literally any regularizer, and expect it to work almost as well.

A unit-variance normal prior centered at 0 and one centered at 42 can give very different results.
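
Quick illustration with a conjugate normal model (all numbers invented):

    import numpy as np

    x = np.array([1.0, 2.0, 1.5, 2.5])   # invented data
    sigma2, tau2 = 1.0, 1.0              # likelihood and prior variances (assumed)

    def post_mean(mu0):
        # normal-normal conjugate posterior mean, given prior mean mu0
        v = 1.0 / (1.0 / tau2 + len(x) / sigma2)
        return v * (mu0 / tau2 + x.sum() / sigma2)

    print(post_mean(0.0), post_mean(42.0))   # 1.4 vs 9.8: very different answers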


i said almost - that's code for "obviously i'm not talking about pathological regularizers"


Well, in that case: minimizing the (negative) log-likelihood seems principled, but you could minimize literally any loss function and it would work almost just as well.


Lol agreed!



