As someone who took statistics 30 years ago and promptly forgot most of it, I followed everything except "I want to efficiently draw a name from this distribution". What makes a drawing efficient?
Look, if you want to draw from a unit circle, you could enclose the unit circle inside a square. If the center of the unit circle is the origin, it should be clear from some middle school geometry that such a square has its upper-left corner at (-1, 1) and bottom-right corner at (1, -1). So it's a square of side two. Now you can sample from this square by calling the rand() function twice in any programming language, scaling and translating: something like def foo() { -1 + 2*rand() } will do the trick in C/C++/Scala/Python/whatnot (I've scaled by two and translated by minus one). So you have your two random variables. Pair them up and that's your tuple (x, y).

Now if you make 100 such tuples, not all of them will lie inside the unit circle, so you have to toss out the ones that don't. That means your drawing isn't 100% efficient. How efficient is it? Well, if you toss out say 21 of those 100, your sampler is 79% efficient. Now where the fuck does the 21 come from? Some high school geometry: the unit circle has area pi and the enclosing square has area 4, so pi/4 is approximately 79%, and 100 - 79 is 21.

One can construct more efficient samplers for the unit circle by not being so foolish. We should stop enclosing circles in squares and listen to Marsaglia. He died a decade back, but before his death he solved the above problem, among others, so we don't waste 21% of our energy. That said, most programs I've seen in banking, data science, etc. are written by programmers, not statisticians. So they happily use an if statement and reject 21% of the samples, which makes them super-inefficient. Drawing can be efficient if the statistician codes it up. But that fucker wants to use R, so given the choice between some diehard R fucker and a Python programmer who can mess around with Kubernetes and Terraform in their spare time, the hapless manager will pick the Python programmer every time, and that's what makes the drawing inefficient. /s tag, but not really.
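For what it's worth, the rejection scheme above is a few lines of Python; the function names here are just illustrative, and the acceptance-rate check is a sketch of the pi/4 argument rather than anything rigorous:

```python
import random

def sample_unit_square():
    """Draw a point uniformly from the square [-1, 1] x [-1, 1]."""
    return (-1 + 2 * random.random(), -1 + 2 * random.random())

def sample_unit_disc_rejection():
    """Rejection sampling: draw from the square, retry until inside the disc."""
    while True:
        x, y = sample_unit_square()
        if x * x + y * y <= 1:
            return (x, y)

def acceptance_rate(trials=200_000):
    """Empirical fraction of square samples landing inside the disc (~ pi/4)."""
    hits = sum(1 for _ in range(trials)
               if sum(c * c for c in sample_unit_square()) <= 1)
    return hits / trials
```

Run acceptance_rate() and you'll see roughly 0.785, i.e. you burn about 21% of your draws in that while loop.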
Just speaking from bitter personal experience :)
I have to agree, we could save the world a lot of wasted energy if there were a way to get statisticians off of R/matlab and into more 'portable' spaces.
If you do this you won't get a uniform distribution on the circle: the points will be densest at the center and get less dense as you go toward the edge. To make the points uniform you need to use inverse transform sampling[0], which gives the formula r*sqrt(rand()) for the radial polar coordinate, where r is the radius of the circle and rand() returns a uniform random number from the interval 0 to 1.
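A minimal sketch of that inverse transform in Python (the function name is made up; the only substance is the sqrt on the radius and the uniform angle):

```python
import math
import random

def sample_disc_polar(r=1.0):
    """Uniform point in a disc of radius r via inverse transform sampling.

    The CDF of the radius of a uniform point in the disc is F(s) = (s/r)^2,
    so inverting it gives s = r * sqrt(u) for u ~ Uniform(0, 1).
    """
    s = r * math.sqrt(random.random())      # radius: note the sqrt
    theta = 2 * math.pi * random.random()   # angle: plain uniform
    return (s * math.cos(theta), s * math.sin(theta))
```

No rejection loop, so every call to the underlying rand() is used.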
Yes, though you have to take the square root of the sampled radius for the resulting distribution to be uniform on the unit circle. (The donut with r > 0.5 has three times the area of the disc with r < 0.5, but the naive implementation would sample from each of those regions with probability 0.5.)
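You can see the donut-vs-inner-disc point empirically; this is just a sketch, with made-up helper names, comparing the naive radius to the sqrt-corrected one:

```python
import math
import random

def radius_naive():
    """Naive: sample the radius directly, which over-represents the center."""
    return random.random()

def radius_correct():
    """sqrt-corrected: radius density grows linearly, matching the geometry."""
    return math.sqrt(random.random())

def fraction_inside_half(radius_fn, trials=100_000):
    """Fraction of samples with r < 0.5.

    Uniform on the disc should give the area ratio (0.5)^2 = 0.25;
    the naive radius gives 0.5 instead.
    """
    return sum(radius_fn() < 0.5 for _ in range(trials)) / trials
```

fraction_inside_half(radius_correct) comes out near 0.25, while the naive version lands near 0.5.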
It’s still a useful illustration, though, since MCMC samplers used in practice do end up throwing away lots of the sampled points based on predefined acceptance criteria.
If you want to select an integer uniformly at random from 0..n-1, you need about log2(n) mutually independent random bits in expectation. What if you don't want it to be uniformly random, but some other distribution instead? That's where Markov chains help; they use random bits efficiently to draw from an interesting distribution.
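As a rough sketch of the uniform-integer case (the function name is illustrative): read ceil(log2 n) bits to form a candidate, and reject candidates that land outside 0..n-1. The expected number of bits consumed stays O(log n) because at least half of the candidates are accepted.

```python
import random

def uniform_int(n, randbit=lambda: random.getrandbits(1)):
    """Draw an integer uniformly from 0..n-1 using independent random bits.

    Reads k = ceil(log2 n) bits to form a candidate in 0..2^k - 1 and
    rejects candidates >= n; since 2^k < 2n, the acceptance probability
    exceeds 1/2, so the expected bit usage is O(log n).
    """
    k = (n - 1).bit_length()  # ceil(log2 n) for n >= 2; 0 bits when n == 1
    while True:
        candidate = 0
        for _ in range(k):
            candidate = (candidate << 1) | randbit()
        if candidate < n:
            return candidate
```

For n a power of two there is no rejection at all; the worst case is n just above a power of two, where nearly half the candidates get thrown away.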