> 3% more/less likely is a tiny effect. What is the power of this result? Isn't there always noise in this type of thing?
A result does not have "power". An experiment has power -- the ability to detect a given effect size a certain percentage of the time -- but a result is either statistically significant, or it is not.
As for "noise", statistical significance takes random noise into account. That is the point of the calculation -- it asks if a given result exceeds the threshold of what you'd expect to find at random some percentage of the time. If it does, the result is deemed significant.
A 3% difference could be enormous, or it could be minuscule. We can't say anything based on this information alone, and certainly can't say it's "likely a tiny effect". On a sample of thousands, a 3% difference is big. On a sample of tens, a 3% difference is small.
>On a sample of thousands, a 3% difference is big.
Not really. Only if it is many, many thousands. Assuming a totally random acceptance rate of 1/5:
let a = 0;
let b = 0;
for (const _ of Array(1000)) {
  if (Math.random() > .8) a++;
  if (Math.random() > .8) b++;
}
console.log(`a=${a}, b=${b}, a is ${(a/b - 1)*100}% more likely than b`);
> a=209, b=201, a is 3.9800995024875663% more likely than b
literally the first run. And even in absolute terms, I got this on the third run:
>a=192, b=219, a is -12.328767123287676% more likely than b
That's an absolute difference of 2.7%. Again, 100% random data.
> That's an absolute difference of 2.7%. Again, 100% random data.
I think I get what you're going for here -- you're trying to simulate a coin flip? -- but what you've actually done is made successive draws from a uniform random number generator. The software is designed to return numbers that fall along the interval [0,1) with equal probability. Thresholding the numbers and dividing their counts is not a meaningful transformation; the result is still just a uniformly distributed random number. It's like...the ratio of heads in two identical, unfair coins or something.
If all "random numbers" were uniform like this, then no, we wouldn't expect an X% difference to be any more or less likely based on the magnitude of the underlying sample. But when we're talking about something like a population mean, then the behavior of the errors on estimates is very different indeed, and most estimates cluster around the true (aka population) value:
As the sample size for an experiment of this sort gets larger, the bell curve of expected errors gets sharper and sharper, and it becomes less and less likely to see errors >= X, for any value X. In the limit of large N, the distribution of sample errors around a known mean approaches a normal distribution:
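This sharpening is easy to see in a quick simulation (my own sketch, not from the thread; p = 0.2 and the two sample sizes are arbitrary choices):

```javascript
// Sketch: estimate the spread (standard deviation) of the sample proportion
// for p = 0.2 at two sample sizes. Theory predicts sqrt(p*(1-p)/n):
// about 0.04 at n = 100 and about 0.004 at n = 10000.
function sampleProportion(n, p) {
  let hits = 0;
  for (let i = 0; i < n; i++) {
    if (Math.random() < p) hits++;
  }
  return hits / n;
}

function stdDev(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

const trials = 2000;
const small = Array.from({ length: trials }, () => sampleProportion(100, 0.2));
const large = Array.from({ length: trials }, () => sampleProportion(10000, 0.2));

console.log(`n=100:   sd ≈ ${stdDev(small).toFixed(4)}`);
console.log(`n=10000: sd ≈ ${stdDev(large).toFixed(4)}`);
```

A hundred-fold increase in sample size shrinks the spread of the estimate roughly ten-fold, which is the 1/sqrt(n) behavior the bell-curve picture describes.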
For what it's worth, the expected proportion of N heads in M coin flips is modeled using the binomial distribution, which is also bell-shaped and illustrates the same idea:
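For concreteness, here's a sketch of that PMF (mine, with the n-choose-k factor folded into the loop so intermediate values stay in floating-point range at n = 1000):

```javascript
// Sketch: binomial probability mass function, P(k successes in n trials with
// success probability p). The multiplications are interleaved rather than
// computing n-choose-k up front, to avoid overflow for large n.
function binomialPmf(k, n, p) {
  let prob = Math.pow(1 - p, n - k);
  for (let i = 1; i <= k; i++) {
    prob *= ((n - k + i) / i) * p;
  }
  return prob;
}

// For 1000 flips at p = 0.2, the mass concentrates near k = n*p = 200
// and falls off in a bell shape around it:
console.log(binomialPmf(200, 1000, 0.2)); // near the mean: largest
console.log(binomialPmf(230, 1000, 0.2)); // 30 away: much smaller
```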
> I think I get what you're going for here -- you're trying to simulate a coin flip? -- but what you've actually done is made successive draws from a uniform random number generator. The software is designed to return numbers that fall along the interval [0,1) with equal probability. Thresholding the numbers and dividing their counts is not a meaningful transformation;
This is wrong. That is a very meaningful transformation. It is the standard way (https://stats.stackexchange.com/questions/240338) to turn a uniform distribution into a Bernoulli distribution.
Getting a single value with Bernoulli distribution is called a Bernoulli trial (https://en.wikipedia.org/wiki/Bernoulli_trial). Repeating this gives you a Binomial distribution (see your own wikipedia link).
Long story short: GP's code is a perfectly valid way of sampling the Bernoulli distribution. It is inefficient because it needs so many random values, but it mimics the actual process happening in real life, which makes it easier to understand than generating a binomial sample from the binomial distribution's CDF.
> This is wrong. That is a very meaningful transformation. It is the standard way (https://stats.stackexchange.com/questions/240338) to turn a uniform distribution into a Bernoulli distribution.
The OP didn't do what was described in the SO post. They did something else -- they calculated the ratio of two binomial random variables, and presented that as a percentage.
Also, no, the SO comment you've cited doesn't describe how to generate a "Bernoulli distribution" (not a thing, btw; it's called a binomial distribution) from a uniform distribution. It tells you how to perform a single Bernoulli trial...but even that isn't what OP did.
This is how you actually do what you're discussing (draw from the Binomial CDF given a uniform RNG, via a table):
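Something like the following (my sketch of the table method, assuming n = 1000 and p = 0.2): precompute the CDF once, then turn each uniform draw into one binomial draw by finding the first table entry that exceeds it.

```javascript
// Sketch: inverse transform sampling for Binomial(n, p) via a CDF table.
// The PMF is updated with the recurrence
//   pmf(k+1) = pmf(k) * (n-k)/(k+1) * p/(1-p)
function binomialCdfTable(n, p) {
  const table = [];
  let pmf = Math.pow(1 - p, n); // P(K = 0)
  let cdf = 0;
  for (let k = 0; k <= n; k++) {
    cdf += pmf;
    table.push(cdf);
    pmf *= ((n - k) / (k + 1)) * (p / (1 - p));
  }
  return table;
}

// One uniform draw in, one binomial draw out.
function sampleBinomial(table) {
  const u = Math.random();
  // Linear scan; a binary search would be faster for large n.
  for (let k = 0; k < table.length; k++) {
    if (u < table[k]) return k;
  }
  return table.length - 1; // guard against floating-point round-off
}

const table = binomialCdfTable(1000, 0.2);
console.log(sampleBinomial(table)); // draws cluster around 200
```

This consumes one uniform number per binomial sample instead of n of them, which is the efficiency point raised elsewhere in the thread.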
> they calculated the ratio of two binomial random variables, and presented that as a percentage.
Ok, now I'm confused. I 100% agree with that statement. I thought your whole point was that OP's code was not a valid way to sample from a binomial distribution?
But then what is your criticism? Are you arguing that the Binomial distribution does not model the original experiment correctly?
OK, if what you were doing there was using "bernoulli distribution" to mean "bernoulli trial", then I stand corrected. But that's different from the binomial distribution, which is the more common thing to discuss, and what I was assuming you were talking about.
> I thought your whole point was that OP's code was not a valid way to sample from a binomial distribution?
The code OP posted was just taking the ratio of two binomial random variables. It's not "sampling from a binomial", except (perhaps) in the sense that each of those random variables was the result of independent coin flips.
We really need to be more precise in our terminology here. "Sampling from a distribution" can mean a lot of things. Based on the sibling comments, it seems like they were trying (?) to sample from the binomial CDF.
Setting this aside, my high-level point was that OP's calculation doesn't have anything to do with error distributions.
> We really need to be more precise in our terminology here. "Sampling from a distribution" can mean a lot of things.
I know it to mean only one thing: generating a value in such a way that, if the process were repeated, the generated values would follow the given distribution. How exactly this is done is irrelevant, as long as the distribution is correct. See also https://en.wikipedia.org/wiki/Pseudo-random_number_sampling.
> [...] it seems like they were trying (?) to sample from the binomial CDF.
This is technically unclear usage of terminology. You cannot actually sample from a CDF. But it is clear that you are referring to Inverse transform sampling (https://en.wikipedia.org/wiki/Inverse_transform_sampling) – where you sample from a Uniform distribution and use that sample to generate a sample from a non-uniform distribution using that distribution's CDF.
> The code OP posted was just taking the ratio of two binomial random variables. It's not "sampling from a binomial", except (perhaps) in the sense that each of those random variables was the result of independent coin flips.
Once again: Since the Binomial distribution is the distribution of a series of independent coin flips, doing a series of independent coin flips is a perfectly valid way of sampling the binomial distribution.
> Based on the sibling comments, it seems like they were trying (?) to sample from the binomial CDF.
As they explain in the sibling comment, they generate two samples from the binomial distribution and compare them to each other the same way the original authors did. What they achieve by this is sampling from the same random variable that the original authors were implicitly sampling from. They then took multiple samples from that variable in order to get a feel for its distribution, to confirm their original point: That 3% is not an uncommonly big value under that distribution.
> Setting this aside, my high-level point was that OP's calculation doesn't have anything to do with error distributions.
So I don't quite know what you mean by "error distribution". I assume you mean the distribution of the random variable that had the value of 3% in the article? If so, then OP's calculation does – as explained – have a lot to do with that distribution. It does not calculate that distribution, but it samples from it, which is a useful way to get a feel for a distribution without having to do any fancy mathematics or research.
That's precisely what this is trying to model, yes. The standard computational way to simulate a binary event with probability p is to call rand() and check whether the result is < p (or > 1-p, which is what I did). Or as you called it, an unfair coin flip.
This model is built on the assumption that, if candidates are actually totally equally likely to be picked (the null hypothesis for the experiment above), any given candidate has a p=.2 chance of being hired (given an arbitrary but reasonable hire-to-interview ratio of 1:5). Which is just a weighted coin flip. This is indeed a binomial distribution, and my point is that results 3% away from the mean (p*M), even at M=1000, are still fairly probable. When comparing two such results, it's almost expected.
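To put a rough number on "fairly probable", here's a sketch (mine, under the same assumptions: p = 0.2, M = 1000 per group, null hypothesis true) that counts how often the relative gap between two simulated groups reaches 3%:

```javascript
// Sketch: under the null, both groups are hired at p = 0.2 out of M = 1000
// candidates. How often does the relative difference between the two hire
// counts reach 3% or more, on purely random data?
function countHires(m, p) {
  let hires = 0;
  for (let i = 0; i < m; i++) {
    if (Math.random() < p) hires++;
  }
  return hires;
}

const runs = 2000;
let atLeast3pct = 0;
for (let r = 0; r < runs; r++) {
  const a = countHires(1000, 0.2);
  const b = countHires(1000, 0.2);
  if (Math.abs(a / b - 1) >= 0.03) atLeast3pct++;
}

// A >=3% relative gap shows up in well over half of the random runs,
// so a 3% gap alone is weak evidence of a real effect at this sample size.
console.log(`${((100 * atLeast3pct) / runs).toFixed(1)}% of runs`);
```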
The part where you did Math.random() > .8 ? 1 : 0 is fine. That's a Bernoulli trial with p=0.2.
The part where you did this in a loop, with two calls per iteration, and then divided the counts and called it a percentage is wrong. It's certainly not a Binomial distribution. It's just the ratio of two binomial random variables.