For example, let's say height is a feature in your model. No matter how big the s...

QuesnayJr · on Jan 2, 2020

If the t-test is on a regression coefficient, then the sampling distribution of the coefficient estimate is approximately Gaussian (for big enough data). It doesn't matter how many modes the original feature has. This is standard asymptotics in hypothesis testing.

eanzenberg · on Jan 2, 2020

No, a t-test makes no assumption that the underlying data is Gaussian. Again, most ML is done on raw data. If the raw feature is bimodal then the raw data is bimodal.

QuesnayJr · on Jan 3, 2020

I can't figure out if you are agreeing or disagreeing with me. If you do non-penalized regression on raw data, then the t statistic will be approximately Gaussian, even if the raw data is bimodal. This follows from the CLT.
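
A rough way to check this claim is by simulation. The sketch below is illustrative only and not from the thread: the function name `simulate_t_stats`, the mixture with modes at -3 and +3, the exponential noise, and the sample sizes are all assumptions chosen for the example. It fits ordinary least squares on a bimodal feature many times and collects the t statistic of the slope, which should look roughly standard normal if the asymptotic argument holds.

```python
import numpy as np


def simulate_t_stats(n_obs=500, n_sims=2000, beta=0.0, seed=0):
    """Collect slope t statistics from repeated OLS fits on a bimodal feature.

    All settings here are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)
    t_stats = np.empty(n_sims)
    for i in range(n_sims):
        # Bimodal feature: equal mixture of two well-separated normals.
        mode = rng.integers(0, 2, size=n_obs)
        x = rng.normal(loc=np.where(mode == 0, -3.0, 3.0), scale=0.5)

        # Skewed, non-Gaussian noise, shifted to have mean zero.
        noise = rng.standard_exponential(n_obs) - 1.0
        y = beta * x + noise

        # Ordinary (non-penalized) least squares with an intercept.
        X = np.column_stack([np.ones(n_obs), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)

        # Classical standard error of the slope.
        resid = y - X @ coef
        sigma2 = resid @ resid / (n_obs - 2)
        cov = sigma2 * np.linalg.inv(X.T @ X)

        # t statistic for the slope, centered at the true value.
        t_stats[i] = (coef[1] - beta) / np.sqrt(cov[1, 1])
    return t_stats


if __name__ == "__main__":
    t = simulate_t_stats()
    print("mean of t:", t.mean())
    print("std of t:", t.std())
    print("share with |t| > 1.96:", np.mean(np.abs(t) > 1.96))
```

If the approximation is good, the mean should be near 0, the standard deviation near 1, and about 5% of the statistics should exceed 1.96 in absolute value, even though neither the feature nor the noise is Gaussian. Shrinking `n_obs` is one way to see where the approximation starts to degrade.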

eanzenberg · on Jan 5, 2020

What is the standard deviation of a bimodal distribution with modes at 0 and inf? Is that a meaningful stat?