The intuitive explanation is that you really are writing down a model of the form

  result = a + b*treatment + c*gender
Let's ignore the interaction term for the sake of simplicity. It doesn't change anything in what follows. The two operations I think about for interpreting this are:

1. What equation results when I fix a particular value of a factor? This is equivalent to taking a subset of the population I'm studying.

2. What equation results when I average over a factor? This is equivalent to removing an independent variable (a predictor) from my model.

If I remove the predictor gender and look at the equation when I fix the two values of treatment, what do I want my model to look like? I think that

  result = a
for no treatment and

  result = a + b
for treatment is the easiest to interpret. Then a is the baseline without intervention, and b is the effect of the treatment. To get those, I make no treatment = 0 and treatment = 1. If I made no treatment = -0.5 and treatment = 0.5, I would get

  result = a - 0.5*b
for no treatment and

  result = a + 0.5*b
for treatment. I've messed up my interpretation of the parameters. But what contrast do I need for gender to keep that clean interpretation? If I use male = 0 and female = 1, then when I average over the whole population (assuming that it's balanced male/female), I get

  result = a + 0.5*c
for the baseline of no treatment, averaged across the population and

  result = a + b + 0.5*c
for treatment. a and c are now mixed, which muddies what I can read off the parameters. If I want the baseline a to be an average over gender, then I need my levels for gender to sum to zero. But what happens if I use, as you suggest, -643 and 643 for male and female? Let's fix treatment to zero and look at the equation for the result for male and for female. For male:

  result = a - 643*c
and for female:

  result = a + 643*c
All we've ended up doing is rescaling the parameter: the difference between female and male response is now 1286*c, so c is that difference divided by 1286. It would be a lot easier to use if the parameter measured the difference between male and female directly, so let's set it up so that the difference between the two levels is 1, that is, use -0.5 and 0.5.
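
To make the effect of the three gender codings concrete, here is a minimal numerical sketch. The four cell means are made up for illustration (a balanced 2x2 design with no noise and no interaction), and the helper name fit is hypothetical. It fits result = a + b*treatment + c*gender by ordinary least squares under each coding.

```python
import numpy as np

# Hypothetical noise-free cell means for a balanced 2x2 design (made up for illustration).
# Rows: (no treatment, male), (treatment, male), (no treatment, female), (treatment, female)
treatment = np.array([0.0, 1.0, 0.0, 1.0])
is_female = np.array([0, 0, 1, 1])
result = np.array([10.0, 12.0, 14.0, 16.0])

def fit(male_level, female_level):
    """Least-squares fit of result = a + b*treatment + c*gender for a given gender coding."""
    gender = np.where(is_female == 1, female_level, male_level)
    X = np.column_stack([np.ones_like(treatment), treatment, gender])
    coef, *_ = np.linalg.lstsq(X, result, rcond=None)
    return coef  # array([a, b, c])

a01, b01, c01 = fit(0.0, 1.0)            # male = 0, female = 1
ah, bh, ch = fit(-0.5, 0.5)              # levels sum to zero, difference of 1
a643, b643, c643 = fit(-643.0, 643.0)    # levels sum to zero, difference of 1286

print("0/1 coding:     ", a01, b01, c01)
print("-0.5/0.5 coding:", ah, bh, ch)
print("-643/643 coding:", a643, b643, c643)
```

With these made-up numbers, a is the male baseline (10) under 0/1 coding and the gender-averaged baseline (12) under both sum-to-zero codings. b is the same in all three fits, and c under the -643/643 coding is the female minus male difference divided by 1286.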


>"Let's ignore the interaction term for the sake of simplicity. It doesn't change anything in what follows."

If you don't assume the interaction coefficient = 0, it changes the estimates, so how can you just ignore it? Actually, not assuming zero interaction changes everything: https://en.wikipedia.org/wiki/Principle_of_marginality


For the purpose of getting intuition about setting levels, it doesn't change anything. Obviously for what you actually fit, it matters.
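
As a rough illustration of how the fitted numbers do change once an interaction is included, here is a sketch with made-up cell means in which the treatment effect differs by gender (2 for males, 6 for females). The model result = a + b*treatment + c*gender + d*treatment*gender and the helper name fit_with_interaction are assumptions for this example, not something from the thread.

```python
import numpy as np

# Hypothetical noise-free cell means (made up): treatment effect is 2 for males, 6 for females.
treatment = np.array([0.0, 1.0, 0.0, 1.0])
is_female = np.array([0, 0, 1, 1])
result = np.array([10.0, 12.0, 14.0, 20.0])

def fit_with_interaction(male_level, female_level):
    """Fit result = a + b*treatment + c*gender + d*treatment*gender by least squares."""
    gender = np.where(is_female == 1, female_level, male_level)
    X = np.column_stack([np.ones_like(treatment), treatment, gender, treatment * gender])
    coef, *_ = np.linalg.lstsq(X, result, rcond=None)
    return coef  # array([a, b, c, d])

coef_01 = fit_with_interaction(0.0, 1.0)      # male = 0, female = 1
coef_half = fit_with_interaction(-0.5, 0.5)   # levels sum to zero

print("0/1 coding:     ", coef_01)
print("-0.5/0.5 coding:", coef_half)
```

Under 0/1 coding, b comes out as the treatment effect for the gender = 0 group (2 here). Under -0.5/0.5 coding, b is the treatment effect averaged over the two genders (4 here). The same reasoning about what each coding makes a and b mean carries over, but the values you read off depend on the coding once d is not zero.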



