Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

nonbel · on March 30, 2018

I don't even think simple linear models are actually explainable. They just seem to be. E.g., try this in R:

  set.seed(12345)
  treatment = c(rep(1, 4), rep(0, 4))
  gender1   = rep(c(1, 0), 4)
  gender2   = rep(c(0, 1), 4)
  result    = rnorm(8)

  summary(lm(result ~ treatment*gender1))
  summary(lm(result ~ treatment*gender2))

Your average user will think the coefficient for treatment tells you something like "the effect of the treatment on the result in this population when controlling for gender". I get a treatment effect of 1.17 in the first case, but -0.38 in the second case, just by switching whether male = 0 and female = 1 or vice versa.
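One way to see where the two numbers come from, as a rough sketch reusing the variables above: with the interaction in the model, the treatment coefficient is the treated-minus-untreated difference in mean result within whichever gender is coded 0. The names `cell_means`, `effect_gender1_is_0`, and `effect_gender2_is_0` are introduced here purely for illustration.

```r
set.seed(12345)
treatment <- c(rep(1, 4), rep(0, 4))
gender1   <- rep(c(1, 0), 4)
gender2   <- rep(c(0, 1), 4)
result    <- rnorm(8)

# Mean result in each cell (rows: treatment 0/1, columns: gender1 0/1)
cell_means <- tapply(result, list(treatment = treatment, gender1 = gender1), mean)

# Treatment effect within the group that each coding scheme labels 0.
# gender2 == 0 is the same set of rows as gender1 == 1.
effect_gender1_is_0 <- cell_means["1", "0"] - cell_means["0", "0"]
effect_gender2_is_0 <- cell_means["1", "1"] - cell_means["0", "1"]

# Treatment coefficients from the two interaction models
coef_1 <- coef(lm(result ~ treatment * gender1))[["treatment"]]
coef_2 <- coef(lm(result ~ treatment * gender2))[["treatment"]]

print(c(coef_1 = coef_1, simple_effect = effect_gender1_is_0))
print(c(coef_2 = coef_2, simple_effect = effect_gender2_is_0))
```

Each printed pair should agree, which suggests the coefficient is a within-group effect for the reference gender rather than a population-wide one.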

madhadron · on March 31, 2018

Of course not. Your contrasts aren't centered on zero. Here's the right way to approach it:

  set.seed(12345)
  treatment = c(rep(0.5, 4), rep(-0.5, 4))
  gender1   = rep(c(0.5, -0.5), 4)
  gender2   = rep(c(-0.5, 0.5), 4)
  result    = rnorm(8)

  summary(lm(result ~ treatment*gender1))
  summary(lm(result ~ treatment*gender2))

Now the two are identical except for the sign of the gender and interaction coefficients.
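A quick check of that comparison, as a sketch rather than anything definitive: fit both codings and line the coefficients up side by side. The `comparison` matrix and its row labels are names made up for this example. Because `gender2` is just `-gender1`, any term that contains gender (the main effect and the treatment:gender interaction) should change sign, while the intercept and treatment coefficient stay put.

```r
set.seed(12345)
treatment <- c(rep(0.5, 4), rep(-0.5, 4))
gender1   <- rep(c(0.5, -0.5), 4)
gender2   <- rep(c(-0.5, 0.5), 4)
result    <- rnorm(8)

# Coefficients come back in the order:
# (Intercept), treatment, gender, treatment:gender
fit1 <- coef(lm(result ~ treatment * gender1))
fit2 <- coef(lm(result ~ treatment * gender2))

# Drop the coding-specific names so the two fits can share row labels
comparison <- cbind(gender1_coding = unname(fit1),
                    gender2_coding = unname(fit2))
rownames(comparison) <- c("intercept", "treatment", "gender", "treatment:gender")
print(comparison)
```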

madhadron · on March 31, 2018

Actually, treatment should stay at 0/1, now that I think about it. You want the intercept of the model to represent the result without treatment. The intercept should be the average of the two genders, though.

nonbel · on March 31, 2018

I'd look at it like this. So now we have:

  set.seed(12345)
  treatment = c(rep(0, 4), rep(1, 4))
  gender1   = rep(c(1, 0), 4)
  gender2   = rep(c(0, 1), 4)
  gender3   = rep(c(.5, -.5), 4)
  result    = rnorm(8)

  summary(lm(result ~ treatment*gender1))
  summary(lm(result ~ treatment*gender2))
  summary(lm(result ~ treatment*gender3))

Each gives a different estimate of the treatment effect and the third also gives a different uncertainty than the others for that effect.
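To put the three side by side, here is a small sketch that pulls just the treatment row out of each summary. `treatment_row` and `rows` are helper names invented for this example. With a balanced design like this one, the centered coding should land on the average of the other two estimates, with a standard error smaller by a factor of sqrt(2), since it compares 4 vs 4 observations instead of 2 vs 2.

```r
set.seed(12345)
treatment <- c(rep(0, 4), rep(1, 4))
gender1   <- rep(c(1, 0), 4)
gender2   <- rep(c(0, 1), 4)
gender3   <- rep(c(.5, -.5), 4)
result    <- rnorm(8)

# Estimate and standard error of the treatment coefficient from a fitted model
treatment_row <- function(fit) {
  coef(summary(fit))["treatment", c("Estimate", "Std. Error")]
}

rows <- rbind(
  gender1 = treatment_row(lm(result ~ treatment * gender1)),
  gender2 = treatment_row(lm(result ~ treatment * gender2)),
  gender3 = treatment_row(lm(result ~ treatment * gender3))
)
print(rows)
```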

nonbel · on March 31, 2018

So what is the intuitive explanation here? You can choose to code it 1/0, 0/1, -.5/.5, or -643/643, and the result you care about is different.

madhadron · on March 31, 2018

The intuitive explanation is that you really are writing down a model of the form

  result = a + b*treatment + c*gender

Let's ignore the interaction term for the sake of simplicity. It doesn't change anything in what follows. The two operations I think about for interpreting this are:

1. What equation results when I fix a particular value of a factor? This is equivalent to taking a subset of the population I'm studying.

2. What equation results when I average over a factor? This is equivalent to removing an independent variable from my model.

If I remove the independent variable gender and look at the equation when I fix the two values of treatment, what do I want my model to look like? I think that

  result = a

for no treatment and

  result = a + b

for treatment is the easiest to interpret. Then a is the baseline without intervention, and b is the effect of the treatment. To get those, I make no treatment = 0 and treatment = 1. If I made no treatment = -0.5 and treatment = 0.5, I would get

  result = a - 0.5*b

for no treatment and

  result = a + 0.5*b

for treatment. I've messed up my interpretation of the parameters. But what contrast do I need for gender to get this? If I use male = 0 and female = 1, then when I average over the whole population (assuming that it's balanced male/female), I get

  result = a + 0.5*c

for the baseline of no treatment, averaged across the population and

  result = a + b + 0.5*c

for treatment. a and c are now mixed, messing up what I can interpret off the parameters. If I want to keep the baseline a as an average over gender, then I need my levels for gender to sum to zero. But what happens if I use, as you suggest, -643 and 643 for male and female? Let's fix treatment to zero and look at the equation for the result for male and for female. For male:

  result = a - 643*c

and for female:

  result = a + 643*c

All we've ended up doing is rescaling the gender parameter: the difference between female and male response is now 1286*c, so c is that difference divided by 1286. That would be a lot easier to use if the parameter measured the difference between male and female directly, so let's set it up so that the difference between the two levels is 1, i.e., use -0.5 and 0.5.
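The algebra above can be checked numerically. This is only a sketch for the additive model (no interaction term, as in this comment), and the variable names `gender_01`, `gender_half`, `gender_643`, `gap_half`, and `gap_643` are made up for the example. It checks two things: the 0/1 coding folds 0.5*c into the intercept, and the -643/643 coding leaves the intercept and treatment coefficient alone while shrinking c.

```r
set.seed(12345)
treatment   <- c(rep(0, 4), rep(1, 4))
gender_01   <- rep(c(0, 1), 4)        # male = 0, female = 1
gender_half <- rep(c(-0.5, 0.5), 4)   # levels sum to zero, one unit apart
gender_643  <- rep(c(-643, 643), 4)   # levels sum to zero, 1286 units apart
result      <- rnorm(8)

fit_01   <- coef(lm(result ~ treatment + gender_01))
fit_half <- coef(lm(result ~ treatment + gender_half))
fit_643  <- coef(lm(result ~ treatment + gender_643))

# With 0/1 coding the gender-averaged baseline is a + 0.5*c, not a
baseline_01 <- fit_01[["(Intercept)"]] + 0.5 * fit_01[["gender_01"]]
print(c(baseline_01 = baseline_01, intercept_half = fit_half[["(Intercept)"]]))

# Fitted female-minus-male gap: coefficient times distance between the levels
gap_half <- fit_half[["gender_half"]] * (0.5 - (-0.5))
gap_643  <- fit_643[["gender_643"]] * (643 - (-643))
print(c(gap_half = gap_half, gap_643 = gap_643))

# Intercept and treatment coefficient under the two zero-sum codings
print(rbind(half = fit_half[1:2], wide = fit_643[1:2]))
```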

nonbel · on March 31, 2018

>"Let's ignore the interaction term for the sake of simplicity. It doesn't change anything in what follows."

It changes the estimates if you don't assume the interaction coefficient = 0, so how can you just ignore it? Actually, not assuming zero interaction changes everything: https://en.wikipedia.org/wiki/Principle_of_marginality

madhadron · on April 1, 2018

For the purpose of getting intuition about setting levels, it doesn't change anything. Obviously for what you actually fit, it matters.

curiousgal · on March 30, 2018

Hence the need for permutation tests in your specific example.

nonbel · on March 30, 2018

Interesting, I'm not sure what to expect. Can you code up the permutation test and share the results?

Also, in what cases would you say a permutation test would be unnecessary? Only when there is one independent variable? I tried to choose a very simple example.
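For reference, a minimal sketch of one possible permutation test on this example. The thread doesn't say which test was meant, so this assumes the simplest version: an exact two-sided test on the difference in mean result between treated and untreated units, ignoring gender. `mean_diff`, `assignments`, and `perm_stats` are names introduced here for illustration.

```r
set.seed(12345)
treatment <- c(rep(1, 4), rep(0, 4))
result    <- rnorm(8)

# Test statistic: mean of the treated units minus mean of the untreated units
mean_diff <- function(treated_idx) {
  mean(result[treated_idx]) - mean(result[-treated_idx])
}

observed <- mean_diff(which(treatment == 1))

# Enumerate every way to pick 4 of the 8 units as "treated": choose(8, 4) = 70
assignments <- combn(length(result), 4)
perm_stats  <- apply(assignments, 2, mean_diff)

# Two-sided p-value: share of assignments at least as extreme as the observed one
# (small tolerance guards against floating-point ties)
p_value <- mean(abs(perm_stats) >= abs(observed) - 1e-12)

print(c(observed = observed, n_assignments = length(perm_stats), p_value = p_value))
```

With only 70 possible assignments the smallest achievable two-sided p-value is 2/70, so a sample this small has little power no matter how the test is set up.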