Hacker News

I shudder at the data reliability issues of a self-selecting sample of individuals being asked to self-report salaries.

There may well be real pay inequity issues -- in fact, my default presumption is that such problems exist anywhere that there has not been a systematic effort to avoid them -- but unless there's far more to this spreadsheet than the article describes, I don't see how it can reasonably be taken to have any large-scale significance.



I think there's as much a story in the managerial response to the spreadsheet as in the spreadsheet itself. If there really was not a problem (or not a statistically-significant one, or not a statistically-significantly worse-than-industry-norms one), it seems like there would have been better ways to respond. Even "HR did the stats, you can't see them, but our data implies less wage gap than your data" would be worth something.

And regardless of whether we can infer anything about the true pay inequity from the sample, the response by management described here is just totally awful, and strongly implies behavior that is illegal in the US. You're not allowed to penalize an employee for discussing wages simply because their discussion is factually inaccurate or statistically unsound.


Oh, I agree that the managerial response (particularly with her getting flooded by bonuses which her manager vetoed) was idiotic. That doesn't take away from my concerns about how people may be analysing and using the anecdata though.

If someone came to me and said "I did a self-selecting self-reporting survey of salaries, and it says I'm underpaid", I think I'd be more likely to fire them for incompetence than to give them a raise.


> If someone came to me and said "I did a self-selecting self-reporting survey of salaries, and it says I'm underpaid", I think I'd be more likely to fire them for incompetence than to give them a raise.

OK, except that's not what's happening here. What seems to be happening here is someone saying "I did a self-reporting survey of salaries, and there's an existence proof that someone at my level is getting paid $X; can I get paid closer to $X than my current salary?" Which is sound.

The statistics and pivot tables are very interesting, but I suspect they're not directly being used for negotiations. Nobody has any good information about how much money they should be asking for. The data this spreadsheet provides is that, if you ask for $X, it's within the range of reason, and not so high that you'll get laughed at.

If I apply for Google and try to negotiate a $200K salary, is that low? high? so high I'll get my offer revoked? (I've heard of that happening in tech!) I have no idea. Knowing even one person who works in my future role and makes $200K would be valuable information. So would knowing that I can't find anyone in that role making more than $110K. Knowing the mean and standard deviation of salaries at 95% confidence is not particularly more useful to me.


What seems to be happening here is someone saying "I did a self-reporting survey of salaries, and there's an existence proof that someone at my level is getting paid $X; can I get paid closer to $X than my current salary?" Which is sound.

It's sound modulo the assumption that everybody is being honest when they report their salaries, sure. Of course, in a company Google's size, there are always going to be outliers, so "someone at my level is getting paid $X" is a long way from "everybody at my level should be getting paid $X".

If I apply for Google and try to negotiate a $200K salary, is that low? high? so high I'll get my offer revoked? (I've heard of that happening in tech!) I have no idea.

Right, and that's the proper use of such anecdata -- to help you formulate your bargaining strategy, not as a bargaining chip itself.

Ironically, if information like this had been available back in 2006, I might be working for Google: A few months after I rejected their offer, I was told that it was "pathetically low" and that I could have negotiated at least 50% more. If I had been offered that much more it might have factored into my decision; but I didn't know I could negotiate, so I ended up simply rejecting the offer.


"but I didn't know I could negotiate"

This is shocking to me. Was it your first formal job offer?


Yes.


I have precisely the same fears. I have been relying on Glassdoor, which seems risky at best. A self-reported internal spreadsheet would be a lot more helpful.


> If someone came to me and said "I did a self-selecting self-reporting survey of salaries, and it says I'm underpaid", I think I'd be more likely to fire them for incompetence than to give them a raise.

Even if, aside from being wrong about this one thing, they were actually competent and underpaid?

If the answer to that question is no, why even elaborate? Then it wouldn't matter at all whether the employee complains rightly or wrongly. So I have to conclude the answer is yes.

It's perfectly understandable that you, as a manager or employer, don't want to pay people for their merit if you don't have to. But it's not something you should be proud of.


Drawing conclusions from self-selected self-reported data (without a careful process for validation and debiasing) is a big red flag for (a) being ignorant of basic statistics, and (b) not even being aware of your ignorance of basic statistics. I mean, "why do we use random samples?" is covered in the first lecture of a 1st year stats course.

I expect developers to be familiar with basic statistics.
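To make the sampling concern concrete, here's a toy simulation (all numbers hypothetical): if willingness to report rises with salary, the mean of the self-selected responses drifts above the true mean, even with thousands of reports.

```python
import random

random.seed(0)

# Hypothetical population: 10,000 salaries, normally distributed.
true_salaries = [random.gauss(100_000, 20_000) for _ in range(10_000)]
true_mean = sum(true_salaries) / len(true_salaries)

# Self-selection: assume higher earners are more likely to report,
# with reporting probability rising linearly from 10% to 70%.
lo, hi = min(true_salaries), max(true_salaries)
reported = [s for s in true_salaries
            if random.random() < 0.1 + 0.6 * (s - lo) / (hi - lo)]
reported_mean = sum(reported) / len(reported)

# The self-selected mean overstates the true mean.
print(round(true_mean), round(reported_mean))
```

The size of the drift depends entirely on the (unknown) relationship between salary and willingness to report, which is exactly why the raw spreadsheet mean can't be taken at face value.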


Wow. Colin, are you just trying to win an argument online or are you really that flippant when it comes to firing an employee? Keep in mind that we're talking about people's livelihoods here; it's the sort of thing that can stress people out to the point of mental breakdown, even if you're not actively threatening them.

Firstly, this isn't a job interview. If someone's been working with you for long enough that they want to discuss a salary adjustment, then shouldn't your assessment of the value that they bring to the table be more informed than looking for "red flags" on the spot?

Secondly, developers are generally not familiar with basic statistics. If it's important to your business that they be better-than-average at stats, then budget some time and money for training.

Thirdly, in the real world, we make decisions from incomplete and biased information all the time. Sometimes, those decisions turn out to be wrong, but on the whole, it still works out better than pretending that we have no information at all. If you're in management and you have access to more complete information, then the smart thing for an employee to do is to have a conversation with you. It makes no sense to fire someone for that.

I have a lot of respect for your technical work, but the management style you're describing so far sounds authoritarian and abusive. I really hope that's not how you'd actually treat your reports, in practice.


are you really that flippant when it comes to firing an employee?

I don't know; I've never had to fire anyone yet.

I'll admit that when writing at 1AM I may have let some hyperbole slip into my comments. It's quite possible that my reaction would be "great, your assignment for the next week is to do a rigorous statistical analysis of this data".

On the other hand, as someone else commented, I said "more likely"; and that's probably true. I can't imagine a situation where I'd accept a naive analysis of self-selected self-reporting data as having any substance at all, so I would certainly not give someone a raise based on that.

discuss a salary adjustment

Discussing salary is fine. My point is that bad statistics is worse than nothing in this context.


> My point is that bad statistics is worse than nothing in this context.

Actually, even this point is wrong, IMHO. I can easily imagine situations where you get just one data point and correctly infer from it that you're underpaid.

One example: someone who is obviously less competent than you reveals that they have a higher salary than you do.

Another example: someone who has access to the information tells you that your salary is low, and you have good reason to trust that person.

I understand that you, as an employer, naturally don't think that you are or ever will be biased in this way. But it still may happen in the wild.

In any case, as has already been said: if you still think that bad statistics are worse than none, you should make the real information publicly available. It's similar to the situation with rumors.


I can easily imagine situations where you get just one data point and correctly infer from it that you're underpaid.

One data point isn't even bad statistics. It's anecdote. Maybe the other person is being vastly overpaid.


> Maybe the other person is being vastly overpaid.

Maybe; but if you think like that, then no sample can give you that information, because you won't get an absolute reference point either way. In any case, such a situation is a good reason to talk about your salary.

So your argument is wrong; the sample size is not what matters (that doesn't mean it's useless, though!). Either you have some reference point, and then you can judge the fairness of the salary (even from one data point), or you don't have a reference point, and then no sample size will help you get one.


Let me rephrase that: If you get told that one person is earning far more than you, then you don't know if you're the anomaly or if they're the anomaly.

If you get told that a large representative sample of people all get paid far more than you, it suggests that you're the anomaly.


There is something wrong with your thinking, but I can't quite put my finger on it. Let me try:

Either you assume a prior distribution of salaries or you don't. If you don't, then you don't know the mean salary, and so a group of any size will not tell you whether you're the anomaly, because they could all be anomalous as well.

On the other hand, if you assume a prior distribution (which is pretty much what Bayesian statistics does), then even one sample will modify the prior, and you gain information (i.e. a suggestion of whether you're an anomaly or not).

Of course more samples are always better, but if you can draw a conclusion from multiple samples, then you can draw one from a single sample.

It seems to me that in the first case you're saying we cannot assume any prior, but in the second case you're doing exactly that: assuming that there is a mean (the 1st moment of the prior distribution, and maybe other moments too) which indicates whether an observation is an anomaly or not.
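As a sketch of the Bayesian point (hypothetical numbers; the standard conjugate normal-normal update with known observation variance): a single observed salary pulls an assumed prior toward the observation, so one data point does carry information once a prior is granted.

```python
# Prior belief about the mean salary for a role: N(mu0, tau^2).
mu0, tau = 90_000.0, 15_000.0
# Assumed scatter of individual salaries around the mean.
sigma = 20_000.0
# A single reported salary.
x = 130_000.0

# Conjugate update for a normal mean with known variance:
# posterior precision is the sum of the prior and data precisions.
precision = 1 / tau**2 + 1 / sigma**2
posterior_mean = (mu0 / tau**2 + x / sigma**2) / precision
posterior_sd = precision ** -0.5

# The posterior mean lands between the prior mean and the observation,
# and the posterior uncertainty shrinks below the prior's.
print(round(posterior_mean), round(posterior_sd))
```

With these numbers the belief shifts from 90k toward the 130k observation; how far it shifts depends entirely on the assumed prior and noise, which is the crux of the disagreement above.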


The Conversation

"Hey Colin, I've been chatting with a couple of people, and I've got a concern about my salary"

"You're fired"

The End


I'm going to break something to you: work, recognition and pay is a popularity contest in many organisations. Someone who is seen as a troublemaker gets alienated quickly, regardless of talent, skin colour, gender or looks. I have managed a few large teams (100+ people). My observations were:

- gender was not a factor in income, nor was race, but being one of the "players" was important (regardless of competency)

- performance appraisals were popularity contests (ratings were often changed or negotiated with senior management)

- people who caused trouble (regardless of gender or race) were ostracised. Language was also a major barrier, as were cliques that established themselves for one reason or another; they usually worked against the culture management tried to create and were seen as a threat

- good people were nearly always paid well, unless they pissed the wrong people off

So Colin spoke the truth (as far as I can tell). It is very hard to change a bad culture unless the person at the top is committed to it. I fought a number of injustices and was kicked in the balls many times (metaphorically, of course). You learn what you can and can't get away with, or you end up on the outer, truth be damned. I was often respected for my principles but seen as foolhardy for my commitment to truth.

From my experience, gender and race are no basis for judging decent and compassionate management either. I worked with some brutal men and women over the years. It doesn't take Einstein to sort the wheat from the chaff; people normally knew, although terms like "makes stuff happen", "doesn't take prisoners" and "hard ass" were some alternate names for these brutal types.

I can't speak for all work environments, but I've seen hundreds of salaries and appraisals over the years. I cannot guarantee my experience reflects the broader world of business.


So here is another example of not understanding the basics of statistics.

The statement was 'more likely to fire than give a raise'. This does not mean that these are the only two options available. 'More likely' just means the chance is higher overall, whether it's 99% likely to fire or 2% likely to fire.

It's also not what was said - it wasn't "chatting led to a concern", it was "self-reported, self-selecting study says X". That stuff really is a red flag to folks who have a decent grounding in statistics.


Literal much?

Brilliant mansplaining though!


It's a fairly reliable signal that once you start to use words like "mansplaining" you're well past having a useful point to make in a reasonable discussion.


Why are you complaining that my comment is taking yours seriously, when yours is taking Colin's seriously? Nice hypocrisy there.

I was actually referring to the understanding underlying your ridicule, not the joke itself.


Can you point out where Colin indicated (clearly too subtly for me) that his repeated comments on this subject were a joke and should not be taken seriously?

Once you do that, I'll try and explain (mansplain if necessary) how a play of 2 lines and 1 act is probably not a very serious response.


> a careful process for validation and debiasing

What might something like this look like?


Generally speaking, you would want to take a random sample of employees, verify that the information they self-reported was correct, and perform a regression analysis of response rate vs. other characteristics (age, seniority, gender, salary, ethnicity, etc.).

In this particular context, the biggest threats to data integrity are probably:

1. People lie, and men probably lie about their salaries more than women,

2. Willingness to divulge salary is highly culture-dependent, and in societies where salary is seen as a sign of social success, can be highly salary-dependent (and probably more so for men than for women).

If you're interested in learning more I suggest looking for information about how national census data is processed; they have lots of experience dealing with these sorts of issues (albeit with different factors influencing response rates and data reliability).
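A minimal sketch of one such debiasing step, post-stratification, with entirely hypothetical head counts and responses: weight each group's mean by its true share of the population rather than its share of respondents, so an over-responding group stops dominating the estimate.

```python
# Hypothetical: true head counts per group, and self-selected responses.
headcount = {"men": 700, "women": 300}
responses = {
    "men":   [120_000, 140_000],                      # 2 of 700 responded
    "women": [100_000, 105_000, 95_000, 110_000],     # 4 of 300 responded
}

# Naive mean: treats every response as equally representative,
# so the over-responding group dominates.
all_salaries = [s for group in responses.values() for s in group]
naive_mean = sum(all_salaries) / len(all_salaries)

# Post-stratified mean: weight each group's mean by its population share.
total = sum(headcount.values())
weighted_mean = sum(
    (headcount[g] / total) * (sum(sal) / len(sal))
    for g, sal in responses.items()
)

print(round(naive_mean), round(weighted_mean))
```

This only corrects for imbalance across the groups you know about; it does nothing about people lying, or about selection effects *within* a group, which is why the verification step above still matters.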


Interesting, thanks!


In this thread: the repeatedly reinforced observation that expertise in crafting software does not automatically bestow skills for managing, understanding, or empathizing with people, and that those whose work you respect tremendously may yet disappoint you when it comes to working for them.


Vetoing > 1 peer bonus for the same thing is pretty much policy.

Certainly it's manager discretion, but that's usually the right thing to do.

(There are other forms of bonusing that would make more sense in such a case, like a spot bonus, or whatever)


An astonishing attitude.

Perhaps, if you ever end up managing people, you'll take a step back and investigate whether there was, in fact, a valid concern.


Well, the obvious solution to the self-selecting issue is just have the company publish all salaries publicly.

Of course, very few would dare to do that because they know that the data will be hiding all sorts of embarrassing and possibly resignation-causing truths. Which is kind of shitty if you think about it, especially when taken with the notion that "companies are people"... sociopathic, lying, cheating people...


On the other hand, if I were a manager at an established company with thousands of workers, I'd be very wary of "just publishing all salaries publicly" even if I did think they were all fair (which I don't believe is true for most companies, granted).

For one thing, fairness is subjective. For many, "equal pay for equal work" is fair, but many others believe that seniority, or a better CV, or many others factors should be rewarded differently.

Secondly, there are issues of perception. A worker might think (s)he should be paid more because (s)he always leaves later than his/her colleague, without taking into account that the colleague prefers to take shorter lunches or fewer breaks and leave earlier. You can explain it to them, but why would they believe you?

It'd be different if a company published the salaries from the start, but switching after growing to a huge size can be very disruptive, and I don't blame managers for avoiding opening that can of worms.

(Again, I don't believe the salaries at Google are perfectly fair, I'm talking about a hypothetical company)


Then again, in all of Sweden everybody's tax returns are public information, and their economy still seems to do just fine...


Yes, but that's been open for over 100 years. What I was talking about was the issues surrounding the transition from closed to open, not the openness itself.


They could publish descriptive statistics of the salaries without publishing the salaries themselves.
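For instance (hypothetical salaries), per-role summaries like medians and quartiles could be released while the raw list stays private; Python's `statistics` module is enough for a sketch:

```python
import statistics

# Hypothetical salaries for one role; the raw list would stay private.
salaries = [95_000, 102_000, 110_000, 118_000, 125_000, 140_000, 160_000]

# Publish only summary statistics, never the individual values.
q1, median, q3 = statistics.quantiles(salaries, n=4)
summary = {
    "n": len(salaries),
    "median": median,
    "iqr": (q1, q3),
    "mean": statistics.fmean(salaries),
}
print(summary)
```

For small roles even this can deanonymize people (a group of one publishes its only member's salary), so real disclosures usually suppress cells below a minimum head count.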


Many universities publish everyone's salary yearly. Everyone always says Google feels very collegiate...


Statistics is as much (if not more) about careful thought as it is about applying standard tests.

What sampling biases do you think might be present in this sampling that invalidate the stated conclusions?

For example, for a conclusion about gender differences to be exaggerated by the data there would have to be systematic biases to sampling women with lower salaries, and men with higher salaries, in the same position. Does this seem likely?


For a conclusion about gender differences to be exaggerated by the data there would have to be systematic biases to sampling women with lower salaries, and men with higher salaries, in the same position. Does this seem likely?

For a self-selected sample, absolutely. Self-worth is far more tied to salary in males than in females. (I don't think this is innate; I think it's a result of social conditioning which tells males that their role in society is to be breadwinners.)

But you don't need to be able to point to a specific source of bias to say that self-selected samples are problematic. It's up to the person who intends to use the statistics to show that they were either gathered reliably or carefully examined to remove biases.


I don't understand the link you're drawing between self-worth and willingness to participate in this spreadsheet exercise, but I suppose it is beside the point.

I certainly agree that if someone is trying to make a statistical claim then they should be the ones to do the due diligence on the data, and on their analyses, to show that they stand up - I would certainly expect this in a more formal publication.

What I am disputing is that one can dismiss the conclusions simply due to the lack of these checks, without considering what the data might tell you.

Nearly always such dismissive arguments strike me as a disagreement with the conclusions, rather than a claim about failings in the data - a more convincing argument would present a reason why the data are flawed, such as the one you have given in this response (even if I don't understand it).


Ah, the old "what do you want 2+2 to equal, boss?" :-)



