
"Accounting for" using e.g. regression techniques is complex guesswork and leaves room for residual confounding. (Just try to imagine how exactly you'd do it in this case!)

There is no perfect solution, but simply excluding problematic data is often the least confounding option.



In this case, the data are not merely problematic but directly relevant to the study's conclusion. Excluding them is certainly more confounding than including them.

Imagine if you did a study on whether air bags in automobiles reduced fatalities, and excluded 'problematic' data involving cases where the occupants were injured by the air bag.


I'm not saying to exclude the event where the person got heart disease. I'm saying to exclude the data after the diagnosis, where the person is avoiding the sauna under doctor's advice.

This isn't difficult to do in a longitudinal study and would indeed be standard in many such studies. You wouldn't want to include data where the arrow of causation is known to run opposite to what you are trying to test for.
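
To make that concrete, here's a rough sketch of what I mean by dropping post-diagnosis exposure data. Everything in it is made up for illustration: the `Observation` record, the `censor_at_diagnosis` helper, the field names and the numbers. It also assumes exposure is recorded once per year and that the diagnosis-year record itself is kept, which a real study might decide differently.

```python
from typing import Dict, List, NamedTuple


class Observation(NamedTuple):
    """One hypothetical follow-up record for one participant."""
    person_id: int
    year: int
    sauna_sessions_per_week: int


def censor_at_diagnosis(
    observations: List[Observation],
    diagnosis_year: Dict[int, int],
) -> List[Observation]:
    """Drop exposure records taken after a participant's diagnosis.

    Participants with no entry in diagnosis_year were never diagnosed,
    so all of their records are kept. The diagnosis event itself is not
    touched here; only the later exposure measurements are removed.
    """
    kept = []
    for obs in observations:
        dx = diagnosis_year.get(obs.person_id)
        if dx is None or obs.year <= dx:
            kept.append(obs)
    return kept


# Illustrative data only: person 1 is diagnosed in 2005 and stops using
# the sauna afterwards; person 2 is never diagnosed.
observations = [
    Observation(1, 2000, 4),
    Observation(1, 2005, 4),
    Observation(1, 2010, 0),  # after diagnosis, sauna avoided on advice
    Observation(2, 2000, 1),
    Observation(2, 2010, 1),
]
diagnosis_year = {1: 2005}

kept = censor_at_diagnosis(observations, diagnosis_year)
```

Person 1's 2010 record (zero sessions, recorded after the diagnosis) is dropped, while the diagnosis itself still counts as an outcome. Everything for person 2 stays in.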



