
One of the issues I faced during my short stint building ML models for fraud detection in debit card transactions was dealing with class imbalance. I was not completely convinced that oversampling or undersampling techniques would work; my initial experiments just resulted in more false positives. Just curious if you guys have faced similar problems.
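To make the undersampling idea concrete, here is a minimal toy sketch of random majority-class undersampling for a fraud dataset. The `undersample` function and its `ratio` knob are my own illustration (not from any particular library); the knob controls how many legitimate examples are kept per fraudulent one.

```python
import random

def undersample(transactions, labels, ratio=1.0, seed=0):
    """Toy random undersampling: keep all fraud (label 1) examples and
    roughly ratio * len(fraud) randomly chosen legitimate (label 0) ones.
    Illustrative only; real pipelines usually do this inside CV folds."""
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(legit), int(ratio * len(fraud)))
    sampled = rng.sample(legit, keep)
    idx = sorted(fraud + sampled)
    return [transactions[i] for i in idx], [labels[i] for i in idx]

# e.g. 1000 legitimate + 10 fraudulent transactions, ratio=1.0
tx = list(range(1010))
y = [1] * 10 + [0] * 1000
tx_s, y_s = undersample(tx, y, ratio=1.0)
print(len(y_s), sum(y_s))  # 20 examples total, 10 of them fraud
```

Note that undersampling like this shifts the class prior the model sees at training time, which is one reason it can inflate false positives on the true (imbalanced) distribution unless you recalibrate the decision threshold afterwards.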

The other point I raise is rather rhetorical: there are no open standards, model baselines, or datasets in the fraud domain. Compare building a model for fraud detection to building one for image recognition or object detection: there is a standard baseline and standard datasets, and your model competes against that baseline. Because of the open nature of image recognition, the models have improved astronomically. I feel that the lack of such openness in fraud is holding back innovation. I could be wrong in this assessment, so please correct me if so.



I agree that the lack of standards and baselines in the fraud detection space isn't ideal. One example: some fraud products will build models using human labels as the target to be predicted. Radar, on the other hand, tries to predict whether a charge actually turns out to be fraudulent (we use dispute/chargeback data we get directly from card issuers/networks). These are in fact different problems, and the industry's lack of a consistent target makes discourse and comparisons more muddled.

(And on class imbalance: we spent quite a bit of time experimenting/analyzing how to deal with it—we found that sampling rate has a marginal impact on performance but not a huge one.)
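A sketch of the kind of experiment described above: train the same classifier at several undersampling ratios and compare AUC on the full imbalanced set. Everything here is synthetic and assumes scikit-learn is available; it is an illustration of the methodology, not a reproduction of the actual analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_legit, n_fraud = 5000, 50  # heavy class imbalance, as in fraud data
X = np.vstack([rng.normal(0.0, 1.0, (n_legit, 4)),
               rng.normal(1.5, 1.0, (n_fraud, 4))])
y = np.array([0] * n_legit + [1] * n_fraud)

def auc_at_ratio(ratio):
    """Train on fraud examples plus ratio * n_fraud sampled legit examples,
    then score AUC on the full imbalanced set (toy setup, no holdout)."""
    legit_idx = rng.choice(n_legit, size=min(n_legit, int(ratio * n_fraud)),
                           replace=False)
    fraud_idx = np.arange(n_legit, n_legit + n_fraud)
    train = np.concatenate([legit_idx, fraud_idx])
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    return roc_auc_score(y, clf.predict_proba(X)[:, 1])

for r in (1, 5, 20, 100):
    print(f"sampling ratio {r:>3}: AUC {auc_at_ratio(r):.3f}")
```

Because AUC is rank-based, it is fairly insensitive to the class prior the model was trained on, which is consistent with sampling rate mattering only marginally here; threshold-dependent metrics like precision at a fixed block rate would be more sensitive.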


If you are still on this thread, check this video out at 28:56:

https://www.youtube.com/watch?v=4inIBmY8dQI&feature=youtu.be


The problem with fraud openness is that you’re effectively teaching people how to commit fraud.



