That's what I'm saying. Classifying one data point is very straightforward and brings negligible value to a company. Reliably classifying hundreds of thousands of them is very complicated and not at all easily supervised. And if your company's business model is based on applying trained models to $real_world_problem, it doesn't just bring a lot of value, it's literally critical to the company's success, just like a solid CI/CD pipeline or a good security process.
It's attractive to think that this is just like classifying one data point over and over again. It's nothing like that, just like crossing the Atlantic from Galway to New York is nothing like kayaking around Mutton Island over and over again.
First of all, real-life data sets have hundreds of thousands of endpoints, and a single person can easily classify a few hundred, maybe a few thousand of them. So scaling it up to a point where it's easy for everyone involved requires hiring a team of dozens, or even 100+ people. That is absolutely not easy, especially not on short notice, and not when it's 100% a dead-end job, so it's difficult to convince people to come do it in the first place. I have yet to meet a single company whose idea of scaling it up involved something more elaborate than "we're gonna hire three freelancers". At 30k endpoints/person that's about an order of magnitude away from "easy", and transferring that order of magnitude to the hiring process ("we're gonna hire thirty freelancers") isn't trivial at all.
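To put rough numbers on that gap, here's a back-of-the-envelope sketch; every figure in it is an illustrative assumption based on the rough numbers above, not a measurement:

    # Back-of-the-envelope staffing arithmetic; all numbers are
    # illustrative assumptions taken from the paragraph above.
    dataset_size = 90_000      # a modest "real-life" data set
    easy_per_person = 3_000    # a comfortable workload for one labeler
    team_of_three = 3          # "we're gonna hire three freelancers"

    per_person_load = dataset_size / team_of_three   # 30,000 endpoints each
    gap = per_person_load / easy_per_person          # 10x the comfortable rate
    team_needed = dataset_size / easy_per_person     # 30 people to keep it easy

    print(f"load per person with 3 freelancers: {per_person_load:,.0f}")
    print(f"overload factor vs. 'easy':         {gap:.0f}x")
    print(f"team size to keep it comfortable:   {team_needed:.0f}")

The order of magnitude has to go somewhere: either into each labeler's workload or into the size of the team you have to hire.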
Second, it is absolutely not easy to supervise. QC for classification problems is comparable to QC for other easily replicable but hard-to-automate industrial processes, like semi-automatic manufacturing. There is ample literature on the topic, from the late industrial revolution all the way to the present, and all of it suggests it's a very hairy problem even before you account for the human factor. Perfect verification requires replicating the classification process. Verification by sampling makes it very hard to guarantee the accuracy requirements of the model. Checking accuracy after the fact poses the same problems.
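To see why sampling is such a weak guarantee, here's a minimal sketch using the standard normal-approximation sample-size formula for estimating a proportion; the target error rate and margin are illustrative assumptions:

    import math

    def audit_sample_size(expected_error: float, margin: float,
                          z: float = 1.96) -> int:
        # Items to re-check to estimate a labeling error rate to within
        # +/- margin at ~95% confidence (normal approximation to the
        # binomial): n = z^2 * p * (1 - p) / margin^2
        return math.ceil(z**2 * expected_error * (1 - expected_error)
                         / margin**2)

    # Illustrative target: confirm label error is ~1%, pinned down to
    # +/-0.25% so you can actually tell 1% apart from 1.5%.
    print(audit_sample_size(expected_error=0.01, margin=0.0025))  # -> 6086

And that ~6k re-checked items buys you one estimate for one pool of labels; if you need per-labeler or per-batch guarantees, the audit multiplies accordingly, which is how "we'll just sample a few" quietly turns back into redoing a large fraction of the classification work.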
This idea that labeling data for training models is a simple job you can just outsource somewhere cheap is the foundation of a bad strategy. Training data accuracy is absolutely critical. If you optimize for time and cost, you get exactly what you pay for: a rushed, cheap model.