That's what I'm saying. Classifying one data point is very straightforward and brings negligible value to a company. Reliably classifying hundreds of thousands of them is very complicated and not at all easily supervised. And if your company's business model is based on applying trained models to $real_world_problem, it doesn't just bring a lot of value, it's literally critical to the company's success, just like a solid CI/CD pipeline or a good security process.
It's attractive to think that this is just like classifying one data point over and over again. It's nothing like that, just like crossing the Atlantic from Galway to New York is nothing like kayaking around Mutton Island over and over again.
First of all, real-life data sets have hundreds of thousands of endpoints, and a single person can easily classify a few hundred, maybe a few thousand of them. So scaling it up to a point where it's easy for everyone involved requires hiring a team of dozens, or even 100+ people. That is absolutely not easy, especially not on short notice, and not when it's 100% a dead-end job, so it's difficult to convince people to come do it in the first place. I have yet to meet a single company whose idea of scaling it up involved something more elaborate than "we're gonna hire three freelancers". At 30k endpoints/person that's about an order of magnitude away from "easy", and transferring that order of magnitude to the hiring process ("we're gonna hire thirty freelancers") isn't trivial at all.
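To put rough numbers on that gap, here's a back-of-the-envelope sketch; every figure in it is an illustrative assumption based on the rough numbers above, not a measurement:

    # Back-of-the-envelope staffing arithmetic; all numbers are
    # illustrative assumptions taken from the paragraph above.
    dataset_size = 90_000      # a modest "real-life" data set
    easy_per_person = 3_000    # a comfortable workload for one labeler
    team_of_three = 3          # "we're gonna hire three freelancers"

    per_person_load = dataset_size / team_of_three   # 30,000 endpoints each
    gap = per_person_load / easy_per_person          # 10x the comfortable rate
    team_needed = dataset_size / easy_per_person     # 30 people to keep it easy

    print(f"load per person with 3 freelancers: {per_person_load:,.0f}")
    print(f"overload factor vs. 'easy':         {gap:.0f}x")
    print(f"team size to keep it comfortable:   {team_needed:.0f}")

The order of magnitude has to go somewhere: either into each labeler's workload or into the size of the team you have to hire.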
Second, it is absolutely not easy to supervise. QC for classification problems is comparable to QC for other easily replicable but hard-to-automate industrial processes, like semi-automatic manufacturing. There is ample literature on the topic, from the late industrial revolution all the way to the present, and all of it suggests it's a very hairy problem even before you account for the human factor. Perfect verification requires replicating the classification process. Verification by sampling makes it very hard to guarantee the accuracy requirements of the model. Checking accuracy after the fact poses the same problems.
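To see why sampling is such a weak guarantee, here's a minimal sketch using the standard normal-approximation sample-size formula for estimating a proportion; the target error rate and margin are illustrative assumptions:

    import math

    def audit_sample_size(expected_error: float, margin: float,
                          z: float = 1.96) -> int:
        # Items to re-check to estimate a labeling error rate to within
        # +/- margin at ~95% confidence (normal approximation to the
        # binomial): n = z^2 * p * (1 - p) / margin^2
        return math.ceil(z**2 * expected_error * (1 - expected_error)
                         / margin**2)

    # Illustrative target: confirm label error is ~1%, pinned down to
    # +/-0.25% so you can actually tell 1% apart from 1.5%.
    print(audit_sample_size(expected_error=0.01, margin=0.0025))  # -> 6086

And that ~6k re-checked items buys you one estimate for one pool of labels; if you need per-labeler or per-batch guarantees, the audit multiplies accordingly, which is how "we'll just sample a few" quietly turns back into redoing a large fraction of the classification work.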
This idea that labeling data for training models is a simple job you can just outsource somewhere cheap is the foundation of a bad strategy. Training data accuracy is absolutely critical. If you optimize for time and cost, you get exactly what you pay for: a rushed, cheap model.