> The issue with encountering CSV in the wild is that everybody who appreciates standards and interoperability ditched it a long time ago.
I worked on a team that used CSV somewhat extensively. The data we generated was RFC compliant. Producing RFC-compliant CSV is pretty trivial, too: most languages have a library for it, and ours shipped one in the standard library.
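As an illustration of how little work "use the library" is, here is a minimal sketch using Python's standard `csv` module. The comment doesn't say which language the team used, so Python is only an assumption for the example. The default writer quotes fields that contain commas, quotes, or line breaks, doubles embedded quotes, and ends records with CRLF, which is what RFC 4180 asks for.

```python
import csv
import io

def to_csv(rows):
    """Serialize rows to a CSV string with the stdlib writer.

    The default 'excel' dialect uses QUOTE_MINIMAL: fields containing the
    delimiter, the quote character, or line breaks are wrapped in quotes,
    embedded quotes are doubled, and each record ends with CRLF.
    """
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()

# A field with a comma and a field with quotes plus an embedded newline.
out = to_csv([["name", "note"], ["Smith, J.", 'said "hi"\nthen left']])
print(repr(out))
```

None of the quoting rules are hand-written here; the writer handles them, which is the point.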
We also had a ("terrible", as we joked) idea to create a subset of CSV that would contain typing information in a required header row. (We never did it, and it is a bad idea.)
> If you are consuming CSV files in the wild, you can be sure that whoever is supplying them to you is using horrible tools to create them and will be unwilling or unable to address issues you find in them.
…but this is absolutely true. We also consumed CSVs from external sources and contractors, and this was an absolute drain on our productivity. I've also worked with engineers of this caliber, and replacing CSV with another format wouldn't fix the terrible output. I've seen folks approach email and HTTP with a cavalier "oh, it's a trivial text format, I don't need a library!" attitude, and inevitably get it wrong. Pointing out the flaws in their implementation, and that a library would fulfill their use case just fine, is met only with more hacks (not fixes) that try to further munge the output into shape. It is decidedly not software engineering. I've seen this even with JSON.
But yeah, even with RFC-compliant CSV, you shouldn't be parsing it with awk. It is the wrong tool.
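To make the awk point concrete, here is a small sketch, again in Python purely for illustration. `naive_parse` is a hypothetical helper that roughly mimics an `awk -F,` one-liner (one record per line, split on every comma); `proper_parse` uses the stdlib `csv.reader`. The input is perfectly valid RFC 4180 CSV, with a quoted comma and a quoted line break.

```python
import csv
import io

# Valid RFC 4180 input: record 1 has a comma inside quotes, record 2 has a
# CRLF line break inside quotes. Both are legal in quoted fields.
data = 'id,comment\r\n1,"fine, thanks"\r\n2,"line one\r\nline two"\r\n'

def naive_parse(text):
    """Hypothetical awk-style parser: split into lines, then on commas.

    Quoting is ignored entirely, so quoted commas and newlines break records.
    """
    return [line.split(",") for line in text.splitlines()]

def proper_parse(text):
    """Stdlib parser; newline="" keeps embedded line breaks intact."""
    return list(csv.reader(io.StringIO(text, newline="")))

print(naive_parse(data))
print(proper_parse(data))
```

The naive version turns three records into four and splits `"fine, thanks"` into two fields; the real parser returns the three records as written.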