
This keeps coming up as new people discover what CSVs are: an ancient TEXT data exchange format. The lowest vaguely common denominator. A family of formats whose flavors even software long out of support contract is happy to export data in.

The intent of the format is to be human readable and editable. Sure, tab characters can be used instead of commas (TSV files). Yes, there's that rule of doubling a quote ("") to escape it. Oh, and quoting values is optional: unquoted strings are fine as long as they contain no newline or field separator characters.
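
For anyone who hasn't seen those quoting rules in action, here is a minimal sketch using Python's standard csv module. The sample rows are made up for illustration, and other CSV flavors may quote differently.

```python
import csv
import io

# Hypothetical sample data: one field contains both a comma and quotes.
rows = [["id", "note"], ["1", 'says "hi", then leaves'], ["2", "plain"]]

# The default dialect quotes only the fields that need it and doubles
# any embedded quote characters.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
text = buf.getvalue()
print(text)

# Reading the text back recovers the original fields.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed == rows)
```

The field with the comma and quotes comes out wrapped in quotes with the inner quotes doubled; the plain fields stay unquoted.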

Sure, you could make another CSV inspired format which uses the old mainframe control characters; except as keeps getting pointed out, even programmers often don't know how to enter raw flow control characters on their systems. Who even bashes those out these days? I know I have to look it up every time.

Rejoice that the format is so simple, it's all just text which software might convert to numbers or other values as it might desire.



I agree completely. Its simplicity is what gives it staying power.

When I was an undergrad, I had kind of an anal software engineering 101 professor who was treating the course like he was a scrum master. The deliverable was to make some dumb crud app, and a requirement was it used a "database." It was so stupid simple to write a csv to s3 or local disk that I just used that for the entire project. He tried to fail me for not following the requirements, and I had to go to the dean of CS and argue that by definition, a structured data format on a disk is absolutely a database, and I won. I got graded horribly after that though.


> even programmers often don't know how to enter raw flow control characters on their systems.

Yes, but that is because those characters are not meant to be entered directly. DSV values should either be created by a dedicated DSV editor or they should be constructed by a software library. You would rather use a paint program to create an image instead of writing the image's bytes in a text editor.
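
A rough sketch of what "constructed by a software library" could look like, assuming the delimiters in question are the ASCII unit separator (0x1F) and record separator (0x1E). The function names dump_dsv and load_dsv are hypothetical, not from any existing library.

```python
US = "\x1f"  # ASCII unit separator, assumed here to sit between fields
RS = "\x1e"  # ASCII record separator, assumed here to sit between records


def dump_dsv(rows):
    """Join rows of string fields using the control-character delimiters."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                # No escaping scheme is assumed, so refuse such fields.
                raise ValueError("field contains a delimiter character")
    return RS.join(US.join(row) for row in rows)


def load_dsv(text):
    """Split delimited text back into rows of fields."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


# Commas, quotes, and newlines need no escaping in this scheme.
sample = [["name", "note"], ["Ann", 'commas, "quotes" and\nnewlines are fine']]
encoded = dump_dsv(sample)
print(load_dsv(encoded) == sample)
```

Nobody has to type the control characters by hand here, which is the point: the program writes them, the same way a paint program writes the image bytes.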


Aka a completely different use case than CSV.


> Aka a completely different use case than CSV.

How many CSVs are generated, edited, or viewed by Notepad.exe and how many by Excel (or Google Sheets)?

I would posit the vast majority of CSVs are generated through some kind of program where you go to File > Export or File > Save As…. In which case selecting a drop-down option for the File Format to be TSV or DSV (with the corresponding file extension) would solve a lot of problems. (Or at least if CSVs from Excel were RFC 4180 compliant by default.)


How many get edited or inspected in notepad at some point in their life? Nearly all of them (for any given workflow).


It is nice that text editors are abundantly available and that they can be used for the task. But once the CSV columns get too wide and irregular, then you probably want to reach for a dedicated spreadsheet program, because it is otherwise too hard to figure out which column you are currently reading.

There is still room between a text editor and a full-blown spreadsheet program. New DSV editors could emerge when the DSV format gains popularity.


At the point someone is using a different format, they’ll likely pick something explicitly structured. Anything from JSON to YAML to Protobufs, or hell, even XML.

DSV seems like the worst of both worlds. Not really structured, AND also not really viewable/editable by lowest common denominator tooling.


> when the DSV format gains popularity

CSV is equivalent to Voyager I, the chances of catching up with that kind of head start are extremely low.


Right, the author skipped right over human-readable TSV files which play nicely with sed/awk/grep/sort pipelines, and are supported by all CSV parsers and spreadsheet software.


TSV is also my go-to when mucking around on the command line. Perfect for noodling with data before you have to put together an Excel file to show to management.


The problem is that people (mostly non-technical) put tabs in fields, and then you have all the problems that the article notes.
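
To make the failure concrete, here is a small sketch in Python. The row is invented, and the second half assumes your consumers are happy with CSV-style quoting applied to tab-delimited data, which plain cut/awk pipelines are not.

```python
import csv
import io

# Hypothetical row where someone typed a tab inside the middle field.
row = ["42", "first\tsecond", "ok"]

# Naive TSV: join on tabs, split on tabs. The embedded tab becomes an
# extra column, so three fields come back as four.
naive_line = "\t".join(row)
naive_fields = naive_line.split("\t")
print(len(row), len(naive_fields))

# The csv module with a tab delimiter quotes the offending field instead.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
quoted_line = buf.getvalue()
parsed = next(csv.reader(io.StringIO(quoted_line), delimiter="\t"))
print(parsed == row)
```

The quoted version round-trips, but then you are back to CSV's quoting rules and a naive split on tabs no longer works.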


I personally find that this happens (a lot) less often than with commas or quote characters.


That's fair, but it only takes one to mess up the rest of the file.


Agreed, that's why it's not good for production processes.


CSV isn't a common denominator of anything. Everything is communicated out of band. Nobody understands your CSV files.



