There could be a major schema change that breaks the contract, but one of the nice things about JSON output is that it allows the creation of new fields without affecting downstream consumers.
That is, if I have a CLI program that spits out a list of IP addresses and one day I want to also output the corresponding DNS names, I can simply add the "dns" field and existing pipelines will ignore the field and work just fine.
This is better than grepping/awking/etc. unstructured text on STDOUT because, depending on how the author decides to add the new field to the text output, the change can easily break existing pipelines that rely on the shape of the data staying the same.
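To make that concrete, here is a minimal sketch of the difference. The tool output, the "ip" field name, and the sample addresses are all made up for illustration; only the "dns" field comes from the example above.

```python
import json

# Hypothetical JSON output of the same CLI tool before and after the
# "dns" field was added. All values here are invented for illustration.
old_json = '[{"ip": "192.0.2.1"}, {"ip": "192.0.2.2"}]'
new_json = (
    '[{"ip": "192.0.2.1", "dns": "a.example.com"},'
    ' {"ip": "192.0.2.2", "dns": "b.example.com"}]'
)

def extract_ips(raw):
    """Return the "ip" value of every record, ignoring any other keys."""
    return [record["ip"] for record in json.loads(raw)]

# The JSON consumer is unaffected by the extra key.
ips_old = extract_ips(old_json)
ips_new = extract_ips(new_json)

# Hypothetical text output of the same tool, where the author chose to
# put the new DNS column first.
old_text = "192.0.2.1\n192.0.2.2\n"
new_text = "a.example.com 192.0.2.1\nb.example.com 192.0.2.2\n"

def first_column(text):
    """Mimic an awk '{print $1}' style consumer."""
    return [line.split()[0] for line in text.splitlines()]

# The positional consumer now silently returns DNS names instead of IPs.
cols_old = first_column(old_text)
cols_new = first_column(new_text)

print(ips_old == ips_new)    # JSON consumer: same result
print(cols_old == cols_new)  # text consumer: result changed
```

The text version would keep working if the new column were appended instead of prepended, which is exactly the "depending on how the author decides to add the new field" problem.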
`jc` author here. I've been maintaining `jc` for nearly four years now. Most of the maintenance is choosing which new parsers to include. Old parsers don't seem to have too many problems (see the GitHub issues), and bugs are typically just corner cases that can be quickly addressed along with added tests. In fact, there is a plugin architecture that allows users to get a quick fix so they don't need to wait for the next release. In practice it has worked out pretty well.
Most of the commands are pretty old and do not change anymore. Many parsers are not even commands but standard filetypes (YAML, CSV, XML, INI, X509 certs, JWT, etc.) and string types (IP addresses, URLs, email addresses, datetimes, etc.) which don't change or use standard libraries to parse.
Additionally, I get a lot of support from the community. Many new parsers are written and maintained by others, which spreads the load and accelerates development.
Also, `jc` automatically selects the correct `/proc` file parser, so you can just do `jc /proc/meminfo` or `cat /proc/meminfo | jc --proc` without specifying the actual proc parser (though you can do that if you want).
Neat! Your parser [1] almost has a similar issue because a comm could contain parentheses, e.g., `foo) R 123 456`. But since a comm is limited to 64 bytes, I don't think it is possible to fit a fully matching string inside of the comm before the closing paren after the comm, which would thus make your regexp fail to match.
I just had a quick read of the pid/stat parser, and the regex pattern starts with ^, but there's no $. Doesn't this mean that this parser suffers exactly the bug of the original post?
Right, it's not a security problem on its own, but it can make the regex not match at all, causing jc to return an error. So jc suffers from the parsing bug mentioned in the post.
[edit:] In order to get jc to return an error one has to actually read the regex. Here is a file name that gets it to return an error:
Edit: looks like I can tighten up the signature matching regex for the "magic" syntax per the issue found above. The greedy regex matching for the parser does seem to work fine, though.
Fortunately `jc`[0] does parse `/proc/<pid>/stat` correctly. I, of course, originally implemented it the naive/incorrect way until a contributor fixed it. :)
$ cat /proc/2001/stat | jc --proc
{"pid":2001,"comm":"my program with\nsp","state":"S","ppid":1888,"pgrp":2001,"session":1888,"tty_nr":34816,"tpg_id":2001,"flags":4202496,"minflt":428,"cminflt":0,"majflt":0,"cmajflt":0,"utime":0,"stime":0,"cutime":0,"cstime":0,"priority":20,"nice":0,"num_threads":1,"itrealvalue":0,"starttime":75513,"vsize":115900416,"rss":297,"rsslim":18446744073709551615,"startcode":4194304,"endcode":5100612,"startstack":140737020052256,"kstkeep":140737020050904,"kstkeip":140096699233308,"signal":0,"blocked":65536,"sigignore":4,"sigcatch":65538,"wchan":18446744072034584486,"nswap":0,"cnswap":0,"exit_signal":17,"processor":0,"rt_priority":0,"policy":0,"delayacct_blkio_ticks":0,"guest_time":0,"cguest_time":0,"start_data":7200240,"end_data":7236240,"start_brk":35389440,"arg_start":140737020057179,"arg_end":140737020057223,"env_start":140737020057223,"env_end":140737020059606,"exit_code":0,"state_pretty":"Sleeping in an interruptible wait"}
Also note that you are looking at plaintext output here. By default `jc` and other JSON filtering tools do syntax highlighting when outputting to the terminal so it's actually quite easy to read JSON these days.
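For anyone following the `/proc/<pid>/stat` discussion above, here is a rough sketch of the usual way around the comm problem: treat everything between the first "(" and the last ")" as the comm. This is not `jc`'s actual implementation; the function name and the sample line are hypothetical, and it assumes the fields after the comm never contain a ")".

```python
def parse_stat_head(line):
    """Split a /proc/<pid>/stat line into (pid, comm, remaining fields).

    Assumption: the comm is the only field that can contain spaces,
    newlines, or parentheses, so the LAST ")" on the line ends it.
    """
    open_idx = line.index("(")    # first "(" starts the comm
    close_idx = line.rindex(")")  # last ")" ends the comm
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    rest = line[close_idx + 1:].split()
    return pid, comm, rest

# Hypothetical stat line (truncated to a few fields) for a process whose
# comm is "foo) R 123 456", as in the example mentioned earlier.
tricky = "2001 (foo) R 123 456) S 1888 2001 1888\n"

pid, comm, rest = parse_stat_head(tricky)

# A naive whitespace split picks up the fake state from inside the comm.
naive_state = tricky.split()[2]

print(pid, repr(comm), rest[0], naive_state)
```

Here the naive split reports the state as "R" while the real state is "S", which is the kind of mismatch the original post is about.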