Hacker News | kbrazil's comments

There could be a major schema change that breaks the contract, but one of the nice things about JSON output is that it allows the creation of new fields without affecting downstream consumers.

That is, if I have a CLI program that spits out a list of IP addresses and one day I want to also output the corresponding dns names, I can simply add the "dns" field and existing pipelines will ignore the field and work just fine.

This is better than grepping/awking unstructured text from STDOUT. There, depending on how the author decides to add the new field, the change can easily break existing pipelines that rely on the shape of the data staying the same.
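
A minimal sketch of that point in Python. The tool output here is hypothetical: a list of objects with an `ip` key, where `dns` is the later addition, plus a made-up plain-text variant where the author prepends the DNS name. The JSON consumer selects fields by name, so it returns the same result for both versions; the positional text consumer does not.

```python
import json

# Hypothetical JSON output of the same CLI tool before and after "dns" was added.
OLD_JSON = '[{"ip": "192.0.2.1"}, {"ip": "192.0.2.2"}]'
NEW_JSON = (
    '[{"ip": "192.0.2.1", "dns": "a.example.com"},'
    ' {"ip": "192.0.2.2", "dns": "b.example.com"}]'
)

# Hypothetical plain-text output where the new column was prepended.
OLD_TEXT = "192.0.2.1\n192.0.2.2\n"
NEW_TEXT = "a.example.com 192.0.2.1\nb.example.com 192.0.2.2\n"


def ips_from_json(raw_json: str) -> list:
    """Pick the "ip" field by name; fields the consumer does not know are ignored."""
    return [entry["ip"] for entry in json.loads(raw_json)]


def ips_from_text(text: str) -> list:
    """Positional parsing: assume the IP is the first whitespace-separated column."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(ips_from_json(OLD_JSON) == ips_from_json(NEW_JSON))  # same result
    print(ips_from_text(OLD_TEXT) == ips_from_text(NEW_TEXT))  # differs
```

Whether a text pipeline actually breaks depends on where the new column lands; the JSON consumer is indifferent to that choice.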



`jc` author here. I've been maintaining `jc` for nearly four years now. Most of the maintenance is choosing which new parsers to include. Old parsers don't seem to have too many problems (see the GitHub issues), and bugs are typically just corner cases that can be quickly addressed along with added tests. In fact, there is a plugin architecture that lets users apply a quick fix themselves instead of waiting for the next release. In practice it has worked out pretty well.

Most of the commands are pretty old and do not change anymore. Many parsers are not even commands but standard filetypes (YAML, CSV, XML, INI, X509 certs, JWT, etc.) and string types (IP addresses, URLs, email addresses, datetimes, etc.) which don't change or use standard libraries to parse.

Additionally, I get a lot of support from the community. Many new parsers are written and maintained by others, which spreads the load and accelerates development.


Also, `jc` automatically selects the correct /proc file parser, so you can just run `jc /proc/meminfo` or `cat /proc/meminfo | jc --proc` without specifying the actual proc parser (though you can do that if you want).

Disclaimer: I'm the author of `jc`.


You can do it like this with Jello (I am the author):

    jello '[e.commit for e in _ if e.commit.author == "Tom Hudson"]'

Jello lets you use Python syntax with dot notation without the stdin/stdout/json.loads boilerplate.

https://github.com/kellyjonbrazil/jello
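
For comparison, a rough sketch of the plain-Python boilerplate that the one-liner replaces. The input shape (a list of objects whose `commit.author` is a plain string) is inferred from the query above and is an assumption; the function names are made up for illustration.

```python
import json
import sys


def commits_by_author(raw_json: str, author: str) -> list:
    """Return the "commit" object of every entry whose commit author matches."""
    entries = json.loads(raw_json)
    return [e["commit"] for e in entries if e["commit"]["author"] == author]


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read JSON from stdin, filter it, and write JSON to stdout."""
    json.dump(commits_by_author(stdin.read(), "Tom Hudson"), stdout)
```

Calling `main()` at the bottom of a script turns this into a pipeline filter; the dictionary indexing stands in for the dot notation that Jello provides.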


jc[0] supports proc files. Converts them to JSON or YAML. (I am the author)

[0] https://kellyjonbrazil.github.io/jc/docs/parsers/proc


Neat! Your parser [1] almost has a similar issue because a comm could contain parentheses, e.g., `foo) R 123 456`. But since a comm is limited to 64 bytes, I don't think it is possible to fit a fully matching string inside of the comm before the closing paren after the comm, which would thus make your regexp fail to match.

[1] https://github.com/kellyjonbrazil/jc/blob/master/jc/parsers/...


I just had a quick read of the pid/stat parser, and the regex pattern starts with ^, but there's no $. Doesn't this mean that this parser suffers exactly the bug of the original post?


See https://news.ycombinator.com/item?id=34097179. A comm is limited to 64 bytes, so I don't think it is possible to fit a long enough comm to match the full regexp.


Right, it's not a security problem on its own, but it can make the regex not match at all, causing jc to return an error. So jc suffers from the parsing bug mentioned in the post.

[edit:] In order to get jc to return an error one has to actually read the regex. Here is a file name that gets it to return an error:

    bad) S 1 2 3 4 5


It doesn't look like jc suffers from this bug since the regex match is greedy:

    % echo '2001 (my (file) with) S 1888 2001 1888 34816 2001 4202496 428 0 0 0 0 0 0 0 20 0 1 0 75513 115900416 297 18446744073709551615 4194304 5100612 140737020052256 140737020050904 140096699233308 0 65536 4 65538 18446744072034584486 0 0 17 0 0 0 0 0 0 7200240 7236240 35389440 140737020057179 140737020057223 140737020057223 140737020059606 0' | jc --proc

    {"pid":2001,"comm":"my (file) with","state":"S","ppid":1888,"pgrp":2001,"session":1888,"tty_nr":34816,"tpg_id":2001,"flags":4202496,"minflt":428,"cminflt":0,"majflt":0,"cmajflt":0,"utime":0,"stime":0,"cutime":0,"cstime":0,"priority":20,"nice":0,"num_threads":1,"itrealvalue":0,"starttime":75513,"vsize":115900416,"rss":297,"rsslim":18446744073709551615,"startcode":4194304,"endcode":5100612,"startstack":140737020052256,"kstkeep":140737020050904,"kstkeip":140096699233308,"signal":0,"blocked":65536,"sigignore":4,"sigcatch":65538,"wchan":18446744072034584486,"nswap":0,"cnswap":0,"exit_signal":17,"processor":0,"rt_priority":0,"policy":0,"delayacct_blkio_ticks":0,"guest_time":0,"cguest_time":0,"start_data":7200240,"end_data":7236240,"start_brk":35389440,"arg_start":140737020057179,"arg_end":140737020057223,"env_start":140737020057223,"env_end":140737020059606,"exit_code":0,"state_pretty":"Sleeping in an interruptible wait"}

Edit: it looks like I can tighten up the signature-matching regex for the "magic" syntax per the issue found above. The greedy regex matching in the parser does seem to work fine, though.
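
Why greedy matching helps can be sketched in Python. The patterns below are simplified illustrations, not the regex `jc` actually uses. Because `.+` is greedy, the comm group extends to the last `) ` that still lets the rest of the pattern match, so a `) S ` sequence inside the comm does not end it early. A non-greedy `.+?` stops at the first candidate.

```python
import re

# Simplified, illustrative patterns; jc's real pattern is not reproduced here.
GREEDY = re.compile(
    r"^(?P<pid>\d+) \((?P<comm>.+)\) (?P<state>\S) (?P<rest>.*)$", re.DOTALL
)
LAZY = re.compile(
    r"^(?P<pid>\d+) \((?P<comm>.+?)\) (?P<state>\S) (?P<rest>.*)$", re.DOTALL
)

# Shortened version of the stat line discussed above; the comm is "bad) S 1 2 3 4 5".
LINE = "2001 (bad) S 1 2 3 4 5) S 1888 2001 1888"


def comm_of(pattern: re.Pattern, line: str) -> str:
    """Return the comm captured by the given pattern, or raise if it does not match."""
    match = pattern.match(line)
    if match is None:
        raise ValueError("line does not look like a /proc/<pid>/stat record")
    return match.group("comm")


if __name__ == "__main__":
    print(repr(comm_of(GREEDY, LINE)))  # full comm, including the embedded ") S "
    print(repr(comm_of(LAZY, LINE)))    # truncated at the first ") "
```

This relies on the fields after the comm never containing `)`, which holds for the state character and numeric fields shown in the examples.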


Interesting - the proc-pid-parser actually parses that file name just fine:

    $ echo '2001 (bad) S 1 2 3 4 5) S 1888 2001 1888 34816 2001 4202496 428 0 0 0 0 0 0 0 20 0 1 0 75513 115900416 297 18446744073709551615 4194304 5100612 140737020052256 140737020050904 140096699233308 0 65536 4 65538 18446744072034584486 0 0 17 0 0 0 0 0 0 7200240 7236240 35389440 140737020057179 140737020057223 140737020057223 140737020059606 0' | jc --proc-pid-stat

    {"pid":2001,"comm":"bad) S 1 2 3 4 5","state":"S","ppid":1888,"pgrp":2001,"session":1888,"tty_nr":34816,"tpg_id":2001,"flags":4202496,"minflt":428,"cminflt":0,"majflt":0,"cmajflt":0,"utime":0,"stime":0,"cutime":0,"cstime":0,"priority":20,"nice":0,"num_threads":1,"itrealvalue":0,"starttime":75513,"vsize":115900416,"rss":297,"rsslim":18446744073709551615,"startcode":4194304,"endcode":5100612,"startstack":140737020052256,"kstkeep":140737020050904,"kstkeip":140096699233308,"signal":0,"blocked":65536,"sigignore":4,"sigcatch":65538,"wchan":18446744072034584486,"nswap":0,"cnswap":0,"exit_signal":17,"processor":0,"rt_priority":0,"policy":0,"delayacct_blkio_ticks":0,"guest_time":0,"cguest_time":0,"start_data":7200240,"end_data":7236240,"start_brk":35389440,"arg_start":140737020057179,"arg_end":140737020057223,"env_start":140737020057223,"env_end":140737020059606,"exit_code":0,"state_pretty":"Sleeping in an interruptible wait"}

But the "magic" signature doesn't recognize it:

    $ echo '2001 (bad) S 1 2 3 4 5) S 1888 2001 1888 34816 2001 4202496 428 0 0 0 0 0 0 0 20 0 1 0 75513 115900416 297 18446744073709551615 4194304 5100612 140737020052256 140737020050904 140096699233308 0 65536 4 65538 18446744072034584486 0 0 17 0 0 0 0 0 0 7200240 7236240 35389440 140737020057179 140737020057223 140737020057223 140737020059606 0' | jc --proc             
    jc:  Error - Parser issue with proc:
                 ParseError: Proc file could not be identified.
                 ...

I can fix the "magic" signature (regex) to account for such cases.


Fortunately `jc`[0] does parse `/proc/<pid>/stat` correctly. I, of course, originally implemented it the naive/incorrect way until a contributor fixed it. :)

    $ cat /proc/2001/stat | jc --proc
    {"pid":2001,"comm":"my program with\nsp","state":"S","ppid":1888,"pgrp":2001,"session":1888,"tty_nr":34816,"tpg_id":2001,"flags":4202496,"minflt":428,"cminflt":0,"majflt":0,"cmajflt":0,"utime":0,"stime":0,"cutime":0,"cstime":0,"priority":20,"nice":0,"num_threads":1,"itrealvalue":0,"starttime":75513,"vsize":115900416,"rss":297,"rsslim":18446744073709551615,"startcode":4194304,"endcode":5100612,"startstack":140737020052256,"kstkeep":140737020050904,"kstkeip":140096699233308,"signal":0,"blocked":65536,"sigignore":4,"sigcatch":65538,"wchan":18446744072034584486,"nswap":0,"cnswap":0,"exit_signal":17,"processor":0,"rt_priority":0,"policy":0,"delayacct_blkio_ticks":0,"guest_time":0,"cguest_time":0,"start_data":7200240,"end_data":7236240,"start_brk":35389440,"arg_start":140737020057179,"arg_end":140737020057223,"env_start":140737020057223,"env_end":140737020059606,"exit_code":0,"state_pretty":"Sleeping in an interruptible wait"}

[0] https://kellyjonbrazil.github.io/jc/docs/parsers/proc_pid_st...
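
To illustrate the difference between the naive and the robust approach, here is a regex-free sketch in Python. It is not `jc`'s implementation; the function names are hypothetical, only the first few fields are extracted, and the sample line is a shortened version of the output above. The robust variant takes the comm as everything between the first `(` and the last `)`, on the assumption that no later field contains a `)`.

```python
def parse_stat_naive(line: str) -> dict:
    """Naive approach: split on whitespace and assume comm is a single token."""
    fields = line.split()
    return {"pid": int(fields[0]), "comm": fields[1].strip("()"), "state": fields[2]}


def parse_stat(line: str) -> dict:
    """Locate comm between the first '(' and the last ')' before splitting the rest."""
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    rest = line[close_idx + 1:].split()
    return {
        "pid": int(line[:open_idx]),
        "comm": line[open_idx + 1:close_idx],
        "state": rest[0],
        "ppid": int(rest[1]),
    }


# Shortened sample: the comm contains spaces and a newline.
SAMPLE = "2001 (my program with\nsp) S 1888 2001 1888"

if __name__ == "__main__":
    print(parse_stat_naive(SAMPLE))  # comm and state are both wrong
    print(parse_stat(SAMPLE))
```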


Both ways work:

    $ jc ifconfig lo0

    $ ifconfig lo0 | jc --ifconfig


Good news - `jc` supports JSONLines output for many commands and file-types, too!
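
JSON Lines puts one JSON document per line, so a consumer can process records one at a time instead of loading a whole array. A minimal Python sketch of such a consumer follows; the sample records are made up and are not actual `jc` output.

```python
import json
from typing import Iterable, Iterator


def read_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up records, one JSON object per line.
SAMPLE = '{"name": "lo0", "mtu": 16384}\n{"name": "en0", "mtu": 1500}\n'

if __name__ == "__main__":
    for record in read_json_lines(SAMPLE.splitlines()):
        print(record["name"], record["mtu"])
```

Passing `sys.stdin` instead of `SAMPLE.splitlines()` lets the same function sit at the end of a shell pipeline.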


Also note that you are looking at plaintext output here. By default `jc` and other JSON filtering tools do syntax highlighting when outputting to the terminal so it's actually quite easy to read JSON these days.

