
I find it surprising that a fairly straightforward parser runs "30% faster" than simdjson (full disclosure: I wrote the original simdjson code) while parsing things byte by byte with copious conditional branches. This looks exactly like the approach used in most JSON parsers, which generally were beaten by simdjson by an order of magnitude or more.

"Informal tests", especially ones with zero description of what was being tested or even ballpark numbers for how fast this system supposedly runs, may not entirely address this point.

We wrote a paper and anyone curious about our methodology can download and run simdjson with a few clicks and verify our performance claims, which (I hope) are well-described in the paper and line up with the arguments we make in favour of the design.

Given that no indication of what was being measured is provided, it's kind of hard to tell what this performance claim even is, or how to verify it.



Are there benchmarks with simdjson getting the JSON reified into native data structures outside the parser?


No. This is an interesting problem but a non-goal for the C++ version of simdjson, which produces usable "native data structures" albeit not outside the parser (speaking informally - the structures aren't really "in the parser" but they are specific to simdjson if you see what I mean).

The reification problem is awkward but it's a different problem for each language. I did daydream of breaking some API boundaries and using SIMD to magic up some Python structures directly, but this would be incredibly brittle. Someone would probably use it but the chance of getting burned by this is... high.


We have started experimenting with how to do it with simdjson, though :) https://github.com/simdjson/simdjson/pull/947


Sorry, I meant the application's data structures and not the library's. And C++, yes; at least that is my exposure to simdjson.



