Sorry, this is not realistic in C. The moment you get the json data, you have to also get the length, or you could not allocate enough memory for it.

Also, the number of iterations is not the issue; the length of the input is. What happens if your JSON is a few KB, or is bigger than the L1/L2 cache, ...?



Have you tried running the tests with a pre-calculated length? When I made that change there was no noticeable difference. The iterations/msec went from 195 to 192 which is not statistically significant.

I have no doubt a more complete set of benchmarks could be made with JSON of various sizes, read from a file as well as from a string. This was put together quickly to get an answer to the questions raised in this post. Since it does seem to be a topic of interest, I'll expand and clean up the tests in the future, but for now it does give at least one data point.

You might notice that the simdjson::dom::parser is reused. Without that optimization, which allows the parser to warm up, the rate was only 160 iterations/msec.


Nope. You've got a code review from an interested random stranger on the net who spent 10s looking at your code and noticed that something suspect is going on.

Given your response, I spent an extra 10s reading the simdjson docs and noted that you violate the SIMDJSON_PADDING requirement. So either your code is a crash waiting to happen, or you use a very non-optimal code path in simdjson that requires it to re-copy all the data.

That's also the maximum amount of attention you'll get from this random stranger. My time is up ;-)


The code example I followed was in the simdjson basics.md. The error handling example there left off the realloc_if_needed argument, which defaults to true. I have updated the tests to set that third argument explicitly for clarity.

As for the path in simdjson being non-optimal and having to copy bytes, that should be expected if the string is to be modified. The buf argument type is const after all, so it should not be modified. In any case, glangdale's code is clean and solid, so I doubt his code is anything but optimized and the examples anything but correct.


You can thank the cat on my lap for some bonus attention. I'll try to answer some points in this and other threads of yours. All this from a simdjson non-expert with a reasonable amount of C experience.

Suppose there is a socket with JSON data. What you do is, at init, you create one or more buffers of, say, 64 KB plus the required padding. Then, when epoll or whatever says there is data, you call read on this buffer. This gives you a length.

At this point, you have a padded buffer and a length, so the requirements for simdjson with realloc_if_needed=false are met, and you can now parse at full speed. When done, reuse the buffer for the next epoll/read cycle.
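A minimal C sketch of that buffer setup, to make the idea concrete. The names padded_buf, PAYLOAD_CAPACITY and PARSER_PADDING are made up for illustration, and the value 64 is only a stand-in for whatever SIMDJSON_PADDING is in your simdjson version. The call into the parser itself is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAYLOAD_CAPACITY (64 * 1024) /* room for the JSON bytes */
#define PARSER_PADDING   64          /* assumption: stand-in for SIMDJSON_PADDING */

/* Hypothetical helper type: one reusable buffer, allocated once at init. */
typedef struct {
    char  *data; /* PAYLOAD_CAPACITY + PARSER_PADDING bytes */
    size_t len;  /* bytes filled by the last read */
} padded_buf;

/* Allocate payload room plus padding up front; returns 0 on success. */
static int padded_buf_init(padded_buf *b) {
    b->data = calloc(1, PAYLOAD_CAPACITY + PARSER_PADDING);
    b->len = 0;
    return b->data ? 0 : -1;
}

/* Read from fd straight into the buffer. read() never gets to touch
 * the padding because it is limited to PAYLOAD_CAPACITY bytes. */
static ssize_t padded_buf_fill(padded_buf *b, int fd) {
    ssize_t n = read(fd, b->data, PAYLOAD_CAPACITY);
    if (n >= 0) {
        b->len = (size_t)n;
        /* Optional: zero the bytes after the payload so the tail
         * holds predictable content instead of an older message. */
        memset(b->data + b->len, 0, PARSER_PADDING);
    }
    return n;
}

static void padded_buf_free(padded_buf *b) {
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

After a successful padded_buf_fill, b.data and b.len are the padded pointer and length you would hand to the parser, and the same buffer is filled again on the next epoll/read cycle.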

There are complications, of course. Data is not guaranteed to arrive all together in one read call. There might be an HTTP library feeding you. All of this amounts mostly to buffering and chunking. With care, these problems can be solved in a mostly zero-copy way, leaving the data ready for optimal simdjson use.
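One way the chunking could look in C, assuming newline-delimited JSON as the framing (that framing is my assumption, it is not from the benchmark). stream_buf and its functions are hypothetical names. The only copy is the memmove of leftover bytes after a message has been consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAPACITY 4096
#define PADDING  64 /* assumption: stand-in for SIMDJSON_PADDING */

/* Hypothetical accumulator for data that arrives in pieces. */
typedef struct {
    char   data[CAPACITY + PADDING];
    size_t used; /* bytes received and not yet consumed */
} stream_buf;

/* Where the next read() should write, and how much room is left. */
static char *stream_space(stream_buf *s, size_t *avail) {
    *avail = CAPACITY - s->used;
    return s->data + s->used;
}

/* Record that n bytes were written at the pointer from stream_space. */
static void stream_commit(stream_buf *s, size_t n) {
    s->used += n;
}

/* Length of the first complete message (without '\n'), or -1 if the
 * newline has not arrived yet. */
static long stream_peek(const stream_buf *s) {
    const char *nl = memchr(s->data, '\n', s->used);
    return nl ? (long)(nl - s->data) : -1;
}

/* Drop the first message and its newline, then slide any leftover
 * bytes (the start of the next message) to the front. */
static void stream_consume(stream_buf *s, size_t msg_len) {
    size_t drop = msg_len + 1;
    memmove(s->data, s->data + drop, s->used - drop);
    s->used -= drop;
}
```

The loop around it would be: read into stream_space, stream_commit the byte count, and while stream_peek returns a length, parse data[0..length) in place and then stream_consume it. The message always sits inside a buffer with PADDING spare bytes behind the payload area, so reading a little past its end stays inside the allocation.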

The example simdjson code you refer to is a kind of demo mode. It gets you off the ground quickly, but is far from optimal.

I assume simdjson does not write to the buffer. It just reads more than one byte at once, so if it reads, say, 8 bytes and there is only 1 left, it will read 7 bytes of random junk and discard them when it notices its unfounded optimism. However, it needs to be allowed to read these extra bytes without causing a SIGSEGV, hence the padding.
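To illustrate the over-read, here is a toy C scanner that always loads 8 bytes at a time and throws away whatever lies past the real length. This is not simdjson's code, just the same idea at small scale; count_quotes_padded and WORD are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORD 8 /* bytes loaded per step; real SIMD code uses wider loads */

/* Count '"' bytes in buf[0..len). The caller must guarantee at least
 * WORD - 1 readable bytes after buf + len, i.e. the padding. */
static size_t count_quotes_padded(const char *buf, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i += WORD) {
        unsigned char block[WORD];
        /* Always load a full block: on the last step this reads up
         * to WORD - 1 bytes past the end of the real data. */
        memcpy(block, buf + i, WORD);
        /* Only the first `valid` bytes belong to the input; the rest
         * is junk from the padding and is ignored. */
        size_t valid = (len - i < WORD) ? (len - i) : WORD;
        for (size_t j = 0; j < valid; j++)
            count += (block[j] == '"');
    }
    return count;
}
```

Without the extra bytes behind the data, the final memcpy could run off the end of the allocation. With them, the load is always safe and the junk only costs a few wasted comparisons.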

UPDATE: all of this is an educated guess; the simdjson authors are welcome to fix or fine-tune whatever I said.


Seeing the downvotes I assume something is wrong in here. Feel free to add a correction.


The issue is you've been pretty arrogant/rude over multiple comments, talking about your time like it's some kind of gift from the heavens.


Ok, sorry. Apologies to everyone involved, the author first of all. It wasn't meant like that, for what it's worth, but it clearly came out like that.



