If you have so much control over the JSON and performance is a big deal, then there's a big chance you can get rid of the JSON in favor of a more performant format.
That isn't true though. It's the lingua franca of data transfer. Also, why are we giving away money because of this view? We are stuck with JSON for better or worse.
I've benchmarked parsers on the same real task: parse, map, and reduce 100MB of doubles encoded as JSON. A decent library does it in well under half a second, using very little memory beyond the size of the result and the memory mapping of the JSON document. Many common libraries take multiple seconds while using hundreds of megabytes or gigabytes of memory. That means one has to pay for bigger machines with more uptime per task. That is giving money away.
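To make the task concrete, here is a minimal sketch of the parse/map/reduce workload described above, using only Python's stdlib `json` module; the array is scaled down from 100MB for illustration, and the values and multiplier are made up:

```python
import json
import time

# Hypothetical stand-in for the 100MB-of-doubles benchmark: a JSON array
# of floats, parsed, mapped, and reduced to a single number.
data = json.dumps([i * 0.5 for i in range(100_000)])

start = time.perf_counter()
values = json.loads(data)              # parse
total = sum(v * 2.0 for v in values)   # map + reduce
elapsed = time.perf_counter() - start
print(f"parsed and reduced in {elapsed:.4f}s, sum={total}")
```

The spread between implementations shows up exactly here: a fast parser spends its time in the reduce, a slow one spends it building intermediate objects during the parse.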
If you want performance and you control both ends to the point that you don't want to validate the format, why are you using a lingua franca and not a specific solution? You could swap it out for bson or msgpack.
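As a rough sketch of what swapping to a binary encoding buys you, the stdlib `struct` module can stand in for formats like msgpack or bson (the real formats add framing and type tags, so this understates their size slightly):

```python
import json
import struct

# Illustrative comparison: the same doubles as JSON text vs. packed
# little-endian IEEE 754 binary. Values are made up for the example.
doubles = [3.141592653589793] * 1000

json_bytes = json.dumps(doubles).encode("utf-8")
binary_bytes = struct.pack(f"<{len(doubles)}d", *doubles)

# Binary is a fixed 8 bytes per double; JSON is ~18 bytes here, plus
# the cost of formatting and re-parsing decimal text on each side.
print(len(json_bytes), len(binary_bytes))
```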
Fortunately, this is really only true at the application level. Space/line delimited byte strings and binary formats are used frequently at lower levels.
Interesting data sources and even RPC for remote things are in JSON though. But, I am only arguing that it shouldn't be slow because JSON is "slow", make the best of what we have.
That sounds like a very bad use case for JSON. I would be surprised if your program wasn't more efficient with an ad-hoc binary format for that piece of data.
It's really hard to get away from getting data in JSON these days. It's ubiquitous.
It was a bit contrived, true, but some libraries do it in 0.1s while many are in the 1.5-2s range, and that was parse time alone, not loading the data or startup. If I went binary it would probably be something like protobuf, not ad-hoc. Ad-hoc formats have issues with maintenance, interop, and tooling.
> It's really hard to get away from getting data in JSON these days. It's ubiquitous.
Yes, I agree with that. It's a neat format if you have small pieces of information to move around, and it's very easy for humans to read, but for large enough data, wouldn't it turn into a bottleneck?
> If I would go binary it would probably be something like protobuf, not ad-hoc. Ad-hoc has issues with maintenance, interop, and tooling.
Indeed, there are always these dimensions to take into consideration, as well as evolution. The main issue is finding a library/format that is well supported across all sorts of languages, and JSON has that. I don't know if there are many binary formats with the same level of support.
I suggested ad-hoc because the format seemed simple enough to be mmapped directly, or something equivalent (not sure how scripting languages would do in that case).
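A minimal sketch of that mmap idea, assuming the data really is just a flat array of doubles (file name and values here are made up for the example):

```python
import mmap
import os
import struct
import tempfile

# Write doubles as raw little-endian IEEE 754, then read them back
# through a memory map: no tokenizer, no per-element allocation.
values = [1.0, 2.5, -3.75]

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack(f"<{len(values)}d", *values))

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # "Parsing" is just reinterpreting the mapped bytes.
        readback = list(struct.unpack(f"<{len(values)}d", mm[:]))

os.remove(path)
print(readback)
```

In C or Rust you could cast the mapped region to a `double` array directly; scripting languages still need an unpack step like the one above, which is part of why the win is smaller there.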
The lingua franca of data transfer to this day is the unsigned byte array, which is understood by every major programming language.
JSON is just a higher abstraction layer, which might introduce errors.