Yeah, this is pretty much it. The author complains about CSVs being "notoriously inconsistent" as though switching to some other format would magically change that. They're only inconsistent because sometimes lazy programmers do ",".join(mylist) instead of using an RFC4180 compliant CSV writer. Lazy programmers will just use non-compliant methods of creating whatever magic format OP is dreaming about. Case in point: trailing commas in JSON objects, and other ridiculous things that people have come up with such as encoding a date in JSON like this: "\/Date(628318530718)\/" https://docs.microsoft.com/en-us/previous-versions/dotnet/ar...
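For what it's worth, the fix is one line, which is what makes the laziness so frustrating. Rough Python sketch (the row contents are just made-up examples) showing the naive join next to the stdlib writer:

    import csv, io

    row = ["plain", "has,comma", 'has "quotes"', "multi\nline"]

    # Naive join: embedded commas, quotes and newlines silently corrupt the output
    naive = ",".join(row)

    # RFC4180-aware writer quotes and escapes those fields as needed
    buf = io.StringIO()
    csv.writer(buf).writerow(row)

    print(naive)
    print(buf.getvalue())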
CSVs also are great because you can parse them one row at a time. This makes for a very scalable and memory-efficient way of processing very large files containing millions of rows.
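In Python that one-row-at-a-time loop is about as short as it gets; a sketch, with the filename and the per-row work obviously placeholders:

    import csv

    count = 0
    # csv.reader streams: only the current row is held in memory
    with open("huge.csv", newline="") as f:
        for row in csv.reader(f):
            count += 1  # replace with whatever per-row processing you need
    print(count, "rows")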
Let there be no mistake: Everyone reading this today will retire long before CSVs retire. And that's just fine by me.
>CSVs also are great because you can parse them one row at a time. This makes for a very scalable and memory-efficient way of processing very large files containing millions of rows.
Even RFC4180-compliant CSVs can be incredibly memory-inefficient to parse. If you encounter a quoted field, you must scan ahead to the next unescaped quote to discover how large the field is, since every newline you encounter along the way is part of the field contents. Field sizes (and therefore row sizes) are unbounded and much harder to determine than simply looking for newlines; if you naively treated CSV as a "memory-efficient" format to parse, you would create a parser that is easy to blow up with a trivially crafted large file.
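A quick Python sketch of the failure mode (the field size here is arbitrary): one legally quoted field can drag an unbounded amount of data into a single "row". Python's csv module even caps fields at 131072 characters by default for this reason; lift the cap, as people often do for big files, and memory use is whatever the file's author wants it to be:

    import csv, io, sys

    # Lift the default per-field size limit (131072 characters)
    csv.field_size_limit(sys.maxsize)

    # One quoted field containing a million embedded newlines is still one row
    evil = '"' + ("x\n" * 1_000_000) + '",second_field\r\n'

    rows = list(csv.reader(io.StringIO(evil)))
    print(len(rows))        # 1 row
    print(len(rows[0][0]))  # 2,000,000 characters, all resident at once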
Great practical note by Josh Berkus on why Uber left PostgreSQL. Basically: runaway table bloat, because Uber had a use case that Postgres doesn't address as well as InnoDB.
The whole VACUUM paradigm is the biggest thing that bugs me about pgsql. The fact that it can actually freeze things always worries me. Can’t this happen constantly in the background like modern GCs?
> I think just a mere existence of flu or cold was a mistake. We should have eradicated those years ago.
It's not clear to me how eradicating these would have ever been possible in the past, or will be in the foreseeable future. The flu has (probably) been around since at least 6000 BC, and numerous strains can be spread by birds & many other species.
No, a significant part of it is that immunity is temporary. The article describes a particular cold virus where immunity lasts about 40 weeks, hence it resurges every winter.
To be fair, Python isn't the only language whose package management system is all but incoherent to folks who don't use Python every day (and sometimes even to them!). npm is pretty rough to get set up too, and you run into a lot of issues similar to this.
I don’t think Node has trouble from the very first installation. A brew install will set you up with the latest version of both node and npm and they will work.
At most you’ll have trouble running the right version (rare nowadays, unlike in pre-v1 days).
> What are the tradeoffs vs using docker? Just curious.
Probably some combination of memory usage and complexity, depending on your application. If you're already familiar with using docker as a development environment, definitely go for it.
I don't use pipenv, I'm still using plain old virtualenv for development. Mostly it's just a matter of familiarity. If there's not an itch, why scratch?
Certainly not, but considering there should be 100M available by the end of the year, it seems like they are scaling up the production line considerably.
Honestly I bet it's a learning algorithm that they used to identify bots, rather than something a human decided.
Spam detection algorithms usually have a training set. One of the features could have been "uses_feature_x", which, according to the training set, strongly correlates with being a spam bot (because humans rarely use those features).
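Roughly what that ends up looking like, as a toy Python sketch (the weights, the bias, and every feature name other than uses_feature_x are invented): a feature that is rare among humans but common among labeled bots gets a large learned weight, so using it pushes an account over the threshold almost by itself.

    import math

    def spam_probability(features, weights, bias=-3.0):
        # Logistic-regression-style scoring: sum the learned weights, squash to [0, 1]
        score = bias + sum(weights.get(f, 0.0) for f in features)
        return 1 / (1 + math.exp(-score))

    # Invented weights; "uses_feature_x" was rare among humans in the training set
    weights = {"uses_feature_x": 4.5, "new_account": 1.0, "posts_links": 0.8}

    print(spam_probability({"posts_links"}, weights))                    # ~0.10, looks human
    print(spam_probability({"uses_feature_x", "new_account"}, weights))  # ~0.92, flagged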