It's seriously a very interesting and useful dataset that you can do a lot of fun stuff with. If you grab one of the ZIMs without pictures, it's a very manageable size too, just a few dozen gigabytes compressed, and there is reasonably good library support in many languages.
That last point doesn't go for Java. The only one I could find for that was this <https://github.com/openzim/libzim>; it's antique, extremely poorly optimized, and lacks support for newer compression schemes. I have fixed the performance and added support for zstd compression, but haven't published the code, as it's far from finished and major features in the original codebase are very broken. I'll get around to sharing the code some day, but right now it's basically permanently mid-surgery, as I've only patched it far enough to extract all or specific files. If anyone wants a copy of this code regardless of its state, give me a holler.
Russia has already written to the Wikimedia Foundation demanding that they take down Russian Wikipedia's well-sourced and factual article on the cough special operation. Wikimedia said "lol no," of course.
Looking at the stats, they are getting a lot of traffic from the tjournal.ru domain. It seems to host a Russian article explaining how to download Wikipedia, in expectation of it being blocked soon. You can see the HTTP referrers here: https://stats.kiwix.org/index.php?module=CoreHome&action=ind...
Curiously, that's the relationship between the first and second highest frequencies for the Zipfian distribution. However, third place and beyond are much smaller than they should be under that distribution.
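For reference, here is a minimal sketch of what a pure Zipfian distribution would predict, assuming the classic exponent of 1 (the count at rank k is the top count divided by k). The function name and the top count of 1000 are made up for illustration and are not taken from the actual stats.

```python
def zipf_expected(top_count, ranks):
    """Expected counts under a pure Zipf law with exponent 1.

    Assumption: the count at rank k equals top_count / k.
    """
    return [top_count / k for k in range(1, ranks + 1)]


# Hypothetical top count, chosen only to make the ratios easy to read.
expected = zipf_expected(1000, 5)
for rank, count in enumerate(expected, start=1):
    print(f"rank {rank}: {count:.1f}")
```

Under that assumption, second place is half of first, third place is a third of it, and so on, so real counts that fall well below these values from third place onward decay faster than Zipf would suggest.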