
You need a compact encoding for chess engines that explore billions of states as fast as possible to plan the best move.


Well, this is arguably a kind of compression, right? So you'd be trading CPU time for fewer bytes? Is that a desirable tradeoff at chess engine scales?


Bit packing/mapping etc. isn't compression the way you're thinking. What it is, is concise. It requires that the program know what each bit means, rather than telling the program what a value means (as a JSON structure might). So it shifts where meaning is assigned strictly to the program: the convention has to be encoded in the program rather than figured out at runtime (though technically you could turn that back into a form of config if you wanted; it just wouldn't be jumbled up with your data). But it doesn't really compress the data itself. It's just efficient at representing it.

[edit] shorter version of the above: it stores the values, but doesn’t store what they mean.
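
For instance (a minimal C sketch; the field widths and layout here are just an illustrative convention, not any particular engine's format):

    /* JSON spells out the meaning next to every value:
     *   {"piece": "knight", "color": "white", "square": "g1"}
     * A packed encoding stores only the values; the program has to
     * know the convention below for the bits to mean anything. */
    #include <stdint.h>

    /* bits 0-5: square (0-63), bit 6: color, bits 7-9: piece type */
    typedef uint16_t packed_piece;

    static packed_piece pack(unsigned square, unsigned color, unsigned type) {
        return (packed_piece)(square | (color << 6) | (type << 7));
    }

    static unsigned square_of(packed_piece p) { return p & 0x3F; }
    static unsigned color_of(packed_piece p)  { return (p >> 6) & 1; }
    static unsigned type_of(packed_piece p)   { return (p >> 7) & 7; }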


It's not compression in the normal sense of the word. Most of the parsing maps directly to data: e.g. you know the square of some piece is the next 6 bits. In languages that allow it you can cast directly from the next bit offset to e.g. a byte. This is going to be dramatically faster than parsing much more loosely structured JSON. As database sizes increase the looser format also gets worse performance, so it's a double hit. So with this sort of representation you get orders of magnitude faster and smaller. Sometimes there really is a free lunch!
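
Roughly like this (a hypothetical C sketch; the 6-bit field and the LSB-first, little-endian packing are assumptions, not any specific database's format):

    #include <stdint.h>
    #include <string.h>

    /* Pull a 6-bit square field out of a packed buffer at an arbitrary
     * bit offset: one load, one shift, one mask -- no tokenizing, no
     * allocation. Assumes LSB-first packing on a little-endian machine
     * and at least one spare byte at the end of the buffer. */
    static unsigned read_square(const uint8_t *buf, size_t bit_offset) {
        uint16_t word;
        memcpy(&word, buf + bit_offset / 8, sizeof word);
        return (word >> (bit_offset % 8)) & 0x3F;
    }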

Also I'd add the sizes involved here are kind of insane. I wrote a database system that was using a substantially better compression scheme that averaged out to ~19 bytes per position IIRC. And I was still getting on the order of 15 gigabytes of data per million games. Ideally you want to support at least 10 million games for a modern chess database, and 150 gigabytes is already getting kind of insane - especially considering you probably want it on an SSD. But if that was JSON, you'd be looking at terabytes of data, which is just completely unacceptable.
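
For a sense of how a position fits in a few dozen bytes, here's one common style of compact layout (a rough sketch, not the actual schema behind the numbers above):

    #include <stdint.h>

    /* 8 bytes : occupancy bitboard, one bit per square
     * 16 bytes: a 4-bit piece code for each of up to 32 occupied squares
     *  2 bytes: side to move, castling rights, en passant target
     * = 26 bytes per position before any further entropy coding.
     * (The in-memory struct gets padded; on disk you'd write the
     *  26 bytes back-to-back.) */
    typedef struct {
        uint64_t occupied;    /* which squares hold a piece */
        uint8_t  pieces[16];  /* two 4-bit piece codes per byte */
        uint16_t state;       /* side to move + castling + en passant */
    } compact_position;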


To give you an example, the Syzygy tablebase for all endgame positions with 7 pieces remaining is 18.4 TB. The estimated size for 8 pieces is 2 PB.

Different applications have different needs: if you want to host a website with real-world tournament results involving only humans, you can probably get away with using more bytes. But if you're writing an engine that uses pre-computed positions, you want to be as compact as possible.

https://en.wikipedia.org/wiki/Endgame_tablebase#Computer_che...

I did laugh a bit at this, because "conventional server" and "64 TB RAM" is hilarious to think about in 2023, but will probably be the base config of a Raspberry Pi in 2035 or so:

> In 2020, Ronald de Man estimated that 8-man tablebases would be economically feasible within 5–10 years, as just 2 PB of disk space would store them in Syzygy format, and they could be generated using existing code on a conventional server with 64 TB of RAM


CPUs excel at decompression; more than one engineer has remarked to me that the fastest way to populate a cache is to decompress data in-line.


Is your assertion that it takes more time for a CPU to read values out of a 30 byte struct and do a couple shifts and branches than to parse a JSON representation?


The JSON needs to be parsed only once. Then it is (or can be) whatever object you like.

JSON is for storing data as text, not for working with that text all the time.


It's not just reading; you need to process the data to get castling availability, the en passant target, etc.
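
e.g. if that extra state lives in a packed 16-bit word, decoding it is a couple of shifts and masks (a hypothetical layout, just for illustration):

    #include <stdint.h>

    /* Hypothetical 16-bit state word:
     * bit 0     : side to move (0 = white)
     * bits 1-4  : castling rights, one bit each for K, Q, k, q
     * bits 5-10 : en passant target square (0-63)
     * bit 11    : en passant available
     * bits 12-15: unused */
    static int white_to_move(uint16_t s) { return (s & 1) == 0; }
    static int can_castle(uint16_t s, int right) { return (s >> (1 + right)) & 1; } /* 0=K 1=Q 2=k 3=q */
    static int ep_square(uint16_t s) { return ((s >> 11) & 1) ? (int)((s >> 5) & 0x3F) : -1; }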



