You can get this even lower, without the need for the extra promotion bits, if you make a note of a few things:
1. There is still redundancy in that there are multiple pieces of the same type, and if you permute their locations you get another representation of the same state.
2. You don't need to say that some pawn's location is the king if it's en passant. You can just use the back row, which pawns can't get to.
Using this, you can use the permutations of the pawns to store extra information. For instance, all of the pawns will be on the board at squares #a, #b, ... . Since pawns themselves have numerical indices, you could say that the "canonical" representation of the board state has all of the pawn locations sorted in ascending order. Then, different permutations of that can carry information about promotions and which of the "pawns" are really queens, etc.
OP here. Thanks for the suggestion! I really like the idea of there being redundancy for pieces of the same type. If I understand correctly, when a knight is in a particular square say it's not relevant whether it's the king-side or queen-side knight. I believe you still need to keep the ordering for the rooks due to castling, but for the knight and bishop you can store 1-bit each in the ordering. With this we end up with 7 bits for each side, thus below 26 bytes overall!
Let me ponder a bit more on the pawn locations. We do need the pawn ordering since the we've encoded the string representing the promotions in sorted order. In the case of en passant we know exactly where the pawn would be, and while we can use the back row, I haven't quite figured out how to use this to encode reliably.
> In the case of en passant we know exactly where the pawn would be, and while we can use the back row, I haven't quite figured out how to use this to encode reliably.
If a white pawn in x4 (for x between a and h) can be captured en passant, you encode its position as x1. When decoding, if you see a white pawn in rank 1, you know it can't be there, so you place it in x4 and flags it e.p. Black pawns are analogous.
What I meant to say was "I haven't quite figured out how to use this to encode reliably and get an improvement vs simply using own king's position".
If we can guarantee that the pawn stays in its own file then I see a path for improvement (by having the actual position vs back row usage as a 1-bit toggle), but this is not broadly the case due to movement across files on pawn captures.
My method doesn't require any additional bits for the e.p. and is compatible with all the other techniques listed in TFA.
Using own king's position loses the info of the file. Iff you guarantee that the pawns are listed in order, this is not a problem. But later in TFA a permutation or sorting of pawns is suggested, which would then mess things up.
Please note that a slightly modified version of my method, where a pawn that can be captured e.p. is swapped (not merely moved) with whatever is in its corresponding back rank before any other encoding takes place, is compatible with all the other clever techniques suggested in the comments.
OK I see what you mean, I admit this wasn't made completely clear in the post. For en passant using the own king's position, the file where the pawn can be captured en passant is NOT reordered whereas other pawns, captures and promotions are reordered.
Yes using the back row works too! I was trying to see if we can get an improvement on the 18 additional bits needed from the post (or 14 additional bits by taking advantage of knight and bishop ordering).
These discussions have been great, very much enjoying seeing the incremental improvements!
Edit: I did another pass, item 4 in the notes did mention this.
> [4] For en passant we need the pawn to remain on its home file. Hence we exclude the pawn from this step if it can be captured en passant. Captures can appear on any file.
1. There is still redundancy in that there are multiple pieces of the same type, and if you permute their locations you get another representation of the same state. 2. You don't need to say that some pawn's location is the king if it's en passant. You can just use the back row, which pawns can't get to.
Using this, you can use the permutations of the pawns to store extra information. For instance, all of the pawns will be on the board at squares #a, #b, ... . Since pawns themselves have numerical indices, you could say that the "canonical" representation of the board state has all of the pawn locations sorted in ascending order. Then, different permutations of that can carry information about promotions and which of the "pawns" are really queens, etc.