To cut 2 bytes off (while making it much less elegant), huffman coding could be used to store pawns in 3 bits - to guarantee a space reduction, this relies on the fact that if there are n promoted pawns, there must have been at least ceil(n/3) captures, freeing up space for the longer representation of promoted pawns.
It is necessary to find another way to represent en-passant opportunities (swap an en-passantable pawn with the piece in the 1st or last rank as mentioned https://news.ycombinator.com/item?id=37526523 ).
2 rook representations aren't needed (encode a king with castling opportunities as a knight, bishop or rook, and when decoding, if there are no kings of a given color, look at the piece in the king's starting position) so another 4 bits could be saved by making rook, bishop or knight representations shorter.
A Huffman code like this can also represent knights (or any single other piece) in 3 bits.
000 - black pawn
001 - black knight
0100 - black bishop
0101 - black rook
0110 - black king
0111 - black queen
100 - white pawn
101 - white knight
1100 - white bishop
1101 - white rook
1110 - white king
1111 - white queen
Going with "a pawn in the first row is actually en-passantable, and should be swapped with the piece in the appropriate position if it exists", and "kings are represented as knights when they can castle in either direction, bishops when they can only castle kingside, and rooks when they can only castle queenside", that gets the initial representation down to 64 + 16 * 3 (pawns) + 6 * 3 (knights and kings) + 10 * 4 = 170 bits, with a worst case of 172 bits (when kings lose some castling rights), if I understand it correctly.
Huffman encoding for 21.5 bytes seems to win the day.
> To cut 2 bytes off (while making it much less elegant), huffman coding could be used to store pawns in 3 bits - to guarantee a space reduction, this relies on the fact that if there are n promoted pawns, there must have been at least ceil(n/3) captures, freeing up space for the longer representation of promoted pawns.
Can you explain this in more detail? Curious as to how you could save space with this.
You read the piece data bit by bit. If you see 100 or 000 then you stop right there. You have a pawn, and the next piece starts with the next bit. There's no ambiguity.
So before any captures are made, you have 32 pieces, half of which use 3 bits and half of which use 4 bits. 14 bytes, plus the 8 bytes storing the bitboard.
When you promote a piece it goes up in size from 3 to 4 bits, but you can guarantee there have been enough captures to offset that, so you never need more space than you started with.
I see - so you can use the bits you saved by not having separate rook pieces to denote En Passant pawns.
I like the "no kings" idea for castleable kings - though I think with smaller pawn sizes, that gives another opportunity for compression:
- king that can't castle: king
- king that can castle queenside: any other piece in king's position, no king of that color on board
- king that can castle kingside: black pawn on 1st or 8th rank
- king that can castle: white pawn on 1st or 8th rank
Because you are often going to have positions where the king can castle kingside and positions where the king can castle either way, this should maximize how often you manage to save space w/ the pawns.
Another thought I had (which might contain other problems, not sure yet) is to use pawns on the 1st or 8th rank to denote pieces which are in their starting position - the decompression algorithm can then derive what piece it is based on the known starting position. Once we start having data saving because of pawns taking less bits, we want to be able to use them as much as possible to save space.
> I see - so you can use the bits you saved by not having separate rook pieces to denote En Passant pawns.
You could do it that way, but the particular comment was suggesting two entirely separate ways to save bits, one for pawns and one for rooks.
Specifically, the suggestion in that post is to use 1st/8th rank to show en passant, and to show castling by replacing an unmoved king with a different piece. But there's lots of ways to cleverly encode that information.
For what piece to replace the unmoved king with, you can actually use both white AND black pieces, since the starting rank can decode what color the king is:
- White Knight on E1 and no king: Replace with white king that can castle both ways, E4 pawn can be taken En Passant
- White Bishop on E1 and no king: Replace with white king that can castle kingside, E4 pawn can be taken En Passant
- White Rook on E1 and no king: Replace with white king that can castle queenside, E4 pawn can be taken En Passant
- Black Knight on E1 and no king: Replace with white king that can castle both ways, E4 pawn cannot be taken En Passant
- Black Bishop on E1 and no king: Replace with white king that can castle kingside, E4 pawn cannot be taken En Passant
- Black Rook on E1 and no king: Replace with white king that can castle queenside, E4 pawn cannot be taken En Passant
It's the same for E8, just put a black king and not a white king. Sorry if this is repetitive, I could have probably just explained it and let you figure it out, but I put the cases down here for clarity. This should also be combined with the Knight huffman encoding that was talked about here https://news.ycombinator.com/item?id=37527350
I'm actually realizing - you can simply have a case where if a white/black pawn is on the 4th/5th rank, have the next bit be a flag for en passant. This makes all en passant rules stored in 8 bits at max. Since only 1 pawn can be in a status of "en passantable" at a time, we can make it stop assigning a flag bit for future pawns if the answer is ever "yes".
You can have only one en passant situation on the board at once. This means you can use 3 bits to encode the column for en passant, you check the board to evaluate the legality of the solution and discard if it isn't legal (this means you don't have to use a bit to encode whether it's possible as the encoder is guaranteed to be able to pick a column that it's not possible.)
I think we are both wrong. En passant only exists for one move, whatever has happened from captures is irrelevant. However, now I realize you can have two possible en passant captures at once--a pawn moves to 4th rank between two opposing pawns on the 4th rank.
Yes for each color. That means two pawns for each column can be in position to be captured en passant. And also two pawns can be in capturing position.
Edit: Column combined with whose turn it is will work, but not just column.
8 flag bits is a lot though when we're trying to shave down 20-ish bytes! And we're only worried about the worst case scenario, so assume it's the last pawn in the list. Also you can have at least 10 pawns visibly at risk of en passant at once. If you base it on rank alone, you can have all 16 pawns in position at the same time, wasting 16 bits.
If you're going to explicitly store it, at least squeeze down to 4 bits to pick a specific pawn (and picking one visibly not at risk if no pawn is at risk).
But using clever piece rearrangements is a lot better than spending flag bits.
It is necessary to find another way to represent en-passant opportunities (swap an en-passantable pawn with the piece in the 1st or last rank as mentioned https://news.ycombinator.com/item?id=37526523 ).
2 rook representations aren't needed (encode a king with castling opportunities as a knight, bishop or rook, and when decoding, if there are no kings of a given color, look at the piece in the king's starting position) so another 4 bits could be saved by making rook, bishop or knight representations shorter.