First the simpler version of this problem. After 50 full moves without a capture or pawn move, a draw MAY be claimed. After 75 moves, a draw MUST be claimed. This requires a count to be kept that may require up to 7 more bits.
The bigger problem is draw by repetition. If a position repeats exactly (same castling and en passant options) for a third time, then a draw MAY be claimed. If it repeats exactly for a fifth time, then a draw MUST be claimed. (Usually it is claimed on the third time, but you don't have to.) Applying this rule correctly requires not just knowing the current position, but what positions have occurred previously, and how often. Back to the last pawn move, capture, or change in potential castling status. This may require (per the first rule) knowing what up to 75 different past positions were.
The best way to store this history is almost certainly not as a list of positions, but as a history of moves. But, even if done efficiently, we will need more bytes for that history than we needed for the position.
It does not fully capture the history needed for determining future claims of draw by repetition. But by definition, the position fully captures the position.
The notion of position used by the FEN notation [1] includes the board diagram, side to move, castling rights, en-passant options, as well as the number of halfmoves since the last capture or pawn advance, and the total number of moves. The last one or last two are often ignored in everyday notions of position.
[1] https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...
If a game, you might also include timers or other state as well, including full position history.
Wow I don't know any online or offline platform which draws on five fold repetition. Didn't know that was a thing at all!
0: https://lichess.org/@/revoof/blog/adapting-nnue-pytorchs-bin...
Some systems just have to be observed in order for solutions to be optimally designed around how they actually behave, rather than how they theoretically behave.
Chess Position Ranking provides a 153 bit encoding but it's very slow to decode.
If the encoding only needs to work for the set of positions occurring in some database, then there's almost no limit to the number of coding optimizations one can make (until the encoding just becomes an index in the set of all unique db positions).
The encoding is just the index number of the game + move that resulted in that position.
Edit: Here are the Notes
0 Empty
10 Pawn
1100 Knight
1101 Rook
1110 Bishop
1111 Queen
32 + 32 + 472
2 times 6 bits: position of the kings
30 bits: color mask
120 + 2*6 + 30 = 162 bits
We can store the rest using the methods from the blog post and add 18 bits for promotion, giving 180 bits.
I'm sure this isn't the most efficient way, and I think I had other methods and considered things like the bishops being able to occupy 32 squares, though special casing doesn't make sense because of promotions.
Technically if you got 8 bishops/queens/knights/rooks You would occupy another 16 bits, giving 196 bits
I think the upper limit can be reduced at the cost of increasing the lower limit
EDIT2: I think I made the assumption at the time that to promote one piece you needed to capture at least one enemy pawn, giving the space for the two bits, which means the upper bound is actually 180 bits
Would love to see other people try in the comment section
Also if you're encoding the king as a position instead of a byte sequence you would have to encode their space as empty, that's an extra 2 bits
In the case of (b) there are now fewer pawns that can be promoted, and so worst case, we have to pay a budget of 1 bit per each of 8 promoted pawns.
So I think maximum required bits is only 162 + 8 = 170?
I'm trying to prove that is the worst case, but there are just too many cases. I guess I'll try to use a program o brute force it or just forget about it.
So then we need maximum 182 - 4 = 178 bits
EDIT: Equivalently, we could suffix each non-empty piece in the sequence with an associated color bit
So for each 4 pawn cluster, 1 pawn takes another pawn, and the net result is +1 bit once the captor promotes. The remaining 2 pawns in the cluster each need 2 extra bits when promoted => 2 x 2 = 4 bits. So 5 bits per 4-pawn cluster, of which there are 4.
So maximum representation would be 162 + (5 * 4) = 182 bits?
But if we store castling state, I think we are already trying to store the whole game state, and this is representable only by full history because of move repeating rules.
So, I think storing board state as a sequence of moves is more interesting. I would estimate that a number of possible actions on average is closer to 8 than to 16, so it would give as 3 bits for half-move and 6 bits for full move. 24-move game could be represented with 18 bytes, which is considerably lower than 26 bytes!
You can get close to average bit per move, if you reuse "spare" places for the next move. So, for instance, first move have 20 possibilities, which is representable by 5 bits, but you can reuse "spare" 32-20=12 possibilities as a bit for the next move.
This is a representation assuming you use only "move validator" thing that returns a list of possible moves. I think that if you use a chess engine that would output you a probability distribution of possible moves, you can compress noticeably better on average, but decoding would be slow.
[edit] This made me look for articles estimating this and I found this one [1] which confirms the above is in the right ballpark. Actual study (according to the article) says 4.822 x10^44 is their upper bounds
[1] https://chess-grandmaster.com/how-many-possible-chess-positi...
That and therefore doesn’t follow. As a counterexample, consider a NIM (https://en.wikipedia.org/wiki/Nim) game starting with a googolplex number of piles of size 1. That has way more board states than chess or go, but is easily solved, as the game is trivial.
But each position can only be used once, so you really only have 64*63*62*61*...*33 possibilities = 64!/32! = ~2^53, so you could encode this with less than 7 bytes, and then use the basic 12-byte encoding for captures, castling, en-passant, and promotions, and you are below 19 bytes in total. (Did I miscalculate this?)
Also pawns can never be on the 1st or 8th rank so you can subtract those possibilities from each of the numbers that represents a pawn.
You also don't care about the order of bishops, rooks, knights, and especially pawns - so you should be able to do even better.
Also castling can be encoded as extra squares available to the rooks only ("queen side never moved" and "king side never moved"), and that is almost free.
But without a good encoding for promotions, I doubt you can beat the encoding of the article.
https://duckduckgo.com/?q=log(64!%2F32!%2C+2)&t=ffab&ia=calc...
It misinterprets "log(64!/32!, 2)" as "log(((64!) / (32!)) .2, 10)" which seems absurd, why would you use the comma as both an argument separator and a decimal place??
(Why was I using DuckDuckGo as a calculator? I do in fact keep a Casio scientific calculator on my desk, but I recently bought a new one (a Casio fx-991cw) so that I wouldn't have to keep moving the first one between home and work, but the new one doesn't have an obvious factorial function and I gave up looking - it is much worse than my fx-991es in many other ways as well despite looking superficially newer and better, so I can not recommend the fx-991cw).
Then he introduces the other method (signify that castling is allowed by saying the rook on that side is on the same square as the king) and with that method he doesn't need any extra bits for castling rights.
Edit: it would be better on average to keep the castling bits, and omit the positions of kings and rooks if castling is possible. But that's variable length and it's simply 4 extra bits in the worst case.
On the other hand, there are 32 pieces (max) on a chess board, not 16, so grandparent is off by a factor of more than two.
Since you could move either rook somewhere and then back to their starting squares, you have to track their eligibility separately. If the king moves, both rooks lose eligibility.
For example, you technically don't need to track castling availability. If you're storing the entire match as a set of positions, you can deduct that by replaying the previous positions. A quick search seems to indicate that an average chess match runs for about 40 moves, so replaying all previous positions isn't that bad, on average.
If you need to store millions of chess matches, being able to store them in ~1kb each might be more important, compared the overhead of unpacking each state. If you need to query for certain positions across all those matches, maybe less "compression" is desired.
I always enjoy articles about how people store data and how they think of capturing states, but I also like to know the context and how that data is use or queried.
BTW 495 can be computed as a binomial coefficient C(8+5-1,5-1), the number of combinations of 8 elements chosen with repetitions from 5 elements.
I think that you need one extra bit, that can contextually encode "rook has moved" or "en passant available".
If the king moves (say we're playing the Bongcloud), qR and kR get set back to their original squares, a1 and h1. Now if the King slides back, castling is off the board.
Then, as soon as the rook is moved, it gets its actual positional value. If it moves back later, the positional value will be that of the rook's starting position (guaranteed different from the king's current positional value as the two pieces can't overlap).
Likewise, the position of a pawn can be assigned the king’s position if it has made the double move. You know it’s actually in the legal file and in which rank it sits after the move.
How can we eliminate two bytes?
We can more cleverly encode the mask so that two bytes are used to express the all-pieces-present, and certain other cases. A fully decoded mask is only used for boards that have too few pieces to break over 26 bytes.
For instance suppose we have a two-bit header encoding several cases:
00 - no piece are missing (32 six-bit coordinates follow)
01 - one piece is missing, followed by 5 bit ID of that piece
10 - two pieces are missing, followed by two 5 bit IDs of the two pieces
11 - three or more pieces missing, full mask follows.
In case 11, we need a 32 bit mask, followed by as many as 29 coordinate pairs.
So 2 + 32 + 6 * 29 = 208 bits.
And hey look, by dumb luck, 208 / 8 = 26.
I will check the article later.
However, am I mistaken? The article appears to be neglecting to record one bit of information: whose turn is it for this board position, black or white? If you're going to care about whether castling is available and other such game state, that is not complete without knowing whose turn it is for that position.
Method 1: Lichess method; 64 bit header, 1 bit per square indicating occupied squares, then (up to) 32 4 bit codes for the 32 occupied squares. So 24 bytes worst case!
Method 2: My own method a list of 2 bit codes, one of the 4 codes indicates empty square, the other three codes are prefixes for a second 2 bit code. Three prefixes applied to one of 4 code values gives a total of 12 possibilities corresponding to any possible chess piece. Worst case 32x2 bits plus 32x4 bits = 24 bytes.
In each case there is sufficient entropy to create tricks to add the supplementary information (who to move etc.), similar tricks are in the original article.
I mention my own method from my Tarrasch Chess GUI https://github.com/billforsternz/tarrasch-chess-gui only for completeness. If I had known about method 1 I would have used that, it is simpler and better and there is much more entropy available making the tricks easier.
I would urge commentators to keep clear the difference between compressing a chess position (this challenge), a chess move and a chess game. A chess move needs far less bits of course. A complete chess game is always encoded as a list of moves, not positions, for this reason.
Edit: I should have mentioned that the chief advantage of method 1 over method 2 is average bits required. An empty board is 64x1 bits = 8 bytes for method 1 and 64x2 bits = 16 bytes for method 2.
Edit 2: I am going to describe my tricks, just because they are fun. Two kings the same colour means Black to move. Two White kings means the first king is white, two black kings means the first king is black. Otherwise White to move. A friendly pawn on the first rank means an enpassant vulnerable pawn. Swap it with the 4th rank square to get the actual contents of both squares. A hostile pawn on the first rank is an unmoved rook or king, establishing all castling rights. The castling and en-passant tricks can be combined in a pleasant and harmonious way.
So I wasn't saying the article is wrong, just engaging in a bit of intellectual exercise.
My count was: there are 16 pawns, each could be in one of 64 places or not on the board, so 6+1 bits per pawn for that, and each could be promoted to a knight a bishop a rook or a queen, so 4+1 bits per pawn for that, and one of them might have just moved two spaces, so 3+1 bits for that. And the other 16 pieces don't change their nature so we only need to know where they are, so 6+1 bits for each of those, plus 3 bits for each of the rooks and the king to tell if they have already been moved. That's.. 40 bytes actually. Im really curious how they got it in 26.
Update: I see! They used illegal positions to store all the special statuses. Very clever!
And four promotion possibilities wouldn't be 4 bits, it would be 2. But even better is 5^16 squeezing into 38 bits.
Combining those cuts your strategy down to 192.7+37.2+4+6 bits which is 30 bytes.
The main savings the article has is being smarter about how to store promotions. And then it uses some tricks by swapping pieces to encode extra bits.
(But it turns out that storing which tiles are occupied, then listing each piece in order, works better than encoding locations.)
Therefore (65/64)^32 is roughly sqrt(e). Since 1 < e < 4, 1 < sqrt(e) < 2. So 65^32 < 2 * 64^32.
Hmm as a bit field followed by piece order- that would be 8 bytes followed by a variable number of pieces, perhaps you could do a sort of compression where 0c means pawn and 1xxxc is any other piece. (c stands for a color bit). So thats another 14 bytes. Thats 22 bytes!
The xxx by the way is one of 8 things: k, k not moved, r, r not moved, n, b, q, en passant pawn
With 6 xxx states instead of 8, you can merge "king not moved", "rook not moved", and "en passant pawn" into a single state since those can never overlap. (Though if you're encoding one pawn the long way that means you need 14.125 bytes, oh well.)
TLDR: You stupi...lovely folks, need learn what a bit is. What is a byte, and why integers are NOT BITS.
And no one called this post out from years ago? Crazy.
Someone ask, and I will show you how to do it in 24 BYTES IN BINARY ENCODING FOR REAL - no fake "integers are bits". And can we use compression techniques? We can get this to 10-12 bytes maybe.
So it beats the fake "26 bytes" and my version is actually real binary bits.
Ask.
> This gives us the string `00000034` to uniquely represent this specific set of promotions, without information loss.
> How many possible strings are there? Generating this by brute force, we end up with 495 distinct strings
> This can be stored in 9 bits for each side
Hint: 2^9 is 512 and 512 > 495
And if you look at article, nothing is binary encoded, they are all integer representations all the way down.
Please someone show me a BIT implementation of this - THESE ARE BITS 0 1 0 1 1 0 - It's called BINARY. There are no 9, or 5, or 3 or 4.....That isn't how logic gates work.
A 3 / INT is 8 BITS...1 BYTE.
HINT: I'm right.
And you never answered my question:
"Has anyone implemented this with actual bitwise operations instead of integer packing?"
Still waiting to see these "9-bit" "bytes"."00000034".
Again, show me. There is no such thing as a 9-bit byte, that isn't how CPUs or computation work. ITS 8 BITS 1 BYTE, that is transistor / gate design architecture.
This is how computers work.
If you have 9 bits, you have 2 BYTES!!!!!
BYTE 1: 00100011 (8 bits) BYTE 2: 10000000 (9th - 7 wasted bits)
= 16 bits, 2 BYTES THANK YOU GOODBYEEEEEEEEE
BUT
In the article they don't mean that 00000034 is a bit or a byte. It is one of the possibilities and there are 495 of them and if you index each possibly in a 2 byte integer, you can decode it back to that string and get a representation of the promotions that happened in any game.
And you actually cannot store this in 26 bytes based on your implementation, and then you show integer bits and bytes that aren't even binary...eh.
And to be honest, like how about we store the chess position in 1 bit.
I will execute some chess position program in 1 bit, ON / 1. How about that for ultra compression? Lets just pretend those other random bits and bytes don't exist I mean (...but they do...) - it's stored somewhere else, but "HOW TO STORE A CHESS POSITION IN 1 BIT" - but ok, fine I will play "pretend" |How to store a chess position in 26 bytes (2022)|
You know what I mean? lol
With a few bytes more more you can create an implementation that is a lot easier to understand. Bytes are cheap, developer time isn't.
https://news.ycombinator.com/newsguidelines.html
I'd especially hammer the point in this case, because clever hacks are very much on topic for Hacker News. They are, in fact, what gave birth to the word hacker and the idea of hacking in the first place. Not only that but it was precisely the clever hacks with no particular utility that were prized most highly!
In Stockfish, there will only be one full-fledged board state in memory per search thread. So the size of the board state is pretty much irrelevant to performance. What's important is reducing the overhead of generating possible moves, applying those moves to the board state, and hashing the board state, which is what magic bitboards are for.