The statistics are 9th-order (over 3x3 blocks of pixels), but of a simple form that is hardly more expressive, in terms of the different textures it can reproduce, than the well-known 2nd-order nearest-neighbour statistics. In the approximate case where you only care about the average value of each pixel, I think it would collapse to 2nd-order. Texture synthesis using MRFs with local statistics is Turing reaction-diffusion discretized in space. I did my PhD on this topic.
Probably the most influential early paper on this kind of simple texture model, where you will see similar patterns, is:
Cross & Jain, 1983, PAMI, Markov Random Field Texture Models
Trying to trade space for time, I used a model that gives every cell the set of those 3x3 neighborhoods, out of the 512 possible, that could have caused that cell's present state ("alibis"). It then goes to each cell, comparing its alibis to those of neighboring cells and eliminating mutually impossible ones from either set. This step has to be repeated until no more alibis are shed in a pass.
When it finally stabilizes, the model is a solution kernel that can then field further manual guesses against it. If a cell's alibis all agree it was dead in the "before", there's no need to guess, but if they're not unanimous, what if we hazard a guess one way or the other for a bit? How does that ripple through the rest of the board? If any of the cells ran completely out of alibis given a certain guess, that guess was clearly not a proper solution, and it's time to back out and try a different one. If there's no solution at all, that's a Garden of Eden.
Ultimately I wanted to generate not just one solution, but all the solutions for a given board. I got stumped because I wasn't convinced I wasn't still working in 2**(n*m) time or worse trying guesses against the kernel.
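For anyone who wants to play with the alibi idea, here is a minimal Python sketch of how it might look. It is not the code described above: the names (`initial_alibis`, `propagate`, `predecessors`, `to_grid`), the 9-bit patch encoding, and the choice to guess a whole patch rather than a single "before" cell are all assumptions made for illustration. Cells outside the padded (n+2) x (m+2) "before" board are left unconstrained.

```python
from itertools import product

def life_rule(patch):
    """Next state of the centre of a 3x3 patch; bit 3*i + j holds cell (i, j)."""
    centre = (patch >> 4) & 1
    neighbours = bin(patch).count("1") - centre
    return 1 if neighbours == 3 or (centre == 1 and neighbours == 2) else 0

# The eight directions in which a neighbouring cell can sit.
OFFSETS = [d for d in product((-1, 0, 1), repeat=2) if d != (0, 0)]

def overlap_key(patch, dr, dc):
    """Bits of `patch` shared with the patch centred (dr, dc) away, row-major."""
    return tuple((patch >> (3 * i + j)) & 1
                 for i in range(3) for j in range(3)
                 if 0 <= i - dr < 3 and 0 <= j - dc < 3)

def initial_alibis(target):
    """Give every cell all 3x3 'before' patches that yield its present state."""
    by_state = {s: {p for p in range(512) if life_rule(p) == s} for s in (0, 1)}
    return {(r, c): set(by_state[cell])
            for r, row in enumerate(target) for c, cell in enumerate(row)}

def propagate(alibis):
    """Shed alibis with no compatible partner next door, until a pass sheds none.
    Returns False as soon as some cell runs out of alibis."""
    changed = True
    while changed:
        changed = False
        for (r, c), mine in alibis.items():
            for dr, dc in OFFSETS:
                theirs = alibis.get((r + dr, c + dc))
                if theirs is None:
                    continue  # neighbour is off the board: no constraint
                their_keys = {overlap_key(b, -dr, -dc) for b in theirs}
                keep = {a for a in mine if overlap_key(a, dr, dc) in their_keys}
                if len(keep) < len(mine):
                    mine.intersection_update(keep)
                    changed = True
            if not mine:
                return False
    return True

def predecessors(alibis):
    """Yield every consistent choice of one patch per cell: guess, propagate, back out."""
    if not propagate(alibis):
        return  # a cell ran out of alibis, so the last guess was wrong
    open_cells = [cell for cell, s in alibis.items() if len(s) > 1]
    if not open_cells:
        yield {cell: next(iter(s)) for cell, s in alibis.items()}
        return
    cell = min(open_cells, key=lambda k: len(alibis[k]))
    for guess in sorted(alibis[cell]):
        trial = {k: set(v) for k, v in alibis.items()}  # copy so we can back out
        trial[cell] = {guess}
        yield from predecessors(trial)

def to_grid(solution, rows, cols):
    """Stitch the chosen patches into the (rows+2) x (cols+2) 'before' board."""
    grid = [[0] * (cols + 2) for _ in range(rows + 2)]
    for (r, c), patch in solution.items():
        for i in range(3):
            for j in range(3):
                grid[r + i][c + j] = (patch >> (3 * i + j)) & 1
    return grid

# Usage: print one predecessor of a 2x2 block, if any exists.
first = next(predecessors(initial_alibis([[1, 1], [1, 1]])), None)
if first is not None:
    for row in to_grid(first, 2, 2):
        print(row)
```

If the generator yields nothing, the board has no predecessor under these assumptions. Propagation prunes a lot, but the guessing part can still blow up exponentially in the worst case, so this does not settle the 2**(n*m) worry.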
It's a really fascinating problem, so much so that I even made a PICO-8 game about it years ago! Even the 6x6 grids are really tough!
"its like showing a solved rubiks cube and asking what the scramble was"
^ this analogy may be the best I've seen in a long time.
> First of all, while I said “Predecessorifier” in the talk, “Ataviser” seems to be the accepted word, coming from “Atavism”, which the online Merriam-Webster dictionary defines as “recurrence of or reversion to a past style, manner, outlook, approach, or activity”.
It would probably work even better if you tweak the loss function with some kind of averaging/blurring filter.
(Not saying the goal was working well and being fast to implement.)
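To make the blurring suggestion concrete, here is one way it could look, as a rough sketch only. It assumes the loss compares a predicted board to a target board cell by cell, which may not match the setup being discussed; `box_blur` and `blurred_mse` are made-up names, and the 3x3 mean filter with zero padding is just one possible choice of filter.

```python
def box_blur(grid):
    """3x3 mean filter; cells outside the board count as 0 (zero padding)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += grid[rr][cc]
            out[r][c] = total / 9.0
    return out

def blurred_mse(predicted, target):
    """Mean squared error between the blurred boards instead of the raw ones."""
    bp, bt = box_blur(predicted), box_blur(target)
    n = len(target) * len(target[0])
    return sum((p - t) ** 2
               for row_p, row_t in zip(bp, bt)
               for p, t in zip(row_p, row_t)) / n
```

The intended effect is that a live cell landing one pixel away from where it should be is penalised much less than under a plain per-cell loss, so near misses still count as progress.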