To be clear, I think GAs are way cooler though, haha. So kudos to you for this awesome write-up.
"Physics simulation involves discontinuities (contacts, friction regimes), long rollouts, and chaotic dynamics where small parameter changes lead to large outcome differences. Even with simulator internals, differentiating through thousands of unstable timesteps would yield noisy, high-variance gradients. Evolution is simpler and more robust for this regime."
"The real tradeoff is sample-efficient but complex (RL) vs compute hungry but simple (GA). DQN extracts learning signal from every timestep and assigns credit to individual actions."
DQN likely would have handled this much better.
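To make the quoted tradeoff a bit more concrete, here is a rough sketch of the difference in learning signal. It is not the write-up's code: it uses tabular Q-learning as a simplified stand-in for DQN, and the rollout, state/action counts, and function names (`ga_fitness`, `q_learning_updates`) are all made up for illustration.

```python
from typing import List, Tuple

# (state, action, reward, next_state, done) -- hypothetical toy format
Transition = Tuple[int, int, float, int, bool]


def ga_fitness(transitions: List[Transition]) -> float:
    """GA-style signal: the whole rollout collapses into one scalar."""
    return sum(reward for _, _, reward, _, _ in transitions)


def q_learning_updates(
    q: List[List[float]],
    transitions: List[Transition],
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> int:
    """Tabular Q-learning (stand-in for DQN): one TD update per timestep."""
    updates = 0
    for state, action, reward, next_state, done in transitions:
        # Bootstrap from the next state unless the episode ended here.
        target = reward if done else reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        updates += 1
    return updates


# Hypothetical 3-step rollout over 4 states and 2 actions.
rollout = [
    (0, 1, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 1, 1.0, 3, True),
]
q_table = [[0.0, 0.0] for _ in range(4)]

fitness = ga_fitness(rollout)                      # one number per episode
num_updates = q_learning_updates(q_table, rollout)  # one update per step
```

The GA only ever sees `fitness` for the whole rollout, while the Q-learning loop touches a specific (state, action) entry at every step, which is the per-action credit assignment the quote is describing.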