If I'm not mistaken, SGI (Silicon Graphics, Inc.) was already doing this to prevent regressions 40 years ago: maybe not SHA, but they were taking "screenshots" of the entire screen at a time t and computing some kind of checksum, to then verify (without having to compare every single pixel in the happy case) that an enhancement/optimization to their rendering pipeline that wasn't supposed to change the output did indeed generate the exact same image as before.
It's basically a 40-year-old technique: not sure what's so wild about it.
Edit: I'm not saying no one does checksums to compare files (lol). I'm saying no one takes screenshots at specific timestamps within an app or game's lifecycle and then compares them to ensure they're identical.
Edit 2: Whoops, looks like I'm wrong and this is apparently a pretty common thing (but not at the startups I've worked at, /shrug). I still think it's cool that Codex is doing it without being told to, though.
Everyone does this to check whether files are identical, be it SHA, MD5, or something else. I cannot imagine any other method that would come to mind first for checking whether two files are the same.
I don't mean to offend, but I quite literally mean everyone does this. Every software updater, game patcher, and tool for checking whether two binary files are identical (pixel-perfect/lossless in this case: a BMP or PNG created by the same encoder from the same inputs would qualify, a JPG likely would not) does exactly this.
GPT analysis, or similarity and image-chunk hashing, would not be the first thing you turn to if what you wanted was an exact, pixel-perfect match. I am curious what your background is if that is the case.
No one that I've seen takes automated screenshots of webapps or games or what have you at pre-determined timestamps to make sure the app looks pixel-identical with every change.
(regardless of the method; the SHA'ing isn't the point here, the point is that it's a shortcut instead of "inspect the image for any regressions", since we don't need to inspect the image at all if it is identical)
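To make the shortcut concrete, here's a minimal sketch (not anyone's actual pipeline; the byte strings stand in for real screenshot data): hash the new screenshot, compare against a stored golden hash, and only fall back to pixel-level inspection on a mismatch.

```python
import hashlib


def image_digest(data: bytes) -> str:
    """SHA-256 of the raw screenshot bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_inspection(golden_digest: str, new_screenshot: bytes) -> bool:
    """False when the new screenshot is byte-identical to the golden one,
    so no pixel-by-pixel inspection is needed in the happy case."""
    return image_digest(new_screenshot) != golden_digest


# Happy case: identical bytes, skip inspection entirely.
golden = b"\x00\xff" * 1000  # stand-in for golden screenshot bytes
assert not needs_inspection(image_digest(golden), golden)

# A single changed byte flips the digest, triggering real inspection.
changed = b"\x01" + golden[1:]
assert needs_inspection(image_digest(golden), changed)
```

The point is exactly the one above: in the common case where nothing changed, you never look at a single pixel.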
I'm confused. We have done this at every place I have ever worked; it's very standard. Set timestamps, pre-action, post-action, and on dozens to hundreds of combinations of OS and rendering engine. This includes pre-LLM, using similarity and perceptual hashing, screenshotting single DOM elements on hover and off hover, both fuzzy and pixel-perfect.
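For the fuzzy side of that, a perceptual hash tolerates small rendering differences that a SHA would not. A toy average-hash sketch (real tools downscale an actual screenshot first; the flat grayscale lists here are made up for illustration):

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is
    brighter than the mean. `pixels` is a flat list of grayscale values
    (assumed already downscaled from a screenshot)."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))


baseline = [10, 200, 30, 220, 15, 210, 25, 205,
            12, 198, 28, 215, 14, 207, 22, 201]
# Slightly brighter re-render of the "same" frame: uniform small shift.
rerender = [p + 3 for p in baseline]
# A real visual regression: a bright region goes dark.
regressed = baseline[:4] + [0, 0, 0, 0] + baseline[8:]

assert hamming(average_hash(baseline), average_hash(rerender)) == 0
assert hamming(average_hash(baseline), average_hash(regressed)) > 0
```

A SHA would flag both pairs as different; the perceptual hash only flags the genuine regression, which is why both kinds of comparison show up in these pipelines.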
Have any of the places you've worked used a service such as Saucelabs or Browserstack, rolled their own in-house equivalent, or seen something like https://percy.io/how-it-works (random example; not affiliated, not a recommendation)?
I hope I was not being too rude about it; not my intent. It's mostly surprising to me because a service like Browserstack is a decade and a half old already, and the concept predates that.
My first software job out of college was actually a QA Automation / SDET position, where I wrote an automated test framework with Ruby + Selenium + Browserstack. It did take screenshots of the app, but the app loaded dynamic content and there were frequent feature adjustments, so no two screenshots were ever identical.
All other jobs I've had since then have been writing smart contracts for Ethereum apps: 100% backend (I hate having to deal with frontend), so all our tests were just units & coverage & what have you.
I suppose if your environment holds constant and your changes don't alter frontend structure or behavior (e.g. refactors), then this is what you should expect.
Though, do note that this only works because my app is based on a tick/game-loop system without callbacks; if it used the standard game-development pattern of callbacks & message handling (especially with React/JS) to invoke events, it wouldn't work, because the timing would be slightly different each time, and an enemy would be a few pixels to the left/right of its position in the past run.
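The determinism point can be sketched as a fixed-timestep loop: state at tick N depends only on the tick count, never on wall-clock time, so two runs render identical frames (the enemy movement here is a made-up stand-in, not the actual app):

```python
def simulate(ticks: int) -> list:
    """Fixed-timestep loop: the enemy's x-position advances a constant
    amount per tick, so the frame at any given tick is identical across
    runs and a screenshot hash comparison can work."""
    enemy_x = 0.0
    frames = []
    for _ in range(ticks):
        enemy_x += 1.5  # deterministic per-tick update, no wall clock
        frames.append(enemy_x)
    return frames


# Two independent runs produce the same state at every tick...
assert simulate(100) == simulate(100)
# ...whereas updates scaled by real elapsed time (time.monotonic() deltas)
# would land the enemy a few pixels off between runs.
```

With a callback/event-driven loop, the per-frame delta varies run to run, which is exactly why the screenshots stop being byte-identical.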