If I'm not mistaken, SGI (Silicon Graphics, Inc.) was already doing this to prevent regressions 40 years ago: maybe not SHA, but they were taking "screenshots" of the entire screen at a time t and computing some kind of checksum, to then verify (without having to compare every single pixel in the happy case) that an enhancement/optimization to their rendering pipeline that wasn't supposed to change the output did indeed generate the exact same image as before.
It's basically a 40-year-old technique: not sure what's so wild about it.
Edit: I'm not saying no one does checksums to compare files (lol). I'm saying no one takes screenshots at specific timestamps within an app or game's lifecycle and then compares them to ensure they're identical.
Edit 2: Whoops, looks like I'm wrong and this is apparently a pretty common thing (but not at the startups I've worked at, /shrug). I still think it's cool that Codex is doing it without being told to, though.
Everyone does this to check whether files are identical, be it SHA, MD5, or something else. I cannot imagine any other method that would come to mind first for checking whether two files are the same.
I don't mean to offend, but I quite literally mean everyone does this. Every software updater, game patcher, and tool for checking whether two binary files are identical (pixel-perfect/lossless in this case: a BMP or PNG created by the same encoder from the same inputs would qualify, a JPG likely would not) does exactly this.
GPT analysis, or similarity and image-chunk hashing, would not be the first thing you turn to if what you wanted was an exact, pixel-perfect match. I am curious what your background is if that is the case.
No one that I've seen takes automated screenshots of webapps or games or what have you at pre-determined timestamps to make sure the app looks pixel-identical with every change.
(regardless of the method; the SHA'ing isn't the point here, the point is that it's a shortcut instead of "inspect the image for any regressions", since we don't need to inspect the image at all if it is identical)
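To make the shortcut concrete, here's a minimal sketch (not anyone's actual pipeline; the byte strings stand in for real screenshot data): hash the new screenshot, compare against a stored golden hash, and only fall back to pixel-level inspection on a mismatch.

```python
import hashlib


def image_digest(data: bytes) -> str:
    """SHA-256 of the raw screenshot bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_inspection(golden_digest: str, new_screenshot: bytes) -> bool:
    """False when the new screenshot is byte-identical to the golden one,
    so no pixel-by-pixel inspection is needed in the happy case."""
    return image_digest(new_screenshot) != golden_digest


# Happy case: identical bytes, skip inspection entirely.
golden = b"\x00\xff" * 1000  # stand-in for golden screenshot bytes
assert not needs_inspection(image_digest(golden), golden)

# A single changed byte flips the digest, triggering real inspection.
changed = b"\x01" + golden[1:]
assert needs_inspection(image_digest(golden), changed)
```

The point is exactly the one above: in the common case where nothing changed, you never look at a single pixel.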
I'm confused. We have done this at every place I have ever worked; it's very standard. Set timestamps, pre-action, post-action, and on dozens to hundreds of combinations of OS and rendering engine. This includes pre-LLM, using similarity and perceptual hashing, screenshotting single DOM elements on hover and off hover, both fuzzy and pixel-perfect.
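For the fuzzy side of that, a perceptual hash tolerates small rendering differences that a SHA would not. A toy average-hash sketch (real tools downscale an actual screenshot first; the flat grayscale lists here are made up for illustration):

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is
    brighter than the mean. `pixels` is a flat list of grayscale values
    (assumed already downscaled from a screenshot)."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))


baseline = [10, 200, 30, 220, 15, 210, 25, 205,
            12, 198, 28, 215, 14, 207, 22, 201]
# Slightly brighter re-render of the "same" frame: uniform small shift.
rerender = [p + 3 for p in baseline]
# A real visual regression: a bright region goes dark.
regressed = baseline[:4] + [0, 0, 0, 0] + baseline[8:]

assert hamming(average_hash(baseline), average_hash(rerender)) == 0
assert hamming(average_hash(baseline), average_hash(regressed)) > 0
```

A SHA would flag both pairs as different; the perceptual hash only flags the genuine regression, which is why both kinds of comparison show up in these pipelines.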
Have any of the places you've worked used a service such as Saucelabs or Browserstack, rolled their own in-house equivalent, or seen something like https://percy.io/how-it-works (random example; not affiliated, not a recommendation)?
I hope I was not being too rude about it; not my intent. It's mostly surprising to me because a service like Browserstack is a decade and a half old already, and the concept predates that.
My first software job out of college was actually a QA Automation / SDET position, where I wrote an automated test framework with Ruby + Selenium + Browserstack. It did take screenshots of the app, but the app loaded dynamic content and there were frequent feature adjustments, so no two screenshots were ever identical.
All other jobs I've had since then have been writing smart contracts for Ethereum apps: 100% backend (I hate having to deal with frontend), so all our tests were just units & coverage & what have you.
I suppose if your environment holds constant and your changes don't alter frontend structure or behavior (e.g. refactors), then this is what you should expect.
Though, do note that this only works because my app is based on a tick/game-loop system without callbacks; if it used the standard game-development pattern of callbacks & message handling (especially with React/JS) to invoke events, it wouldn't work, because the timing would be slightly different each time, and an enemy would be a few pixels to the left/right of its position in the past run.
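The determinism point can be sketched as a fixed-timestep loop: state at tick N depends only on the tick count, never on wall-clock time, so two runs render identical frames (the enemy movement here is a made-up stand-in, not the actual app):

```python
def simulate(ticks: int) -> list:
    """Fixed-timestep loop: the enemy's x-position advances a constant
    amount per tick, so the frame at any given tick is identical across
    runs and a screenshot hash comparison can work."""
    enemy_x = 0.0
    frames = []
    for _ in range(ticks):
        enemy_x += 1.5  # deterministic per-tick update, no wall clock
        frames.append(enemy_x)
    return frames


# Two independent runs produce the same state at every tick...
assert simulate(100) == simulate(100)
# ...whereas updates scaled by real elapsed time (time.monotonic() deltas)
# would land the enemy a few pixels off between runs.
```

With a callback/event-driven loop, the per-frame delta varies run to run, which is exactly why the screenshots stop being byte-identical.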