The very first thing I usually do when I want specific assembly, like SIMD, is grep the compiler output for the instructions I expect to be there (exactly how you found it). That way, if there are any surprises (usually my own flawed understanding rather than actual problems like in this example), I'm alerted to them right away.
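For anyone who wants to automate that grep step, here's a rough sketch of the idea, not a polished tool. It assumes the input is a textual assembly listing like the one `gcc -S` emits: one instruction per line, with the mnemonic as the first token. It won't parse objdump output, where each line starts with an address. The listing and the expected mnemonics (`vaddps`, `vmulps`) are made-up illustrations, not taken from the writeup.

```python
import re

def find_mnemonics(asm_text, expected):
    """Count how often each expected mnemonic appears in an assembly listing.

    Assumes compiler-style output (e.g. gcc -S): one instruction per line,
    mnemonic first, labels ending in ':'. Not suitable for objdump dumps.
    """
    counts = {m.lower(): 0 for m in expected}
    for line in asm_text.splitlines():
        # Strip ';' or '#' comments (which one applies depends on the assembler syntax).
        code = re.split(r"[;#]", line, maxsplit=1)[0].strip()
        if not code or code.endswith(":"):
            continue  # blank line or a label such as ".L3:"
        mnemonic = code.split()[0].lower()
        if mnemonic in counts:
            counts[mnemonic] += 1
    return counts

# Hypothetical hand-written listing of a vectorized float-add loop (Intel syntax).
asm = """
.L3:
        vmovups ymm0, YMMWORD PTR [rdi+rax]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsi+rax]
        vmovups YMMWORD PTR [rdx+rax], ymm0
        add     rax, 32
        cmp     rax, rcx
        jne     .L3
"""

counts = find_mnemonics(asm, ["vaddps", "vmulps"])
missing = [m for m, n in counts.items() if n == 0]
print(counts)   # {'vaddps': 1, 'vmulps': 0}
print(missing)  # ['vmulps']
```

A non-empty `missing` list is the "surprise" worth looking into before trusting any benchmark numbers.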
Interesting writeup. You definitely need to be careful and inspect the machine code; that can't be overstated!