AMD's cores have SMT, allowing them to run two threads at a time and appear to the OS and its scheduler as two logical cores despite being implemented as a single physical core.
What pattern in the data shows that's what's being measured? I would expect to see basically 0 latency between adjacent "cores" then since L1 is shared per thread?
Co resident threads might not get any speed up here since coherency instructions are functionally operations on the L2 cache.