1. Compute M1 = ~A & ~B, which is the mask of all spaces that are not newlines.
2. Compute M2 = M1 + (A << 1) + 1, which sets the first non-space or newline after each newline, plus some junk bits (leftover spaces) later in each line.
3. Compute M3 = M2 & ~M1, which removes the junk bits, leaving only the first match in each section.
Here is what it looks like:
10010000 = A
01100110 = B
00001001 = M1 = ~A & ~B
00101010 = M2 = M1 + (A << 1) + 1
00100010 = M3 = M2 & ~M1
Note that this code treats newlines as non-spaces, meaning if a line comprises only spaces, the terminating NL character is returned. You can have it treat newlines as spaces (meaning a line of all spaces is not a match) by computing M4 = M3 & ~A.
statement meaning string from first non-space till next EOL or EOF.
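For anyone who wants to check the arithmetic, here is a minimal sketch of the mask steps in C on a single 8-bit block. It assumes that A marks newlines, that B marks characters that are neither spaces nor newlines (inferred from the worked example), and that bit 0 is the first character of the block. The names `line_start_masks`, `find_line_starts` and `check_worked_example` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Masks for one 8-character block; bit i corresponds to character i
   (bit 0 = first character). This bit order is an assumption. */
typedef struct {
    uint8_t m1, m2, m3, m4;
} line_start_masks;

/* a: bit set where the character is a newline.
   b: bit set where the character is neither a space nor a newline
      (assumed meaning, inferred from the worked example). */
static line_start_masks find_line_starts(uint8_t a, uint8_t b)
{
    line_start_masks r;
    /* Spaces that are not newlines. */
    r.m1 = (uint8_t)(~a & ~b);
    /* Inject a 1 just after every newline and at bit 0 (the block start is
       treated as a line start). Each carry ripples through a run of leading
       spaces and stops on the first non-space or newline. */
    r.m2 = (uint8_t)(r.m1 + (a << 1) + 1);
    /* Drop the space bits that no carry consumed. */
    r.m3 = (uint8_t)(r.m2 & ~r.m1);
    /* Optional variant: do not report the newline of an all-space line. */
    r.m4 = (uint8_t)(r.m3 & ~a);
    return r;
}

/* Re-checks the values from the worked example: A = 10010000, B = 01100110. */
static void check_worked_example(void)
{
    line_start_masks r = find_line_starts(0x90, 0x66);
    assert(r.m1 == 0x09); /* 00001001 */
    assert(r.m2 == 0x2a); /* 00101010 */
    assert(r.m3 == 0x22); /* 00100010 */
}
```

The `+ 1` only makes sense if the block begins at a line start. A version that walks a buffer block by block would presumably need to carry that state over from the previous block, which this sketch does not attempt.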
The problem starts when you need to cover the "corner cases". Without the corner cases, the algorithm is not an algorithm.
Obviously, if your solution gets close to the memory bandwidth limit, we will proudly mention it!
That said, as you seem to actually want to do something with the results, you'll take a branch per match anyway, so I don't see the problem.
Falvyu's and bremac's solutions seem to be the best.
https://gist.github.com/zokrezyl/8574bf5d40a6efae28c9569a8d6...
[5] codereview.stackexchange.com