[…]
Real-life CSV data is usually consistent. By that I mean tabular data typically has a fixed number of columns: rows suddenly exhibiting an inconsistent number of columns are frowned upon. What's more, each column often holds a homogeneous data type: integers, floating-point numbers, raw text, dates, etc. Finally, rows tend to have comparable sizes in bytes. We would be fools not to leverage this consistency.
So now, before doing any reckless jumping, let's start by analyzing the beginning of our CSV file to record some statistics that will be useful down the line.
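A minimal sketch of what such a sampling pass could look like, assuming the statistics of interest are the column count and the distribution of row sizes in bytes (the function name and return shape are hypothetical, not the article's actual code):

```python
import csv
import statistics

def sample_csv_stats(path, sample_size=1000):
    """Read up to `sample_size` rows from the start of a CSV file and
    record simple statistics a jumping parser could rely on later."""
    field_counts = []
    row_byte_sizes = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            if i >= sample_size:
                break
            field_counts.append(len(row))
            # Approximate serialized size: field bytes + separators + newline.
            row_byte_sizes.append(sum(len(field) for field in row) + len(row))
    return {
        # The most common field count seen in the sample.
        "expected_fields": max(set(field_counts), key=field_counts.count),
        "mean_row_bytes": statistics.mean(row_byte_sizes),
        "stdev_row_bytes": statistics.pstdev(row_byte_sizes),
    }
```

Inferring per-column types (integer, float, date, text) would follow the same pattern: attempt parses on each sampled field and keep the narrowest type that always succeeds.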
[…]
“Anyway, we now have what we need to be able to jump safely.”
“Safely”. An attacker who controls a row in that file can easily embed data that satisfies the statistical checks, thus injecting rows of their choosing.
The author acknowledges as much, writing: “This technique is reasonably robust and will let you jump safely.”
I agree with “reasonably robust”, but not with “will let you jump safely”.
This is clearly not the sort of thing you should expose to untrusted input; it is an optimization technique, in the same way you would not use a fast but DoS-able hash function for a hashmap whose keys an attacker can choose.
The statistics used here are a bit different, though.
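The critique can be made concrete with a sketch. Suppose the parser, after jumping to a random byte offset and resyncing to the next newline, validates the candidate line against the recorded statistics (all names and the tolerance parameter below are hypothetical, not taken from the article):

```python
def row_looks_valid(line, stats, tolerance=3.0):
    """Heuristic check: does `line` match the sampled statistics?
    `stats` holds expected_fields, mean_row_bytes, stdev_row_bytes."""
    fields = line.split(",")
    if len(fields) != stats["expected_fields"]:
        return False
    # Accept rows within `tolerance` standard deviations of the mean size.
    deviation = abs(len(line) - stats["mean_row_bytes"])
    return deviation <= tolerance * max(stats["stdev_row_bytes"], 1.0)

stats = {"expected_fields": 3, "mean_row_bytes": 6.0, "stdev_row_bytes": 1.0}
row_looks_valid("1,2,3", stats)    # a genuine row passes
# An attacker can embed "9,9,9\n" inside a single quoted field; after a
# naive newline resync, that fragment also passes every check above:
row_looks_valid("9,9,9", stats)
```

The check only ever sees bytes, so any attacker-controlled field long enough to contain a plausible fake row defeats it, which is why "reasonably robust" holds but "safely" does not.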