This is, in all honesty, a solved problem in any reasonable build system. (And I have little patience left for people making life hard for themselves through their own choices.)
If I'm not misremembering that case, then it sounds like this should never have been an issue (well, as long as this was after basic version control and make were around). Curious if I'm missing something.
On the other hand, if your parser combinators process input char-by-char, then maintaining a small "is this valid UTF-8 so far" context on the side should be pretty simple, so providing it would be a useful option. But actually decoding? Please don't.
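
To make the "small context on the side" concrete, here is a rough sketch of what I have in mind, in Python for brevity. The names (`Utf8Validator`, `feed`, `at_boundary`, `check`) are made up for illustration and don't come from any particular parser library. It is fed one byte at a time and only tracks how many continuation bytes are still expected and which range the next one must fall in; it never builds a code point.

```python
class Utf8Validator:
    """Hypothetical side context: is the byte stream valid UTF-8 so far?

    Validates only; it never decodes bytes into code points.
    """

    def __init__(self):
        self.pending = 0   # continuation bytes still expected
        self.lo = 0x80     # allowed range for the next continuation byte
        self.hi = 0xBF
        self.valid = True  # stays False once an invalid byte has been seen

    def feed(self, byte):
        """Consume one byte (0..255); return whether the stream is still valid."""
        if not self.valid:
            return False
        if self.pending:
            if self.lo <= byte <= self.hi:
                self.pending -= 1
                self.lo, self.hi = 0x80, 0xBF
            else:
                self.valid = False
            return self.valid
        if byte <= 0x7F:
            pass                      # ASCII, nothing to track
        elif 0xC2 <= byte <= 0xDF:    # 2-byte lead (0xC0/0xC1 would be overlong)
            self.pending = 1
        elif 0xE0 <= byte <= 0xEF:    # 3-byte lead
            self.pending = 2
            if byte == 0xE0:
                self.lo = 0xA0        # reject overlong encodings
            elif byte == 0xED:
                self.hi = 0x9F        # reject UTF-16 surrogates
        elif 0xF0 <= byte <= 0xF4:    # 4-byte lead
            self.pending = 3
            if byte == 0xF0:
                self.lo = 0x90        # reject overlong encodings
            elif byte == 0xF4:
                self.hi = 0x8F        # reject values above U+10FFFF
        else:
            self.valid = False        # stray continuation or impossible lead byte
        return self.valid

    def at_boundary(self):
        """True if the stream is valid and not in the middle of a sequence."""
        return self.valid and self.pending == 0


def check(data):
    """Feed every byte of `data`; return (valid_so_far, at_boundary)."""
    v = Utf8Validator()
    for b in data:
        v.feed(b)
    return v.valid, v.at_boundary()
```

That's three integers and a flag of state, which a combinator could carry along next to its position. A truncated sequence like `b"\xe2\x82"` is still "valid so far" but not at a boundary, which is the distinction you need at end of input.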