I was surprised that standard prompting techniques couldn't coax accurate financial budget line items out of frontier models. There are ~250 of these PDFs published every year; each town does it differently, but many towns are surprisingly consistent over the years. It seems like these aren't in the training set, and there is enough noise and complexity in the layout that nothing could accurately accomplish the task.
Absolutely wild when I think about the code I've coaxed out of Claude Code... in any event, I would now love to really try to automate bookkeeping and invoicing for even a moderately large construction or similar logistics-heavy firm. I thought this was largely solved, but this evidence suggests it's probably not (in a universally applicable way)?