To start with, I'm not coordinating across multiple systems or chaining LLM calls. I write all data to a central data store, run a state machine on it, and each LLM call then operates independently.
At that point, you can measure and optimize each call on its own as needed, which is fairly straightforward.
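
Here is a minimal sketch of that layout, assuming an in-memory record as the central store and a stubbed model call. Every name in it (`Record`, `TRANSITIONS`, `fake_llm`, the two steps) is hypothetical and only for illustration; a real setup would use a database and an actual LLM client.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client; it only echoes the prompt."""
    return f"[model output for: {prompt}]"


@dataclass
class Record:
    """One entry in the central store: current state, data, and per-step timings."""
    state: str = "new"
    data: Dict[str, str] = field(default_factory=dict)
    timings: Dict[str, float] = field(default_factory=dict)


# Each step reads its input from the store and writes its output back to it.
# No step receives another step's return value directly, so the calls are not chained.
def summarize(record: Record, llm: Callable[[str], str]) -> str:
    record.data["summary"] = llm("Summarize: " + record.data["input"])
    return "summarized"


def classify(record: Record, llm: Callable[[str], str]) -> str:
    record.data["label"] = llm("Classify: " + record.data["summary"])
    return "done"


# The state machine: maps the current state to the step that handles it.
TRANSITIONS: Dict[str, Callable[[Record, Callable[[str], str]], str]] = {
    "new": summarize,
    "summarized": classify,
}


def run(record: Record, llm: Callable[[str], str] = fake_llm) -> Record:
    """Advance the record until it reaches a state with no registered step."""
    while record.state in TRANSITIONS:
        step = TRANSITIONS[record.state]
        start = time.perf_counter()
        next_state = step(record, llm)
        # Timing is stored per state, so each call can be measured separately.
        record.timings[record.state] = time.perf_counter() - start
        record.state = next_state
    return record


if __name__ == "__main__":
    result = run(Record(data={"input": "some document text"}))
    print(result.state, sorted(result.data), sorted(result.timings))
```

Because every step only touches the record, any single step can be rerun, timed, or swapped out against stored data without involving the others.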