That's not how LLM quality works.
I’m observing a law: performance and cost move together, so whenever a company claims to have reduced inference costs, customers immediately notice a corresponding decline in model performance.
How are they "being played" if Claude 5 isn't even out yet?
Claude 3 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens
Claude 4 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens
Claude 4.1 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens
Claude 4.5 Opus: $5.00 (Input) / $25.00 (Output) per 1M tokens
There are plenty of ways to reduce inference cost for a high-intelligence model. Sparsity, for example (as in mixture-of-experts architectures), can increase the total parameter count while reducing inference cost and latency, because only a fraction of the weights is active for any given token.
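For a concrete sense of how that decoupling works, here's a toy sketch of top-k mixture-of-experts routing (my own illustration, not Anthropic's architecture, which isn't public): total parameters scale with the number of experts, but per-token compute scales only with k.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 64, 8, 2

# Each expert is a simple linear map, so total params = num_experts * d_model^2,
# but only top_k experts are evaluated per token.
experts = rng.standard_normal((num_experts, d_model, d_model)) * 0.02
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_forward(x):
    """Route one token vector x through its top-k experts."""
    logits = x @ router                    # score every expert
    top = np.argsort(logits)[-top_k:]      # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only k expert matmuls actually run here, even though num_experts
    # sets the total parameter count -- that's the cost saving.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (64,)
```

In this sketch, doubling num_experts doubles the parameter count but leaves per-token inference FLOPs unchanged, so a cheaper model isn't necessarily a dumber one.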
Let’s see what happens :)