Most open-source AlphaEvolve alternatives (OpenEvolve, ShinkaEvolve, GEPA) assume you'll throw a frontier model at every mutation. That's a real barrier for most people trying to enter this domain. LEVI argues: invest in the search harness, not the model.
The core idea is that if you maintain diversity well enough, using both code structure and behavioral differences to keep genuinely different strategies alive, you don't need GPT-5 or Claude Opus generating every candidate. A local Qwen 30B handles 95%+ of mutations. Frontier models are reserved for periodic paradigm shifts where you actually need broad knowledge.
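To make the idea concrete, here's a minimal sketch of what a tiered-model, diversity-preserving loop could look like. This is not LEVI's actual code; every name here (`evolve_step`, `local_mutate`, `frontier_mutate`, `behavior_key`, `frontier_rate`) is a hypothetical stand-in, and candidates are plain dicts with a `score` field:

```python
import random

def evolve_step(population, local_mutate, frontier_mutate,
                behavior_key, frontier_rate=0.05, rng=random):
    """One generation of a tiered-model evolutionary loop (illustrative sketch).

    - Most mutations go to a cheap local model (local_mutate).
    - A small fraction (frontier_rate) escalates to a frontier model.
    - Diversity is kept by niching on a behavioral descriptor: only the
      best-scoring candidate per behavior bucket survives.
    """
    children = []
    for parent in population:
        mutate = frontier_mutate if rng.random() < frontier_rate else local_mutate
        children.append(mutate(parent))

    # Niche by behavior: keep the highest score per bucket, so genuinely
    # different strategies stay alive instead of one winner taking over.
    niches = {}
    for cand in population + children:
        key = behavior_key(cand)
        if key not in niches or cand["score"] > niches[key]["score"]:
            niches[key] = cand
    return list(niches.values())
```

The routing decision is per-mutation, which is where the cost saving comes from: a 5% frontier rate means the expensive model only sees a handful of calls per generation.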
This ends up being a 3-7x cost reduction while scoring higher on every problem we tried. On the UC Berkeley ADRS benchmark (7 real-world systems problems, including scheduling, load balancing, and SQL optimization), LEVI holds the top score on every problem where improvement is possible, at $4.50/problem vs $15-30 for baselines. Berkeley wrote it up here: https://ucbskyadrs.github.io/blog/levi/
In controlled comparisons (same model, same budget, three seeds), the architecture alone reaches near-peak performance with roughly 12x fewer samples.
Docs: https://ttanv.github.io/levi/docs
On a side note, Google's TRC program is amazing: it gives you free TPU access for a month, and they're usually very fast with requests. Go check it out!
Would especially love to hear from people with unusual problem domains they'd want to point this at. Scheduling, packing, and optimization benchmarks are well-trodden. Curious what happens on weirder stuff.