It's open-source and I'd love to see people build on it! It basically measures the spatial capabilities of these models. I can't spend more on running new tests right now (the Grok models cost soooo much to run, though they're also turning out to be the best at this), but I'm open to suggestions and would love to chat about it :)