2. Use bigger models to create a detailed plan, then use smaller, cheaper models like deepseek-v4-flash or gemini-3-flash for the actual implementation. This works really well.
3. Don't just keep chatting in the same session. Try to start a new session for every new chat message, or at most after 2-3 messages. If needed, you can ask the model to summarize the details and use that summary as context for the new session.
4. Implement features in small sets, not all in one go, and reset the session after each set is done.
5. Keep AGENTS.md small: just the basic info about your project, the file paths with what each file contains and does, and then some general guidelines (10 max).
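A rough back-of-the-envelope sketch of why resetting sessions saves so much. It assumes a stateless chat API where the full history is re-sent as input on every request, a made-up 2,000 tokens added per turn, and no prompt caching; `billed_input_tokens` is just an illustrative helper, not any provider's billing formula.

```python
from typing import Optional


def billed_input_tokens(turns: int, tokens_per_turn: int,
                        reset_every: Optional[int] = None) -> int:
    """Total input tokens billed across `turns` requests.

    Assumes every turn adds `tokens_per_turn` tokens to the history and the
    whole history is re-sent on each request, so the k-th request of a
    session carries k * tokens_per_turn tokens. If `reset_every` is set, a
    fresh session (empty history) starts after that many turns.
    """
    total = 0
    turn_in_session = 0
    for _ in range(turns):
        if reset_every is not None and turn_in_session == reset_every:
            turn_in_session = 0  # new session: earlier history is dropped
        turn_in_session += 1
        total += turn_in_session * tokens_per_turn
    return total


print(billed_input_tokens(30, 2000))                 # 930000
print(billed_input_tokens(30, 2000, reset_every=3))  # 120000
```

Under those assumptions, 30 turns in one long session bill 930,000 input tokens versus 120,000 when you reset every 3 turns (ignoring the small summary you carry over). Total input grows roughly quadratically with session length, so short sessions pay off quickly.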
Outside of that, the key is to keep context as small as possible. Costs climb fast as context grows, since the whole context is sent (and billed) again on every request. My current favourite approach to that is RPI: run the Research, Plan, and Implement phases each in their own isolated agent, with each phase producing a markdown file that the next one reads.
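A minimal Python sketch of the RPI handoff, assuming a hypothetical `run_agent(model, prompt)` function that you'd swap for your actual coding-agent or LLM call. It also wires in tip 2: the bigger model does research and planning, and a cheap model does the implementation. `BIG_MODEL`, the file names, and the prompts are placeholders I made up, not part of any tool.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical agent interface: (model, prompt) -> markdown answer.
AgentFn = Callable[[str, str], str]

BIG_MODEL = "big-planner-model"    # placeholder name for the bigger model
SMALL_MODEL = "deepseek-v4-flash"  # cheap implementer, per tip 2


def run_rpi(task: str, workdir: Path, run_agent: AgentFn) -> Path:
    """Research -> Plan -> Implement, each in a fresh, isolated context.

    The only thing carried between phases is the markdown file the previous
    phase wrote, so no phase inherits another phase's chat history.
    """
    workdir.mkdir(parents=True, exist_ok=True)

    # Phase 1: research. Context = the task only.
    research_md = workdir / "research.md"
    research_md.write_text(run_agent(
        BIG_MODEL,
        f"Research the codebase for this task; write findings as markdown.\n\n"
        f"Task: {task}"))

    # Phase 2: plan. Context = the task + research.md, nothing else.
    plan_md = workdir / "plan.md"
    plan_md.write_text(run_agent(
        BIG_MODEL,
        f"Task: {task}\n\nResearch notes:\n{research_md.read_text()}\n\n"
        f"Write a detailed, step-by-step implementation plan as markdown."))

    # Phase 3: implement. Context = plan.md only, on the cheap model.
    summary_md = workdir / "implementation.md"
    summary_md.write_text(run_agent(
        SMALL_MODEL,
        f"Implement this plan exactly; report your changes as markdown.\n\n"
        f"{plan_md.read_text()}"))
    return summary_md


def fake_agent(model: str, prompt: str) -> str:
    # Stand-in for a real agent call, just to show the data flow.
    return f"# output from {model}\n\n(prompt was {len(prompt)} chars)\n"


with tempfile.TemporaryDirectory() as d:
    out = run_rpi("Add a --verbose flag", Path(d), fake_agent)
    print(out.read_text().splitlines()[0])  # # output from deepseek-v4-flash
```

The point of the design is that each phase starts from a clean slate: the implement phase never sees the research chatter, only the plan, which keeps its context (and cost) small.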