> Since then, the Responses API has evolved to support a special /responses/compact endpoint that performs compaction more efficiently. It returns a list of items that can be used in place of the previous input to continue the conversation while freeing up the context window. This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation. Now, Codex automatically uses this endpoint to compact the conversation when the auto_compact_limit is exceeded.
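Based only on the quoted description, continuing a conversation after compaction might look roughly like this. This is a sketch: the type=compaction and encrypted_content fields come from the quote above, but the exact item shapes and the next_turn_input helper are assumptions, not the documented schema.

```python
# Hypothetical sketch of continuing a conversation after /responses/compact.
# Item shapes beyond type=compaction / encrypted_content are assumptions.

compacted_items = [
    {
        "type": "compaction",              # special item returned by /responses/compact
        "encrypted_content": "gAAAAB...",  # opaque blob preserving the model's latent state
    },
]

def next_turn_input(compacted_items, user_message):
    """Use the compacted items in place of the full prior transcript."""
    return compacted_items + [
        {"type": "message", "role": "user", "content": user_message}
    ]

payload = next_turn_input(compacted_items, "Continue where we left off.")
```

The point is that the client swaps the (large) prior input for the (small) compacted list and keeps going, so the context window is mostly freed.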
Their communication is exceptional, too. Eric Traut (of Pyright fame) is all over the issues and PRs.
This helps preserve context over many turns, but it can also mean some context is lost between two related user turns.
A strategy that's helped me here is having the model write progress updates (along with general plans/specs/debug notes/etc.) to markdown files, acting as a sort of "snapshot" that works across many context windows.
I've only used Codex with the Responses v1 API, and there it's the complete opposite: already-generated reasoning tokens even persist when you send another message (without rolling back) after cancelling turns before they have finished the thought process.
Also, with Responses v1, xhigh mode eats through the context window multiple times faster than the other modes, which checks out with this.
It’s not the responsibility of the agent to write this transcript, it’s emacs, so I don’t have to worry about the agent forgetting to log something. It’s just writing the buffer to disk.
That said, faster inference can't come soon enough.
Why is that? Technical limits? I know Cerebras struggles with compute and they stopped their coding plan (sold out!). Their architecture also hasn't been used with large models like GPT-5.2; the largest they support (if not quantized) is GLM 4.7, which is <500B params.
I'm pretty sure that Codex uses reasoning.encrypted_content=true and store=false with the responses API.
reasoning.encrypted_content=true - The server will return all the reasoning tokens in an encrypted blob you can pass along in the next call. Only OpenAI can decrypt them.
store=false - The server will not persist anything about the conversation on the server. Any subsequent calls must provide all context.
Combined, the two options above turn the Responses API into a stateless one. Without these options it will still persist reasoning tokens in an agentic loop, but it will be done statefully, without the client passing the reasoning along each time.
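A minimal sketch of what such a stateless request body might look like. The store and include fields mirror the two options described above; the model name and message contents are made-up placeholders, and the exact include syntax should be checked against the current API reference.

```python
# Sketch of a stateless Responses API request body, assuming standard
# Responses API field names; model name and contents are placeholders.

request = {
    "model": "gpt-5.2-codex",                    # hypothetical model name from the thread
    "store": False,                              # server persists nothing; client owns all state
    "include": ["reasoning.encrypted_content"],  # return reasoning as an encrypted blob
    "input": [
        {"type": "message", "role": "user", "content": "Refactor this function."},
        # On later turns, the client must send everything back itself,
        # including encrypted reasoning items from previous responses, e.g.:
        # {"type": "reasoning", "encrypted_content": "gAAAAB..."},
    ],
}
```

With both set, each call is self-contained: the server can be treated as a pure function of the request, and the client decides what context (including encrypted reasoning) survives between turns.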
I would see my context window jump in size after each user turn (e.g. from 70% to 85% remaining).
Built a tool to analyze the requests, and sure enough the reasoning tokens were removed from past responses (but only between user turns). Here are the two relevant PRs [0][1].
When trying to get to the bottom of it, someone from OAI reached out and said this was expected and a limitation of the Responses API (interesting sidenote: Codex uses the Responses API, but passes the full context with every request).
This is the relevant part of the docs[2]:
> In turn 2, any reasoning items from turn 1 are ignored and removed, since the model does not reuse reasoning items from previous turns.
[0] https://github.com/openai/codex/pull/5857
[1] https://github.com/openai/codex/pull/5986
[2] https://cookbook.openai.com/examples/responses_api/reasoning...
Generally, I have noticed GPT-5.2 Codex is slower than Sonnet 4.5 in Claude Code.
Or am I not understanding this right?
Call the model. If it asks for a tool, run the tool and call again (with the new result appended). Otherwise, done.
https://i.ytimg.com/vi/74U04h9hQ_s/maxresdefault.jpg
https://github.com/anthropics/claude-plugins-official/commit...
However, I decided to try Codex CLI after hearing they rebuilt it from the ground up in Rust (instead of JS; not implying Rust == better). Its performance is quite literally insane, and its UX is completely seamless. They even added small nice-to-haves like ctrl+left/right to skip your cursor to word boundaries.
If you haven't, I genuinely think you should give it a try; you'll be very surprised. I saw Theo (YC, Ping Labs) talk about how OpenAI shouldn't have wasted their time optimizing the CLI and should have made a better model or something. I highly disagree after using it.
But tbh, OpenAI openly supporting OpenCode is the bigger draw for me on the plan, though I do want to spend more time with native Codex as a basis of comparison against OpenCode when using the same model.
I’m just happy to have so many competitive options, for now at least.
- hooks (this is a big one)
- better UI to show me what changes are going to be made.
The second one makes a huge difference, and it's the main reason I stopped using OpenCode (there are lots of other reasons too). In CC, I am shown a nice diff that I can approve/reject. In Codex, the AI makes lots of changes but doesn't pinpoint what changes it's making or going to make.
For context, I work on SSL bioacoustic models.
I also was annoyed by Theo saying that.
However, it seems to really only be good at coding tasks. With anything even slightly out of the ordinary, like planning dialogue and plot lines, it almost immediately starts producing garbage.
I did get it stuck in a loop the other day. I half-assed a git rebase and asked Codex to fix it. It did eventually resolve all the rebased commits, but it just kept going. I don't really know what it was doing; I think it made up some directive after the rebase completed, and it just kept chugging until I pulled the plug.
The only other tool I've tried is Aider, which I have found to be nearly worthless garbage.
I have a tool that reduces agent token consumption by 30%, and it's only viable because I can hook the harness and catch agents being stupid, then prompt them to be smarter on the fly. More at https://sibylline.dev/articles/2026-01-22-scribe-swebench-be...
Codex works by repeatedly sending a growing prompt to the model, executing any tool calls it requests, appending the results, and repeating until the model returns a text response.
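That loop can be sketched in a few lines. This is a toy illustration of the control flow, not Codex's actual code: call_model and run_tool are hypothetical stand-ins for the model API and the tool executor, and the message shapes are made up.

```python
# Minimal agentic loop sketch: grow the transcript, execute requested
# tools, and stop once the model replies with plain text.
# call_model and run_tool are hypothetical stand-ins, not real APIs.

def agent_loop(call_model, run_tool, transcript):
    while True:
        reply = call_model(transcript)        # full, growing context on every call
        transcript.append(reply)
        if reply["type"] == "tool_call":
            result = run_tool(reply["name"], reply["args"])
            transcript.append({"type": "tool_result", "output": result})
        else:
            return reply["text"]              # a plain text reply ends the loop
```

Note that nothing is ever removed from the transcript inside the loop; that is exactly why the prompt only grows until something like compaction steps in.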