> Since then, the Responses API has evolved to support a special /responses/compact endpoint that performs compaction more efficiently. It returns a list of items that can be used in place of the previous input to continue the conversation while freeing up the context window. This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation. Now, Codex automatically uses this endpoint to compact the conversation when the auto_compact_limit is exceeded.
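Based only on the quoted description, continuing a conversation after compaction might look roughly like this. This is a sketch: the type=compaction and encrypted_content fields come from the quote above, but the exact item shapes and the next_turn_input helper are assumptions, not the documented schema.

```python
# Hypothetical sketch of continuing a conversation after /responses/compact.
# Item shapes beyond type=compaction / encrypted_content are assumptions.

compacted_items = [
    {
        "type": "compaction",              # special item returned by /responses/compact
        "encrypted_content": "gAAAAB...",  # opaque blob preserving the model's latent state
    },
]

def next_turn_input(compacted_items, user_message):
    """Use the compacted items in place of the full prior transcript."""
    return compacted_items + [
        {"type": "message", "role": "user", "content": user_message}
    ]

payload = next_turn_input(compacted_items, "Continue where we left off.")
```

The point is that the client swaps the (large) prior input for the (small) compacted list and keeps going, so the context window is mostly freed.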
Their communication is exceptional, too. Eric Traut (of Pyright fame) is all over the issues and PRs.
This helps preserve context over many turns, but it can also mean some context is lost between two related user turns.
A strategy that's helped me here is having the model write progress updates (along with general plans/specs/debug notes/etc.) to markdown files, acting as a sort of "snapshot" that works across many context windows.
I've only used Codex with the Responses v1 API, and there it's the complete opposite: already-generated reasoning tokens even persist when you send another message (without rolling back) after cancelling turns before they have finished the thought process.
Also, with Responses v1, xhigh mode eats through the context window multiple times faster than the other modes, which checks out with this.
It’s not the responsibility of the agent to write this transcript, it’s emacs, so I don’t have to worry about the agent forgetting to log something. It’s just writing the buffer to disk.
That said, faster inference can't come soon enough.
Why is that? Technical limits? I know Cerebras struggles with compute and they stopped their coding plan (sold out!). Their architecture also hasn't been used with large models like GPT-5.2; the largest they support (if not quantized) is GLM 4.7, which is <500B params.
I'm pretty sure that Codex uses reasoning.encrypted_content=true and store=false with the responses API.
reasoning.encrypted_content=true - The server will return all the reasoning tokens in an encrypted blob you can pass along in the next call. Only OpenAI can decrypt them.
store=false - The server will not persist anything about the conversation on the server. Any subsequent calls must provide all context.
Combined, the two options above turn the Responses API into a stateless one. Without these options it will still persist reasoning tokens in an agentic loop, but it will be done statefully, without the client passing the reasoning along each time.
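A minimal sketch of what such a stateless request body might look like. The store and include fields mirror the two options described above; the model name and message contents are made-up placeholders, and the exact include syntax should be checked against the current API reference.

```python
# Sketch of a stateless Responses API request body, assuming standard
# Responses API field names; model name and contents are placeholders.

request = {
    "model": "gpt-5.2-codex",                    # hypothetical model name from the thread
    "store": False,                              # server persists nothing; client owns all state
    "include": ["reasoning.encrypted_content"],  # return reasoning as an encrypted blob
    "input": [
        {"type": "message", "role": "user", "content": "Refactor this function."},
        # On later turns, the client must send everything back itself,
        # including encrypted reasoning items from previous responses, e.g.:
        # {"type": "reasoning", "encrypted_content": "gAAAAB..."},
    ],
}
```

With both set, each call is self-contained: the server can be treated as a pure function of the request, and the client decides what context (including encrypted reasoning) survives between turns.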
I would see my context window jump in size after each user turn (e.g. from 70% to 85% remaining).
Built a tool to analyze the requests, and sure enough the reasoning tokens were removed from past responses (but only between user turns). Here are the two relevant PRs [0][1].
When trying to get to the bottom of it, someone from OAI reached out and said this was expected and a limitation of the Responses API (interesting sidenote: Codex uses the Responses API, but passes the full context with every request).
This is the relevant part of the docs[2]:
> In turn 2, any reasoning items from turn 1 are ignored and removed, since the model does not reuse reasoning items from previous turns.
[0] https://github.com/openai/codex/pull/5857
[1] https://github.com/openai/codex/pull/5986
[2] https://cookbook.openai.com/examples/responses_api/reasoning...
Generally, I have noticed GPT-5.2 Codex is slower than Sonnet 4.5 in Claude Code.
Or am I not understanding this right?
Call the model. If it asks for a tool, run the tool and call again (with the new result appended). Otherwise, done.
https://i.ytimg.com/vi/74U04h9hQ_s/maxresdefault.jpg
https://github.com/anthropics/claude-plugins-official/commit...
However, I decided to try Codex CLI after hearing they rebuilt it from the ground up in Rust (instead of JS; not implying Rust == better). Its performance is quite literally insane, and its UX is completely seamless. They even added small nice-to-haves like ctrl+left/right to skip your cursor to word boundaries.
If you haven't, I genuinely think you should give it a try; you'll be very surprised. I saw Theo (YC, Ping Labs) talk about how OpenAI shouldn't have wasted their time optimizing the CLI and should have made a better model or something. I highly disagree after using it.
But tbh, OpenAI openly supporting OpenCode is the bigger draw for me on the plan, though I do want to spend more time with native Codex as a basis of comparison against OpenCode when using the same model.
I’m just happy to have so many competitive options, for now at least.
- hooks (this is a big one)
- better UI to show me what changes are going to be made.
The second one makes a huge difference, and it's the main reason I stopped using OpenCode (there are lots of other reasons too). In CC, I am shown a nice diff that I can approve/reject. In Codex, the AI makes lots of changes but doesn't pinpoint what changes it's making or going to make.
For context, I work on SSL bioacoustic models.
I also was annoyed by Theo saying that.
However, it seems to really only be good at coding tasks. With anything even slightly out of the ordinary, like planning dialogue and plot lines, it almost immediately starts producing garbage.
I did get it stuck in a loop the other day. I half-assed a git rebase and asked Codex to fix it. It did eventually resolve all the rebased commits, but it just kept going. I don't really know what it was doing; I think it made up some directive after the rebase completed, and it just kept chugging until I pulled the plug.
The only other tool I've tried is Aider, which I have found to be nearly worthless garbage.
I have a tool that reduces agent token consumption by 30%, and it's only viable because I can hook the harness and catch agents being stupid, then prompt them to be smarter on the fly. More at https://sibylline.dev/articles/2026-01-22-scribe-swebench-be...
Codex works by repeatedly sending a growing prompt to the model, executing any tool calls it requests, appending the results, and repeating until the model returns a text response.
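That loop can be sketched in a few lines. This is a toy illustration of the control flow, not Codex's actual code: call_model and run_tool are hypothetical stand-ins for the model API and the tool executor, and the message shapes are made up.

```python
# Minimal agentic loop sketch: grow the transcript, execute requested
# tools, and stop once the model replies with plain text.
# call_model and run_tool are hypothetical stand-ins, not real APIs.

def agent_loop(call_model, run_tool, transcript):
    while True:
        reply = call_model(transcript)        # full, growing context on every call
        transcript.append(reply)
        if reply["type"] == "tool_call":
            result = run_tool(reply["name"], reply["args"])
            transcript.append({"type": "tool_result", "output": result})
        else:
            return reply["text"]              # a plain text reply ends the loop
```

Note that nothing is ever removed from the transcript inside the loop; that is exactly why the prompt only grows until something like compaction steps in.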