LLMs are far from consistent.
This works in my experience
Saw another comment on a different platform where someone floated the idea of dynamically injecting context with hooks in the workflow to make things more deterministic.
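That hook idea could look something like the sketch below: a prompt-submission hook script whose stdout gets added to the model's context. This is illustrative only — it assumes the hook payload arrives as JSON on stdin with the user's prompt in a `prompt` field and that printed output is injected as extra context; the keyword rules are invented for the example.

```python
#!/usr/bin/env python3
# Sketch of a context-injecting hook (assumed payload shape; rules invented).
import json
import sys

# Hypothetical keyword -> context snippets; in practice these could be
# loaded from files checked into the repo.
CONTEXT_RULES = {
    "resume": "Users rewrite their resume ~20x per job search; optimize for fast iteration.",
    "pdf": "PDF export must visually match the live preview.",
}

def select_context(prompt: str) -> list[str]:
    """Return the context snippets whose keyword appears in the prompt."""
    p = prompt.lower()
    return [ctx for kw, ctx in CONTEXT_RULES.items() if kw in p]

if __name__ == "__main__":
    event = json.load(sys.stdin)                 # hook payload from the agent CLI
    extra = select_context(event.get("prompt", ""))
    if extra:
        print("\n".join(extra))                  # stdout is injected as added context
```

The appeal is that the same prompt always pulls in the same context, which is one way to chip away at the inconsistency.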
The sessions where I gave Claude Code context about the problem space (job seekers rewriting resumes 20x) produced dramatically better results than sessions where I just said "build a resume generator." The AI designed a conversational intake flow, live resume preview with real-time updates, and PDF export - things I hadn't explicitly asked for but that emerged from understanding the problem.
Would love to see Rudel break down sessions by "context richness" vs output quality. My gut says the first 60 seconds of context-setting predicts the entire session's productivity.
Built on SuperNinja (no-code AI app platform): https://super.myninja.ai/apps/6de082c7-a05f-4fc5-a7d3-ab56cc...
Does this include the files being worked on by the agent in the session, or just the chat transcript?
If you don't trust us with that data though (which I can understand), you can host the thing locally on your machine.
Starting new sessions frequently and using separate new sessions for small tasks is a good practice.
Keeping context clean and focused is a highly effective way to keep the agent on task. Having an up-to-date AGENTS.md should let new sessions get into simple tasks quickly, so you can use single-purpose sessions for small tasks without carrying the baggage of a long past context into them.
I have longer threads that I don't want to pollute with side quests. I will pull up multiple other chats and ask one or two questions about completely tangential or unrelated things.
It seems to me that sometimes it's better and more effective to remove, clean up, and simplify (both in CLAUDE.md and in the code) rather than documenting everything in detail.
So from session analysis, it would be interesting to identify the relationship between the documentation in CLAUDE.md and model efficiency. How often does the developer reject the LLM's output, relative to the level of detail in CLAUDE.md?
I do not see any link or source for the data. I assume it is to remain closed, if it exists.
But I think the prior on "this team fabricated these findings" is very low.
TBH, I am very hesitant to upload my CC logs to a third-party service.
Thanks for the link - sounds great!
I scrolled through and didn't see enough to justify installing and running a thing.
With this data, you can measure whether you are spending too many tokens on sessions, how successful sessions are, and what makes them successful. Developers can also share individual sessions they struggled with among their peers, spreading learnings and helping others avoid the same errors.
No, thanks
Or you can run your own instance, but we will need to add docs on how to configure the endpoint properly in the CLI.
Would love to know your actual day-to-day use case for what you built.
I would say a roughly equal number of sessions between them (very roughly).
Also, maybe 40% of coding sessions are in a large brownfield project, 50% are greenfield, and the remaining 10% are non-coding tasks.
What tools do you use to run your analysis?
So the skills are mostly a sort of on-demand AGENTS.md specific to the task.
Another example is I have a `plan-review` skill, so when planning something I add at the end of the prompt something like: "plan the task, .... then launch claude and codex /plan-review agents in parallel and take their findings into account before producing the final plan".
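For anyone who hasn't set one up: a skill like that is basically a markdown file the agent loads on demand. A rough sketch (the frontmatter fields and instructions here are illustrative, not the commenter's actual skill):

```markdown
---
name: plan-review
description: Review an implementation plan for gaps, risks, and missing steps.
---

When given a plan, check it for:
- steps that touch files or systems the plan never mentions
- missing tests or rollback considerations
- assumptions that should be verified before implementation

Report findings as a short list; do not rewrite the plan.
```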
> When Claude Code gets stuck in a loop, tries an unexpected tool chain, or produces inconsistent outputs under adversarial prompts — those aren't just UX failures, they're security surface area.
Twice in one paragraph, not even trying to blend in.
It became very hard to understand what exactly is sent to the LLM as input/context and how exactly the output is processed.