From today's report: three separate papers on stabilizing RL fine-tuning for LLM reasoning landed from three different groups. STAPO silences rare token probability spikes during training, Experiential RL adds memory of past feedback to handle sparse rewards, and TAROT uses test-driven curriculum RL for code generation. Read individually, they look like three unrelated papers. Clustered together, they tell you the field is stuck on the same problem.
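(For anyone wondering what "silencing rare token probability spikes" could look like mechanically, here's a rough sketch of the general idea, assuming a PPO-style per-token policy-gradient loss over tensors of token log-probs and advantages. This is my own illustration of spike-masking, not STAPO's actual algorithm; `masked_pg_loss` and `spike_threshold` are made-up names.)

```python
import torch

def masked_pg_loss(logprobs, old_logprobs, advantages, spike_threshold=10.0):
    # Per-token importance ratio between the current policy and the rollout policy.
    ratio = torch.exp(logprobs - old_logprobs)
    # Drop tokens whose ratio spikes past the threshold so a single rare
    # token can't dominate the gradient (threshold value is illustrative).
    keep = (ratio < spike_threshold).float()
    # Average the surviving terms; the clamp avoids division by zero if
    # every token in the batch happened to be masked out.
    return -(keep * ratio * advantages).sum() / keep.sum().clamp(min=1.0)
```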
Separately, a paper on "Learning to Configure Agentic AI Systems" and a Reddit post analyzing 44 agent frameworks surfaced within hours of each other, and both independently identified context management as the key bottleneck. Neither referenced the other.
That's what the site is built to find. Curious if these match what people here are seeing in their own work/pubs/research.