2 points by steveharing13 hours ago | 1 comment

throw3108223 hours ago
If I understand it correctly, this is based on the "RYS" architecture (or findings) by David Ng? ( https://dnhkng.github.io/posts/rys/ )
And, related: if small subsets of layers inside an LLM can be looped to improve its reasoning, and if the best layers to loop depend on which competencies the LLM is using in a given context, has anyone tried to build and train an LLM that decides for itself which layers to loop and how many times?
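
To make the question concrete, here is a toy sketch of what I mean. It is not the RYS code and not a real model: the layers are plain functions on a list of floats, and `looped_order`, `forward`, `make_layer` and `toy_router` are all hypothetical names I made up. In a real system the router would presumably be a small learned module; here it is a fixed rule, only to show where the decision would plug in.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def looped_order(n_layers: int, start: int, end: int, repeats: int) -> List[int]:
    """Layer indices visited when the block [start, end) runs `repeats` times."""
    if not (0 <= start < end <= n_layers) or repeats < 1:
        raise ValueError("invalid loop span or repeat count")
    block = list(range(start, end))
    return list(range(start)) + block * repeats + list(range(end, n_layers))


def forward(layers: Sequence[Layer], x: Vector, order: Sequence[int]) -> Vector:
    """Apply the layers in the given order; an index may appear more than once."""
    for i in order:
        x = layers[i](x)
    return x


def make_layer(scale: float) -> Layer:
    """Toy residual layer (x + scale * x), a stand-in for a transformer block."""
    return lambda x: [v + scale * v for v in x]


def toy_router(x: Vector, n_layers: int) -> Tuple[int, int, int]:
    """Hypothetical router: returns (start, end, repeats) for this input.

    Assumption for illustration only: always loop the middle third of the
    stack, and loop it twice when the input has a large component.
    """
    start, end = n_layers // 3, (2 * n_layers) // 3
    repeats = 2 if max(abs(v) for v in x) >= 1.0 else 1
    return start, end, repeats


if __name__ == "__main__":
    layers = [make_layer(0.1) for _ in range(6)]
    x = [2.0, -0.5]
    start, end, repeats = toy_router(x, len(layers))
    order = looped_order(len(layers), start, end, repeats)
    print(order)
    print(forward(layers, x, order))
```

The point of separating `looped_order` from `forward` is that the weights never change; only the execution order does, so a router just has to emit a span and a repeat count per context.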