The idea of detecting frames and using them to tease out the implicit meaning from text is quite nice. It seems there is a lot more to discover about using LLMs prior to RAG. Text is like code: you can't know what it does until you run it, and in this case, until you annotate it. For example, "10+10" won't embed close to "20", and "The fifth letter in this string" won't retrieve "f" by embedding similarity.
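A toy sketch of the "run the text before indexing" idea (the function names and heuristics here are my own, not from any library): if a chunk is a pure arithmetic expression, append its evaluated value to the text that gets embedded, so "10+10" can actually match a query for "20".

```python
import ast
import operator

# Map AST operator types to the corresponding arithmetic functions.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    """Safely evaluate a pure-arithmetic AST node (no names, no calls)."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("not a pure arithmetic expression")

def annotate(text: str) -> str:
    """Return the text plus its evaluated form, if it is arithmetic;
    otherwise pass the text through unchanged."""
    try:
        value = _eval(ast.parse(text, mode="eval").body)
    except (ValueError, SyntaxError):
        return text
    return f"{text} = {value:g}"

print(annotate("10+10"))        # "10+10 = 20"
print(annotate("hello world"))  # unchanged
```

Obviously real pipelines would use an LLM pass rather than `ast`, but the shape is the same: enrich the chunk with its implied content, then embed the enriched form.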
Let's say an LLM, while training, is ingesting a very long book.
The name of the author of the book would appear at the very beginning.
At inference time, how does the LLM determine that the last chapter of the book was written by that author, and hence that the chunk should sit near that author's style?
An example of a frame is an "Event" - https://framenet.icsi.berkeley.edu/fnReports/data/frameIndex... - where:
> An Event takes place at a Place and Time.
So if you're extracting frames from a piece of text, that's one of the concepts you might be trying to identify - along with what the place and time are.
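As a minimal sketch of what "extracting a frame" could look like in code (the class name, field names, and regex heuristics are my own inventions, not FrameNet's API): an Event frame with Place and Time elements, filled by naive pattern matching.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class EventFrame:
    """An 'Event' frame: something takes place at a Place and Time."""
    sentence: str
    place: Optional[str] = None
    time: Optional[str] = None

def extract_event(sentence: str) -> EventFrame:
    # Naive heuristics: "in <Capitalized Place>" and "on/at <Time>" phrases.
    place = re.search(r"\bin ([A-Z][\w ]*?)(?= on| at|[.,]|$)", sentence)
    time = re.search(r"\b(?:on|at) ([\w: ]+?)(?=[.,]|$)", sentence)
    return EventFrame(
        sentence=sentence,
        place=place.group(1) if place else None,
        time=time.group(1) if time else None,
    )

frame = extract_event("The conference takes place in Berlin on 3 June.")
print(frame.place, frame.time)  # Berlin 3 June
```

A real system would use a semantic-role labeler or an LLM to fill the frame elements; the point is just that the output is a structured object, not a bag of embeddings.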
> Frames are an artificial intelligence data structure used to divide knowledge into substructures by representing "stereotyped situations".
They are closely related to ontology systems and knowledge engineering.
> Frames ... are conceptual structures that capture the semantic and syntactic relationships underlying language. They are helpful in providing a structured semantic context for understanding relationships between entities, enabling tasks like Machine Reading Comprehension and Information Extraction to be more accurate and contextually aware.