So... HN came quickly to mind as a place where I can share a thought or a considered opinion and ask questions, with the potential to have them answered by very smart and knowledgeable folks on neutral ground. If you've made it this far into my comment, I already appreciate you. :)
Ok so... I've already disclaimed any authority, so I will get to my point and see what you guys can tell me. I read the paper (it is 80+ pages, so admittedly I skimmed some of the math, but I also re-read some passages to feel more certain that I understood what they are saying).
I understand the phenomenon, and have no reason to doubt anything they put in the paper. But, as I mentioned, while reading it I had some intangible gut "feelings" that the math backing their claims could not resolve for me. Maybe this is just because I don't understand the proofs. Still, I realized when I stopped reading that it wasn't anything they said that bothered me; it was what, to my naive brain, was not said and felt like it should have been.
I'll try to get to the point. I completely buy that reframing prompts can reduce mode collapse. But, as I understand it, the chat interface in front of the backend API of any LLM tested does not have insight into logits, probs, etc. The parameters passed with the request, and the probabilities returned with the generations (if the API request asks for them), do not leak into the chat conversation context in any way. So when you prompt an LLM to return a probability, it's responding with, essentially, the language about probabilities it learned during training. And it seems rather unlikely that many training datasets contain actual factual information about their own contents' distributions, so I don't see how the model, during training or RLHF, could "learn" any useful probabilistic information about its own training data.
So, a part of the paper I re-read more than once says (in 4.2): "Our method is training-free, model-agnostic, and requires no logit access." This statement is unequivocally true and honest, but (and I'm not trying to be rude or mean, I just feel like there is something subtle I'm missing or misunderstanding) it could also be true and honest if it said "Our method has no logit access, because the chat interface isn't designed that way". What immediately follows in my mind is: "the model learned how humans write about probabilities and will output a number that may be near to (or far away from) the actual probability of the token/word/sentence/whathaveyou, and we observed that if you prompt the model in a way that causes it to output a number that looks like a probability (some digits, a decimal somewhere), along with the requested five jokes, it has an effect on the 'creativity' of the list of five jokes it gives you."
So, naturally, one wonders what correlation, if any, there is between the numbers the LLM generates as "hallucinated" probabilities for the jokes it generated and the actual probabilities thereof. (I'm not trying to use "hallucinated" in a loaded way; it's just the term that everyone understands for this meaning, with no sentiment behind my usage here.) I did see that they measured empirical frequencies of generated answers across runs and compared that empirical histogram to a proxy pretraining distribution, and that they clearly state they did no comparison or correlation of the "probabilities" output by the model. So, without continuing to belabor the point, this is probably core to my confusion about how the paper frames what the phenomenon indicates.
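To make that concrete, here is a rough sketch of the kind of check I have in mind. It is not anything from the paper. It assumes you have already collected a bunch of runs, each a list of (answer, verbalized probability) pairs, and it uses how often an answer shows up across runs as a stand-in for its actual sampling likelihood (real logprobs would be better if you have API access to them). All the function names and the data layout are made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def verbalized_vs_empirical(runs):
    """runs: list of runs; each run is a list of (answer, verbalized_prob).

    Assumption: answers are already normalized so identical jokes compare equal.
    """
    counts = Counter()
    stated = defaultdict(list)
    for run in runs:
        for answer, prob in run:
            counts[answer] += 1
            stated[answer].append(prob)
    total = sum(counts.values())
    answers = sorted(counts)
    empirical = [counts[a] / total for a in answers]   # selection frequency
    verbalized = [mean(stated[a]) for a in answers]    # mean stated "probability"
    return answers, empirical, verbalized, spearman(empirical, verbalized)
```

If the rank correlation came out near zero, that would suggest the numbers are decoration; if it came out strongly positive, the model's stated "probabilities" would at least track what it tends to produce.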
It is hard for me to stop asking all the slight variations on these questions that led me to write this, but I will stop and try to get to a TL;DR that I think dear HN readers may appreciate more than my exposition of befuddlement bordering on dubiousness:
I guess the TL;DR of my comment is that I am curious whether the authors examined any relationship between the LLM-verbalized "probabilities" and actual model sampling likelihoods (logprobs or selection frequency). I am not convinced that the verbalized "probabilities" themselves are doing any work other than functioning as token noise or prompt reframing.
I didn't see a control for, or even a comparison against, multi-slot prompts with arbitrary labels or non-semantic "decorative" annotation. In my experience poking and prodding LLMs as a user, trying to influence generations in specific and sometimes unknown ways, even lightweight slotting without probability language substantially reduces repetition. That makes me wonder how much of the gain from VS is attributable to task reframing, as opposed to the probability verbalization itself.
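For what it's worth, the control I'm imagining could be sketched like this. The prompt wording is my own invention (not the paper's prompts), and the helper names are hypothetical. The idea is just to hold the multi-slot request fixed and vary only the per-item annotation, then score each condition with the same crude repetition metric.

```python
def build_prompts(task, k=5):
    """Return three prompt variants that differ only in the per-item annotation.

    Assumption: these wordings are illustrative, not taken from the paper.
    """
    base = f"{task} Give {k} different responses, one per line."
    return {
        # Multi-slot request with no annotation at all.
        "plain_slots": base,
        # Multi-slot request with a non-semantic, decorative annotation.
        "arbitrary_label": base
        + " Prefix each response with an arbitrary two-letter tag.",
        # Multi-slot request with a verbalized probability.
        "verbalized_prob": base
        + " Prefix each response with its probability as a number between 0 and 1.",
    }


def distinct_ratio(responses):
    """Fraction of responses that are unique after lowercasing and
    collapsing whitespace. A crude proxy for (lack of) repetition."""
    if not responses:
        return 0.0
    normalized = [" ".join(r.lower().split()) for r in responses]
    return len(set(normalized)) / len(normalized)
```

You would send each variant to the same model many times, strip the annotations, pool the responses per condition, and compare distinct_ratio across the three. If "arbitrary_label" and "plain_slots" get most of the way to "verbalized_prob", the reframing is doing the work.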
This may not even be a topic of interest for anyone, and maybe nobody will even see my comment/questions, so I'll stop for now... but if anyone has insights, clarifications, or can point out where I'm being dense, I actually have quite a bit more to say and ask about this paper.
I can't really explain why I just had to see if I could get another insightful opinion on this paper. I usually don't have such a strong reaction when reading academic papers I may not fully understand, but there's some gap in my knowledge (or, less likely, there's something off about the framing of the phenomenon described), and it's causing me to really hope for discussion, so I can ask my perhaps even less-qualified questions pertaining to what boils down to mostly just my intuition (or maybe incomprehension. Heh.).
Thanks so much if you've read this and even more if you can talk to me about what I've used too many words to try to convey here.