2 points by yvonboulianne 7 hours ago | 1 comment
  • gisanokharu 4 hours ago
    interesting take. curious what you mean by "format" specifically - is this about tokenization, autoregressive next-token prediction, or something else? my experience is that hallucinations are worse on sparse facts that require precise recall than on things that can be derived. the model knows it doesn't know, but the training pushes it to complete the sequence either way. is that what you mean, or is this a different angle?
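    to make the "pushes it to complete the sequence either way" point concrete, here's a minimal toy sketch (hypothetical vocab and logits, not any real model): decoding turns logits into a probability distribution over tokens and then must sample *some* token - there's no built-in abstain action, so even a near-uniform (i.e. clueless) distribution produces a confident-looking answer.

    ```python
    import math
    import random

    # toy vocabulary -- purely illustrative, not a real tokenizer
    vocab = ["Paris", "Lyon", "Nice", "Rome"]

    def softmax(logits):
        # standard numerically-stable softmax
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        return [e / s for e in exps]

    # near-uniform logits: the model is effectively saying "I don't know"
    uncertain_logits = [0.10, 0.00, 0.05, 0.02]
    probs = softmax(uncertain_logits)

    # all probability mass must land on tokens; the distribution sums to 1
    assert abs(sum(probs) - 1.0) < 1e-9
    # no token dominates -- the model has no real preference here
    assert max(probs) < 0.3

    # yet sampling is forced to commit to one concrete token anyway
    random.seed(0)
    token = random.choices(vocab, weights=probs)[0]
    print(token)  # a definite-sounding answer from an indefinite distribution
    ```

    the point being: nothing in the decoding loop itself rewards saying "i'm not sure" - an abstention only happens if tokens expressing uncertainty happen to outscore a fabricated completion.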