2 pointsby badmonstera day ago1 comment

badmonstera day ago
a subtle but powerful insight: large multimodal models like CLIP don’t just learn individual concepts. they also depend heavily on how often those concepts appear together during training.