  • vukadinovic, 7 hours ago
    Inspired by Karpathy's nanochat, I developed a minimal, from-scratch implementation of a visual language model (VLM) for report generation from medical images. AI models in healthcare are rarely fully open-source, which makes it hard for researchers to adapt them to their own data and raises the barrier to entry for newcomers to the field. echovlm is fully open-source, and I provide a complete pipeline to train the model end-to-end on 120k publicly available imaging reports and matched synthetic videos, for just $5 in 2 hours. To demonstrate echovlm in practice, I've included an inference example using a real video of my own heart. I'm looking forward to researchers using this in their experiments and students using it to learn about VLMs for medicine, and I welcome pull requests from contributors.
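    For anyone new to VLMs, here is a minimal, self-contained sketch of the general pattern such a model follows. This is not echovlm's actual code; every module name, dimension, and the toy tokenizer-free setup below are illustrative assumptions. The idea: encode the video frames, project them into the language model's embedding space as a visual prefix, then decode report tokens autoregressively behind a causal mask.

        # Illustrative sketch only -- sizes, names, and the toy frame
        # encoder are assumptions, not echovlm's real implementation.
        import torch
        import torch.nn as nn

        D = 128       # shared embedding width (assumed)
        VOCAB = 1000  # toy vocabulary size (assumed)

        class TinyVLM(nn.Module):
            def __init__(self):
                super().__init__()
                # Toy frame encoder: flatten each 32x32 grayscale frame.
                self.frame_proj = nn.Linear(32 * 32, D)
                self.tok_emb = nn.Embedding(VOCAB, D)
                self.pos = nn.Parameter(torch.zeros(1, 64, D))  # learned positions
                layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
                self.backbone = nn.TransformerEncoder(layer, num_layers=2)
                self.head = nn.Linear(D, VOCAB)

            def forward(self, video, tokens):
                # video: (B, T, 32, 32) frames; tokens: (B, L) token ids
                vis = self.frame_proj(video.flatten(2))   # (B, T, D) visual prefix
                txt = self.tok_emb(tokens)                # (B, L, D) text embeddings
                x = torch.cat([vis, txt], dim=1)
                x = x + self.pos[:, : x.size(1)]
                n = x.size(1)
                # Causal mask: each position attends only to earlier ones.
                causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
                h = self.backbone(x, mask=causal)
                return self.head(h[:, video.size(1):])    # logits for text positions

        @torch.no_grad()
        def generate(model, video, bos=1, max_len=20):
            tokens = torch.full((1, 1), bos, dtype=torch.long)
            for _ in range(max_len):
                logits = model(video, tokens)
                nxt = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy decoding
                tokens = torch.cat([tokens, nxt], dim=1)
            return tokens

        model = TinyVLM().eval()
        clip = torch.randn(1, 8, 32, 32)  # 8 random frames standing in for an echo clip
        print(generate(model, clip))

    A real system would swap in a proper video encoder, a trained tokenizer, and learned weights, but the prefix-conditioning shape stays the same.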