In all seriousness, there could be more utility in this if it helped explain the figures. I jumped ahead to one of the figures in the example video, and no real attention was given to it. In my experience, this is really where presentations live and die, in the clear presentation of datapoints, adding sufficient detail that you bring people along.
For papers, it doesn't have to go that far, but I imagine a polished AI girl (or guy) reading the summary would be more engaging.
Hah, "SteveGPT, present your PowerPoints like Steve Jobs did!"
Add sex and violence to make your boring paper reading sessions more exciting!
[1] https://store.steampowered.com/app/858260/Until_You_Fall/
(and I generally think AI-produced content is slop).
Another thing that improved my personal presentation skills was noting down why I liked a presentation or why I didn’t - what specific things a person did to make it engaging. Just paying attention to that improved my presentation skills enormously.
It also works with research papers.
Here is an explainer of the famous "Attention Is All You Need" paper: https://www.youtube.com/watch?v=7x_jIK3kqfA
(You can try it here https://magnetron.ai)
Congratulations on this cool idea and results.
Where can I follow the progress or get notified?
> Where can I follow the progress or get notified?
I send out product updates once a week or so. Will keep you posted.
1. Using a "painter commenter" feedback loop to make sure the slides are correctly laid out with no overflowing or overlapping elements.
2. Having the audio/subtitles not read word-for-word the detailed contents that are added to the slides, but instead rewording that content to flow more naturally and be closer to how a human presenter would cover the slide.
A couple of things might possibly be improved in the prompts for the reasoning features, e.g. in `answer_question_from_image.yaml`:
1. Study the poster image along with the "questions" provided.
2. For each question:
• Decide if the poster clearly supports one of the four options (A, B, C, or D). If so, pick that answer.
• Otherwise, if the poster does not have adequate information, use "NA" for the answer.
3. Provide a brief reference indicating where in the poster you found the answer. If no reference is available (i.e., your answer is "NA"), use "NA" for the reference too.
4. Format your output strictly as a JSON object with this pattern:
{
"Question 1": {
"answer": "X",
"reference": "some reference or 'NA'"
},
"Question 2": {
"answer": "X",
"reference": "some reference or 'NA'"
},
...
}
I'd assume you would likely get better results by asking for the reference first, and then the answer. Otherwise you probably have quite a number of answers where the model just "knows" the answer and takes it from its own training rather than from the image, which would bias the benchmark.
example: Geoff Hinton saying "Forward-forward Algorithm" with a long pause after the first "forward".
(first few seconds in the first demo on https://showlab.github.io/Paper2Video/)
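Coming back to the prompt-ordering point for `answer_question_from_image.yaml`: here is a rough sketch of what "reference first, then answer" could look like, plus a small check on the model output. The reordered pattern and the names `REFERENCE_FIRST_PATTERN` and `find_problems` are my own assumptions for illustration, not anything from the Paper2Video repo.

```python
import json

VALID_ANSWERS = {"A", "B", "C", "D", "NA"}

# Hypothetical reordered output pattern for the prompt:
# "reference" comes before "answer", so the model has to locate
# evidence in the poster before it commits to an option.
REFERENCE_FIRST_PATTERN = """{
  "Question 1": {
    "reference": "some reference or 'NA'",
    "answer": "X"
  },
  ...
}"""


def find_problems(raw_output: str) -> dict[str, str]:
    """Return {question: problem} for entries that break the pattern."""
    problems = {}
    # json.loads keeps object keys in the order the model emitted them.
    for question, entry in json.loads(raw_output).items():
        if list(entry) != ["reference", "answer"]:
            problems[question] = "expected keys in order: reference, answer"
        elif entry["answer"] not in VALID_ANSWERS:
            problems[question] = "answer must be A, B, C, D, or NA"
        elif (entry["answer"] == "NA") != (entry["reference"] == "NA"):
            # An answer without a reference suggests the model answered
            # from prior knowledge rather than from the poster.
            problems[question] = "answer and reference must both be NA or neither"
    return problems
```

Key order in the output only tells you the model generated the reference before the answer; whether that actually reduces answers drawn from training data would still need to be measured on the benchmark.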