There's going to be some bias in the photos and video available as training data. People don't normally keep shots where the person who's the focus has their face heavily in shadow.
The video shown had no monsters, magic, or anything similar. I do wonder how you'd handle that sort of thing, given that there can't be any real images of it to use for training.