by SXX 9 hours ago
Hey, I just ran a simple test: I downloaded a 5-minute YouTube video and uploaded it to the Gemini app.
Source video title: Zelda: Breath of the Wild - Opening five minutes of gameplay
https://www.youtube.com/watch?v=xbt7ZYdUXn8
Prompt:
Please describe what happening in each scene of this video.
List scenes with timestamp, then describe separately:
- Setup and background, colors
- What is moving, what appear
- What objects in this scene and what is happening,
Basically make desceiption of 5 minutes video for a person who cant watch it.
The result is in a GitHub gist since there is too much text: https://gist.github.com/ArseniyShestakov/43fe8b8c1dca45eadab...
I'd say this is quite accurate.
Another example, with a completely random 10-minute benchmark video from Tears of the Kingdom:
https://gist.github.com/ArseniyShestakov/47123ce2b6b19a8e6b3...