From the Gemini 1.0 Pro API docs (which may not be the same as Gemini 1.5 in Dat...

From the Gemini 1.0 Pro API docs (which may not be the same as Gemini 1.5 in Data Studio): https://cloud.google.com/vertex-ai/docs/generative-ai/multim...

> The model processes videos as non-contiguous image frames from the video. Audio isn't included. If you notice the model missing some content from the video, try making the video shorter so that the model captures a greater portion of the video content.

> Only information in the first 2 minutes is processed.

> Each video accounts for 1,032 tokens.

That last point is weird because there is no way a video would be a fixed amount of tokens and I suspect is a typo. The value is exactly 4x the number of tokens for an image input to Gemini (258 tokens) which may be a hint to the implementation.