I concur, and as you say, it comes from a video frame and thus from a video. The fact that the video frame contains only a single pixel seems to change nothing.
If I were to agree with this, then would you be willing to agree that the single-pixel ambient light sensor adorning many pocket supercomputers is a camera?
And that recording a series of samples from this sensor would result in a video?
If there is no lower bound on the size of the image that constitutes a frame, then please find the following pictorial summation of my thoughts on this matter to be a sufficient response to your question.
That term is "pixel".