We might be really far from an ASI that completely groks every nuance of that picture... but we already have AI that can understand pieces of the picture... and those AIs can be used to do interesting things today. This isn't an all-or-nothing endeavor.
A properly trained AI would have recognized Obama in that picture much faster than I did. As for the foot on the scale, I only noticed it when the author mentioned it.
So much for the 'quick glance'. Which brings me to another matter. One of the reasons the author can extract all that information from that picture is that all the elements in it have been 'seen' already. A machine might not be able to extract the whole context, but things like the people involved and that they seem happy? Easy(-ish).