Definitely not; that works on any image, no matter the source. Also, depth probably wouldn't help you that much: practically everything would include the floor or table it was sitting on in that case.
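To make the depth point concrete, here's a toy sketch. Everything in it is made up for illustration: the depth values are synthetic, and `grow_region` is a hypothetical helper, not a function from any library. It assumes a camera looking at a box standing on a floor, so the floor's depth meets the box's depth at the contact line. Growing a region from a pixel on the box by depth similarity alone then leaks into the floor.

```python
from collections import deque

def grow_region(depth, seed, tol):
    """Flood-fill from seed, joining 4-neighbours whose depth differs by <= tol."""
    h, w = len(depth), len(depth[0])
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen:
                if abs(depth[nr][nc] - depth[r][c]) <= tol:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen

# Synthetic 10x9 depth map (assumed values, arbitrary units).
H, W = 10, 9
depth = [[20.0] * W for _ in range(H)]      # far background wall
for r in range(2, 6):                        # the box: rows 2-5, cols 3-5
    for c in range(3, 6):
        depth[r][c] = 5.0
for r in range(6, H):                        # floor: gets closer toward the bottom
    for c in range(W):
        depth[r][c] = 5.0 - 0.5 * (r - 6)    # equals the box depth where they touch

region = grow_region(depth, seed=(3, 4), tol=0.6)
object_pixels = {(r, c) for r in range(2, 6) for c in range(3, 6)}
floor_in_region = {p for p in region if p[0] >= 6}

print(len(object_pixels), len(region), len(floor_in_region))
```

The box is only 12 pixels, but the grown region comes back with 48, because all 36 floor pixels get pulled in through the contact line. The far wall is the only thing depth separates cleanly here.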
That would be a machine-learning-only approach: semantic segmentation.