Very few sites have everything on a CDN, and that's not really possible for a large class of websites. So they can get the IP addresses of the web servers, and with reverse DNS they can often get the domain name from those. They can also analyze the number of incoming versus outgoing packets to make certain inferences, like being able to tell whether you're watching a video on Facebook or uploading one. That last one isn't that useful for advertising or anything, though, and requires TCP sessionization, which is tricky to scale, so I doubt they would do that.
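
To make the incoming-versus-outgoing idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `Packet` fields, the flow key, and the 5x threshold are made up, and it sums bytes per direction rather than raw packet counts, since TCP ACKs travel in both directions. Real sessionization would also have to track connection setup and teardown, timeouts, and reordering, which is where the scaling pain comes from.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical record of one observed packet (not a real capture format).
    client: str       # subscriber-side IP address
    server: str       # remote web server IP address
    server_port: int  # remote port, e.g. 443
    outgoing: bool    # True if the client sent it
    size: int         # bytes carried


def sessionize(packets):
    """Group packets into flows and total the bytes in each direction.

    Returns {(client, server, server_port): [bytes_out, bytes_in]}.
    This is a deliberately naive stand-in for TCP sessionization.
    """
    flows = defaultdict(lambda: [0, 0])
    for p in packets:
        key = (p.client, p.server, p.server_port)
        flows[key][0 if p.outgoing else 1] += p.size
    return dict(flows)


def guess_direction(bytes_out, bytes_in, ratio=5.0):
    """Label a flow by which direction dominates (threshold is arbitrary)."""
    if bytes_out > ratio * bytes_in:
        return "mostly upload"
    if bytes_in > ratio * bytes_out:
        return "mostly download"
    return "mixed"


if __name__ == "__main__":
    # Ten large packets from the server, five small ones from the client.
    sample = (
        [Packet("10.0.0.2", "203.0.113.5", 443, False, 1400)] * 10
        + [Packet("10.0.0.2", "203.0.113.5", 443, True, 60)] * 5
    )
    for key, (out_bytes, in_bytes) in sessionize(sample).items():
        print(key, guess_direction(out_bytes, in_bytes))
```

Even this toy version keeps per-flow state for every connection it sees, which hints at why doing it for every subscriber at line rate is expensive.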