Tracking users via CSS (underjord.io)
203 points by lawik on Sept 15, 2020 | hide | past | favorite | 116 comments


What extra data can you track using this method over just normal HTTP logging?

The only things I can think of are how a user interacts with a page - I don't particularly think this is too concerning - although as with all these things there are possibly much more creative uses of it that I haven't considered.

There's a new image attribute, loading="lazy", which generally loads an image when it approaches the viewport. This could also be "abused" in similar ways.

If this does turn out to be a privacy concern, browser settings/privacy addons could simply load all lazy images or images referred to in CSS/JS files on load, which would nullify this technique.


You can mostly track the interactions. Just the fact that CSS was loaded distinguishes the user from a lot of automated traffic. You should be able to track time spent on the page with CSS animations as well, up to a point; that's mentioned in the post.

I don't think you can do anything particularly nasty even with CSS variable programming, which can apparently be used for interactive games (https://github.com/propjockey/css-sweeper). While researching this post I couldn't come up with a non-JS way to move any significant amount of data into CSS.
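The animation-based duration tracking mentioned above can be sketched in a few lines. This is a hypothetical example (URLs, duration, and bucket boundaries are invented): assuming the browser only fetches each keyframe's background image when the animation reaches it, the server log records how long the tab stayed open, in coarse buckets.

```css
/* Sketch: one request per time bucket, no JS involved.
   All URLs and timings here are illustrative. */
body {
  animation: time-buckets 180s step-end;
}

@keyframes time-buckets {
  0%    { background-image: url("/ping/0s.gif"); }
  33.3% { background-image: url("/ping/60s.gif"); }
  66.6% { background-image: url("/ping/120s.gif"); }
  100%  { background-image: url("/ping/180s.gif"); }
}
```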


HTTP logging can give a pretty strong indicator of time spent on pages (not perfect, but good with large volumes) as well as navigated routes through the site already.

It would seem to me that the only advantage of this technique is to track mouse movements on the page in a very low-resolution, and likely labour-intensive, way. Low resolution because it only works the first time on each page load, and it is going to be tricky to get any meaningful granularity out of the data.

I am not particularly concerned about this but my privacy concerns are definitely lower than the general consensus of this site. I only compare to HTTP logging as this is the most hidden/covert way of tracking users.


I used this (via :hover) to play with mouse movement tracking. The resolution was significantly worse than the JS approach. But it worked, and even allowed running some ML analysis on mouse movement patterns. That was about 10 years ago, so I no longer have the code/site.


Another trick is to use JavaScript to check if images have a size. Most bots don't download those assets to save bandwidth.


There are probably quite a few fingerprinting surfaces you could pull out of CSS. For example (these are educated guesses):

* Track what fonts a user has installed by asking for preinstalled fonts and using loaded fonts as a fallback

* Track screen width and height by conditionally loading an image

* Track screen height vs window height (linked to OS and various user settings)
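The screen-dimension guesses above map directly onto media queries. A hedged sketch (bucket boundaries and URLs are invented): at most one of these rules matches, so the single request that arrives reveals the viewport-width bucket.

```css
/* Sketch: each visitor triggers exactly one of these requests,
   leaking a coarse width bucket. URLs are illustrative. */
@media (max-width: 768px) {
  body { background-image: url("/px/w-small.gif"); }
}
@media (min-width: 769px) and (max-width: 1440px) {
  body { background-image: url("/px/w-medium.gif"); }
}
@media (min-width: 1441px) {
  body { background-image: url("/px/w-large.gif"); }
}
```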


Firefox won't do lazy loading if JS is disabled, because it doesn't want lazy loading to work as a fallback page depth tracker. If JS is enabled, all bets are off, so there's not much point in using lazy loading instead of just polling/intersection observer.


Generally yes. If you stick just to CSS you can log HTTP requests initiated by the user's interaction or the user's environment. To name some environmental signals: locally installed fonts, viewport dimensions, colour depth, resolution, etc. Interaction covers anything pointer-position related, focus management, and to some degree simple kinds of "keylogging" or text character presence detection.

https://github.com/jbtronics/CrookedStyleSheets https://news.ycombinator.com/item?id=16157773
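The "keylogging" idea works via attribute selectors, along the lines of what CrookedStyleSheets demonstrates. A minimal sketch (URLs invented); note it only fires for characters that get mirrored into the DOM's value attribute, e.g. by a framework that syncs input state back to attributes:

```css
/* Sketch: one rule per character/prefix of interest; the request
   that fires tells the server what the field starts with.
   URLs are illustrative. */
input[value^="a"] { background-image: url("/key/a.gif"); }
input[value^="b"] { background-image: url("/key/b.gif"); }
input[value^="c"] { background-image: url("/key/c.gif"); }
/* ...and so on for each character to probe */
```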


Almost nobody turns off JavaScript because it breaks the whole web. You can't even view YouTube.

This is less interesting than existing JavaScript techniques to identify or filter out crawlers. It works, when the image is not cached, but it's fundamentally inferior to anything you can do with JavaScript, and if you don't have JavaScript at all, I don't see why you would want to care. Just lump that little bit of traffic in with the bots for analytics purposes.


I use NoScript and only turn on Javascript if I have to. It happens less often than you think. Most sites will load enough for you to read them without JS.


You can use CSS to identify fonts, and feature detection (à la Modernizr) to uniquely identify browsers and users. You can track clicks, mouse-over, language, and screen size; time spent can be calculated with animations. Pretty much anything you would do with JavaScript, except canvas fingerprinting.

The big advantage of this is that you can still track and uniquely identify people even if they turn off cookies and javascript.
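Font identification without JS can be sketched with @font-face and a local() source (the font name and URL here are illustrative): if the font is installed locally, the remote fallback is never requested, so the *absence* of that request tells the server the font is present.

```css
/* Sketch: probe for a locally installed font. If local() resolves,
   the url() fallback is never fetched. URL is illustrative. */
@font-face {
  font-family: "probe-comic-sans";
  src: local("Comic Sans MS"),
       url("/fonts/probe-comic-sans-missing.woff2") format("woff2");
}

.font-probe {
  font-family: "probe-comic-sans", sans-serif;
}
```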


> The only things I can think of are how a user interacts with a page - I don't particularly think this is too concerning

Except that many people don't want others to know how much they read something... I don't care, but many believe that is private.

> If this does turn out to be a privacy concern, browser settings/privacy addons could simply load all lazy images or images refered to in CSS/JS files on load which would nullify this technique.

So you load more invisible pixels and make this even more effective, as now you'll be able to get much more granular data, like the scroll position!


By “normal HTTP logging” what do you mean exactly? Most logging is done with JavaScript, and this is JS-free, would even work in an environment with JS disabled.


HTTP logging is what an HTTP server does. It doesn't require JavaScript. Fetching a resource from a server with access logging enabled will record the what and the who, if configured properly. The logs typically end up in a file on the server's filesystem.


Thanks for clarifying — but this doesn't seem particularly useful for many types of tracking. Measuring things like bounces, scroll depth, engagement on modules, etc, requires logging post-load.


You can disable lazy loading images in about:config in Firefox


Ad-blockers have been around for years and I have always wondered why the ad industry has not moved server-side yet. Then I realized that maybe they weren't hit by this at all; in fact it may have helped with that 50% of budget they often claim goes to waste in advertising.

Much like Nigerian scams self-select the most naive users with their silly stories, some advertisers may well get better impression/click ratios once the savvy users are out of their game.


The ad industry has overwhelmingly moved to server side. Google AdWords, Facebook ads, Instagram ads, sponsored search results on travel sites. Server side means you need more trust that it’s an actual user and not a server farm, so it’s less usable outside the walled gardens.


This is also part of why uBlock and others blocking ad elements via heuristics as well as URL patterns is vital and almost certainly a chunk of why Google doesn't look favorably on those techniques.


Server side tracking would mean that ad companies don't have access to cookies + user fingerprints, so they would be less effective at serving targeted ads.


Why so?

Do the tracking inside the browser, send it back, render the ad server side and send it to the user?

It's already being implemented in many places.


Most ad blockers also block the tracking, not just the displaying of ads, so the benefit here would likely be marginal.

That said, I think minimally-tracked (i.e based on what the website you're using knows about you) server side ads would be a great step forward for both user privacy + ethical content consumption, I just don't see advertisers jumping at the opportunity.


User goes to shoes.com, which loads ads from adserver.com, which sets a tracking cookie on domain adserver.com

User goes to news.com, which loads ads from adserver.com. Adserver.com sees the cookie, and shows the ads about shoes.

If you do everything on the server side, shoes.com won't be able to place a global tracking cookie. So when the user goes to news.com, they'll have to see the generic ad, and not a targeted one.


User goes to shoes.com. Adserver.shoes.com acts as frontend for adserver.com. Adserver.com sends a cookie and adserver.shoes.com sets it.

User goes to news.com. adserver.news.com loads ads from adserver.com. Adserver.com checks the cookie from adserver.shoes.com, generates ads server side and shows them with the news.


> Adserver.com checks cookie from adserver.shoes.com

1. This is not possible from server-side. The way cookies work, cookies set by adserver.shoes.com are only visible from documents hosted on *.shoes.com domains. And if the tracking is done client-side, by image, iframe or some other remote call, then this is trivially detected and blocked by adblocker.

2. How would "adserver.com" know to check for "adserver.shoes.com" cookies? There might be hundreds of clients, and trying to load cookies from every domain (adserver.shoes.com, adserver.computers.com, adserver.travel-to-india.com and so on) would take far too long.


Do you think the ads are ‘rendered’ client side? Also you have to specify what server side means because the ad networks don’t trust the web sites their ads are on.


By server side, I mean, everything is loaded on servers as a single file and served as a blob that adblockers can't block or read.

Like Google AMP.

Client-side rendering can easily be blocked by just domain blocking.


It depends what "this" refers to.

If it's CSS, then no.

If it's loading an external image, then also no. Certainly no more evil than any other method of getting a user's browser to make an HTTP request anyway.

If it's tracking users then maybe. Gathering data is evil unless you have a very good reason. If you're gathering it and not actually using it then that is definitely evil. If you're gathering everything in case you need it then that is also evil. If you're gathering data that's unique to individuals that's even worse. If you're gathering data that's unique to individuals, and keeping it, and using it to build up profiles by blending it with other sources, and then selling the information that's really evil.

Just gathering browser agent strings or screen resolutions though, it's not terrible. Although I do wonder why you need CSS analytics rather than just using the server log from the request for the HTML file.


> I do wonder why you need CSS analytics rather than just using the server log from the request for the HTML file.

From the article:

Lots of automated traffic on the web, bots, crawlers and scrapers. So if there is a way that can remove most of the automated traffic without loading any JS, is that a win?

    body:hover {
        background-image: url("https://underjord.io/you-was-tracked.png");
    }
[...] This has a certain elegance because it actually requires mouse interaction.


The vast majority of bots and scrapers today are headless browsers, so I’m not sure that would be very effective.


I think that's the point, a headless browser is unlikely to trigger :hover behaviour.


> I think that's the point, a headless browser is unlikely to trigger :hover behaviour.

That is not true. I just tried it with headless chrome and triggered :hover behavior immediately just by synthesizing mouse movement the same way I would by using it as a scraper.


I think his point is that the average scraper isn't going to bother simulating mouse movements to trigger the hover behavior.


Not specifically for that purpose, sure. I didn't simulate the mouse movements for that purpose. The hover triggered when I simulated scrolling and clicks the same way I do when I want to scrape sports stats from sites that hide them behind scroll movements or mouse clicks.


It will at least filter all the bots that aren't trying to hide that they are bots, just trying very hard to understand your page.

You can use it to clean your analytics, but yes, it's useless for fraud prevention.


It didn't filter out my bot which makes no real attempt to hide that it's a bot; all it does is scroll and click.

Sure, it'd clean analytics up a little bit, but that would come at the expense of being able to use a CDN/cache for my CSS. Even leaving aside implementation effort, that's a big price to get some but not all bots out of my log data.


The evil is in how you use the data.

And of course, if you share the data with a third party (that includes if you use their services to collect or process the data), you should logically assume the worst.

I would probably be okay to share all kinds of data with the websites I use, if I could reasonably trust that it will never be used to identify me personally. If you have a website, and you are curious whether your audience is mostly male or female, young or old, using computers or smartphones, I can understand the need, and would not object against my presence increasing some number in the database by one.

Well, there is this problem that if you collect too many attributes about me, and if they are all connected together (as opposed to merely increasing a few independent counters), a sufficiently large collection can be used in the future to identify me uniquely, which is the part I object against. And I have no control about how you store the information, and it is reasonable for me to assume the worst.

To make an analogy with the offline world, if you e.g. give a public lecture, you can see what kind of people are in the audience. And I would feel no desire to wear a mask for the purpose of hiding my age or gender or race or whatever from the lecturer.

But I would object against someone taking my photo, using it to identify me, and writing the information of me attending given lecture on given day into some huge dossier about me, that he would later share with other shady people, so that they can all have my incredibly detailed biography (in case of Google, including also most of my private correspondence). That is definitely evil.


If you gather data but you anonymise it, then you're totally fine. If you gather it in order to later use it to improve the UX on your site, that's fine.

It's making it personally identifiable and linking it to other things where you get in the really evil territory.


I am inclined to compare CSS to guns now. At what point is the tool evil and when should it be banned? When all it's users are evil? More than half? Is the tool never evil? Ethics...


I'd maybe compare it to a toothbrush instead. Yes, you can sharpen one end of it and then use it to do unfriendly things to people. But that's not exactly its primary use case.


Fair enough


The biggest loss of privacy (my personal view) is that we've lost the ability to read without being observed, and that matters for maintaining a healthy and diverse mindset in society. It enables people to read without fear of "persecution". That pretty much puts the "analytical web" in the "evil" basket for me.

edit: autocorrect wrote "prosecution" instead of "persecution". Fixed it.


This is a good point. One potential workaround is to implement something Usenet-style, where a whole corpus is downloaded to your device and you then load all the data locally. Of course only a small fraction of content would be available this way, which has its own set of biases.


I like this sort of question, albeit it is akin to preparing to re-tune your violin, in order to do some fiddling while Rome is burning, given just how far we have lost all privacy.

In an ideal world, it should be up to the user what they want to disclose. So, perhaps there should be no logging at all. And having loaded a page, the page should work 'offline' with no further interaction with the page or site by default. I mean, that's how simple sites appear to work. That they don't work like how you think they appear to, illustrates how technologists are selling illusions for profit.


In an ideal world, users would also disclose some behavioral data to help webmasters get some meaningful feedback about their work.

In our world however, almost all users won't care, and the rest few won't disclose anything out of suspicion of abusing that data.


> almost all users won't care, and the rest few won't disclose anything out of suspicion of abusing that data

Yes, I wouldn't disclose.

But because the users don't want to give their data to you, that doesn't mean that it is ethically OK to take it without them realising.

That's the rub with privacy. There needs to be some acceptance and tolerance of an individual's decisions.

Instead though, we have engineered disclosure. In a pure sense it's an act of aggression against another.


I view it as looking out into the audience while saying what I have to say (in the case of blog posts), or paying attention to who I talk to and using the best strategy when doing door to door sales.

Do people put on a mask when they open a door for a stranger or when they go to the market?

This kind of knowledge is strategic for both people that want to be heard or to make a sale. Each kind of person requires a different approach. Being observant doesn't require consent.


Yes, but we all know what it is to go to a public place. We choose that.

When you are on your own home, looking a website, you think you are on your own, reading something, like a book. You do not have a sense of interacting, or of being in a public space. You would have that sense of interacting if you were on an Internet forum, or discord, etc.

A book does not report on you (and I'm talking about paper books not kindles!). But all websites monitor you. This is to say that the user is gamed. They are misled into thinking they are looking at a private thing like a book, whereas the reality is that they are being monitored. It's the biggest anti-pattern. It's absolutely without meaningful consent and it is absolutely baked in to the internet. There is no privacy online.

But, in my perfect world, we should be allowed privacy and be able to go online. Oh well, maybe in the next life!


I can see your point. Still, when I write a blog post, I’m writing to you, specifically. Even though I don’t know who you are or anything about you.

It’s fair to “see” you there; to know my words had some impact. After all, you sought my words. Do I need to know everything about you? No. And that’s where this all breaks down.

Today’s tracking is like being a door to door salesman, and after trying to sell you something, I stalk you day and night. Actually, I’m surprised someone hasn’t tried to sue/charge Google et al for stalking.


To be honest - I'm labouring the point for sure. I personally wouldn't care about you getting some fairly anonymous data on me. But it would have been nice if those organising the internet experience had given even a bit of consideration to privacy, as opposed to how to eradicate it.

My real concern is how we have been turned inside out by corporate technology. I mean there was a story going around a couple of years ago that Facebook knew when a couple was going to split up before the couple did!

Its too much!


I find this not evil. If this is covertly used, that'd be evil.

I agree with the argument "If you do it to extract information from your user to which they would not consent, it’s evil."

However, we tend to get caught up in the right-now and not think through consequences. If this were widely used, browsers would implement the same sorts of privacy controls they do around 3rd party cookies, JS, etc.

This seems like a more semantic way to do tracking than many other techniques. It seems like it'd be easier for browsers to manage.


The evil part in tracking is tracking user behaviour and personally identifying data. If you’re tracking overall metrics anonymously, it doesn’t really matter if it’s done via HTML/JS/CSS, it is probably not evil.


The EFF’s own website has analytics, somewhat ironically. But the information they collect is limited, and the analytics are loaded from a separate domain (anon-stats.eff.org) so it’s easy to block. EFF’s privacy policy:

https://www.eff.org/policy

I think that first-party analytics are kind of a gray area. Third-party analytics are always evil.


Is it that clear-cut? Is a nice friendly org that considers the options and picks something minimal and ostensibly ethical like Fathom or Plausible and then just uses that to keep track of how they are doing evil? Or is that not a third party, just first-party outsourced? Not sure what definition we're working with here :)


I’m late to respond, but yes. Third party analytics allows a centralized entity to track your activity across multiple websites. No matter how privacy friendly Fathom and Plausible claim to be, I would rather not have to trust them.

Self-hosted analytics takes this out of the question.


You can use the same technique with a:active to track link clicks, by the way.

Technically, this would all be relatively easy to block with your own user style sheet. Practically, though, a lot of non-tracking sites rely on background-image for essential functionality, so you'll see a lot of breakage. It's a dilemma.
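The a:active variant mentioned here is essentially a one-liner; a sketch with an invented selector and URL:

```css
/* Sketch: the request fires the first time the link is pressed,
   logging the click server-side without any JS. URL is illustrative. */
a#signup-link:active {
  background-image: url("/click/signup.gif");
}
```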


Very clever, borderline ingenious.

The task of filtering out bots from server logs can get really tedious even if there's JS involved. Being able to spot humans using this technique is really quite helpful.

Edit - body:hover doesn't seem to work in Firefox, but it's trivial to work around that.


I toyed around with pixel tracking like this before, with PHP and the GD library. You just create a 1x1 white pixel and set the file extension to .png or whatever, and as long as you've configured the server right, it will execute and return a pixel. But then you can do all the other tracking you want. And the user doesn't know any different.

That said I won't use that in the future but it's scary how easy it is.


This is how certain large newspaper sites do their user tracking so editors can improve their article's engagement 'live'


If I were a browser engine I would download all the assets, all the images, whether the user hovered or focused or interacted in any way or not.


Users on metered connections might not thank you.


But power management (especially mobile devices)... Nobody likes excess heat and wasting electricity. And metered data connections. Lots of competing forces here.


Font and client-dimension fingerprinting are the reasons people should stop thinking Brave actually protects them from anything. Brendan, we both know it's impossible to solve on a Chromium base; you being bitter at Mozilla is a different story nobody cares about.

Don't take your personal grudge out on your users by fooling them into a false sense of security Brendan.


It really depends on the purpose IMO. If you are cross matching that data from other sources to track, it is "evil" (in the sense of people describe tracking people, I personally don't care). If you are using it for your own statistics (how many people visited where, screen sizes, duration, where they scrolled at etc.) I don't see any issue there.


This technique could be used for good (detect an automated harassment campaign) or evil (unmask a protester agitating for societal change against a powerful state).

As technologists we want to be able to look at a technology and discern if it is good or evil. Unfortunately we don't always have enough information.


> or evil (unmask a protester agitating for societal change against a powerful state)

Can you give an example of how this technique could be exploited in such a way?


Find a site the target visits frequently that lets you inject something that will make a request to your server (“poisoning the well”). Use the gathered IP address, User-Agent and Language headers to start building a persona to look for.


And how does images in CSS help with that?


I find it lesser evil. On one side it’s hidden analytics but on the other hand I find it much more superior to cookies that I carry around and which a lot of different entities can track.

I do find the way of voting on this matter very interesting. Parsing the logs to get the results - how amusingly nerdy!


> Parsing the logs to get the results - how amusingly nerdy!

This was a big use-case for the Practical Extraction and Reporting Language 2 decades ago... :)

Today with decent JSON logs, it's also quite fun.


Oh, I wrote my share of scripts in Perl and parsed logs too (I remember when analytics was all about parsing access logs). But I wouldn't think to organize a survey like that; I'd put this in a DB. _Crafting_ the site to utilize this is so simple though. Much easier than adding a table to a DBMS for sure.


I'm a believer that these kind of issues should be solved at the consumers end.

I haven't built a web browser, but I built a bot and it's somewhat doable to avoid getting tracked.

A browser could feed a fake user agent and report a standardized window size. After that I believe it's only the IP address and cookies, which are easy enough to block.

It even defeats the CSS tracking mentioned. "Oh someone downloaded image 6374tracker.png, but they were from UAE and are using Firefox" and are never seen again.

My only weakness on this subject is the low level headers, anyone familiar?


It has been best practice for some time now to detect not based on user agent, but by features. (plenty still use the UA approach of course)


What low level headers are you thinking of here?


I'm going to name drop things I don't know about and might be irrelevant

Data/network/transport/session layer can be detected through TCP?

So you need to fudge a few digits somehow at your router.


That's not a bad idea. I suspect that the author is not the first to come up with it. It's better than a heatmap.

Like most tools, it is up to the user, as to whether or not it's "evil."

I'm reminded of that rather silly little speech at the beginning of Dark Phoenix, where Xavier lectures Jean Grey about the uses of a pen.

If I were trying to understand users in something like A/B testing, I might use the technique, but I'd probably only do so temporarily. I'd need to make sure that the practice was outlined in the privacy policy.


I clicked "This is evil" although I think that is just partially correct. The problem with tracking is that it exploits functionality that wasn't intended to identify users.

In context of reality it is very nice that the author even ponders about it. This is already less evil than what we can expect on the "modern" web, however nefarious and tricky the mechanism might be. But loading a resource for tracking IPs isn't really intrinsically evil.


I find this slightly less evil then standard tracking, because information is not shared with a third party.

That being said, I think there's a line. If you use it to gather the exact same info as server logs, except with bots filtered out, that seems perfectly reasonable.

If you use it to track everything a user does on your page, that seems fairly evil.


The common user is not going to be aware and therefore this allows bad usage. So I would consider it as potentially evil.


The common user has JS enabled and probably expects their pageviews to be tracked


Assuming that stuff like this will eventually become common in the ongoing tracking/blocking arms race, I wonder how long it will be before privacy-oriented browsers/extensions start blocking stuff like :hover pseudo-classes (or eagerly loading all assets).


I can definitely imagine Firefox’s privacy.resistFingerprinting mode starting to block remote resource loading via :hover et al. It’s possible it already does, but I haven’t tested it and find no mention of it. It does several other things that break the occasional site, this would be much the same.

But on reflection, it’d be more than that: anything that’s hidden (within `display: none`) and gets shown on hover… hmm. Guess it’d need to ignore `display: none` in deciding to at least fetch the resources, or suffer weird “why is this not loading the background image?” complaints.


I'm sure this is already in use but maybe not as published because it is mostly a corner-case (non-JS users). Browsers have already had to implement lies in pseudo-classes, specifically :visited since it leaks too much data.


Related tangent: I remember being floored years ago by a realization that a seemingly-innocuous bit of CSS (the `:visited` pseudo-selector) could represent a significant privacy risk or aid to a phishing attack, if it were a _readable_ property.


So what is the preferred way to at least estimate real traffic without js? I have a static site that gets some traffic, but I have no idea how much of it is real. I'd just like to know if I'm reaching closer to 5 or 500 people a day.


Server-side tracking of some sort. Watching inbound HTTP requests and identifying/filtering stuff from known spam IPs (or your own IP) and origins.


One of those “if you have to ask …” questions. If you do it to extract information from your user to which they would not consent, it’s evil, regardless of how elegant it is or what the tech backing it is.


Definitely felt that while writing the post. So I don't plan to use this approach.

Just like JS-based analytics I don't think most users would object heavily to someone trying to differentiate them from non-interactive bot traffic to get an idea of whether they have any real readers or just bots. But I'm absolutely certain that there are people who will not concede any analysis of the visitor's agent capabilities and behavior as legitimate. The browser sets one boundary, general public another, privacy advocates another.

I personally try to err on the side of minimal analytics. But the default idea for people creating sites tends to be that Google Analytics is "fine" and something more privacy-oriented like Plausible or Fathom is the good/ethical option. I'm not sold on that, though I'm glad there are options that are less bad. I doubt most businesses will throw analytics out entirely, but they might be willing to pick something with a better ethics profile.


I don't find it evil.

Tracking is a problem when it affects your privacy. For example when too much data is collected, or when the data is handed to third parties. If you can collect just what you need, and keep that data to yourself, I really don't mind.

In this case, you measure visits anonymously without affecting the website's performance. You give me what I want without taking anything from me. I am completely fine with that.

In the real world, I'd compare it to tracking how many people enter a venue, or how many beers were sold. There is no way this could be used to tell if and when I visited that venue and bought drinks. You don't need my consent to do this.


Why is tracking evil? I want to know how people use my site so I can better optimize it.


Your use case may be acceptable, but most tracking is not acceptable.


User tracking and site analytics are not the same.


Why not?


Because it does not respect the users privacy. It's not too bad in a working democracy, but not all democracies last forever. Also, ads are bad.


So what? Why does not respecting a user's privacy make it evil, and why are ads bad? I'm just trying to understand why people reflexively say these things as I see these arguments a lot on HN.


Ads are bad because they use tracking to manipulate people.

Not respecting privacy can have bad consequences. If a country wants a list of people to kill for various reasons such as religion, sexuality, political interests or whatever else, Facebook sells it.


I understand the privacy part now, thanks.

Why is manipulating people bad? We are manipulated every day by various psychological factors, like seeing someone eat and getting hungry, and the government for example exerts economic pressure for unneeded items like cigarettes and alcohol with a sin tax, manipulating us to not buy them as much.


Manipulating people with interests from corporations that are uncontrolled is not good.


Does Facebook sell that list? Where is it?


Through their ads platform, you can target people. Otherwise I'm sure you can buy it from Facebook employees with the right offer.


> Why does not respecting a user's privacy make it evil, and why are ads bad?

You just said it, because it's not what the user wants.


Where did I say that? I just asked why ads were bad.


"Why does not respecting a user's privacy make it evil" <- this is what they were referring to. It was quoted.


How does asking that mean that's not what the user wanted? I wasn't agreeing when asking the question, was just clarifying.


I think the very act of "not respecting" a user's privacy carries at least a heavy implication that it's not what users want.


Perhaps. People give up privacy for many reasons, sharing on Facebook for example. If the user was respecting their own privacy then they wouldn't use social media at all, but people willingly upload their personal details.


Underjorden is always evil.


You got me.


Technology is not evil; usage, and how leaked data can be exploited, is what needs to be thought through to determine if something is evil.


Used to love ETag :)


It's kinda bad for multiple reasons...

.) definitely an abuse of CSS, which is meant for visual styling

.) invisible to the user; it can't even be blocked with NoScript (which blocks all JS), and blocking CSS would make most sites unusable, or at least less than "human readable"

.) could be used to track and record a lot of information that does qualify as non-anonymous, personal data (in combination with IP address and timestamp)

.) probably illegal according to the GDPR, unless you fully inform the user and get their consent first (before loading the CSS) - and allow them to opt out.


What personal information do you get that would be different from the HTTP requests sent by the browser to load the page in the first place?


It's absolutely crazy, how much information you can reliably read "between the lines" if you just collect enough data.

For example check out this CCC-talk by David Kriesel - in which he demonstrates how much you can figure out just from some limited, publicly available data on a news website: https://media.ccc.de/v/33c3-7912-spiegelmining_reverse_engin...

Being able to track mouse movements and hover interactions and scroll position and time spent and other activity on the page, gives you a ton more data to mine that way than when just registering page loads.


But... how do you do this from CSS? Do you mean that you would add a :hover on a bunch of elements to gather multiple requests and the approximated position of the mouse?


For most behavioural profiling stuff it should be sufficient to just know which page element the mouse was over, no pixel-precise x/y coordinates needed.

But if a mouse enters element A at time x, then enters element B at time x+n, and then enters element C at time x+m (and so on) - you can extrapolate the path the mouse most likely must have taken - and at which speed and acceleration - to be able to get to all these positions at those time points. It's just an approximation of course, but can be surprisingly accurate.
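Concretely, the coarse mouse trace described above could be built from a grid of invisible hover targets, one rule per cell (selectors and URLs are invented): the server-side timestamps of each cell's first request give the sample points for the path extrapolation.

```css
/* Sketch: each cell's first hover fires one request; request
   arrival times approximate the pointer's path. URLs and class
   names are illustrative. */
.cell-0-0:hover { background-image: url("/mouse/0-0.gif"); }
.cell-0-1:hover { background-image: url("/mouse/0-1.gif"); }
.cell-1-0:hover { background-image: url("/mouse/1-0.gif"); }
.cell-1-1:hover { background-image: url("/mouse/1-1.gif"); }
/* ...one rule per grid cell laid over the page */
```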


FWIW, there are browser addons to disable animations and transitions in CSS.



