I really wish that browsers had developed first-class support for offline web page bundles. There's no way to share a page that is guaranteed to be self-contained and not hit the network, especially if you want to use JavaScript. It's particularly frustrating since browsers supported offline mode as far back as the 90s; it just needed to be combined with support for loading from zipped folders.
That simple change would've largely solved the academic paper problem decades ago. It's bizarre that it still isn't a feature.
Mail clients kinda do that (or at least they can, if asked to). Also, why would academic papers need JS anyway? CSS and images, I can get, but beyond that there's no need for anything fancier.
Yes, but it's not guaranteed to be self-contained. I wouldn't want to open a random HTML file knowing that it could phone home, or that the content might break one day without me realizing. There's a practical and psychological aspect to sharing `steves_paper_2014.html` versus `steves_paper_2014.offlinesitebundle`. The latter feels safe and immutable.
What you want is an HTML tag or response header that restricts network access, which the browser can then enforce. Whether it means fully offline or a list of allowed domains, this would be great for security in general. Not so great for advertisers though.
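For what it's worth, the closest existing thing I know of is a Content-Security-Policy delivered via a `<meta http-equiv>` tag. Here's a rough sketch of stamping one onto a page. The function name `add_offline_csp` and the exact policy string are my own choices for illustration, not any standard "offline mode", and a CSP like this only blocks subresource loads; it doesn't stop things like a user clicking an outbound link.

```python
import re

# Hypothetical "offline" policy: block everything by default, then allow only
# inline styles/scripts and data: URIs for images and fonts.
OFFLINE_CSP = (
    "default-src 'none'; "
    "style-src 'unsafe-inline'; "
    "script-src 'unsafe-inline'; "
    "img-src data:; "
    "font-src data:"
)


def add_offline_csp(html: str) -> str:
    """Insert a CSP meta tag right after the opening <head> tag."""
    meta = f'<meta http-equiv="Content-Security-Policy" content="{OFFLINE_CSP}">'
    # Match <head> or <head attr=...>, but not <header>.
    pattern = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
    new_html, count = pattern.subn(lambda m: m.group(0) + meta, html, count=1)
    if count == 0:
        raise ValueError("no <head> element found")
    return new_html


if __name__ == "__main__":
    page = "<html><head><title>Steve's paper</title></head><body>...</body></html>"
    print(add_offline_csp(page))
```

The tag goes first in `<head>` because the policy only applies to content that comes after it in the document.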
Then you have to verify that the tag is there, right? But if it has another extension like .offlinebundle you can know that browsers will not make any extra requests.
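To make the "you have to verify it" point concrete: assuming the restriction were expressed as a Content-Security-Policy `<meta>` tag, a check might look something like the sketch below. `declares_offline_policy` is a made-up helper name, and it only tests whether `default-src 'none'` is declared. It does not check whether other directives in the same policy loosen it again (say, `script-src https:`), which is exactly why a dedicated file type would be easier to trust.

```python
from html.parser import HTMLParser


class _CspFinder(HTMLParser):
    """Collects the content of every CSP <meta http-equiv> tag in a document."""

    def __init__(self):
        super().__init__()
        self.policies = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("http-equiv") or "").lower() == "content-security-policy":
            self.policies.append(attr_map.get("content") or "")


def declares_offline_policy(html: str) -> bool:
    """Return True if any CSP meta tag declares default-src 'none'."""
    finder = _CspFinder()
    finder.feed(html)
    for policy in finder.policies:
        # Normalise whitespace and case within each directive before comparing.
        directives = [" ".join(d.split()).lower() for d in policy.split(";")]
        if "default-src 'none'" in directives:
            return True
    return False


if __name__ == "__main__":
    safe = ('<html><head><meta http-equiv="Content-Security-Policy" '
            'content="default-src \'none\'; img-src data:"></head></html>')
    print(declares_offline_policy(safe))
    print(declares_offline_policy("<html><head></head></html>"))
```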
Browsers don't have native support for opening WARC. It doesn't solve the safety problem either: you can still construct a WARC that phones home, AFAIK.
It's a great format for the problem it solves, but if browsers supported offline-only files the container format wouldn't (and shouldn't) need to be that complicated.
Plus I can't use web tools, like "Read this page" in Mobile Safari.
And copying and pasting is harder.
And I can't link to individual sections.
I'm honestly baffled by people who prefer PDFs for this kind of information. Are they printing them out on paper and going at them with a highlighter or something?
Just my personal take, but when I have to read something carefully, I find it easier to do on paper.
For example, I recently wrote an article about taking random samples using SQL. Even though I was writing it for my blog, which is HTML, I proofread the article by rendering it as a PDF doc, printing it out, and reviewing it with a blue pen in hand.
What surprised me is that I also found it easier to review the article on the screen when it was in PDF format. TeX just does a way better job of putting words on a page than does a web browser.
Actually, if you want to do the comparison yourself, I'll put both versions online:
On a mobile phone, as a reader with photophobia, the PDF causes physical pain and is illegible, whereas the HTML is perfectly readable via reader mode (where text can be enlarged and dark mode settings are respected).
Personally, it's sending it to GoodReader on a 13" iPad.
I don't know that I'd go so far as to say I 'prefer' this, but there are a lot of PDFs out there, this works fine, and it's a nice change of pace given how much time I spend in front of a monitor / laptop screen.
Translating LaTeX to HTML is not a straightforward process, unfortunately. Many people have tried to implement automated translation systems, but nothing has really worked out yet.
I think it's unfair to expect the research team to invest additional hours in learning how to make good websites, so to solve your problem would require hiring additional talent whose only job is to translate academic PDFs into accessible web pages. I don't think that's a bad idea, and certainly Google has the funds to do something like that, but I don't imagine they'd find it to be a good use of money. Accessibility is an afterthought for most major companies these days.