
Sadly, Google is also sitting on a HUGE amount of archival newspaper in a completely unsearchable state.

They've OCR'd thousands of historical papers but the search is completely broken.

https://news.google.com/newspapers

It's a pretty incredible resource if you know the date and place of an event, but if you want to research more broadly, it's a tool with a lot of untapped potential.

I wish they would spend more time providing tools for people to consume information in powerful ways rather than trying to 'tailor' experiences.



> I wish they would spend more time providing tools for people to consume information in powerful ways rather than trying to 'tailor' experiences.

Power users are a very, very small group, probably under a million. They generate zero revenue, complain a lot, and want features that cost a lot of money. There's no reason why Google would target them.

But I'm sure you can find a professional service that does that, allowing for powerful search of historical news. Probably costs a lot, though.


Empowering users to become more and more power users is literally the reason humanity bothers with science and technology. It was the reason shared by many of the people who started now-successful software companies. It's sad to see the profit motive distract people from that core reason.


> Empowering users to become more and more power users is literally the reason humanity bothers with science and technology.

Quite the opposite. Technology allows users to become less and less power users.

> It's sad to see profit motive distract people away from that core reason.

Not sure I got that part. Are you complaining that other people are not leaving their jobs and setting their livelihoods aside to build the product you want for free?


> Quite the opposite. Technology allows users to become less and less power users.

That seems pedantic. Technology allows users to become "less and less power users" by giving them the tools to solve problems they were not advanced enough to solve before, or that were too time-consuming or difficult to solve before.

That's exactly the same point the OP is making -- Google has the potential to build a tool which would allow people who otherwise cannot make use of these archives to do so.


> That seems pedantic. Technology allows users to become "less and less power users" by giving them the tools to solve problems they were not advanced enough to solve before, or that were too time-consuming or difficult to solve before.

Exactly, today a regular user can do what a power user could do years ago.

Create a full website with good design, structure, functionality? You had to be a power user, but not anymore.

Drive a car? You had to know how to use a clutch, shift gears, etc. Not anymore.

Access the internet? You had to buy a modem, configure it, connect, etc. Not anymore.

And so on.

Technology removes burdens, and allows regular users to do what only power users could before.

People don't want to be power users. People don't even want to be users. People want to listen to music, move, read.


I don't agree -- I don't think most people want to be power users, nor do I think science and technology are supposed to produce more power users. People want hard and complex things made easier for them.


Universities typically have subscriptions to service(s) that do just this.


There used to exist Google News Archive Search, and it really worked. I was able to search for the name of an obscure person who died in 1910, find obituaries of them and create a Wikipedia article about them, etc.

Then in 2011 it was shut down. The number of newspapers hasn't grown since; it has actually become smaller.[1][2] It's claimed to be back in some form, but those who used it earlier aren't really satisfied.[3]

[1]: https://www.theatlantic.com/technology/archive/2011/05/googl...

[2]: https://en.wikipedia.org/w/index.php?title=Google_News_Archi...

[3]: https://productforums.google.com/forum/#!topic/news/Fw2caKy6...


I agree. It's a similar situation with the Google Books interface, which as far as I can tell has changed little, if at all, since it was launched in 2005.

Admittedly, I'm a niche within a niche segment for them (professional historian) but if Google improved the functionality of their newspapers and books services, it would translate to increased research productivity for my entire field (and for anyone else who uses archival book and newspaper scans regularly, like investigative journalists). It's a relatively intangible change but one that isn't inconsiderable, especially in terms of generating goodwill among students and researchers.

The subscription services are a complete mess at the moment, analogous to the state of for-profit academic publishers in general. Google has a golden opportunity to establish itself as an alternative to the predatory publishers who generally run digitized newspaper and article archives.


Correct me if I am wrong, but isn’t the Google Books interface so limited because of a court decision? Essentially, the book publishers threatened to sue Google into the ground over the copyright infringement of making full scans publicly available without a licensing agreement.


What specifically is broken? Basic keyword search seems to work.


The original edition of the Google News Archive included a timeline that helped you narrow down the date range. See this help page from 2008: http://web.archive.org/web/20080905060327/http://news.google...

I find the omission in the current engine somewhat frustrating. If you are looking for original references, this timeline was very helpful -- a lot of times, the current search pulls up more modern references that aren't necessarily as useful if you want perspectives from the past. There are ways to search the current archive by date, but they are not terribly intuitive (https://www.thoughtco.com/search-tips-for-google-news-archiv...).
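To make the timeline idea concrete, here is a minimal sketch of keyword search with an optional date range, plus a per-decade count of hits like the one a timeline view might display. Everything in it is an assumption for illustration: the `ARTICLES` records are invented, and `search` and `timeline` are hypothetical helpers, not anything the Google News Archive actually exposes.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (publication date, headline). Not real archive data.
ARTICLES = [
    (date(1910, 3, 2), "Local inventor dies at 67"),
    (date(1910, 3, 5), "Funeral held for local inventor"),
    (date(1985, 7, 14), "Remembering a forgotten inventor"),
    (date(2009, 1, 20), "Inventor's workshop to become museum"),
]


def search(keyword, start=None, end=None):
    """Return articles whose headline contains keyword, optionally within [start, end]."""
    keyword = keyword.lower()
    hits = []
    for published, headline in ARTICLES:
        if keyword not in headline.lower():
            continue
        if start is not None and published < start:
            continue
        if end is not None and published > end:
            continue
        hits.append((published, headline))
    return hits


def timeline(hits):
    """Count hits per decade, the kind of summary a timeline view could show."""
    return Counter((published.year // 10) * 10 for published, _ in hits)
```

Calling `timeline(search("inventor"))` shows at a glance which decades the hits cluster in, and passing `start` and `end` to `search` then drops the later retrospectives and keeps only the contemporary coverage.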

(My memory is that the previous search engine gave better results as well, but unfortunately it's difficult for me to prove whether that's correct one way or another.)


Keyword search is the very bare minimum for search.


Any search at Google is N mapreduce jobs away from a keyword search.

In theory, having a really nice search engine for this news would be rather easy for Google (/any Googler with a few weeks of time and access to this data).

This is essentially what every new employee at Google has to do in the orientation week.
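For anyone who hasn't seen the pattern, here is a rough single-machine sketch of keyword search built in a map/reduce style: a map step emits (word, document) pairs and a reduce step groups them into an inverted index. This only illustrates the general technique and makes no claim about how Google does it internally; the `DOCS` data and all function names are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical OCR'd pages keyed by an invented page id.
DOCS = {
    "page-1": "Mayor opens new bridge",
    "page-2": "Bridge closed after flood",
    "page-3": "Flood relief fund opens",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per distinct word in a document."""
    for word in set(tokenize(text)):
        yield word, doc_id


def reduce_phase(pairs):
    """Reduce step: group the emitted pairs by word into an inverted index."""
    index = defaultdict(set)
    for word, doc_id in pairs:
        index[word].add(doc_id)
    return index


def build_index(docs):
    """Run the map step over every document and feed the pairs to the reduce step."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return reduce_phase(pairs)


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

With `index = build_index(DOCS)`, a query such as `search(index, "flood bridge")` intersects the posting sets for each word. In a real distributed job the map and reduce steps would run on many machines, with a shuffle in between that routes each word to one reducer.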


I recently got a subscription to newspapers.com to get access to archival data. Surprised no one has mentioned them. It's not super cheap, but I think it's worth it for the amount of information available. TBH I don't use it much for some reason but when doing research it will let you access sources most people ignore, because it's not google or facebook.


What do you expect? They're just an advertisement agency. They don't have the tools or skills required to perform advanced searches in a large, unstructured data corpus.

Oh, wait.


This wasn't the reason given when the 2011 announcement was made regarding the end of the project. But since then, the archive has faced copyright issues as newspapers sign contracts with other archive services: https://www.techdirt.com/articles/20160825/10114535344/newsp...



