Aftonbladet may not be a very high-quality news service, but the article is still real, and these are still the words of our PM.
As an immigrant who has more or less integrated into Sweden and lived here for nearly a decade, I completely despise this op-ed. The nationalistic way he talks is something I never expected from Swedish politics; it's a nuanced way to play nationalism, but still shocking.
Ulf Kristersson sold his soul to become PM. Barely 5-6 years ago he was on the record saying he'd never ally himself and his party (M - Moderaterna) with SD; he broke that to become PM, and this is the blow SD wanted: a more nationalistic political discourse. Dragging the Swedish language into this just reeks.
I'm sorry for those of you who voted this government in...
The documentation looks really neat and in-depth; always appreciated.
Looks like you’re missing a .gitignore file. Folders like __pycache__ don’t need to be checked in.
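For what it's worth, a minimal .gitignore for a Python project could be as small as this (adjust to whatever the repo actually builds; these are just the usual suspects):

```
__pycache__/
*.pyc
.venv/
*.egg-info/
```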
Quite a neat way to crawl websites using a browser extension. That by itself is a form of donation to the search engine. Maybe in the future you can have dedicated software for self-hosted clients that users can run to crawl and index websites for mwmbl? Kinda like folding@home.
How are the batches of URLs to be crawled generated/discovered and posted to your API?
I have also thought that distributed crawling with the help of browser extensions, and/or clients like folding@home, could be a good idea. But how do you deal with "spam injections"?
Get 3 people to scrape it and see if there are significant differences.
Some pages will differ, because of A/B testing or news updates, but even an updated news page will still come back largely similar, and those that don't should probably fall into an exceptions category until it can be determined what to do about them. Maybe a flag on the URL to give you a static snapshot, or just accept that it changes often enough that even faked pages won't last long?
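Roughly something like this, as a sketch (the threshold and the three-copy setup are purely illustrative, not anything mwmbl actually does):

```python
# Sketch of the "have 3 clients scrape it and diff the results" idea.
# Threshold and decision rules are made up for illustration.
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumed; would need tuning per site

def similarity(a: str, b: str) -> float:
    """Rough text similarity between two crawled copies of a page."""
    return SequenceMatcher(None, a, b).ratio()

def judge(copies: list[str]) -> str:
    """Given 3 independently crawled copies of one URL, decide what to do."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    agreeing = [(i, j) for i, j in pairs
                if similarity(copies[i], copies[j]) >= SIMILARITY_THRESHOLD]
    if len(agreeing) == 3:
        return "accept"  # all three copies agree closely
    if len(agreeing) == 1:
        # two copies agree, one is the odd one out -> suspect that client
        i, j = agreeing[0]
        odd = ({0, 1, 2} - {i, j}).pop()
        return f"accept majority, flag client {odd}"
    return "exceptions queue"  # dynamic page or coordinated spam; needs review
```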
Then I'll just add 3 million bots to the network (or just enough to control about 50%) and I can win that comparison against an honest client most of the time.
It's an arms race, but this is mostly a question of rate limiting account creation, assigning a trustworthiness score to different accounts, some network analysis to detect coordinated accounts, and having some trusted accounts (run by the project) that can help double check results. Once an account uploads poisoned data, you can detect this after the attack (user-reported spam) and then block (or, more likely, shadow-ban) the malicious account.
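As a sketch of what I mean by trust scores (the numbers and names are made up; this isn't any real project's data model):

```python
# Toy trust-score model: accounts gain trust slowly when their crawls match
# a spot-check from a project-run client, and lose it quickly when they don't.
from dataclasses import dataclass, field

@dataclass
class Account:
    id: str
    trust: float = 0.1            # new accounts start barely trusted
    shadow_banned: bool = False
    history: list[bool] = field(default_factory=list)  # True = crawl verified OK

def record_spot_check(acct: Account, matched_trusted_crawl: bool) -> None:
    """Update an account after comparing one of its crawls against a trusted crawl."""
    acct.history.append(matched_trusted_crawl)
    if matched_trusted_crawl:
        acct.trust = min(1.0, acct.trust + 0.05)
    else:
        acct.trust = max(0.0, acct.trust - 0.5)  # mismatches are punished hard
    if acct.trust == 0.0:
        acct.shadow_banned = True  # keep accepting uploads, silently drop them
```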
You make it sound easy, but companies have been trying to fight this stuff for ages.
You can buy a trustworthy residential IP for a low cost, and you can buy them in bulk by the thousands. All of them are real residential IPs from any ISP of your choosing, in any country. You can rent Chrome browsers running over those IPs, driven via remote desktop and accessibility protocols (good luck banning that without running afoul of anti-discrimination laws). You can do all that for under $1k a month for something like 1 million clients.
My workplace has been on the receiving end of DDoS attacks directed through such services; the best you can do is ban the specific Chrome versions they use, and that only lasts until they update.
It's an uphill battle that you will lose in the long term if you rely on client trust.
In terms of spam injection (the concern from upthread), I don't think DDoS is relevant. If the core project controls which URLs it asks clients to process, it can just IP-ban any client that returns too many results. DDoS is a concern for other reasons, though.
I think in this specific case the spammer is on poor footing. The spammer wants to inject specific content, ideally many times. With double processing of URLs, even if the spammer controls 50% of the clients, there's a 50% chance that a simple diff will expose the injected spam. The problem for the spammer is that they need to do this many times, so the injections become statistically apparent. If the spammer can only inject a small number of messages before being detected, the cost per injected spam becomes quite high. Long-running spam campaigns could eventually be detected by content analysis, so the spammer also needs to rotate content.
Obviously you can play with the numbers: the attacker could try to control >>50% of the clients, the project could process URLs more than twice, the project could re-process N% of URLs on trusted hardware, etc. It's not easy by any means, but you can tune the knobs to increase the cost for spammers.
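Back-of-the-envelope version of that argument, assuming each URL goes to 2 randomly chosen clients and an injection is exposed as soon as it gets paired with an honest one (all numbers illustrative):

```python
# Chance that a spammer controlling a fraction p of the clients gets away
# with n injections, when every URL is crawled twice and any pairing with
# an honest client reveals the spam.
def p_undetected(p_controlled: float, n_injections: int) -> float:
    return p_controlled ** n_injections

for n in (1, 5, 20, 100):
    print(n, p_undetected(0.5, n))
# 1 -> 0.5, 5 -> ~0.03, 20 -> ~1e-6, 100 -> ~8e-31
```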
> but this is mostly a question of rate limiting account creation, assigning a trustworthiness score to different accounts, some network analysis to detect coordinated accounts, and having some trusted accounts (run by the project) that can help double check results.
Then OP has to do things that don't scale: review some pages by hand and identify a subset that can be trusted. Then OP can compare those trusted downloads against what new accounts submit and mark the bots.
Then the botnet will just be honest for, like, a year before it abuses the network. Even better: now honest new clients can be kicked when they disagree with the bot majority. So the network bleeds users.
Checking which account is honest isn't too hard: you detect that there is a "problematic mismatch" between two clients, so the project runs its own client to check. If one of the two has an exact match, you question the other.
There is a challenge with sites that serve different content based on GeoIP, A/B testing, dynamic content, etc., so some human review of the diff may help check for malice. If there's literally spam in the diff, human review will clearly catch it, and that bot gets distrusted.
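Concretely, the arbitration step could look something like this (threshold and function names invented for illustration):

```python
# On a "problematic mismatch", a project-run trusted client re-crawls the URL;
# whichever submission matches the trusted copy is kept, the other is questioned.
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.95  # assumed

def arbitrate(submission_a: str, submission_b: str, trusted_copy: str) -> str:
    sim_a = SequenceMatcher(None, submission_a, trusted_copy).ratio()
    sim_b = SequenceMatcher(None, submission_b, trusted_copy).ratio()
    if sim_a >= MATCH_THRESHOLD and sim_b < MATCH_THRESHOLD:
        return "keep A, distrust B"
    if sim_b >= MATCH_THRESHOLD and sim_a < MATCH_THRESHOLD:
        return "keep B, distrust A"
    return "send diff to human review"  # GeoIP / A/B testing / dynamic content case
```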
Then I'll simply use more bots to reach 80% of the network; then I almost always win any disagreement and your "problematic mismatch" never triggers.
Plus, I can now force you to run your own crawler anyway, which either slows progress or costs you a lot of money.
Maybe I misunderstand, but doesn't that mean you lose the benefit of having distributed crawlers if everything has to be crawled (again) locally somewhere?
YaCy can do distributed crawling and exchange the indexes (in peer-to-peer mode). I have some nodes that just receive and send indexes without crawling (much less storage-intensive).
Lists tend to attract the "proscription" prefix when they're in a political context.
Also, if you support the stance, you arguably don't want to make it easy for Russian/Belarusian users to avoid the software; you want to maximize their inconvenience - which you do when they invest time into learning something and then find out they are technically not allowed to use it in production.
That said, it's all theatre anyway - I would bet good money that most Russian businesses already didn't care about respecting FOSS licenses, considering the infamously lax attitude toward copyright enforcement in that country (and in China and Iran). So adding that sort of clause will stop absolutely no one in practice, and I bet the authors will never even try to sue anyone in Russia (or anywhere else with some sort of jurisdiction over Russia, like the WTO) for infringing it.
I'm the author of PhotoStructure. I started working on it to clean up the digital mess I had accumulated due to failed/cancelled photo management software, not to replace Google Photos.
Since then, I've been working on improving deduplication, scaling to very large libraries, and whittling away at the feature list my users and I collaborate on. I keep detailed release notes here: https://photostructure.com/about/release-notes/
Replacing at least the organization and sharing aspects of Google Photos is now my top priority. Improving search (with geo, faces, and ML object labelling) is next.
If an individual is concerned enough about the tax implications of a new acquisition, they probably have enough foresight to consider the cost of fuel.
That time when there was a big propaganda campaign to get everyone to hide inside for a big scary virus.
We probably (maybe; not really sure) saved 0.5%-1% of the population, so it was worth 2-3% of our lives spent worrying about it, 10% inflation, public transport becoming financially unviable, various career paths literally disappearing, an irreparable political split, education being deleted for a bit, etc.
Thankfully, slightly later on, pretty much everyone got it anyway, so now there's at least a critical mass of people who realise this was all a load of shit, and I don't have to adblock it by self-excluding from all social networks.