This is spectacularly misleading. That second address isn't real, and doesn't wo...

3np · on Jan 29, 2024

Indeed. For anyone unfamiliar with the nature of cryptographic hashes, each character increases the difficulty to get a collision exponentially.

~10 characters are easy enough to generate on a single machine so don't rely on a vanity-prefix and the trailing couple of characters only, but getting a new .onion address matching even half of an existing one within the lifetime of civilization is unrealistic even with state-actor resources.

You'd be better off trying to brute-force Satoshi's bitcoin private keys if you're feeling that lucky...

https://github.com/cathugger/mkp224o/issues/27

schoen · on Jan 29, 2024

If I'm following my intuitions about the math in the right direction, the probability of getting a single-character-or-less edit distance from a given target hash is (56×32)/32⁵⁶ per attempt.

The expected number of attempts to get one success at this would then be about 2²⁶⁹. Even so, a typosquatting victim would be very unlikely to make the exact right typo for the attack to work!

I think my reasoning is wrong somehow because I think there are only 2²⁵⁶ different onionsite public keys, so it doesn't quite make sense that you would have to do 2¹³ more work than trying all of them. But I'm still pretty convinced that it's going to be infeasible without a strong break of the hash function.

In terms of attacks that merely try to generate onion addresses that are merely somewhat visually similar to target ones (e.g. by matching at the very beginning and very end?), these are possible, and it would be interesting to see research about how likely people are to fall for various attacks like that. Maybe that research has already been done?

Jerrrry · on Jan 29, 2024

>If I'm following my intuitions about the math in the right direction

you are, except our theoretical familiarity with math and the antecedent nature of life can easily lead us to intuitions that mature to fallacies quickly.

https://en.wikipedia.org/wiki/Birthday_problem

>e.g. by matching at the very beginning and very end?)

Thankfully those smarter than us have solved this problem too - the "hashing" algorithm is so fundamentally lossy (but not too lossy to fall into the pidgen-hole paradox) 1-way, that it is mathematically impossible to have any knowledge of the end of the hash before you get it.

You can "brute-force" it backwards, sure (for some old hashes obviously) - give me a string that's MD5 starts with "Jerry" and ends with "loves math", and I will congratulate you on your waste of computational resources.

schoen · on Jan 29, 2024

Sure, but targeting similarity with a previously-chosen hash is a scenario where the birthday paradox doesn't come in. The case where it does would be "can we produce two arbitrary new hashes that are similar in this way?", in which case the amount of work required might be about the square root of what our intuition might suggest.

(although I think there's an explosion in the required space in that case because you need to store information about all of the values that you've already been able to produce, in order to learn whether new values collide with them!)

Jerrrry · on Jan 29, 2024

>"can we produce two arbitrary new hashes that are similar in this way?"

arbitrary may be the heavy lifter here, we can certainly birthday-paradox two address that look similar (square root, yes)

>(although I think there's an explosion in the required space in that case because you need to store information about all of the values that you've already been able to produce, in order to learn whether new values collide with them!)

bloom hash table a bloom hash table with some nerdy optimizations for backtracking, depending on whether your IO/CPU/GPU or network were the bottleneck. If you got a double-positive, skip the integer/nonce/etc.

Although, realistically, I'd be very surprised if in a quintilion PETAFLOPS you found a single 128bit number that, after being hashed twice, starts with "face" and ends with "book"

n2d4 · on Jan 29, 2024

Arbitrary means: It's "easy" (square root) to find two numbers that resemble each other in a sufficiently large set, but neither of them will resemble anything meaningful. It's still "hard" to find a number that resembles a previously given different number, such as the bbcnews hash above. (The chance that any two kids in a room share a birthday is fairly high; the chance that a kid has their birthday on January 1st is much lower.)

> Although, realistically, I'd be very surprised if in a quintilion PETAFLOPS you found a single 128bit number that, after being hashed twice, starts with "face" and ends with "book"

We can just calculate it. "face" + "book" is 8 characters in base 64, for a total of 8*6=48 bits that need to be set a certain way. 2^48 is roughly 10^15. Hashing once or twice barely matters at this point (2*10^15 ~=~ 10^15). A quintillion petaflops is 10^33 flops, so unless your hashing algorithm takes 10^18 floating point operations, you have an incredibly high probability of finding such a number within a second.

Jerrrry · on Feb 1, 2024

nerd.

Just kidding - this was a much more clearer response with some actual math behind it, thank you.

This (may) explain a facet of .onion phishing attempts at a certain point in time.