Bram Cohen's crazy idea
I just posted the following comment to Bram Cohen's blog. His blog ate the comment, so I expect it's being moderated. For my own records, I'll duplicate it here:
This is an interesting idea. I have a couple of questions:
1) You talked about "multiple runs through", but it's not clear to me what you are changing in each run. This is a pretty basic thing that I'm not getting, so the rest of my comments may be way off as a result. :-)
2) Also probably obvious, but you probably want to be very clear on the fact that the input variables are required to be different. A^A = 0, and hash(A)^hash(A) = 0, so that result is trivial.
3) Is it guaranteed that a collision will be found? Why? It seems that given your first proposal of 2^76 combinations, that is so much less than the solution space of a typical hash function (2^160 for SHA1) that most of the time no collision will be found at all. You say there should be a result ending in 76 zeros... why would the zeros cluster at the end? Having written this, I think I'm getting it (pidgeonhole principle), but you might still want to clarify that for the benefit of your readers.
4) Given the two-variable scenario, you sort of glossed over the fact that it is possible to just brute force this with a very small amount of RAM: For each possible variable A, iterate through the rest of the keyspace for B. This algorithm can be made more efficient using any amount of RAM by using a sliding window of possible A values. Finding the sweet spot of CPU/RAM will depend on the capabilities of the hardware available, and this approach can run massively parallel - which is bad if you want to keep custom hardware out of the game. I think this can be easily expanded to the four variable scenario as well.
5) Alternate attack strategy: working backwards from hash(A) ^ hash(B) ^ hash(C) ^ hash(D) = 0 being our goal, and A != B != C != D ... what if we just choose the hash values first? For example, 1001, 0110, 1100, 0011 as our hashes trivially satisfies this. Then we just start searching for A, B, C and D all at once - hashing possible values until we get one of our targets. That gives us a complete result in one pass through the key space. This is O(n) (n size of hash solution space), no sorting and almost no RAM required. Am I wrong? :-)
Please excuse me if I've misunderstood or overlooked anything obvious. Thanks for sharing the idea!
For context, I am interested in this sort of thing, as one of the long-term stretch goals for Mailpile (which I'm working on) is to define a secure alternative peer-to-peer delivery path for e-mail (bypassing legacy the SMTP servers entirely), probably over Tor. Making some sort of proof-of-work a mandatory part of that might be a useful part of a spam reduction strategy.