From: Steve Davis <steven.charles.davis@gmail.com>
To: bitcoin-dev@lists.linuxfoundation.org
Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to attakcs by third-parties, not just repo maintainers
Date: Fri, 24 Feb 2017 17:49:36 -0600 [thread overview]
Message-ID: <8F096BE1-D305-43D4-AF10-2CC48837B14F@gmail.com> (raw)
In-Reply-To: <mailman.22137.1487974823.31141.bitcoin-dev@lists.linuxfoundation.org>
If the 20 byte SHA1 is now considered insecure (with good reason), what about RIPEMD-160 which is the foundation of Bitcoin addresses?
Is that also susceptible to such an attack vector?
What does that mean for old addresses?
etc
/s
> Date: Fri, 24 Feb 2017 11:04:54 +0100
> From: Tim Ruffing <tim.ruffing@mmci.uni-saarland.de>
> To: bitcoin-dev@lists.linuxfoundation.org
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> attakcs by third-parties, not just repo maintainers
> Message-ID: <1487930694.1528.1.camel@mmci.uni-saarland.de>
> Content-Type: text/plain; charset="UTF-8"
>
> On Fri, 2017-02-24 at 00:57 +0100, Aymeric Vitte via bitcoin-dev wrote:
>>
>> I have not worked on this since some time, so that's just thoughts,
>> but maybe it can render things much more difficult
>> than???????computing two files until the same hash is found
>>
>
> You basically rely on the idea that specific collisions are more
> difficult to find.?This trick or similar tricks will not help.?(And
> actually, the more files you add to the hash, the more freedom you give
> the attacker.)
>
> Even if certain collisions are more difficult to find today (which is
> certainly true), the general rule is that someone will prove you wrong
> in a year.
>
> Even if ignore security entirely, switching to new hash function is
> much simpler trying to fix the usage of a broken hash function.
>
> Relying on SHA1 is hopeless. We have to get rid of it.
>
> Best,
> Tim
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 24 Feb 2017 16:18:43 +0100
> From: Aymeric Vitte <vitteaymeric@gmail.com>
> To: bitcoin-dev@lists.linuxfoundation.org
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> attakcs by third-parties, not just repo maintainers
> Message-ID: <15848c1b-2873-35e8-0588-c636126257df@gmail.com>
> Content-Type: text/plain; charset=utf-8
>
> Not sure that you really read deeply what I sent, because stating that
> hashing files continuously instead of hashing the intermediate steps
> just gives more latitude to the attacker can't be true when the attacker
> has absolutely no control over the past files
>
> I did not write this as a workaround to fix SHA1, which will be dead
> soon or later but as maybe some general concept that could possibly help
> whatever hash function you are using for objects that are not frozen but
> extending (ie the original email stating that trees might be some kind
> of worse candidates for collisions reminded me this), indeed it makes no
> sense to patch SHA1 or play around, but this kind of proposal could
> accompany the defunct
>
> The drawback is that you have to keep the hash state when you close the
> latest hash computation in order to start the next one
>
> Then the question is: knowing the hash state, is it as easy to find a
> collision between two files that will be computed in the next round than
> finding a collision between two files only?
>
> Knowing that you can probably modify the hash state with some
> unpredictable patterns
>
> Most likely the answer is: no, it's (astronomically?) more difficult
>
> Please take it as a suggestion that might be explored (ps: I have the
> code for this if needed) rather than an affirmation, still amazed as
> shown in the few links provided (among others) that each time I raise
> this subject nobody really pays attention (what's the use case?, etc)
> and by the fact that it's apparently used by only one project in the
> world and not supported by any library
>
>
> Le 24/02/2017 ? 11:04, Tim Ruffing via bitcoin-dev a ?crit :
>> On Fri, 2017-02-24 at 00:57 +0100, Aymeric Vitte via bitcoin-dev wrote:
>>> I have not worked on this since some time, so that's just thoughts,
>>> but maybe it can render things much more difficult
>>> than computing two files until the same hash is found
>>>
>> You basically rely on the idea that specific collisions are more
>> difficult to find. This trick or similar tricks will not help. (And
>> actually, the more files you add to the hash, the more freedom you give
>> the attacker.)
>>
>> Even if certain collisions are more difficult to find today (which is
>> certainly true), the general rule is that someone will prove you wrong
>> in a year.
>>
>> Even if ignore security entirely, switching to new hash function is
>> much simpler trying to fix the usage of a broken hash function.
>>
>> Relying on SHA1 is hopeless. We have to get rid of it.
>>
>> Best,
>> Tim
>>
>>
>>
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
> --
> Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
> Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
> Get the torrent dynamic blocklist: http://peersm.com/getblocklist
> Check the 10 M passwords list: http://peersm.com/findmyass
> Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
> Peersm : http://www.peersm.com
> torrent-live: https://github.com/Ayms/torrent-live
> node-Tor : https://www.github.com/Ayms/node-Tor
> GitHub : https://www.github.com/Ayms
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 24 Feb 2017 17:30:49 +0100
> From: Tim Ruffing <tim.ruffing@mmci.uni-saarland.de>
> To: bitcoin-dev@lists.linuxfoundation.org
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> attakcs by third-parties, not just repo maintainers
> Message-ID: <1487953849.5148.2.camel@mmci.uni-saarland.de>
> Content-Type: text/plain; charset="UTF-8"
>
> On Fri, 2017-02-24 at 16:18 +0100, Aymeric Vitte via bitcoin-dev wrote:
>> Not sure that you really read deeply what I sent, because stating
>> that
>> hashing files continuously instead of hashing the intermediate steps
>> just gives more latitude to the attacker can't be true when the
>> attacker
>> has absolutely no control over the past files
> What prevents the attacker to provide different past files when talking
> to parties who are still in the initial state?
>
> Then the question is: knowing the hash state, is it as easy to find a
>> collision between two files that will be computed in the next round
>> than
>> finding a collision between two files only?
> With the original usage of the hash function, the hash state is always
> the initial state. Now that the attacker has some control over the hash
> state even. In other words, if the original use of the hash function
> was vulnerable, then your scheme is vulnerable for the initial state.
>
> Concrete attack: If you can find x != y with H(x) = H(y), then you can
> also find m, x != y, with H(m||x) = H(m||y), just by setting m = "".
>
> Not sure if this is the right place to discuss that issue though...
>
> Best,
> Tim
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 24 Feb 2017 18:29:50 +0100
> From: Aymeric Vitte <vitteaymeric@gmail.com>
> To: Tim Ruffing <tim.ruffing@mmci.uni-saarland.de>, Bitcoin Protocol
> Discussion <bitcoin-dev@lists.linuxfoundation.org>
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> attakcs by third-parties, not just repo maintainers
> Message-ID: <b557a0de-2492-80a1-eff7-229503ae382d@gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> ??? apparently we are not discussing the same thing
>
> Maybe I did not provide the right links (reading them again I myself
> don't find them so clear), see maybe again
> https://github.com/whatwg/streams/issues/33#issuecomment-28045860
>
> a - b - c -d
>
> hash(a)
>
> hash(a+b)
>
> etc
>
> But you are not going to rehash from the beginning, then:
>
> update a --> keep the remaining bytes a_ (+ hash state 1) --> digest
> a=hash(a)
>
> update a_+b from hash state 1--> keep the remaining bytes b_ (+ hash
> state 2) --> digest a_+b=hash(a+b)
>
> etc
>
> Basically that's similar to a real time progressive hash of chunks of a
> file that you are streaming and therefore don't know what will come next
> (per opposition to hashing a file that you already have), this could
> apply to trees
>
> This is different from something like:
>
> hash(a)
>
> hash(hash(a) +hash(b))
>
> etc
>
> There is no initial state, and the attacker can't modify what was
> already hashed, to make it more difficult you can probably modify the
> hash state N
>
>
> Le 24/02/2017 ? 17:30, Tim Ruffing via bitcoin-dev a ?crit :
>> On Fri, 2017-02-24 at 16:18 +0100, Aymeric Vitte via bitcoin-dev wrote:
>>> Not sure that you really read deeply what I sent, because stating
>>> that
>>> hashing files continuously instead of hashing the intermediate steps
>>> just gives more latitude to the attacker can't be true when the
>>> attacker
>>> has absolutely no control over the past files
>> What prevents the attacker to provide different past files when talking
>> to parties who are still in the initial state?
>>
>> Then the question is: knowing the hash state, is it as easy to find a
>>> collision between two files that will be computed in the next round
>>> than
>>> finding a collision between two files only?
>> With the original usage of the hash function, the hash state is always
>> the initial state. Now that the attacker has some control over the hash
>> state even. In other words, if the original use of the hash function
>> was vulnerable, then your scheme is vulnerable for the initial state.
>>
>> Concrete attack: If you can find x != y with H(x) = H(y), then you can
>> also find m, x != y, with H(m||x) = H(m||y), just by setting m = "".
>>
>> Not sure if this is the right place to discuss that issue though...
>>
>> Best,
>> Tim
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
> --
> Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
> Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
> Get the torrent dynamic blocklist: http://peersm.com/getblocklist
> Check the 10 M passwords list: http://peersm.com/findmyass
> Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
> Peersm : http://www.peersm.com
> torrent-live: https://github.com/Ayms/torrent-live
> node-Tor : https://www.github.com/Ayms/node-Tor
> GitHub : https://www.github.com/Ayms
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 24 Feb 2017 14:20:19 -0800
> From: Bram Cohen <bram@bittorrent.com>
> To: Peter Todd <pete@petertodd.org>
> Cc: Bitcoin Protocol Discussion
> <bitcoin-dev@lists.linuxfoundation.org>
> Subject: Re: [bitcoin-dev] A Better MMR Definition
> Message-ID:
> <CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> So your idea is to cluster entries by entry time because newer things are
> more likely to leave and updating multiple things near each other is
> cheaper?
>
> That can be done with my tool. Instead of using hashes for the values being
> stored, you use position entries. The first entry gets a value of all
> zeros, the next one a one followed by all zeros, then the next two
> correspond to the first two with the second bit flipped to one, then the
> next four the first four with the third bit flipped to one, etc. It
> probably performs a little bit better to do it two bits at a time instead
> of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101,
> 0110, 0111, 1001, etc. If you were to really use this you'd probably want
> to to add some optimizations to use the fact that the terminals fit in 64
> bits instead of 256, but it mostly works unchanged, and gets whatever
> benefits there are to this clustering plus the high performance
> implementation tricks I've built which I keep complaining that nobody's
> giving feedback on.
>
> I'm not sold on this being a win: The empirical access patterns are
> unknown, it requires an extra cache miss per lookup to find the entry
> number, it may be that everything is optimized well enough without it for
> there to be no meaningful gains, and it's a bunch of extra complexity. What
> should be done is that a plain vanilla UTXO set solution is optimized as
> well as it can be first, and then the insertion ordering trick is tried as
> an optimization to see if it's an improvement. Without that baseline
> there's no meaningful basis for comparison, and I'm quite confident that a
> naive implementation which just allocates individual nodes will
> underperform the thing I've come up with, even without adding optimizations
> related to fitting in 64 bits.
>
> On Thu, Feb 23, 2017 at 8:36 PM, Peter Todd <pete@petertodd.org> wrote:
>
>> On Thu, Feb 23, 2017 at 07:32:43PM -0800, Bram Cohen wrote:
>>> On Thu, Feb 23, 2017 at 7:15 PM, Peter Todd <pete@petertodd.org> wrote:
>>>
>>>>
>>>> Glad we're on the same page with regard to what's possible in TXO
>>>> commitments.
>>>>
>>>> Secondly, am I correct in saying your UTXO commitments scheme requires
>>>> random
>>>> access? While you describe it as a "merkle set", obviously to be
>> merkelized
>>>> it'll have to have an ordering of some kind. What do you propose that
>>>> ordering
>>>> to be?
>>>>
>>>
>>> The ordering is by the bits in the hash. Technically it's a Patricia
>> Trie.
>>> I'm using 'merkle tree' to refer to basically anything with a hash root.
>>
>> The hash of what? The values in the set?
>>
>>>> Maybe more specifically, what exact values do you propose to be in the
>> set?
>>>>
>>>>
>>> That is unspecified in the implementation, it just takes a 256 bit value
>>> which is presumably a hash of something. The intention is to nail down a
>>> simple format and demonstrate good performance and leave those semantics
>> to
>>> a higher layer. The simplest thing would be to hash together the txid and
>>> output number.
>>
>> Ok, so let's assume the values in the set are the unspent outpoints.
>>
>> Since we're ordering by the hash of the values in the set, outpoints will
>> be
>> distributed uniformly in the set, and thus the access pattern of data in
>> the
>> set is uniform.
>>
>> Now let's fast-forward 10 years. For the sake of argument, assume that for
>> every 1 UTXO in the set that corresponds to funds in someone's wallet that
>> are
>> likely to be spent, there are 2^12 = 4096 UTXO's that have been permanently
>> lost (and/or created in spam attacks) and thus will never be spent.
>>
>> Since lost UTXO's are *also* uniformly distributed, if I'm processing a new
>> block that spends 2^12 = 4096 UTXO's, on average for each UTXO spent, I'll
>> have to update log2(4096) = 12 more digests than I would have had those
>> "dead"
>> UTXO's not existed.
>>
>> Concretely, imagine our UTXO set had just 8 values in it, and we were
>> updating
>> two of them:
>>
>> #
>> / \
>> / \
>> / \
>> / \
>> / \
>> # #
>> / \ / \
>> / \ / \
>> # . . #
>> / \ / \ / \ / \
>> . X . . . . X .
>>
>> To mark two coins as spent, we've had to update 5 inner nodes.
>>
>>
>> Now let's look at what happens in an insertion-ordered TXO commitment
>> scheme.
>> For sake of argument, let's assume the best possible case, where every UTXO
>> spent in that same block was recently created. Since the UTXO's are
>> recently
>> created, chances are almost every single one of those "dead" UTXO's will
>> have
>> been created in the past. Thus, since this is an insertion-ordered data
>> structure, those UTXO's exist in an older part of the data structure that
>> our
>> new block doesn't need to modify at all.
>>
>> Concretely, again let's imagine a TXO commitment with 8 values in it, and
>> two
>> of them being spent:
>>
>> #
>> / \
>> / \
>> / \
>> / \
>> / \
>> . #
>> / \ / \
>> / \ / \
>> . . . #
>> / \ / \ / \ / \
>> . . . . . . X X
>>
>> To mark two coins as spent, we've only had to update 3 inner nodes; while
>> our
>> tree is higher with those lost coins, those extra inner nodes are amortised
>> across all the coins we have to update.
>>
>>
>> The situation gets even better when we look at the *new* UTXO's that our
>> block
>> creates. Suppose our UTXO set has size n. To mark a single coin as spent,
>> we
>> have to update log2(n) inner nodes. We do get to amortise this a bit at
>> the top
>> levels in the tree, but even if we assume the amortisation is totally free,
>> we're updating at least log2(n) - log2(m) inner nodes "under" the amortised
>> nodes at the top of the tree for *each* new node.
>>
>> Meanwhile with an insertion-ordered TXO commitment, each new UTXO added to
>> the
>> data set goes in the same place - the end. So almost none of the existing
>> data
>> needs to be touched to add the new UTXOs. Equally, the hashing required
>> for the
>> new UTXO's can be done in an incremental fashion that's very L1/L2 cache
>> friendly.
>>
>>
>> tl;dr: Precisely because access patterns in TXO commitments are *not*
>> uniform,
>> I think we'll find that from a L1/L2/etc cache perspective alone, TXO
>> commitments will result in better performance than UTXO commitments.
>>
>>
>> Now it is true that Bitcoin's current design means we'll need a map of
>> confirmed outpoints to TXO insertion order indexes. But it's not
>> particularly
>> hard to add that "metadata" to transactions on the P2P layer in the same
>> way
>> that segwit added witnesses to transactions without modifying how txids
>> were
>> calculated; if you only connect to peers who provide you with TXO index
>> information in blocks and transactions, you don't need to keep that map
>> yourself.
>>
>> Finally, note how this makes transactions *smaller* in many circumstances:
>> it's
>> just a 8-byte max index rather than a 40 byte outpoint.
>>
>> --
>> https://petertodd.org 'peter'[:-1]@petertodd.org
>>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20170224/63ab2731/attachment.html>
>
> ------------------------------
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
>
> End of bitcoin-dev Digest, Vol 21, Issue 34
> *******************************************
next parent reply other threads:[~2017-02-24 23:49 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <mailman.22137.1487974823.31141.bitcoin-dev@lists.linuxfoundation.org>
2017-02-24 23:49 ` Steve Davis [this message]
2017-02-25 1:01 ` [bitcoin-dev] SHA1 collisions make Git vulnerable to attakcs by third-parties, not just repo maintainers Peter Todd
2017-02-25 12:04 ` Steve Davis
2017-02-25 14:50 ` Leandro Coutinho
2017-02-25 16:10 ` Ethan Heilman
2017-02-25 17:45 ` Shin'ichiro Matsuo
2017-02-27 9:15 ` Henning Kopp
2017-02-25 18:19 ` Alice Wonder
2017-02-25 18:36 ` Ethan Heilman
2017-02-25 19:12 ` Peter Todd
2017-02-25 20:42 ` Watson Ladd
2017-02-25 20:57 ` Peter Todd
2017-02-25 20:53 ` Russell O'Connor
2017-02-25 21:04 ` Peter Todd
2017-02-25 21:21 ` Dave Scotese
2017-02-25 21:34 ` Steve Davis
2017-02-25 21:40 ` Peter Todd
2017-02-25 21:54 ` Steve Davis
2017-02-25 22:14 ` Pieter Wuille
2017-02-25 22:34 ` Ethan Heilman
2017-02-26 6:26 ` Steve Davis
2017-02-26 6:36 ` Pieter Wuille
2017-02-26 7:16 ` Steve Davis
[not found] ` <CAPg+sBirowtHqUT5GUJf9hmDEACKVX19HAon-rrz7GmO8OBsNg@mail.gmail.com>
2017-02-26 16:53 ` Steve Davis
2017-02-25 23:09 ` Leandro Coutinho
2017-02-23 18:14 Peter Todd
2017-02-23 21:28 ` Peter Todd
2017-02-23 23:57 ` Aymeric Vitte
2017-02-24 10:04 ` Tim Ruffing
2017-02-24 15:18 ` Aymeric Vitte
2017-02-24 16:30 ` Tim Ruffing
2017-02-24 17:29 ` Aymeric Vitte
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8F096BE1-D305-43D4-AF10-2CC48837B14F@gmail.com \
--to=steven.charles.davis@gmail.com \
--cc=bitcoin-dev@lists.linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox