From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <nothingmuch@woobling.org>
Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136])
 by lists.linuxfoundation.org (Postfix) with ESMTP id E5AA5C0881
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sun, 29 Dec 2019 03:39:01 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by silver.osuosl.org (Postfix) with ESMTP id D4D74203D7
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sun, 29 Dec 2019 03:39:01 +0000 (UTC)
X-Virus-Scanned: amavisd-new at osuosl.org
Received: from silver.osuosl.org ([127.0.0.1])
 by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id h7hSGeNIEdDe
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sun, 29 Dec 2019 03:39:00 +0000 (UTC)
X-Greylist: delayed 00:06:54 by SQLgrey-1.7.6
Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com
 [209.85.216.47])
 by silver.osuosl.org (Postfix) with ESMTPS id EBCAC203BD
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sun, 29 Dec 2019 03:38:59 +0000 (UTC)
Received: by mail-pj1-f47.google.com with SMTP id a6so6477080pjh.2
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sat, 28 Dec 2019 19:38:59 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=woobling.org; s=google;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to;
 bh=R7pz7gJ0ir/vgViWmgPyoWHtNIql/PlQc28JIWsGXSc=;
 b=Ogz0Sps0nEbCwyf4pqZ15ipVHkAivUJ/Tf9bZVDGL9Q9QsZ2jD2aKMFXFbfn2yETrV
 VcwO4EF1skaMpEnrVN5Z+m90/SEKtRGZ1hXv4LUA1a92oopHJawavEtvMhCrYUBKqTja
 wlVEzaaUFC/ZWXvLB8hujl43hjYK3t5yXK9l8=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to;
 bh=R7pz7gJ0ir/vgViWmgPyoWHtNIql/PlQc28JIWsGXSc=;
 b=dR3NYKQoLgaDQLvaWf4ZOtOf0JYnrSq2JEB8vaxGohTyvDcfFtuc2ujb3cQSZ9/jg5
 d9PqmX+GsCquGeyO06G/EHz3sEgmGZtl+l6Tm6mbSX0c68dZfmpHYfUq7MvO+epQJxpx
 3oYO8xF0uJW7YYP8Fl+TLuNiKb0hvpm6q2pPAOdeeWwfoZtfRL6c8o5prL43nL4ymtv3
 rJDaIi97nJJhsP+Md5dE+YkX13AYOlXluOe+4ycBU95/GX1orDpxDN71mctEMd0wz6/w
 +jaX6kaY2KEUsklGgN2w6B2mrfORZfemJksALRmqv6OHUpsTvY/IHAlLBclSBXUEfyMz
 zaKw==
X-Gm-Message-State: APjAAAUemRniBaEp7pEuTbCquUyaAQiBXMG99JjvHxDzpIxDG7jQmvHl
 kxBGYnziCPVpLy3b4c2rfn75HVG1v6ChskgL0OTXZw==
X-Google-Smtp-Source: APXvYqzh1QM3ZQGtnEy2PtjpNZxkeAZrzvNSsAHPhk/9QdAOXL6Dui4HmaRgCgUvkHc22rpZA8bYwkJtgQ1+rlNIKBg=
X-Received: by 2002:a17:90a:cb0f:: with SMTP id
 z15mr37471729pjt.131.1577590322921; 
 Sat, 28 Dec 2019 19:32:02 -0800 (PST)
MIME-Version: 1.0
References: <CAEPKjgdtgDbyLoj6FV+cjY1Djca_FBtd9Kt_eB4zWU+at=wfYQ@mail.gmail.com>
In-Reply-To: <CAEPKjgdtgDbyLoj6FV+cjY1Djca_FBtd9Kt_eB4zWU+at=wfYQ@mail.gmail.com>
From: Yuval Kogman <nothingmuch@woobling.org>
Date: Sun, 29 Dec 2019 03:31:48 +0000
Message-ID: <CAAQdECBqFKxAoZXCkWynN4wj5g8C9vzdhuEWk9b-BYqDW=us6g@mail.gmail.com>
To: nopara73 <adam.ficsor73@gmail.com>, 
 Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: multipart/alternative; boundary="000000000000cc92a8059acf5ea4"
X-Mailman-Approved-At: Sun, 29 Dec 2019 04:18:36 +0000
Subject: Re: [bitcoin-dev] Non-equal value CoinJoins. Opinions.
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sun, 29 Dec 2019 03:39:02 -0000

--000000000000cc92a8059acf5ea4
Content-Type: text/plain; charset="UTF-8"

Hi,

On Sat, 28 Dec 2019 at 01:29, nopara73 via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

I haven't read the whole thing in detail (and fwiw, I don't think I will by
this point), but I do want to respond to section about the combinatorics as
well as the proof, since both the premises and the implications don't seem
very solid to me, especially in light of the other replies in this thread.

It appears to be a step up from the Knapsack paper in terms of the
specificity of a concrete mixing protocol (which again, I did not
scrutinize, but see below), but a regression in terms of privacy (see other
replies), which even in the Knapsack paper's approach raises some concerns:

Now, there are 100!/(10!)^10 ~= 10^92 ways to partition the inputs into a
> list of 10 sets of 10 inputs, but only a tiny fraction of these partitions
> will produce the precise output list.
>
In the equal amount case, the search space of possible interpretations with
n = # inputs + # indistinguishable outputs is proportional to the nth Bell
number, i.e. it's exponential in the size of the transaction, which is an
inviting intuition. But this is an *upper* bound on the difficulty of
deanonymization, given no additional information.

This quantitative framing is potentially misleading because:

1. attributing inputs/outputs (sub-transactions in the Knapsack paper's
terminology) is arguably not a search problem, but an optimization problem,
since approximate results are still partly useful to the adversary
2. there are many computational strategies, heuristics, etc that in
practice can make this more efficient than brute force[1], so framing it
that as a security parameter doesn't sit right with me
3. as individual sub-transactions are identified (for example using out of
band information), the computational challenge also *drops* exponentially
fast

Additionally (though is a broader criticism of CoinJoin based privacy and
not specific to unequal amounts, and in particular refers to ZmnSCPxj's
assertion of 0 linkability) I am very worried that perspectives that focus
on linkability information revealed by a single coinjoin transaction in
isolation. This problem was alluded in the document, to but I don't see
that it was addressed. Naively the post/pre mix transaction graph would
seem to present a computationally much harder problem when looking at the
combinatorics through the same lens, but reality it can also be used to
place many constraints on valid partitions/sub-transaction assignments for
a single transaction with equal amounts. The trivial example is post mix
linking of outputs, but there are many other ways to draw inferences or
eliminate possible interpretations of a single transaction based on its
wider context, which in turn may be used to attack other transactions.


> Based on the example above, we can see that not only are there a huge
> number of partitions, but that even with a fast algorithm that could find
> matching partitions, it would produce around 10^20 possible valid
> configurations. With 10^20 possibilities, there is essentially no linkage.
>
This is a better framing, but still doesn't address my third bullet, since
"Attacks always get better; they never get worse." In other words
"essentially no linkage" due to multiple possible interpretation is still
strictly more meaningful if you can add constraints out of band.

To be fair in equal amount CoinJoins this is also the case, but it's a much
simpler model to consider in the context of other privacy leak vectors
(e.g. transaction graph connectivity beyond a single coinjoin, wallet
fingerprinting, temporal patterns, network privacy leaks, etc etc), since
analyzing your level of exposure is *also* complicated by unequal amounts,
in other words higher chance of privacy leaks due to misuse, or ignorance
of some of the implications under intended use. Thinking through these
implications is much easier when the information content in the amounts is
minimized.

The Cash Fusion scheme actually extends this obfuscation even further. Not
> only can players bring many inputs, they can also have multiple outputs
>
And, quoting another section:

Unfortunately, the production of equal-amount coins is impractical for
> various reasons. Foremost, it has a "toxic waste"
>

I'm still cautiously optimistic about the potential of multiple
inputs/outputs per user (c.f. 3-phase chaumian CoinJoin ideas we've
previously discussed in the context of Wasabi, though I don't recall any
public discussion I can link to, sorry list), but with the additional
assumption of amounts with small popcounts/Hamming weights (e.g. only
amounts that are 2^n sat in size, or based on 1-2-5 series, and for a
rationale see Ethan's reply).

Unfortunately this trades off that "toxic waste" problem for a very large
on chain footprint (e.g. if the popcount of the amount of a wallet is
limited to 1, the number of inputs and change outputs required in the worst
case is proportional to log of the payment amount) and significant UTXO
bloat (several mixed outputs per magnitude for transaction size to scale as
the popcount(payment amount) instead of the log(payment amount))

However, with OP_CHECKTEMPLATEVERIFY and Taproot, this overhead could
potentially be mitigated (something more like TumbleBit's privacy model,
but with an on chain footprint similar to multiparty payment channels as
described in the OP_CTV BIP draft) since the safety guaranteed by
CoinJoins' atomicity can be preserved without requiring atomicity of the
mixing itself, which can extend over multiple transactions and long time
intervals, while still providing the liquidity and finality and unilateral
exit option of payment channels. By moving such low hamming weight amount
outputs off chain, and allowing them to be mixed (with equal amounts),
split and merged off chain. The simpler analysis of equal amount outputs
could still be assumed which makes analysis easier and assumptions about
adversary weaker, and furthermore this approach would better align the
incentives for batching and privacy, which is why I think it's very
promising.

Finally, the proof as well as its applicability seems suspect to me, since
seems to involve trusting the server:
"Since the distinct list [...] [is] kept on the server and not shared with
the players"
"The server knows the linkages of the commitments but does not participate
as a verifier "
"If there is a problem [...] each component is assigned to another player
at random for verification"
these 3 statements together seems to suggest the server is trusted to not
use sybils in order the compromise privacy by participating in the
verification process?

(also, and although this is nitpicking, also does not seem to be
demonstrating "soundness" in any sense that I am familiar with - wouldn't
break in soundness only imply DoS vector?)

[1] btw, although I still owe you (nopara73) some consistently working
linear programming code to attribute change outputs in Wasabi (sorry! i
suck...), but anecdotally, in the special cases I did manage to solve even
transactions with tens of inputs do not present a challenge even to
supposedly slow/inefficient solvers compared to the state of the art. Even
though it's just anecdotal and in the context of the easier problem of
attributing change, which can still be ambiguous in Wasabi transactions, i
still think this puts into question the privacy of mixing based on a
dubious hardness assumption and a computationally bounded adversary, as
opposed to something which (in the scope of a single mixing transaction) is
perfectly hiding.

--000000000000cc92a8059acf5ea4
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div><div>Hi,<br></div><br><div class=3D"gmail_quote"><div=
 dir=3D"ltr" class=3D"gmail_attr">On Sat, 28 Dec 2019 at 01:29, nopara73 vi=
a bitcoin-dev &lt;<a href=3D"mailto:bitcoin-dev@lists.linuxfoundation.org" =
target=3D"_blank">bitcoin-dev@lists.linuxfoundation.org</a>&gt; wrote:</div=
><div dir=3D"ltr" class=3D"gmail_attr"><br></div><div dir=3D"ltr">I haven&#=
39;t read the whole thing in detail (and fwiw, I don&#39;t think I will by =
this point), but I do want to respond to section about the combinatorics as=
 well as the proof, since both the premises and the implications don&#39;t =
seem very solid to me, especially in light of the other replies in this thr=
ead.</div><div dir=3D"ltr"><br></div><div dir=3D"ltr">It appears to be a st=
ep up from the Knapsack paper in terms of the specificity of a concrete mix=
ing protocol (which again, I did not scrutinize, but see below), but a regr=
ession in terms of privacy (see other replies), which even in the Knapsack =
paper&#39;s approach raises some concerns:</div><div dir=3D"ltr"><br><block=
quote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1=
px solid rgb(204,204,204);padding-left:1ex"><div><p style=3D"box-sizing:bor=
der-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-=
apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,Helvetica,Arial,sans-s=
erif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;;font-size:16p=
x">Now, there are 100!/(10!)^10 ~=3D 10^92 ways to partition the inputs int=
o a list of 10 sets of 10 inputs, but only a tiny fraction of these partiti=
ons will produce the precise output list.</p></div></blockquote></div><div>=
In the equal amount case, the search space of possible interpretations with=
 n =3D # inputs + # indistinguishable outputs is proportional to the nth Be=
ll number, i.e. it&#39;s exponential in the size of the transaction, which =
is an inviting intuition. But this is an *upper* bound on the difficulty of=
 deanonymization, given no additional information.<br></div><div><br></div>=
<div>This quantitative framing is potentially misleading because:<br></div>=
<div><br></div><div>1. attributing inputs/outputs (sub-transactions in the =
Knapsack paper&#39;s terminology) is arguably not a search problem, but an =
optimization problem, since approximate results are still partly useful to =
the adversary</div><div>2. there are many computational strategies, heurist=
ics, etc that in practice can make this more efficient than brute force[1],=
 so framing it that as a security parameter doesn&#39;t sit right with me<b=
r></div><div>3. as individual sub-transactions are identified (for example =
using out of band information), the computational challenge also *drops* ex=
ponentially fast<br></div><div><br></div><div>Additionally (though is a bro=
ader criticism of CoinJoin based privacy and not specific to unequal amount=
s, and in particular refers to ZmnSCPxj&#39;s assertion of 0 linkability) I=
 am very worried that perspectives that focus on linkability information re=
vealed by a single coinjoin transaction in isolation. This problem was allu=
ded in the document, to but I don&#39;t see that it was addressed. Naively =
the post/pre mix transaction graph would seem to present a computationally =
much harder problem when looking at the combinatorics through the same lens=
, but reality it can also be used to place many constraints on valid partit=
ions/sub-transaction assignments for a single transaction with equal amount=
s. The trivial example is post mix linking of outputs, but there are many o=
ther ways to draw inferences or eliminate possible interpretations of a sin=
gle transaction based on its wider context, which in turn may be used to at=
tack other transactions.<br></div><div>=C2=A0</div><blockquote class=3D"gma=
il_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,2=
04,204);padding-left:1ex"><div dir=3D"ltr"><div><p style=3D"box-sizing:bord=
er-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-a=
pple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,Helvetica,Arial,sans-se=
rif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;;font-size:16px=
">Based on the example above, we can see that not only are there a huge num=
ber of partitions, but that even with a fast algorithm that could find matc=
hing partitions, it would produce around 10^20 possible valid configuration=
s. With 10^20 possibilities, there is essentially no linkage.</p></div></di=
v></blockquote><div>This is a better framing, but still doesn&#39;t address=
 my third bullet, since &quot;Attacks always get better; they never get wor=
se.&quot; In other words &quot;essentially no linkage&quot; due to multiple=
 possible interpretation is still strictly more meaningful if you can add c=
onstraints out of band.</div><div><br></div><div>To be fair in equal amount=
 CoinJoins this is also the case, but it&#39;s a much simpler model to cons=
ider in the context of other privacy leak vectors (e.g. transaction graph c=
onnectivity beyond a single coinjoin, wallet fingerprinting, temporal patte=
rns, network privacy leaks, etc etc), since analyzing your level of exposur=
e is *also* complicated by unequal amounts, in other words higher chance of=
 privacy leaks due to misuse, or ignorance of some of the implications unde=
r intended use. Thinking through these implications is much easier when the=
 information content in the amounts is minimized.</div><div><br></div><bloc=
kquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:=
1px solid rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr"><div><p style=
=3D"box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41=
,46);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,Helv=
etica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&q=
uot;;font-size:16px">The Cash Fusion scheme actually extends this obfuscati=
on even further. Not only can players bring many inputs, they can also have=
 multiple outputs</p></div></div></blockquote><div><div class=3D"gmail_quot=
e">And, quoting another section:</div><div class=3D"gmail_quote"><br></div>=
<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-=
left:1px solid rgb(204,204,204);padding-left:1ex"><div class=3D"gmail_quote=
">Unfortunately, the production of equal-amount coins is impractical for va=
rious reasons. Foremost, it has a &quot;toxic waste&quot; </div></blockquot=
e></div><div>=C2=A0</div>I&#39;m still cautiously optimistic about the pote=
ntial of multiple inputs/outputs per user (c.f. 3-phase chaumian CoinJoin i=
deas we&#39;ve previously discussed in the context of Wasabi, though I don&=
#39;t recall any public discussion I can link to, sorry list), but with the=
 additional assumption of amounts with small popcounts/Hamming weights (e.g=
. only amounts that are 2^n sat in size, or based on 1-2-5 series, and for =
a rationale see Ethan&#39;s reply).</div><div class=3D"gmail_quote"><br></d=
iv><div class=3D"gmail_quote">Unfortunately this trades off that &quot;toxi=
c waste&quot; problem for a very large on chain footprint (e.g. if the popc=
ount of the amount of a wallet is limited to 1, the number of inputs and ch=
ange outputs required in the worst case is proportional to log of the payme=
nt amount) and significant UTXO bloat (several mixed outputs per magnitude =
for transaction size to scale as the popcount(payment amount) instead of th=
e log(payment amount))<br></div><div class=3D"gmail_quote"><br></div><div c=
lass=3D"gmail_quote">However, with OP_CHECKTEMPLATEVERIFY and Taproot, this=
 overhead could potentially be mitigated (something more like TumbleBit&#39=
;s privacy model, but with an on chain footprint similar to multiparty paym=
ent channels as described in the OP_CTV BIP draft) since the safety guarant=
eed by CoinJoins&#39; atomicity can be preserved without requiring atomicit=
y of the mixing itself, which can extend over multiple transactions and lon=
g time intervals, while still providing the liquidity and finality and unil=
ateral exit option of payment channels. By moving such low hamming weight a=
mount outputs off chain, and allowing them to be mixed (with equal amounts)=
, split and merged off chain. The simpler analysis of equal amount outputs =
could still be assumed which makes analysis easier and assumptions about ad=
versary weaker, and furthermore this approach would better align the incent=
ives for batching and privacy, which is why I think it&#39;s very promising=
.</div></div><div><div class=3D"gmail_quote"><div>=C2=A0</div><div>Finally,=
 the proof as well as its applicability seems suspect to me, since seems to=
 involve trusting the server:</div><div>&quot;Since the distinct list [...]=
 [is] kept
 on the server and not shared with the players&quot;</div><div>&quot;The se=
rver knows the linkages of the commitments but does not participate as a ve=
rifier &quot;</div><div>&quot;If there is a problem [...] each component is=
 assigned to another player at random for verification&quot;</div><div>thes=
e 3 statements together seems to suggest the server is trusted to not use s=
ybils in order the compromise privacy by participating in the verification =
process?</div><div><br></div><div>(also, and although this is nitpicking, a=
lso does not seem to be demonstrating &quot;soundness&quot; in any sense th=
at I am familiar with - wouldn&#39;t break in soundness only imply DoS vect=
or?)<br></div><div><br><div>[1] btw, although I still owe you (nopara73) so=
me consistently working linear programming code to attribute change outputs=
 in Wasabi (sorry! i suck...), but anecdotally, in the special cases I did =
manage to solve even transactions with tens of inputs do not present a chal=
lenge even to supposedly slow/inefficient solvers compared to the state of =
the art. Even though it&#39;s just anecdotal and in the context of the easi=
er problem of attributing change, which can still be ambiguous in Wasabi tr=
ansactions, i still think this puts into question the privacy of mixing bas=
ed on a dubious hardness assumption and a computationally bounded adversary=
, as opposed to something which (in the scope of a single mixing transactio=
n) is perfectly hiding.<br></div></div></div></div></div>

--000000000000cc92a8059acf5ea4--