* [bitcoin-dev] Client side coinjoin amount organization with WabiSabi
@ 2022-04-06 16:05 Max Hillebrand
From: Max Hillebrand @ 2022-04-06 16:05 UTC (permalink / raw)
  To: bitcoin-dev

Hello list,

tl;dr: client side coinjoin amount organization is bloody difficult. Our
current approach: pick a random number of inputs based on the wallet's
utxo count; pick the group of inputs which results in the lowest
anonscore consolidation penalty; generate a deterministic frequency
table as a Schelling point; brute-force decompose the input sum into
likely denominations and randomly pick one of the good ones.


In previous coinjoin implementations, round parameters like the equal
denomination are dictated by the coordinator. This is in part due to
the design constraints of the Chaumian blind signature coordination
protocol. Given knowledge of a user's input sum, an adversary can find
out which denominations the user received, even though it is more
difficult to find out exactly which equal-amount output coin was
received. Furthermore, this leads to worse usability as well as more
blockspace consumption. However, the coordinator can enforce, for
example, that every user ends up with the same denomination, and thus a
very large anonymity set is achieved.

This can be improved by using a coinjoin coordination protocol like
WabiSabi with fewer constraints, specifically no input-input linkage
and arbitrary input/output amount registration. Now the coordinator
does not dictate round parameters like the minimum equal-amount
denomination or the decomposition algorithm used. The idea is to make
more decisions client side, without substantially sacrificing the
privacy guarantees and anonymity set size of outputs.

This turns out to be quite a difficult problem. I will try my best to
explain the approach that is currently implemented in Wasabi Wallet's
third release candidate. The code is linked below; sorry in advance for
any discrepancy or confusion in my explanation. Even though the results
seem to be alright, this is probably not the optimal approach, so I
kindly ask you grey-bearded Bitcoin wizards to review, break and improve it.


## Input Selection

First, the client decides how many coins to select in this round. This
is a random choice between 1 and 10. However, if the wallet currently
has fewer than 35 utxos, there is a preference toward choosing 1; if
the wallet has more than 125 utxos, there is a preference toward
choosing 10, with a gradient in between. This is to control the utxo
count of the wallet. Notably, this does not take into account the sats
amount in the utxo set, so a user with 0.1 btc will behave the same as
one with 1000 btc. Maybe the target utxo count should be adjusted based
on value.
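
As an illustration, here is a minimal Python sketch of such a biased
choice. The exact weighting in the client may differ; this just
interpolates linearly between the 35- and 125-utxo thresholds.

import random

def choose_input_count(wallet_utxo_count, low=35, high=125,
                       min_inputs=1, max_inputs=10):
    """Pick how many coins to register this round.

    Hypothetical sketch: below `low` utxos the weights favor 1 input,
    above `high` they favor 10, with a linear gradient in between.
    """
    # 0.0 => small wallet (prefer few inputs), 1.0 => large wallet (prefer many)
    t = min(max((wallet_utxo_count - low) / (high - low), 0.0), 1.0)
    counts = list(range(min_inputs, max_inputs + 1))
    # Blend a "prefer small" ramp with a "prefer large" ramp.
    weights = [(1 - t) * (max_inputs + 1 - k) + t * k for k in counts]
    return random.choices(counts, weights=weights, k=1)[0]

# Example: a 20-utxo wallet usually registers few coins,
# a 200-utxo wallet usually registers many.
print(choose_input_count(20), choose_input_count(200))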

Next, the question of which coins to register: ideally, those coins
which result in the least anonscore loss possible. Shuffle all suitable
utxos [i.e. confirmed, below the max anonscore target, etc.], then sort
them ascending by anonscore and descending by amount. Now create groups
of the previously established input count X: the first through the X-th
coin of the sorted list form the first group, then shift down by one,
so the second group is the second through the (X+1)-th coin. Build
these "rolling groups" all the way to the bottom of the list. This way,
coins whose anonscores are close to each other are selected together.

Remove those groups which have many coins coming from the same transaction.

For each group, calculate the anonscore cost of input consolidation,
weighted by amount. If the selected coins have anonscores 3, 5 and 10,
then the group has an anonscore of 3, so the input with anonscore 10
has an anonscore cost of 7. Now weight this cost by the input value of
the relevant coin in the group, so that a loss of anonscore on a
high-value coin is more costly than on a low-value coin.

Pick the input group with the lowest weighted anonscore cost.

There is randomness in the number of inputs chosen, but the selection of 
the best coin group is deterministic. Maybe there can be some randomness 
in the final group selection, without suffering from too much anonscore 
consolidation penalty.
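
A minimal Python sketch of the rolling-group construction and the
value-weighted consolidation penalty, assuming a coin is just
(anonscore, value, txid). The max_same_tx threshold and the final
tie-breaking are simplifications, not the client's exact logic.

from collections import Counter, namedtuple

Coin = namedtuple("Coin", "anonscore value txid")

def rolling_groups(coins, x):
    # Sort ascending by anonscore, then descending by value, and slide a
    # window of size x over the list, so each candidate group contains
    # coins with similar anonscores.
    ordered = sorted(coins, key=lambda c: (c.anonscore, -c.value))
    return [ordered[i:i + x] for i in range(len(ordered) - x + 1)]

def weighted_consolidation_cost(group):
    # The group inherits the lowest anonscore of its members; each coin's
    # loss relative to that floor is weighted by its value, so degrading
    # a large coin costs more than degrading a small one.
    floor = min(c.anonscore for c in group)
    total_value = sum(c.value for c in group)
    return sum((c.anonscore - floor) * c.value for c in group) / total_value

def select_input_group(coins, x, max_same_tx=2):
    # max_same_tx is a hypothetical threshold for "many coins from the
    # same transaction".
    candidates = rolling_groups(coins, x)
    candidates = [g for g in candidates
                  if max(Counter(c.txid for c in g).values()) <= max_same_tx]
    return min(candidates, key=weighted_consolidation_cost)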

One additional idea [which is not yet implemented] is that the 
coordinator suggests [not dictates] a maximum input value, which changes 
across different rounds. Large value inputs are not considered suitable 
if the maximum suggested input value of the current round is smaller.

It is important to note that currently users choose their inputs without 
knowing the inputs that other users have already registered. It should 
be possible to design the protocol in a way that shares the inputs that 
were already registered, even if input registration is not yet complete. 
There are, however, some privacy concerns, like timing attacks, or 
de-registration of an input after it was announced to other users.


## Output Selection

The coordinator collects all input registrations and forwards them to 
all users. At this point, all clients know all inputs of this 
transaction. The goal now is to reach a Schelling point among users on 
which output denominations to choose, so that the anonset size of each 
denomination is sufficiently large.

First of all, it's a good idea to limit the denominations that the
client will register. Some simulations confirmed that low Hamming
weight numbers are efficient, thus clients generate a list of standard
denominations: powers of two; powers of three; two times powers of
three; powers of ten; two times powers of ten; and five times powers of
ten. However, some of the denominations which are very close to each
other are removed, more so for larger values. Notice that this list of
standard denominations is the same across all rounds; it does not
depend on specific inputs.
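
A Python sketch of such a denomination list. The cap and the
near-duplicate pruning rule here are illustrative guesses, not the
client's exact values.

def standard_denominations(max_sats=10_000_000_000):
    """Union of the low-Hamming-weight series described above, capped at
    max_sats (here 100 BTC, an arbitrary illustrative cap)."""
    denoms = set()
    for base, multipliers in ((2, (1,)), (3, (1, 2)), (10, (1, 2, 5))):
        for m in multipliers:
            v = m
            while v <= max_sats:
                denoms.add(v)
                v *= base
    ordered = sorted(denoms)
    # Hypothetical pruning: keep a denomination only if it is at least 5%
    # larger than the previously kept one. The real client prunes
    # near-duplicates more aggressively for larger values.
    kept = []
    for d in ordered:
        if not kept or d / kept[-1] > 1.05:
            kept.append(d)
    return kept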

We can further narrow the list of potential denominations that the 
client chooses, specifically for every round. This is a further 
Schelling point on which denominations the client prefers to choose. 
It is built with a deterministic frequency table, based on the inputs 
of the proposed transaction.

Take each input and greedily decompose it into the standard 
denominations, so that every input has precisely one decomposition. [45 
decomposes greedily into 32+10+3.] Count the occurrences of every 
standard denomination in a frequency table. All standard denominations 
with a count of 2 or larger are considered likely denominations.

Notice that currently we remove the largest input from this frequency 
table calculation. This is so that the whale does not mix alone by 
himself. Also notice that individual inputs, not input sums, are 
decomposed. This is because we found that generating the frequency table 
from single inputs leads to a more accurate Schelling point, which 
increases the anonset count of the finally chosen denominations. 
Finally, notice that we only calculate one single decomposition for each 
input, the greedy one, but we could also calculate multiple different 
[or all possible] decompositions for each input, thus generating a 
larger frequency table and more likely denominations.
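
For illustration, a Python sketch of the greedy decomposition and the
resulting frequency table. The min_count threshold of 2 and the
largest-input exclusion come from the text above; the rest is schematic.

from collections import Counter

def greedy_decompose(amount, denoms):
    """Repeatedly subtract the largest denomination that still fits,
    e.g. 45 -> [32, 10, 3] with the standard denominations."""
    result = []
    for d in sorted(denoms, reverse=True):
        while amount >= d:
            result.append(d)
            amount -= d
    return result

def likely_denominations(input_amounts, denoms, min_count=2):
    """Build the round's frequency table from every registered input
    (dropping the single largest input, so the whale does not mix alone)
    and keep denominations that occur at least min_count times."""
    amounts = sorted(input_amounts)[:-1]
    freq = Counter()
    for a in amounts:
        freq.update(greedy_decompose(a, denoms))
    return sorted((d for d, n in freq.items() if n >= min_count), reverse=True)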

Whereas the frequency table should be deterministic as a Schelling 
point, the actual user's input sum must not be deterministically 
decomposed, otherwise an adversary who knows the input sum could find 
out which denominations the client chose [though not which of the equal 
outputs he got].

The client takes his input sum [minus fees] and brute-force decomposes 
it into all possible groups of the likely denominations [those with a 
high count in this round's frequency table]. We found that in most 
cases, even with this reduced list of likely denominations, any input 
sum can be decomposed into up to eight outputs. [Sometimes the 
wealthiest user gets a non-standard change amount.] However, each 
decomposition has some small amount of sats left over, which is not put 
into an output value but instead pays miner fees.

Sort this list of all possible output groups ascending by leftover 
amount, and remove those groups whose leftover amount is more than 1.3x 
that of the best option. Further, remove a group if it has a similar 
largest denomination as another one. [So far everything is 
deterministic, given all coinjoin inputs and the user's input sum.]

Out of this shorter list of output amount groups, shuffle and randomly 
pick one. These are the non-deterministic denominations which will be 
registered as the actual coinjoin outputs. If there were no shuffle, but 
instead a selection of the amount group with the lowest loss, users 
would save sats. But arguably having this randomness here increases 
privacy enough to justify the slight increase in leftover amount cost.
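
Roughly, the search looks like the following Python sketch. The
leftover budget, the reading of the 1.3x rule, and treating "similar
largest denomination" as "equal largest denomination" are simplifying
assumptions, not the client's exact logic.

import random

def decompose(target, denoms, max_outputs=8, max_leftover=10_000):
    """Enumerate (leftover, outputs) pairs whose output sum is <= target,
    using at most max_outputs outputs from denoms (sorted descending,
    repeats allowed) and leaving at most max_leftover sats to fees."""
    results = []
    def recurse(remaining, start, chosen):
        if chosen and remaining <= max_leftover:
            results.append((remaining, tuple(chosen)))
        if len(chosen) == max_outputs:
            return
        for i in range(start, len(denoms)):
            d = denoms[i]
            if d <= remaining:
                chosen.append(d)
                recurse(remaining - d, i, chosen)  # allow repeats of d
                chosen.pop()
    recurse(target, 0, [])
    return results

def pick_output_amounts(input_sum_after_fees, likely_denoms):
    denoms = sorted(likely_denoms, reverse=True)
    candidates = decompose(input_sum_after_fees, denoms)
    if not candidates:
        raise ValueError("no decomposition within the leftover budget")
    candidates.sort(key=lambda c: c[0])       # ascending by leftover
    best = candidates[0][0]
    good = [c for c in candidates if c[0] <= 1.3 * best]
    # Keep one decomposition per largest denomination (a crude stand-in
    # for "remove a group with a similar largest denomination").
    by_largest = {}
    for leftover, outputs in good:
        by_largest.setdefault(max(outputs), outputs)
    # The final choice is random rather than cheapest, for unpredictability.
    return random.choice(list(by_largest.values()))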

Again, while choosing their own outputs, clients do not know which 
outputs other clients registered. If the client had this information, it 
could possibly improve the quality of its own output registration 
substantially.

Notice that there are different decomposition strategies for the 
frequency table [greedy] and the input sum [brute-force all]. Maybe 
having the same decomposition strategy in both places would lead to 
better results.

Notice further that there is no rank ordering of the possible 
denominations based on some ambiguity or entropy score. Rather, the 
choice is random, and in some cases this might result in suboptimal 
outcomes.


Here are some results of our simulation of the current algorithm:

50 inputs 15 users

Median output count:    98
Median change count:    4
Median change percent:  3.2
Median out anonsets:    3.5
Median leftovers:       481

300 inputs 70 users

Median output count:    442
Median change count:    0.5
Median change percent:  0.3
Median out anonsets:    9.6
Median leftovers:       394


Thank you for your consideration to review!

Skol
Max


Third Wasabi 2.0 Release Candidate: 
https://github.com/zkSNACKs/WalletWasabi/releases/tag/v1.98.2.0

Input selection code: 
https://github.com/zkSNACKs/WalletWasabi/blob/master/WalletWasabi/WabiSabi/Client/CoinJoinClient.cs#L366-L492

Amount decomposer code: 
https://github.com/zkSNACKs/WalletWasabi/blob/master/WalletWasabi/WabiSabi/Client/AmountDecomposer.cs
https://github.com/zkSNACKs/WalletWasabi/blob/master/WalletWasabi/WabiSabi/Client/Decomposer.cs 


Decomposition simulation: https://github.com/nopara73/sake





* Re: [bitcoin-dev] ***UNCHECKED*** Client side coinjoin amount organization with WabiSabi
From: Max Hillebrand @ 2022-04-09  6:21 UTC (permalink / raw)
  To: Max Hillebrand via bitcoin-dev

As expected, I sent some wrong explanations regarding input selection. 
The coin grouping and consolidation penalty seem to be correct, but I 
was wrong about the final best group selection. Let me try to correct this.

There are often many groups which have the same consolidation penalty. 
In one testnet example, a 1 btc wallet with a 20% privacy level had 64 
coins, and when tasked to find groups of 4 coins, it found 20 groups 
which all had exactly 0 anonscore consolidation penalty, meaning all 
inputs had the same anonscore. All groups with the lowest consolidation 
penalty advance to the next step. Notice, however, that there could be 
only one group with the lowest penalty, in which case the following step 
would be deterministic.

For all groups with the lowest consolidation penalty, we find out how 
many of their coins come from the same previous transaction. The list of 
groups gets shuffled, then sorted ascending by the count of 
same-transaction coins, and we pick the top one. There will likely be 
many groups with no same-transaction inputs, and since the list is 
shuffled, we pick one of them at random.
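
A minimal Python sketch of this corrected selection, assuming each coin
carries the txid of its funding transaction. The shuffle followed by a
stable sort is what makes the tie-break among equally good groups random.

import random
from collections import Counter, namedtuple

Coin = namedtuple("Coin", "anonscore value txid")

def pick_best_group(groups_with_cost):
    # groups_with_cost: list of (consolidation_cost, [Coin, ...]) pairs.
    # Keep every group tied for the lowest penalty, shuffle, then sort by
    # how many of a group's coins share a previous transaction; Python's
    # sort is stable, so groups with equal counts stay in shuffled order.
    lowest = min(cost for cost, _ in groups_with_cost)
    finalists = [coins for cost, coins in groups_with_cost if cost == lowest]
    random.shuffle(finalists)
    finalists.sort(key=lambda coins: max(Counter(c.txid for c in coins).values()))
    return finalists[0]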

To summarize, the input count is a biased random choice. In some cases, 
especially for wallets with a low utxo count, there is only one good 
group, so the input selection is deterministic. However, often there are 
many possible input groups with a low consolidation penalty and a low 
same-transaction count, and in these cases there is another random 
choice of which inputs get registered. So even if the adversary knows 
the entire wallet's utxo set and anonscores, in many cases he will not 
be able to find out which inputs will be selected in the next round.

The big question is whether we should try to protect optimally against 
such an adversary, especially if the defense strategy comes at extra 
blockspace cost. If yes, we can add further ambiguity by not only 
creating these "rolling groups", but also creating groups with random 
inputs, or even brute-forcing all possible groups [with some time-out].


Static link: 
https://github.com/zkSNACKs/WalletWasabi/blob/8016404503bdffa475d8b219a6fe019a1d5775aa/WalletWasabi/WabiSabi/Client/CoinJoinClient.cs#L366-L433

WIP max suggested input value: 
https://github.com/zkSNACKs/WalletWasabi/pull/7748



