* [Bitcoin-development] Reworking the policy estimation code (fee estimates) @ 2014-10-27 19:33 Alex Morcos 2014-10-28 9:55 ` Mike Hearn ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Alex Morcos @ 2014-10-27 19:33 UTC (permalink / raw) To: bitcoin-development [-- Attachment #1.1: Type: text/plain, Size: 4989 bytes --]

I've been playing around with the code for estimating fees and found a few issues with the existing code. I think this work addresses several observations that the estimates returned by the existing code are too high. For instance see @cozz in Issue 4866 <https://github.com/bitcoin/bitcoin/issues/4866>.

Here's what I found:

1) We're trying to answer the question of what fee X you need in order to be confirmed within Y blocks. The existing code tries to do that by calculating the median fee for each possible Y instead of gathering statistics for each possible X. That approach is statistically incorrect. In fact, since certain X's appear so frequently, they tend to dominate the statistics at all possible Y's (a fee rate of about 40k satoshis/kB).

2) The existing code then sorts all of the data points in all of the buckets together by fee rate and then reassigns buckets before calculating the medians for each confirmation bucket. The sorting forces a relationship where there might not be one. Imagine some other variable, such as the first 2 bytes of the transaction hash. If we sorted these and then used them to give estimates, we'd see a clear but false relationship where transactions with low starting bytes in their hashes took longer to confirm.

3) Transactions which don't have all their inputs available (because they depend on other transactions in the mempool) aren't excluded from the calculations. This skews the results.

I rewrote the code to follow a different approach. I divided all possible fee rates up into fee rate buckets (I spaced these logarithmically).
For each transaction that was confirmed, I updated the appropriate fee rate bucket with how many blocks it took to confirm that transaction.

The hardest part of doing this fee estimation is to decide what the question really is that we're trying to answer. I took the approach that if you are asking what fee rate I need to be confirmed within Y blocks, then what you would like to know is the lowest fee rate such that a relatively high percentage of transactions of that fee rate are confirmed within Y blocks. Since even the highest fee transactions are confirmed within the first block only 90-93% of the time, I decided to use 80% as my cutoff. So now to answer "estimatefee Y", I scan through all of the fee buckets from the most expensive down until I find the last bucket with >80% of the transactions confirmed within Y blocks.

Unfortunately we still have the problem of not having enough data points for non-typical fee rates, and so it requires gathering a lot of data to give reasonable answers. To keep all of these data points in a circular buffer and then sort them for every analysis (or after every new block) is expensive. So instead I adopted the approach of keeping an exponentially decaying moving average for each bucket. I used a decay of .998 which represents a half life of 374 blocks or about 2.5 days. Also if a bucket doesn't have very many transactions, I combine it with the next bucket.

Here is a link <https://github.com/morcos/bitcoin/pull/3> to the code. I can create an actual pull request if there is consensus that it makes sense to do so.

I've attached a graph comparing the estimates produced for 1-3 confirmations by the new code and the old code. I did apply the patch to fix issue 3 above to the old code first. The new code is in green and the fixed code is in purple. The Y axis is a log scale of feerate in satoshis per KB and the X axis is chain height.
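A minimal sketch of the data structure described above — logarithmically spaced fee-rate buckets, each keeping an exponentially decaying count of confirmations by blocks-to-confirm. This is illustrative Python with hypothetical names; the actual implementation is C++ in the linked branch:

```python
DECAY = 0.998  # per-block decay factor, as described in the message above


class FeeRateBucket:
    """Decayed statistics for one fee-rate bucket."""

    def __init__(self, max_confirms=25):
        self.tx_count = 0.0
        # conf_counts[i] = decayed number of txs confirmed within i+1 blocks
        self.conf_counts = [0.0] * max_confirms

    def decay(self):
        """Apply the per-block exponential decay to every counter."""
        self.tx_count *= DECAY
        self.conf_counts = [c * DECAY for c in self.conf_counts]

    def record(self, blocks_to_confirm):
        """Record a confirmed tx: it counts toward every horizon >= its delay."""
        self.tx_count += 1
        for i in range(blocks_to_confirm - 1, len(self.conf_counts)):
            self.conf_counts[i] += 1


def make_buckets(min_rate=1000, max_rate=1e7, spacing=1.1):
    """Logarithmically spaced fee-rate bucket boundaries (satoshis/kB).
    The spacing value is an assumption for illustration."""
    boundaries = []
    rate = min_rate
    while rate <= max_rate:
        boundaries.append(rate)
        rate *= spacing
    return boundaries


def bucket_index(boundaries, feerate):
    """Index of the highest boundary <= feerate (clamped to the first bucket)."""
    below = [i for i, b in enumerate(boundaries) if b <= feerate]
    return below[-1] if below else 0
```

On each new block you would first call `decay()` on every bucket, then `record()` each transaction confirmed in that block into the bucket matching its fee rate.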
The new code produces the same estimates for 2 and 3 confirmations (the answers are effectively quantized by bucket).

I've also completely reworked smartfees.py. It turns out that many, many more transactions must be put through in order to get statistically significant results, so the test is quite slow to run (about 3 mins on my machine).

I've also been running a real world test, sending transactions of various fee rates and seeing how long they took to get confirmed. After almost 200 tx's at each fee rate, here are the results so far:

Fee rate 1100    Avg blocks to confirm 2.30  NumBlocks:% confirmed  1: 0.528  2: 0.751  3: 0.870
Fee rate 2500    Avg blocks to confirm 2.22  NumBlocks:% confirmed  1: 0.528  2: 0.766  3: 0.880
Fee rate 5000    Avg blocks to confirm 1.93  NumBlocks:% confirmed  1: 0.528  2: 0.782  3: 0.891
Fee rate 10000   Avg blocks to confirm 1.67  NumBlocks:% confirmed  1: 0.569  2: 0.844  3: 0.943
Fee rate 20000   Avg blocks to confirm 1.33  NumBlocks:% confirmed  1: 0.715  2: 0.963  3: 0.989
Fee rate 30000   Avg blocks to confirm 1.27  NumBlocks:% confirmed  1: 0.751  2: 0.974  3: 1.0
Fee rate 40000   Avg blocks to confirm 1.25  NumBlocks:% confirmed  1: 0.792  2: 0.953  3: 0.994
Fee rate 60000   Avg blocks to confirm 1.12  NumBlocks:% confirmed  1: 0.875  2: 1.0    3: 1.0
Fee rate 100000  Avg blocks to confirm 1.09  NumBlocks:% confirmed  1: 0.901  2: 1.0    3: 1.0
Fee rate 300000  Avg blocks to confirm 1.12  NumBlocks:% confirmed  1: 0.886  2: 0.989  3: 1.0

Alex

[-- Attachment #1.2: Type: text/html, Size: 7499 bytes --] [-- Attachment #2: Fee Estimation Comparison.jpg --] [-- Type: image/jpeg, Size: 90822 bytes --]

^ permalink raw reply [flat|nested] 9+ messages in thread
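The "estimatefee Y" scan described in the first message — walk the fee buckets from the most expensive downward and answer with the last (cheapest) bucket in which more than 80% of transactions confirmed within Y blocks — can be sketched as follows. This is an illustrative Python sketch with an invented data layout; the real code is C++ and operates on the decayed moving averages:

```python
SUCCESS_THRESHOLD = 0.80  # the 80% cutoff chosen in the message above


def estimate_feerate(buckets, target_blocks):
    """Return the fee rate of the cheapest bucket whose success fraction at
    target_blocks exceeds the threshold, scanning from most expensive down.

    `buckets` is sorted by ascending fee rate; each entry is a dict (an
    assumed layout for this sketch):
      {"feerate": sat/kB, "total": tx count,
       "confirmed": [n confirmed within 1 block, within 2, ...]}
    """
    best = None
    for bucket in reversed(buckets):  # most expensive first
        if bucket["total"] == 0:
            continue
        frac = bucket["confirmed"][target_blocks - 1] / bucket["total"]
        if frac > SUCCESS_THRESHOLD:
            best = bucket["feerate"]  # keep descending while buckets pass
        else:
            break                     # first failing bucket ends the scan
    return best                       # None if no bucket qualifies


# Toy data loosely modeled on the percentages reported in this thread:
buckets = [
    {"feerate": 1000,   "total": 100, "confirmed": [50, 90, 99]},
    {"feerate": 10000,  "total": 100, "confirmed": [57, 84, 94]},
    {"feerate": 40000,  "total": 100, "confirmed": [79, 95, 99]},
    {"feerate": 100000, "total": 100, "confirmed": [90, 100, 100]},
]
print(estimate_feerate(buckets, 1))  # -> 100000 (only the top bucket clears 80%)
print(estimate_feerate(buckets, 2))  # -> 1000 (every bucket clears 80% within 2)
```

Note how the answer for 2 blocks drops all the way to the cheapest bucket: this is exactly the behavior the bucket-by-feerate analysis is meant to capture.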
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates) 2014-10-27 19:33 [Bitcoin-development] Reworking the policy estimation code (fee estimates) Alex Morcos @ 2014-10-28 9:55 ` Mike Hearn 2014-10-28 12:12 ` Alex Morcos 2014-10-28 13:59 ` Gavin Andresen 2014-10-29 20:08 ` Peter Todd 2 siblings, 1 reply; 9+ messages in thread From: Mike Hearn @ 2014-10-28 9:55 UTC (permalink / raw) To: Alex Morcos; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 6313 bytes --] Could you explain a little further why you think the current approach is statistically incorrect? There's no doubt that the existing estimates the system produces are garbage, but that's because it assumes players in the fee market are rational and they are not. Fwiw bitcoinj 0.12.1 applies the January fee drop and will attach fee of only 1000 satoshis per kB by default. I also have a program that measures confirmation time for a given fee level (with fresh coins so there's no priority) and it aligns with your findings, most txns confirm within a couple of blocks. Ultimately there isn't any easy method to stop people throwing money away. Bitcoinj will probably continue to use hard coded fee values for now to try and contribute to market sanity in the hope it makes smartfees smarter. On 27 Oct 2014 19:34, "Alex Morcos" <morcos@gmail.com> wrote: > I've been playing around with the code for estimating fees and found a few > issues with the existing code. I think this will address several > observations that the estimates returned by the existing code appear to be > too high. For instance see @cozz in Issue 4866 > <https://github.com/bitcoin/bitcoin/issues/4866>. > > Here's what I found: > > 1) We're trying to answer the question of what fee X you need in order to > be confirmed within Y blocks. The existing code tries to do that by > calculating the median fee for each possible Y instead of gathering > statistics for each possible X. That approach is statistically incorrect. 
> In fact since certain X's appear so frequently, they tend to dominate the > statistics at all possible Y's (a fee rate of about 40k satoshis) > > 2) The existing code then sorts all of the data points in all of the > buckets together by fee rate and then reassigns buckets before calculating > the medians for each confirmation bucket. The sorting forces a > relationship where there might not be one. Imagine some other variable, > such as first 2 bytes of the transaction hash. If we sorted these and then > used them to give estimates, we'd see a clear but false relationship where > transactions with low starting bytes in their hashes took longer to confirm. > > 3) Transactions which don't have all their inputs available (because they > depend on other transactions in the mempool) aren't excluded from the > calculations. This skews the results. > > I rewrote the code to follow a different approach. I divided all possible > fee rates up into fee rate buckets (I spaced these logarithmically). For > each transaction that was confirmed, I updated the appropriate fee rate > bucket with how many blocks it took to confirm that transaction. > > The hardest part of doing this fee estimation is to decide what the > question really is that we're trying to answer. I took the approach that > if you are asking what fee rate I need to be confirmed within Y blocks, > then what you would like to know is the lowest fee rate such that a > relatively high percentage of transactions of that fee rate are confirmed > within Y blocks. Since even the highest fee transactions are confirmed > within the first block only 90-93% of the time, I decided to use 80% as my > cutoff. So now to answer "estimatefee Y", I scan through all of the fee > buckets from the most expensive down until I find the last bucket with >80% > of the transactions confirmed within Y blocks. 
> > Unfortunately we still have the problem of not having enough data points > for non-typical fee rates, and so it requires gathering a lot of data to > give reasonable answers. To keep all of these data points in a circular > buffer and then sort them for every analysis (or after every new block) is > expensive. So instead I adopted the approach of keeping an exponentially > decaying moving average for each bucket. I used a decay of .998 which > represents a half life of 374 blocks or about 2.5 days. Also if a bucket > doesn't have very many transactions, I combine it with the next bucket. > > Here is a link <https://github.com/morcos/bitcoin/pull/3> to the code. I > can create an actual pull request if there is consensus that it makes sense > to do so. > > I've attached a graph comparing the estimates produced for 1-3 > confirmations by the new code and the old code. I did apply the patch to > fix issue 3 above to the old code first. The new code is in green and the > fixed code is in purple. The Y axis is a log scale of feerate in satoshis > per KB and the X axis is chain height. The new code produces the same > estimates for 2 and 3 confirmations (the answers are effectively quantized > by bucket). > > I've also completely reworked smartfees.py. It turns out to require many > many more transactions are put through in order to have statistically > significant results, so the test is quite slow to run (about 3 mins on my > machine). > > I've also been running a real world test, sending transactions of various > fee rates and seeing how long they took to get confirmed. 
After almost 200 > tx's at each fee rate, here are the results so far: > > Fee rate 1100 Avg blocks to confirm 2.30 NumBlocks:% confirmed 1: 0.528 > 2: 0.751 3: 0.870 > Fee rate 2500 Avg blocks to confirm 2.22 NumBlocks:% confirmed 1: 0.528 > 2: 0.766 3: 0.880 > Fee rate 5000 Avg blocks to confirm 1.93 NumBlocks:% confirmed 1: 0.528 > 2: 0.782 3: 0.891 > Fee rate 10000 Avg blocks to confirm 1.67 NumBlocks:% confirmed 1: 0.569 > 2: 0.844 3: 0.943 > Fee rate 20000 Avg blocks to confirm 1.33 NumBlocks:% confirmed 1: 0.715 > 2: 0.963 3: 0.989 > Fee rate 30000 Avg blocks to confirm 1.27 NumBlocks:% confirmed 1: 0.751 > 2: 0.974 3: 1.0 > Fee rate 40000 Avg blocks to confirm 1.25 NumBlocks:% confirmed 1: 0.792 > 2: 0.953 3: 0.994 > Fee rate 60000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.875 > 2: 1.0 3: 1.0 > Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901 > 2: 1.0 3: 1.0 > Fee rate 300000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.886 > 2: 0.989 3: 1.0 > > > Alex > > > ------------------------------------------------------------------------------ > > _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development > > [-- Attachment #2: Type: text/html, Size: 9143 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates) 2014-10-28 9:55 ` Mike Hearn @ 2014-10-28 12:12 ` Alex Morcos 0 siblings, 0 replies; 9+ messages in thread From: Alex Morcos @ 2014-10-28 12:12 UTC (permalink / raw) To: Mike Hearn; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 10382 bytes --]

Yeah, so to explain points 1 and 2 a bit more:

1) It's about what question you are trying to answer. The existing code tries to answer the question: what is the median fee of a transaction that gets confirmed in Y blocks? It turns out that is not a very good proxy for the question we really want answered, which is: what fee is necessary such that we are likely to be confirmed within Y blocks? What happens is that there are so many transactions at the 40k satoshis/kB feerate that they turn out to be the dominant data points of transactions that are confirmed after 2 blocks, 3 blocks, etc., and not only 1 block.

So for example, a hypothetical sample of 20 txs might find 2 of your 1k sat/kB txs and 18 of the 40k sat/kB txs. Perhaps 15 of the 40k txs are confirmed in 1 block and the other 3 in 2 blocks, and 1 of the 1k txs in 1 block and the other in 2 blocks. So if you analyze the data by confirmation time, you find that 15/16 1-conf txs are 40k and 3/4 2-conf txs are 40k, so the median feerate is 40k for both 1 and 2 confirmations.

Instead, the correct thing to do is analyze the data by feerate. Doing that, we find that 15/18 (83%) of 40k txs are confirmed in 1 block and 1/2 (50%) of 1k txs are. But 100% of both are confirmed within two blocks. This leads you to say: you need a 40k feerate if you want to get confirmed in 1 block, but 1k is sufficient if you want to be confirmed in 2 blocks.

Put another way, let's imagine you wanted to know how tall you have to be to no longer fit in the coach seats on an airplane.
If you looked at the median height of all people in coach and all people in first class, you would see that they were about the same, and you would get a confusing answer. Instead you have to bin by height, and look at the percentage of people of each height that fly first-class vs coach, and I'd guess that by the time you got up to say 6'8" you were finding greater than 50% of the people flying first class. 2) The code also presupposes that higher fee rate transactions must be confirmed quicker. And so in addition to binning all transactions by confirmation time, the code then sorts all of the transactions and re-bins them such that the highest fee transactions are all in the 1-confirmation bin and the lowest fee transactions are all in the 25-confirmation bin. If we'd been trying to predict whether the first 2 bytes of transaction hash influenced our confirmation time, we would have started by having a random distribution of hashes in each confirmation bin, but then after doing the sorting, we'd of course have found that the "median hash" of the 1-confirmation transactions was higher, because we sorted it to make that the case. In the airplane example this would have been equivalent to taking the median height of the 20 tallest people on the plane (assuming first class is 20 seats) and saying that was the height for first class and the median height of the remaining people for coach. This will appear to give a slightly better answer than the first approach, but is still wrong. There are still a lot of additional improvements that can be made to fee estimation. One problem my proposed code has is there really just aren't enough data points of low feerate transactions to give meaningful answers about how likely those are to be confirmed, so its answers are still a bit conservative. This will improve though as the actual distribution of transactions spreads out. 
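Alex's 20-transaction example can be checked directly. This short sketch contrasts the two analyses — median fee rate grouped by confirmation time (the old approach) versus confirmation percentage grouped by fee rate (the new approach) — using exactly the numbers from the text:

```python
from statistics import median

# The hypothetical sample from the text:
# 18 txs at 40k sat/kB (15 confirm in 1 block, 3 in 2 blocks),
# 2 txs at 1k sat/kB (1 confirms in 1 block, 1 in 2 blocks).
txs = [(40000, 1)] * 15 + [(40000, 2)] * 3 + [(1000, 1)] + [(1000, 2)]

# Old approach: group by confirmation time, take the median fee rate.
for conf in (1, 2):
    rates = [fee for fee, blocks in txs if blocks == conf]
    print(f"median feerate of {conf}-conf txs: {median(rates)}")
# Both medians come out to 40000, suggesting (wrongly) that you need
# 40k sat/kB even for a 2-block confirmation.

# New approach: group by fee rate, ask what fraction confirms within Y blocks.
for rate in (40000, 1000):
    sample = [blocks for fee, blocks in txs if fee == rate]
    within1 = sum(b <= 1 for b in sample) / len(sample)
    within2 = sum(b <= 2 for b in sample) / len(sample)
    print(f"{rate} sat/kB: {within1:.0%} within 1 block, {within2:.0%} within 2")
# 40000 sat/kB: 83% within 1 block, 100% within 2
# 1000 sat/kB: 50% within 1 block, 100% within 2
```

The second analysis recovers the conclusion in the text: 40k is needed for 1-block confirmation, but 1k suffices for 2 blocks.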
The other major needed improvement is to not just state some description of what has happened in the past, but to actually make a prediction about what is going to happen in the future. For instance looking at the feerates of unconfirmed transactions currently in the mempool could tell you that if you want to be confirmed immediately you'll need to be high enough in that priority queue. On Tue, Oct 28, 2014 at 5:55 AM, Mike Hearn <mike@plan99.net> wrote: > Could you explain a little further why you think the current approach is > statistically incorrect? There's no doubt that the existing estimates the > system produces are garbage, but that's because it assumes players in the > fee market are rational and they are not. > > Fwiw bitcoinj 0.12.1 applies the January fee drop and will attach fee of > only 1000 satoshis per kB by default. I also have a program that measures > confirmation time for a given fee level (with fresh coins so there's no > priority) and it aligns with your findings, most txns confirm within a > couple of blocks. > > Ultimately there isn't any easy method to stop people throwing money away. > Bitcoinj will probably continue to use hard coded fee values for now to try > and contribute to market sanity in the hope it makes smartfees smarter. > On 27 Oct 2014 19:34, "Alex Morcos" <morcos@gmail.com> wrote: > >> I've been playing around with the code for estimating fees and found a >> few issues with the existing code. I think this will address several >> observations that the estimates returned by the existing code appear to be >> too high. For instance see @cozz in Issue 4866 >> <https://github.com/bitcoin/bitcoin/issues/4866>. >> >> Here's what I found: >> >> 1) We're trying to answer the question of what fee X you need in order >> to be confirmed within Y blocks. The existing code tries to do that by >> calculating the median fee for each possible Y instead of gathering >> statistics for each possible X. That approach is statistically incorrect. 
>> In fact since certain X's appear so frequently, they tend to dominate the >> statistics at all possible Y's (a fee rate of about 40k satoshis) >> >> 2) The existing code then sorts all of the data points in all of the >> buckets together by fee rate and then reassigns buckets before calculating >> the medians for each confirmation bucket. The sorting forces a >> relationship where there might not be one. Imagine some other variable, >> such as first 2 bytes of the transaction hash. If we sorted these and then >> used them to give estimates, we'd see a clear but false relationship where >> transactions with low starting bytes in their hashes took longer to confirm. >> >> 3) Transactions which don't have all their inputs available (because they >> depend on other transactions in the mempool) aren't excluded from the >> calculations. This skews the results. >> >> I rewrote the code to follow a different approach. I divided all >> possible fee rates up into fee rate buckets (I spaced these >> logarithmically). For each transaction that was confirmed, I updated the >> appropriate fee rate bucket with how many blocks it took to confirm that >> transaction. >> >> The hardest part of doing this fee estimation is to decide what the >> question really is that we're trying to answer. I took the approach that >> if you are asking what fee rate I need to be confirmed within Y blocks, >> then what you would like to know is the lowest fee rate such that a >> relatively high percentage of transactions of that fee rate are confirmed >> within Y blocks. Since even the highest fee transactions are confirmed >> within the first block only 90-93% of the time, I decided to use 80% as my >> cutoff. So now to answer "estimatefee Y", I scan through all of the fee >> buckets from the most expensive down until I find the last bucket with >80% >> of the transactions confirmed within Y blocks. 
>> >> Unfortunately we still have the problem of not having enough data points >> for non-typical fee rates, and so it requires gathering a lot of data to >> give reasonable answers. To keep all of these data points in a circular >> buffer and then sort them for every analysis (or after every new block) is >> expensive. So instead I adopted the approach of keeping an exponentially >> decaying moving average for each bucket. I used a decay of .998 which >> represents a half life of 374 blocks or about 2.5 days. Also if a >> bucket doesn't have very many transactions, I combine it with the next >> bucket. >> >> Here is a link <https://github.com/morcos/bitcoin/pull/3> to the code. >> I can create an actual pull request if there is consensus that it makes >> sense to do so. >> >> I've attached a graph comparing the estimates produced for 1-3 >> confirmations by the new code and the old code. I did apply the patch to >> fix issue 3 above to the old code first. The new code is in green and the >> fixed code is in purple. The Y axis is a log scale of feerate in satoshis >> per KB and the X axis is chain height. The new code produces the same >> estimates for 2 and 3 confirmations (the answers are effectively quantized >> by bucket). >> >> I've also completely reworked smartfees.py. It turns out to require many >> many more transactions are put through in order to have statistically >> significant results, so the test is quite slow to run (about 3 mins on my >> machine). >> >> I've also been running a real world test, sending transactions of various >> fee rates and seeing how long they took to get confirmed. 
After almost 200 >> tx's at each fee rate, here are the results so far: >> >> Fee rate 1100 Avg blocks to confirm 2.30 NumBlocks:% confirmed 1: 0.528 >> 2: 0.751 3: 0.870 >> Fee rate 2500 Avg blocks to confirm 2.22 NumBlocks:% confirmed 1: 0.528 >> 2: 0.766 3: 0.880 >> Fee rate 5000 Avg blocks to confirm 1.93 NumBlocks:% confirmed 1: 0.528 >> 2: 0.782 3: 0.891 >> Fee rate 10000 Avg blocks to confirm 1.67 NumBlocks:% confirmed 1: 0.569 >> 2: 0.844 3: 0.943 >> Fee rate 20000 Avg blocks to confirm 1.33 NumBlocks:% confirmed 1: 0.715 >> 2: 0.963 3: 0.989 >> Fee rate 30000 Avg blocks to confirm 1.27 NumBlocks:% confirmed 1: 0.751 >> 2: 0.974 3: 1.0 >> Fee rate 40000 Avg blocks to confirm 1.25 NumBlocks:% confirmed 1: 0.792 >> 2: 0.953 3: 0.994 >> Fee rate 60000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.875 >> 2: 1.0 3: 1.0 >> Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901 >> 2: 1.0 3: 1.0 >> Fee rate 300000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.886 >> 2: 0.989 3: 1.0 >> >> >> Alex >> >> >> ------------------------------------------------------------------------------ >> >> _______________________________________________ >> Bitcoin-development mailing list >> Bitcoin-development@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/bitcoin-development >> >> [-- Attachment #2: Type: text/html, Size: 13835 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates) 2014-10-27 19:33 [Bitcoin-development] Reworking the policy estimation code (fee estimates) Alex Morcos 2014-10-28 9:55 ` Mike Hearn @ 2014-10-28 13:59 ` Gavin Andresen 2014-10-28 14:30 ` Alex Morcos 2014-10-29 20:08 ` Peter Todd 2 siblings, 1 reply; 9+ messages in thread From: Gavin Andresen @ 2014-10-28 13:59 UTC (permalink / raw) To: Alex Morcos; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

I think Alex's approach is better; I don't think we can know how much better until we have a functioning fee market.

We don't have a functioning fee market now, because fees are hard-coded. So we get "pay the hard-coded fee and you'll get confirmed in one or two or three blocks, depending on which miners mine the next three blocks and what time of day it is."

git HEAD code says you need a fee of 100,000 satoshis/kB to be pretty sure you'll get confirmed in the next block. That looks about right with Alex's real-world data (if we take "90% chance" as 'pretty sure you'll get confirmed'):

Fee rate 100000  Avg blocks to confirm 1.09  NumBlocks:% confirmed  1: 0.901  2: 1.0  3: 1.0

My only concern with Alex's code is that it takes much longer to get 'primed' -- Alex, if I started with no data about fees, how long would it take to be able to get enough data for a reasonable estimate of "what is the least I can pay and still be 90% sure I get confirmed in 20 blocks"? Hours? Days? Weeks?

--
--
Gavin Andresen

[-- Attachment #2: Type: text/html, Size: 1577 bytes --]

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates) 2014-10-28 13:59 ` Gavin Andresen @ 2014-10-28 14:30 ` Alex Morcos 2014-10-28 14:55 ` Alex Morcos 2014-10-28 14:58 ` Gavin Andresen 0 siblings, 2 replies; 9+ messages in thread From: Alex Morcos @ 2014-10-28 14:30 UTC (permalink / raw) To: Gavin Andresen; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 2314 bytes --]

Oh, in just a couple of blocks it'll give you a somewhat reasonable estimate for asking about every confirmation count other than 1, but it could take several hours for it to have enough data points to give you a good estimate for getting confirmed in one block (because the prevalent feerate is not always confirmed in 1 block >80% of the time). Essentially what it does is just combine buckets until it has enough data points, so after the first block it might be treating all of the txs as belonging to the same feerate bucket, but since the answer it returns is the "median"* fee rate for that bucket, it's a reasonable answer right from the get-go.

Do you think it would make sense to make that 90% number an argument to the rpc call? For instance there could be a default (I would use 80%), but then you could specify if you required a different certainty. It wouldn't require any code changes and might make it easier for people to build more complicated logic on top of it.

*It can't actually track the median, but it identifies which of the smaller actual buckets the median would have fallen into and returns the average feerate for that median bucket.

On Tue, Oct 28, 2014 at 9:59 AM, Gavin Andresen <gavinandresen@gmail.com> wrote:
> I think Alex's approach is better; I don't think we can know how much
> better until we have a functioning fee market.
>
> We don't have a functioning fee market now, because fees are hard-coded.
> So we get "pay the hard-coded fee and you'll get confirmed in one or two or > three blocks, depending on which miners mine the next three blocks and what > time of day it is." > > git HEAD code says you need a fee of 10,0000 satoshis/kb to be pretty sure > you'll get confirmed in the next block. That looks about right with Alex's > real-world data (if we take "90% chance" as 'pretty sure you'll get > confirmed'): > > Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901 > 2: 1.0 3: 1.0 > > My only concern with Alex's code is that it takes much longer to get > 'primed' -- Alex, if I started with no data about fees, how long would it > take to be able to get enough data for a reasonable estimate of "what is > the least I can pay and still be 90% sure I get confirmed in 20 blocks" ? > Hours? Days? Weeks? > > -- > -- > Gavin Andresen > [-- Attachment #2: Type: text/html, Size: 3341 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
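The priming behavior Alex describes above — combining adjacent fee-rate buckets until there are enough data points, then answering with the average fee rate of the sub-bucket the median transaction falls into — could be sketched roughly like this. Python with hypothetical names; `MIN_DATAPOINTS` is an invented threshold (the thread suggests the real bar is around 1 tx per block on average), and the actual implementation is C++ in Alex's branch:

```python
MIN_DATAPOINTS = 20  # hypothetical "enough data" bar for this sketch


def combine_buckets(buckets):
    """Merge adjacent fee-rate buckets until each group has enough data.
    `buckets` is a list of (avg_feerate, tx_count) sorted by fee rate."""
    groups, current = [], []
    for bucket in buckets:
        current.append(bucket)
        if sum(n for _, n in current) >= MIN_DATAPOINTS:
            groups.append(current)
            current = []
    if current:  # fold any sparse remainder into the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


def median_bucket_feerate(group):
    """Average fee rate of the sub-bucket the median transaction falls into
    (the 'median'* answer described in the message above)."""
    total = sum(n for _, n in group)
    seen = 0
    for feerate, n in group:
        seen += n
        if seen * 2 >= total:
            return feerate
    return group[-1][0]


# Sparse data right after startup: five sub-buckets, 40 txs total,
# merged into a single group whose median tx sits in the 1331 sub-bucket.
sparse = [(1000, 2), (1100, 1), (1210, 4), (1331, 30), (1464, 3)]
groups = combine_buckets(sparse)
print([median_bucket_feerate(g) for g in groups])  # -> [1331]
```

This is why the estimate is "reasonable right from the get-go": even when everything collapses into one merged bucket, the reported rate tracks where the bulk of observed transactions actually sit.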
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates) 2014-10-28 14:30 ` Alex Morcos @ 2014-10-28 14:55 ` Alex Morcos 2014-10-28 14:58 ` Gavin Andresen 1 sibling, 0 replies; 9+ messages in thread From: Alex Morcos @ 2014-10-28 14:55 UTC (permalink / raw) To: Gavin Andresen; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 3814 bytes --] Sorry, perhaps I misinterpreted that question. The estimates will be dominated by the prevailing transaction rates initially, so the estimates you get for something like "what is the least I can pay and still be 90% sure I get confirmed in 20 blocks" won't be insane, but they will still be way too conservative. I'm not sure what you meant by reasonable. You won't get the "correct" answer of something significantly less than 40k sat/kB for quite some time. Given that the half-life of the decay is 2.5 days, then within a couple of days. And in fact even in the steady state, the new code will still return a much higher rate than the existing code, say 10k sat/kB instead of 1k sat/kB, but that's just a result of the sorting the existing code does and the fact that no one places transactions with that small fee. To correctly give such low answers, the new code will require that those super low feerate transactions are occurring frequently enough, but the bar for enough datapoints in a feerate bucket is pretty low, an average of 1 tx per block. The bar can be made lower at the expense of a bit of noisiness in the answers, for instance for priorities I had to make the bar significantly lower because there are so many fewer transactions confirmed because of priorities. I'm certainly open to tuning some of these variables. 
On Tue, Oct 28, 2014 at 10:30 AM, Alex Morcos <morcos@gmail.com> wrote: > Oh in just a couple of blocks, it'll give you a somewhat reasonable > estimate for asking about every confirmation count other than 1, but it > could take several hours for it to have enough data points to give you a > good estimate for getting confirmed in one block (because the prevalent > feerate is not always confirmed in 1 block >80% of the time) Essentially > what it does is just combine buckets until it has enough data points, so > after the first block it might be treating all of the txs as belonging to > the same feerate bucket, but since the answer it returns is the "median"* > fee rate for that bucket, its a reasonable answer right off the get go. > > Do you think it would make sense to make that 90% number an argument to > rpc call? For instance there could be a default (I would use 80%) but then > you could specify if you required a different certainty. It wouldn't > require any code changes and might make it easier for people to build more > complicated logic on top of it. > > *It can't actually track the median, but it identifies which of the > smaller actual buckets the median would have fallen into and returns the > average feerate for that median bucket. > > > > > > On Tue, Oct 28, 2014 at 9:59 AM, Gavin Andresen <gavinandresen@gmail.com> > wrote: > >> I think Alex's approach is better; I don't think we can know how much >> better until we have a functioning fee market. >> >> We don't have a functioning fee market now, because fees are hard-coded. >> So we get "pay the hard-coded fee and you'll get confirmed in one or two or >> three blocks, depending on which miners mine the next three blocks and what >> time of day it is." >> >> git HEAD code says you need a fee of 10,0000 satoshis/kb to be pretty >> sure you'll get confirmed in the next block. 
That looks about right with >> Alex's real-world data (if we take "90% chance" as 'pretty sure you'll get >> confirmed'): >> >> Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901 >> 2: 1.0 3: 1.0 >> >> My only concern with Alex's code is that it takes much longer to get >> 'primed' -- Alex, if I started with no data about fees, how long would it >> take to be able to get enough data for a reasonable estimate of "what is >> the least I can pay and still be 90% sure I get confirmed in 20 blocks" ? >> Hours? Days? Weeks? >> >> -- >> -- >> Gavin Andresen >> > > [-- Attachment #2: Type: text/html, Size: 5433 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates) 2014-10-28 14:30 ` Alex Morcos 2014-10-28 14:55 ` Alex Morcos @ 2014-10-28 14:58 ` Gavin Andresen 2014-10-28 15:39 ` Alex Morcos 1 sibling, 1 reply; 9+ messages in thread From: Gavin Andresen @ 2014-10-28 14:58 UTC (permalink / raw) To: Alex Morcos; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 1105 bytes --] On Tue, Oct 28, 2014 at 10:30 AM, Alex Morcos <morcos@gmail.com> wrote: > > Do you think it would make sense to make that 90% number an argument to > rpc call? For instance there could be a default (I would use 80%) but then > you could specify if you required a different certainty. It wouldn't > require any code changes and might make it easier for people to build more > complicated logic on top of it. > RE: 80% versus 90% : I think a default of 80% will get us a lot of "the fee estimation logic is broken, I want my transactions to confirm quick and a lot of them aren't confirming for 2 or 3 blocks." RE: RPC argument: I'm reluctant to give too many 'knobs' for the RPC interface. I think the default percentage makes sense as a command-line/bitcoin.conf option; I can imagine services that want to save on fees running with -estimatefeethreshold=0.5 (or -estimatefeethreshold=0.95 if as-fast-as-possible confirmations are needed). Setting both the number of confirmations and the estimation threshold on a transaction-by-transaction basis seems like overkill to me. -- -- Gavin Andresen [-- Attachment #2: Type: text/html, Size: 1507 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates)
From: Alex Morcos @ 2014-10-28 15:39 UTC
To: Gavin Andresen; +Cc: Bitcoin Dev

RE: 90%: I think it's fine to use 90% for anything other than 1
confirmation, but if you look at the real-world data test I did, or the raw
data from this new code, you'll see that even the highest fee rate
transactions only get confirmed at about a 90% rate in 1 block.  So if you
use that as your cutoff you will sometimes get no answer, sometimes get a
very high fee rate, and sometimes get a reasonable fee rate -- it just
depends, because the data is too noisy.  I think that's just because there
is no good answer to that question: there is no fee you can put on your
transaction to guarantee a greater than 90% chance of getting confirmed in
one block.  I think 85% might be safe?

RE: tunable as command-line/bitcoin.conf: sounds good!

OK, sorry to have all this conversation on the dev list; maybe I'll turn
this into an actual PR if we want to comment on the code?  I just wanted to
see if it even made sense to make a PR for this, or whether this isn't the
way we wanted to go about it.

On Tue, Oct 28, 2014 at 10:58 AM, Gavin Andresen <gavinandresen@gmail.com>
wrote:

> On Tue, Oct 28, 2014 at 10:30 AM, Alex Morcos <morcos@gmail.com> wrote:
>>
>> Do you think it would make sense to make that 90% number an argument to
>> the rpc call?  For instance there could be a default (I would use 80%)
>> but then you could specify if you required a different certainty.  It
>> wouldn't require any code changes and might make it easier for people to
>> build more complicated logic on top of it.
>
> RE: 80% versus 90%: I think a default of 80% will get us a lot of "the
> fee estimation logic is broken, I want my transactions to confirm quick
> and a lot of them aren't confirming for 2 or 3 blocks."
>
> RE: RPC argument: I'm reluctant to give too many 'knobs' for the RPC
> interface.  I think the default percentage makes sense as a
> command-line/bitcoin.conf option; I can imagine services that want to
> save on fees running with -estimatefeethreshold=0.5 (or
> -estimatefeethreshold=0.95 if as-fast-as-possible confirmations are
> needed).  Setting both the number of confirmations and the estimation
> threshold on a transaction-by-transaction basis seems like overkill to me.
>
> --
> Gavin Andresen
* Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates)
From: Peter Todd @ 2014-10-29 20:08 UTC
To: Alex Morcos; +Cc: bitcoin-development

On Mon, Oct 27, 2014 at 03:33:45PM -0400, Alex Morcos wrote:
> I've been playing around with the code for estimating fees and found a few
> issues with the existing code.  I think this will address several
> observations that the estimates returned by the existing code appear to be
> too high.  For instance see @cozz in Issue 4866
> <https://github.com/bitcoin/bitcoin/issues/4866>.

I don't have time to look at the details of your statistical methods
unfortunately due to some deadlines, but a quick comment:

You should think about the malleability of your estimates to attackers.
For instance the current fee estimation code has a serious issue where it'll
happily estimate ludicrously high fees based on very little data.  There is
an 'insane fees' failsafe, but it's IIRC set to allow transactions with fees
of less than 100mBTC/tx, roughly $50 at current exchange rates.  It's
relatively easy to get a wallet into a condition where this happens, as the
estimations are considered valid even when based on very little data -- a
simple sybil attack suffices.  (e.g. the recently published paper(1) on Tor
sybil attacks comes to mind as one example of many ways to do this)

Obviously this could empty someone's wallet pretty quickly; an exchange that
makes a few dozen transactions an hour could easily lose tens of thousands
of dollars due to this exploit.  Someone correct me if I'm wrong, but last I
checked in git HEAD this exploit is still unfixed.

A user-configurable failsafe limit is a pretty obvious solution here, albeit
a crude one; it'd be interesting to see if a plausible security argument
could be made for something more sophisticated, like taking into account the
coin-age of the observed transactions that estimates are based on.

1) "Bitcoin over Tor isn't a good idea", http://arxiv.org/abs/1410.6079

--
'peter'[:-1]@petertodd.org
0000000000000000098d3c9095b47ff1fd692fef5ac6731340802c7c63d38bb0
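Peter's suggested failsafe can be sketched very simply: whatever the estimator returns, clamp it against a user-configurable ceiling before the wallet acts on it, so a sybil-fed estimator cannot produce wallet-draining fees. The option name, default ceiling, and fallback value below are hypothetical illustrations, not existing Bitcoin Core flags:

```python
# Hypothetical user-configurable ceiling (satoshis per kB); illustrative only.
DEFAULT_MAX_FEE_RATE = 10_000_000

def safe_fee_rate(estimated, max_fee_rate=DEFAULT_MAX_FEE_RATE, fallback=10_000):
    """Clamp an (untrusted) fee estimate into [fallback, max_fee_rate].

    estimated is None when the estimator has too little data; in that case
    we use the configured fallback rather than trusting a wild guess.
    """
    if estimated is None:
        return fallback
    return max(fallback, min(estimated, max_fee_rate))

print(safe_fee_rate(50_000_000))  # sybil-inflated estimate clamped to 10000000
print(safe_fee_rate(None))        # no data: configured fallback, 10000
print(safe_fee_rate(40_000))      # sane estimate passes through, 40000
```

This is the crude version Peter describes; a more sophisticated variant would adjust the ceiling based on properties of the observed transactions (such as their coin-age) rather than a fixed constant.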
end of thread, other threads: [~2014-10-29 20:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-27 19:33 [Bitcoin-development] Reworking the policy estimation code (fee estimates) Alex Morcos
2014-10-28  9:55 ` Mike Hearn
2014-10-28 12:12 ` Alex Morcos
2014-10-28 13:59 ` Gavin Andresen
2014-10-28 14:30 ` Alex Morcos
2014-10-28 14:55 ` Alex Morcos
2014-10-28 14:58 ` Gavin Andresen
2014-10-28 15:39 ` Alex Morcos
2014-10-29 20:08 ` Peter Todd