From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194] helo=mx.sourceforge.net) by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76) (envelope-from ) id 1Xj3Ua-0004d4-Gf for bitcoin-development@lists.sourceforge.net; Tue, 28 Oct 2014 09:55:08 +0000 Received-SPF: pass (sog-mx-4.v43.ch3.sourceforge.com: domain of gmail.com designates 209.85.214.179 as permitted sender) client-ip=209.85.214.179; envelope-from=mh.in.england@gmail.com; helo=mail-ob0-f179.google.com; Received: from mail-ob0-f179.google.com ([209.85.214.179]) by sog-mx-4.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128) (Exim 4.76) id 1Xj3UX-00011y-P4 for bitcoin-development@lists.sourceforge.net; Tue, 28 Oct 2014 09:55:08 +0000 Received: by mail-ob0-f179.google.com with SMTP id m8so206007obr.38 for ; Tue, 28 Oct 2014 02:55:00 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.182.213.2 with SMTP id no2mr204777obc.76.1414490100240; Tue, 28 Oct 2014 02:55:00 -0700 (PDT) Sender: mh.in.england@gmail.com Received: by 10.76.6.7 with HTTP; Tue, 28 Oct 2014 02:55:00 -0700 (PDT) Received: by 10.76.6.7 with HTTP; Tue, 28 Oct 2014 02:55:00 -0700 (PDT) In-Reply-To: References: Date: Tue, 28 Oct 2014 09:55:00 +0000 X-Google-Sender-Auth: DVEKo3s_GBo8W4lKZ6PofjnJK2Y Message-ID: From: Mike Hearn To: Alex Morcos Content-Type: multipart/alternative; boundary=001a11c2de64f69ad2050678a37e X-Spam-Score: -0.5 (/) X-Spam-Report: Spam Filtering performed by mx.sourceforge.net. See http://spamassassin.org/tag/ for more details. -1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for sender-domain 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (mh.in.england[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 1.0 HTML_MESSAGE BODY: HTML included in message 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature X-Headers-End: 1Xj3UX-00011y-P4 Cc: Bitcoin Dev Subject: Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates) X-BeenThere: bitcoin-development@lists.sourceforge.net X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 09:55:08 -0000 --001a11c2de64f69ad2050678a37e Content-Type: text/plain; charset=UTF-8 Could you explain a little further why you think the current approach is statistically incorrect? There's no doubt that the existing estimates the system produces are garbage, but that's because it assumes players in the fee market are rational and they are not. Fwiw bitcoinj 0.12.1 applies the January fee drop and will attach fee of only 1000 satoshis per kB by default. I also have a program that measures confirmation time for a given fee level (with fresh coins so there's no priority) and it aligns with your findings, most txns confirm within a couple of blocks. Ultimately there isn't any easy method to stop people throwing money away. Bitcoinj will probably continue to use hard coded fee values for now to try and contribute to market sanity in the hope it makes smartfees smarter. On 27 Oct 2014 19:34, "Alex Morcos" wrote: > I've been playing around with the code for estimating fees and found a few > issues with the existing code. I think this will address several > observations that the estimates returned by the existing code appear to be > too high. For instance see @cozz in Issue 4866 > . > > Here's what I found: > > 1) We're trying to answer the question of what fee X you need in order to > be confirmed within Y blocks. The existing code tries to do that by > calculating the median fee for each possible Y instead of gathering > statistics for each possible X. That approach is statistically incorrect. > In fact since certain X's appear so frequently, they tend to dominate the > statistics at all possible Y's (a fee rate of about 40k satoshis) > > 2) The existing code then sorts all of the data points in all of the > buckets together by fee rate and then reassigns buckets before calculating > the medians for each confirmation bucket. The sorting forces a > relationship where there might not be one. Imagine some other variable, > such as first 2 bytes of the transaction hash. If we sorted these and then > used them to give estimates, we'd see a clear but false relationship where > transactions with low starting bytes in their hashes took longer to confirm. > > 3) Transactions which don't have all their inputs available (because they > depend on other transactions in the mempool) aren't excluded from the > calculations. This skews the results. > > I rewrote the code to follow a different approach. I divided all possible > fee rates up into fee rate buckets (I spaced these logarithmically). For > each transaction that was confirmed, I updated the appropriate fee rate > bucket with how many blocks it took to confirm that transaction. > > The hardest part of doing this fee estimation is to decide what the > question really is that we're trying to answer. I took the approach that > if you are asking what fee rate I need to be confirmed within Y blocks, > then what you would like to know is the lowest fee rate such that a > relatively high percentage of transactions of that fee rate are confirmed > within Y blocks. Since even the highest fee transactions are confirmed > within the first block only 90-93% of the time, I decided to use 80% as my > cutoff. So now to answer "estimatefee Y", I scan through all of the fee > buckets from the most expensive down until I find the last bucket with >80% > of the transactions confirmed within Y blocks. > > Unfortunately we still have the problem of not having enough data points > for non-typical fee rates, and so it requires gathering a lot of data to > give reasonable answers. To keep all of these data points in a circular > buffer and then sort them for every analysis (or after every new block) is > expensive. So instead I adopted the approach of keeping an exponentially > decaying moving average for each bucket. I used a decay of .998 which > represents a half life of 374 blocks or about 2.5 days. Also if a bucket > doesn't have very many transactions, I combine it with the next bucket. > > Here is a link to the code. I > can create an actual pull request if there is consensus that it makes sense > to do so. > > I've attached a graph comparing the estimates produced for 1-3 > confirmations by the new code and the old code. I did apply the patch to > fix issue 3 above to the old code first. The new code is in green and the > fixed code is in purple. The Y axis is a log scale of feerate in satoshis > per KB and the X axis is chain height. The new code produces the same > estimates for 2 and 3 confirmations (the answers are effectively quantized > by bucket). > > I've also completely reworked smartfees.py. It turns out to require many > many more transactions are put through in order to have statistically > significant results, so the test is quite slow to run (about 3 mins on my > machine). > > I've also been running a real world test, sending transactions of various > fee rates and seeing how long they took to get confirmed. After almost 200 > tx's at each fee rate, here are the results so far: > > Fee rate 1100 Avg blocks to confirm 2.30 NumBlocks:% confirmed 1: 0.528 > 2: 0.751 3: 0.870 > Fee rate 2500 Avg blocks to confirm 2.22 NumBlocks:% confirmed 1: 0.528 > 2: 0.766 3: 0.880 > Fee rate 5000 Avg blocks to confirm 1.93 NumBlocks:% confirmed 1: 0.528 > 2: 0.782 3: 0.891 > Fee rate 10000 Avg blocks to confirm 1.67 NumBlocks:% confirmed 1: 0.569 > 2: 0.844 3: 0.943 > Fee rate 20000 Avg blocks to confirm 1.33 NumBlocks:% confirmed 1: 0.715 > 2: 0.963 3: 0.989 > Fee rate 30000 Avg blocks to confirm 1.27 NumBlocks:% confirmed 1: 0.751 > 2: 0.974 3: 1.0 > Fee rate 40000 Avg blocks to confirm 1.25 NumBlocks:% confirmed 1: 0.792 > 2: 0.953 3: 0.994 > Fee rate 60000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.875 > 2: 1.0 3: 1.0 > Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901 > 2: 1.0 3: 1.0 > Fee rate 300000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.886 > 2: 0.989 3: 1.0 > > > Alex > > > ------------------------------------------------------------------------------ > > _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development > > --001a11c2de64f69ad2050678a37e Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Could you explain a little further why you think the current= approach is statistically incorrect? There's no doubt that the existin= g estimates the system produces are garbage, but that's because it assu= mes players in the fee market are rational and they are not.

Fwiw bitcoinj 0.12.1 applies the January fee drop and will a= ttach fee of only 1000 satoshis per kB by default. I also have a program th= at measures confirmation time for a given fee level (with fresh coins so th= ere's no priority) and it aligns with your findings, most txns confirm = within a couple of blocks.

Ultimately there isn't any easy method to stop people th= rowing money away. Bitcoinj will probably continue to use hard coded fee va= lues for now to try and contribute to market sanity in the hope it makes sm= artfees smarter.

On 27 Oct 2014 19:34, "Alex Morcos" &l= t;morcos@gmail.com> wrote:
I've= been playing around with the code for estimating fees and found a few issu= es with the existing code. =C2=A0 I think this will address several observa= tions that the estimates returned by the existing code appear to be too hig= h.=C2=A0 For instance see @cozz in Issue 4866.

Her= e's what I found:

1)=C2=A0We're trying to answer the question of what fee X you need= in order to be confirmed within Y blocks. =C2=A0 The existing code tries t= o do that by calculating the median fee for each possible Y instead of gath= ering statistics for each possible X.=C2=A0 That approach is=C2=A0statistic= ally=C2=A0incorrect.=C2=A0 In fact since certain X's appear so frequent= ly, they tend to dominate the statistics at all possible Y's (a fee rat= e of about 40k satoshis)

2) The existing code then sorts all of th= e data points in all of the buckets together by fee rate and then reassigns= buckets before calculating the medians for each confirmation bucket. =C2= =A0The s= orting forces a relationship where there might not be one.=C2=A0 Imagine so= me other variable, such as first 2 bytes of the transaction hash.=C2=A0 If = we sorted these and then used them to give estimates, we'd see a clear = but false relationship where transactions with low starting bytes in their = hashes took longer to confirm.

3) Transactions which don't hav= e all their inputs available (because they depend on other transactions in = the mempool) aren't excluded from the calculations.=C2=A0 This skews th= e results.

I rewrote the code to follow a different approach.=C2=A0 I divided all pos= sible fee rates up into fee rate buckets (I spaced these logarithmically).= =C2=A0 For each transaction that was confirmed, I updated the appropriate f= ee rate bucket=C2=A0with how many blocks it took to confirm that transactio= n. =C2=A0

The hardest part of doing this fee estimation is to deci= de what the question really is that we're trying to answer.=C2=A0 I too= k the approach that if you are asking what fee rate I need to be confirmed = within Y blocks, then what you would like to know is the lowest fee rate su= ch that a relatively high percentage of transactions of that fee rate are c= onfirmed within Y blocks. Since even the highest fee transactions are confi= rmed within the first block only 90-93% of the time, I decided to use 80% a= s my cutoff.=C2=A0 So now to answer "estimatefee Y", I scan throu= gh all of the fee buckets from the most expensive down until I find the las= t bucket with >80% of the transactions confirmed within Y blocks.=

Unfortunately we still have the problem of not having enough dat= a points for non-typical fee rates, and so it requires gathering a lot of d= ata to give reasonable answers. To keep all of these data points in a circu= lar buffer and then sort them for every analysis (or after every new block)= is expensive.=C2=A0 So instead I adopted the approach of keeping an expone= ntially decaying moving average for each bucket. =C2=A0I used a decay of .998 whi= ch represents a half life of 374 blocks or about 2.5 days. =C2=A0Also if a bucket d= oesn't have very many transactions, I combine it with the next bucket.<= /span>
Here is a link to the code.=C2=A0 I can create an actual pull request= if there is consensus that it makes sense to do so.

I've atta= ched a graph comparing the estimates produced for 1-3 confirmations by the = new code and the old code.=C2=A0 I did apply the patch to fix issue 3 above= to the old code first.=C2=A0 The new code is in green and the fixed code i= s in purple.=C2=A0 The Y axis is a log scale of feerate in satoshis per KB = and the X axis is chain height.=C2=A0 The new code produces the same estima= tes for 2 and 3 confirmations (the answers are effectively quantized by buc= ket).

I've also completely reworked smartfees.py.=C2=A0 It tur= ns out to require many many more transactions are put through in order to h= ave statistically significant results, so the test is quite slow to run (ab= out 3 mins on my machine).

I've also been running a real world= test, sending transactions of various fee rates and seeing how long they t= ook to get confirmed.=C2=A0 After almost 200 tx's at each fee rate, her= e are the results so far:

Fee rate 1100 =C2=A0 Avg blocks to confirm 2.30= NumBlocks:% confirmed 1: 0.528 2: 0.751 3: 0.870
Fee rate 2500 =C2=A0 Avg blocks to confirm 2.= 22 NumBlocks:% confirmed 1: 0.528 2: 0.766 3: 0.880
Fee rate 5000 =C2=A0 Avg blocks to confirm = 1.93 NumBlocks:% confirmed 1: 0.528 2: 0.782 3: 0.891
Fee rate 10000 =C2=A0Avg blocks to confir= m 1.67 NumBlocks:% confirmed 1: 0.569 2: 0.844 3: 0.943
Fee rate 20000 =C2=A0Avg blocks to conf= irm 1.33 NumBlocks:% confirmed 1: 0.715 2: 0.963 3: 0.989
= Fee rate 30000 =C2=A0Avg blocks to co= nfirm 1.27 NumBlocks:% confirmed 1: 0.751 2: 0.974 3: 1.0
= Fee rate 40000 =C2=A0Avg blocks to co= nfirm 1.25 NumBlocks:% confirmed 1: 0.792 2: 0.953 3: 0.994
Fee rate 60000 =C2=A0Avg blocks to = confirm 1.12 NumBlocks:% confirmed 1: 0.875 2: 1.0 =C2=A0 3: 1.0
Fee rate 100000 Avg blocks to = confirm 1.09 NumBlocks:% confirmed 1: 0.901 2: 1.0 =C2=A0 3: 1.0
Fee rate 300000 Avg blocks to = confirm 1.12 NumBlocks:% confirmed 1: 0.886 2: 0.989 3: 1.0


Alex

-----------------------------------------------------------------------= -------

_______________________________________________
Bitcoin-development mailing list
Bitcoin-develo= pment@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-de= velopment

--001a11c2de64f69ad2050678a37e--