public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
* [Bitcoin-development] Future Feature Proposal - getgist
@ 2014-06-04 19:30 Richard Moore
  2014-06-05  3:42 ` Mike Hearn
  2014-06-05 14:28 ` Mark Friedenbach
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Moore @ 2014-06-04 19:30 UTC (permalink / raw)
  To: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 2531 bytes --]

Bitcoin development team,

I recently started implementing my own Python full-node, and had an idea, so I’m prowling through BIP 001 for this proposal, which says to e-mail you kind folks to make sure the idea is original (enough) and that there aren’t other existing means to accomplish what I was thinking. :)

The only way to grab all the headers seems to be to serially get one, then the next and so on, as you need the last hash of a headers() call to the next getheaders(). But we are on a peer-to-peer network, potentially able to getheaders() from many peers simultaneously, if we only knew the hash to look for.

What I was thinking is something to the effect of a getgist() command, fully backward compatible (any node not understanding it, can silently ignore it… Otherwise version or services could be used to announce the capability, but that seems like a little overkill). The inputs to getgist() would be similar to getheaders(); version, hash_count, block_locator_hash, stop_hash and an additional field, segment_count. The response would be a normal headers() message, except, not sequential block headers… Rather they would be spaced out, preferably 2000-block-hash-aligned from the first block hash. So, for example, if you have a blockchain with 198,005 blocks, and you passed it the block locator of height 0 (the genesis block), and a segment_count of 25, you would expect (approximately, the actual algorithm needs to be figured out), the block hashes at the following 25 (segment_count) heights:

1, 8000, 16000, 24000, 32000, 40000, 48000, 56000, 64000, 72000, 80000, 88000, 96000, 104000, 112000, 120000, 128000, 136000, 144000, 152000, 160000, 168000, 176000, 184000, 192000

Which can now be spread across 25 different nodes, fetching the block headers (albeit, out of order, possibly increasing the complexity of the local block chain database) but vastly increasing the speed the whole blockchain can have all headers synced.

I still need to perform some tests to see what type of speed gains there are, but I would suspect it should be segment_count times faster.

Existing methods could be to use checkpoint-ish nodes or bootstrap data files, but these begin relying on semi-cetralizenesses.

Ideas? Suggestions? Concerns? Prior it-ain’t-gonna-works?

Thanks!

RicMoo


.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸><(((º>

Richard Moore ~ Founder
Genetic Mistakes Software inc.
phone: (778) 882-6125
email: ricmoo@geneticmistakes.com
www: http://GeneticMistakes.com


[-- Attachment #2: Type: text/html, Size: 3848 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bitcoin-development] Future Feature Proposal - getgist
  2014-06-04 19:30 [Bitcoin-development] Future Feature Proposal - getgist Richard Moore
@ 2014-06-05  3:42 ` Mike Hearn
  2014-06-05 17:43   ` Richard Moore
  2014-06-05 14:28 ` Mark Friedenbach
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Hearn @ 2014-06-05  3:42 UTC (permalink / raw)
  To: Richard Moore; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 398 bytes --]

Why do you want to optimise this? getheaders is used by SPV clients not
full nodes. SPV clients like bitcoinj can and do simply ship with gist
files in them, then getheaders from the last "checkpoint"   (I wish I
hadn't reused terminology like that but this is what bitcoinj calls them).

In practice when I look at performance issues with current SPV clients,
getheaders speed is not on my radar.

[-- Attachment #2: Type: text/html, Size: 461 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bitcoin-development] Future Feature Proposal - getgist
  2014-06-04 19:30 [Bitcoin-development] Future Feature Proposal - getgist Richard Moore
  2014-06-05  3:42 ` Mike Hearn
@ 2014-06-05 14:28 ` Mark Friedenbach
  1 sibling, 0 replies; 5+ messages in thread
From: Mark Friedenbach @ 2014-06-05 14:28 UTC (permalink / raw)
  To: bitcoin-development

The correct approach here is header hash-tree commitments which enable
compact (logarithmic) SPV proofs that elide nearly all intervening
headers since the last checkpoint. You could then query the hash tree
for references to any of the headers you actually need.

See this message for details:

http://sourceforge.net/p/bitcoin/mailman/message/32111357/

On 06/04/2014 12:30 PM, Richard Moore wrote:
> Bitcoin development team,
> 
> I recently started implementing my own Python full-node, and had an
> idea, so I’m prowling through BIP 001 for this proposal, which says to
> e-mail you kind folks to make sure the idea is original (enough) and
> that there aren’t other existing means to accomplish what I was thinking. :)
> 
> The only way to grab all the headers seems to be to serially get one,
> then the next and so on, as you need the last hash of a headers() call
> to the next getheaders(). But we are on a peer-to-peer network,
> potentially able to getheaders() from many peers simultaneously, if we
> only knew the hash to look for.
> 
> What I was thinking is something to the effect of a getgist() command,
> fully backward compatible (any node not understanding it, can silently
> ignore it… Otherwise version or services could be used to announce the
> capability, but that seems like a little overkill). The inputs to
> getgist() would be similar to getheaders(); version, hash_count,
> block_locator_hash, stop_hash and an additional field, segment_count.
> The response would be a normal headers() message, except, not sequential
> block headers… Rather they would be spaced out, preferably
> 2000-block-hash-aligned from the first block hash. So, for example, if
> you have a blockchain with 198,005 blocks, and you passed it the block
> locator of height 0 (the genesis block), and a segment_count of 25, you
> would expect (approximately, the actual algorithm needs to be figured
> out), the block hashes at the following 25 (segment_count) heights:
> 
> 1, 8000, 16000, 24000, 32000, 40000, 48000, 56000, 64000, 72000, 80000,
> 88000, 96000, 104000, 112000, 120000, 128000, 136000, 144000, 152000,
> 160000, 168000, 176000, 184000, 192000
> 
> Which can now be spread across 25 different nodes, fetching the block
> headers (albeit, out of order, possibly increasing the complexity of the
> local block chain database) but vastly increasing the speed the whole
> blockchain can have all headers synced.
> 
> I still need to perform some tests to see what type of speed gains there
> are, but I would suspect it should be segment_count times faster.
> 
> Existing methods could be to use checkpoint-ish nodes or bootstrap data
> files, but these begin relying on semi-cetralizenesses.
> 
> Ideas? Suggestions? Concerns? Prior it-ain’t-gonna-works?
> 
> Thanks!
> 
> RicMoo
> 
> 
> .·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸><(((º>
> 
> Richard Moore ~ Founder
> Genetic Mistakes Software inc.
> phone: (778) 882-6125
> email: ricmoo@geneticmistakes.com <mailto:ricmoo@geneticmistakes.com>
> www: http://GeneticMistakes.com <http://GeneticMistakes.com/>
> 
> 
> 
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their 
> applications. Written by three acclaimed leaders in the field, 
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/NeoTech
> 
> 
> 
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bitcoin-development] Future Feature Proposal - getgist
  2014-06-05  3:42 ` Mike Hearn
@ 2014-06-05 17:43   ` Richard Moore
  2014-06-05 19:34     ` Pieter Wuille
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Moore @ 2014-06-05 17:43 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2380 bytes --]

I was considering names like getcheckpoints() to use the terminology that already seemed to be in place, but they were too long :)

I have been using getheaders() in my thick client to quickly grab all the headers before downloading the full blocks since I can grab more at a time. Even with getblocks(), there is the case for a  getgist() call. Right now you call getblocks(), which can take some time to get the corresponding inv(), at which time you can then start the call to getdata() as well as the next call to getblocks().

With a gist, for example of segment_count 50, you could call getgist(), then with the response, you could request 50 getblocks() each with a block_locator of 1 hash from the gist (and optimally the stop_hash of the next hash in the gist) to 50 different peers, providing 25,000 (50 x 500) block hashes.

Currently:
>>> getblocks()
<<< inv()
>>> getdata()
<<< block(), block(), block(), … (x 500)

Saturates one peer, while leaving the rest un-used. Step 1 and 2 can be repeated and dispatched to different peers, but there is still the latency between the two calls.

Gist:
>>> getgist()
<<< inv()
>>> getblocks(), getblocks(), getblocks(), … (x segment_count, 1 per peer)
<<< inv(), inv(), inv(), … (x segment_count, 1 per peer)
>>> getdata(), getdata(), getdata(), … (x segment_count, 1 per peer)
<<< block(), block(), block(), … (x (500 * segment_count), ie. 500 in per peer)

Each peer can be saturated.

I will try to run some experiments this weekend to get numbers as to whether there is actually any performance improvement using a gist, or whether the getdata(), block() latency ends up dominating the time anyways.


RicMoo


On Jun 4, 2014, at 11:42 PM, Mike Hearn <mike@plan99.net> wrote:

> Why do you want to optimise this? getheaders is used by SPV clients not full nodes. SPV clients like bitcoinj can and do simply ship with gist files in them, then getheaders from the last "checkpoint"   (I wish I hadn't reused terminology like that but this is what bitcoinj calls them).
> 
> In practice when I look at performance issues with current SPV clients, getheaders speed is not on my radar.

.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸><(((º>

Richard Moore ~ Founder
Genetic Mistakes Software inc.
phone: (778) 882-6125
email: ricmoo@geneticmistakes.com
www: http://GeneticMistakes.com


[-- Attachment #2: Type: text/html, Size: 3959 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bitcoin-development] Future Feature Proposal - getgist
  2014-06-05 17:43   ` Richard Moore
@ 2014-06-05 19:34     ` Pieter Wuille
  0 siblings, 0 replies; 5+ messages in thread
From: Pieter Wuille @ 2014-06-05 19:34 UTC (permalink / raw)
  To: Richard Moore; +Cc: Bitcoin Dev

On Thu, Jun 5, 2014 at 7:43 PM, Richard Moore <me@ricmoo.com> wrote:
> I was considering names like getcheckpoints() to use the terminology that
> already seemed to be in place, but they were too long :)
>
> I have been using getheaders() in my thick client to quickly grab all the
> headers before downloading the full blocks since I can grab more at a time.
> Even with getblocks(), there is the case for a  getgist() call. Right now
> you call getblocks(), which can take some time to get the corresponding
> inv(), at which time you can then start the call to getdata() as well as the
> next call to getblocks().
>
> With a gist, for example of segment_count 50, you could call getgist(), then
> with the response, you could request 50 getblocks() each with a
> block_locator of 1 hash from the gist (and optimally the stop_hash of the
> next hash in the gist) to 50 different peers, providing 25,000 (50 x 500)
> block hashes.

I don't understand. If you're using getheaders(), there is no need to
use getblocks() anymore. You just do a getdata() immediately for the
block hashes you have the headers but not the transactions for.

In general, I think we should aim for as much verifiability as
possible. Much of the reference client's design is built around doing
as much validation on received data as soon as possible, to avoid
being misled by a particular peer. Getheaders() provides this: you
receive a set of headers starting from a point you already know, in
order, and can validate them syntactically and for proof-of-work
immediately. That allows building a very-hard-to-beat tree structure
locally already, at which point you can start requesting blocks along
the good branches of that tree immediately - potentially in parallel
from multiple peers. In fact, that's the planned approach for the
headers-first synchronization.

The getgist() proposed here allows the peer to basically give you
bullshit headers at the end, and you won't notice until you've
downloaded every single block (or at least every single header) up to
that point. That risks wasting time, bandwidth and diskspace,
depending on implementation.

Based on earlier experimenting with my former experimental
headersfirst branch, it's quite possible to have 2 mostly independent
synchronization mechanisms going on; 1) asking and downloading headers
from every peer, and validating them, and 2) asking and downloading
blocks from multiple peers in parallel, for blocks corresponding to
validated headers. Downloading the headers succeeds within minutes,
and within seconds you have enough to start fetching blocks. After
that point, you can keep a "download window" full with outstanding
block requests, and as blocks go much slower than headers, the headers
process never becomes a blocker for blocks to download.

Unless we're talking about a system with billions of headers to
download, I don't think this is a worthwhile optimization.

-- 
Pieter



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-06-05 19:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-04 19:30 [Bitcoin-development] Future Feature Proposal - getgist Richard Moore
2014-06-05  3:42 ` Mike Hearn
2014-06-05 17:43   ` Richard Moore
2014-06-05 19:34     ` Pieter Wuille
2014-06-05 14:28 ` Mark Friedenbach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox