From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
Sender: earonesty@gmail.com
From: Erik Aronesty
Date: Thu, 20 Apr 2017 11:50:24 -0400
To: Danny Thorpe
Cc: Bitcoin Dev
Subject: Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
List-Id: Bitcoin Protocol Discussion
Content-Type: multipart/alternative; boundary=001a1147a97e628429054d9b19a2

--001a1147a97e628429054d9b19a2
Content-Type: text/plain; charset=UTF-8

Try to find 1TB dedicated server hosting ...

If you want to set up an ecommerce site somewhere besides your living room,
storage costs are still a concern.

On Mon, Apr 17, 2017 at 3:11 AM, Danny Thorpe via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:

> 1TB HDD is now available for under $40 USD. How is the 100GB storage
> requirement preventing anyone from setting up full nodes?
>
> On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev"
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> *Rationale:*
>>
>> A node that stores the full blockchain (I will use the term archival
>> node) requires over 100GB of disk space, which I believe is one of the most
>> significant barriers to more people running full nodes. And I believe the
>> ecosystem would benefit substantially if more users were running full nodes.
>>
>> The best alternative today to storing the full blockchain is to run a
>> pruned node, which keeps only the UTXO set and throws away already verified
>> blocks. The operator of the pruned node is able to enjoy the full security
>> benefits of a full node, but is essentially leeching the network, as they
>> performed a large download likely without contributing anything back.
>>
>> This puts more pressure on the archival nodes, as the archival nodes need
>> to pick up the slack and help new nodes bootstrap to the network. As the
>> pressure on archival nodes grows, fewer people will be able to actually run
>> archival nodes, and the situation will degrade. The situation would likely
>> become problematic quickly if bitcoin-core were to ship with the defaults
>> set to a pruned node.
>>
>> Even further, the people most likely to care about saving 100GB of disk
>> space are also the people least likely to care about some extra bandwidth
>> usage. For datacenter nodes, and for nodes doing lots of bandwidth, the
>> bandwidth is usually the biggest cost of running the node. For home users
>> however, as long as they stay under their bandwidth cap, the bandwidth is
>> actually free. Ideally, new nodes would be able to bootstrap from nodes
>> that do not have to pay for their bandwidth, instead of needing to rely on
>> a decreasing percentage of heavy-duty archival nodes.
>>
>> I have (perhaps incorrectly) identified disk space consumption as the
>> most significant factor in your average user choosing to run a pruned node
>> or a lite client instead of a full node. The average user is not typically
>> too worried about bandwidth, and is also not typically too worried about
>> initial blockchain download time. But the 100GB hit to your disk space can
>> be a huge psychological factor, especially if your hard drive only has
>> 500GB available in the first place, and 250+ GB is already consumed by
>> other files you have.
>>
>> I believe that improving the disk usage situation would greatly benefit
>> decentralization, especially if it could be done without putting pressure
>> on archival nodes.
>>
>> *Small Nodes Proposal:*
>>
>> I propose an alternative to the pruned node that does not put undue
>> pressure on archival nodes, and would be acceptable and non-risky to ship
>> as a default in bitcoin-core. For lack of a better name, I'll call this new
>> type of node a 'small node'. The intention is that bitcoin-core would
>> eventually ship 'small nodes' by default, such that the expected amount of
>> disk consumption drops from today's 100+ GB to less than 30 GB.
>>
>> My alternative proposal has the following properties:
>>
>> + Full nodes only need to store ~20% of the blockchain
>> + With very high probability, a new node will be able to recover the
>> entire blockchain by connecting to 6 random small node peers.
>> + An attacker that can eliminate a chosen+ 95% of the full nodes running
>> today will be unable to prevent new nodes from downloading the full
>> blockchain, even if the attacker is also able to eliminate all archival
>> nodes. (assuming all nodes today were small nodes instead of archival nodes)
>>
>> Method:
>>
>> A small node will pick an index [5, 256). This index is that node's
>> permanent index. When storing a block, instead of storing the full block,
>> the node will use Reed-Solomon coding to erasure code the block using a
>> 5-of-256 scheme. The result will be 256 pieces that are 20% of the size of
>> the block each. The node picks the piece that corresponds to its index, and
>> stores that instead. (Indexes 0-4 are reserved for archival nodes -
>> explained later)
>>
>> The node is now storing a fragment of every block. Alone, this fragment
>> cannot be used to recover any piece of the blockchain. However, when paired
>> with any 5 unique fragments (fragments of the same index will not be
>> unique), the full block can be recovered.
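
To make the 5-of-256 idea concrete, here is a minimal sketch of a systematic erasure code built on polynomial interpolation. It is only an illustration under stated assumptions: it works over the small prime field GF(257) rather than the GF(2^8) arithmetic production Reed-Solomon encoders typically use, it handles one group of 5 symbols at a time, and the names `encode_group` and `decode_group` are hypothetical, not taken from the proposal or from any library.

```python
P = 257        # toy prime field (assumption); symbols are integers in 0..256
K, N = 5, 256  # any K of the N fragments recover the data

def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < K through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode_group(data, index):
    """Fragment symbol at `index` (0 <= index < N) for one group of K data symbols."""
    assert len(data) == K and 0 <= index < N
    # The data symbols sit at x = 0..K-1, so fragments 0..K-1 are the data itself.
    return _interpolate(list(enumerate(data)), index)

def decode_group(fragments):
    """Recover the K data symbols from K (index, symbol) pairs with distinct indexes."""
    assert len({i for i, _ in fragments}) == K
    return [_interpolate(fragments, x) for x in range(K)]
```

With this construction each node at index i would keep only `encode_group(group, i)` for every group of 5 symbols in a block, which is one fifth of the data. Because the code is systematic, indexes 0 through 4 reproduce the original symbols unchanged.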
>>
>> Nodes can optionally store more than 1 fragment each. At 5 fragments, the
>> node becomes a full archival node, and the chosen indexes should be 0-4.
>> This is advantageous for the archival node as the encoded data for the
>> first 5 indexes will actually be identical to the block itself - there is
>> no computational overhead for selecting the first indexes. There is also no
>> need to choose random indexes, because the full block can be recovered no
>> matter which indexes are chosen.
>>
>> When connecting to new peers, the indexes of each peer need to be known.
>> Once peers totaling 5 unique indexes are discovered, blockchain download
>> can begin. Connecting to just 5 small node peers provides a >95% chance of
>> getting 5 uniques, with exponentially improving odds of success as you
>> connect to more peers. Connecting to a single archive node guarantees that
>> any gaps can be filled.
>>
>> A good encoder should be able to turn a block into a 5-of-256 piece set
>> in under 10 milliseconds using a single core on a standard consumer
>> desktop. This should not slow down initial blockchain download
>> substantially, though the overhead is more than a rounding error.
>>
>> *DoS Prevention:*
>>
>> A malicious node may provide garbage data instead of the actual piece.
>> Given just the garbage data and 4 other correct pieces, it is impossible
>> (best I know anyway) to tell which piece is the garbage piece.
>>
>> One option in this case would be to seek out an archival node that could
>> verify the correctness of the pieces, and identify the malicious node.
>>
>> Another option would be to have the small nodes store a cryptographic
>> checksum of each piece. Obtaining the cryptographic checksum for all 256
>> pieces would incur a nontrivial amount of hashing (post segwit, as much as
>> 100MB of extra hashing per block), and would require an additional ~4kb of
>> storage per block. The hashing overhead here may be prohibitive.
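
Two of the figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below rests on assumptions the proposal does not state: peers pick their index uniformly and independently from the 251 values in [5, 256), checksums are truncated to 16 bytes, and the example block is 2 MB. The function names are hypothetical.

```python
from fractions import Fraction

K, N = 5, 256
SMALL_INDEXES = N - K  # indexes 5..255 are available to small nodes

def prob_enough_unique(peers, needed=K, indexes=SMALL_INDEXES):
    """Probability that `peers` random small nodes cover at least `needed` distinct indexes."""
    dist = {0: Fraction(1)}  # dist[d] = P(exactly d distinct indexes seen so far)
    for _ in range(peers):
        nxt = {}
        for d, p in dist.items():
            repeat = Fraction(d, indexes)  # chance the next peer repeats a seen index
            nxt[d] = nxt.get(d, 0) + p * repeat
            nxt[d + 1] = nxt.get(d + 1, 0) + p * (1 - repeat)
        dist = nxt
    return sum(p for d, p in dist.items() if d >= needed)

def checksum_overhead(block_bytes, checksum_bytes=16):
    """Return (bytes hashed, bytes stored) when checksumming all N pieces of one block."""
    piece_bytes = -(-block_bytes // K)  # ceiling division: each piece is 1/K of the block
    return N * piece_bytes, N * checksum_bytes

if __name__ == "__main__":
    for peers in (5, 6, 8):
        print(peers, float(prob_enough_unique(peers)))
    print(checksum_overhead(2_000_000))
```

Under these assumptions, 5 peers give 5 distinct indexes with probability 250·249·248·247 / 251^4, which is a little over 96% and consistent with the ">95%" claim. Hashing all 256 pieces of a 2 MB block touches about 102 MB, and 256 checksums of 16 bytes occupy 4096 bytes, which lines up with the "~4kb" figure.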
>>
>> Another solution would be to find additional pieces and brute-force
>> combinations of 5 until a working combination was discovered. Though this
>> sounds nasty, it should take less than five seconds of computation to find
>> the working combination given 5 correct pieces and 2 incorrect pieces. This
>> computation only needs to be performed once to identify the malicious peers.
>>
>> I also believe that alternative erasure coding schemes exist which
>> actually are able to identify the bad pieces given sufficient good pieces,
>> however I don't know if they have the same computational performance as the
>> best Reed-Solomon coding implementations.
>>
>> *Deployment:*
>>
>> Small nodes are completely useless unless the critical mass of 5 pieces
>> can be obtained. The first version that supports small node block downloads
>> should default everyone to an archival node (meaning indexes 0-4 are used)
>>
>> Once there are enough small-node-enabled archive nodes, the default can
>> be switched so that nodes only have a single index by default. In the first
>> few days, when there are only a few small nodes, the previously-deployed
>> archival nodes can help fill in the gaps, and the small nodes can be useful
>> for blockchain download right away.
>>
>> ----------------------------------
>>
>> This represents a non-trivial amount of code, but I believe that the
>> result would be a non-trivial increase in the percentage of users running
>> full nodes, and a healthier overall network.
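
The brute-force option from the DoS Prevention section can be sketched as follows. With 5 correct and 2 incorrect pieces there are only C(7,5) = 21 subsets to try. This is an illustration under assumptions, not the proposal's implementation: it uses a toy GF(257) interpolation code on a single 5-byte group, and it checks candidates against a known SHA-256 digest as a stand-in for whatever block validity check a real node would apply. `find_bad_pieces` is a hypothetical name.

```python
import hashlib
from itertools import combinations

P, K = 257, 5  # toy prime field and recovery threshold (assumptions)

def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < K through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def find_bad_pieces(pieces, expected_digest):
    """Try each K-subset of (index, symbol) pieces until one decodes to the expected data.

    Returns (data, bad_indexes, attempts), or None if no subset decodes correctly.
    """
    for attempts, subset in enumerate(combinations(pieces, K), start=1):
        try:
            data = bytes(interpolate(subset, x) for x in range(K))
        except ValueError:
            continue  # a decoded symbol of 256 is not a byte, so this subset is wrong
        if hashlib.sha256(data).digest() == expected_digest:
            good = list(enumerate(data))
            # Re-encode every supplied index and flag pieces that disagree.
            bad = [i for i, s in pieces if interpolate(good, i) != s]
            return data, bad, attempts
    return None

if __name__ == "__main__":
    block = b"block"
    honest = lambda i: interpolate(list(enumerate(block)), i)
    pieces = [(60, (honest(60) + 1) % P), (70, (honest(70) + 5) % P)]  # two garbage pieces
    pieces += [(i, honest(i)) for i in (10, 20, 30, 40, 50)]
    print(find_bad_pieces(pieces, hashlib.sha256(block).digest()))
```

Once a working subset is found, re-encoding at each supplied index identifies exactly which peers sent garbage, so the search only has to run once per set of suspect peers.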
>>
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>

--001a1147a97e628429054d9b19a2
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a1147a97e628429054d9b19a2--