Devs,
I have decided to upgrade Armory's blockchain utilities, partly out
of necessity due to a poor code decision I made before I even
decided I was making a client. In an effort to avoid such mistakes
again, I want to do it "right" this time around, and realize that
this is a good discussion for all the devs that will have to deal
with this eventually...
The part I'm having difficulty with is the idea that, a few years
from now, it may no longer be feasible to hold transaction file-pointers
in RAM, because even that would overwhelm standard RAM sizes.
Without any degree of blockchain compression, I see that the most
general, scalable solution is probably a complicated one.
On the other hand, where this fails may be where we have already
predicted that the network will have to split into "super-nodes" and
"lite nodes." In which case, this discussion is still a good one,
but just directed more towards the super-nodes. But, there may
still be a point at which super-nodes don't have enough RAM to hold
this data...
(1) As for how small you can get the data: my original idea was
that the entire blockchain is stored on disk as blkXXXX.dat files.
I store all transactions as 10-byte "file-references." The 10 bytes
would be:
-- X in blkX.dat (2 bytes)
-- Tx start byte (4 bytes)
-- Tx size bytes (4 bytes)
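To make the 10-byte layout concrete, here is a minimal C++ sketch of packing and unpacking one file-reference. The names TxFileRef, packRef and unpackRef are hypothetical, and the little-endian byte order is an assumption for illustration; only the field widths (2 + 4 + 4) come from the list above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical in-memory form of one file-reference.
struct TxFileRef {
    std::uint16_t fileIndex;  // X in blkX.dat (2 bytes)
    std::uint32_t startByte;  // Tx start byte (4 bytes)
    std::uint32_t sizeBytes;  // Tx size in bytes (4 bytes)
};

// Packed form: exactly 10 bytes, with no struct padding.
using PackedRef = std::array<std::uint8_t, 10>;

// Serialize field by field, least-significant byte first (assumed order).
inline PackedRef packRef(const TxFileRef& r) {
    PackedRef out{};
    out[0] = static_cast<std::uint8_t>(r.fileIndex & 0xFF);
    out[1] = static_cast<std::uint8_t>(r.fileIndex >> 8);
    for (int i = 0; i < 4; ++i) {
        out[2 + i] = static_cast<std::uint8_t>((r.startByte >> (8 * i)) & 0xFF);
        out[6 + i] = static_cast<std::uint8_t>((r.sizeBytes >> (8 * i)) & 0xFF);
    }
    return out;
}

// Inverse of packRef.
inline TxFileRef unpackRef(const PackedRef& p) {
    TxFileRef r{};
    r.fileIndex = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    for (int i = 0; i < 4; ++i) {
        r.startByte |= static_cast<std::uint32_t>(p[2 + i]) << (8 * i);
        r.sizeBytes |= static_cast<std::uint32_t>(p[6 + i]) << (8 * i);
    }
    return r;
}
```

Packing by hand matters here because a plain struct with these three fields is typically padded to 12 bytes by the compiler, so the 10-byte figure only holds for the serialized form.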
The file-refs would be stored in a multimap indexed by the first 6
bytes of the tx-hash. In this way, when I search the multimap, I
potentially get a list of file-refs, and I might have to retrieve a
couple of tx from disk before finding the right one, but it would be
a good trade-off compared to storing all 32 bytes (that's assuming
that multimap nodes don't have too much overhead).
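The lookup described above might be sketched roughly as follows in C++. This is only an illustration under stated assumptions: TxRefIndex, prefixKey and the HashLoader callback are hypothetical names, and the callback stands in for "read the tx from disk and hash it," which is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical file-reference (padded to 12 bytes in memory on most compilers).
struct TxFileRef {
    std::uint16_t fileIndex;  // X in blkX.dat
    std::uint32_t startByte;  // offset of the tx in that file
    std::uint32_t sizeBytes;  // serialized tx length
};

using TxHash = std::array<std::uint8_t, 32>;

// Fold the first 6 bytes of the hash into an integer key.
inline std::uint64_t prefixKey(const TxHash& h) {
    std::uint64_t k = 0;
    for (int i = 0; i < 6; ++i) k = (k << 8) | h[i];
    return k;
}

class TxRefIndex {
public:
    // Assumed callback: loads the tx at `ref` from disk and returns its full hash.
    using HashLoader = std::function<TxHash(const TxFileRef&)>;

    void insert(const TxHash& h, const TxFileRef& ref) {
        index_.emplace(prefixKey(h), ref);
    }

    // Walk every candidate sharing the 6-byte prefix and keep the one
    // whose full hash matches. Returns false if none matches.
    bool find(const TxHash& h, const HashLoader& load, TxFileRef& out) const {
        assert(load && "a loader callback is required");
        auto range = index_.equal_range(prefixKey(h));
        for (auto it = range.first; it != range.second; ++it) {
            if (load(it->second) == h) {
                out = it->second;
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return index_.size(); }

private:
    std::multimap<std::uint64_t, TxFileRef> index_;
};
```

Each prefix collision costs one extra disk read inside find(), which is the trade-off against keeping all 32 hash bytes in RAM.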
But even with this, if there are 1,000,000,000 transactions in the
blockchain and each node is probably 48 bytes (16 bytes of data plus
map/container overhead), then you're talking about 48 GB to track
all the data in RAM. mmap() may help here, but I'm not sure it's
the right solution.
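The arithmetic behind that estimate can be written out as a small C++ sketch. The constant and function names are hypothetical, and the 48-byte node size is only the guess from this post (real std::multimap overhead depends on the implementation and allocator).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr std::uint64_t kKeyBytes = 6;    // hash prefix used as the map key
constexpr std::uint64_t kRefBytes = 10;   // file-reference payload
constexpr std::uint64_t kPayloadBytes = kKeyBytes + kRefBytes;  // 16 bytes
constexpr std::uint64_t kAssumedNodeBytes = 48;  // guess: payload + container overhead

// Total index size for a given tx count and per-node cost.
inline std::uint64_t indexBytes(std::uint64_t txCount, std::uint64_t bytesPerNode) {
    // Guard against 64-bit overflow before multiplying.
    assert(bytesPerNode == 0 ||
           txCount <= std::numeric_limits<std::uint64_t>::max() / bytesPerNode);
    return txCount * bytesPerNode;
}

// Decimal gigabytes (1 GB = 10^9 bytes).
inline double toGB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1e9;
}
```

With these numbers, indexBytes(1000000000, kAssumedNodeBytes) comes to 48,000,000,000 bytes, i.e. 48 GB, of which only 16 GB is actual key-plus-reference data.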
(2) What other ways are there, besides some kind of blockchain
compression, to maintain a multi-terabyte blockchain, assuming that
storing references to each tx would overwhelm available RAM? Maybe
that assumption isn't necessary, but I think it prepares for the
worst.
Or maybe I'm too narrow in my focus. How do other people envision
this will be handled in the future? I've heard so many vague
notions of "well, we could do this or that, or it wouldn't be hard
to do that," but I haven't heard any serious
proposals for it. And while I believe that blockchain compression
will become ubiquitous in the future, not everyone believes that,
and there will undoubtedly be users/devs that want to
maintain everything under all circumstances.
-Alan