[Bitcoin-development] LevelDB benchmarking

public inbox for bitcoindev@googlegroups.com

* [Bitcoin-development] LevelDB benchmarking
@ 2012-06-18 18:41 Mike Hearn
       [not found] ` <CAAS2fgTNqUeYy+oEFyQWrfs4Xyb=3NXutvCmLusknF-18JmFQg@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Hearn @ 2012-06-18 18:41 UTC (permalink / raw)
  To: Bitcoin Dev

I switched the transaction database to use the Google LevelDB library,
which is a refactored out part of BigTable.

Here are my results. All tests are done on this hard disk:

  http://wdc.custhelp.com/app/answers/detail/a_id/1409/~/specifications-for-the-500-gb-caviar-blue-and-caviar-se16-serial-ata-drive

which has an average 8.9msec seek time. It is a 6 core Ubuntu machine.

I used -loadblock on a chain with 185127 blocks in it, so it has
lots of SatoshiDice traffic.

8.9 ms (average) seek time

>> Regular BDB as we have today:
real	96m6.836s
user	49m55.220s
sys	2m29.850s

Throughput usually 4-5MB/sec according to iotop, pauses of 8-10
seconds for “Flushing wallet ...”. 611mb of blkindex.dat

>> BDB without sig checking
Throughput, 12-17mb/sec
real	42m51.508s
user	11m52.700s
sys	2m36.590s

Disabling EC verification halves running time.

>> LevelDB no customized options
(I ran the wrong time command here, hence the different format)
3184.73user 181.02system 51:20.81elapsed 109%CPU (0avgtext+0avgdata
1220096maxresident)k
1104inputs+125851776outputs (293569major+37436202minor)pagefaults 0swaps

So, 50 minutes. Throughput often in range of 20-30mb/sec. 397MB of data files.

>> LevelDB w/ 10 bit per key bloom filter
real	50m52.740s
user	53m38.870s
sys	3m4.990s

424mb of data files

No change.
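
For readers unfamiliar with the "10 bit per key bloom filter" option, here is a minimal sketch of the idea. It is a toy filter, not LevelDB's implementation: the class name, the SHA-256 based double hashing and the test keys are all assumptions made for illustration. A filter like this lets a lookup for a missing key skip a disk read, so it only helps when lookups are IO bound.

```python
import hashlib
import math


class ToyBloomFilter:
    """Toy Bloom filter sized in bits per key (NOT LevelDB's implementation)."""

    def __init__(self, expected_keys, bits_per_key=10):
        self.num_bits = max(64, expected_keys * bits_per_key)
        # The optimal number of probes is about bits_per_key * ln(2).
        self.num_probes = max(1, int(bits_per_key * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive every probe position from two 64-bit values.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_probes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def measure_false_positive_rate(num_keys=2000, num_lookups=20000):
    """Insert num_keys keys, then probe with keys that were never inserted."""
    bloom = ToyBloomFilter(num_keys, bits_per_key=10)
    for i in range(num_keys):
        bloom.add(b"present-%d" % i)
    hits = sum(bloom.may_contain(b"absent-%d" % i) for i in range(num_lookups))
    return hits / num_lookups
```

At 10 bits per key the expected false positive rate is on the order of 1%, which is why that value is a common default; the filter never reports a stored key as absent.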

>> LevelDB w/ 10 bit per key bloom filter + 30mb cache (up from 8mb by default)
real	50m53.054s
user	53m26.910s
sys	3m10.720s

No change. The reason is, signature checking is the bottleneck not IO.

>> LevelDB w/10 bit per key bloom filter, 30mb cache, no sigs
real	12m58.998s
user	11m42.330s
sys	2m5.670s

12 minutes vs 42 minutes for BDB on the same benchmark.

Conclusion: LevelDB is a clear win, taking a sync in the absence of
network delays from about 96 minutes to about 51, at which point signature
checking becomes the bottleneck. It is more than 3x as fast (13 minutes
vs 43) when signature checks are not done (i.e., when receiving a block
containing only mempool transactions you already verified).
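
The speedups can be recomputed from the "real" (wall-clock) times reported above. The sketch below only does arithmetic on those figures; the dictionary keys are labels invented here, and the LevelDB run reported as 51:20.81 elapsed is rewritten in the same `XmY.YYYs` form as the others.

```python
import re


def parse_wall_clock(text):
    """Convert a `time` value such as '96m6.836s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError("unrecognized time format: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times copied from the benchmark runs in this message.
REAL_TIMES = {
    "bdb": "96m6.836s",
    "bdb_no_sigs": "42m51.508s",
    "leveldb": "51m20.81s",  # reported as 51:20.81 elapsed
    "leveldb_bloom_cache_no_sigs": "12m58.998s",
}


def speedup(baseline, candidate):
    """Ratio of baseline wall-clock time to candidate wall-clock time."""
    return parse_wall_clock(REAL_TIMES[baseline]) / parse_wall_clock(
        REAL_TIMES[candidate]
    )


if __name__ == "__main__":
    print("with signature checks:    %.2fx" % speedup("bdb", "leveldb"))
    print(
        "without signature checks: %.2fx"
        % speedup("bdb_no_sigs", "leveldb_bloom_cache_no_sigs")
    )
```

This works out to roughly 1.9x with signature checking enabled and roughly 3.3x with it disabled.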


* Re: [Bitcoin-development] LevelDB benchmarking
       [not found] ` <CAAS2fgTNqUeYy+oEFyQWrfs4Xyb=3NXutvCmLusknF-18JmFQg@mail.gmail.com>
@ 2012-06-19  9:05   ` Mike Hearn
  2012-06-19 11:38     ` Pieter Wuille
  2012-06-19 15:05     ` Gavin Andresen
  0 siblings, 2 replies; 12+ messages in thread
From: Mike Hearn @ 2012-06-19  9:05 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Dev

+list

On Mon, Jun 18, 2012 at 9:07 PM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
> In addition to the ECDSA caching, ECDSA can easily be run on
> multiple cores for basically a linear speedup.. so even with the
> checking in place once ECDSA was using multiple threads we'd be back
> to the DB being the bottleneck for this kind of case.

Maybe ... looking again I think I may be wrong about being IO bound in
the last benchmark. The core running the main Bitcoin thread is still
pegged and the LevelDB background thread is only spending around 20%
of its time in iowait. An oprofile shows most of the time being spent
inside a std::map.
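
The "ECDSA caching" in the quoted text refers to remembering signature checks that have already passed, so that a block made of transactions seen earlier needs no repeated verification. A minimal sketch of that idea follows; the class, its method names and the stand-in verifier are hypothetical and are not taken from the Bitcoin source.

```python
import hashlib


class SignatureCache:
    """Remembers signature checks that already succeeded (illustrative only)."""

    def __init__(self, verify):
        self._verify = verify      # the expensive check, e.g. ECDSA verification
        self._known_good = set()
        self.expensive_calls = 0   # counter so the effect of the cache is visible

    @staticmethod
    def _entry(sighash, signature, pubkey):
        # Key on all three inputs (length-prefixed) so that a cache hit
        # implies exactly the same check has passed before.
        return hashlib.sha256(
            len(sighash).to_bytes(4, "little") + sighash
            + len(signature).to_bytes(4, "little") + signature
            + pubkey
        ).digest()

    def check(self, sighash, signature, pubkey):
        entry = self._entry(sighash, signature, pubkey)
        if entry in self._known_good:
            return True            # verified earlier, e.g. on mempool entry
        self.expensive_calls += 1
        ok = self._verify(sighash, signature, pubkey)
        if ok:
            self._known_good.add(entry)  # failures are never cached
        return ok


def stand_in_verify(sighash, signature, pubkey):
    """Placeholder for real ECDSA verification; NOT cryptographically meaningful."""
    return signature == hashlib.sha256(pubkey + sighash).digest()
```

Only successful checks are stored, so a bad signature is re-verified (and rejected) every time it is presented.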

OK, to make progress on this work I need a few decisions (Gavin?)

1) Shall we do it?

2) LevelDB is obscure, new and has a very minimalist build system. It
supports "make" but not "make install", for example, and is unlikely
to be packaged. It's also not very large. I suggest we just check the
source into the main Bitcoin tree and link it statically rather than
complicate the build.

3) As the DB format would change and a slow migration period
necessary, any other tweaks to db format we could make at the same
time? Right now the key/values are the same as before, though using
satoshi serialization for everything is a bit odd.

We'd need UI for migration as well.


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-19  9:05   ` Mike Hearn
@ 2012-06-19 11:38     ` Pieter Wuille
  2012-06-19 15:05     ` Gavin Andresen
  1 sibling, 0 replies; 12+ messages in thread
From: Pieter Wuille @ 2012-06-19 11:38 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

On Tue, Jun 19, 2012 at 11:05:20AM +0200, Mike Hearn wrote:
> OK, to make progress on this work I need a few decisions (Gavin?)
> 
> 1) Shall we do it?

I'm all for moving away from BDB. It's a very good system for what it is
intended for, but that is not how we use it. The fact that it is tied to
a database environment (but people want to copy the files themselves
between systems), that it provides consistency in case of failures (but
because we remove old log files, we still see very frequent corrupted
systems), the fact that its environments are sometimes not even forward-
compatible, ...

Assuming LevelDB is an improvement in these areas as well as resulting in
a speed improvement, I like it.

> 2) LevelDB is obscure, new and has a very minimalist build system. It
> supports "make" but not "make install", for example, and is unlikely
> to be packaged. It's also not very large. I suggest we just check the
> source into the main Bitcoin tree and link it statically rather than
> complicate the build.

How portable is LevelDB? How well tested is it? What compatibility
guarantees exist between versions of the system?

I don't mind including the source code; it doesn't seem particularly
large, and the 2-clause BSD license shouldn't be a problem.

> 3) As the DB format would change and a slow migration period
> necessary, any other tweaks to db format we could make at the same
> time? Right now the key/values are the same as before, though using
> satoshi serialization for everything is a bit odd.
> 
> We'd need UI for migration as well.

Jeff was working on splitting the database into several files earlier, and
I'm working on the database/validation logic as well. Each of these will
require a rebuild of the databases anyway. If possible, we should try to
get them in a single release, so people only need to rebuild once. 

PS: can we see the code?

-- 
Pieter


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-19  9:05   ` Mike Hearn
  2012-06-19 11:38     ` Pieter Wuille
@ 2012-06-19 15:05     ` Gavin Andresen
  2012-06-19 16:06       ` Mike Hearn
  1 sibling, 1 reply; 12+ messages in thread
From: Gavin Andresen @ 2012-06-19 15:05 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

> OK, to make progress on this work I need a few decisions (Gavin?)
>
> 1) Shall we do it?

What problem does it solve?

If the problem it will solve is "it will only take 4 hours to download
the entire blockchain next year instead of taking 16 hours" then no, I
don't think we should do it, both 4 and 16 hours to get fully up and
running is too long.

If the problem it will solve is the "too easy to get a DB_RUNRECOVERY
error" because bdb is fragile when it comes to its environment... then
LevelDB looks very interesting.

If the problem is bdb is creaky and old and has obscure semantics and
a hard-to-work-with API, then yes, let's switch (I'm easily seduced by
a pretty API and blazing fast performance).

> 2) LevelDB is obscure, new and has a very minimalist build system. It
> supports "make" but not "make install", for example, and is unlikely
> to be packaged. It's also not very large. I suggest we just check the
> source into the main Bitcoin tree and link it statically rather than
> complicate the build.

As long as it compiles and runs on mac/windows/linux that doesn't
really worry me. I just tried it, and it compiled quickly with no
complaints on my mac.

Lack of infrastructure because it is new does worry me; for example,
could I rework bitcointools to read the LevelDB blockchain?  (are
there python bindings for LevelDB?)

> 3) As the DB format would change and a slow migration period
> necessary, any other tweaks to db format we could make at the same
> time? Right now the key/values are the same as before, though using
> satoshi serialization for everything is a bit odd.

Satoshi rolled his own network serialization because he didn't trust
existing serialization solutions to be 100% secure against remote
exploits. Then it made sense to use the same solution for disk
serialization; I don't see a compelling reason to switch to some other
serialization scheme.
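
As a concrete taste of the Satoshi serialization being discussed, the sketch below encodes and decodes its variable-length "compact size" integer, which prefixes vectors and byte strings. The function names are chosen here for illustration; the wire layout (one byte below 0xFD, otherwise a marker byte followed by a 2, 4 or 8 byte little-endian value) is the part that matters.

```python
import struct


def encode_compact_size(n):
    """Encode a non-negative integer as a Bitcoin-style compact size."""
    if n < 0:
        raise ValueError("compact size cannot be negative")
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_compact_size(data, offset=0):
    """Decode a compact size at data[offset:]; return (value, next_offset)."""
    first = data[offset]
    if first < 0xFD:
        return first, offset + 1
    fmt, width = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + width


def encode_var_bytes(payload):
    """Length-prefix a byte string, the way serialized byte vectors are written."""
    return encode_compact_size(len(payload)) + payload
```

Because the same routines serve the network and the disk, any tool reading the database directly has to reimplement this encoding, whichever storage engine sits underneath.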

Modifying the database schema during migration to better support
applications like InstaWallet (tens of thousands of separate wallets)
or something like Pieter's ultra-pruning makes sense.

-- 
--
Gavin Andresen


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-19 15:05     ` Gavin Andresen
@ 2012-06-19 16:06       ` Mike Hearn
  2012-06-19 19:22         ` Stefan Thomas
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Hearn @ 2012-06-19 16:06 UTC (permalink / raw)
  To: Gavin Andresen; +Cc: Bitcoin Dev

> What problem does it solve?

Primarily that block verification and therefore propagation is too
slow because it's very CPU and IO intensive. The CPU work can be
multi-threaded. The IO work, not as much. As Bitcoin grows we need to
scale the nodes. Eventually there may be multi-machine nodes, but for
now we can buy more time by making the existing nodes faster.

I don't see this as a replacement for moving users to SPV clients.
Obviously, otherwise I would not be writing one ;)

> If the problem it will solve is the "too easy to get a DB_RUNRECOVERY
> error" because bdb is fragile when it comes to its environment... then
> LevelDB looks very interesting.

I have no experience with how robust LevelDB is. It has an API call to
try and repair the database and I know from experience that BigTable
is pretty solid. But that doesn't mean LevelDB is.

> If the problem is bdb is creaky and old and has obscure semantics and
> a hard-to-work-with API, then yes, let's switch (I'm easily seduced by
> a pretty API and blazing fast performance).

The code is a lot simpler for sure.

> As long as it compiles and runs on mac/windows/linux that doesn't
> really worry me.

It was refactored out of BigTable and made standalone for usage in
Chrome. Therefore it's as portable as Chrome is. Mac/Windows/Linux
should all work. Solaris, I believe, may need 64 bit binaries to avoid
low FD limits.

> Lack of infrastructure because it is new does worry me; for example,
> could I rework bitcointools to read the LevelDB blockchain?  (are
> there python bindings for LevelDB?)

Yes: http://code.google.com/p/py-leveldb/

First look at the code is here, but it's not ready for a pull req yet,
and I'll force push over it a few times to get it into shape. So don't
branch:

https://github.com/mikehearn/bitcoin/commit/2b601dd4a0093f834084241735d84d84e484f183

It has misc other changes I made whilst profiling, isn't well
commented enough, etc.


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-19 16:06       ` Mike Hearn
@ 2012-06-19 19:22         ` Stefan Thomas
  2012-06-20  9:44           ` Mike Hearn
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Thomas @ 2012-06-19 19:22 UTC (permalink / raw)
  To: bitcoin-development

Here are my 2 cents after using LevelDB as the default backend for
BitcoinJS for about a year.

LevelDB was written to power IndexedDB in Chrome which is a JavaScript
API. That means that LevelDB doesn't really give you a lot of options,
because they assume that on the C++ layer you don't know any more than
they do, because the actual application is on the JavaScript layer. For
example, whereas BDB supports hashtables, b-trees, queues, etc., LevelDB
uses one database type: LSM trees, an ordered data structure
that is pretty good at everything.
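
To make the LSM tree idea concrete, here is a deliberately tiny sketch: writes go to an in-memory table, full tables are flushed as sorted immutable runs, and reads consult the newest data first. Everything in it (class name, flush threshold, in-memory "runs" instead of files, no deletes) is a simplification assumed for illustration and does not mirror LevelDB's internals.

```python
import bisect


class ToyLSM:
    """Toy log-structured merge store: a memtable plus sorted immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.runs = []  # each run is a sorted list of (key, value); newest last

    def put(self, key, value):
        # Writes only touch memory; disk work is deferred to flush time.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A flush writes one sorted run sequentially (here: appends a list).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value for each key.
        merged = {}
        for run in self.runs:  # oldest first, so later runs overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

    def scan(self):
        """Return every live (key, value) pair in key order."""
        merged = {}
        for run in self.runs:
            merged.update(run)
        merged.update(self.memtable)
        return sorted(merged.items())
```

The sketch shows why bulk inserts are cheap (sequential flushes, no in-place updates) and why ordered iteration comes for free.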

Another gotcha was the number of file descriptors: LevelDB defaults to
1000 per DB. We originally used multiple DBs, one for each of the
indices, but it was easy enough to combine everything into one table,
thereby solving the fd issue. (Lowering the file descriptor limit also
works of course, but if you lower it too much, LevelDB will start to
spend a lot of time opening and closing files, so I believe combining
your tables into one is the better option.)
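
Combining several indices into one database usually comes down to giving each index its own key prefix. The sketch below shows the pattern with an in-memory ordered store standing in for a single LevelDB database; the store class and the one-byte tags are hypothetical and are not BitcoinJS's actual key layout.

```python
import bisect


class OrderedStore:
    """Minimal ordered key-value store standing in for one LevelDB database."""

    def __init__(self):
        self._keys = []     # kept sorted so prefix scans are contiguous
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


# Hypothetical one-byte tags, one per logical index.
TX_INDEX = b"t"
BLOCK_INDEX = b"b"


def index_key(tag, key):
    """Namespace a key so several indices can share one database."""
    return tag + key
```

Each logical table then occupies a contiguous key range, so one set of open files serves all of them.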

Overall, LevelDB is a fantastic solution for desktop software that is
faced with multiple use cases that aren't known at compile time. It
isn't really designed for something like Bitcoin which doesn't need
ordered access, has relatively predictable characteristics and - at
least some of the time - runs on servers.

That said, it does seem to work well for the Bitcoin use case anyway.
Thanks to the LSM trees, it's very quick at doing bulk inserts and we
don't seem to need any of the bells and whistles that BDB offers. So I
can't think of a reason not to switch. Just make sure you all understand
the deal: LevelDB, unlike Tokyo/Kyoto Cabinet, is *not* intended as a
competitor or replacement for BDB; it's something quite different.

On 6/19/2012 6:06 PM, Mike Hearn wrote:
>> What problem does it solve?
> Primarily that block verification and therefore propagation is too
> slow because it's very CPU and IO intensive. The CPU work can be
> multi-threaded. The IO work, not as much. As Bitcoin grows we need to
> scale the nodes. Eventually there may be multi-machine nodes, but for
> now we can buy more time by making the existing nodes faster.
>
> I don't see this as a replacement for moving users to SPV clients.
> Obviously, otherwise I would not be writing one ;)
>
>> If the problem it will solve is the "too easy to get a DB_RUNRECOVERY
>> error" because bdb is fragile when it comes to its environment... then
>> LevelDB looks very interesting.
> I have no experience with how robust LevelDB is. It has an API call to
> try and repair the database and I know from experience that BigTable
> is pretty solid. But that doesn't mean LevelDB is.
>
>> If the problem is bdb is creaky and old and has obscure semantics and
>> a hard-to-work-with API, then yes, let's switch (I'm easily seduced by
>> a pretty API and blazing fast performance).
> The code is a lot simpler for sure.
>
>> As long as it compiles and runs on mac/windows/linux that doesn't
>> really worry me.
> It was refactored out of BigTable and made standalone for usage in
> Chrome. Therefore it's as portable as Chrome is. Mac/Windows/Linux
> should all work. Solaris, I believe, may need 64 bit binaries to avoid
> low FD limits.
>
>> Lack of infrastructure because it is new does worry me; for example,
>> could I rework bitcointools to read the LevelDB blockchain?  (are
>> there python bindings for LevelDB?)
> Yes: http://code.google.com/p/py-leveldb/
>
> First look at the code is here, but it's not ready for a pull req yet,
> and I'll force push over it a few times to get it into shape. So don't
> branch:
>
> https://github.com/mikehearn/bitcoin/commit/2b601dd4a0093f834084241735d84d84e484f183
>
> It has misc other changes I made whilst profiling, isn't well
> commented enough, etc.


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-19 19:22         ` Stefan Thomas
@ 2012-06-20  9:44           ` Mike Hearn
  2012-06-20  9:53             ` Mike Hearn
  2012-06-20 11:37             ` Pieter Wuille
  0 siblings, 2 replies; 12+ messages in thread
From: Mike Hearn @ 2012-06-20  9:44 UTC (permalink / raw)
  To: Stefan Thomas; +Cc: bitcoin-development

Thanks, I didn't realize BitcoinJS used LevelDB already.

Just one minor thing - LevelDB was definitely designed for servers, as
it comes from BigTable. It happens to be used in Chrome today, and
that was the motivation for open sourcing it, but that's not where the
design came from.

If anything it's going to get less and less optimal for desktops and
laptops over time because they're moving towards SSDs, where the
minimal-seeks design of LevelDB doesn't necessarily help. Servers are
moving too of course but I anticipate most Bitcoin nodes on servers to
be HDD based for the foreseeable future.

Also, Satoshi's code does use ordered access/iteration in at least one
place, where it looks up the "owner transactions" of a tx. I'm not
totally sure what that code is used for, but it's there. Whether it's
actually the best way to solve the problem is another question :-)


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-20  9:44           ` Mike Hearn
@ 2012-06-20  9:53             ` Mike Hearn
  2012-06-20 11:37             ` Pieter Wuille
  1 sibling, 0 replies; 12+ messages in thread
From: Mike Hearn @ 2012-06-20  9:53 UTC (permalink / raw)
  To: Stefan Thomas; +Cc: bitcoin-development

There's an interesting post here about block propagation times:

https://bitcointalk.org/index.php?topic=88302.msg975343#msg975343

Looks like the regular network is reliably 0-60 seconds behind p2pool
in propagating new blocks.

So optimizing IO load (and after that, threading tx verification)
seems like an important win. Luke's preview functionality would also be
useful.


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-20  9:44           ` Mike Hearn
  2012-06-20  9:53             ` Mike Hearn
@ 2012-06-20 11:37             ` Pieter Wuille
  2012-06-20 12:41               ` Mike Hearn
  1 sibling, 1 reply; 12+ messages in thread
From: Pieter Wuille @ 2012-06-20 11:37 UTC (permalink / raw)
  To: Mike Hearn; +Cc: bitcoin-development

On Wed, Jun 20, 2012 at 11:44:48AM +0200, Mike Hearn wrote:
> Also, Satoshi's code does use ordered access/iteration in at least one
> place, where it looks up the "owner transactions" of a tx. I'm not
> totally sure what that code is used for, but it's there. Whether it's
> actually the best way to solve the problem is another question :-)

Two days ago on #bitcoin-dev:
21:01:19< sipa> what was CTxDB::ReadOwnerTxes ever used for?
21:01:31< sipa> maybe it predates the wallet logic

(read: it's not used anywhere in the code, and apparently wasn't ever, even in 0.1.5)

-- 
Pieter





* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-20 11:37             ` Pieter Wuille
@ 2012-06-20 12:41               ` Mike Hearn
  2012-06-25 16:32                 ` Mike Hearn
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Hearn @ 2012-06-20 12:41 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: bitcoin-development

> Two days ago on #bitcoin-dev:
> 21:01:19< sipa> what was CTxDB::ReadOwnerTxes ever used for?
> 21:01:31< sipa> maybe it predates the wallet logic
>
> (read: it's not used anywhere in the code, and apparently wasn't ever, even in 0.1.5)

Great, in that case Stefan is right and I'll delete that code when I
next work on the patch.




* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-20 12:41               ` Mike Hearn
@ 2012-06-25 16:32                 ` Mike Hearn
  2012-07-21 18:49                   ` Mike Hearn
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Hearn @ 2012-06-25 16:32 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: bitcoin-development

I've added some more commits:

https://github.com/mikehearn/bitcoin/commits/leveldb

It's still not ready for a pull req but is a lot closer:

1) Auto-migration is there but not well tested enough (I only tested
with empty wallets).
2) Migration progress UI is there so you have something to watch for
the few minutes it takes. Script execution is disabled during
migration.
3) LevelDB source is checked in to the main tree, bitcoin-qt.pro
updated to use it
4) LevelDB is conditionally compiled so if there's some unexpected
issue or regression on some platform it can be switched back to BDB
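
Since the key/values are unchanged between the two backends, a migration with a progress display can in principle be a batched copy. The sketch below is not the code in the branch: the function name, the batch size and the use of plain dicts for both stores are assumptions made to illustrate the shape of such a loop.

```python
def migrate(old_items, new_store, total, batch_size=1000, on_progress=None):
    """Copy key/value pairs unchanged from the old store into the new one.

    old_items   -- iterable of (key, value) pairs read from the old database
    new_store   -- destination; a dict stands in for the new database here
    total       -- number of pairs expected, used only for progress reporting
    on_progress -- optional callback(copied, total) invoked after each batch
    """
    batch = []
    copied = 0
    for key, value in old_items:
        batch.append((key, value))
        if len(batch) >= batch_size:
            new_store.update(batch)  # one batched write per batch_size records
            copied += len(batch)
            batch = []
            if on_progress:
                on_progress(copied, total)
    if batch:  # flush the final partial batch
        new_store.update(batch)
        copied += len(batch)
        if on_progress:
            on_progress(copied, total)
    return copied
```

A GUI would pass a callback that updates a progress bar; writing in batches keeps the number of separate write operations low.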

Still to go:

1) More testing, eg, with actual wallets :-)
2) Update the non-Qt makefiles
3) On Windows it's currently de-activated due to some missing files
from leveldb + I didn't test it

If you want to help out, some testing and makefile work would be
useful. I may not get a chance to work on this again until next week.

On Wed, Jun 20, 2012 at 2:41 PM, Mike Hearn <mike@plan99.net> wrote:
>> Two days ago on #bitcoin-dev:
>> 21:01:19< sipa> what was CTxDB::ReadOwnerTxes ever used for?
>> 21:01:31< sipa> maybe it predates the wallet logic
>>
>> (read: it's not used anywhere in the code, and apparently wasn't ever, even in 0.1.5)
>
> Great, in that case Stefan is right and I'll delete that code when I
> next work on the patch.


* Re: [Bitcoin-development] LevelDB benchmarking
  2012-06-25 16:32                 ` Mike Hearn
@ 2012-07-21 18:49                   ` Mike Hearn
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Hearn @ 2012-07-21 18:49 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: bitcoin-development

Stefan went and finished off this work by bringing it up on Windows,
so now there's a pull req for it:

  https://github.com/bitcoin/bitcoin/pull/1619


