* [Bitcoin-development] bitcoind stops responding
@ 2013-09-30 20:44 slush
2013-09-30 21:01 ` Warren Togami Jr.
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: slush @ 2013-09-30 20:44 UTC (permalink / raw)
To: bitcoin-development
[-- Attachment #1: Type: text/plain, Size: 612 bytes --]
Hi,
during several weeks I'm observing more and more frequent issues with
bitcoind. The problem is that bitcoind stops responding to RPC calls, but
there's no other suspicious activity in bitcoind log, CPU usage is low,
disk I/O is standard etc.
I observed this problem with version 0.8.2, but it is still happening with
0.8.5. Originally this happen just one or twice per day. Today my
monitoring scripts restarted bitcoind more than 30x, which sounds alarming.
This happen on various backends, so it isn't a problem of one specific
node. Is there anybody else who's observing similar problem?
Thanks,
slush
[-- Attachment #2: Type: text/html, Size: 741 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush
@ 2013-09-30 21:01 ` Warren Togami Jr.
2013-09-30 22:00 ` Fatima Castiglione Maldonado 发
2013-10-01 1:26 ` Jeff Garzik
2013-10-01 0:20 ` Chris Double
` (2 subsequent siblings)
3 siblings, 2 replies; 16+ messages in thread
From: Warren Togami Jr. @ 2013-09-30 21:01 UTC (permalink / raw)
To: slush; +Cc: bitcoin-development
[-- Attachment #1: Type: text/plain, Size: 1656 bytes --]
https://github.com/litecoin-project/litecoin/issues/67
0.8.2 apparently was the first Bitcoin version to support RPC keepalive.
With the 4 RPC thread limit, four keepalive connections will exhaust all
four and prevent further connections. This issue describes a workaround
where you build with more RPC threads.
On Mon, Sep 30, 2013 at 10:44 AM, slush <slush@centrum.cz> wrote:
> Hi,
>
> during several weeks I'm observing more and more frequent issues with
> bitcoind. The problem is that bitcoind stops responding to RPC calls, but
> there's no other suspicious activity in bitcoind log, CPU usage is low,
> disk I/O is standard etc.
>
> I observed this problem with version 0.8.2, but it is still happening with
> 0.8.5. Originally this happen just one or twice per day. Today my
> monitoring scripts restarted bitcoind more than 30x, which sounds alarming.
> This happen on various backends, so it isn't a problem of one specific
> node. Is there anybody else who's observing similar problem?
>
> Thanks,
> slush
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>
>
[-- Attachment #2: Type: text/html, Size: 2449 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 21:01 ` Warren Togami Jr.
@ 2013-09-30 22:00 ` Fatima Castiglione Maldonado 发
2013-09-30 22:08 ` Gregory Maxwell
2013-10-01 1:26 ` Jeff Garzik
1 sibling, 1 reply; 16+ messages in thread
From: Fatima Castiglione Maldonado 发 @ 2013-09-30 22:00 UTC (permalink / raw)
To: Warren Togami Jr.; +Cc: bitcoin-development
[-- Attachment #1: Type: text/plain, Size: 4210 bytes --]
Hi,
I am new on the list. I got a similar problem.
If I put "sendToAdress" transactions to bitcoind, it will accept between 1
and 3 transactions per minute, depending on the underlying machine.
If I try to send one transaction every 20 seconds or less, bitcoind stops
responding to RPC calls.
Does anybody know of any solution / fix / workaround for this?
Thanks in advance.
2013/9/30 Warren Togami Jr. <wtogami@gmail.com>
> https://github.com/litecoin-project/litecoin/issues/67
> 0.8.2 apparently was the first Bitcoin version to support RPC keepalive.
> With the 4 RPC thread limit, four keepalive connections will exhaust all
> four and prevent further connections. This issue describes a workaround
> where you build with more RPC threads.
>
>
> On Mon, Sep 30, 2013 at 10:44 AM, slush <slush@centrum.cz> wrote:
>
>> Hi,
>>
>> during several weeks I'm observing more and more frequent issues with
>> bitcoind. The problem is that bitcoind stops responding to RPC calls, but
>> there's no other suspicious activity in bitcoind log, CPU usage is low,
>> disk I/O is standard etc.
>>
>> I observed this problem with version 0.8.2, but it is still happening
>> with 0.8.5. Originally this happen just one or twice per day. Today my
>> monitoring scripts restarted bitcoind more than 30x, which sounds alarming.
>> This happen on various backends, so it isn't a problem of one specific
>> node. Is there anybody else who's observing similar problem?
>>
>> Thanks,
>> slush
>>
>>
>> ------------------------------------------------------------------------------
>> October Webinars: Code for Performance
>> Free Intel webinars can help you accelerate application performance.
>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
>> from
>> the latest Intel processors and coprocessors. See abstracts and register >
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Bitcoin-development mailing list
>> Bitcoin-development@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>>
>>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>
>
--
=================================
Fátima Castiglione Maldonado
castiglionemaldonado@gmail.com
____
,'_ |
__________________|__|__|__
<_____ ) _.------._
`-----------,------.-' ,-' `-.
| | | ,' `.
,' | | ,' `.
| _,-' |__ / \
_,'-' `. `---.___|_____________ \
.--' -----. | _____________________ `-. ----- |
| ___| | | \ ,- \ |
| ___| |===========================((|) | |
| | | | _____________________/ `- / |
`--._ -----' | _________________,-' ----- |
`.-._ ,' __.---' | /
| `-. | \ /
`. | | `. ,'
| | | `. ,'
_____,----------`-------`-. `-._ _,-'
<___________________________) `------'
| _| |
`.____|
=================================
[-- Attachment #2: Type: text/html, Size: 6241 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 22:00 ` Fatima Castiglione Maldonado 发
@ 2013-09-30 22:08 ` Gregory Maxwell
2013-10-01 0:05 ` Fatima Castiglione Maldonado 发
0 siblings, 1 reply; 16+ messages in thread
From: Gregory Maxwell @ 2013-09-30 22:08 UTC (permalink / raw)
To: Fatima Castiglione Maldonado 发; +Cc: bitcoin-development
On Mon, Sep 30, 2013 at 3:00 PM, Fatima Castiglione Maldonado 发
<castiglionemaldonado@gmail.com> wrote:
> I am new on the list. I got a similar problem.
> If I put "sendToAdress" transactions to bitcoind, it will accept between 1
> and 3 transactions per minute, depending on the underlying machine.
> If I try to send one transaction every 20 seconds or less, bitcoind stops
> responding to RPC calls.
>
> Does anybody know of any solution / fix / workaround for this?
This is unrelated to Slush's question.
The complexity of IsConfirmed/Ismine is exponential and starts taking
tens of seconds at a chain of a dozen unconfirmed transactions (and
growing from there)
There are some patches that change this, but since the whole network
will only average about 7tx per second, you're probably doing
something wrong if you're building great big chains of unconfirmed
transaction. Are you aware of sendmany?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 22:08 ` Gregory Maxwell
@ 2013-10-01 0:05 ` Fatima Castiglione Maldonado 发
0 siblings, 0 replies; 16+ messages in thread
From: Fatima Castiglione Maldonado 发 @ 2013-10-01 0:05 UTC (permalink / raw)
To: Gregory Maxwell; +Cc: bitcoin-development
[-- Attachment #1: Type: text/plain, Size: 2879 bytes --]
Oh, excuse me; it looked related to me.
It was not at all my intention to mess down the thread; on the contrary I
was trying to be of help and at the same time getting an answer.
No, I must recognize that I was not aware of sendmany. This is my first
time managing bulk transactions.
Thank you very much for your help.
I lead a group of developers and we are trying to be of help to the Bitcoin
community at large, and to this group.
Kindest regards,
Fatima
2013/9/30 Gregory Maxwell <gmaxwell@gmail.com>
> On Mon, Sep 30, 2013 at 3:00 PM, Fatima Castiglione Maldonado 发
> <castiglionemaldonado@gmail.com> wrote:
> > I am new on the list. I got a similar problem.
> > If I put "sendToAdress" transactions to bitcoind, it will accept between
> 1
> > and 3 transactions per minute, depending on the underlying machine.
> > If I try to send one transaction every 20 seconds or less, bitcoind stops
> > responding to RPC calls.
> >
> > Does anybody know of any solution / fix / workaround for this?
>
> This is unrelated to Slush's question.
>
> The complexity of IsConfirmed/Ismine is exponential and starts taking
> tens of seconds at a chain of a dozen unconfirmed transactions (and
> growing from there)
>
> There are some patches that change this, but since the whole network
> will only average about 7tx per second, you're probably doing
> something wrong if you're building great big chains of unconfirmed
> transaction. Are you aware of sendmany?
>
--
=================================
Fátima Castiglione Maldonado
castiglionemaldonado@gmail.com
____
,'_ |
__________________|__|__|__
<_____ ) _.------._
`-----------,------.-' ,-' `-.
| | | ,' `.
,' | | ,' `.
| _,-' |__ / \
_,'-' `. `---.___|_____________ \
.--' -----. | _____________________ `-. ----- |
| ___| | | \ ,- \ |
| ___| |===========================((|) | |
| | | | _____________________/ `- / |
`--._ -----' | _________________,-' ----- |
`.-._ ,' __.---' | /
| `-. | \ /
`. | | `. ,'
| | | `. ,'
_____,----------`-------`-. `-._ _,-'
<___________________________) `------'
| _| |
`.____|
=================================
[-- Attachment #2: Type: text/html, Size: 3962 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush
2013-09-30 21:01 ` Warren Togami Jr.
@ 2013-10-01 0:20 ` Chris Double
2013-10-01 1:17 ` Jeff Garzik
2013-10-01 4:03 ` Olivier Langlois
3 siblings, 0 replies; 16+ messages in thread
From: Chris Double @ 2013-10-01 0:20 UTC (permalink / raw)
To: bitcoin-development
slush <slush@centrum.cz> writes:
> I observed this problem with version 0.8.2, but it is still happening with
> 0.8.5.
I see this as well with 0.8.2+. I don't see it on 0.8.1. I originally
hit the '4 keep alive thread limit' where four clients on the same
bitcoind would result in other clients not being able to connect. I
resolved that and still hit the RPC hanging issue. The most problem I've
had is using 'getwork' which I believe is going the wayside anyway.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush
2013-09-30 21:01 ` Warren Togami Jr.
2013-10-01 0:20 ` Chris Double
@ 2013-10-01 1:17 ` Jeff Garzik
2013-10-01 8:58 ` slush
2013-10-01 4:03 ` Olivier Langlois
3 siblings, 1 reply; 16+ messages in thread
From: Jeff Garzik @ 2013-10-01 1:17 UTC (permalink / raw)
To: slush; +Cc: bitcoin-development
Can you please describe more than "RPC stops working"? What is your
precise RPC usage? getwork? getblocktemplate? other calls? What is
your OS?
On Mon, Sep 30, 2013 at 10:44 PM, slush <slush@centrum.cz> wrote:
> Hi,
>
> during several weeks I'm observing more and more frequent issues with
> bitcoind. The problem is that bitcoind stops responding to RPC calls, but
> there's no other suspicious activity in bitcoind log, CPU usage is low, disk
> I/O is standard etc.
>
> I observed this problem with version 0.8.2, but it is still happening with
> 0.8.5. Originally this happen just one or twice per day. Today my monitoring
> scripts restarted bitcoind more than 30x, which sounds alarming. This happen
> on various backends, so it isn't a problem of one specific node. Is there
> anybody else who's observing similar problem?
>
> Thanks,
> slush
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>
--
Jeff Garzik
Senior Software Engineer and open source evangelist
BitPay, Inc. https://bitpay.com/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 21:01 ` Warren Togami Jr.
2013-09-30 22:00 ` Fatima Castiglione Maldonado 发
@ 2013-10-01 1:26 ` Jeff Garzik
1 sibling, 0 replies; 16+ messages in thread
From: Jeff Garzik @ 2013-10-01 1:26 UTC (permalink / raw)
To: Warren Togami Jr.; +Cc: bitcoin-development
On Mon, Sep 30, 2013 at 11:01 PM, Warren Togami Jr. <wtogami@gmail.com> wrote:
> 0.8.2 apparently was the first Bitcoin version to support RPC keepalive.
No, this is not correct at all. RPC keepalive was present in 0.7.0,
possibly earlier.
Come on, it took a 30 second 'git checkout' session to verify this.
> With the 4 RPC thread limit, four keepalive connections will exhaust all four and prevent further connections.
I was about to flame you with a "WTF is this fiction?" but it seems
true: 21eb5ada introduces this.
Unfortunately, it does so erroneously, introducing clearly buggy
behavior that did not exist with the thread-per-connection code that
provided keep-alive before commit 21eb5ada.
In my opinion, 21eb5ada half-changed the RPC code from
thread-per-connection to a worker-group model, without considering all
the consequences.
--
Jeff Garzik
Senior Software Engineer and open source evangelist
BitPay, Inc. https://bitpay.com/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush
` (2 preceding siblings ...)
2013-10-01 1:17 ` Jeff Garzik
@ 2013-10-01 4:03 ` Olivier Langlois
2013-10-01 7:10 ` Olivier Langlois
3 siblings, 1 reply; 16+ messages in thread
From: Olivier Langlois @ 2013-10-01 4:03 UTC (permalink / raw)
To: slush; +Cc: bitcoin-development
On Mon, 2013-09-30 at 22:44 +0200, slush wrote:
> Hi,
>
>
> during several weeks I'm observing more and more frequent issues with
> bitcoind. The problem is that bitcoind stops responding to RPC calls,
> but there's no other suspicious activity in bitcoind log, CPU usage is
> low, disk I/O is standard etc.
>
>
> I observed this problem with version 0.8.2, but it is still happening
> with 0.8.5. Originally this happen just one or twice per day. Today my
> monitoring scripts restarted bitcoind more than 30x, which sounds
> alarming. This happen on various backends, so it isn't a problem of
> one specific node. Is there anybody else who's observing similar
> problem?
What a coincidence. I do have observed the same thing. right now with
0.8.5. I am writing a small app. My jsonrpc client is programmed to
timeout after 2 secs and I did see a couple of timeouts once in while.
What I did is a simple test app that just hammer bitcoind with 3 rpc
requests every 30 seconds and I abort it as soon as it encountered a
timeout.
The 3 request burst is performed on the same HTTP 1.1 kept alive
connection. Then I disconnect. When I launch my app before leaving in
the morning, pretty sure that I have a core dump waiting for me when I
come back.
I choose very simple calls: getinfo,getaccount
Added a couple of traces in the RPC handling code. (BTW, timestamps in
traces would be tremendously useful for tracking problems...). I see my
request received by bitcoind but there is no trace yet to show that the
reply is sent.
Not sure yet exactly where the problem is but my current #1 suspect is:
LOCK2(cs_main, pwalletMain->cs_wallet);
with some kind of lock contention with the other threads.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-10-01 4:03 ` Olivier Langlois
@ 2013-10-01 7:10 ` Olivier Langlois
2013-10-01 14:17 ` Jeff Garzik
0 siblings, 1 reply; 16+ messages in thread
From: Olivier Langlois @ 2013-10-01 7:10 UTC (permalink / raw)
To: slush; +Cc: bitcoin-development
>
> Not sure yet exactly where the problem is but my current #1 suspect is:
>
> LOCK2(cs_main, pwalletMain->cs_wallet);
>
> with some kind of lock contention with the other threads.
>
I was right. It took more than 6 seconds to acquire the locks
I did modify bitcoinrpc.cpp:
namespace {
struct timeval difftv( const struct timeval &tv1, const struct timeval
&tv2 )
{
struct timeval res;
res.tv_sec = tv1.tv_sec - tv2.tv_sec;
if (tv2.tv_usec > tv1.tv_usec) {
res.tv_sec--;
res.tv_usec = 1000000;
} else
res.tv_usec = 0;
res.tv_usec += tv1.tv_usec;
res.tv_usec -= tv2.tv_usec;
return res;
}
void printExecTimes( const struct timeval &tv1,
const struct timeval &tv2,
const struct timeval &tv3 )
{
struct timeval lockTime = difftv(tv2,tv1);
struct timeval callTime = difftv(tv3,tv2);
struct timeval totalTime = difftv(tv3,tv1);
printf( "locktime : %ld.%06ld calltime : %ld.%06ld totaltime : %
ld.%06ld\n",
lockTime.tv_sec,lockTime.tv_usec,callTime.tv_sec,callTime.tv_usec,
totalTime.tv_sec,totalTime.tv_usec);
}
}
json_spirit::Value CRPCTable::execute(const std::string &strMethod,
const json_spirit::Array ¶ms) const
{
// Find method
const CRPCCommand *pcmd = tableRPC[strMethod];
if (!pcmd)
throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Method not found");
// Observe safe mode
string strWarning = GetWarnings("rpc");
if (strWarning != "" && !GetBoolArg("-disablesafemode") &&
!pcmd->okSafeMode)
throw JSONRPCError(RPC_FORBIDDEN_BY_SAFE_MODE, string("Safe
mode: ") + strWarning);
try
{
// Execute
Value result;
{
if (pcmd->threadSafe)
result = pcmd->actor(params, false);
else {
struct timeval tv1,tv2,tv3;
gettimeofday(&tv1,NULL); {
LOCK2(cs_main, pwalletMain->cs_wallet);
gettimeofday(&tv2,NULL);
result = pcmd->actor(params, false); }
gettimeofday(&tv3,NULL);
printExecTimes(tv1,tv2,tv3);
}
}
return result;
}
catch (std::exception& e)
{
throw JSONRPCError(RPC_MISC_ERROR, e.what());
}
}
locktime : 0.000001 calltime : 0.000153 totaltime : 0.000154
locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011
locktime : 0.000000 calltime : 0.000451 totaltime : 0.000451
locktime : 0.000000 calltime : 0.000313 totaltime : 0.000313
locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011
locktime : 0.051574 calltime : 0.000377 totaltime : 0.051951
locktime : 0.000000 calltime : 0.000222 totaltime : 0.000222
locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011
locktime : 0.121106 calltime : 0.000471 totaltime : 0.121577
locktime : 0.078093 calltime : 0.000451 totaltime : 0.078544
locktime : 0.101185 calltime : 0.000021 totaltime : 0.101206
locktime : 0.000000 calltime : 0.000476 totaltime : 0.000476
locktime : 0.000001 calltime : 0.000291 totaltime : 0.000292
locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
locktime : 0.003025 calltime : 0.000036 totaltime : 0.003061
locktime : 0.000000 calltime : 0.000383 totaltime : 0.000383
locktime : 0.000000 calltime : 0.000210 totaltime : 0.000210
locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
locktime : 0.000000 calltime : 0.000470 totaltime : 0.000470
locktime : 0.000000 calltime : 0.000295 totaltime : 0.000295
locktime : 0.000000 calltime : 0.000020 totaltime : 0.000020
locktime : 0.000001 calltime : 0.000385 totaltime : 0.000386
locktime : 0.000000 calltime : 0.000241 totaltime : 0.000241
locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
locktime : 0.000001 calltime : 0.000308 totaltime : 0.000309
locktime : 0.000000 calltime : 0.000164 totaltime : 0.000164
locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
locktime : 0.000000 calltime : 0.000376 totaltime : 0.000376
locktime : 0.000000 calltime : 0.000356 totaltime : 0.000356
locktime : 0.000000 calltime : 0.000021 totaltime : 0.000021
locktime : 0.000000 calltime : 0.000496 totaltime : 0.000496
locktime : 0.000001 calltime : 0.000201 totaltime : 0.000202
locktime : 0.000001 calltime : 0.000017 totaltime : 0.000018
locktime : 0.000000 calltime : 0.000301 totaltime : 0.000301
locktime : 0.000000 calltime : 0.000180 totaltime : 0.000180
locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
locktime : 0.000001 calltime : 0.000359 totaltime : 0.000360
locktime : 0.000000 calltime : 0.000265 totaltime : 0.000265
locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
locktime : 0.000001 calltime : 0.000488 totaltime : 0.000489
locktime : 0.000000 calltime : 0.000226 totaltime : 0.000226
locktime : 0.000000 calltime : 0.000025 totaltime : 0.000025
locktime : 0.000000 calltime : 0.000369 totaltime : 0.000369
locktime : 0.000000 calltime : 0.000262 totaltime : 0.000262
locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
locktime : 0.000000 calltime : 0.000459 totaltime : 0.000459
locktime : 0.000000 calltime : 0.000260 totaltime : 0.000260
locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
locktime : 0.000000 calltime : 0.000330 totaltime : 0.000330
locktime : 0.000000 calltime : 0.000223 totaltime : 0.000223
locktime : 0.000000 calltime : 0.000014 totaltime : 0.000014
locktime : 0.000001 calltime : 0.000449 totaltime : 0.000450
locktime : 0.000001 calltime : 0.000248 totaltime : 0.000249
locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
locktime : 0.000000 calltime : 0.000327 totaltime : 0.000327
locktime : 0.000000 calltime : 0.000196 totaltime : 0.000196
locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
locktime : 0.121170 calltime : 0.000408 totaltime : 0.121578
locktime : 0.004912 calltime : 0.000278 totaltime : 0.005190
locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-10-01 1:17 ` Jeff Garzik
@ 2013-10-01 8:58 ` slush
2013-10-04 6:22 ` Gavin Andresen
0 siblings, 1 reply; 16+ messages in thread
From: slush @ 2013-10-01 8:58 UTC (permalink / raw)
To: Jeff Garzik; +Cc: bitcoin-development
[-- Attachment #1: Type: text/plain, Size: 705 bytes --]
ad "RPC stops working":
* Client makes a 'getinfo' call and don't receive a response in a minute.
"What is your precise RPC usage? "
One process is asking getinfo every second as a fallback to possibly
misconfigured blocknotify. It also calls getblocktemplate every 30 second.
Second process is calling getinfo once a minute to check if bitcoind is
working. If it don't receive a response in a minute, it kills bitcoind and
starts it again.
That's all.
OS is Debian.
On Tue, Oct 1, 2013 at 3:17 AM, Jeff Garzik <jgarzik@bitpay.com> wrote:
> Can you please describe more than "RPC stops working"? What is your
> precise RPC usage? getwork? getblocktemplate? other calls? What is
> your OS?
>
>
[-- Attachment #2: Type: text/html, Size: 1565 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-10-01 7:10 ` Olivier Langlois
@ 2013-10-01 14:17 ` Jeff Garzik
2013-10-02 3:20 ` Olivier Langlois
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Garzik @ 2013-10-01 14:17 UTC (permalink / raw)
To: Olivier Langlois; +Cc: bitcoin-development
Olivier,
What network activity was going on, during this test? i.e. during the
call with 5.99 locktime, was bitcoind processing a block or sending a
large transaction? There are plenty of valid reasons -- sadly -- that
the locks are held for a long time, during random network events.
On Tue, Oct 1, 2013 at 9:10 AM, Olivier Langlois <olivier@trillion01.com> wrote:
>
>>
>> Not sure yet exactly where the problem is but my current #1 suspect is:
>>
>> LOCK2(cs_main, pwalletMain->cs_wallet);
>>
>> with some kind of lock contention with the other threads.
>>
>
> I was right. It took more than 6 seconds to acquire the locks
>
> I did modify bitcoinrpc.cpp:
>
> namespace {
>
> struct timeval difftv( const struct timeval &tv1, const struct timeval
> &tv2 )
> {
> struct timeval res;
> res.tv_sec = tv1.tv_sec - tv2.tv_sec;
> if (tv2.tv_usec > tv1.tv_usec) {
> res.tv_sec--;
> res.tv_usec = 1000000;
> } else
> res.tv_usec = 0;
>
> res.tv_usec += tv1.tv_usec;
> res.tv_usec -= tv2.tv_usec;
>
> return res;
> }
>
> void printExecTimes( const struct timeval &tv1,
> const struct timeval &tv2,
> const struct timeval &tv3 )
> {
> struct timeval lockTime = difftv(tv2,tv1);
> struct timeval callTime = difftv(tv3,tv2);
> struct timeval totalTime = difftv(tv3,tv1);
> printf( "locktime : %ld.%06ld calltime : %ld.%06ld totaltime : %
> ld.%06ld\n",
>
> lockTime.tv_sec,lockTime.tv_usec,callTime.tv_sec,callTime.tv_usec,
> totalTime.tv_sec,totalTime.tv_usec);
> }
>
> }
>
> json_spirit::Value CRPCTable::execute(const std::string &strMethod,
> const json_spirit::Array ¶ms) const
> {
> // Find method
> const CRPCCommand *pcmd = tableRPC[strMethod];
> if (!pcmd)
> throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Method not found");
>
> // Observe safe mode
> string strWarning = GetWarnings("rpc");
> if (strWarning != "" && !GetBoolArg("-disablesafemode") &&
> !pcmd->okSafeMode)
> throw JSONRPCError(RPC_FORBIDDEN_BY_SAFE_MODE, string("Safe
> mode: ") + strWarning);
>
> try
> {
> // Execute
> Value result;
> {
> if (pcmd->threadSafe)
> result = pcmd->actor(params, false);
> else {
> struct timeval tv1,tv2,tv3;
> gettimeofday(&tv1,NULL); {
> LOCK2(cs_main, pwalletMain->cs_wallet);
> gettimeofday(&tv2,NULL);
> result = pcmd->actor(params, false); }
> gettimeofday(&tv3,NULL);
> printExecTimes(tv1,tv2,tv3);
> }
> }
> return result;
> }
> catch (std::exception& e)
> {
> throw JSONRPCError(RPC_MISC_ERROR, e.what());
> }
> }
>
> locktime : 0.000001 calltime : 0.000153 totaltime : 0.000154
> locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011
> locktime : 0.000000 calltime : 0.000451 totaltime : 0.000451
> locktime : 0.000000 calltime : 0.000313 totaltime : 0.000313
> locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011
> locktime : 0.051574 calltime : 0.000377 totaltime : 0.051951
> locktime : 0.000000 calltime : 0.000222 totaltime : 0.000222
> locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011
> locktime : 0.121106 calltime : 0.000471 totaltime : 0.121577
> locktime : 0.078093 calltime : 0.000451 totaltime : 0.078544
> locktime : 0.101185 calltime : 0.000021 totaltime : 0.101206
> locktime : 0.000000 calltime : 0.000476 totaltime : 0.000476
> locktime : 0.000001 calltime : 0.000291 totaltime : 0.000292
> locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
> locktime : 0.003025 calltime : 0.000036 totaltime : 0.003061
> locktime : 0.000000 calltime : 0.000383 totaltime : 0.000383
> locktime : 0.000000 calltime : 0.000210 totaltime : 0.000210
> locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
> locktime : 0.000000 calltime : 0.000470 totaltime : 0.000470
> locktime : 0.000000 calltime : 0.000295 totaltime : 0.000295
> locktime : 0.000000 calltime : 0.000020 totaltime : 0.000020
> locktime : 0.000001 calltime : 0.000385 totaltime : 0.000386
> locktime : 0.000000 calltime : 0.000241 totaltime : 0.000241
> locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
> locktime : 0.000001 calltime : 0.000308 totaltime : 0.000309
> locktime : 0.000000 calltime : 0.000164 totaltime : 0.000164
> locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
> locktime : 0.000000 calltime : 0.000376 totaltime : 0.000376
> locktime : 0.000000 calltime : 0.000356 totaltime : 0.000356
> locktime : 0.000000 calltime : 0.000021 totaltime : 0.000021
> locktime : 0.000000 calltime : 0.000496 totaltime : 0.000496
> locktime : 0.000001 calltime : 0.000201 totaltime : 0.000202
> locktime : 0.000001 calltime : 0.000017 totaltime : 0.000018
> locktime : 0.000000 calltime : 0.000301 totaltime : 0.000301
> locktime : 0.000000 calltime : 0.000180 totaltime : 0.000180
> locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
> locktime : 0.000001 calltime : 0.000359 totaltime : 0.000360
> locktime : 0.000000 calltime : 0.000265 totaltime : 0.000265
> locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
> locktime : 0.000001 calltime : 0.000488 totaltime : 0.000489
> locktime : 0.000000 calltime : 0.000226 totaltime : 0.000226
> locktime : 0.000000 calltime : 0.000025 totaltime : 0.000025
> locktime : 0.000000 calltime : 0.000369 totaltime : 0.000369
> locktime : 0.000000 calltime : 0.000262 totaltime : 0.000262
> locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
> locktime : 0.000000 calltime : 0.000459 totaltime : 0.000459
> locktime : 0.000000 calltime : 0.000260 totaltime : 0.000260
> locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
> locktime : 0.000000 calltime : 0.000330 totaltime : 0.000330
> locktime : 0.000000 calltime : 0.000223 totaltime : 0.000223
> locktime : 0.000000 calltime : 0.000014 totaltime : 0.000014
> locktime : 0.000001 calltime : 0.000449 totaltime : 0.000450
> locktime : 0.000001 calltime : 0.000248 totaltime : 0.000249
> locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017
> locktime : 0.000000 calltime : 0.000327 totaltime : 0.000327
> locktime : 0.000000 calltime : 0.000196 totaltime : 0.000196
> locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
> locktime : 0.121170 calltime : 0.000408 totaltime : 0.121578
> locktime : 0.004912 calltime : 0.000278 totaltime : 0.005190
> locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016
> locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148
>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
--
Jeff Garzik
Senior Software Engineer and open source evangelist
BitPay, Inc. https://bitpay.com/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-10-01 14:17 ` Jeff Garzik
@ 2013-10-02 3:20 ` Olivier Langlois
2013-10-04 3:52 ` Olivier Langlois
0 siblings, 1 reply; 16+ messages in thread
From: Olivier Langlois @ 2013-10-02 3:20 UTC (permalink / raw)
To: Jeff Garzik; +Cc: bitcoin-development
On Tue, 2013-10-01 at 16:17 +0200, Jeff Garzik wrote:
> Olivier,
>
> What network activity was going on, during this test? i.e. during the
> call with 5.99 locktime, was bitcoind processing a block or sending a
> large transaction? There are plenty of valid reasons -- sadly -- that
> the locks are held for a long time, during random network events.
>
Jeff,
In your opinion what would be then a reasonable to value for a rpc client? From what you say, 2 seconds is apparently not enough.
For your question, unfortunately or fortunately, not much. The only thing that strike out is that CTxMemPool::accept() seems to be called while the rpc thread is between
jreq.parse() and tableRPC.execute().
Here are the logs surrounding
the 5.99 sec lock wait.
1380610602.706628 : RPC connection closed
Added 1 addresses from 85.202.231.133: 2633 tried, 12769 new
CTxMemPool::accept() : accepted 6191dee582c1c19fd76d6f06a4c594ee1577dbb54b688c5dcc676ebfbc225cda (poolsz 1262)
Added 1 addresses from 217.73.163.131: 2633 tried, 12769 new
CTxMemPool::accept() : accepted 7a482a24d2588a66c6561da6ee29b5adff4e8ba72e58fb360a2c36cd78be3a71 (poolsz 1263)
Added 1 addresses from 24.211.152.165: 2633 tried, 12770 new
received getdata for: tx 3849ab1481c9d8c283e682f2ef8421a8cf01ef228a89d0daf2d5e888e04f45e4
CTxMemPool::accept() : accepted cf34b31f26d3275b66adf081f84fca2d761e8e260b42243ba36638368c1633b1 (poolsz 1264)
CTxMemPool::accept() : accepted 61fc517d9461b79bab972593fbaf8b0be8ed0222c232b2c51eea2a781a44efef (poolsz 1265)
CTxMemPool::accept() : accepted 4a42913c0b956f27cfc72ed44605ef8654ff8720b81e6ef1c7dfab8e300ee9d9 (poolsz 1266)
CTxMemPool::accept() : accepted cec545fca0b4ad6efa44159758dd18b2e3047a1f1783812256a11aba96c32792 (poolsz 1267)
CTxMemPool::accept() : accepted 1ad7ff07ed4b8299af0f5ce61b721f20d984225a4a963174f51d329587eef2e9 (poolsz 1268)
CTxMemPool::accept() : accepted 12f6a6e453f34db7165983def2ecdfef174917194aa1b8f4c4883b02595c043c (poolsz 1269)
CTxMemPool::accept() : accepted 69942aebbf3d3e6b781e56553f0b1e021e07fe1c4d922b221cadabb7eb84abb7 (poolsz 1270)
CTxMemPool::accept() : accepted d7d95a6fc61209c8dbdf4b17a058f4f9c7fa6d6e76b36ca4cf8bb67f288ef3cc (poolsz 1271)
CTxMemPool::accept() : accepted da6c8145c9f506ec0f146b37d57ef423098b584af8dbed9490002ed900228c27 (poolsz 1272)
CTxMemPool::accept() : accepted 46debd7df211ea26060db539a909d7908f1f6fcca8103897909280ab156b8872 (poolsz 1273)
CTxMemPool::accept() : accepted 30f47bae4e485e1129e0598f2264469a4f9186d9a7988debfe7e21f283975c9f (poolsz 1274)
CTxMemPool::accept() : accepted 80848aebe10afda234c42db1ae3a013f873af54906d966afcb42547501d899c4 (poolsz 1275)
CTxMemPool::accept() : accepted faaf9a2b57e80460293f3d53115fc8795bb1357d0ae6400710bc8a6610312b81 (poolsz 1276)
CTxMemPool::accept() : accepted 44f5fc39cde3b35d32f20f8246001f946c9fdc1297ea21fd7400c0c483aceca2 (poolsz 1277)
CTxMemPool::accept() : accepted d92a8389cfa6a8ea98a8103a138f45e56cf229d9b35bd46ac319167d17409edd (poolsz 1278)
CTxMemPool::accept() : accepted 557e88dfed0d2e9cd3c8c1af2eb497ea5c0a888af7417f7b816390af1dd4f38a (poolsz 1279)
CTxMemPool::accept() : accepted 037b48b12209b046bed88806bf5a92e3cc45cc2e30af895b5536fd8ef2ae865f (poolsz 1280)
received getdata for: tx 1ad7ff07ed4b8299af0f5ce61b721f20d984225a4a963174f51d329587eef2e9
CTxMemPool::accept() : accepted cf8149016e4c25b0408a70dbb8f7bc596331de6840b9a3115a6aa5445cc2b3fe (poolsz 1281)
received getdata for: tx 12f6a6e453f34db7165983def2ecdfef174917194aa1b8f4c4883b02595c043c
received getdata for: tx 61fc517d9461b79bab972593fbaf8b0be8ed0222c232b2c51eea2a781a44efef
CTxMemPool::accept() : accepted 890b41943b2cda0b9d7e34ddfcddb490582165c0cedd8fca09df6eec37ad768d (poolsz 1282)
1380610633.387730: POST / HTTP/1.1^M
ThreadRPCServer method=getinfo
CTxMemPool::accept() : accepted dc2941dd69b2f9fa2754f741dfba76489abef706bd237e3bd715181950723e4d (poolsz 1283)
keypool reserve 15
keypool return 15
locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-10-02 3:20 ` Olivier Langlois
@ 2013-10-04 3:52 ` Olivier Langlois
0 siblings, 0 replies; 16+ messages in thread
From: Olivier Langlois @ 2013-10-04 3:52 UTC (permalink / raw)
To: Jeff Garzik; +Cc: bitcoin-development
> 1380610633.387730: POST / HTTP/1.1^M
> ThreadRPCServer method=getinfo
> CTxMemPool::accept() : accepted dc2941dd69b2f9fa2754f741dfba76489abef706bd237e3bd715181950723e4d (poolsz 1283)
> keypool reserve 15
> keypool return 15
> locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148
>
I have found another circumstance where getinfo rpc function can take
more than 2 secs to return. Here are the circunstances:
received block 0000000000000013c889fbe1b2a23eb4b6bd38a8051420be1bf319da184f02ae
Committing 1015 changed transactions to coin database...
SetBestChain: new best=0000000000000013c889fbe1b2a23eb4b6bd38a8051420be1bf319da184f02ae height=261213 log2_work=72.38679 tx=24743665 date=2013-10-02 05:26:49 progress=0.999999
{"method":"getinfo","params":[],"id":783}
ThreadRPCServer method=getinfo
keypool reserve 15
keypool return 15
locktime : 0.000001 calltime : 2.453037 totaltime : 2.453038
This time locking the contention isn't from cs_main, cs_wallet mutex
lock. I think that what happen here is
1. CWallet::SetBestChain is called
2. This trigger ThreadFlushWalletDB that will lock wallet db and flush
it
3. getinfo will have to wait that ThreadFlushWalletDB ends to complete
GetOldestKeyPoolTime().
Just throwing a couple of ideas here. Comments are welcome:
1.
Is closing the db and reopen it the only way to flush the db to disk?
Can it be done asynchonously? I don't know BDB much but probably closing
the DB involves some blocking fsync() call and given that this follow
1015 records writing into the txdb, it takes more time than it should
otherwise to return.
2.
I'm relatively new to bitcoin. I understand that in a recent past
everything was using BDB. Now that only the wallet seems to keep using
it, maybe some options could be removed to make BDB more lightweight.
DB_THREAD:
From what I see, BDB API access is already well synchronized by
bitcoind. There is no obvious reason to ask BDB API to be thread safe.
DB_INIT_TXN:
Correct me if I'm wrong, but I suspect this subsystem initialization to
be a vestige of when txdb was in BDB.
3.
not sure at all about this one but I'm throwing the idea anyway.
Personnally, I have no used for 'keypoololdest'. I might experiment
without it
How about having an optional bool param to 'getinfo' that could have the
name 'lowlatency' or 'nodb' to skip info gathering having the potential
to make the call longer?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-10-01 8:58 ` slush
@ 2013-10-04 6:22 ` Gavin Andresen
2013-10-04 15:05 ` Jeff Garzik
0 siblings, 1 reply; 16+ messages in thread
From: Gavin Andresen @ 2013-10-04 6:22 UTC (permalink / raw)
To: slush; +Cc: bitcoin-development
[-- Attachment #1: Type: text/plain, Size: 976 bytes --]
On Tue, Oct 1, 2013 at 6:58 PM, slush <slush@centrum.cz> wrote:
> One process is asking getinfo every second as a fallback to possibly
> misconfigured blocknotify. It also calls getblocktemplate every 30 second.
>
getinfo does a bunch of stuff; with 0.9 you will be able to use
getbestblockhash instead.
> Second process is calling getinfo once a minute to check if bitcoind is
> working. If it don't receive a response in a minute, it kills bitcoind and
> starts it again.
>
If you just want to see if bitcoind is responding to RPC requests, then
'help getinfo' would do the trick without acquiring any locks.
RE: running into the maximum-of-4-keepalive-requests : simple workaround is
to run with -rpcthreads=11 (or however many keepalive connections you need
to support). I agree that the rpc code should be smarter; making the last
rpc thread ignore keepalive and always disconnecting should be a fairly
simple patch, and "patches welcome."
--
--
Gavin Andresen
[-- Attachment #2: Type: text/html, Size: 1596 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding
2013-10-04 6:22 ` Gavin Andresen
@ 2013-10-04 15:05 ` Jeff Garzik
0 siblings, 0 replies; 16+ messages in thread
From: Jeff Garzik @ 2013-10-04 15:05 UTC (permalink / raw)
To: Gavin Andresen; +Cc: bitcoin-development
On Fri, Oct 4, 2013 at 2:22 AM, Gavin Andresen <gavinandresen@gmail.com> wrote:
> RE: running into the maximum-of-4-keepalive-requests : simple workaround is
> to run with -rpcthreads=11 (or however many keepalive connections you need
> to support). I agree that the rpc code should be smarter; making the last
> rpc thread ignore keepalive and always disconnecting should be a fairly
> simple patch, and "patches welcome."
It's all still working around a problem unchanged since Satoshi wrote
it: the HTTP server code paths use blocking I/O.
Amusingly, we do this through an async I/O library, which helps
facilitate SSL, but all our connections and operations are blocking.
That's why RPC was multi-threaded in part: to work around the ugly
blocking nature of the code. At least with multiple threads, one
thread will not stall another even if the network stalls (or a
software bug triggers a stall etc.)
Hopefully I can dive into the code and make is truly async I/O. It
takes some work, and I'm happy if someone else steals the task, but
that's what really needs to be done.
I tried the multi-threaded approach, writing an entire boost::asio
skeleton JSON-RPC HTTP server: https://github.com/jgarzik/rpcsrv
This is working, tested code that uses boost::asio properly. It's
also quite a bit of LOC, and a bit messy to boot (four LOC per boost
async action and incomprehensible compiler errors in return for proper
async+MT).
A single thread with async I/O is likely sufficient for even heavy
uses of RPC -- since today it all goes through a big lock anyway.
--
Jeff Garzik
Senior Software Engineer and open source evangelist
BitPay, Inc. https://bitpay.com/
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2013-10-04 15:05 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush
2013-09-30 21:01 ` Warren Togami Jr.
2013-09-30 22:00 ` Fatima Castiglione Maldonado 发
2013-09-30 22:08 ` Gregory Maxwell
2013-10-01 0:05 ` Fatima Castiglione Maldonado 发
2013-10-01 1:26 ` Jeff Garzik
2013-10-01 0:20 ` Chris Double
2013-10-01 1:17 ` Jeff Garzik
2013-10-01 8:58 ` slush
2013-10-04 6:22 ` Gavin Andresen
2013-10-04 15:05 ` Jeff Garzik
2013-10-01 4:03 ` Olivier Langlois
2013-10-01 7:10 ` Olivier Langlois
2013-10-01 14:17 ` Jeff Garzik
2013-10-02 3:20 ` Olivier Langlois
2013-10-04 3:52 ` Olivier Langlois
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox