* [Bitcoin-development] bitcoind stops responding @ 2013-09-30 20:44 slush 2013-09-30 21:01 ` Warren Togami Jr. ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: slush @ 2013-09-30 20:44 UTC (permalink / raw) To: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 612 bytes --] Hi, during several weeks I'm observing more and more frequent issues with bitcoind. The problem is that bitcoind stops responding to RPC calls, but there's no other suspicious activity in bitcoind log, CPU usage is low, disk I/O is standard etc. I observed this problem with version 0.8.2, but it is still happening with 0.8.5. Originally this happen just one or twice per day. Today my monitoring scripts restarted bitcoind more than 30x, which sounds alarming. This happen on various backends, so it isn't a problem of one specific node. Is there anybody else who's observing similar problem? Thanks, slush [-- Attachment #2: Type: text/html, Size: 741 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush @ 2013-09-30 21:01 ` Warren Togami Jr. 2013-09-30 22:00 ` Fatima Castiglione Maldonado 发 2013-10-01 1:26 ` Jeff Garzik 2013-10-01 0:20 ` Chris Double ` (2 subsequent siblings) 3 siblings, 2 replies; 16+ messages in thread From: Warren Togami Jr. @ 2013-09-30 21:01 UTC (permalink / raw) To: slush; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 1656 bytes --] https://github.com/litecoin-project/litecoin/issues/67 0.8.2 apparently was the first Bitcoin version to support RPC keepalive. With the 4 RPC thread limit, four keepalive connections will exhaust all four and prevent further connections. This issue describes a workaround where you build with more RPC threads. On Mon, Sep 30, 2013 at 10:44 AM, slush <slush@centrum.cz> wrote: > Hi, > > during several weeks I'm observing more and more frequent issues with > bitcoind. The problem is that bitcoind stops responding to RPC calls, but > there's no other suspicious activity in bitcoind log, CPU usage is low, > disk I/O is standard etc. > > I observed this problem with version 0.8.2, but it is still happening with > 0.8.5. Originally this happen just one or twice per day. Today my > monitoring scripts restarted bitcoind more than 30x, which sounds alarming. > This happen on various backends, so it isn't a problem of one specific > node. Is there anybody else who's observing similar problem? > > Thanks, > slush > > > ------------------------------------------------------------------------------ > October Webinars: Code for Performance > Free Intel webinars can help you accelerate application performance. > Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most > from > the latest Intel processors and coprocessors. See abstracts and register > > http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk > _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development > > [-- Attachment #2: Type: text/html, Size: 2449 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 21:01 ` Warren Togami Jr. @ 2013-09-30 22:00 ` Fatima Castiglione Maldonado 发 2013-09-30 22:08 ` Gregory Maxwell 2013-10-01 1:26 ` Jeff Garzik 1 sibling, 1 reply; 16+ messages in thread From: Fatima Castiglione Maldonado 发 @ 2013-09-30 22:00 UTC (permalink / raw) To: Warren Togami Jr.; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 4210 bytes --] Hi, I am new on the list. I got a similar problem. If I put "sendToAdress" transactions to bitcoind, it will accept between 1 and 3 transactions per minute, depending on the underlying machine. If I try to send one transaction every 20 seconds or less, bitcoind stops responding to RPC calls. Does anybody know of any solution / fix / workaround for this? Thanks in advance. 2013/9/30 Warren Togami Jr. <wtogami@gmail.com> > https://github.com/litecoin-project/litecoin/issues/67 > 0.8.2 apparently was the first Bitcoin version to support RPC keepalive. > With the 4 RPC thread limit, four keepalive connections will exhaust all > four and prevent further connections. This issue describes a workaround > where you build with more RPC threads. > > > On Mon, Sep 30, 2013 at 10:44 AM, slush <slush@centrum.cz> wrote: > >> Hi, >> >> during several weeks I'm observing more and more frequent issues with >> bitcoind. The problem is that bitcoind stops responding to RPC calls, but >> there's no other suspicious activity in bitcoind log, CPU usage is low, >> disk I/O is standard etc. >> >> I observed this problem with version 0.8.2, but it is still happening >> with 0.8.5. Originally this happen just one or twice per day. Today my >> monitoring scripts restarted bitcoind more than 30x, which sounds alarming. >> This happen on various backends, so it isn't a problem of one specific >> node. Is there anybody else who's observing similar problem? >> >> Thanks, >> slush >> >> >> ------------------------------------------------------------------------------ >> October Webinars: Code for Performance >> Free Intel webinars can help you accelerate application performance. >> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most >> from >> the latest Intel processors and coprocessors. See abstracts and register > >> >> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk >> _______________________________________________ >> Bitcoin-development mailing list >> Bitcoin-development@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/bitcoin-development >> >> > > > ------------------------------------------------------------------------------ > October Webinars: Code for Performance > Free Intel webinars can help you accelerate application performance. > Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most > from > the latest Intel processors and coprocessors. See abstracts and register > > http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk > _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development > > -- ================================= Fátima Castiglione Maldonado castiglionemaldonado@gmail.com ____ ,'_ | __________________|__|__|__ <_____ ) _.------._ `-----------,------.-' ,-' `-. | | | ,' `. ,' | | ,' `. | _,-' |__ / \ _,'-' `. `---.___|_____________ \ .--' -----. | _____________________ `-. ----- | | ___| | | \ ,- \ | | ___| |===========================((|) | | | | | | _____________________/ `- / | `--._ -----' | _________________,-' ----- | `.-._ ,' __.---' | / | `-. | \ / `. | | `. ,' | | | `. ,' _____,----------`-------`-. `-._ _,-' <___________________________) `------' | _| | `.____| ================================= [-- Attachment #2: Type: text/html, Size: 6241 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 22:00 ` Fatima Castiglione Maldonado 发 @ 2013-09-30 22:08 ` Gregory Maxwell 2013-10-01 0:05 ` Fatima Castiglione Maldonado 发 0 siblings, 1 reply; 16+ messages in thread From: Gregory Maxwell @ 2013-09-30 22:08 UTC (permalink / raw) To: Fatima Castiglione Maldonado 发; +Cc: bitcoin-development On Mon, Sep 30, 2013 at 3:00 PM, Fatima Castiglione Maldonado 发 <castiglionemaldonado@gmail.com> wrote: > I am new on the list. I got a similar problem. > If I put "sendToAdress" transactions to bitcoind, it will accept between 1 > and 3 transactions per minute, depending on the underlying machine. > If I try to send one transaction every 20 seconds or less, bitcoind stops > responding to RPC calls. > > Does anybody know of any solution / fix / workaround for this? This is unrelated to Slush's question. The complexity of IsConfirmed/Ismine is exponential and starts taking tens of seconds at a chain of a dozen unconfirmed transactions (and growing from there) There are some patches that change this, but since the whole network will only average about 7tx per second, you're probably doing something wrong if you're building great big chains of unconfirmed transaction. Are you aware of sendmany? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 22:08 ` Gregory Maxwell @ 2013-10-01 0:05 ` Fatima Castiglione Maldonado 发 0 siblings, 0 replies; 16+ messages in thread From: Fatima Castiglione Maldonado 发 @ 2013-10-01 0:05 UTC (permalink / raw) To: Gregory Maxwell; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 2879 bytes --] Oh, excuse me; it looked related to me. It was not at all my intention to mess down the thread; on the contrary I was trying to be of help and at the same time getting an answer. No, I must recognize that I was not aware of sendmany. This is my first time managing bulk transactions. Thank you very much for your help. I lead a group of developers and we are trying to be of help to the Bitcoin community at large, and to this group. Kindest regards, Fatima 2013/9/30 Gregory Maxwell <gmaxwell@gmail.com> > On Mon, Sep 30, 2013 at 3:00 PM, Fatima Castiglione Maldonado 发 > <castiglionemaldonado@gmail.com> wrote: > > I am new on the list. I got a similar problem. > > If I put "sendToAdress" transactions to bitcoind, it will accept between > 1 > > and 3 transactions per minute, depending on the underlying machine. > > If I try to send one transaction every 20 seconds or less, bitcoind stops > > responding to RPC calls. > > > > Does anybody know of any solution / fix / workaround for this? > > This is unrelated to Slush's question. > > The complexity of IsConfirmed/Ismine is exponential and starts taking > tens of seconds at a chain of a dozen unconfirmed transactions (and > growing from there) > > There are some patches that change this, but since the whole network > will only average about 7tx per second, you're probably doing > something wrong if you're building great big chains of unconfirmed > transaction. Are you aware of sendmany? > -- ================================= Fátima Castiglione Maldonado castiglionemaldonado@gmail.com ____ ,'_ | __________________|__|__|__ <_____ ) _.------._ `-----------,------.-' ,-' `-. | | | ,' `. ,' | | ,' `. | _,-' |__ / \ _,'-' `. `---.___|_____________ \ .--' -----. | _____________________ `-. ----- | | ___| | | \ ,- \ | | ___| |===========================((|) | | | | | | _____________________/ `- / | `--._ -----' | _________________,-' ----- | `.-._ ,' __.---' | / | `-. | \ / `. | | `. ,' | | | `. ,' _____,----------`-------`-. `-._ _,-' <___________________________) `------' | _| | `.____| ================================= [-- Attachment #2: Type: text/html, Size: 3962 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 21:01 ` Warren Togami Jr. 2013-09-30 22:00 ` Fatima Castiglione Maldonado 发 @ 2013-10-01 1:26 ` Jeff Garzik 1 sibling, 0 replies; 16+ messages in thread From: Jeff Garzik @ 2013-10-01 1:26 UTC (permalink / raw) To: Warren Togami Jr.; +Cc: bitcoin-development On Mon, Sep 30, 2013 at 11:01 PM, Warren Togami Jr. <wtogami@gmail.com> wrote: > 0.8.2 apparently was the first Bitcoin version to support RPC keepalive. No, this is not correct at all. RPC keepalive was present in 0.7.0, possibly earlier. Come on, it took a 30 second 'git checkout' session to verify this. > With the 4 RPC thread limit, four keepalive connections will exhaust all four and prevent further connections. I was about to flame you with a "WTF is this fiction?" but it seems true: 21eb5ada introduces this. Unfortunately, it does so erroneously, introducing clearly buggy behavior that did not exist with the thread-per-connection code that provided keep-alive before commit 21eb5ada. In my opinion, 21eb5ada half-changed the RPC code from thread-per-connection to a worker-group model, without considering all the consequences. -- Jeff Garzik Senior Software Engineer and open source evangelist BitPay, Inc. https://bitpay.com/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush 2013-09-30 21:01 ` Warren Togami Jr. @ 2013-10-01 0:20 ` Chris Double 2013-10-01 1:17 ` Jeff Garzik 2013-10-01 4:03 ` Olivier Langlois 3 siblings, 0 replies; 16+ messages in thread From: Chris Double @ 2013-10-01 0:20 UTC (permalink / raw) To: bitcoin-development slush <slush@centrum.cz> writes: > I observed this problem with version 0.8.2, but it is still happening with > 0.8.5. I see this as well with 0.8.2+. I don't see it on 0.8.1. I originally hit the '4 keep alive thread limit' where four clients on the same bitcoind would result in other clients not being able to connect. I resolved that and still hit the RPC hanging issue. The most problem I've had is using 'getwork' which I believe is going the wayside anyway. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush 2013-09-30 21:01 ` Warren Togami Jr. 2013-10-01 0:20 ` Chris Double @ 2013-10-01 1:17 ` Jeff Garzik 2013-10-01 8:58 ` slush 2013-10-01 4:03 ` Olivier Langlois 3 siblings, 1 reply; 16+ messages in thread From: Jeff Garzik @ 2013-10-01 1:17 UTC (permalink / raw) To: slush; +Cc: bitcoin-development Can you please describe more than "RPC stops working"? What is your precise RPC usage? getwork? getblocktemplate? other calls? What is your OS? On Mon, Sep 30, 2013 at 10:44 PM, slush <slush@centrum.cz> wrote: > Hi, > > during several weeks I'm observing more and more frequent issues with > bitcoind. The problem is that bitcoind stops responding to RPC calls, but > there's no other suspicious activity in bitcoind log, CPU usage is low, disk > I/O is standard etc. > > I observed this problem with version 0.8.2, but it is still happening with > 0.8.5. Originally this happen just one or twice per day. Today my monitoring > scripts restarted bitcoind more than 30x, which sounds alarming. This happen > on various backends, so it isn't a problem of one specific node. Is there > anybody else who's observing similar problem? > > Thanks, > slush > > ------------------------------------------------------------------------------ > October Webinars: Code for Performance > Free Intel webinars can help you accelerate application performance. > Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most > from > the latest Intel processors and coprocessors. See abstracts and register > > http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk > _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development > -- Jeff Garzik Senior Software Engineer and open source evangelist BitPay, Inc. https://bitpay.com/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-10-01 1:17 ` Jeff Garzik @ 2013-10-01 8:58 ` slush 2013-10-04 6:22 ` Gavin Andresen 0 siblings, 1 reply; 16+ messages in thread From: slush @ 2013-10-01 8:58 UTC (permalink / raw) To: Jeff Garzik; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 705 bytes --] ad "RPC stops working": * Client makes a 'getinfo' call and don't receive a response in a minute. "What is your precise RPC usage? " One process is asking getinfo every second as a fallback to possibly misconfigured blocknotify. It also calls getblocktemplate every 30 second. Second process is calling getinfo once a minute to check if bitcoind is working. If it don't receive a response in a minute, it kills bitcoind and starts it again. That's all. OS is Debian. On Tue, Oct 1, 2013 at 3:17 AM, Jeff Garzik <jgarzik@bitpay.com> wrote: > Can you please describe more than "RPC stops working"? What is your > precise RPC usage? getwork? getblocktemplate? other calls? What is > your OS? > > [-- Attachment #2: Type: text/html, Size: 1565 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-10-01 8:58 ` slush @ 2013-10-04 6:22 ` Gavin Andresen 2013-10-04 15:05 ` Jeff Garzik 0 siblings, 1 reply; 16+ messages in thread From: Gavin Andresen @ 2013-10-04 6:22 UTC (permalink / raw) To: slush; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 976 bytes --] On Tue, Oct 1, 2013 at 6:58 PM, slush <slush@centrum.cz> wrote: > One process is asking getinfo every second as a fallback to possibly > misconfigured blocknotify. It also calls getblocktemplate every 30 second. > getinfo does a bunch of stuff; with 0.9 you will be able to use getbestblockhash instead. > Second process is calling getinfo once a minute to check if bitcoind is > working. If it don't receive a response in a minute, it kills bitcoind and > starts it again. > If you just want to see if bitcoind is responding to RPC requests, then 'help getinfo' would do the trick without acquiring any locks. RE: running into the maximum-of-4-keepalive-requests : simple workaround is to run with -rpcthreads=11 (or however many keepalive connections you need to support). I agree that the rpc code should be smarter; making the last rpc thread ignore keepalive and always disconnecting should be a fairly simple patch, and "patches welcome." -- -- Gavin Andresen [-- Attachment #2: Type: text/html, Size: 1596 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-10-04 6:22 ` Gavin Andresen @ 2013-10-04 15:05 ` Jeff Garzik 0 siblings, 0 replies; 16+ messages in thread From: Jeff Garzik @ 2013-10-04 15:05 UTC (permalink / raw) To: Gavin Andresen; +Cc: bitcoin-development On Fri, Oct 4, 2013 at 2:22 AM, Gavin Andresen <gavinandresen@gmail.com> wrote: > RE: running into the maximum-of-4-keepalive-requests : simple workaround is > to run with -rpcthreads=11 (or however many keepalive connections you need > to support). I agree that the rpc code should be smarter; making the last > rpc thread ignore keepalive and always disconnecting should be a fairly > simple patch, and "patches welcome." It's all still working around a problem unchanged since Satoshi wrote it: the HTTP server code paths use blocking I/O. Amusingly, we do this through an async I/O library, which helps facilitate SSL, but all our connections and operations are blocking. That's why RPC was multi-threaded in part: to work around the ugly blocking nature of the code. At least with multiple threads, one thread will not stall another even if the network stalls (or a software bug triggers a stall etc.) Hopefully I can dive into the code and make is truly async I/O. It takes some work, and I'm happy if someone else steals the task, but that's what really needs to be done. I tried the multi-threaded approach, writing an entire boost::asio skeleton JSON-RPC HTTP server: https://github.com/jgarzik/rpcsrv This is working, tested code that uses boost::asio properly. It's also quite a bit of LOC, and a bit messy to boot (four LOC per boost async action and incomprehensible compiler errors in return for proper async+MT). A single thread with async I/O is likely sufficient for even heavy uses of RPC -- since today it all goes through a big lock anyway. -- Jeff Garzik Senior Software Engineer and open source evangelist BitPay, Inc. https://bitpay.com/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush ` (2 preceding siblings ...) 2013-10-01 1:17 ` Jeff Garzik @ 2013-10-01 4:03 ` Olivier Langlois 2013-10-01 7:10 ` Olivier Langlois 3 siblings, 1 reply; 16+ messages in thread From: Olivier Langlois @ 2013-10-01 4:03 UTC (permalink / raw) To: slush; +Cc: bitcoin-development On Mon, 2013-09-30 at 22:44 +0200, slush wrote: > Hi, > > > during several weeks I'm observing more and more frequent issues with > bitcoind. The problem is that bitcoind stops responding to RPC calls, > but there's no other suspicious activity in bitcoind log, CPU usage is > low, disk I/O is standard etc. > > > I observed this problem with version 0.8.2, but it is still happening > with 0.8.5. Originally this happen just one or twice per day. Today my > monitoring scripts restarted bitcoind more than 30x, which sounds > alarming. This happen on various backends, so it isn't a problem of > one specific node. Is there anybody else who's observing similar > problem? What a coincidence. I do have observed the same thing. right now with 0.8.5. I am writing a small app. My jsonrpc client is programmed to timeout after 2 secs and I did see a couple of timeouts once in while. What I did is a simple test app that just hammer bitcoind with 3 rpc requests every 30 seconds and I abort it as soon as it encountered a timeout. The 3 request burst is performed on the same HTTP 1.1 kept alive connection. Then I disconnect. When I launch my app before leaving in the morning, pretty sure that I have a core dump waiting for me when I come back. I choose very simple calls: getinfo,getaccount Added a couple of traces in the RPC handling code. (BTW, timestamps in traces would be tremendously useful for tracking problems...). I see my request received by bitcoind but there is no trace yet to show that the reply is sent. Not sure yet exactly where the problem is but my current #1 suspect is: LOCK2(cs_main, pwalletMain->cs_wallet); with some kind of lock contention with the other threads. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-10-01 4:03 ` Olivier Langlois @ 2013-10-01 7:10 ` Olivier Langlois 2013-10-01 14:17 ` Jeff Garzik 0 siblings, 1 reply; 16+ messages in thread From: Olivier Langlois @ 2013-10-01 7:10 UTC (permalink / raw) To: slush; +Cc: bitcoin-development > > Not sure yet exactly where the problem is but my current #1 suspect is: > > LOCK2(cs_main, pwalletMain->cs_wallet); > > with some kind of lock contention with the other threads. > I was right. It took more than 6 seconds to acquire the locks I did modify bitcoinrpc.cpp: namespace { struct timeval difftv( const struct timeval &tv1, const struct timeval &tv2 ) { struct timeval res; res.tv_sec = tv1.tv_sec - tv2.tv_sec; if (tv2.tv_usec > tv1.tv_usec) { res.tv_sec--; res.tv_usec = 1000000; } else res.tv_usec = 0; res.tv_usec += tv1.tv_usec; res.tv_usec -= tv2.tv_usec; return res; } void printExecTimes( const struct timeval &tv1, const struct timeval &tv2, const struct timeval &tv3 ) { struct timeval lockTime = difftv(tv2,tv1); struct timeval callTime = difftv(tv3,tv2); struct timeval totalTime = difftv(tv3,tv1); printf( "locktime : %ld.%06ld calltime : %ld.%06ld totaltime : % ld.%06ld\n", lockTime.tv_sec,lockTime.tv_usec,callTime.tv_sec,callTime.tv_usec, totalTime.tv_sec,totalTime.tv_usec); } } json_spirit::Value CRPCTable::execute(const std::string &strMethod, const json_spirit::Array ¶ms) const { // Find method const CRPCCommand *pcmd = tableRPC[strMethod]; if (!pcmd) throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Method not found"); // Observe safe mode string strWarning = GetWarnings("rpc"); if (strWarning != "" && !GetBoolArg("-disablesafemode") && !pcmd->okSafeMode) throw JSONRPCError(RPC_FORBIDDEN_BY_SAFE_MODE, string("Safe mode: ") + strWarning); try { // Execute Value result; { if (pcmd->threadSafe) result = pcmd->actor(params, false); else { struct timeval tv1,tv2,tv3; gettimeofday(&tv1,NULL); { LOCK2(cs_main, pwalletMain->cs_wallet); gettimeofday(&tv2,NULL); result = pcmd->actor(params, false); } gettimeofday(&tv3,NULL); printExecTimes(tv1,tv2,tv3); } } return result; } catch (std::exception& e) { throw JSONRPCError(RPC_MISC_ERROR, e.what()); } } locktime : 0.000001 calltime : 0.000153 totaltime : 0.000154 locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011 locktime : 0.000000 calltime : 0.000451 totaltime : 0.000451 locktime : 0.000000 calltime : 0.000313 totaltime : 0.000313 locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011 locktime : 0.051574 calltime : 0.000377 totaltime : 0.051951 locktime : 0.000000 calltime : 0.000222 totaltime : 0.000222 locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011 locktime : 0.121106 calltime : 0.000471 totaltime : 0.121577 locktime : 0.078093 calltime : 0.000451 totaltime : 0.078544 locktime : 0.101185 calltime : 0.000021 totaltime : 0.101206 locktime : 0.000000 calltime : 0.000476 totaltime : 0.000476 locktime : 0.000001 calltime : 0.000291 totaltime : 0.000292 locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 locktime : 0.003025 calltime : 0.000036 totaltime : 0.003061 locktime : 0.000000 calltime : 0.000383 totaltime : 0.000383 locktime : 0.000000 calltime : 0.000210 totaltime : 0.000210 locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 locktime : 0.000000 calltime : 0.000470 totaltime : 0.000470 locktime : 0.000000 calltime : 0.000295 totaltime : 0.000295 locktime : 0.000000 calltime : 0.000020 totaltime : 0.000020 locktime : 0.000001 calltime : 0.000385 totaltime : 0.000386 locktime : 0.000000 calltime : 0.000241 totaltime : 0.000241 locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 locktime : 0.000001 calltime : 0.000308 totaltime : 0.000309 locktime : 0.000000 calltime : 0.000164 totaltime : 0.000164 locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 locktime : 0.000000 calltime : 0.000376 totaltime : 0.000376 locktime : 0.000000 calltime : 0.000356 totaltime : 0.000356 locktime : 0.000000 calltime : 0.000021 totaltime : 0.000021 locktime : 0.000000 calltime : 0.000496 totaltime : 0.000496 locktime : 0.000001 calltime : 0.000201 totaltime : 0.000202 locktime : 0.000001 calltime : 0.000017 totaltime : 0.000018 locktime : 0.000000 calltime : 0.000301 totaltime : 0.000301 locktime : 0.000000 calltime : 0.000180 totaltime : 0.000180 locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 locktime : 0.000001 calltime : 0.000359 totaltime : 0.000360 locktime : 0.000000 calltime : 0.000265 totaltime : 0.000265 locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 locktime : 0.000001 calltime : 0.000488 totaltime : 0.000489 locktime : 0.000000 calltime : 0.000226 totaltime : 0.000226 locktime : 0.000000 calltime : 0.000025 totaltime : 0.000025 locktime : 0.000000 calltime : 0.000369 totaltime : 0.000369 locktime : 0.000000 calltime : 0.000262 totaltime : 0.000262 locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 locktime : 0.000000 calltime : 0.000459 totaltime : 0.000459 locktime : 0.000000 calltime : 0.000260 totaltime : 0.000260 locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 locktime : 0.000000 calltime : 0.000330 totaltime : 0.000330 locktime : 0.000000 calltime : 0.000223 totaltime : 0.000223 locktime : 0.000000 calltime : 0.000014 totaltime : 0.000014 locktime : 0.000001 calltime : 0.000449 totaltime : 0.000450 locktime : 0.000001 calltime : 0.000248 totaltime : 0.000249 locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 locktime : 0.000000 calltime : 0.000327 totaltime : 0.000327 locktime : 0.000000 calltime : 0.000196 totaltime : 0.000196 locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 locktime : 0.121170 calltime : 0.000408 totaltime : 0.121578 locktime : 0.004912 calltime : 0.000278 totaltime : 0.005190 locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-10-01 7:10 ` Olivier Langlois @ 2013-10-01 14:17 ` Jeff Garzik 2013-10-02 3:20 ` Olivier Langlois 0 siblings, 1 reply; 16+ messages in thread From: Jeff Garzik @ 2013-10-01 14:17 UTC (permalink / raw) To: Olivier Langlois; +Cc: bitcoin-development Olivier, What network activity was going on, during this test? i.e. during the call with 5.99 locktime, was bitcoind processing a block or sending a large transaction? There are plenty of valid reasons -- sadly -- that the locks are held for a long time, during random network events. On Tue, Oct 1, 2013 at 9:10 AM, Olivier Langlois <olivier@trillion01.com> wrote: > >> >> Not sure yet exactly where the problem is but my current #1 suspect is: >> >> LOCK2(cs_main, pwalletMain->cs_wallet); >> >> with some kind of lock contention with the other threads. >> > > I was right. It took more than 6 seconds to acquire the locks > > I did modify bitcoinrpc.cpp: > > namespace { > > struct timeval difftv( const struct timeval &tv1, const struct timeval > &tv2 ) > { > struct timeval res; > res.tv_sec = tv1.tv_sec - tv2.tv_sec; > if (tv2.tv_usec > tv1.tv_usec) { > res.tv_sec--; > res.tv_usec = 1000000; > } else > res.tv_usec = 0; > > res.tv_usec += tv1.tv_usec; > res.tv_usec -= tv2.tv_usec; > > return res; > } > > void printExecTimes( const struct timeval &tv1, > const struct timeval &tv2, > const struct timeval &tv3 ) > { > struct timeval lockTime = difftv(tv2,tv1); > struct timeval callTime = difftv(tv3,tv2); > struct timeval totalTime = difftv(tv3,tv1); > printf( "locktime : %ld.%06ld calltime : %ld.%06ld totaltime : % > ld.%06ld\n", > > lockTime.tv_sec,lockTime.tv_usec,callTime.tv_sec,callTime.tv_usec, > totalTime.tv_sec,totalTime.tv_usec); > } > > } > > json_spirit::Value CRPCTable::execute(const std::string &strMethod, > const json_spirit::Array ¶ms) const > { > // Find method > const CRPCCommand *pcmd = tableRPC[strMethod]; > if (!pcmd) > throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Method not found"); > > // Observe safe mode > string strWarning = GetWarnings("rpc"); > if (strWarning != "" && !GetBoolArg("-disablesafemode") && > !pcmd->okSafeMode) > throw JSONRPCError(RPC_FORBIDDEN_BY_SAFE_MODE, string("Safe > mode: ") + strWarning); > > try > { > // Execute > Value result; > { > if (pcmd->threadSafe) > result = pcmd->actor(params, false); > else { > struct timeval tv1,tv2,tv3; > gettimeofday(&tv1,NULL); { > LOCK2(cs_main, pwalletMain->cs_wallet); > gettimeofday(&tv2,NULL); > result = pcmd->actor(params, false); } > gettimeofday(&tv3,NULL); > printExecTimes(tv1,tv2,tv3); > } > } > return result; > } > catch (std::exception& e) > { > throw JSONRPCError(RPC_MISC_ERROR, e.what()); > } > } > > locktime : 0.000001 calltime : 0.000153 totaltime : 0.000154 > locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011 > locktime : 0.000000 calltime : 0.000451 totaltime : 0.000451 > locktime : 0.000000 calltime : 0.000313 totaltime : 0.000313 > locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011 > locktime : 0.051574 calltime : 0.000377 totaltime : 0.051951 > locktime : 0.000000 calltime : 0.000222 totaltime : 0.000222 > locktime : 0.000000 calltime : 0.000011 totaltime : 0.000011 > locktime : 0.121106 calltime : 0.000471 totaltime : 0.121577 > locktime : 0.078093 calltime : 0.000451 totaltime : 0.078544 > locktime : 0.101185 calltime : 0.000021 totaltime : 0.101206 > locktime : 0.000000 calltime : 0.000476 totaltime : 0.000476 > locktime : 0.000001 calltime : 0.000291 totaltime : 0.000292 > locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 > locktime : 0.003025 calltime : 0.000036 totaltime : 0.003061 > locktime : 0.000000 calltime : 0.000383 totaltime : 0.000383 > locktime : 0.000000 calltime : 0.000210 totaltime : 0.000210 > locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 > locktime : 0.000000 calltime : 0.000470 totaltime : 0.000470 > locktime : 0.000000 calltime : 0.000295 totaltime : 0.000295 > locktime : 0.000000 calltime : 0.000020 totaltime : 0.000020 > locktime : 0.000001 calltime : 0.000385 totaltime : 0.000386 > locktime : 0.000000 calltime : 0.000241 totaltime : 0.000241 > locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 > locktime : 0.000001 calltime : 0.000308 totaltime : 0.000309 > locktime : 0.000000 calltime : 0.000164 totaltime : 0.000164 > locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 > locktime : 0.000000 calltime : 0.000376 totaltime : 0.000376 > locktime : 0.000000 calltime : 0.000356 totaltime : 0.000356 > locktime : 0.000000 calltime : 0.000021 totaltime : 0.000021 > locktime : 0.000000 calltime : 0.000496 totaltime : 0.000496 > locktime : 0.000001 calltime : 0.000201 totaltime : 0.000202 > locktime : 0.000001 calltime : 0.000017 totaltime : 0.000018 > locktime : 0.000000 calltime : 0.000301 totaltime : 0.000301 > locktime : 0.000000 calltime : 0.000180 totaltime : 0.000180 > locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 > locktime : 0.000001 calltime : 0.000359 totaltime : 0.000360 > locktime : 0.000000 calltime : 0.000265 totaltime : 0.000265 > locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 > locktime : 0.000001 calltime : 0.000488 totaltime : 0.000489 > locktime : 0.000000 calltime : 0.000226 totaltime : 0.000226 > locktime : 0.000000 calltime : 0.000025 totaltime : 0.000025 > locktime : 0.000000 calltime : 0.000369 totaltime : 0.000369 > locktime : 0.000000 calltime : 0.000262 totaltime : 0.000262 > locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 > locktime : 0.000000 calltime : 0.000459 totaltime : 0.000459 > locktime : 0.000000 calltime : 0.000260 totaltime : 0.000260 > locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 > locktime : 0.000000 calltime : 0.000330 totaltime : 0.000330 > locktime : 0.000000 calltime : 0.000223 totaltime : 0.000223 > locktime : 0.000000 calltime : 0.000014 totaltime : 0.000014 > locktime : 0.000001 calltime : 0.000449 totaltime : 0.000450 > locktime : 0.000001 calltime : 0.000248 totaltime : 0.000249 > locktime : 0.000000 calltime : 0.000017 totaltime : 0.000017 > locktime : 0.000000 calltime : 0.000327 totaltime : 0.000327 > locktime : 0.000000 calltime : 0.000196 totaltime : 0.000196 > locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 > locktime : 0.121170 calltime : 0.000408 totaltime : 0.121578 > locktime : 0.004912 calltime : 0.000278 totaltime : 0.005190 > locktime : 0.000000 calltime : 0.000016 totaltime : 0.000016 > locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148 > > > > ------------------------------------------------------------------------------ > October Webinars: Code for Performance > Free Intel webinars can help you accelerate application performance. > Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from > the latest Intel processors and coprocessors. See abstracts and register > > http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk > _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development -- Jeff Garzik Senior Software Engineer and open source evangelist BitPay, Inc. https://bitpay.com/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-10-01 14:17 ` Jeff Garzik @ 2013-10-02 3:20 ` Olivier Langlois 2013-10-04 3:52 ` Olivier Langlois 0 siblings, 1 reply; 16+ messages in thread From: Olivier Langlois @ 2013-10-02 3:20 UTC (permalink / raw) To: Jeff Garzik; +Cc: bitcoin-development On Tue, 2013-10-01 at 16:17 +0200, Jeff Garzik wrote: > Olivier, > > What network activity was going on, during this test? i.e. during the > call with 5.99 locktime, was bitcoind processing a block or sending a > large transaction? There are plenty of valid reasons -- sadly -- that > the locks are held for a long time, during random network events. > Jeff, In your opinion what would be then a reasonable to value for a rpc client? From what you say, 2 seconds is apparently not enough. For your question, unfortunately or fortunately, not much. The only thing that strike out is that CTxMemPool::accept() seems to be called while the rpc thread is between jreq.parse() and tableRPC.execute(). Here are the logs surrounding the 5.99 sec lock wait. 1380610602.706628 : RPC connection closed Added 1 addresses from 85.202.231.133: 2633 tried, 12769 new CTxMemPool::accept() : accepted 6191dee582c1c19fd76d6f06a4c594ee1577dbb54b688c5dcc676ebfbc225cda (poolsz 1262) Added 1 addresses from 217.73.163.131: 2633 tried, 12769 new CTxMemPool::accept() : accepted 7a482a24d2588a66c6561da6ee29b5adff4e8ba72e58fb360a2c36cd78be3a71 (poolsz 1263) Added 1 addresses from 24.211.152.165: 2633 tried, 12770 new received getdata for: tx 3849ab1481c9d8c283e682f2ef8421a8cf01ef228a89d0daf2d5e888e04f45e4 CTxMemPool::accept() : accepted cf34b31f26d3275b66adf081f84fca2d761e8e260b42243ba36638368c1633b1 (poolsz 1264) CTxMemPool::accept() : accepted 61fc517d9461b79bab972593fbaf8b0be8ed0222c232b2c51eea2a781a44efef (poolsz 1265) CTxMemPool::accept() : accepted 4a42913c0b956f27cfc72ed44605ef8654ff8720b81e6ef1c7dfab8e300ee9d9 (poolsz 1266) CTxMemPool::accept() : accepted cec545fca0b4ad6efa44159758dd18b2e3047a1f1783812256a11aba96c32792 (poolsz 1267) CTxMemPool::accept() : accepted 1ad7ff07ed4b8299af0f5ce61b721f20d984225a4a963174f51d329587eef2e9 (poolsz 1268) CTxMemPool::accept() : accepted 12f6a6e453f34db7165983def2ecdfef174917194aa1b8f4c4883b02595c043c (poolsz 1269) CTxMemPool::accept() : accepted 69942aebbf3d3e6b781e56553f0b1e021e07fe1c4d922b221cadabb7eb84abb7 (poolsz 1270) CTxMemPool::accept() : accepted d7d95a6fc61209c8dbdf4b17a058f4f9c7fa6d6e76b36ca4cf8bb67f288ef3cc (poolsz 1271) CTxMemPool::accept() : accepted da6c8145c9f506ec0f146b37d57ef423098b584af8dbed9490002ed900228c27 (poolsz 1272) CTxMemPool::accept() : accepted 46debd7df211ea26060db539a909d7908f1f6fcca8103897909280ab156b8872 (poolsz 1273) CTxMemPool::accept() : accepted 30f47bae4e485e1129e0598f2264469a4f9186d9a7988debfe7e21f283975c9f (poolsz 1274) CTxMemPool::accept() : accepted 80848aebe10afda234c42db1ae3a013f873af54906d966afcb42547501d899c4 (poolsz 1275) CTxMemPool::accept() : accepted faaf9a2b57e80460293f3d53115fc8795bb1357d0ae6400710bc8a6610312b81 (poolsz 1276) CTxMemPool::accept() : accepted 44f5fc39cde3b35d32f20f8246001f946c9fdc1297ea21fd7400c0c483aceca2 (poolsz 1277) CTxMemPool::accept() : accepted d92a8389cfa6a8ea98a8103a138f45e56cf229d9b35bd46ac319167d17409edd (poolsz 1278) CTxMemPool::accept() : accepted 557e88dfed0d2e9cd3c8c1af2eb497ea5c0a888af7417f7b816390af1dd4f38a (poolsz 1279) CTxMemPool::accept() : accepted 037b48b12209b046bed88806bf5a92e3cc45cc2e30af895b5536fd8ef2ae865f (poolsz 1280) received getdata for: tx 1ad7ff07ed4b8299af0f5ce61b721f20d984225a4a963174f51d329587eef2e9 CTxMemPool::accept() : accepted cf8149016e4c25b0408a70dbb8f7bc596331de6840b9a3115a6aa5445cc2b3fe (poolsz 1281) received getdata for: tx 12f6a6e453f34db7165983def2ecdfef174917194aa1b8f4c4883b02595c043c received getdata for: tx 61fc517d9461b79bab972593fbaf8b0be8ed0222c232b2c51eea2a781a44efef CTxMemPool::accept() : accepted 890b41943b2cda0b9d7e34ddfcddb490582165c0cedd8fca09df6eec37ad768d (poolsz 1282) 1380610633.387730: POST / HTTP/1.1^M ThreadRPCServer method=getinfo CTxMemPool::accept() : accepted dc2941dd69b2f9fa2754f741dfba76489abef706bd237e3bd715181950723e4d (poolsz 1283) keypool reserve 15 keypool return 15 locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Bitcoin-development] bitcoind stops responding 2013-10-02 3:20 ` Olivier Langlois @ 2013-10-04 3:52 ` Olivier Langlois 0 siblings, 0 replies; 16+ messages in thread From: Olivier Langlois @ 2013-10-04 3:52 UTC (permalink / raw) To: Jeff Garzik; +Cc: bitcoin-development > 1380610633.387730: POST / HTTP/1.1^M > ThreadRPCServer method=getinfo > CTxMemPool::accept() : accepted dc2941dd69b2f9fa2754f741dfba76489abef706bd237e3bd715181950723e4d (poolsz 1283) > keypool reserve 15 > keypool return 15 > locktime : 5.996820 calltime : 0.000328 totaltime : 5.997148 > I have found another circumstance where getinfo rpc function can take more than 2 secs to return. Here are the circunstances: received block 0000000000000013c889fbe1b2a23eb4b6bd38a8051420be1bf319da184f02ae Committing 1015 changed transactions to coin database... SetBestChain: new best=0000000000000013c889fbe1b2a23eb4b6bd38a8051420be1bf319da184f02ae height=261213 log2_work=72.38679 tx=24743665 date=2013-10-02 05:26:49 progress=0.999999 {"method":"getinfo","params":[],"id":783} ThreadRPCServer method=getinfo keypool reserve 15 keypool return 15 locktime : 0.000001 calltime : 2.453037 totaltime : 2.453038 This time locking the contention isn't from cs_main, cs_wallet mutex lock. I think that what happen here is 1. CWallet::SetBestChain is called 2. This trigger ThreadFlushWalletDB that will lock wallet db and flush it 3. getinfo will have to wait that ThreadFlushWalletDB ends to complete GetOldestKeyPoolTime(). Just throwing a couple of ideas here. Comments are welcome: 1. Is closing the db and reopen it the only way to flush the db to disk? Can it be done asynchonously? I don't know BDB much but probably closing the DB involves some blocking fsync() call and given that this follow 1015 records writing into the txdb, it takes more time than it should otherwise to return. 2. I'm relatively new to bitcoin. I understand that in a recent past everything was using BDB. Now that only the wallet seems to keep using it, maybe some options could be removed to make BDB more lightweight. DB_THREAD: From what I see, BDB API access is already well synchronized by bitcoind. There is no obvious reason to ask BDB API to be thread safe. DB_INIT_TXN: Correct me if I'm wrong, but I suspect this subsystem initialization to be a vestige of when txdb was in BDB. 3. not sure at all about this one but I'm throwing the idea anyway. Personnally, I have no used for 'keypoololdest'. I might experiment without it How about having an optional bool param to 'getinfo' that could have the name 'lowlatency' or 'nodb' to skip info gathering having the potential to make the call longer? ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2013-10-04 15:05 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-09-30 20:44 [Bitcoin-development] bitcoind stops responding slush 2013-09-30 21:01 ` Warren Togami Jr. 2013-09-30 22:00 ` Fatima Castiglione Maldonado 发 2013-09-30 22:08 ` Gregory Maxwell 2013-10-01 0:05 ` Fatima Castiglione Maldonado 发 2013-10-01 1:26 ` Jeff Garzik 2013-10-01 0:20 ` Chris Double 2013-10-01 1:17 ` Jeff Garzik 2013-10-01 8:58 ` slush 2013-10-04 6:22 ` Gavin Andresen 2013-10-04 15:05 ` Jeff Garzik 2013-10-01 4:03 ` Olivier Langlois 2013-10-01 7:10 ` Olivier Langlois 2013-10-01 14:17 ` Jeff Garzik 2013-10-02 3:20 ` Olivier Langlois 2013-10-04 3:52 ` Olivier Langlois
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox