From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194]
	helo=mx.sourceforge.net)
	by sfs-ml-1.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
	(envelope-from <gmaxwell@gmail.com>) id 1TLjmK-0008Iz-2I
	for bitcoin-development@lists.sourceforge.net;
	Wed, 10 Oct 2012 00:04:00 +0000
Received-SPF: pass (sog-mx-4.v43.ch3.sourceforge.com: domain of gmail.com
	designates 209.85.210.175 as permitted sender)
	client-ip=209.85.210.175; envelope-from=gmaxwell@gmail.com;
	helo=mail-ia0-f175.google.com; 
Received: from mail-ia0-f175.google.com ([209.85.210.175])
	by sog-mx-4.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
	(Exim 4.76) id 1TLjmH-000718-QX
	for bitcoin-development@lists.sourceforge.net;
	Wed, 10 Oct 2012 00:04:00 +0000
Received: by mail-ia0-f175.google.com with SMTP id b35so1326572iac.34
	for <bitcoin-development@lists.sourceforge.net>;
	Tue, 09 Oct 2012 17:03:52 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.50.157.234 with SMTP id wp10mr3631775igb.5.1349827432300; Tue,
	09 Oct 2012 17:03:52 -0700 (PDT)
Received: by 10.64.34.4 with HTTP; Tue, 9 Oct 2012 17:03:52 -0700 (PDT)
In-Reply-To: <CA+8xBpchRLVQW4Rv2RQhsRF716cCTmJLuhZeuvCWtWB4sZX2Sw@mail.gmail.com>
References: <CA+8xBpchRLVQW4Rv2RQhsRF716cCTmJLuhZeuvCWtWB4sZX2Sw@mail.gmail.com>
Date: Tue, 9 Oct 2012 20:03:52 -0400
Message-ID: <CAAS2fgTJAL2g+0ezgyL4=L16RNgV2d_Ka3etb8UiF2V4qkLs-w@mail.gmail.com>
From: Gregory Maxwell <gmaxwell@gmail.com>
To: Jeff Garzik <jgarzik@exmulti.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -1.3 (-)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
	See http://spamassassin.org/tag/ for more details.
	-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
	sender-domain
	0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
	(gmaxwell[at]gmail.com)
	-0.0 SPF_PASS               SPF: sender matches SPF record
	-0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
	author's domain
	0.1 DKIM_SIGNED            Message has a DKIM or DK signature,
	not necessarily valid
	-0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
	0.3 AWL AWL: From: address is in the auto white-list
X-Headers-End: 1TLjmH-000718-QX
Cc: Bitcoin Development <bitcoin-development@lists.sourceforge.net>
Subject: Re: [Bitcoin-development] On bitcoin testing
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Wed, 10 Oct 2012 00:04:00 -0000

On Tue, Oct 9, 2012 at 7:12 PM, Jeff Garzik <jgarzik@exmulti.com> wrote:
> * Data-driven tests: If possible, write software-neutral, data-driven
> tests.  This enables clients other than the reference one (Satoshi
> client) to be tested.  Embed tests in testnet3 chain, if possible.

The mention of testnet3 here reminds me to make a point:  Confirmation
bias is a common problem for software testing=E2=80=94 people often over te=
st
the success cases and under-test the failure cases.  This is certainly
the case in Bitcoin: For example, testnet3+the packaged tests test all
the branches inside the interior script evaluation engine _except_ the
rejection cases.

For us failure cases can be harder to package up (e.g. can't be placed
in testnet) but Matt's node-simulation based tester provides a good
example of how to create a data driven test set that tests both
failure cases and dynamic behavior (e.g. reorgs).

Testing of failure cases is absolutely critical for testing of
implementation compatibility: The existence of a difference in what
gets rejected in a widely deployed alternative node could result in an
utterly devastating network split.

Generally every test of something which must succeeded should be
matched by a test of something that must fail. Personally, I like to
test the boundary cases=E2=80=94 e.g. if something has an allowed range of
[0-8], I'll test -1,0,8,9 at a minimum. Though reasoning trumps rules
of thumb.

Confirmation bias is another reason why it's important to have a more
diverse collection of testers than the core developers.  People who
work closely with the software have strong expectations of how the
software should work and are less likely to test crazy corner cases
because they "know" the outcome, sometimes erroneously.


To reinforce Jeff's list of different approaches: I've long found that
each mechanism of software testing has diminishing returns the more of
it you apply. So you're best off using as many different approaches a
little rather than spending all your resources going as deep as
possible with any one approach.

There are also some kind of testing which are synergistic: Almost all
testing is enhanced enormously by combining it with valgrind because
it substantially lowers the threshold of issue detection substantially
(e.g. detecting bogus memory accesses which are _currently_ causing a
crash for you but could). If I could only test one of "with valgrind"
or "without" I'd test with every time.  Sadly valgrind doesn't exist
on windows and it's rather slow. Dr. Memory
(http://code.google.com/p/drmemory/) may be an alternative on Windows,
and there is work to port ASAN to GCC so it may be possible to mingw
ASAN builds in not too long.

I've also found that any highly automatable testing (coded data
driven, unit, and fuzz testing) combines well with diverse
compilation, e.g. building on as many system types and architectures=E2=80=
=94
including production irrelevant ones=E2=80=94 as possible in the hopes that
some system specific quark make a bug easier to detect.