From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pete@petertodd.org>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id A7E0B5B1
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed,  1 Mar 2017 22:31:09 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from outmail149055.authsmtp.co.uk (outmail149055.authsmtp.co.uk
	[62.13.149.55])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6424A1B7
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed,  1 Mar 2017 22:31:08 +0000 (UTC)
Received: from mail-c232.authsmtp.com (mail-c232.authsmtp.com [62.13.128.232])
	by punt21.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v21MV5sU031486;
	Wed, 1 Mar 2017 22:31:05 GMT
Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com
	[52.5.185.120]) (authenticated bits=0)
	by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v21MV2Rc041636
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 1 Mar 2017 22:31:03 GMT
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by petertodd.org (Postfix) with ESMTPSA id D5923400AC;
	Wed,  1 Mar 2017 22:31:01 +0000 (UTC)
Received: by localhost (Postfix, from userid 1000)
	id 18C1D2030A; Wed,  1 Mar 2017 17:31:01 -0500 (EST)
Date: Wed, 1 Mar 2017 17:31:01 -0500
From: Peter Todd <pete@petertodd.org>
To: Bram Cohen <bram@bittorrent.com>
Message-ID: <20170301223101.GA17022@savin.petertodd.org>
References: <20170224010943.GA29218@savin.petertodd.org>
	<CA+KqGkrOK76S3ffPJmpqYcBwtSeKESqN16yZsrwzDR6JZZmwFA@mail.gmail.com>
	<20170224025811.GA31911@savin.petertodd.org>
	<CA+KqGkq7gavAnAk-tcA+gxL2sWpv3ENhEmHrQHaPdyAsKrLjGg@mail.gmail.com>
	<20170224031531.GA32118@savin.petertodd.org>
	<CA+KqGkrfhg3GnbWwvKXHQ2NWuCnfzYyTPUxRhzYMuDBiNQR4eA@mail.gmail.com>
	<20170224043613.GA32502@savin.petertodd.org>
	<CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
	<20170225041202.GA11152@savin.petertodd.org>
	<CA+KqGkqs8F1hK6y-JnLFRpqhQ5i8i+MXVmtGUQBYmE5d1OCAAg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="pf9I7BMVVzbSWLtt"
Content-Disposition: inline
In-Reply-To: <CA+KqGkqs8F1hK6y-JnLFRpqhQ5i8i+MXVmtGUQBYmE5d1OCAAg@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Server-Quench: c121ac78-fece-11e6-829f-00151795d556
X-AuthReport-Spam: If SPAM / abuse - report it at:
	http://www.authsmtp.com/abuse
X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR
	aQdMdgMUFVQGAgsB AmEbWVZeU1x7WWA7 bghPaBtcak9QXgdq
	T0pMXVMcUgdpfHoD ZE0eVhh0dAMIeXZ2 ZUAsDXFZXRUpck5g
	FBsHQHAHZDJmdWgd WRZFdwNVdQJNdxoR b1V5GhFYa3VsNCMk
	FAgyOXU9MCtqYA0d aAwRMV8ICWMuJHYQ Sh4DGzQzHEoDLwAA 
X-Authentic-SMTP: 61633532353630.1037:706
X-AuthFastPath: 0 (Was 255)
X-AuthSMTP-Origin: 52.5.185.120/25
X-AuthVirus-Status: No virus detected - but ensure you scan with your own
	anti-virus system.
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
	autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] A Better MMR Definition
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Mar 2017 22:31:09 -0000


--pf9I7BMVVzbSWLtt
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Feb 24, 2017 at 10:23:20PM -0800, Bram Cohen wrote:
> > Your merkle-set implementation is 1500 lines of densely written Python
>=20
>=20
> The reference implementation which is included in those 1500 lines is less
> than 300 lines and fairly straightforward. The non-reference implementati=
on
> always behaves semantically identically to the reference implementation, =
it
> just does so faster and using less memory.

Great!

But do you see my point here? Even though I spent some time reading through
that code, I didn't realise you had a 300 line reference implementation
embedded within those 1500 lines. This makes it less likely for you to get =
any
review on either.

A better way to present your work would have been to at least explain that =
at
the top of the file, and perhaps even better, split the reference
implementation and optimized implementation into two separate files. If you=
 did
this, you'd be more likely to get others to review your work.

> > with
> > almost no comments,
>=20
>=20
> The comments at the top explain both the proof format and the in-memory
> data structures very precisely. The whole codebase was reviewed by a
> coworker of mine and comments were added explaining the subtleties which
> tripped him up.

Yes, and it's good that you have those comments. But the codebase itself co=
uld
definitely use more, and adding those comments would help get more people
reviewing your work.

> > and less than a 100 lines of (also uncommented) tests.
>=20
>=20
> Those tests get 98% code coverage and extensively hit not only the lines =
of
> code but the semantic edge cases as well. The lines which aren't hit are
> convenience functions and error conditions of the parsing code for when
> it's passed bad data

Great! But you see how without comments, it'll take a tremendous amount of =
work
for an external reviewer like myself to determine what is being tested, and
what edge cases you're targeting.

In fact, I'd suggest that for things like edge cases, you test edge cases in
separate unit tests that explain what edge cases you're trying to catch.

> > By
> > comparison, my Python MMR implementation is 300 lines of very readable
> > Python
> > with lots of comments, a 200 line explanation at the top, and 200 lines=
 of
> > (commented) tests. Yet no-one is taking the (still considerable) effort=
 to
> > understand and comment on my implementation. :)
> >
>=20
> Given that maaku's Merkle prefix trees were shelved because of performance
> problems despite being written in C and operating in basically the same w=
ay
> as your code and my reference code, it's clear that non-optimized Python
> won't be touching the bitcoin codebase any time soon.

To be clear, I gave my implementation as an example of how hard it is to get
external review, not to suggest it's going to be a part of Bitcoin; I've
pointed a lot of people to it when they asked for a MMR implementation, and=
 I'm
sure if some of those people had reviewed it carefully they would have
suggested changes. Yet they haven't, because doing good review is a lot of
work!

> > In particular, while at the top of merkle_set.py you have a list of
> > advantages,
> > and a bunch of TODO's, you don't explain *why* the code has any of these
> > advantages. To figure that out, I'd have to read and understand 1500 li=
nes
> > of
> > densely written Python. Without a human-readable pitch, not many people=
 are
> > going to do that, myself included.
> >
>=20
> It's all about cache coherence. When doing operations it pulls in a bunch
> of things which are near each other in memory instead of jumping all over
> the place. The improvements it gets should be much greater than the ones
> gained from insertion ordering, although the two could be accretive.

That's good, but that paragraph should be part of your MerkleSet git repo,
preferably in the README, where reviewers will immediately find it and get
excited about reviewing your code.

--=20
https://petertodd.org 'peter'[:-1]@petertodd.org

--pf9I7BMVVzbSWLtt
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----

iQEcBAEBCAAGBQJYt0ugAAoJECSBQD2l8JH7GT0IAJGdWMegQ2GZnHl9Jw5OLOCf
EwbKzt6+F40rajL0Rxcy6b5yhu9Vngi8eZno6Tr8FEDPFpIdE/6daf+3x+Gx5ifv
8R3dxiz1Y89IKczJHYEQgKbYviZ6KPI13REHhmSBtNw5dQee5ZX8/TOrfdagaCKd
TjgkKwrem69D+HrLy5pEnlFZqncVVSzz72l0TY61EA5r2RKwFILUP0nwBfRV7jnt
8rr0mSDDd4fQNpekN1nvYDAzcOLgXc+RJIHKliHB5U33WOomhciptJSMSthd9oDb
CMQEXj8xg6BQqDOaVBpnAQcdCdl7bvjtxAwc0kgDDSuBSegaznuxfUstjHAbVsY=
=gLIR
-----END PGP SIGNATURE-----

--pf9I7BMVVzbSWLtt--