public inbox for bitcoindev@googlegroups.com
* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
       [not found] <mailman.11604.1661435396.956.bitcoin-dev@lists.linuxfoundation.org>
@ 2022-08-26  6:06 ` Ali Sherief
  0 siblings, 0 replies; 20+ messages in thread
From: Ali Sherief @ 2022-08-26  6:06 UTC
  To: rhavar; +Cc: bitcoin-dev

I think these problems can be mitigated if the CSV format is strictly defined, such as how I specified it in my previous message.

In particular, the parser has to recognize only one specific header line that has a version number somewhere, or abort. I also still insist on quoting the labels with double quotes, introducing a 3rd column with specific string or numeric types, and replacing all the special characters in the input/output with ":".

Strictly defining the CSV version and, consequently, the fields, and then specifying what kind of data the import must fail on, will limit the complexity of importers to N different switch cases, where N is the number of circulating versions of the format (for now 1).
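
As a rough illustration of the "one header line, one switch case per version" idea, here is a minimal Python sketch. It assumes the two-field header BIP-wallet-labels,<version> from my earlier message and a hypothetical version 1 layout of three fields (type, reference, label); the type names, LabelImportError, parse_v1 and import_labels are placeholders of my own, not part of any draft. It does not enforce mandatory quoting or the ":" substitution.

```python
import csv
import io

HEADER_NAME = "BIP-wallet-labels"  # header token proposed earlier in this thread
KNOWN_TYPES = {"tx", "addr", "input", "output"}  # hypothetical type names


class LabelImportError(ValueError):
    """Raised whenever the file deviates from the strictly defined format."""


def parse_v1(rows):
    """Assumed version 1 layout: exactly three fields (type, reference, label)."""
    labels = []
    for record_no, row in enumerate(rows, start=1):
        if len(row) != 3:
            raise LabelImportError(f"record {record_no}: expected 3 fields, got {len(row)}")
        item_type, ref, label = row
        if item_type not in KNOWN_TYPES:
            raise LabelImportError(f"record {record_no}: unknown type {item_type!r}")
        labels.append((item_type, ref, label))
    return labels


# One entry per circulating version of the format (for now 1).
PARSERS = {1: parse_v1}


def import_labels(text):
    """Parse a label export, aborting on anything that is not exactly as specified."""
    reader = csv.reader(io.StringIO(text), strict=True)
    try:
        header = next(reader, None)
        if header is None or len(header) != 2 or header[0] != HEADER_NAME:
            raise LabelImportError("missing or unrecognized header line")
        version_field = header[1]
        if not (version_field.isascii() and version_field.isdigit()):
            raise LabelImportError("version is not an unsigned integer")
        parser = PARSERS.get(int(version_field))
        if parser is None:
            raise LabelImportError(f"unsupported version {version_field}")
        return parser(reader)
    except csv.Error as exc:
        raise LabelImportError(f"malformed CSV: {exc}") from exc
```

Supporting a future version would then mean adding one parser function and one entry to PARSERS, while files with an unknown header or version are rejected outright instead of being guessed at.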

- Ali

On Thu, 25 Aug 2022 13:48:36 +0000, rhavar@protonmail.com wrote:
> > Not only is JSON limited to editing only through specific software or text editors, but (in the latter case) it is fragile enough that a single missing character can cause an entire file to fail parsing. CSV is more forgiving in this regard.
>
> I think quite simply: A forgiving format is not appropriate for a standard.
>
> It'd be hard to overstate how much extra and pointless effort it creates for everyone, and every implementation ends up creating its own de facto standard for what it produces and accepts. Even doing something as simple as adding an extra column will not be possible in the future because it'll break compatibility with previous parsers.
>
> I've literally worked on projects where the CSV parser has evolved to scan ahead and use heuristics to infer the "rules" of a CSV file, and then apply line-by-line heuristics to override those rules in pathological cases. That makes a bit of sense when you're trying to achieve 30 years of backwards compatibility. It doesn't make sense for much else.
>
> If your application users really like csv, then introduce an application-specific import-from-csv and export-to-csv with your own rules.
> -Ryan
>
> ------- Original Message -------
> On Thursday, August 25th, 2022 at 1:59 AM, Craig Raw <craigraw@gmail.com> wrote:
>
> > Thanks for your thoughts Ryan.
> >
> > Without reference to the quality feedback on this proposal, I was aware when submitting it for review that it provides an excellent opportunity for bike shedding. As developers, we have all experienced frustration with data formats. One thing that I did not perhaps make clear enough is that this format is not solely intended for developers, but general users who are probably not well represented on this list.
> >
> > While doing research for this proposal I spoke to several professional users of Sparrow Wallet (who are not developers). They all expressed a desire for the format to integrate with their business processes, which are driven by business tools such as Excel. Labelling provides an important function in UTXO and address management in these scenarios, and needs to be accessible and manageable outside of wallet software.
> >
> > If this is to be achieved, it immediately rules out JSON as a data format. Not only is JSON limited to editing only through specific software or text editors, but (in the latter case) it is fragile enough that a single missing character can cause an entire file to fail parsing. CSV is more forgiving in this regard. With respect to your comments on escaping, my expectation would be that developers will be using a mature CSV library rather than handling character escaping themselves. I would rather propose a format that is generally usable, even if occasionally a label is escaped incorrectly.
> >
> > Finally, I'll note that CSV files are already common and uncontroversial in Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt many others) already export addresses and/or transactions with their labels as CSV files. This proposal simply attempts to create a standard for importing and exporting all the labels in a wallet.
> >
> > Craig
> >
> > On Wed, Aug 24, 2022 at 9:01 PM <rhavar@protonmail.com> wrote:
> >
> >> I'd strongly suggest not using CSV. Especially for a standard. I've worked with it as an interchange format many times, and it's always been a clusterfuck.
> >>
> >> Right off the bat, you have stuff like "The fields may be quoted, but this is unnecessary as the first comma in the line will always be the delimiter" which invariably leads to some implementations doing it, some implementations not doing it, and others that are intolerant of the other way.
> >>
> >> And you have also made the classic mistake of not strictly defining escape rules. So everyone will pick their own (e.g. some will escape commas as \, others will not because the field is quoted and will escape quotes instead, and others will assume no escaping is required since it's the last column in a CSV).
> >>
> >> Over time it morphs into its own mini-monster that introduces so much pain.
> >>
> >> On a similar note, allowing alternatives (like: txid>index vs txid:index) provides no benefit, but creates additional work for implementations (who quite likely only test formats they produce) and future incompatibilities.
> >>
> >> I know everyone loves to hate on it, but really (line-separated?) json is the way to go.
> >>
> >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
> >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
> >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
> >>
> >> -Ryan
> >>
> >> ------- Original Message -------
> >> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I would like to propose a BIP that specifies a format for the export and import of labels from a wallet. While transferring access to funds across wallet applications has been made simple through standards such as BIP39, wallet labels remain siloed and difficult to extract despite their value, particularly in a privacy context.
> >>>
> >>> The proposed format is a simple two column CSV file, with the reference to a transaction, address, input or output in the first column, and the label in the second column. CSV was chosen for its wide accessibility, especially to users without specific technical expertise. Similarly, the CSV file may be compressed using the ZIP format, and optionally encrypted using AES.
> >>>
> >>> The full text of the BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki and also copied below.
> >>>
> >>> Feedback is appreciated.
> >>>
> >>> Thanks,
> >>> Craig Raw
> >>>
> >>> ---
> >>>
> >>> <pre>
> >>> BIP: wallet-labels
> >>> Layer: Applications
> >>> Title: Wallet Labels Export Format
> >>> Author: Craig Raw <craig@sparrowwallet.com>
> >>> Comments-Summary: No comments yet.
> >>> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> >>> Status: Draft
> >>> Type: Informational
> >>> Created: 2022-08-23
> >>> License: BSD-2-Clause
> >>> </pre>
> >>>
> >>> ==Abstract==
> >>>
> >>> This document specifies a format for the export of labels that may be attached to the transactions, addresses, inputs and outputs in a wallet.
> >>>
> >>> ==Copyright==
> >>>
> >>> This BIP is licensed under the BSD 2-clause license.
> >>>
> >>> ==Motivation==
> >>>
> >>> The export and import of funds across different Bitcoin wallet applications is well defined through standards such as BIP39, BIP32, BIP44 etc.
> >>> These standards are well supported and allow users to move easily between different wallets.
> >>> There is, however, no defined standard to transfer any labels the user may have applied to the transactions, addresses, inputs or outputs in their wallet.
> >>> The UTXO model that Bitcoin uses makes these labels particularly valuable as they may indicate the source of funds, whether received externally or as a result of change from a prior transaction.
> >>> In both cases, care must be taken when spending to avoid undesirable leaks of private information.
> >>> Labels provide valuable guidance in this regard, and have even become mandatory when spending in several Bitcoin wallets.
> >>> Allowing users to export their labels in a standardized way ensures that they do not experience lock-in to a particular wallet application.
> >>> In addition, by using common formats, this BIP seeks to make manual or bulk management of labels accessible to users without specific technical expertise.
> >>>
> >>> ==Specification==
> >>>
> >>> In order to make the import and export of labels as widely accessible as possible, this BIP uses the comma separated values (CSV) format, which is widely supported by consumer, business, and scientific applications.
> >>> Although the technical specification of CSV in RFC4180 is not always followed, the application of the format in this BIP is simple enough that compatibility should not present a problem.
> >>> Moreover, the simplicity and forgiving nature of CSV (over for example JSON) lends itself well to bulk label editing using spreadsheet and text editing tools.
> >>>
> >>> A CSV export of labels from a wallet must be a UTF-8 encoded text file, containing one record per line, with records containing two fields delimited by a comma.
> >>> The fields may be quoted, but this is unnecessary, as the first comma in the line will always be the delimiter.
> >>> The first line in the file is a header, and should be ignored on import.
> >>> Thereafter, each line represents a record that refers to a label applied in the wallet.
> >>> The order in which these records appear is not defined.
> >>>
> >>> The first field in the record contains a reference to the transaction, address, input or output in the wallet.
> >>> This is specified as one of the following:
> >>> * Transaction ID (<tt>txid</tt>)
> >>> * Address
> >>> * Input (rendered as <tt>txid<index</tt>)
> >>> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> >>>
> >>> The second field contains the label applied to the reference.
> >>> Exporting applications may omit records with no labels or labels of zero length.
> >>> Files exported should use the <tt>.csv</tt> file extension.
> >>>
> >>> In order to reduce file size while retaining wide accessibility, the CSV file may be compressed using the ZIP file format, using the <tt>.zip</tt> file extension.
> >>> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or AES-256 encryption, which is supported by numerous applications including Winzip and 7-zip.
> >>> In order to ensure that weak encryption does not proliferate, importers following this standard must refuse to import <tt>.zip</tt> files encrypted with the weaker Zip 2.0 standard.
> >>> The textual representation of the wallet's extended public key (as defined by BIP32, with an <tt>xpub</tt> header) should be used as the password.
> >>>
> >>> ==Importing==
> >>>
> >>> When importing, a naive algorithm may simply match against any reference, but it is possible to disambiguate between transactions, addresses, inputs and outputs.
> >>> For example in the following pseudocode:
> >>> <pre>
> >>>   if reference length < 64
> >>>     Set address label
> >>>   else if reference length == 64
> >>>     Set transaction label
> >>>   else if reference contains '<'
> >>>     Set input label
> >>>   else
> >>>     Set output label
> >>> </pre>
> >>>
> >>> Importing applications may truncate labels if necessary.
> >>>
> >>> ==Test Vectors==
> >>>
> >>> The following fragment represents a wallet label export:
> >>> <pre>
> >>> Reference,Label
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> >>> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> >>> </pre>
> >>>
> >>> ==Reference Implementation==
> >>>
> >>> TBD




* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24  9:18 Craig Raw
                   ` (3 preceding siblings ...)
  2022-08-29 19:52 ` NVK
@ 2022-09-26  8:23 ` Craig Raw
  4 siblings, 0 replies; 20+ messages in thread
From: Craig Raw @ 2022-09-26  8:23 UTC
  To: Bitcoin Protocol Discussion


Following discussion with several wallet developers, I have come to the
conclusion that the secondary goal of managing labels in non-specialized
applications must be sacrificed in order to achieve the primary goal of
wide implementation across different wallets. While this tradeoff was
perhaps inevitable, it was worth a try!

As such I have rewritten the specification to use JSON, specifically the
JSON Lines format suggested by Ryan Havar and others (thank you). This
allows documents to be split or streamed, and is convenient for
command-line processing. The format is also now self describing via a type
field, permitting simple type identification (thank you Ali Sherief and
others). Public keys and xpubs have been added as types following further
suggestions. To keep the specification simple, compression and encryption
have been removed - with the strong recommendation to consider protecting
the data in a way suitable to its application.

The rewritten BIP can be found at
https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki

It is perhaps simplest to understand it by looking at an example export:

{ "type": "tx", "ref": "f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd", "label": "Transaction" }
{ "type": "addr", "ref": "bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c", "label": "Address" }
{ "type": "pubkey", "ref": "0283409659355b6d1cc3c32decd5d561abaac86c37a353b52895a5e6c196d6f448", "label": "Public Key" }
{ "type": "input", "ref": "f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd:0", "label": "Input" }
{ "type": "output", "ref": "f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd:1", "label": "Output" }
{ "type": "xpub", "ref": "xpub661MyMwAqRbcFtXgS5sYJABqqG9YLmC4Q1Rdap9gSE8Nq...", "label": "Extended Public Key" }
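
To give a feel for how little code an importer might need, here is a minimal Python sketch of reading such an export. It assumes one JSON object per line (as JSON Lines requires), that blank lines can be skipped, and that unknown types should be rejected; parse_labels and split_outpoint are hypothetical helper names for illustration and are not defined by the BIP.

```python
import json

# Type names taken from the example export above.
KNOWN_TYPES = {"tx", "addr", "pubkey", "input", "output", "xpub"}


def parse_labels(text):
    """Parse a JSON Lines label export into a list of (type, ref, label) tuples."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        obj = json.loads(line)  # raises json.JSONDecodeError (a ValueError) on bad JSON
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        item_type = obj.get("type")
        if not isinstance(item_type, str) or item_type not in KNOWN_TYPES:
            raise ValueError(f"line {line_no}: unknown type {item_type!r}")
        ref, label = obj.get("ref"), obj.get("label")
        if not isinstance(ref, str) or not isinstance(label, str):
            raise ValueError(f"line {line_no}: ref and label must be strings")
        records.append((item_type, ref, label))
    return records


def split_outpoint(ref):
    """Split an input or output reference of the form txid:index."""
    txid, _, index = ref.rpartition(":")
    if len(txid) != 64 or not (index.isascii() and index.isdigit()):
        raise ValueError(f"not a txid:index reference: {ref!r}")
    return txid, int(index)
```

Because each line is parsed on its own, a damaged record can be reported by line number without affecting the rest of the file, and the explicit type field means the importer never has to guess what a reference points to.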

Feedback is always appreciated.

Craig


On Wed, Aug 24, 2022 at 11:18 AM Craig Raw <craigraw@gmail.com> wrote:

> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to
> a transaction, address, input or output in the first column, and the label
> in the second column. CSV was chosen for its wide accessibility, especially
> to users without specific technical expertise. Similarly, the CSV file may
> be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at
> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
>   BIP: wallet-labels
>   Layer: Applications
>   Title: Wallet Labels Export Format
>   Author: Craig Raw <craig@sparrowwallet.com>
>   Comments-Summary: No comments yet.
>   Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>   Status: Draft
>   Type: Informational
>   Created: 2022-08-23
>   License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be
> attached to the transactions, addresses, inputs and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet
> applications is well defined through standards such as BIP39, BIP32, BIP44
> etc.
> These standards are well supported and allow users to move easily between
> different wallets.
> There is, however, no defined standard to transfer any labels the user may
> have applied to the transactions, addresses, inputs or outputs in their
> wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable
> as they may indicate the source of funds, whether received externally or as
> a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
> Labels provide valuable guidance in this regard, and have even become
> mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that
> they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or
> bulk management of labels accessible to users without specific technical
> expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as
> possible, this BIP uses the comma separated values (CSV) format, which is
> widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always
> followed, the application of the format in this BIP is simple enough that
> compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example
> JSON) lends itself well to bulk label editing using spreadsheet and text
> editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> containing one record per line, with records containing two fields
> delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied
> in the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction,
> address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero
> length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> or AES-256 encryption, which is supported by numerous applications
> including Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers
> following this standard must refuse to import <tt>.zip</tt> files encrypted
> with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD
>
>
>



* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-29 11:25   ` Craig Raw
@ 2022-09-21  6:07     ` Hugo Nguyen
  0 siblings, 0 replies; 20+ messages in thread
From: Hugo Nguyen @ 2022-09-21  6:07 UTC
  To: Craig Raw, Bitcoin Protocol Discussion; +Cc: Ali Sherief


Hello Craig,
Thank you for putting this proposal together. It is indeed another big
missing piece of the puzzle.

I would like to echo some of the comments already made by others (and you
yourself) on this thread, that this proposal seems to have some inherent
conflicts between the 2 goals it tries to achieve.

> *Allowing users to import and export their labels in a standardized way
ensures that they do not experience lock-in to a particular wallet
application. As a secondary goal, by using common formats this BIP seeks to
make manual or bulk management of labels accessible to users outside of
wallet applications and without specific technical expertise.*

IMHO, these conflicts exist because the first one is an
engineering requirement, while the second one is a UX / product requirement.

Engineering requirements typically prioritize data integrity,
reliability/robustness and performance. Do we want some sort of error
detection / correction codes? What data format would be the most robust and
least error-prone? Is CSV a good fit or not for this purpose? etc.

UX requirements, on the other hand, typically prioritize convenience and
ease of use.

When we don’t separate these concerns it can backfire and we might end up
with a Frankenstein standard that is the worst of both worlds. That is: not
quite robust in engineering terms, but also not quite user-friendly in
product terms either.

SLIP-132 is one such example. It tries to solve what are inherently
engineering challenges — how to manage the complexities that arose due to
the evolution of keys and scripts — by sadly offloading those complexities
onto the end users. The end result is user confusion (what kind of [?]PUB
do I need here?) and a nightmare for engineers to maintain (the
complexities are better managed via a high level language such as Output
Descriptors).

Keeping this in mind, I also think having 2 separate BIPs for this is
better.

Cheers,
Hugo




On Mon, Aug 29, 2022 at 4:26 AM Craig Raw via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Thanks for your feedback @Ali.
>
> I am attempting to achieve two goals with this proposal, primarily for the
> benefit of wallet users:
>
> Goal #1. Transfer labels between different wallet implementations
> Goal #2. Manage labels in applications outside of Bitcoin wallets (such as
> Excel)
>
> Much of the feedback so far has indicated the tension between these two
> goals - it may be that it is too difficult to achieve both, in which case
> Goal #1 is the most important. That said, I think further exploration is
> still necessary before abandoning Goal #2, because removing it would
> significantly reduce the value of this proposal and mean users need to rely
> on application-specific workarounds.
>
> > it is important that a version byte is defined
> If Goal #2 is to be achieved it's difficult to mandate this, particularly
> if one requires bit flags to be set. Should an importing wallet fail to
> import if the version byte is not present, even if all the data is
> otherwise correct? Although it is difficult to know in advance how a format
> may be extended, it is certainly possible to extend this format with
> additional types where the nature of hashes serves as a unique identifier
> (more on this below).
>
>  > Don't mandate the file extension... There is no way to enforce this on
> a BIP level.
> I'm not quite sure what you mean here - for example BIP174, which is
> widely used, states "Binary PSBT files should use the .psbt file
> extension." Also, this contradicts Goal #2 - Excel and Numbers register as
> handlers for .csv, and so make it clear that the file is editable outside
> of a wallet.
>
> > ZIP does not have good performance or compression ratio
> Indeed, but it is very widely available. That said, gzip is supported
> widely too these days. Unfortunately, gzip does not offer encryption (see
> next answer).
>
> > ZIP is an archiving format, that happens to have its own compression
> format.
> I agree this is not ideal. My main reason for choosing ZIP was that it
> supports encryption. It seems to me that without considering encryption, an
> application must create label export files that allow privacy-sensitive
> wallet information to be readable in plain text. Being able to transfer
> labels without risking privacy is IMO valuable. I considered other
> encryption formats such as PGP, but they are much more niche and so again
> contradict Goal #2.
>
> > I don't see the benefit of encrypting addresses and labels together...
> additionally, the password you propose is insecure - anybody with access to
> the wallet can unlock it
> I'm not sure I understand your question, but both wallet addresses and
> wallet labels contain privacy-sensitive information that should be
> protected. With respect to the password, there is actually a more fundamental
> problem with using the wallet xpub - there is no equivalent for multisig
> wallets. For this reason I'll remove that requirement in future iterations.
>
> > Why the need for input and output formats? There is no difference
> between them on the wallet level, because they are always identified with a
> txid and output index.
> The input refers to the txid and the input index (in the set of vin), so
> the difference is the context in which they are displayed. A wallet will
> not necessarily store the spent outputs for a funding transaction
> containing a UTXO coming into the wallet, but it will contain references to
> the inputs as part of that transaction.
>
> > Another important point is that practically nobody labels inputs or
> outputs
> To the contrary, UTXOs are very frequently labelled, as they link and
> reveal information when spent. Inputs are much less frequently labelled,
> but there is no particular reason to exclude them.
>
> > there is a net benefit for the addresses to be exported in ascending
> order
> Indeed, but it makes achieving Goal #2 much more difficult for marginal
> benefit.
>
> > It's better to mandate that they should always be double-quoted, since
> only wallets will generate label exports anyway.
> Rather I think it's better to mandate RFC4180 is followed, as per
> recommendations in other feedback.
>
> > The importing code is too naive... it should utilize a dedicated item
> type field that unambiguously identifies the item
> It's unclear to me what you mean here. As I've indicated it is currently
> possible to disambiguate between addresses/transactions/etc without the
> need for a 3rd column, but in any case the hash functions used ensure that
> labels will not be associated incorrectly. Even in the unlikely event of
> some future address type being indistinguishable from a txid, it will
> simply not match any txids in the wallet.
>
> Craig
>
>
>
> On Wed, Aug 24, 2022 at 9:10 PM Ali Sherief <ali@notatether.com> wrote:
>
>> Hi Craig,
>>
>> This is a really good proposal. I studied your BIP and I have some feedback
>> on some parts of it.
>>
>> > The first line in the file is a header, and should be ignored on import.
>>
>> From past experience and lessons, most notably BIP39, it is important
>> that a version byte is defined somewhere in case someone wants to extend
>> the format in the future. Currently there is no version byte that can be
>> incremented for such an extension. In the specific case of CSV files,
>> you should make the header line mandatory (I see you have already implied
>> this, but you should make it explicit in the BIP), but instead of a line
>> of column names such as Reference,Label, I suggest you make the header
>> look like this:
>>
>> BIP-wallet-labels,<version>
>>
>> Since there are two columns per record, this works out nicely. The first
>> column can be the name of the BIP - BIPxxxx where the x's are numbers, and
>> the second column can be an unsigned 32-bit integer (most significant 8
>> bits reserved for version, the remaining for flags, or perhaps the entirety
>> for version - but I recommend leaving at least some bits for flags, even if
>> they all end up being just "reserved").
>>
>> You should make importing fail if the header line is not exactly as
>> specified - or as appropriate, should you decide on a different format for
>> the header.
>>
>> > Files exported should use the <tt>.csv</tt> file extension.
>> Don't mandate the file extension (read below for why):
>>
>> > In order to reduce file size while retaining wide accessibility, the CSV
>> > file may be compressed using the ZIP file format, using the
>> <tt>.zip</tt>
>> > file extension.
>> I see three problems with this. The first is more important than the
>> latter two because it makes them moot points, but I'll mention them anyway
>> so you get a background of the situation:
>> - The BIP is trying to specify the file format in which the export is
>> written to the filesystem. There is no way to enforce this on a BIP
>> level (besides, Unix operating systems don't even consider the file
>> extension, they use its mimetype). Also specifying this in the BIP will
>> prevent modular "Layer 2" protocols and schemes from encoding the export
>> labels into another format - for example Base64 or with their own
>> compression algorithm.
>>
>> Now for the two "moot problems":
>> - ZIP does not have good performance or compression ratio, there are
>> better algorithms out there like gzip (which also happens to be more
>> ubiquitous; nearly all websites are serving HTML compressed with gzip
>> compression).
>> - ZIP is an archiving format, that happens to have its own compression
>> format. Archiving format parsers can have serious vulnerabilities in their
>> implementation that can allow malware to swipe private keys and passwords,
>> since the primary target for this BIP is wallets. For example, there was
>> Zip Slip[1] in 2018, which allows for remote code execution. So the malware
>> can even hide in memory until private keys or passwords are written to
>> memory, then send them across the network. Assuming it's targeting a
>> specific wallet software it's not hard to carry out at all.
>>
>> There are two solutions for all this:
>> 1. The duck-tape solution: Use some compression algorithm like gzip
>> instead of ZIP archive format.
>> 2. The "throw it out and buy a new one" solution: Get rid of the optional
>> compression specs altogether, because users are responsible for supplying
>> the export labels in the first place, so all the compression stuff is
>> redundant and should be left up to the user to apply if they wish.
>>
>> I prefer the second solution because it addresses the problem
>> directly instead of putting duct tape on it like the first one.
>>
>> > This <tt>.zip</tt> file may optionally be encrypted using either
>> AES-128 or
>> > AES-256 encryption, which is supported by numerous applications
>> including
>> > Winzip and 7-zip.
>> > The textual representation of the wallet's extended public key (as
>> defined
>> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>> Not specific to AES, but I don't see the benefit of encrypting addresses
>> and labels together. Can you please elaborate why this would be desirable?
>>
>> Like I said though, it's better to leave it up to users to decide how to
>> store their exports, since BIPs can't enforce that anyway (additionally,
>> the password you propose is insecure - anybody with access to the wallet
>> can unlock it, which is not desirable to some users who want their own
>> security).
>>
>> > * Transaction ID (<tt>txid</tt>)
>> > * Address
>> > * Input (rendered as <tt>txid<index</tt>)
>> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>> Why the need for input and output formats? There is no difference between
>> them on the wallet level, because they are always identified with a txid
>> and output index. To distinguish between them and hence write them with the
>> correct format would require a UTXO set and thus access to a full node,
>> otherwise the CSV cannot be verified to be completely well-formed.
>>
>> Another important point is that practically nobody labels inputs or
>> outputs because most people do not know that those things even exist, and
>> the rest don't bother to label them.
>>
>> But the biggest downside to including them is related to the problem of
>> information leaking which you make reference to here:
>> > In both cases, care must be taken when spending to avoid undesirable
>> leaks
>> > of private information.
>> A CSV dump that has inputs/outputs and addresses mixed together can reveal
>> the owner of all those items. In fact, a CSV label dump is basically a
>> personal information store so everything in it can be correlated as coming
>> from the same wallet, so it's important that unnecessary types are kept out
>> of the format. People are known to leave files lying around on their
>> computer that they don't need anymore, so these files can find their way
>> via telemetry to surveillance entities. While we can't specify what users
>> can do with their exports, we can control the information leak by
>> preventing certain types of items that we know most users will never use
>> from being exported in the first place.
>>
>> > The order in which these records appear is not defined.
>> Again, since the primary use case for this BIP is wallets, which likely
>> use hierarchical derivation schemes like BIP44, there is a net benefit for
>> the addresses to be exported in ascending order of their `address_index`. It
>> means that wallets can import them in O(n) time as opposed to O(n^2) time
>> spent serially checking at which index each address appears. Of course,
>> this implies that all addresses up to a certain index have to be exported
>> into the CSV as well, but most wallets I know of like Core, Electrum
>> already store addresses like that.
>>
>> Also if you do this, you will need to group all the transaction records
>> before the address records or vice versa - you can use lexicographical
>> sorting if you want (ie. Addresses before Transactions). The benefit of
>> this separation of parts is that wallets can split the imported address
>> records from the transaction records internally, and feed them to separate
>> functions which set these labels internally.
>>
>> If you decide on doing it this way, then you need a 3rd column to
>> identify the item type, and also you should quote the label (see below). I
>> strongly recommend using numbers for identification as opposed to character
>> strings, so you don't have to worry about localization or character case
>> issues. There is always one unique number, but there could be multiple
>> strings that reference the same type. This will complicate importing
>> functions.
>>
>> If you insist on including Input and Output types then they can both be
>> specified as <txid>:<index> if you make this change. They won't be used to
>> determine the type anyway.
>>
>> > The fields may be quoted, but this is unnecessary, as the first comma in
>> > the line will always be the delimiter.
>> Don't implement it like that, because that will break CSV parsers which
>> expect a fixed number of fields in each record (2 in the header, while some
>> records would have more than 2). It's better to mandate that they should always be
>> double-quoted, since only wallets will generate label exports anyway. If
>> you plan to use headers then the 3rd column can be blank for it (or you can
>> split the version and flags from each other).
>>
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any
>> reference,
>> > but it is possible to disambiguate between transactions, addresses,
>> inputs
>> > and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> >   if reference length < 64
>> >     Set address label
>> >   else if reference length == 64
>> >     Set transaction label
>> >   else if reference contains '<'
>> >     Set input label
>> >   else
>> >     Set output label
>> > </pre>
>> The importing code is too naive and in its current form will prevent the
>> BIP from getting a number. It is perhaps the single most important part of
>> a BIP. When implementing an importer, it should utilize a dedicated item
>> type field that unambiguously identifies the item. So the naive importer is
>> not good; you need to use a 3rd column for that like I explained above, so
>> that the importer becomes robust.
>>
>> In summary (exclamation marks indicate severity - one means low, two
>> means medium, and three means high):
>>
>> 1. Convert the header into a version line with optional flags, otherwise
>> nobody can extend this format without compatibility issues (!)
>> 2. Get rid of the specs related to file compression (!!!)
>> 3. Add a 3rd column for item type (address, transaction etc.) preferably
>> as numeric constants and grouping items of one type after items of another
>> type, or if you insist on strings, then only recognize their Titlecase
>> ASCII versions <spreadsheet software like Excel always tries to titlecase
>> the words> (!!)
>> 4. Require double quotes around the label (or single quotes if you
>> prefer, as long as spreadsheet software doesn't choke on them) (!!)
>> 5. Require sorting the records according to the order they are stored in
>> the wallet implementation. (!)
>> 6. Consider getting rid of Input and Output item types. (!)
>> 7. And last and most importantly, please write a more robust importer
>> algorithm in the example given by the BIP, because code in BIPs is
>> frequently used as references for software. (!!!)
>>
>> I hope you will consider these points in future revisions of your BIP.
>>
>> - Ali
>>
>> [1] https://github.com/snyk/zip-slip-vulnerability
>>
>> On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
>> > Hi all,
>> >
>> > I would like to propose a BIP that specifies a format for the export and
>> > import of labels from a wallet. While transferring access to funds
>> across
>> > wallet applications has been made simple through standards such as
>> BIP39,
>> > wallet labels remain siloed and difficult to extract despite their
>> value,
>> > particularly in a privacy context.
>> >
>> > The proposed format is a simple two column CSV file, with the reference
>> to
>> > a transaction, address, input or output in the first column, and the
>> label
>> > in the second column. CSV was chosen for its wide accessibility,
>> especially
>> > to users without specific technical expertise. Similarly, the CSV file
>> may
>> > be compressed using the ZIP format, and optionally encrypted using AES.
>> >
>> > The full text of the BIP can be found at
>> >
>> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
>> > and also copied below.
>> >
>> > Feedback is appreciated.
>> >
>> > Thanks,
>> > Craig Raw
>> >
>> > ---
>> >
>> > <pre>
>> >   BIP: wallet-labels
>> >   Layer: Applications
>> >   Title: Wallet Labels Export Format
>> >   Author: Craig Raw <craig@sparrowwallet.com>
>> >   Comments-Summary: No comments yet.
>> >   Comments-URI:
>> > https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>> >   Status: Draft
>> >   Type: Informational
>> >   Created: 2022-08-23
>> >   License: BSD-2-Clause
>> > </pre>
>> >
>> > ==Abstract==
>> >
>> > This document specifies a format for the export of labels that may be
>> > attached to the transactions, addresses, input and outputs in a wallet.
>> >
>> > ==Copyright==
>> >
>> > This BIP is licensed under the BSD 2-clause license.
>> >
>> > ==Motivation==
>> >
>> > The export and import of funds across different Bitcoin wallet
>> applications
>> > is well defined through standards such as BIP39, BIP32, BIP44 etc.
>> > These standards are well supported and allow users to move easily
>> between
>> > different wallets.
>> > There is, however, no defined standard to transfer any labels the user
>> may
>> > have applied to the transactions, addresses, inputs or outputs in their
>> > wallet.
>> > The UTXO model that Bitcoin uses makes these labels particularly
>> valuable
>> > as they may indicate the source of funds, whether received externally
>> or as
>> > a result of change from a prior transaction.
>> > In both cases, care must be taken when spending to avoid undesirable
>> leaks
>> > of private information.
>> > Labels provide valuable guidance in this regard, and have even become
>> > mandatory when spending in several Bitcoin wallets.
>> > Allowing users to export their labels in a standardized way ensures that
>> > they do not experience lock-in to a particular wallet application.
>> > In addition, by using common formats, this BIP seeks to make manual or
>> bulk
>> > management of labels accessible to users without specific technical
>> > expertise.
>> >
>> > ==Specification==
>> >
>> > In order to make the import and export of labels as widely accessible as
>> > possible, this BIP uses the comma separated values (CSV) format, which
>> is
>> > widely supported by consumer, business, and scientific applications.
>> > Although the technical specification of CSV in RFC4180 is not always
>> > followed, the application of the format in this BIP is simple enough
>> that
>> > compatibility should not present a problem.
>> > Moreover, the simplicity and forgiving nature of CSV (over for example
>> > JSON) lends itself well to bulk label editing using spreadsheet and text
>> > editing tools.
>> >
>> > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
>> > containing one record per line, with records containing two fields
>> > delimited by a comma.
>> > The fields may be quoted, but this is unnecessary, as the first comma in
>> > the line will always be the delimiter.
>> > The first line in the file is a header, and should be ignored on import.
>> > Thereafter, each line represents a record that refers to a label
>> applied in
>> > the wallet.
>> > The order in which these records appear is not defined.
>> >
>> > The first field in the record contains a reference to the transaction,
>> > address, input or output in the wallet.
>> > This is specified as one of the following:
>> > * Transaction ID (<tt>txid</tt>)
>> > * Address
>> > * Input (rendered as <tt>txid<index</tt>)
>> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>> >
>> > The second field contains the label applied to the reference.
>> > Exporting applications may omit records with no labels or labels of zero
>> > length.
>> > Files exported should use the <tt>.csv</tt> file extension.
>> >
>> > In order to reduce file size while retaining wide accessibility, the CSV
>> > file may be compressed using the ZIP file format, using the
>> <tt>.zip</tt>
>> > file extension.
>> > This <tt>.zip</tt> file may optionally be encrypted using either
>> AES-128 or
>> > AES-256 encryption, which is supported by numerous applications
>> including
>> > Winzip and 7-zip.
>> > In order to ensure that weak encryption does not proliferate, importers
>> > following this standard must refuse to import <tt>.zip</tt> files
>> encrypted
>> > with the weaker Zip 2.0 standard.
>> > The textual representation of the wallet's extended public key (as
>> defined
>> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>> >
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any
>> reference,
>> > but it is possible to disambiguate between transactions, addresses,
>> inputs
>> > and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> >   if reference length < 64
>> >     Set address label
>> >   else if reference length == 64
>> >     Set transaction label
>> >   else if reference contains '<'
>> >     Set input label
>> >   else
>> >     Set output label
>> > </pre>
>> >
>> > Importing applications may truncate labels if necessary.
>> >
>> > ==Test Vectors==
>> >
>> > The following fragment represents a wallet label export:
>> > <pre>
>> > Reference,Label
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
>> > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
>> > </pre>
>> >
>> > ==Reference Implementation==
>> >
>> > TBD
>>
>> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 28465 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24  9:18 Craig Raw
                   ` (2 preceding siblings ...)
  2022-08-24 19:01 ` rhavar
@ 2022-08-29 19:52 ` NVK
  2022-09-26  8:23 ` Craig Raw
  4 siblings, 0 replies; 20+ messages in thread
From: NVK @ 2022-08-29 19:52 UTC (permalink / raw)
  To: Craig Raw, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 7339 bytes --]

Hello,

Thanks for this proposal.

I was trying to avoid adding more opinions / bike-shedding to the discussion and didn't want to pick at any particular thread.

But I think I'd like to at least voice how important it is to have a human-readable format for this. CSV is indeed a format with many shortcomings, but so are most cross-application open formats that are human readable. I go through this every month for business and personal accounting.

If contention is too high for CSV as the cross-application format for import/export, then the route of two file formats may be awkward but necessary. JSON may be used as the choice for bitcoin clients for label syncing and CSV as the export for other purposes. I believe CSV is importable by most accounting software, old and new. JSON is not.

In regards to encryption, AES on 7z is great, with wide native OS support.

Best,

NVK

> On Aug 24, 2022, at 05:46, Craig Raw via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> Hi all,
> 
> I would like to propose a BIP that specifies a format for the export and import of labels from a wallet. While transferring access to funds across wallet applications has been made simple through standards such as BIP39, wallet labels remain siloed and difficult to extract despite their value, particularly in a privacy context.
> 
> The proposed format is a simple two column CSV file, with the reference to a transaction, address, input or output in the first column, and the label in the second column. CSV was chosen for its wide accessibility, especially to users without specific technical expertise. Similarly, the CSV file may be compressed using the ZIP format, and optionally encrypted using AES.
> 
> The full text of the BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki and also copied below.
> 
> Feedback is appreciated.
> 
> Thanks,
> Craig Raw
> 
> ---
> 
> <pre>
>   BIP: wallet-labels
>   Layer: Applications
>   Title: Wallet Labels Export Format
>   Author: Craig Raw <craig@sparrowwallet.com>
>   Comments-Summary: No comments yet.
>   Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>   Status: Draft
>   Type: Informational
>   Created: 2022-08-23
>   License: BSD-2-Clause
> </pre>
> 
> ==Abstract==
> 
> This document specifies a format for the export of labels that may be attached to the transactions, addresses, input and outputs in a wallet.
> 
> ==Copyright==
> 
> This BIP is licensed under the BSD 2-clause license.
> 
> ==Motivation==
> 
> The export and import of funds across different Bitcoin wallet applications is well defined through standards such as BIP39, BIP32, BIP44 etc.
> These standards are well supported and allow users to move easily between different wallets.
> There is, however, no defined standard to transfer any labels the user may have applied to the transactions, addresses, inputs or outputs in their wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable as they may indicate the source of funds, whether received externally or as a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks of private information.
> Labels provide valuable guidance in this regard, and have even become mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or bulk management of labels accessible to users without specific technical expertise.
> 
> ==Specification==
> 
> In order to make the import and export of labels as widely accessible as possible, this BIP uses the comma separated values (CSV) format, which is widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always followed, the application of the format in this BIP is simple enough that compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example JSON) lends itself well to bulk label editing using spreadsheet and text editing tools. 
> 
> A CSV export of labels from a wallet must be a UTF-8 encoded text file, containing one record per line, with records containing two fields delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied in the wallet.
> The order in which these records appear is not defined.
> 
> The first field in the record contains a reference to the transaction, address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> 
> The second field contains the label applied to the reference. 
> Exporting applications may omit records with no labels or labels of zero length.
> Files exported should use the <tt>.csv</tt> file extension.
> 
> In order to reduce file size while retaining wide accessibility, the CSV file may be compressed using the ZIP file format, using the <tt>.zip</tt> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or AES-256 encryption, which is supported by numerous applications including Winzip and 7-zip. 
> In order to ensure that weak encryption does not proliferate, importers following this standard must refuse to import <tt>.zip</tt> files encrypted with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined by BIP32, with an <tt>xpub</tt> header) should be used as the password.
> 
> ==Importing==
> 
> When importing, a naive algorithm may simply match against any reference, but it is possible to disambiguate between transactions, addresses, inputs and outputs. 
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
> 
> Importing applications may truncate labels if necessary.
> 
> ==Test Vectors==
> 
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
> 
> ==Reference Implementation==
> 
> TBD
> 
> 
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

[-- Attachment #2: Type: text/html, Size: 8584 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-29 15:46 ` Ali Sherief
@ 2022-08-29 18:19   ` Christopher Allen
  0 siblings, 0 replies; 20+ messages in thread
From: Christopher Allen @ 2022-08-29 18:19 UTC (permalink / raw)
  To: Ali Sherief, Bitcoin Protocol Discussion; +Cc: Nicholas Ochiel

[-- Attachment #1: Type: text/plain, Size: 3693 bytes --]

On Mon, Aug 29, 2022 at 9:12 AM Ali Sherief via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> I am aware that business processes are mostly CSV file oriented


I disagree that business processes are mostly CSV.  Amateur processes
maybe, but professional accounting, no. Trying to do my business accounting
with CSV files from various exchanges is a PITA.


> so you can make a statement akin to BIP174 in the Goal 2 BIP, that expects
> the medium of exchange to be in files ending in .csv. I wouldn't mind if
> you require .csv file extension in a BIP for Goal 2. But such a statement
> is not appropriate in the Goal 1 BIP which is only concerned with the
> wallet label format itself.


I too would like to see some separation of layers here, as there are other
possible output formats. Maybe expanding on another use case for this data
would help.

I've been working with @nochiel <https://github.com/nochiel> on export to a
Plain-Text Accounting <https://plaintextaccounting.org> friendly format,
initially for the beancount python app <https://github.com/beancount/beancount/>
(our prototype is currently at /beancounter.py but it is being refactored
into a new repo).

Basically what the final tool will do is: given a descriptor, get any
transactions for that descriptor from a random Esplora via Tor (initially
ours and Blockstream's), and then get price information from a random
Spotbit price server via Tor (initially just ours, but seeking more hosts),
and export a beancount compatible file.

```

python app.py beancount "wpkh(tpubD9hudZxy8Uj3453QrsEbr8KiyXTYC5ExHjJ5sNDVW7yKJ8wc7acKQcpdbvZX6dFerHK6MfVvs78VvGfotjN28yC4ij6nr4uSVhX2qorUV8V/0/*)"
Outputs: spotbit.beancount

2008-10-31 commodity BTC
  name: "Bitcoin"
  asset-class: "cryptocurrency"

2018-04-02 open Assets:BTC BTC
2018-04-02 open Liabilities:Cash:USDT USDT

2018-04-02 * "tb1qcrekknrspx28t9vl53ltsag5gqgqdj066ydf75" "Transaction hash: 2a2f7f24761fa54cb6e559efea5678415d9cbbabc42e6a4e2ce463ee3c446230"
Assets:BTC 1.00000000 BTC { 6935.16 USDT }
Liabilities:Cash:USDT - 6935.16 USDT

2018-04-02 * "tb1q45whzx3emntntnpzjdx3gzj6z5cgxakkg7s3sa" "Transaction hash: 387123efcaa707759a4af8159cb1309fae86b793d26b5fd8bba42637852dde89"
Assets:BTC - 0.36300616 BTC @  6935.16 USDT
Liabilities:Cash:USDT 2517.51 USDT

2018-04-02 * "tb1qgv5484m83e2mzz3n8tf4snvnwj5qgqgampnhvv" "Transaction hash: 387123efcaa707759a4af8159cb1309fae86b793d26b5fd8bba42637852dde89"
Assets:BTC - 0.63699243 BTC @  6935.16 USDT
Liabilities:Cash:USDT 4417.64 USDT

2018-04-02 * "tb1q45whzx3emntntnpzjdx3gzj6z5cgxakkg7s3sa" "Transaction hash: 387123efcaa707759a4af8159cb1309fae86b793d26b5fd8bba42637852dde89"
Assets:BTC 0.36300616 BTC { 6935.16 USDT }
Liabilities:Cash:USDT - 2517.51 USDT
```

I can then use the beancount cli app (or its fava webapp) to easily add
other details to this file to do my bitcoin accounting (and any other
accounting I need). In particular, as beancount supports lots, it solves a
problem for me with US taxes which is unrealized capital gain (I get 1 BTC
from a donor at $20K, the price goes up to $30K and I pay it to an engineer,
my BTC balance is 0 but my unrealized capital gain for US tax purposes is
$10K).

More ideally, if there were additional details that I could merge in from
my wallet export, such as payer and payee, notes, etc. it would make my
accounting much easier.

Thus I'd like to see an easier and interoperable way to merge these details
(my account details from an Esplora and price details from Spotbit), with
what my different wallets may (or may not) have available.

I hope that this might inspire some ideas from the people working on this
wallet export format.

-- Christopher Allen

[-- Attachment #2: Type: text/html, Size: 5072 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
       [not found] <mailman.13106.1661772392.956.bitcoin-dev@lists.linuxfoundation.org>
@ 2022-08-29 15:46 ` Ali Sherief
  2022-08-29 18:19   ` Christopher Allen
  0 siblings, 1 reply; 20+ messages in thread
From: Ali Sherief @ 2022-08-29 15:46 UTC (permalink / raw)
  To: craigraw; +Cc: bitcoin-dev

> I am attempting to achieve two goals with this proposal, primarily for the
> benefit of wallet users:
>
> Goal #1. Transfer labels between different wallet implementations
> Goal #2. Manage labels in applications outside of Bitcoin wallets (such as
> Excel)
>
> Much of the feedback so far has indicated the tension between these two
> goals - it may be that it is too difficult to achieve both, in which case
> Goal #1 is the most important. That said, I think further exploration is
> still necessary before abandoning Goal #2, because removing it would
> significantly reduce the value of this proposal and mean users need to rely
> on application-specific workarounds.
In my opinion, it would be best if these two goals were split into two separate BIPs where the BIP for Goal 2 requires Goal 1's BIP, but Goal 1's BIP is independent. This is because wallet software and business spreadsheet processes have different and in some cases divergent needs.

A BIP shouldn't try to address too many things at once; that's why technologies like Segwit and Taproot were split into several BIPs each.

>  > Don't mandate the file extension... There is no way to enforce this on a
> BIP level.
> I'm not quite sure what you mean here - for example BIP174, which is widely
> used, states "Binary PSBT files should use the .psbt file extension." Also,
> this contradicts Goal #2 - Excel and Numbers register as handlers for .csv,
> and so make it clear that the file is editable outside of a wallet.
BIP174's wording is a recommendation, not a hard requirement, because if you have a file whose extension implies one type, but its MIME type (obtained from inspecting the file contents) indicates another type, then the extension should be disregarded by the parser.
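
To illustrate the point about contents versus extension, here is a minimal Python sketch. It assumes that the draft's `Reference,Label` header line is what identifies a label export; the function name `looks_like_label_export` is hypothetical, and the filename is deliberately never consulted.

```python
# Header line as it appears in the draft's test vectors (an assumption that
# this is what an importer would key on).
EXPECTED_HEADER = "Reference,Label"


def looks_like_label_export(data: bytes) -> bool:
    """Decide from the file contents alone whether this is a label export."""
    try:
        text = data.decode("utf-8")  # the draft requires UTF-8
    except UnicodeDecodeError:
        return False
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return first_line.strip() == EXPECTED_HEADER
```

A file named labels.zip that starts with this header would be accepted, and a file named labels.csv that holds a ZIP archive would be rejected.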

I am aware that business processes are mostly CSV file oriented so you can make a statement akin to BIP174 in the Goal 2 BIP, that expects the medium of exchange to be in files ending in .csv. I wouldn't mind if you require .csv file extension in a BIP for Goal 2. But such a statement is not appropriate in the Goal 1 BIP which is only concerned with the wallet label format itself.

> > ZIP does not have good performance or compression ratio
> Indeed, but it is very widely available. That said, gzip is supported
> widely too these days. Unfortunately, gzip does not offer encryption (see
> next answer).
>
> > ZIP is an archiving format, that happens to have its own compression
> format.
> I agree this is not ideal. My main reason for choosing ZIP was that it
> supports encryption. It seems to me that without considering encryption, an
> application must create label export files that allow privacy-sensitive
> wallet information to be readable in plain text. Being able to transfer
> labels without risking privacy is IMO valuable. I considered other
> encryption formats such as PGP, but they are much more niche and so again
> contradict Goal #2.
Both of these look like parts of the spec that should be in the Goal 2 BIP, because Goal 1, which is only concerned with wallet label importing, does not need to interact with compression or encryption.

I don't mind if you make the Goal 2 BIP utilize ZIP compression with optional encryption; it's just that specifying this in the Goal 1 BIP forces wallets to implement and check for it too in order to be compliant. It's important to make compliance as easy as possible.

Regardless, I still believe that making the xpub the ZIP password is a bad design, because some wallets that are made from a random list of private keys do not have xpubs at all. If the purpose of a password is to make label sharing between two parties secure, then why not simply let them agree on a password for their own use?

> > I don't see the benefit of encrypting addresses and labels together...
> additionally, the password you propose is insecure - anybody with access to
> the wallet can unlock it
> I'm not sure I understand your question, but both wallet addresses and
> wallet labels contain privacy-sensitive information that should be
> protected. Wrt to the password, there is actually a more fundamental
> problem with using the wallet xpub - there is no equivalent for multisig
> wallets. For this reason I'll remove that requirement in future iterations.
Let me explain.

Before you partitioned the BIP into two goals, I was under the impression that wallets would have to read an encrypted export file, which seemed very overkill to me (for one, all wallets would now need to bundle a ZIP or AES dependency module with their program).

But now I see why a password and encryption would be desirable for Goal 2 BIP applications. Like I said though, Goal 1 BIP applications (i.e. wallets) do not need any of that.

> > Why the need for input and output formats? There is no difference between
> them on the wallet level, because they are always identified with a txid
> and output index.
> The input refers to the txid and the input index (in the set of vin), so
> the difference is the context in which they are displayed. A wallet will
> not necessarily store the spent outputs for a funding transaction
> containing a UTXO coming into the wallet, but it will contain references to
> the inputs as part of that transaction.
>
> > Another important point is that practically nobody labels inputs or
> outputs
> To the contrary, UTXOs are very frequently labelled, as they link and
> reveal information when spent. Inputs are much less frequently labelled,
> but there is no particular reason to exclude them.
>
> > there is a net benefit for the addresses to be exported in ascending order
> Indeed, but it makes achieving Goal #2 much more difficult for marginal
> benefit.
Fair enough.

> > It's better to mandate that they should always be double-quoted, since
> only wallets will generate label exports anyway.
> Rather I think it's better to mandate RFC4180 is followed, as per
> recommendations in other feedback.
I agree with this.
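
As a rough illustration of what following RFC 4180 buys you, here is a minimal Python sketch using the standard `csv` module, whose default dialect quotes fields only when needed and doubles embedded quotes. The function names `export_labels` and `import_labels` are hypothetical and not part of the BIP draft.

```python
import csv
import io

def export_labels(records):
    """Serialize (reference, label) pairs as RFC 4180-style CSV text."""
    buf = io.StringIO()
    # The default dialect quotes fields containing commas, quotes or
    # newlines, doubles embedded quotes, and terminates records with CRLF.
    writer = csv.writer(buf)
    writer.writerow(["Reference", "Label"])
    writer.writerows(records)
    return buf.getvalue()

def import_labels(text):
    """Parse CSV text back into (reference, label) pairs, skipping the header."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # header line
    return [(row[0], row[1]) for row in reader]

# A label containing both a comma and double quotes survives the round trip.
records = [("1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", 'Rent, "August"')]
assert import_labels(export_labels(records)) == records
```

With a conforming reader and writer, neither side needs the "first comma is the delimiter" rule from the draft.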

> > The importing code is too naive... it should utilize a dedicate item type
> field that unambiguously identifies the item
> It's unclear to me what you mean here. As I've indicated it is currently
> possible to disambiguate between addresses/transactions/etc without the
> need for a 3rd column, but in any case the hash functions used ensure that
> labels will not be associated incorrectly. Even in the unlikely event of
> some future address type being indistinguishable from a txid, it will
> simply not match any txids in the wallet.
You have already proposed a custom format here, but the importer relies on heuristics about the data, such as how long it is and which characters it contains. It is better for every branch of the importer to test the same kind of condition.

You can make parsing vastly simpler by prefixing the items with some text. Similar to how we have "bitcoin:" for indicating a bitcoin address, you can have "address:", "transaction:", "input:", and "output:" at the beginning of each entity.

This has a major advantage: you can add new formats in a backward-compatible way without breaking parsers, since the parsers never depended on text heuristics in the first place. You therefore don't have to clutter the BIP(s) with even more test vectors for these cases. You won't even need a version byte, since the only revisions that will ever be made (ones that leave the existing formats unmodified, to preserve backward compatibility) are additions of new formats.

Take a look at your sample:

> > >   if reference length < 64
> > >     Set address label
> > >   else if reference length == 64
> > >     Set transaction label
> > >   else if reference contains '<'
> > >     Set input label
> > >   else
> > >     Set output label

versus how mine would look:

> if reference.startsWith("address:")
>   Set address label
> else if reference.startsWith("transaction:")
>   Set transaction label
> else if reference.startsWith("input:")
>   Set input label
> else if reference.startsWith("output:")
>   Set output label
> # No else case: allows for future extensions

See how much simpler it is to understand?
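
The prefix-based pseudocode above could be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the function name `classify` and the choice to return `None` for unknown prefixes are hypothetical, not something the BIP draft defines.

```python
KNOWN_PREFIXES = ("address", "transaction", "input", "output")

def classify(reference):
    """Return (item_type, value) for a prefixed reference, or None if unknown."""
    # Split on the first ':' only, so "output:<txid>:0" keeps its index part.
    prefix, sep, value = reference.partition(":")
    if sep and value and prefix in KNOWN_PREFIXES:
        return prefix, value
    # Unknown or missing prefix: skip the record, leaving room for new types.
    return None

assert classify("address:1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg") == (
    "address", "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg")
assert classify("descriptor:something-new") is None
```

Supporting a new item type then means adding one entry to the tuple, without touching the existing branches.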

The truth is, the format has to be defined so that developers find it easy to implement. If a developer could misunderstand the implementation at first glance, they will implement it wrongly, creating bugs.

Looking at your sample, a developer would reason as follows: "anything shorter than 64 chars is an address, anything exactly 64 chars long is a transaction, anything that contains a '<' is an input (and is also longer than 64 chars), and everything else is an output (>64 chars and has no '<')."
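
To make those implied conditions concrete, here is a direct Python transcription of the length-based pseudocode from the draft. The function name `classify_by_shape` is hypothetical; the branches are copied from the sample as written.

```python
def classify_by_shape(reference):
    """Length/character heuristic transcribed from the BIP draft pseudocode."""
    if len(reference) < 64:
        return "address"
    elif len(reference) == 64:
        return "transaction"
    elif "<" in reference:
        return "input"
    else:
        return "output"

# The final "else" accepts anything long enough, including obvious garbage:
assert classify_by_shape("x" * 70) == "output"
# Any 64-character string is treated as a txid, whatever it actually is:
assert classify_by_shape("z" * 64) == "transaction"
```

Neither call fails, so a malformed line is silently given a type instead of being rejected.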

In light of all this, is it not much easier to simply introduce a prefix at the beginning of each entity? It has a negligible space cost. The "else" case can be omitted hypothetically (although that's not strictly necessary), so developers can just add more "else if"'s when a BIP revision is made.

A good way to see if a reference implementation has a good design is by asking yourself the following question: Imagine you are committing your reference implementation into Bitcoin Core. Based on the code quality, would a pull request for that be merged, or not?


So to summarize, I strongly suggest you do the following:
- Split the BIP into two, one defines the CSV format for label import/export between wallets, and the other defines workflows for distributing and sharing label CSVs in a universal and safe way.
- Add prefixes before each entity, so in other words: "address:bc1q23456...", "transaction:432abd874d...", "input:432abd874d...<DELIMITER>1", "output:432abd874d...<DELIMITER>1". Replace <DELIMITER> with any delimiter you want, it doesn't have to be consistent. This will make it much simpler to implement an importer, without applications doing any of the hacks that RHavar wrote about (IMO this is what people mean when they say that implementing a CSV importer will be complex work).
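
A sketch of how an importer might split such prefixed references into their parts, again in Python. Everything here is an assumption for illustration: the `Reference` tuple, the `parse_reference` name, and the default `:` delimiter before the index are hypothetical choices, since the delimiter is left open above.

```python
from typing import NamedTuple, Optional

class Reference(NamedTuple):
    kind: str             # "address", "transaction", "input" or "output"
    value: str            # the address or the txid
    index: Optional[int]  # set only for inputs and outputs

def parse_reference(field, delimiter=":"):
    """Parse a prefixed reference; return None for an unknown type prefix."""
    kind, sep, rest = field.partition(":")
    if not sep:
        raise ValueError("missing type prefix")
    if kind in ("address", "transaction"):
        return Reference(kind, rest, None)
    if kind in ("input", "output"):
        # The index follows the last delimiter after the txid.
        txid, sep, index = rest.rpartition(delimiter)
        if not sep or not (index.isascii() and index.isdigit()):
            raise ValueError("missing or malformed index")
        return Reference(kind, txid, int(index))
    return None  # unknown prefix: left for future revisions

ref = parse_reference("address:1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg")
assert ref == Reference("address", "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", None)
```

A malformed input or output raises an error instead of being guessed at, which is the behaviour the length heuristic cannot offer.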

- Ali

On Mon, Aug 29, 2022 at 11:26:32AM +0000, craigraw@gmail.com wrote:
> Thanks for your feedback @Ali.
>
>
> > it is important that a version byte is defined
> If Goal #2 is to be achieved it's difficult to mandate this, particularly
> if one requires bit flags to be set. Should an importing wallet fail to
> import if the version byte is not present, even if all the data is
> otherwise correct? Although it is difficult to know in advance how a format
> may be extended, it is certainly possible to extend this format with
> additional types where the nature of hashes serve as unique identifiers
> (more on this below).
>
>  > Don't mandate the file extension... There is no way to enforce this on a
> BIP level.
> I'm not quite sure what you mean here - for example BIP174, which is widely
> used, states "Binary PSBT files should use the .psbt file extension." Also,
> this contradicts Goal #2 - Excel and Numbers register as handlers for .csv,
> and so make it clear that the file is editable outside of a wallet.
>
> > ZIP does not have good performance or compression ratio
> Indeed, but it is very widely available. That said, gzip is supported
> widely too these days. Unfortunately, gzip does not offer encryption (see
> next answer).
>
> > ZIP is an archiving format, that happens to have its own compression
> format.
> I agree this is not ideal. My main reason for choosing ZIP was that it
> supports encryption. It seems to me that without considering encryption, an
> application must create label export files that allow privacy-sensitive
> wallet information to be readable in plain text. Being able to transfer
> labels without risking privacy is IMO valuable. I considered other
> encryption formats such as PGP, but they are much more niche and so again
> contradict Goal #2.
>
> > I don't see the benefit of encrypting addresses and labels together...
> additionally, the password you propose is insecure - anybody with access to
> the wallet can unlock it
> I'm not sure I understand your question, but both wallet addresses and
> wallet labels contain privacy-sensitive information that should be
> protected. Wrt to the password, there is actually a more fundamental
> problem with using the wallet xpub - there is no equivalent for multisig
> wallets. For this reason I'll remove that requirement in future iterations.
>
> > Why the need for input and output formats? There is no difference between
> them on the wallet level, because they are always identified with a txid
> and output index.
> The input refers to the txid and the input index (in the set of vin), so
> the difference is the context in which they are displayed. A wallet will
> not necessarily store the spent outputs for a funding transaction
> containing a UTXO coming into the wallet, but it will contain references to
> the inputs as part of that transaction.
>
> > Another important point is that practically nobody labels inputs or
> outputs
> To the contrary, UTXOs are very frequently labelled, as they link and
> reveal information when spent. Inputs are much less frequently labelled,
> but there is no particular reason to exclude them.
>
> > there is a net benefit for the addresses to be exported in ascending order
> Indeed, but it makes achieving Goal #2 much more difficult for marginal
> benefit.
>
> > It's better to mandate that they should always be double-quoted, since
> only wallets will generate label exports anyway.
> Rather I think it's better to mandate RFC4180 is followed, as per
> recommendations in other feedback.
>
> > The importing code is too naive... it should utilize a dedicate item type
> field that unambiguously identifies the item
> It's unclear to me what you mean here. As I've indicated it is currently
> possible to disambiguate between addresses/transactions/etc without the
> need for a 3rd column, but in any case the hash functions used ensure that
> labels will not be associated incorrectly. Even in the unlikely event of
> some future address type being indistinguishable from a txid, it will
> simply not match any txids in the wallet.
>
> Craig
>
>
>
> On Wed, Aug 24, 2022 at 9:10 PM Ali Sherief <ali@notatether.com> wrote:
>
> > Hi Craig,
> >
> > This is a really good proposal. I studied your BIP and I have some feedback
> > on some parts of it.
> >
> > > The first line in the file is a header, and should be ignored on import.
> >
> > From past experience and lessons, most notably BIP39, it is important that
> > a version byte is defined somewhere in case someone wants to extend it in
> > the future, currently there is no version byte which someone can increment
> > if somebody wants to extend it. In the unique case of CSV files, you should
> > make the header line mandatory (I see you have already implied this, but
> > you should make it explicit in the BIP), but instead of a line with columns
> > in it, I suggest instead of Reference,Label, you make the format like this:
> >
> > BIP-wallet-labels,<version>
> >
> > Since there are two columns per record, this works out nicely. The first
> > column can be the name of the BIP - BIPxxxx where the x's are numbers, and
> > the second column can be an unsigned 32-bit integer (most significant 8
> > bits reserved for version, the remaining for flags, or perhaps the entirety
> > for version - but I recommend leaving at least some bits for flags, even if
> > they all end up being just "reserved").
> >
> > You should make importing fail if the header line is not exactly as
> > specified - or appropriate, should you decide a different format for the
> > header.
> >
> > > Files exported should use the <tt>.csv</tt> file extension.
> > Don't mandate the file extension (read below for why):
> >
> > > In order to reduce file size while retaining wide accessibility, the CSV
> > > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > > file extension.
> > I see three problems with this. The first is more important than the latter
> > two because it makes them moot points, but I'll mention them anyway so you
> > get a background of the situation:
> > - The BIP is trying to specify in what file format the export format can
> > be written in onto the filesystem. There is no way to enforce this on a BIP
> > level (besides, Unix operating systems don't even consider the file
> > extension, they use its mimetype). Also specifying this in the BIP will
> > prevent modular "Layer 2" protocols and schemes from encoding the Export
> > labels into another format - for example Base64 or with their own
> > compression algorithm.
> >
> > Now for the two "moot problems":
> > - ZIP does not have good performance or compression ratio, there are
> > better algorithms out there like gzip (which also happens to be more
> > ubiquitous; nearly all websites are serving HTML compressed with gzip
> > compression).
> > - ZIP is an archiving format, that happens to have its own compression
> > format. Archiving format parsers can have serious vulnerabilities in their
> > implementation that can allow malware to swipe private keys and passwords,
> > since the primary target for this BIP is wallets. For example, there was
> > Zip Slip[1] in 2018, which allows for remote code execution. So the malware
> > can even hide in memory until private keys or passwords are written to
> > memory, then send them across the network. Assuming it's targeting a
> > specific wallet software it's not hard to carry out at all.
> >
> > There's two solutions for all this:
> > 1. The duck-tape solution: Use some compression algorithm like gzip
> > instead of ZIP archive format.
> > 2. The "throw it out and buy a new one" solution: Get rid of the optional
> > compression specs altogether, because users are responsible for supplying
> > the export labels in the first place, so all the compression stuff is
> > redundant and should be left up to the user use if they desire to.
> >
> > I prefer the second solution because it hits the nail at the problem
> > directly instead of putting duck tape on it like the first one.
> >
> > > This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> > or
> > > AES-256 encryption, which is supported by numerous applications including
> > > Winzip and 7-zip.
> > > The textual representation of the wallet's extended public key (as
> > defined
> > > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
> > Not specific to AES, but I don't see the benefit of encrypting addresses
> > and labels together. Can you please elaborate why this would be desirable?
> >
> > Like I said though, it's better to leave it up to users to decide how to
> > store their exports, since BIPs can't enforce that anyway (additionally,
> > the password you propose is insecure - anybody with access to the wallet
> > can unlock it, which is not desirable to some users who want their own
> > security).
> >
> > > * Transaction ID (<tt>txid</tt>)
> > > * Address
> > > * Input (rendered as <tt>txid<index</tt>)
> > > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> > Why the need for input and output formats? There is no difference between
> > them on the wallet level, because they are always identified with a txid
> > and output index. To distinguish between them and hence write them with the
> > correct format would require a UTXO set and thus access to a full node,
> > otherwise the CSV cannot be verified to be completely well-formed.
> >
> > Another important point is that practically nobody labels inputs or
> > outputs because most people do not know that those things even exist, and
> > the rest don't bother to label them.
> >
> > But the biggest downside to including them is related to the problem of
> > information leaking which you make reference to here:
> > > In both cases, care must be taken when spending to avoid undesirable
> > leaks
> > > of private information.
> > A CSV dump that has inputs/outputs and addresses mixed together can infer
> > the owner of all those items. In fact, a CSV label dump is basically a
> > personal information store so everything in it can be correlated as coming
> > from the same wallet, so it's important that unnecessary types are kept out
> > of the format. People are known to leave files lying around on their
> > computer that they don't need anymore, so these files can find their way
> > via telemetry to surveillance entities. While we can't specify what users
> > can do with their exports, we can control the information leak by
> > preventing certain types of items that we know most users will never use
> > from being exported in the first place.
> >
> > > The order in which these records appear is not defined.
> > Again, since the primary use case for this BIP is wallets, which likely
> > use hierarchical derivation schemes like BIP44, there is a net benefit for
> > the addresses to be exported in ascending order of their `address_type`. It
> > means that wallets can import them in O(n) time as opposed to O(n^2) time
> > spent serially checking in which index the address appears at. Of course,
> > this implies that all addresses up to a certain index have to be exported
> > into the CSV as well, but most wallets I know of like Core, Electrum
> > already store addresses like that.
> >
> > Also if you do this, you will need to group all the transaction records
> > before the address records or vice versa - you can use lexicographical
> > sorting if you want (ie. Addresses before Transactions). The benefit of
> > this separation of parts is that wallets can split the imported address
> > records from the transaction records internally, and feed them to separate
> > functions which set these labels internally.
> >
> > If you decide on doing it this way, then you need a 3rd column to identify
> > the item type, and also you should quote the label (see below). I strongly
> > recommend using numbers for identification as opposed to character strings,
> > so you don't have to worry about localization or character case issues.
> > There is always one unique number, but there could be multiple strings that
> > reference the same type. This will complicate importing functions.
> >
> > If you insist on include Input and Output types then they can both be
> > specified as <txid>:<index> if you do this change. They won't be used to
> > determine the type anyway.
> >
> > > The fields may be quoted, but this is unnecessary, as the first comma in
> > > the line will always be the delimiter.
> > Don't implement it like that, because that will break CSV parsers which
> > expect a fixed number of fields in each record (2 in the header, and some
> > records have >2 fields). It's better to mandate that they should always be
> > double-quoted, since only wallets will generate label exports anyway. If
> > you plan to use headers then the 3rd column can be blank for it (or you can
> > split the version and flags from each other).
> >
> > > ==Importing==
> > >
> > > When importing, a naive algorithm may simply match against any reference,
> > > but it is possible to disambiguate between transactions, addresses,
> > inputs
> > > and outputs.
> > > For example in the following pseudocode:
> > > <pre>
> > >   if reference length < 64
> > >     Set address label
> > >   else if reference length == 64
> > >     Set transaction label
> > >   else if reference contains '<'
> > >     Set input label
> > >   else
> > >     Set output label
> > > </pre>
> > The importing code is too naive and in its current form will prevent the
> > BIP from getting a number. It is perhaps the single most important part of
> > a BIP. When implementing an importer, it should utilize a dedicate item
> > type field that unambiguously identifies the item. So the naive importer is
> > not good, you need use a 3rd column for that like I explained above, so
> > that the importer becomes robust.
> >
> > In summary (exclamation marks indicate severity - one means low, two means
> > medium, and three means high):
> >
> > 1. Convert the header into a version line with optional flags, otherwise
> > nobody can extend this format without compatibility issues (!)
> > 2. Get rid of the specs related to file compression (!!!)
> > 3. Add a 3rd column for item type (address, transaction etc.) preferably
> > as numeric constants and grouping items of one type after items of another
> > type, or if you insist on strings, then only recognize their Titlecase
> > ASCII versions <spreadsheet software like Excel always tries to titlecase
> > the words> (!!)
> > 4. Require double quotes around the label (or single quotes if you prefer,
> > as long as spreadsheet software doesn't choke on them) (!!)
> > 5. Require sorting the records according to the order they are stored in
> > the wallet implementation. (!)
> > 6. Consider getting rid of Input and Output item types. (!)
> > 7. And last and most importantly, please write a more robust importer
> > algorithm in the example given by the BIP, because code in BIPs are
> > frequently used as references for software. (!!!)
> >
> > I hope you will consider these points in future revisions of your BIP.
> >
> > - Ali
> >
> > [1] https://github.com/snyk/zip-slip-vulnerability
> >
> > On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> > > Hi all,
> > >
> > > I would like to propose a BIP that specifies a format for the export and
> > > import of labels from a wallet. While transferring access to funds across
> > > wallet applications has been made simple through standards such as BIP39,
> > > wallet labels remain siloed and difficult to extract despite their value,
> > > particularly in a privacy context.
> > >
> > > The proposed format is a simple two column CSV file, with the reference
> > to
> > > a transaction, address, input or output in the first column, and the
> > label
> > > in the second column. CSV was chosen for its wide accessibility,
> > especially
> > > to users without specific technical expertise. Similarly, the CSV file
> > may
> > > be compressed using the ZIP format, and optionally encrypted using AES.
> > >
> > > The full text of the BIP can be found at
> > > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> > > and also copied below.
> > >
> > > Feedback is appreciated.
> > >
> > > Thanks,
> > > Craig Raw
> > >
> > > ---
> > >
> > > <pre>
> > >   BIP: wallet-labels
> > >   Layer: Applications
> > >   Title: Wallet Labels Export Format
> > >   Author: Craig Raw <craig@sparrowwallet.com>
> > >   Comments-Summary: No comments yet.
> > >   Comments-URI:
> > > https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> > >   Status: Draft
> > >   Type: Informational
> > >   Created: 2022-08-23
> > >   License: BSD-2-Clause
> > > </pre>
> > >
> > > ==Abstract==
> > >
> > > This document specifies a format for the export of labels that may be
> > > attached to the transactions, addresses, input and outputs in a wallet.
> > >
> > > ==Copyright==
> > >
> > > This BIP is licensed under the BSD 2-clause license.
> > >
> > > ==Motivation==
> > >
> > > The export and import of funds across different Bitcoin wallet
> > applications
> > > is well defined through standards such as BIP39, BIP32, BIP44 etc.
> > > These standards are well supported and allow users to move easily between
> > > different wallets.
> > > There is, however, no defined standard to transfer any labels the user
> > may
> > > have applied to the transactions, addresses, inputs or outputs in their
> > > wallet.
> > > The UTXO model that Bitcoin uses makes these labels particularly valuable
> > > as they may indicate the source of funds, whether received externally or
> > as
> > > a result of change from a prior transaction.
> > > In both cases, care must be taken when spending to avoid undesirable
> > leaks
> > > of private information.
> > > Labels provide valuable guidance in this regard, and have even become
> > > mandatory when spending in several Bitcoin wallets.
> > > Allowing users to export their labels in a standardized way ensures that
> > > they do not experience lock-in to a particular wallet application.
> > > In addition, by using common formats, this BIP seeks to make manual or
> > bulk
> > > management of labels accessible to users without specific technical
> > > expertise.
> > >
> > > ==Specification==
> > >
> > > In order to make the import and export of labels as widely accessible as
> > > possible, this BIP uses the comma separated values (CSV) format, which is
> > > widely supported by consumer, business, and scientific applications.
> > > Although the technical specification of CSV in RFC4180 is not always
> > > followed, the application of the format in this BIP is simple enough that
> > > compatibility should not present a problem.
> > > Moreover, the simplicity and forgiving nature of CSV (over for example
> > > JSON) lends itself well to bulk label editing using spreadsheet and text
> > > editing tools.
> > >
> > > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> > > containing one record per line, with records containing two fields
> > > delimited by a comma.
> > > The fields may be quoted, but this is unnecessary, as the first comma in
> > > the line will always be the delimiter.
> > > The first line in the file is a header, and should be ignored on import.
> > > Thereafter, each line represents a record that refers to a label applied
> > in
> > > the wallet.
> > > The order in which these records appear is not defined.
> > >
> > > The first field in the record contains a reference to the transaction,
> > > address, input or output in the wallet.
> > > This is specified as one of the following:
> > > * Transaction ID (<tt>txid</tt>)
> > > * Address
> > > * Input (rendered as <tt>txid<index</tt>)
> > > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> > >
> > > The second field contains the label applied to the reference.
> > > Exporting applications may omit records with no labels or labels of zero
> > > length.
> > > Files exported should use the <tt>.csv</tt> file extension.
> > >
> > > In order to reduce file size while retaining wide accessibility, the CSV
> > > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > > file extension.
> > > This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> > or
> > > AES-256 encryption, which is supported by numerous applications including
> > > Winzip and 7-zip.
> > > In order to ensure that weak encryption does not proliferate, importers
> > > following this standard must refuse to import <tt>.zip</tt> files
> > encrypted
> > > with the weaker Zip 2.0 standard.
> > > The textual representation of the wallet's extended public key (as
> > defined
> > > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
> > >
> > > ==Importing==
> > >
> > > When importing, a naive algorithm may simply match against any reference,
> > > but it is possible to disambiguate between transactions, addresses,
> > inputs
> > > and outputs.
> > > For example in the following pseudocode:
> > > <pre>
> > >   if reference length < 64
> > >     Set address label
> > >   else if reference length == 64
> > >     Set transaction label
> > >   else if reference contains '<'
> > >     Set input label
> > >   else
> > >     Set output label
> > > </pre>
> > >
> > > Importing applications may truncate labels if necessary.
> > >
> > > ==Test Vectors==
> > >
> > > The following fragment represents a wallet label export:
> > > <pre>
> > > Reference,Label
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> > > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> > > </pre>
> > >
> > > ==Reference Implementation==
> > >
> > > TBD




* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24 19:10 ` Ali Sherief
  2022-08-27 21:03   ` Billy Tetrud
@ 2022-08-29 11:25   ` Craig Raw
  2022-09-21  6:07     ` Hugo Nguyen
  1 sibling, 1 reply; 20+ messages in thread
From: Craig Raw @ 2022-08-29 11:25 UTC (permalink / raw)
  To: Ali Sherief; +Cc: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 21724 bytes --]

Thanks for your feedback @Ali.

I am attempting to achieve two goals with this proposal, primarily for the
benefit of wallet users:

Goal #1. Transfer labels between different wallet implementations
Goal #2. Manage labels in applications outside of Bitcoin wallets (such as
Excel)

Much of the feedback so far has indicated the tension between these two
goals - it may be that it is too difficult to achieve both, in which case
Goal #1 is the most important. That said, I think further exploration is
still necessary before abandoning Goal #2, because removing it would
significantly reduce the value of this proposal and mean users need to rely
on application-specific workarounds.

> it is important that a version byte is defined
If Goal #2 is to be achieved it's difficult to mandate this, particularly
if one requires bit flags to be set. Should an importing wallet fail to
import if the version byte is not present, even if all the data is
otherwise correct? Although it is difficult to know in advance how a format
may be extended, it is certainly possible to extend this format with
additional types where the nature of hashes serve as unique identifiers
(more on this below).
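
For context, the version header proposed earlier in the thread (`BIP-wallet-labels,<version>`, with the most significant 8 bits of a 32-bit integer as the version and the rest as flags) could be checked with something like the Python sketch below. The decimal text encoding of the integer and the `parse_header` name are assumptions made for illustration; the thread does not specify them.

```python
def parse_header(line):
    """Return (version, flags) from a 'BIP-wallet-labels,<version>' header."""
    name, sep, raw = line.strip().partition(",")
    if not sep or name != "BIP-wallet-labels":
        raise ValueError("unrecognized header line")
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("version field is not an unsigned integer")
    value = int(raw)
    if value >= 2**32:
        raise ValueError("version field does not fit in 32 bits")
    # Top 8 bits carry the version, the low 24 bits carry the flags.
    return value >> 24, value & 0xFFFFFF

# Version 1 with no flags set is 1 << 24 = 16777216.
assert parse_header("BIP-wallet-labels,16777216") == (1, 0)
```

A file that still starts with `Reference,Label` raises `ValueError` here, which is exactly the strictness being debated for Goal #2.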

 > Don't mandate the file extension... There is no way to enforce this on a
BIP level.
I'm not quite sure what you mean here - for example BIP174, which is widely
used, states "Binary PSBT files should use the .psbt file extension." Also,
this contradicts Goal #2 - Excel and Numbers register as handlers for .csv,
and so make it clear that the file is editable outside of a wallet.

> ZIP does not have good performance or compression ratio
Indeed, but it is very widely available. That said, gzip is supported
widely too these days. Unfortunately, gzip does not offer encryption (see
next answer).

> ZIP is an archiving format, that happens to have its own compression
format.
I agree this is not ideal. My main reason for choosing ZIP was that it
supports encryption. It seems to me that without considering encryption, an
application must create label export files that allow privacy-sensitive
wallet information to be readable in plain text. Being able to transfer
labels without risking privacy is IMO valuable. I considered other
encryption formats such as PGP, but they are much more niche and so again
contradict Goal #2.

> I don't see the benefit of encrypting addresses and labels together...
additionally, the password you propose is insecure - anybody with access to
the wallet can unlock it
I'm not sure I understand your question, but both wallet addresses and
wallet labels contain privacy-sensitive information that should be
protected. Wrt the password, there is actually a more fundamental
problem with using the wallet xpub - there is no equivalent for multisig
wallets. For this reason I'll remove that requirement in future iterations.

> Why the need for input and output formats? There is no difference between
them on the wallet level, because they are always identified with a txid
and output index.
The input refers to the txid and the input index (in the set of vin), so
the difference is the context in which they are displayed. A wallet will
not necessarily store the spent outputs for a funding transaction
containing a UTXO coming into the wallet, but it will contain references to
the inputs as part of that transaction.

> Another important point is that practically nobody labels inputs or
outputs
To the contrary, UTXOs are very frequently labelled, as they link and
reveal information when spent. Inputs are much less frequently labelled,
but there is no particular reason to exclude them.

> there is a net benefit for the addresses to be exported in ascending order
Indeed, but it makes achieving Goal #2 much more difficult for marginal
benefit.

> It's better to mandate that they should always be double-quoted, since
only wallets will generate label exports anyway.
Rather I think it's better to mandate RFC4180 is followed, as per
recommendations in other feedback.

> The importing code is too naive... it should utilize a dedicate item type
field that unambiguously identifies the item
It's unclear to me what you mean here. As I've indicated it is currently
possible to disambiguate between addresses/transactions/etc without the
need for a 3rd column, but in any case the hash functions used ensure that
labels will not be associated incorrectly. Even in the unlikely event of
some future address type being indistinguishable from a txid, it will
simply not match any txids in the wallet.

Craig



On Wed, Aug 24, 2022 at 9:10 PM Ali Sherief <ali@notatether.com> wrote:

> Hi Craig,
>
> This is a really good proposal. I studied your BIP and I have some feedback
> on some parts of it.
>
> > The first line in the file is a header, and should be ignored on import.
>
> From past experience and lessons, most notably BIP39, it is important that
> a version byte is defined somewhere in case someone wants to extend the
> format in the future. Currently there is no version byte that someone can
> increment to extend it. In the unique case of CSV files, you should
> make the header line mandatory (I see you have already implied this, but
> you should make it explicit in the BIP), but instead of a line with column
> names such as Reference,Label, I suggest you make the format like this:
>
> BIP-wallet-labels,<version>
>
> Since there are two columns per record, this works out nicely. The first
> column can be the name of the BIP - BIPxxxx where the x's are numbers, and
> the second column can be an unsigned 32-bit integer (most significant 8
> bits reserved for version, the remaining for flags, or perhaps the entirety
> for version - but I recommend leaving at least some bits for flags, even if
> they all end up being just "reserved").
>
> You should make importing fail if the header line is not exactly as
> specified - or appropriate, should you decide a different format for the
> header.
>
> > Files exported should use the <tt>.csv</tt> file extension.
> Don't mandate the file extension (read below for why):
>
> > In order to reduce file size while retaining wide accessibility, the CSV
> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > file extension.
> I see three problems with this. The first is more important than the latter
> two because it makes them moot points, but I'll mention them anyway so you
> get a background of the situation:
> - The BIP is trying to specify in what file format the export format can
> be written in onto the filesystem. There is no way to enforce this on a BIP
> level (besides, Unix operating systems don't even consider the file
> extension, they use its mimetype). Also specifying this in the BIP will
> prevent modular "Layer 2" protocols and schemes from encoding the Export
> labels into another format - for example Base64 or with their own
> compression algorithm.
>
> Now for the two "moot problems":
> - ZIP does not have good performance or compression ratio, there are
> better algorithms out there like gzip (which also happens to be more
> ubiquitous; nearly all websites are serving HTML compressed with gzip
> compression).
> - ZIP is an archiving format, that happens to have its own compression
> format. Archiving format parsers can have serious vulnerabilities in their
> implementation that can allow malware to swipe private keys and passwords,
> since the primary target for this BIP is wallets. For example, there was
> Zip Slip[1] in 2018, which allows for remote code execution. So the malware
> can even hide in memory until private keys or passwords are written to
> memory, then send them across the network. Assuming it's targeting a
> specific wallet software it's not hard to carry out at all.
>
> There are two solutions for all this:
> 1. The duct-tape solution: Use some compression algorithm like gzip
> instead of the ZIP archive format.
> 2. The "throw it out and buy a new one" solution: Get rid of the optional
> compression specs altogether, because users are responsible for supplying
> the export labels in the first place, so all the compression stuff is
> redundant and should be left up to the user to use if they desire.
>
> I prefer the second solution because it addresses the problem directly
> instead of putting duct tape on it like the first one.
>
> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> or
> > AES-256 encryption, which is supported by numerous applications including
> > Winzip and 7-zip.
> > The textual representation of the wallet's extended public key (as
> defined
> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
> Not specific to AES, but I don't see the benefit of encrypting addresses
> and labels together. Can you please elaborate why this would be desirable?
>
> Like I said though, it's better to leave it up to users to decide how to
> store their exports, since BIPs can't enforce that anyway (additionally,
> the password you propose is insecure - anybody with access to the wallet
> can unlock it, which is not desirable to some users who want their own
> security).
>
> > * Transaction ID (<tt>txid</tt>)
> > * Address
> > * Input (rendered as <tt>txid<index</tt>)
> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> Why the need for input and output formats? There is no difference between
> them on the wallet level, because they are always identified with a txid
> and output index. To distinguish between them and hence write them with the
> correct format would require a UTXO set and thus access to a full node,
> otherwise the CSV cannot be verified to be completely well-formed.
>
> Another important point is that practically nobody labels inputs or
> outputs because most people do not know that those things even exist, and
> the rest don't bother to label them.
>
> But the biggest downside to including them is related to the problem of
> information leaking which you make reference to here:
> > In both cases, care must be taken when spending to avoid undesirable
> leaks
> > of private information.
> A CSV dump that has inputs/outputs and addresses mixed together can be used
> to infer the owner of all those items. In fact, a CSV label dump is basically
> a personal information store, so everything in it can be correlated as coming
> from the same wallet, and it's important that unnecessary types are kept out
> of the format. People are known to leave files lying around on their
> computer that they don't need anymore, so these files can find their way
> via telemetry to surveillance entities. While we can't specify what users
> can do with their exports, we can control the information leak by
> preventing certain types of items that we know most users will never use
> from being exported in the first place.
>
> > The order in which these records appear is not defined.
> Again, since the primary use case for this BIP is wallets, which likely
> use hierarchical derivation schemes like BIP44, there is a net benefit for
> the addresses to be exported in ascending order of their `address_type`. It
> means that wallets can import them in O(n) time as opposed to O(n^2) time
> spent serially checking at which index the address appears. Of course,
> this implies that all addresses up to a certain index have to be exported
> into the CSV as well, but most wallets I know of, like Core and Electrum,
> already store addresses like that.
>
> Also if you do this, you will need to group all the transaction records
> before the address records or vice versa - you can use lexicographical
> sorting if you want (ie. Addresses before Transactions). The benefit of
> this separation of parts is that wallets can split the imported address
> records from the transaction records internally, and feed them to separate
> functions which set these labels internally.
>
> If you decide on doing it this way, then you need a 3rd column to identify
> the item type, and also you should quote the label (see below). I strongly
> recommend using numbers for identification as opposed to character strings,
> so you don't have to worry about localization or character case issues.
> There is always one unique number, but there could be multiple strings that
> reference the same type. This will complicate importing functions.
>
> If you insist on including Input and Output types then they can both be
> specified as <txid>:<index> if you make this change. They won't be used to
> determine the type anyway.
>
> > The fields may be quoted, but this is unnecessary, as the first comma in
> > the line will always be the delimiter.
> Don't implement it like that, because that will break CSV parsers which
> expect a fixed number of fields in each record (2 in the header, while
> some records would appear to have more than 2). It's better to mandate that
> they should always be double-quoted, since only wallets will generate
> label exports anyway. If you plan to use headers then the 3rd column can be
> blank for it (or you can split the version and flags from each other).
>
> > ==Importing==
> >
> > When importing, a naive algorithm may simply match against any reference,
> > but it is possible to disambiguate between transactions, addresses,
> inputs
> > and outputs.
> > For example in the following pseudocode:
> > <pre>
> >   if reference length < 64
> >     Set address label
> >   else if reference length == 64
> >     Set transaction label
> >   else if reference contains '<'
> >     Set input label
> >   else
> >     Set output label
> > </pre>
> The importing code is too naive and in its current form will prevent the
> BIP from getting a number. It is perhaps the single most important part of
> a BIP. When implementing an importer, it should utilize a dedicated item
> type field that unambiguously identifies the item. So the naive importer is
> not good; you need to use a 3rd column for that, like I explained above, so
> that the importer becomes robust.
>
> In summary (exclamation marks indicate severity - one means low, two means
> medium, and three means high):
>
> 1. Convert the header into a version line with optional flags, otherwise
> nobody can extend this format without compatibility issues (!)
> 2. Get rid of the specs related to file compression (!!!)
> 3. Add a 3rd column for item type (address, transaction etc.) preferably
> as numeric constants and grouping items of one type after items of another
> type, or if you insist on strings, then only recognize their Titlecase
> ASCII versions <spreadsheet software like Excel always tries to titlecase
> the words> (!!)
> 4. Require double quotes around the label (or single quotes if you prefer,
> as long as spreadsheet software doesn't choke on them) (!!)
> 5. Require sorting the records according to the order they are stored in
> the wallet implementation. (!)
> 6. Consider getting rid of Input and Output item types. (!)
> 7. And last and most importantly, please write a more robust importer
> algorithm in the example given by the BIP, because code in BIPs is
> frequently used as a reference for software. (!!!)
>
> I hope you will consider these points in future revisions of your BIP.
>
> - Ali
>
> [1] https://github.com/snyk/zip-slip-vulnerability
>
> On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> > Hi all,
> >
> > I would like to propose a BIP that specifies a format for the export and
> > import of labels from a wallet. While transferring access to funds across
> > wallet applications has been made simple through standards such as BIP39,
> > wallet labels remain siloed and difficult to extract despite their value,
> > particularly in a privacy context.
> >
> > The proposed format is a simple two column CSV file, with the reference
> to
> > a transaction, address, input or output in the first column, and the
> label
> > in the second column. CSV was chosen for its wide accessibility,
> especially
> > to users without specific technical expertise. Similarly, the CSV file
> may
> > be compressed using the ZIP format, and optionally encrypted using AES.
> >
> > The full text of the BIP can be found at
> > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> > and also copied below.
> >
> > Feedback is appreciated.
> >
> > Thanks,
> > Craig Raw
> >
> > ---
> >
> > <pre>
> >   BIP: wallet-labels
> >   Layer: Applications
> >   Title: Wallet Labels Export Format
> >   Author: Craig Raw <craig@sparrowwallet.com>
> >   Comments-Summary: No comments yet.
> >   Comments-URI:
> > https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> >   Status: Draft
> >   Type: Informational
> >   Created: 2022-08-23
> >   License: BSD-2-Clause
> > </pre>
> >
> > ==Abstract==
> >
> > This document specifies a format for the export of labels that may be
> > attached to the transactions, addresses, input and outputs in a wallet.
> >
> > ==Copyright==
> >
> > This BIP is licensed under the BSD 2-clause license.
> >
> > ==Motivation==
> >
> > The export and import of funds across different Bitcoin wallet
> applications
> > is well defined through standards such as BIP39, BIP32, BIP44 etc.
> > These standards are well supported and allow users to move easily between
> > different wallets.
> > There is, however, no defined standard to transfer any labels the user
> may
> > have applied to the transactions, addresses, inputs or outputs in their
> > wallet.
> > The UTXO model that Bitcoin uses makes these labels particularly valuable
> > as they may indicate the source of funds, whether received externally or
> as
> > a result of change from a prior transaction.
> > In both cases, care must be taken when spending to avoid undesirable
> leaks
> > of private information.
> > Labels provide valuable guidance in this regard, and have even become
> > mandatory when spending in several Bitcoin wallets.
> > Allowing users to export their labels in a standardized way ensures that
> > they do not experience lock-in to a particular wallet application.
> > In addition, by using common formats, this BIP seeks to make manual or
> bulk
> > management of labels accessible to users without specific technical
> > expertise.
> >
> > ==Specification==
> >
> > In order to make the import and export of labels as widely accessible as
> > possible, this BIP uses the comma separated values (CSV) format, which is
> > widely supported by consumer, business, and scientific applications.
> > Although the technical specification of CSV in RFC4180 is not always
> > followed, the application of the format in this BIP is simple enough that
> > compatibility should not present a problem.
> > Moreover, the simplicity and forgiving nature of CSV (over for example
> > JSON) lends itself well to bulk label editing using spreadsheet and text
> > editing tools.
> >
> > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> > containing one record per line, with records containing two fields
> > delimited by a comma.
> > The fields may be quoted, but this is unnecessary, as the first comma in
> > the line will always be the delimiter.
> > The first line in the file is a header, and should be ignored on import.
> > Thereafter, each line represents a record that refers to a label applied
> in
> > the wallet.
> > The order in which these records appear is not defined.
> >
> > The first field in the record contains a reference to the transaction,
> > address, input or output in the wallet.
> > This is specified as one of the following:
> > * Transaction ID (<tt>txid</tt>)
> > * Address
> > * Input (rendered as <tt>txid<index</tt>)
> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> >
> > The second field contains the label applied to the reference.
> > Exporting applications may omit records with no labels or labels of zero
> > length.
> > Files exported should use the <tt>.csv</tt> file extension.
> >
> > In order to reduce file size while retaining wide accessibility, the CSV
> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > file extension.
> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> or
> > AES-256 encryption, which is supported by numerous applications including
> > Winzip and 7-zip.
> > In order to ensure that weak encryption does not proliferate, importers
> > following this standard must refuse to import <tt>.zip</tt> files
> encrypted
> > with the weaker Zip 2.0 standard.
> > The textual representation of the wallet's extended public key (as
> defined
> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
> >
> > ==Importing==
> >
> > When importing, a naive algorithm may simply match against any reference,
> > but it is possible to disambiguate between transactions, addresses,
> inputs
> > and outputs.
> > For example in the following pseudocode:
> > <pre>
> >   if reference length < 64
> >     Set address label
> >   else if reference length == 64
> >     Set transaction label
> >   else if reference contains '<'
> >     Set input label
> >   else
> >     Set output label
> > </pre>
> >
> > Importing applications may truncate labels if necessary.
> >
> > ==Test Vectors==
> >
> > The following fragment represents a wallet label export:
> > <pre>
> > Reference,Label
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> > </pre>
> >
> > ==Reference Implementation==
> >
> > TBD
>
>

[-- Attachment #2: Type: text/html, Size: 24641 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-25 22:54     ` Clark Moody
@ 2022-08-27 22:20       ` Billy Tetrud
  0 siblings, 0 replies; 20+ messages in thread
From: Billy Tetrud @ 2022-08-27 22:20 UTC (permalink / raw)
  To: Clark Moody, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 11580 bytes --]

I think it might be a good idea to record something that can directly
connect the list of labels with the correct wallet. Imagine someone backs
up a bunch of label files and then forgets which wallet they apply to. Sure
you could probably look through the list of transactions, addresses, etc
and compare against those contained in the actual wallet, but this would be
sort of messy and potentially inefficient. It might be useful to include a
hash of the wallet descriptor (hash for privacy) that can be compared
against potential matching wallet descriptors.
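
A minimal sketch of that idea, assuming SHA-256 over the descriptor text is
an acceptable hash and that exporter and importer serialize the descriptor
identically (neither is specified anywhere in the proposal). The function
names and the descriptor strings below are hypothetical placeholders.

```python
import hashlib


def descriptor_fingerprint(descriptor: str) -> str:
    """Return the hex SHA-256 digest of the descriptor text (assumed scheme)."""
    return hashlib.sha256(descriptor.strip().encode("utf-8")).hexdigest()


def find_matching_wallet(fingerprint, candidate_descriptors):
    """Return the first candidate whose digest equals fingerprint, else None."""
    for descriptor in candidate_descriptors:
        if descriptor_fingerprint(descriptor) == fingerprint:
            return descriptor
    return None


# Placeholder strings for illustration only; these are not valid descriptors.
wallet_a = "wpkh(EXAMPLE_XPUB_A/0/*)"
wallet_b = "wpkh(EXAMPLE_XPUB_B/0/*)"

stored = descriptor_fingerprint(wallet_b)  # value that would travel with the labels
print(find_matching_wallet(stored, [wallet_a, wallet_b]))
```

Note that a plain hash only hides the descriptor from someone who does not
already know it; anyone holding the descriptor can confirm a match.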

On Fri, Aug 26, 2022 at 2:46 AM Clark Moody via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Having previously developed an export format[1] for general cryptocurrency
> transaction information, I can attest to the value of the human-readable
> CSV. I was careful to mention the RFC 4180 spec so that implementations
> could avoid the pitfalls of incorrect CSV encoding.
>
> [1]: https://github.com/harmony-csv/harmony
>
> Clark
>
> ------- Original Message -------
> On Thursday, August 25th, 2022 at 3:59 AM, Craig Raw via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> Thanks for your thoughts Ryan.
>
> Without reference to the quality feedback on this proposal, I was aware
> when submitting it for review that it provides an excellent opportunity for
> bike shedding. As developers, we have all experienced frustration with data
> formats. One thing that I did not perhaps make clear enough is that this
> format is not solely intended for developers, but general users who are
> probably not well represented on this list.
>
> While doing research for this proposal I spoke to several professional
> users of Sparrow Wallet (who are not developers). They all expressed a
> desire for the format to integrate with their business processes, which are
> driven by business tools such as Excel. Labelling provides an important
> function in UTXO and address management in these scenarios, and needs to be
> accessible and manageable outside of wallet software.
>
> If this is to be achieved, it immediately rules out JSON as a data format.
> Not only is JSON limited to editing only through specific software or text
> editors, but (in the latter case) it is fragile enough that a single
> missing character can cause an entire file to fail parsing. CSV is more
> forgiving in this regard. With respect to your comments on escaping, my
> expectation would be that developers will be using a mature CSV library
> rather than handling character escaping themselves. I would rather propose
> a format that is generally usable, even if occasionally a label is escaped
> incorrectly.
>
> Finally, I'll note that CSV files are already common and uncontroversial
> in Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt
> many others) already export addresses and/or transactions with their labels
> as CSV files. This proposal simply attempts to create a standard for
> importing and exporting all the labels in a wallet.
>
> Craig
>
> On Wed, Aug 24, 2022 at 9:01 PM <rhavar@protonmail.com> wrote:
>
>> I'd strongly suggest not using CSV. Especially for a standard. I've
>> worked with it as an interchange format many times, and it's always been
>> a clusterfuck.
>>
>> Right off the bat, you have stuff like "The fields may be quoted, but
>> this is unnecessary as the first comma in the line will always be the
>> delimiter" which invariably leads to some implementations doing it, some
>> implementations not doing it, and others that are intolerant of the other
>> way.
>>
>> And you have also made the classic mistake of not strictly defining
>> escape rules. So everyone will pick their own (e.g. some will \, escape
>> commas, others will not because it's quoted and escape quotes, and others
>> will assume no escaping is required since it's the last column in a csv).
>>
>> Over time it morphs into its own mini-monster that introduces so much
>> pain.
>>
>> On a similar note, allowing alternatives (like: txid>index vs
>> txid:index) provides no benefit, but creates additional work for
>> implementations (who quite likely only test formats they produce) and
>> future incompatibilities.
>>
>> I know everyone loves to hate on it, but really (line-separated?) json is
>> the way to go.
>>
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b",
>> "label": "wow, such label" }
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b",
>> "txout": 4, "label": "omg this is so easy to parse" }
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b",
>> "txin": 0, "label": "wow this is going to be extensible as well" }
>>
>>
>>
>>
>> -Ryan
>>
>> ------- Original Message -------
>> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <
>> bitcoin-dev@lists.linuxfoundation.org> wrote:
>>
>> Hi all,
>>
>> I would like to propose a BIP that specifies a format for the export and
>> import of labels from a wallet. While transferring access to funds across
>> wallet applications has been made simple through standards such as BIP39,
>> wallet labels remain siloed and difficult to extract despite their value,
>> particularly in a privacy context.
>>
>> The proposed format is a simple two column CSV file, with the reference
>> to a transaction, address, input or output in the first column, and the
>> label in the second column. CSV was chosen for its wide accessibility,
>> especially to users without specific technical expertise. Similarly, the
>> CSV file may be compressed using the ZIP format, and optionally encrypted
>> using AES.
>>
>> The full text of the BIP can be found at
>> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
>> and also copied below.
>>
>> Feedback is appreciated.
>>
>> Thanks,
>> Craig Raw
>>
>> ---
>>
>> <pre>
>> BIP: wallet-labels
>> Layer: Applications
>> Title: Wallet Labels Export Format
>> Author: Craig Raw <craig@sparrowwallet.com>
>> Comments-Summary: No comments yet.
>> Comments-URI:
>> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>> Status: Draft
>> Type: Informational
>> Created: 2022-08-23
>> License: BSD-2-Clause
>> </pre>
>>
>> ==Abstract==
>>
>> This document specifies a format for the export of labels that may be
>> attached to the transactions, addresses, input and outputs in a wallet.
>>
>> ==Copyright==
>>
>> This BIP is licensed under the BSD 2-clause license.
>>
>> ==Motivation==
>>
>> The export and import of funds across different Bitcoin wallet
>> applications is well defined through standards such as BIP39, BIP32, BIP44
>> etc.
>> These standards are well supported and allow users to move easily between
>> different wallets.
>> There is, however, no defined standard to transfer any labels the user
>> may have applied to the transactions, addresses, inputs or outputs in their
>> wallet.
>> The UTXO model that Bitcoin uses makes these labels particularly valuable
>> as they may indicate the source of funds, whether received externally or as
>> a result of change from a prior transaction.
>> In both cases, care must be taken when spending to avoid undesirable
>> leaks of private information.
>> Labels provide valuable guidance in this regard, and have even become
>> mandatory when spending in several Bitcoin wallets.
>> Allowing users to export their labels in a standardized way ensures that
>> they do not experience lock-in to a particular wallet application.
>> In addition, by using common formats, this BIP seeks to make manual or
>> bulk management of labels accessible to users without specific technical
>> expertise.
>>
>> ==Specification==
>>
>> In order to make the import and export of labels as widely accessible as
>> possible, this BIP uses the comma separated values (CSV) format, which is
>> widely supported by consumer, business, and scientific applications.
>> Although the technical specification of CSV in RFC4180 is not always
>> followed, the application of the format in this BIP is simple enough that
>> compatibility should not present a problem.
>> Moreover, the simplicity and forgiving nature of CSV (over for example
>> JSON) lends itself well to bulk label editing using spreadsheet and text
>> editing tools.
>>
>> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
>> containing one record per line, with records containing two fields
>> delimited by a comma.
>> The fields may be quoted, but this is unnecessary, as the first comma in
>> the line will always be the delimiter.
>> The first line in the file is a header, and should be ignored on import.
>> Thereafter, each line represents a record that refers to a label applied
>> in the wallet.
>> The order in which these records appear is not defined.
>>
>> The first field in the record contains a reference to the transaction,
>> address, input or output in the wallet.
>> This is specified as one of the following:
>> * Transaction ID (<tt>txid</tt>)
>> * Address
>> * Input (rendered as <tt>txid<index</tt>)
>> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>>
>> The second field contains the label applied to the reference.
>> Exporting applications may omit records with no labels or labels of zero
>> length.
>> Files exported should use the <tt>.csv</tt> file extension.
>>
>> In order to reduce file size while retaining wide accessibility, the CSV
>> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
>> file extension.
>> This <tt>.zip</tt> file may optionally be encrypted using either AES-128
>> or AES-256 encryption, which is supported by numerous applications
>> including Winzip and 7-zip.
>> In order to ensure that weak encryption does not proliferate, importers
>> following this standard must refuse to import <tt>.zip</tt> files encrypted
>> with the weaker Zip 2.0 standard.
>> The textual representation of the wallet's extended public key (as
>> defined by BIP32, with an <tt>xpub</tt> header) should be used as the
>> password.
>>
>> ==Importing==
>>
>> When importing, a naive algorithm may simply match against any reference,
>> but it is possible to disambiguate between transactions, addresses, inputs
>> and outputs.
>> For example in the following pseudocode:
>> <pre>
>> if reference length < 64
>> Set address label
>> else if reference length == 64
>> Set transaction label
>> else if reference contains '<'
>> Set input label
>> else
>> Set output label
>> </pre>
>>
>> Importing applications may truncate labels if necessary.
>>
>> ==Test Vectors==
>>
>> The following fragment represents a wallet label export:
>> <pre>
>> Reference,Label
>>
>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
>> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
>> </pre>
>>
>> ==Reference Implementation==
>>
>> TBD
>>
>>
>>
>>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 14799 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
       [not found] <mailman.12565.1661634459.956.bitcoin-dev@lists.linuxfoundation.org>
@ 2022-08-27 21:26 ` Ali Sherief
  0 siblings, 0 replies; 20+ messages in thread
From: Ali Sherief @ 2022-08-27 21:26 UTC (permalink / raw)
  To: billy.tetrud; +Cc: bitcoin-dev

> This seems to run contrary with your point about letting users be in
> control of how they store this. Given that you can always connect together
> an output and its address or find the outputs at any address, it doesn't
> seem like it would actually leak any more information than just including
> addresses. Am I missing something?

That's actually true, and coming back to it now it feels more like a security-through-obscurity suggestion. It's still valid that the export files will be valuable telemetry, but now I'm starting to feel more concerned about how inputs and outputs would be represented in the first place.

Some folks have suggested writing them as descriptors for that purpose[1]. But I see problems with that approach; there are only descriptors for things like addresses, outputs, derivation paths and so on. I know of no descriptors for transaction IDs or inputs.

I am actually starting to contemplate whether it's wise to merge Inputs and Outputs into one classification conveniently called just "Outputs", because it's impossible to distinguish between them by looking at them (any input is also an output, but not vice versa). Wise, because I do not know of any wallet software that labels outputs.
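
If inputs and outputs were merged like this, a single parser would be enough.
Below is a minimal sketch, assuming the <txid>:<index> rendering suggested
earlier in the thread; the function name parse_outpoint and its validation
rules are hypothetical, not part of the draft.

```python
import string


def parse_outpoint(reference: str):
    """Split a 'txid:index' reference into (txid, index), validating both parts."""
    txid, sep, index = reference.partition(":")
    if sep != ":":
        raise ValueError("expected a ':' between txid and index")
    if len(txid) != 64 or any(c not in string.hexdigits for c in txid):
        raise ValueError("txid must be 64 hexadecimal characters")
    if not (index.isascii() and index.isdigit()):
        raise ValueError("index must be a non-negative decimal integer")
    return txid.lower(), int(index)


TXID = "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"
print(parse_outpoint(TXID + ":0"))
```

Rejecting the txid<index and txid>index forms outright keeps the importer to
one code path, at the cost of not reading files that use them.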

- Ali

[1]: https://bitcointalk.org/index.php?topic=5411159.0

On Sat, 27 Aug 2022 16:03:01 -0500, billy.tetrud@gmail.com wrote:
> @Ali That's some good, well-thought-through and well-articulated feedback. I
> have one point of contention:
>
> > it's important that unnecessary types are kept out of the format. People
> are known to leave files lying around on their computer that they don't
> need anymore, so these files can find their way via telemetry to
> surveillance entities. While we can't specify what users can do with their
> exports, we can control the information leak by preventing certain types of
> items that we know most users will never use from being exported in the
> first place.
>
> This seems to run contrary with your point about letting users be in
> control of how they store this. Given that you can always connect together
> an output and its address or find the outputs at any address, it doesn't
> seem like it would actually leak any more information than just including
> addresses. Am I missing something?
>
> On Wed, Aug 24, 2022, 14:44 Ali Sherief via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> > Hi Craig,
> >
> > This is a really good proposal. I studied your BIP and I have some feedback
> > on some parts of it.
> >
> > > The first line in the file is a header, and should be ignored on import.
> >
> > From past experience and lessons, most notably BIP39, it is important that
> > a version byte is defined somewhere in case someone wants to extend the
> > format in the future. Currently there is no version byte that someone can
> > increment to extend it. In the unique case of CSV files, you should
> > make the header line mandatory (I see you have already implied this, but
> > you should make it explicit in the BIP), but instead of a line with column
> > names such as Reference,Label, I suggest you make the format like this:
> >
> > BIP-wallet-labels,<version>
> >
> > Since there are two columns per record, this works out nicely. The first
> > column can be the name of the BIP - BIPxxxx where the x's are numbers, and
> > the second column can be an unsigned 32-bit integer (most significant 8
> > bits reserved for version, the remaining for flags, or perhaps the entirety
> > for version - but I recommend leaving at least some bits for flags, even if
> > they all end up being just "reserved").
> >
> > You should make importing fail if the header line is not exactly as
> > specified - or as appropriate, should you decide on a different format for
> > the header.
> >
> > > Files exported should use the <tt>.csv</tt> file extension.
> > Don't mandate the file extension (read below for why):
> >
> > > In order to reduce file size while retaining wide accessibility, the CSV
> > > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > > file extension.
> > I see three problems with this. The first is more important than the latter
> > two because it makes them moot points, but I'll mention them anyway so you
> > get a background of the situation:
> > - The BIP is trying to specify the file format in which the export is
> > written to the filesystem. There is no way to enforce this at the BIP
> > level (besides, Unix operating systems don't even consider the file
> > extension; they use its mimetype). Also, specifying this in the BIP will
> > prevent modular "Layer 2" protocols and schemes from encoding the exported
> > labels into another format - for example Base64 or with their own
> > compression algorithm.
> >
> > Now for the two "moot problems":
> > - ZIP does not have good performance or compression ratio; there are
> > better algorithms out there like gzip (which also happens to be more
> > ubiquitous; nearly all websites serve HTML compressed with gzip).
> > - ZIP is an archiving format that happens to have its own compression
> > format. Archiving format parsers can have serious vulnerabilities in their
> > implementation that can allow malware to swipe private keys and passwords,
> > which matters because the primary target for this BIP is wallets. For
> > example, there was Zip Slip[1] in 2018, an arbitrary file overwrite flaw
> > that can lead to remote code execution. So the malware can even hide in
> > memory until private keys or passwords are written to memory, then send
> > them across the network. Assuming it's targeting a specific wallet
> > software, it's not hard to carry out at all.
> >
> > There are two solutions for all this:
> > 1. The duct-tape solution: Use some compression algorithm like gzip
> > instead of the ZIP archive format.
> > 2. The "throw it out and buy a new one" solution: Get rid of the optional
> > compression specs altogether, because users are responsible for supplying
> > the exported labels in the first place, so all the compression stuff is
> > redundant and should be left up to the user if they want it.
> >
> > I prefer the second solution because it addresses the problem
> > directly instead of putting duct tape on it like the first one.
> >
> > > This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> > > or AES-256 encryption, which is supported by numerous applications
> > > including Winzip and 7-zip.
> > > The textual representation of the wallet's extended public key (as
> > > defined by BIP32, with an <tt>xpub</tt> header) should be used as the
> > > password.
> > Not specific to AES, but I don't see the benefit of encrypting addresses
> > and labels together. Can you please elaborate on why this would be desirable?
> >
> > Like I said though, it's better to leave it up to users to decide how to
> > store their exports, since BIPs can't enforce that anyway (additionally,
> > the password you propose is insecure - anybody with access to the wallet
> > can unlock it, which is not desirable to some users who want their own
> > security).
> >
> > > * Transaction ID (<tt>txid</tt>)
> > > * Address
> > > * Input (rendered as <tt>txid<index</tt>)
> > > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> > Why the need for input and output formats? There is no difference between
> > them on the wallet level, because they are always identified with a txid
> > and output index. To distinguish between them and hence write them with the
> > correct format would require a UTXO set and thus access to a full node,
> > otherwise the CSV cannot be verified to be completely well-formed.
> >
> > Another important point is that practically nobody labels inputs or
> > outputs because most people do not know that those things even exist, and
> > the rest don't bother to label them.
> >
> > But the biggest downside to including them is related to the problem of
> > information leaking which you make reference to here:
> > > In both cases, care must be taken when spending to avoid undesirable
> > > leaks of private information.
> > A CSV dump that has inputs/outputs and addresses mixed together can be used
> > to infer the owner of all those items. In fact, a CSV label dump is
> > basically a personal information store, and everything in it can be
> > correlated as coming from the same wallet, so it's important that
> > unnecessary types are kept out of the format. People are known to leave
> > files lying around on their computer that they don't need anymore, so these
> > files can find their way via telemetry to surveillance entities. While we
> > can't specify what users can do with their exports, we can control the
> > information leak by preventing certain types of items that we know most
> > users will never use from being exported in the first place.
> >
> > > The order in which these records appear is not defined.
> > Again, since the primary use case for this BIP is wallets, which likely
> > use hierarchical derivation schemes like BIP44, there is a net benefit for
> > the addresses to be exported in ascending order of their `address_index`. It
> > means that wallets can import them in O(n) time as opposed to O(n^2) time
> > spent serially checking at which index each address appears. Of course,
> > this implies that all addresses up to a certain index have to be exported
> > into the CSV as well, but most wallets I know of, like Core and Electrum,
> > already store addresses like that.
> >
> > Also if you do this, you will need to group all the transaction records
> > before the address records or vice versa - you can use lexicographical
> > sorting if you want (i.e. Addresses before Transactions). The benefit of
> > this separation of parts is that wallets can split the imported address
> > records from the transaction records internally, and feed them to separate
> > functions which set these labels internally.
> >
> > If you decide on doing it this way, then you need a 3rd column to identify
> > the item type, and also you should quote the label (see below). I strongly
> > recommend using numbers for identification as opposed to character strings,
> > so you don't have to worry about localization or character case issues.
> > There is always one unique number, but there could be multiple strings that
> > reference the same type. This will complicate importing functions.
> >
> > If you insist on including Input and Output types then they can both be
> > specified as <txid>:<index> if you make this change. They won't be used to
> > determine the type anyway.
> >
> > > The fields may be quoted, but this is unnecessary, as the first comma in
> > > the line will always be the delimiter.
> > Don't implement it like that, because that will break CSV parsers which
> > expect a fixed number of columns in each record (2 in the header, while
> > some rows would have more than 2). It's better to mandate that labels should
> > always be double-quoted, since only wallets will generate label exports
> > anyway. If you plan to use headers then the 3rd column can be blank for it
> > (or you can split the version and flags from each other).
> >
> > > ==Importing==
> > >
> > > When importing, a naive algorithm may simply match against any reference,
> > > but it is possible to disambiguate between transactions, addresses,
> > > inputs and outputs.
> > > For example in the following pseudocode:
> > > <pre>
> > >   if reference length < 64
> > >     Set address label
> > >   else if reference length == 64
> > >     Set transaction label
> > >   else if reference contains '<'
> > >     Set input label
> > >   else
> > >     Set output label
> > > </pre>
> > The importing code is too naive and in its current form will prevent the
> > BIP from getting a number. It is perhaps the single most important part of
> > a BIP. When implementing an importer, it should utilize a dedicated item
> > type field that unambiguously identifies the item. So the naive importer is
> > not good; you need to use a 3rd column for that like I explained above, so
> > that the importer becomes robust.
> >
> > In summary (exclamation marks indicate severity - one means low, two means
> > medium, and three means high):
> >
> > 1. Convert the header into a version line with optional flags, otherwise
> > nobody can extend this format without compatibility issues (!)
> > 2. Get rid of the specs related to file compression (!!!)
> > 3. Add a 3rd column for item type (address, transaction etc.) preferably
> > as numeric constants and grouping items of one type after items of another
> > type, or if you insist on strings, then only recognize their Titlecase
> > ASCII versions <spreadsheet software like Excel always tries to titlecase
> > the words> (!!)
> > 4. Require double quotes around the label (or single quotes if you prefer,
> > as long as spreadsheet software doesn't choke on them) (!!)
> > 5. Require sorting the records according to the order they are stored in
> > the wallet implementation. (!)
> > 6. Consider getting rid of Input and Output item types. (!)
> > 7. And last and most importantly, please write a more robust importer
> > algorithm in the example given by the BIP, because code in BIPs is
> > frequently used as a reference for software. (!!!)
> >
> > I hope you will consider these points in future revisions of your BIP.
> >
> > - Ali
> >
> > [1] https://github.com/snyk/zip-slip-vulnerability
> >
> > On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> > > Hi all,
> > >
> > > I would like to propose a BIP that specifies a format for the export and
> > > import of labels from a wallet. While transferring access to funds across
> > > wallet applications has been made simple through standards such as BIP39,
> > > wallet labels remain siloed and difficult to extract despite their value,
> > > particularly in a privacy context.
> > >
> > > The proposed format is a simple two column CSV file, with the reference
> > > to a transaction, address, input or output in the first column, and the
> > > label in the second column. CSV was chosen for its wide accessibility,
> > > especially to users without specific technical expertise. Similarly, the
> > > CSV file may be compressed using the ZIP format, and optionally encrypted
> > > using AES.
> > >
> > > The full text of the BIP can be found at
> > > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> > > and also copied below.
> > >
> > > Feedback is appreciated.
> > >
> > > Thanks,
> > > Craig Raw
> > >
> > > ---
> > >
> > > <pre>
> > >   BIP: wallet-labels
> > >   Layer: Applications
> > >   Title: Wallet Labels Export Format
> > >   Author: Craig Raw <craig@sparrowwallet.com>
> > >   Comments-Summary: No comments yet.
> > >   Comments-URI:
> > > https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> > >   Status: Draft
> > >   Type: Informational
> > >   Created: 2022-08-23
> > >   License: BSD-2-Clause
> > > </pre>
> > >
> > > ==Abstract==
> > >
> > > This document specifies a format for the export of labels that may be
> > > attached to the transactions, addresses, inputs and outputs in a wallet.
> > >
> > > ==Copyright==
> > >
> > > This BIP is licensed under the BSD 2-clause license.
> > >
> > > ==Motivation==
> > >
> > > The export and import of funds across different Bitcoin wallet
> > > applications is well defined through standards such as BIP39, BIP32,
> > > BIP44 etc.
> > > These standards are well supported and allow users to move easily between
> > > different wallets.
> > > There is, however, no defined standard to transfer any labels the user
> > > may have applied to the transactions, addresses, inputs or outputs in
> > > their wallet.
> > > The UTXO model that Bitcoin uses makes these labels particularly valuable
> > > as they may indicate the source of funds, whether received externally or
> > > as a result of change from a prior transaction.
> > > In both cases, care must be taken when spending to avoid undesirable
> > > leaks of private information.
> > > Labels provide valuable guidance in this regard, and have even become
> > > mandatory when spending in several Bitcoin wallets.
> > > Allowing users to export their labels in a standardized way ensures that
> > > they do not experience lock-in to a particular wallet application.
> > > In addition, by using common formats, this BIP seeks to make manual or
> > > bulk management of labels accessible to users without specific technical
> > > expertise.
> > >
> > > ==Specification==
> > >
> > > In order to make the import and export of labels as widely accessible as
> > > possible, this BIP uses the comma separated values (CSV) format, which is
> > > widely supported by consumer, business, and scientific applications.
> > > Although the technical specification of CSV in RFC4180 is not always
> > > followed, the application of the format in this BIP is simple enough that
> > > compatibility should not present a problem.
> > > Moreover, the simplicity and forgiving nature of CSV (over for example
> > > JSON) lends itself well to bulk label editing using spreadsheet and text
> > > editing tools.
> > >
> > > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> > > containing one record per line, with records containing two fields
> > > delimited by a comma.
> > > The fields may be quoted, but this is unnecessary, as the first comma in
> > > the line will always be the delimiter.
> > > The first line in the file is a header, and should be ignored on import.
> > > Thereafter, each line represents a record that refers to a label applied
> > > in the wallet.
> > > The order in which these records appear is not defined.
> > >
> > > The first field in the record contains a reference to the transaction,
> > > address, input or output in the wallet.
> > > This is specified as one of the following:
> > > * Transaction ID (<tt>txid</tt>)
> > > * Address
> > > * Input (rendered as <tt>txid<index</tt>)
> > > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> > >
> > > The second field contains the label applied to the reference.
> > > Exporting applications may omit records with no labels or labels of zero
> > > length.
> > > Files exported should use the <tt>.csv</tt> file extension.
> > >
> > > In order to reduce file size while retaining wide accessibility, the CSV
> > > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > > file extension.
> > > This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> > > or AES-256 encryption, which is supported by numerous applications
> > > including Winzip and 7-zip.
> > > In order to ensure that weak encryption does not proliferate, importers
> > > following this standard must refuse to import <tt>.zip</tt> files
> > > encrypted with the weaker Zip 2.0 standard.
> > > The textual representation of the wallet's extended public key (as
> > > defined by BIP32, with an <tt>xpub</tt> header) should be used as the
> > > password.
> > >
> > > ==Importing==
> > >
> > > When importing, a naive algorithm may simply match against any reference,
> > > but it is possible to disambiguate between transactions, addresses,
> > > inputs and outputs.
> > > For example in the following pseudocode:
> > > <pre>
> > >   if reference length < 64
> > >     Set address label
> > >   else if reference length == 64
> > >     Set transaction label
> > >   else if reference contains '<'
> > >     Set input label
> > >   else
> > >     Set output label
> > > </pre>
> > >
> > > Importing applications may truncate labels if necessary.
> > >
> > > ==Test Vectors==
> > >
> > > The following fragment represents a wallet label export:
> > > <pre>
> > > Reference,Label
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> > > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> > > </pre>
> > >
> > > ==Reference Implementation==
> > >
> > > TBD




* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24 19:10 ` Ali Sherief
@ 2022-08-27 21:03   ` Billy Tetrud
  2022-08-29 11:25   ` Craig Raw
  1 sibling, 0 replies; 20+ messages in thread
From: Billy Tetrud @ 2022-08-27 21:03 UTC (permalink / raw)
  To: Ali Sherief, Bitcoin Protocol Discussion


@Ali That's some good, well-thought-through and well-articulated feedback. I
have one point of contention:

> it's important that unnecessary types are kept out of the format. People
> are known to leave files lying around on their computer that they don't
> need anymore, so these files can find their way via telemetry to
> surveillance entities. While we can't specify what users can do with their
> exports, we can control the information leak by preventing certain types of
> items that we know most users will never use from being exported in the
> first place.

This seems to run contrary to your point about letting users be in
control of how they store this. Given that you can always connect together
an output and its address or find the outputs at any address, it doesn't
seem like it would actually leak any more information than just including
addresses. Am I missing something?

On Wed, Aug 24, 2022, 14:44 Ali Sherief via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Hi Craig,
>
> This is a really good proposal. I studied your BIP and I have some feedback
> on some parts of it.
>
> > The first line in the file is a header, and should be ignored on import.
>
> From past experience and lessons, most notably BIP39, it is important that
> a version byte is defined somewhere in case someone wants to extend the
> format in the future. Currently there is no version byte that can be
> incremented for that purpose. In the unique case of CSV files, you should
> make the header line mandatory (I see you have already implied this, but
> you should make it explicit in the BIP), but instead of a line of column
> names such as Reference,Label, I suggest you make the format like this:
>
> BIP-wallet-labels,<version>
>
> Since there are two columns per record, this works out nicely. The first
> column can be the name of the BIP - BIPxxxx where the x's are numbers, and
> the second column can be an unsigned 32-bit integer (most significant 8
> bits reserved for version, the remaining for flags, or perhaps the entirety
> for version - but I recommend leaving at least some bits for flags, even if
> they all end up being just "reserved").
>
> You should make importing fail if the header line is not exactly as
> specified - or as appropriate, should you decide on a different format for
> the header.
>
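
A minimal sketch of the strict header check described above, in Python. The
magic string, the split into 8 version bits and 24 flag bits, and the names
MAGIC, SUPPORTED_VERSIONS and parse_header are assumptions made for
illustration; the thread has not fixed any of these values.

```python
import csv
import io

MAGIC = "BIP-wallet-labels"   # assumed magic string; the final BIP may differ
SUPPORTED_VERSIONS = {1}      # assumed: only one version is in circulation


def parse_header(line: str) -> tuple:
    """Return (version, flags), or raise ValueError if the header is not exact."""
    fields = next(csv.reader(io.StringIO(line)), [])
    if len(fields) != 2 or fields[0] != MAGIC:
        raise ValueError("unrecognized header line")
    raw = fields[1]
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("version field must be an unsigned decimal integer")
    word = int(raw)
    if word > 0xFFFFFFFF:
        raise ValueError("version field does not fit in 32 bits")
    version = word >> 24          # most significant 8 bits
    flags = word & 0x00FFFFFF     # remaining 24 bits, all reserved here
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported version %d" % version)
    return version, flags
```

With this layout, version 1 with no flags set is written as 16777216 (1 << 24),
and any other first line, including the original Reference,Label header, aborts
the import.
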
> > Files exported should use the <tt>.csv</tt> file extension.
> Don't mandate the file extension (read below for why):
>
> > In order to reduce file size while retaining wide accessibility, the CSV
> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > file extension.
> I see three problems with this. The first is more important than the latter
> two because it makes them moot points, but I'll mention them anyway so you
> get a background of the situation:
> - The BIP is trying to specify the file format in which the export is
> written to the filesystem. There is no way to enforce this at the BIP
> level (besides, Unix operating systems don't even consider the file
> extension; they use its mimetype). Also, specifying this in the BIP will
> prevent modular "Layer 2" protocols and schemes from encoding the exported
> labels into another format - for example Base64 or with their own
> compression algorithm.
>
> Now for the two "moot problems":
> - ZIP does not have good performance or compression ratio; there are
> better algorithms out there like gzip (which also happens to be more
> ubiquitous; nearly all websites serve HTML compressed with gzip).
> - ZIP is an archiving format that happens to have its own compression
> format. Archiving format parsers can have serious vulnerabilities in their
> implementation that can allow malware to swipe private keys and passwords,
> which matters because the primary target for this BIP is wallets. For
> example, there was Zip Slip[1] in 2018, an arbitrary file overwrite flaw
> that can lead to remote code execution. So the malware can even hide in
> memory until private keys or passwords are written to memory, then send
> them across the network. Assuming it's targeting a specific wallet
> software, it's not hard to carry out at all.
>
> There are two solutions for all this:
> 1. The duct-tape solution: Use some compression algorithm like gzip
> instead of the ZIP archive format.
> 2. The "throw it out and buy a new one" solution: Get rid of the optional
> compression specs altogether, because users are responsible for supplying
> the exported labels in the first place, so all the compression stuff is
> redundant and should be left up to the user if they want it.
>
> I prefer the second solution because it addresses the problem
> directly instead of putting duct tape on it like the first one.
>
> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> > or AES-256 encryption, which is supported by numerous applications
> > including Winzip and 7-zip.
> > The textual representation of the wallet's extended public key (as
> > defined by BIP32, with an <tt>xpub</tt> header) should be used as the
> > password.
> Not specific to AES, but I don't see the benefit of encrypting addresses
> and labels together. Can you please elaborate on why this would be desirable?
>
> Like I said though, it's better to leave it up to users to decide how to
> store their exports, since BIPs can't enforce that anyway (additionally,
> the password you propose is insecure - anybody with access to the wallet
> can unlock it, which is not desirable to some users who want their own
> security).
>
> > * Transaction ID (<tt>txid</tt>)
> > * Address
> > * Input (rendered as <tt>txid<index</tt>)
> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> Why the need for input and output formats? There is no difference between
> them on the wallet level, because they are always identified with a txid
> and output index. To distinguish between them and hence write them with the
> correct format would require a UTXO set and thus access to a full node,
> otherwise the CSV cannot be verified to be completely well-formed.
>
> Another important point is that practically nobody labels inputs or
> outputs because most people do not know that those things even exist, and
> the rest don't bother to label them.
>
> But the biggest downside to including them is related to the problem of
> information leaking which you make reference to here:
> > In both cases, care must be taken when spending to avoid undesirable
> > leaks of private information.
> A CSV dump that has inputs/outputs and addresses mixed together can be used
> to infer the owner of all those items. In fact, a CSV label dump is
> basically a personal information store, and everything in it can be
> correlated as coming from the same wallet, so it's important that
> unnecessary types are kept out of the format. People are known to leave
> files lying around on their computer that they don't need anymore, so these
> files can find their way via telemetry to surveillance entities. While we
> can't specify what users can do with their exports, we can control the
> information leak by preventing certain types of items that we know most
> users will never use from being exported in the first place.
>
> > The order in which these records appear is not defined.
> Again, since the primary use case for this BIP is wallets, which likely
> use hierarchical derivation schemes like BIP44, there is a net benefit for
> the addresses to be exported in ascending order of their `address_index`. It
> means that wallets can import them in O(n) time as opposed to O(n^2) time
> spent serially checking at which index each address appears. Of course,
> this implies that all addresses up to a certain index have to be exported
> into the CSV as well, but most wallets I know of, like Core and Electrum,
> already store addresses like that.
>
> Also if you do this, you will need to group all the transaction records
> before the address records or vice versa - you can use lexicographical
> sorting if you want (i.e. Addresses before Transactions). The benefit of
> this separation of parts is that wallets can split the imported address
> records from the transaction records internally, and feed them to separate
> functions which set these labels internally.
>
> If you decide on doing it this way, then you need a 3rd column to identify
> the item type, and also you should quote the label (see below). I strongly
> recommend using numbers for identification as opposed to character strings,
> so you don't have to worry about localization or character case issues.
> There is always one unique number, but there could be multiple strings that
> reference the same type. This will complicate importing functions.
>
> If you insist on including Input and Output types then they can both be
> specified as <txid>:<index> if you make this change. They won't be used to
> determine the type anyway.
>
> > The fields may be quoted, but this is unnecessary, as the first comma in
> > the line will always be the delimiter.
> Don't implement it like that, because that will break CSV parsers which
> expect a fixed number of columns in each record (2 in the header, while
> some rows would have more than 2). It's better to mandate that labels should
> always be double-quoted, since only wallets will generate label exports
> anyway. If you plan to use headers then the 3rd column can be blank for it
> (or you can split the version and flags from each other).
>
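
To illustrate the quoting point, here is a rough Python sketch using the
standard csv module. The column order (reference, label, numeric item type)
and the helper names write_record and read_record are assumptions for the
example, not something the BIP draft specifies.

```python
import csv
import io


def write_record(ref: str, label: str, item_type: int) -> str:
    """Serialize one record, double-quoting every non-numeric field."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow([ref, label, item_type])
    return buf.getvalue()


def read_record(line: str) -> list:
    """Parse one record and insist on exactly three columns."""
    rows = list(csv.reader(io.StringIO(line), strict=True))
    if len(rows) != 1 or len(rows[0]) != 3:
        raise ValueError("expected exactly three columns")
    return rows[0]


# A label containing a comma and quotes survives the round trip when quoted.
line = write_record("1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", 'Rent, "August"', 1)
fields = read_record(line)
```

The same label written without quotes would be split at its comma by a
standard parser, giving four columns instead of three, which read_record
rejects.
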
> > ==Importing==
> >
> > When importing, a naive algorithm may simply match against any reference,
> > but it is possible to disambiguate between transactions, addresses,
> > inputs and outputs.
> > For example in the following pseudocode:
> > <pre>
> >   if reference length < 64
> >     Set address label
> >   else if reference length == 64
> >     Set transaction label
> >   else if reference contains '<'
> >     Set input label
> >   else
> >     Set output label
> > </pre>
> The importing code is too naive and in its current form will prevent the
> BIP from getting a number. It is perhaps the single most important part of
> a BIP. When implementing an importer, it should utilize a dedicated item
> type field that unambiguously identifies the item. So the naive importer is
> not good; you need to use a 3rd column for that like I explained above, so
> that the importer becomes robust.
>
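
One way the type-column importer could look, as a hedged Python sketch. The
numeric constants in ItemType, the three-column layout and the function name
import_labels are hypothetical; the header version check is left out here to
keep the example short.

```python
import csv
import io
from enum import IntEnum


class ItemType(IntEnum):
    # Assumed numbering, for illustration only.
    TRANSACTION = 0
    ADDRESS = 1
    OUTPUT = 2


def import_labels(text: str) -> dict:
    """Group labels by item type using the 3rd column, never the reference length."""
    labels = {item_type: {} for item_type in ItemType}
    reader = csv.reader(io.StringIO(text), strict=True)
    if next(reader, None) is None:
        raise ValueError("empty file")   # header validation omitted in this sketch
    for lineno, row in enumerate(reader, start=2):
        if len(row) != 3:
            raise ValueError("line %d: expected 3 columns, got %d" % (lineno, len(row)))
        ref, label, raw_type = row
        try:
            item_type = ItemType(int(raw_type))
        except ValueError:
            raise ValueError("line %d: unknown item type %r" % (lineno, raw_type)) from None
        labels[item_type][ref] = label
    return labels
```

Because the type comes from its own column, an unknown or missing type aborts
the import instead of being guessed from the length of the reference.
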
> In summary (exclamation marks indicate severity - one means low, two means
> medium, and three means high):
>
> 1. Convert the header into a version line with optional flags, otherwise
> nobody can extend this format without compatibility issues (!)
> 2. Get rid of the specs related to file compression (!!!)
> 3. Add a 3rd column for item type (address, transaction etc.) preferably
> as numeric constants and grouping items of one type after items of another
> type, or if you insist on strings, then only recognize their Titlecase
> ASCII versions <spreadsheet software like Excel always tries to titlecase
> the words> (!!)
> 4. Require double quotes around the label (or single quotes if you prefer,
> as long as spreadsheet software doesn't choke on them) (!!)
> 5. Require sorting the records according to the order they are stored in
> the wallet implementation. (!)
> 6. Consider getting rid of Input and Output item types. (!)
> 7. And last and most importantly, please write a more robust importer
> algorithm in the example given by the BIP, because code in BIPs is
> frequently used as a reference for software. (!!!)
>
> I hope you will consider these points in future revisions of your BIP.
>
> - Ali
>
> [1] https://github.com/snyk/zip-slip-vulnerability
>
> On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> > Hi all,
> >
> > I would like to propose a BIP that specifies a format for the export and
> > import of labels from a wallet. While transferring access to funds across
> > wallet applications has been made simple through standards such as BIP39,
> > wallet labels remain siloed and difficult to extract despite their value,
> > particularly in a privacy context.
> >
> > The proposed format is a simple two column CSV file, with the reference
> > to a transaction, address, input or output in the first column, and the
> > label in the second column. CSV was chosen for its wide accessibility,
> > especially to users without specific technical expertise. Similarly, the
> > CSV file may be compressed using the ZIP format, and optionally encrypted
> > using AES.
> >
> > The full text of the BIP can be found at
> > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> > and also copied below.
> >
> > Feedback is appreciated.
> >
> > Thanks,
> > Craig Raw
> >
> > ---
> >
> > <pre>
> >   BIP: wallet-labels
> >   Layer: Applications
> >   Title: Wallet Labels Export Format
> >   Author: Craig Raw <craig@sparrowwallet.com>
> >   Comments-Summary: No comments yet.
> >   Comments-URI:
> > https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> >   Status: Draft
> >   Type: Informational
> >   Created: 2022-08-23
> >   License: BSD-2-Clause
> > </pre>
> >
> > ==Abstract==
> >
> > This document specifies a format for the export of labels that may be
> > attached to the transactions, addresses, inputs and outputs in a wallet.
> >
> > ==Copyright==
> >
> > This BIP is licensed under the BSD 2-clause license.
> >
> > ==Motivation==
> >
> > The export and import of funds across different Bitcoin wallet
> > applications is well defined through standards such as BIP39, BIP32,
> > BIP44 etc.
> > These standards are well supported and allow users to move easily between
> > different wallets.
> > There is, however, no defined standard to transfer any labels the user
> > may have applied to the transactions, addresses, inputs or outputs in
> > their wallet.
> > The UTXO model that Bitcoin uses makes these labels particularly valuable
> > as they may indicate the source of funds, whether received externally or
> > as a result of change from a prior transaction.
> > In both cases, care must be taken when spending to avoid undesirable
> > leaks of private information.
> > Labels provide valuable guidance in this regard, and have even become
> > mandatory when spending in several Bitcoin wallets.
> > Allowing users to export their labels in a standardized way ensures that
> > they do not experience lock-in to a particular wallet application.
> > In addition, by using common formats, this BIP seeks to make manual or
> > bulk management of labels accessible to users without specific technical
> > expertise.
> >
> > ==Specification==
> >
> > In order to make the import and export of labels as widely accessible as
> > possible, this BIP uses the comma separated values (CSV) format, which is
> > widely supported by consumer, business, and scientific applications.
> > Although the technical specification of CSV in RFC4180 is not always
> > followed, the application of the format in this BIP is simple enough that
> > compatibility should not present a problem.
> > Moreover, the simplicity and forgiving nature of CSV (over for example
> > JSON) lends itself well to bulk label editing using spreadsheet and text
> > editing tools.
> >
> > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> > containing one record per line, with records containing two fields
> > delimited by a comma.
> > The fields may be quoted, but this is unnecessary, as the first comma in
> > the line will always be the delimiter.
> > The first line in the file is a header, and should be ignored on import.
> > Thereafter, each line represents a record that refers to a label applied in
> > the wallet.
> > The order in which these records appear is not defined.
> >
> > The first field in the record contains a reference to the transaction,
> > address, input or output in the wallet.
> > This is specified as one of the following:
> > * Transaction ID (<tt>txid</tt>)
> > * Address
> > * Input (rendered as <tt>txid<index</tt>)
> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> >
> > The second field contains the label applied to the reference.
> > Exporting applications may omit records with no labels or labels of zero
> > length.
> > Files exported should use the <tt>.csv</tt> file extension.
> >
> > In order to reduce file size while retaining wide accessibility, the CSV
> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> > file extension.
> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
> > AES-256 encryption, which is supported by numerous applications including
> > Winzip and 7-zip.
> > In order to ensure that weak encryption does not proliferate, importers
> > following this standard must refuse to import <tt>.zip</tt> files encrypted
> > with the weaker Zip 2.0 standard.
> > The textual representation of the wallet's extended public key (as defined
> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
> >
> > ==Importing==
> >
> > When importing, a naive algorithm may simply match against any reference,
> > but it is possible to disambiguate between transactions, addresses, inputs
> > and outputs.
> > For example in the following pseudocode:
> > <pre>
> >   if reference length < 64
> >     Set address label
> >   else if reference length == 64
> >     Set transaction label
> >   else if reference contains '<'
> >     Set input label
> >   else
> >     Set output label
> > </pre>
> >
> > Importing applications may truncate labels if necessary.
> >
> > ==Test Vectors==
> >
> > The following fragment represents a wallet label export:
> > <pre>
> > Reference,Label
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> > </pre>
> >
> > ==Reference Implementation==
> >
> > TBD
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-25  8:59   ` Craig Raw
  2022-08-25 13:48     ` rhavar
@ 2022-08-25 22:54     ` Clark Moody
  2022-08-27 22:20       ` Billy Tetrud
  1 sibling, 1 reply; 20+ messages in thread
From: Clark Moody @ 2022-08-25 22:54 UTC (permalink / raw)
  To: Craig Raw, Bitcoin Protocol Discussion

Having previously developed an export format[1] for general cryptocurrency transaction information, I can attest to the value of the human-readable CSV. I was careful to mention the RFC 4180 spec so that implementations could avoid the pitfalls of incorrect CSV encoding.

[1]: https://github.com/harmony-csv/harmony
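
As a minimal illustration of the RFC 4180 quoting rules referred to above, the sketch below uses Python's standard csv module to write and re-read a label that contains a comma and double quotes. The sample label and the variable names are assumptions made for illustration; they come from neither harmony nor the proposed BIP.

```python
import csv
import io

# Hypothetical sample record: the label contains a comma and double quotes.
rows = [
    ["Reference", "Label"],
    ["1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", 'Rent, paid to "Bob"'],
]

# The default dialect of csv.writer quotes any field that contains the
# delimiter or a quote character, and doubles embedded quotes, which is
# the escaping that RFC 4180 describes.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Reading the text back with csv.reader recovers the original fields.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))
```

Here the label is written as "Rent, paid to ""Bob""" and round-trips unchanged, which a split on the first comma alone would not reproduce.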

Clark

------- Original Message -------
On Thursday, August 25th, 2022 at 3:59 AM, Craig Raw via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:

> Thanks for your thoughts Ryan.
>
> Without reference to the quality feedback on this proposal, I was aware when submitting it for review that it provides an excellent opportunity for bike shedding. As developers, we have all experienced frustration with data formats. One thing that I did not perhaps make clear enough is that this format is not solely intended for developers, but general users who are probably not well represented on this list.
>
> While doing research for this proposal I spoke to several professional users of Sparrow Wallet (who are not developers). They all expressed a desire for the format to integrate with their business processes, which are driven by business tools such as Excel. Labelling provides an important function in UTXO and address management in these scenarios, and needs to be accessible and manageable outside of wallet software.
>
> If this is to be achieved, it immediately rules out JSON as a data format. Not only is JSON limited to editing only through specific software or text editors, but (in the latter case) it is fragile enough that a single missing character can cause an entire file to fail parsing. CSV is more forgiving in this regard. With respect to your comments on escaping, my expectation would be that developers will be using a mature CSV library rather than handling character escaping themselves. I would rather propose a format that is generally usable, even if occasionally a label is escaped incorrectly.
>
> Finally, I'll note that CSV files are already common and uncontroversial in Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt many others) already export addresses and/or transactions with their labels as CSV files. This proposal simply attempts to create a standard for importing and exporting all the labels in a wallet.
>
> Craig
>
> On Wed, Aug 24, 2022 at 9:01 PM <rhavar@protonmail.com> wrote:
>
>> I'd strongly suggest not using CSV. Especially for a standard. I've worked with it as an interchange format many a time, and it's always been a clusterfuck.
>>
>> Right off the bat, you have stuff like "The fields may be quoted, but this is unnecessary as the first comma in the line will always be the delimiter" which invariably leads to some implementations doing it, some implementations not doing it, and others that are intolerant of the other way.
>>
>> And you have also made the classic mistake of not strictly defining escape rules. So everyone will pick their own (e.g. some will escape commas as \, while others will not because the field is quoted and will escape quotes instead, and others will assume no escaping is required since it's the last column in a CSV).
>>
>> Over time it morphs into its own mini-monster that introduces so much pain.
>>
>> On a similar note, allowing alternatives (like: txid>index vs txid:index) provides no benefit, but creates additional work for implementations (who quite likely only test formats they produce) and future incompatibilities.
>>
>> I know everyone loves to hate on it, but really (line-separated?) json is the way to go.
>>
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
>>
>> -Ryan
>>
>> ------- Original Message -------
>> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
>>
>>> Hi all,
>>>
>>> I would like to propose a BIP that specifies a format for the export and import of labels from a wallet. While transferring access to funds across wallet applications has been made simple through standards such as BIP39, wallet labels remain siloed and difficult to extract despite their value, particularly in a privacy context.
>>>
>>> The proposed format is a simple two column CSV file, with the reference to a transaction, address, input or output in the first column, and the label in the second column. CSV was chosen for its wide accessibility, especially to users without specific technical expertise. Similarly, the CSV file may be compressed using the ZIP format, and optionally encrypted using AES.
>>>
>>> The full text of the BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki and also copied below.
>>>
>>> Feedback is appreciated.
>>>
>>> Thanks,
>>> Craig Raw
>>>
>>> ---
>>>
>>> <pre>
>>> BIP: wallet-labels
>>> Layer: Applications
>>> Title: Wallet Labels Export Format
>>> Author: Craig Raw <craig@sparrowwallet.com>
>>> Comments-Summary: No comments yet.
>>> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>>> Status: Draft
>>> Type: Informational
>>> Created: 2022-08-23
>>> License: BSD-2-Clause
>>> </pre>
>>>
>>> ==Abstract==
>>>
>>> This document specifies a format for the export of labels that may be attached to the transactions, addresses, inputs and outputs in a wallet.
>>>
>>> ==Copyright==
>>>
>>> This BIP is licensed under the BSD 2-clause license.
>>>
>>> ==Motivation==
>>>
>>> The export and import of funds across different Bitcoin wallet applications is well defined through standards such as BIP39, BIP32, BIP44 etc.
>>> These standards are well supported and allow users to move easily between different wallets.
>>> There is, however, no defined standard to transfer any labels the user may have applied to the transactions, addresses, inputs or outputs in their wallet.
>>> The UTXO model that Bitcoin uses makes these labels particularly valuable as they may indicate the source of funds, whether received externally or as a result of change from a prior transaction.
>>> In both cases, care must be taken when spending to avoid undesirable leaks of private information.
>>> Labels provide valuable guidance in this regard, and have even become mandatory when spending in several Bitcoin wallets.
>>> Allowing users to export their labels in a standardized way ensures that they do not experience lock-in to a particular wallet application.
>>> In addition, by using common formats, this BIP seeks to make manual or bulk management of labels accessible to users without specific technical expertise.
>>>
>>> ==Specification==
>>>
>>> In order to make the import and export of labels as widely accessible as possible, this BIP uses the comma separated values (CSV) format, which is widely supported by consumer, business, and scientific applications.
>>> Although the technical specification of CSV in RFC4180 is not always followed, the application of the format in this BIP is simple enough that compatibility should not present a problem.
>>> Moreover, the simplicity and forgiving nature of CSV (over for example JSON) lends itself well to bulk label editing using spreadsheet and text editing tools.
>>>
>>> A CSV export of labels from a wallet must be a UTF-8 encoded text file, containing one record per line, with records containing two fields delimited by a comma.
>>> The fields may be quoted, but this is unnecessary, as the first comma in the line will always be the delimiter.
>>> The first line in the file is a header, and should be ignored on import.
>>> Thereafter, each line represents a record that refers to a label applied in the wallet.
>>> The order in which these records appear is not defined.
>>>
>>> The first field in the record contains a reference to the transaction, address, input or output in the wallet.
>>> This is specified as one of the following:
>>> * Transaction ID (<tt>txid</tt>)
>>> * Address
>>> * Input (rendered as <tt>txid<index</tt>)
>>> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>>>
>>> The second field contains the label applied to the reference.
>>> Exporting applications may omit records with no labels or labels of zero length.
>>> Files exported should use the <tt>.csv</tt> file extension.
>>>
>>> In order to reduce file size while retaining wide accessibility, the CSV file may be compressed using the ZIP file format, using the <tt>.zip</tt> file extension.
>>> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or AES-256 encryption, which is supported by numerous applications including Winzip and 7-zip.
>>> In order to ensure that weak encryption does not proliferate, importers following this standard must refuse to import <tt>.zip</tt> files encrypted with the weaker Zip 2.0 standard.
>>> The textual representation of the wallet's extended public key (as defined by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>>>
>>> ==Importing==
>>>
>>> When importing, a naive algorithm may simply match against any reference, but it is possible to disambiguate between transactions, addresses, inputs and outputs.
>>> For example in the following pseudocode:
>>> <pre>
>>> if reference length < 64
>>>   Set address label
>>> else if reference length == 64
>>>   Set transaction label
>>> else if reference contains '<'
>>>   Set input label
>>> else
>>>   Set output label
>>> </pre>
>>>
>>> Importing applications may truncate labels if necessary.
>>>
>>> ==Test Vectors==
>>>
>>> The following fragment represents a wallet label export:
>>> <pre>
>>> Reference,Label
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
>>> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
>>> </pre>
>>>
>>> ==Reference Implementation==
>>>
>>> TBD

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-25  8:59   ` Craig Raw
@ 2022-08-25 13:48     ` rhavar
  2022-08-25 22:54     ` Clark Moody
  1 sibling, 0 replies; 20+ messages in thread
From: rhavar @ 2022-08-25 13:48 UTC (permalink / raw)
  To: Craig Raw; +Cc: Bitcoin Protocol Discussion

> Not only is JSON limited to editing only through specific software or text editors, but (in the latter case) it is fragile enough that a single missing character can cause an entire file to fail parsing. CSV is more forgiving in this regard.

I think quite simply: A forgiving format is not appropriate for a standard.

It'd be hard to overstate how much extra and pointless effort it creates for everyone, and every implementation ends up creating its own de facto standard for what it produces and accepts. Even doing something as simple as adding an extra column will not be possible in the future because it'll break compatibility with previous parsers.

I've literally worked on projects where the CSV parser has evolved to scan ahead, using heuristics to infer the "rules" of a CSV file, and then apply line-by-line heuristics to override those rules in pathological cases. That makes a bit of sense when you're trying to achieve 30 years of backwards compatibility. It doesn't make sense for much else.
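
For contrast with a heuristic parser, here is a rough sketch of what a deliberately unforgiving importer could look like in Python. Everything specific in it is an assumption made for illustration and not part of the proposal: the fixed header, the choice to accept only the txid>index output form, the names classify and parse_labels, and the address pattern, which is a loose placeholder and not real address validation.

```python
import csv
import io
import re

EXPECTED_HEADER = ["Reference", "Label"]  # assumed fixed header for this sketch
TXID = r"[0-9a-f]{64}"
PATTERNS = {
    "tx": re.compile(TXID),
    "input": re.compile(TXID + r"<(0|[1-9][0-9]*)"),
    "output": re.compile(TXID + r">(0|[1-9][0-9]*)"),
    # Loose placeholder only: a real importer would decode and
    # checksum-verify the address instead of pattern matching.
    "address": re.compile(r"[A-Za-z0-9]{14,90}"),
}


def classify(reference):
    """Return the reference type, or raise ValueError if nothing matches fully."""
    # Patterns are tried in insertion order, so a bare txid is never
    # mistaken for an address.
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(reference):
            return kind
    raise ValueError(f"unrecognised reference: {reference!r}")


def parse_labels(text):
    """Parse strictly: abort on the first record that breaks the rules."""
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    if next(reader, None) != EXPECTED_HEADER:
        raise ValueError("unexpected header")
    records = []
    for fields in reader:
        if len(fields) != 2:
            raise ValueError(f"line {reader.line_num}: expected exactly 2 fields")
        reference, label = fields
        records.append((classify(reference), reference, label))
    return records
```

A file with a different header, a third column, a blank line, or a txid:index reference is rejected outright by this sketch instead of being guessed at.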

If your application users really like csv, then introduce an application-specific import-from-csv and export-to-csv with your own rules.
-Ryan

------- Original Message -------
On Thursday, August 25th, 2022 at 1:59 AM, Craig Raw <craigraw@gmail.com> wrote:

> Thanks for your thoughts Ryan.
>
> Without reference to the quality feedback on this proposal, I was aware when submitting it for review that it provides an excellent opportunity for bike shedding. As developers, we have all experienced frustration with data formats. One thing that I did not perhaps make clear enough is that this format is not solely intended for developers, but general users who are probably not well represented on this list.
>
> While doing research for this proposal I spoke to several professional users of Sparrow Wallet (who are not developers). They all expressed a desire for the format to integrate with their business processes, which are driven by business tools such as Excel. Labelling provides an important function in UTXO and address management in these scenarios, and needs to be accessible and manageable outside of wallet software.
>
> If this is to be achieved, it immediately rules out JSON as a data format. Not only is JSON limited to editing only through specific software or text editors, but (in the latter case) it is fragile enough that a single missing character can cause an entire file to fail parsing. CSV is more forgiving in this regard. With respect to your comments on escaping, my expectation would be that developers will be using a mature CSV library rather than handling character escaping themselves. I would rather propose a format that is generally usable, even if occasionally a label is escaped incorrectly.
>
> Finally, I'll note that CSV files are already common and uncontroversial in Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt many others) already export addresses and/or transactions with their labels as CSV files. This proposal simply attempts to create a standard for importing and exporting all the labels in a wallet.
>
> Craig
>
> On Wed, Aug 24, 2022 at 9:01 PM <rhavar@protonmail.com> wrote:
>
>> I'd strongly suggest not using CSV. Especially for a standard. I've worked with it as an interchange format many a time, and it's always been a clusterfuck.
>>
>> Right off the bat, you have stuff like "The fields may be quoted, but this is unnecessary as the first comma in the line will always be the delimiter" which invariably leads to some implementations doing it, some implementations not doing it, and others that are intolerant of the other way.
>>
>> And you have also made the classic mistake of not strictly defining escape rules. So everyone will pick their own (e.g. some will escape commas as \, while others will not because the field is quoted and will escape quotes instead, and others will assume no escaping is required since it's the last column in a CSV).
>>
>> Over time it morphs into its own mini-monster that introduces so much pain.
>>
>> On a similar note, allowing alternatives (like: txid>index vs txid:index) provides no benefit, but creates additional work for implementations (who quite likely only test formats they produce) and future incompatibilities.
>>
>> I know everyone loves to hate on it, but really (line-separated?) json is the way to go.
>>
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
>> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
>>
>> -Ryan
>>
>> ------- Original Message -------
>> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
>>
>>> Hi all,
>>>
>>> I would like to propose a BIP that specifies a format for the export and import of labels from a wallet. While transferring access to funds across wallet applications has been made simple through standards such as BIP39, wallet labels remain siloed and difficult to extract despite their value, particularly in a privacy context.
>>>
>>> The proposed format is a simple two column CSV file, with the reference to a transaction, address, input or output in the first column, and the label in the second column. CSV was chosen for its wide accessibility, especially to users without specific technical expertise. Similarly, the CSV file may be compressed using the ZIP format, and optionally encrypted using AES.
>>>
>>> The full text of the BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki and also copied below.
>>>
>>> Feedback is appreciated.
>>>
>>> Thanks,
>>> Craig Raw
>>>
>>> ---
>>>
>>> <pre>
>>> BIP: wallet-labels
>>> Layer: Applications
>>> Title: Wallet Labels Export Format
>>> Author: Craig Raw <craig@sparrowwallet.com>
>>> Comments-Summary: No comments yet.
>>> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>>> Status: Draft
>>> Type: Informational
>>> Created: 2022-08-23
>>> License: BSD-2-Clause
>>> </pre>
>>>
>>> ==Abstract==
>>>
>>> This document specifies a format for the export of labels that may be attached to the transactions, addresses, inputs and outputs in a wallet.
>>>
>>> ==Copyright==
>>>
>>> This BIP is licensed under the BSD 2-clause license.
>>>
>>> ==Motivation==
>>>
>>> The export and import of funds across different Bitcoin wallet applications is well defined through standards such as BIP39, BIP32, BIP44 etc.
>>> These standards are well supported and allow users to move easily between different wallets.
>>> There is, however, no defined standard to transfer any labels the user may have applied to the transactions, addresses, inputs or outputs in their wallet.
>>> The UTXO model that Bitcoin uses makes these labels particularly valuable as they may indicate the source of funds, whether received externally or as a result of change from a prior transaction.
>>> In both cases, care must be taken when spending to avoid undesirable leaks of private information.
>>> Labels provide valuable guidance in this regard, and have even become mandatory when spending in several Bitcoin wallets.
>>> Allowing users to export their labels in a standardized way ensures that they do not experience lock-in to a particular wallet application.
>>> In addition, by using common formats, this BIP seeks to make manual or bulk management of labels accessible to users without specific technical expertise.
>>>
>>> ==Specification==
>>>
>>> In order to make the import and export of labels as widely accessible as possible, this BIP uses the comma separated values (CSV) format, which is widely supported by consumer, business, and scientific applications.
>>> Although the technical specification of CSV in RFC4180 is not always followed, the application of the format in this BIP is simple enough that compatibility should not present a problem.
>>> Moreover, the simplicity and forgiving nature of CSV (over for example JSON) lends itself well to bulk label editing using spreadsheet and text editing tools.
>>>
>>> A CSV export of labels from a wallet must be a UTF-8 encoded text file, containing one record per line, with records containing two fields delimited by a comma.
>>> The fields may be quoted, but this is unnecessary, as the first comma in the line will always be the delimiter.
>>> The first line in the file is a header, and should be ignored on import.
>>> Thereafter, each line represents a record that refers to a label applied in the wallet.
>>> The order in which these records appear is not defined.
>>>
>>> The first field in the record contains a reference to the transaction, address, input or output in the wallet.
>>> This is specified as one of the following:
>>> * Transaction ID (<tt>txid</tt>)
>>> * Address
>>> * Input (rendered as <tt>txid<index</tt>)
>>> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>>>
>>> The second field contains the label applied to the reference.
>>> Exporting applications may omit records with no labels or labels of zero length.
>>> Files exported should use the <tt>.csv</tt> file extension.
>>>
>>> In order to reduce file size while retaining wide accessibility, the CSV file may be compressed using the ZIP file format, using the <tt>.zip</tt> file extension.
>>> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or AES-256 encryption, which is supported by numerous applications including Winzip and 7-zip.
>>> In order to ensure that weak encryption does not proliferate, importers following this standard must refuse to import <tt>.zip</tt> files encrypted with the weaker Zip 2.0 standard.
>>> The textual representation of the wallet's extended public key (as defined by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>>>
>>> ==Importing==
>>>
>>> When importing, a naive algorithm may simply match against any reference, but it is possible to disambiguate between transactions, addresses, inputs and outputs.
>>> For example in the following pseudocode:
>>> <pre>
>>> if reference length < 64
>>>   Set address label
>>> else if reference length == 64
>>>   Set transaction label
>>> else if reference contains '<'
>>>   Set input label
>>> else
>>>   Set output label
>>> </pre>
>>>
>>> Importing applications may truncate labels if necessary.
>>>
>>> ==Test Vectors==
>>>
>>> The following fragment represents a wallet label export:
>>> <pre>
>>> Reference,Label
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
>>> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
>>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
>>> </pre>
>>>
>>> ==Reference Implementation==
>>>
>>> TBD

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24 19:01 ` rhavar
  2022-08-24 20:18   ` Pavol Rusnak
@ 2022-08-25  8:59   ` Craig Raw
  2022-08-25 13:48     ` rhavar
  2022-08-25 22:54     ` Clark Moody
  1 sibling, 2 replies; 20+ messages in thread
From: Craig Raw @ 2022-08-25  8:59 UTC (permalink / raw)
  To: rhavar; +Cc: Bitcoin Protocol Discussion

Thanks for your thoughts Ryan.

Without reference to the quality feedback on this proposal, I was aware
when submitting it for review that it provides an excellent opportunity for
bike shedding. As developers, we have all experienced frustration with data
formats. One thing that I did not perhaps make clear enough is that this
format is not solely intended for developers, but general users who are
probably not well represented on this list.

While doing research for this proposal I spoke to several professional
users of Sparrow Wallet (who are not developers). They all expressed a
desire for the format to integrate with their business processes, which are
driven by business tools such as Excel. Labelling provides an important
function in UTXO and address management in these scenarios, and needs to be
accessible and manageable outside of wallet software.

If this is to be achieved, it immediately rules out JSON as a data format.
Not only is JSON limited to editing only through specific software or text
editors, but (in the latter case) it is fragile enough that a single
missing character can cause an entire file to fail parsing. CSV is more
forgiving in this regard. With respect to your comments on escaping, my
expectation would be that developers will be using a mature CSV library
rather than handling character escaping themselves. I would rather propose
a format that is generally usable, even if occasionally a label is escaped
incorrectly.
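
To make the fragility point concrete, the following Python sketch compares two ways of reading the same damaged records: as one JSON array, and as the line-separated variant suggested earlier in the thread. The records, the function names and the shortened "tx" values are hypothetical placeholders chosen only for illustration.

```python
import json

# Hypothetical records; the second one is missing the closing quote after tx.
lines = [
    '{"tx": "aa", "label": "first"}',
    '{"tx: "bb", "label": "second"}',
    '{"tx": "cc", "label": "third"}',
]


def parse_as_array(lines):
    """Treat the records as one JSON array: any error rejects everything."""
    try:
        return json.loads("[" + ",".join(lines) + "]")
    except json.JSONDecodeError:
        return []


def parse_line_by_line(lines):
    """Treat each line as its own JSON document and report the failures."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(number)
    return good, bad
```

With a single document the one missing character loses every record, while the line-by-line reading confines the damage to the affected line; whether an importer should continue past such a line or abort is a separate policy decision.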

Finally, I'll note that CSV files are already common and uncontroversial in
Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt many
others) already export addresses and/or transactions with their labels as
CSV files. This proposal simply attempts to create a standard for importing
and exporting all the labels in a wallet.

Craig

On Wed, Aug 24, 2022 at 9:01 PM <rhavar@protonmail.com> wrote:

> I'd strongly suggest not using CSV. Especially for a standard. I've worked
> with it as an interchange format many a time, and it's always been a
> clusterfuck.
>
> Right off the bat, you have stuff like "The fields may be quoted, but this
> is unnecessary as the first comma in the line will always be the delimiter"
> which invariably leads to some implementations doing it, some
> implementations not doing it, and others that are intolerant of the other
> way.
>
> And you have also made the classic mistake of not strictly defining escape
> rules. So everyone will pick their own (e.g. some will escape commas as \,
> while others will not because the field is quoted and will escape quotes
> instead, and others will assume no escaping is required since it's the last
> column in a CSV).
>
> Over time it morphs into its own mini-monster that introduces so much pain.
>
> On a similar note, allowing alternatives (like: txid>index vs txid:index)
> provides no benefit, but creates additional work for implementations (who
> quite likely only test formats they produce) and future incompatibilities.
>
> I know everyone loves to hate on it, but really (line-separated?) json is
> the way to go.
>
> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b",
> "label": "wow, such label" }
> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b",
> "txout": 4, "label": "omg this is so easy to parse" }
> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b",
> "txin": 0, "label": "wow this is going to be extensible as well" }
>
>
>
>
> -Ryan
>
> ------- Original Message -------
> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to
> a transaction, address, input or output in the first column, and the label
> in the second column. CSV was chosen for its wide accessibility, especially
> to users without specific technical expertise. Similarly, the CSV file may
> be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at
> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
> BIP: wallet-labels
> Layer: Applications
> Title: Wallet Labels Export Format
> Author: Craig Raw <craig@sparrowwallet.com>
> Comments-Summary: No comments yet.
> Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> Status: Draft
> Type: Informational
> Created: 2022-08-23
> License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be
> attached to the transactions, addresses, inputs and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet
> applications is well defined through standards such as BIP39, BIP32, BIP44
> etc.
> These standards are well supported and allow users to move easily between
> different wallets.
> There is, however, no defined standard to transfer any labels the user may
> have applied to the transactions, addresses, inputs or outputs in their
> wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable
> as they may indicate the source of funds, whether received externally or as
> a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
> Labels provide valuable guidance in this regard, and have even become
> mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that
> they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or
> bulk management of labels accessible to users without specific technical
> expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as
> possible, this BIP uses the comma separated values (CSV) format, which is
> widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always
> followed, the application of the format in this BIP is simple enough that
> compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example
> JSON) lends itself well to bulk label editing using spreadsheet and text
> editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> containing one record per line, with records containing two fields
> delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied
> in the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction,
> address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero
> length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> or AES-256 encryption, which is supported by numerous applications
> including Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers
> following this standard must refuse to import <tt>.zip</tt> files encrypted
> with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD
>
>
>
>

[-- Attachment #2: Type: text/html, Size: 12103 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24 13:53 ` Clark Moody
@ 2022-08-25  8:59   ` Craig Raw
  0 siblings, 0 replies; 20+ messages in thread
From: Craig Raw @ 2022-08-25  8:59 UTC (permalink / raw)
  To: Clark Moody; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 7625 bytes --]

Thanks Clark - despite having worked in the Bitcoin wallet space for a
number of years I have not come across SLIP-0015. I did try to find prior
work on this - this one escaped me.

That said, having reviewed SLIP-0015, I think it has different design
goals. For example, it requires private key derivation from seed, which
means wallets functioning as coordinators cannot use the format without
access to the devices storing the private keys. It seems to me that wallet
labels are privacy-sensitive rather than security-sensitive, and coordinators
should be able to import and export a wallet label format independently.

Secondly, it uses JSON as a data format, which I wanted to avoid for
reasons I'll describe in a separate reply in this thread. Finally (and this
is minor as it could easily be extended) SLIP-0015 does not currently
support transaction labels.

Craig

On Wed, Aug 24, 2022 at 3:53 PM Clark Moody <clark@clarkmoody.com> wrote:

> Craig,
>
> Thanks for the proposal.
>
> How does this proposal compare with SLIP-0015, which provides encryption
> by default? Would it be worth exploring a merge of the two approaches?
>
> https://github.com/satoshilabs/slips/blob/master/slip-0015.md
>
> Clark
>
> ------- Original Message -------
> On Wednesday, August 24th, 2022 at 4:18 AM, Craig Raw via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to
> a transaction, address, input or output in the first column, and the label
> in the second column. CSV was chosen for its wide accessibility, especially
> to users without specific technical expertise. Similarly, the CSV file may
> be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at
> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
> BIP: wallet-labels
> Layer: Applications
> Title: Wallet Labels Export Format
> Author: Craig Raw <craig@sparrowwallet.com>
> Comments-Summary: No comments yet.
> Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> Status: Draft
> Type: Informational
> Created: 2022-08-23
> License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be
> attached to the transactions, addresses, input and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet
> applications is well defined through standards such as BIP39, BIP32, BIP44
> etc.
> These standards are well supported and allow users to move easily between
> different wallets.
> There is, however, no defined standard to transfer any labels the user may
> have applied to the transactions, addresses, inputs or outputs in their
> wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable
> as they may indicate the source of funds, whether received externally or as
> a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
> Labels provide valuable guidance in this regard, and have even become
> mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that
> they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or
> bulk management of labels accessible to users without specific technical
> expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as
> possible, this BIP uses the comma separated values (CSV) format, which is
> widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always
> followed, the application of the format in this BIP is simple enough that
> compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example
> JSON) lends itself well to bulk label editing using spreadsheet and text
> editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> containing one record per line, with records containing two fields
> delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied
> in the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction,
> address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero
> length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> or AES-256 encryption, which is supported by numerous applications
> including Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers
> following this standard must refuse to import <tt>.zip</tt> files encrypted
> with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD
>
>
>
>

[-- Attachment #2: Type: text/html, Size: 9642 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24 19:01 ` rhavar
@ 2022-08-24 20:18   ` Pavol Rusnak
  2022-08-25  8:59   ` Craig Raw
  1 sibling, 0 replies; 20+ messages in thread
From: Pavol Rusnak @ 2022-08-24 20:18 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion, rhavar

[-- Attachment #1: Type: text/plain, Size: 8578 bytes --]

There is already a JSON standard, described in SLIP-0015 (mentioned by Clark
in this thread), that has been used in the wild for the last 7 years. There is
no need to reinvent the wheel.

On Wed 24. 8. 2022 at 21:44, Ryan Havar via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> I'd strongly suggest not using CSV. Especially for a standard. I've worked
> with it as an interchange format many a times, and it's always been a
> clusterfuck.
>
> Right off the bat, you have stuff like "The fields may be quoted, but this
> is unnecessary as the first comma in the line will always be the delimiter"
> which invariably leads to some implementations doing it, some
> implementations not doing it, and others that are intolerant of the other
> way.
>
> And you have also made the classic mistake of not strictly defining escape
> rules. So everyone will pick their own (e.g. some will \, escape commas,
> others will not cause it's quoted and escape quotes, and others will assume
> no escaping is required since its the last column in a csv).
>
> Over time it morphs into its own mini-monster that introduces so much pain.
>
> On a similar note, allowing alternatives (like: txid>index vs txid:index)
> provides no benefit, but creates additional work for implementations (who
> quite likely only test formats they produce) and future incompatibilities.
>
> I know everyone loves to hate on it, but really (line-separated?) json is
> the way to go.
>
> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
>
>
>
>
> -Ryan
>
> ------- Original Message -------
>
> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to
> a transaction, address, input or output in the first column, and the label
> in the second column. CSV was chosen for its wide accessibility, especially
> to users without specific technical expertise. Similarly, the CSV file may
> be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at
> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
> BIP: wallet-labels
> Layer: Applications
> Title: Wallet Labels Export Format
> Author: Craig Raw <craig@sparrowwallet.com>
> Comments-Summary: No comments yet.
> Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> Status: Draft
> Type: Informational
> Created: 2022-08-23
> License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be
> attached to the transactions, addresses, input and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet
> applications is well defined through standards such as BIP39, BIP32, BIP44
> etc.
> These standards are well supported and allow users to move easily between
> different wallets.
> There is, however, no defined standard to transfer any labels the user may
> have applied to the transactions, addresses, inputs or outputs in their
> wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable
> as they may indicate the source of funds, whether received externally or as
> a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
> Labels provide valuable guidance in this regard, and have even become
> mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that
> they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or
> bulk management of labels accessible to users without specific technical
> expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as
> possible, this BIP uses the comma separated values (CSV) format, which is
> widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always
> followed, the application of the format in this BIP is simple enough that
> compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example
> JSON) lends itself well to bulk label editing using spreadsheet and text
> editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> containing one record per line, with records containing two fields
> delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied
> in the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction,
> address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero
> length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> or AES-256 encryption, which is supported by numerous applications
> including Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers
> following this standard must refuse to import <tt>.zip</tt> files encrypted
> with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD
>
>
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
-- 
Best Regards / S pozdravom,

Pavol "stick" Rusnak
Co-Founder, SatoshiLabs

[-- Attachment #2: Type: text/html, Size: 11718 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
       [not found] <mailman.9.1661342403.3868.bitcoin-dev@lists.linuxfoundation.org>
@ 2022-08-24 19:10 ` Ali Sherief
  2022-08-27 21:03   ` Billy Tetrud
  2022-08-29 11:25   ` Craig Raw
  0 siblings, 2 replies; 20+ messages in thread
From: Ali Sherief @ 2022-08-24 19:10 UTC (permalink / raw)
  To: craigraw; +Cc: bitcoin-dev

Hi Craig,

This is a really good proposal. I have studied your BIP and have feedback on some parts of it.

> The first line in the file is a header, and should be ignored on import.

From past experience, most notably with BIP39, it is important to define a version byte somewhere in case someone wants to extend the format in the future. Currently there is no version byte that could be incremented. In the specific case of CSV files, you should make the header line mandatory (I see you have already implied this, but you should make it explicit in the BIP). Instead of a line of column names such as Reference,Label, I suggest a header like this:

BIP-wallet-labels,<version>

Since there are two columns per record, this works out nicely. The first column can be the name of the BIP - BIPxxxx where the x's are numbers, and the second column can be an unsigned 32-bit integer (most significant 8 bits reserved for version, the remaining for flags, or perhaps the entirety for version - but I recommend leaving at least some bits for flags, even if they all end up being just "reserved").

You should make importing fail if the header line is not exactly as specified, whether you adopt this header format or decide on a different one.
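
To make the header check concrete, here is a minimal Python sketch of what I have in mind. The header name "BIP-wallet-labels" is only the placeholder from the example above, and the 8-bit version / 24-bit flags split is just one of the layouts I mentioned, so treat both as assumptions rather than anything the BIP defines. The function name parse_header is made up for illustration.

```python
import csv
import io

# Placeholder header name taken from the example above (an assumption,
# since the BIP has no number yet).
EXPECTED_NAME = "BIP-wallet-labels"


def parse_header(line):
    """Return (version, flags) or raise ValueError for any other header."""
    fields = next(csv.reader(io.StringIO(line)), [])
    if len(fields) != 2 or fields[0] != EXPECTED_NAME:
        raise ValueError("unrecognised header line")
    raw = fields[1]
    # Only plain ASCII decimal digits are accepted for the version field.
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("version field is not an unsigned integer")
    value = int(raw)
    if value > 0xFFFFFFFF:
        raise ValueError("version field does not fit in 32 bits")
    version = value >> 24        # most significant 8 bits
    flags = value & 0x00FFFFFF   # remaining 24 bits, reserved for flags
    return version, flags
```

An importer would call this on the first line and abort the whole import on ValueError, instead of silently skipping the header.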

> Files exported should use the <tt>.csv</tt> file extension.
Don't mandate the file extension (read below for why):

> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
I see three problems with this. The first is more important than the latter two because it makes them moot points, but I'll mention them anyway so you get a background of the situation:
- The BIP is trying to specify the file format in which the export is written to the filesystem. There is no way to enforce this at the BIP level (besides, Unix-like systems generally do not rely on the file extension and instead identify a file by its content or MIME type). Specifying this in the BIP would also prevent modular "Layer 2" protocols and schemes from encoding the exported labels in another format, for example Base64 or their own compression algorithm.

Now for the two "moot problems":
- ZIP does not have good performance or compression ratio; there are better options out there like gzip (which also happens to be more ubiquitous; nearly all websites serve HTML compressed with gzip).
- ZIP is an archiving format that happens to have its own compression format. Archive parsers can have serious vulnerabilities in their implementations and, since the primary target for this BIP is wallets, these could allow malware to steal private keys and passwords. For example, there was Zip Slip[1] in 2018, which allows for remote code execution. Malware could then hide in memory until private keys or passwords are written to memory, and send them across the network. If it targets a specific wallet software, this is not hard to carry out at all.

There are two solutions for all this:
1. The duct-tape solution: Use a compression algorithm like gzip instead of the ZIP archive format.
2. The "throw it out and buy a new one" solution: Get rid of the optional compression specs altogether. Users are responsible for supplying the exported labels in the first place, so the compression is redundant and should be left up to the user to apply if they wish.

I prefer the second solution because it addresses the problem directly instead of putting duct tape on it like the first one.

> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
> AES-256 encryption, which is supported by numerous applications including
> Winzip and 7-zip.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
Not specific to AES, but I don't see the benefit of encrypting addresses and labels together. Can you please elaborate on why this would be desirable?

Like I said though, it's better to leave it up to users to decide how to store their exports, since BIPs can't enforce that anyway (additionally, the password you propose is insecure: anybody with access to the wallet can unlock the file, which is not desirable for users who want their own security).

> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
Why the need for input and output formats? There is no difference between them on the wallet level, because they are always identified with a txid and output index. To distinguish between them and hence write them with the correct format would require a UTXO set and thus access to a full node, otherwise the CSV cannot be verified to be completely well-formed.

Another important point is that practically nobody labels inputs or outputs because most people do not know that those things even exist, and the rest don't bother to label them.

But the biggest downside to including them is related to the problem of information leaking which you make reference to here:
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
A CSV dump that has inputs/outputs and addresses mixed together allows the owner of all those items to be inferred. In fact, a CSV label dump is basically a personal information store, so everything in it can be correlated as coming from the same wallet, and it's important that unnecessary types are kept out of the format. People are known to leave files they no longer need lying around on their computers, so these files can find their way via telemetry to surveillance entities. While we can't specify what users do with their exports, we can limit the information leak by preventing item types that we know most users will never use from being exported in the first place.

> The order in which these records appear is not defined.
Again, since the primary use case for this BIP is wallets, which likely use hierarchical derivation schemes like BIP44, there is a net benefit to exporting addresses in ascending order of their `address_index`. Wallets can then import them in O(n) time, as opposed to the O(n^2) time spent serially checking at which index each address appears. Of course, this implies that all addresses up to a certain index have to be exported into the CSV as well, but most wallets I know of, such as Core and Electrum, already store addresses like that.

Also, if you do this, you will need to group all the transaction records before the address records or vice versa. You can use lexicographical sorting if you want (i.e. Addresses before Transactions). The benefit of this separation is that wallets can split the imported address records from the transaction records and feed them to separate functions which set the labels internally.
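
As a rough sketch of the ordering and grouping I am describing, assuming the exporting wallet already knows each item's type and its position in the wallet (the numeric type constants, the tuple layout and the function names below are all hypothetical):

```python
from itertools import groupby

# Hypothetical numeric item types; the BIP does not define these.
ADDRESS, TRANSACTION = 0, 1


def export_order(records):
    """Sort (item_type, wallet_index, reference, label) tuples by type, then index."""
    return sorted(records, key=lambda r: (r[0], r[1]))


def split_by_type(sorted_records):
    """Group records that are already sorted by type into {item_type: [(reference, label), ...]}."""
    return {
        item_type: [(ref, label) for _, _, ref, label in group]
        for item_type, group in groupby(sorted_records, key=lambda r: r[0])
    }
```

Because the records arrive grouped and in index order, an importer can walk its own address list and the file in a single pass and hand each group to its own label-setting function.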

If you decide on doing it this way, then you need a 3rd column to identify the item type, and also you should quote the label (see below). I strongly recommend using numbers for identification as opposed to character strings, so you don't have to worry about localization or character case issues. There is always one unique number, but there could be multiple strings that reference the same type. This will complicate importing functions.

If you insist on including the Input and Output types, then with this change they can both be specified as <txid>:<index>. The reference won't be used to determine the type anyway.

> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
Don't implement it like that, because that will break CSV parsers which expect a fixed number of fields in each record (2 in the header, while some records would end up with more than 2). It's better to mandate that labels are always double-quoted, since only wallets will generate label exports anyway. If you plan to use a header, its 3rd column can be left blank (or you can split the version and flags into separate columns).
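
For illustration, here is a small Python sketch using the standard csv module with every field double-quoted, so a label containing commas or quotes still parses into a fixed number of fields. The function names and the two-field record layout are assumptions made for the example only.

```python
import csv
import io


def write_records(records):
    """Serialise (reference, label) pairs with every field double-quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for reference, label in records:
        writer.writerow([reference, label])
    return buf.getvalue()


def read_records(text, expected_fields=2):
    """Parse records strictly, rejecting any line with the wrong field count."""
    rows = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text), strict=True), start=1):
        if len(row) != expected_fields:
            raise ValueError("line %d: expected %d fields, got %d"
                             % (lineno, expected_fields, len(row)))
        rows.append(tuple(row))
    return rows
```

With this, a label such as `rent, "August"` is written with its embedded quotes doubled, and it comes back unchanged on import, whereas an unquoted label with a comma would be rejected for having too many fields.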

> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
The importing code is too naive and in its current form will prevent the BIP from getting a number. It is perhaps the single most important part of a BIP. An importer should rely on a dedicated item type field that unambiguously identifies the item. So the naive importer is not good enough; you need a 3rd column for that, as I explained above, so that the importer becomes robust.
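
A minimal Python sketch of such an importer follows. It assumes a three-column record of type, reference and label, with hypothetical numeric types 0 to 3, inputs and outputs both written as txid:index, and the header line already consumed. Address validation is deliberately left out, since that depends on the wallet.

```python
import csv
import io
import re

# Hypothetical numeric item types; the BIP would need to define the real ones.
ADDRESS, TRANSACTION, INPUT, OUTPUT = 0, 1, 2, 3

TXID = re.compile(r"[0-9a-f]{64}")
OUTPOINT = re.compile(r"[0-9a-f]{64}:(0|[1-9][0-9]*)")


def import_labels(text):
    """Parse 'type,reference,label' records into {item_type: {reference: label}}."""
    labels = {ADDRESS: {}, TRANSACTION: {}, INPUT: {}, OUTPUT: {}}
    for lineno, row in enumerate(csv.reader(io.StringIO(text), strict=True), start=1):
        if len(row) != 3:
            raise ValueError("line %d: expected 3 fields" % lineno)
        type_field, reference, label = row
        if type_field not in ("0", "1", "2", "3"):
            raise ValueError("line %d: unknown item type %r" % (lineno, type_field))
        item_type = int(type_field)
        # The type column decides how the reference is validated, not its length.
        if item_type == TRANSACTION and not TXID.fullmatch(reference):
            raise ValueError("line %d: malformed txid" % lineno)
        if item_type in (INPUT, OUTPUT) and not OUTPOINT.fullmatch(reference):
            raise ValueError("line %d: malformed txid:index" % lineno)
        if item_type == ADDRESS and not reference:
            raise ValueError("line %d: empty address" % lineno)
        labels[item_type][reference] = label
    return labels
```

The point is that the importer fails loudly on anything it does not recognise, rather than guessing the item type from the length of the reference.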

In summary (exclamation marks indicate severity - one means low, two means medium, and three means high):

1. Convert the header into a version line with optional flags, otherwise nobody can extend this format without compatibility issues (!)
2. Get rid of the specs related to file compression (!!!)
3. Add a 3rd column for item type (address, transaction etc.) preferably as numeric constants and grouping items of one type after items of another type, or if you insist on strings, then only recognize their Titlecase ASCII versions <spreadsheet software like Excel always tries to titlecase the words> (!!)
4. Require double quotes around the label (or single quotes if you prefer, as long as spreadsheet software doesn't choke on them) (!!)
5. Require sorting the records according to the order they are stored in the wallet implementation. (!)
6. Consider getting rid of Input and Output item types. (!)
7. Last and most importantly, please write a more robust importer algorithm for the example given in the BIP, because code in BIPs is frequently used as a reference for software. (!!!)

I hope you will consider these points in future revisions of your BIP.

- Ali

[1] https://github.com/snyk/zip-slip-vulnerability

On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to
> a transaction, address, input or output in the first column, and the label
> in the second column. CSV was chosen for its wide accessibility, especially
> to users without specific technical expertise. Similarly, the CSV file may
> be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at
> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
>   BIP: wallet-labels
>   Layer: Applications
>   Title: Wallet Labels Export Format
>   Author: Craig Raw <craig@sparrowwallet.com>
>   Comments-Summary: No comments yet.
>   Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>   Status: Draft
>   Type: Informational
>   Created: 2022-08-23
>   License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be
> attached to the transactions, addresses, input and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet applications
> is well defined through standards such as BIP39, BIP32, BIP44 etc.
> These standards are well supported and allow users to move easily between
> different wallets.
> There is, however, no defined standard to transfer any labels the user may
> have applied to the transactions, addresses, inputs or outputs in their
> wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable
> as they may indicate the source of funds, whether received externally or as
> a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
> Labels provide valuable guidance in this regard, and have even become
> mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that
> they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or bulk
> management of labels accessible to users without specific technical
> expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as
> possible, this BIP uses the comma separated values (CSV) format, which is
> widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always
> followed, the application of the format in this BIP is simple enough that
> compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example
> JSON) lends itself well to bulk label editing using spreadsheet and text
> editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> containing one record per line, with records containing two fields
> delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied in
> the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction,
> address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero
> length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
> AES-256 encryption, which is supported by numerous applications including
> Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers
> following this standard must refuse to import <tt>.zip</tt> files encrypted
> with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24  9:18 Craig Raw
  2022-08-24 13:53 ` Clark Moody
  2022-08-24 15:57 ` Brandon Black
@ 2022-08-24 19:01 ` rhavar
  2022-08-24 20:18   ` Pavol Rusnak
  2022-08-25  8:59   ` Craig Raw
  2022-08-29 19:52 ` NVK
  2022-09-26  8:23 ` Craig Raw
  4 siblings, 2 replies; 20+ messages in thread
From: rhavar @ 2022-08-24 19:01 UTC (permalink / raw)
  To: Craig Raw, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 7627 bytes --]

I'd strongly suggest not using CSV. Especially for a standard. I've worked with it as an interchange format many times, and it's always been a clusterfuck.

Right off the bat, you have stuff like "The fields may be quoted, but this is unnecessary as the first comma in the line will always be the delimiter" which invariably leads to some implementations doing it, some implementations not doing it, and others that are intolerant of the other way.

And you have also made the classic mistake of not strictly defining escape rules. So everyone will pick their own (e.g. some will escape commas as \, while others will not because the field is quoted and will escape quotes instead, and others will assume no escaping is required since it's the last column in a csv).
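
A small Python sketch of how this plays out: the same record, read under three plausible interpretations, yields three different results. The reference `abc123` and the label are made up for the example, and the three readings are only illustrations of rules an implementation might pick, not anything the draft specifies.

```python
import csv

# One exported line whose label contains a comma and a quoted word.
line = 'abc123,"hello, ""world"""'

# Reading 1: split on the first comma and keep the remainder verbatim,
# which is what the draft's "first comma is the delimiter" rule suggests.
_, raw_label = line.split(",", 1)

# Reading 2: RFC 4180 style quoting, where "" inside a quoted field
# stands for one literal double quote.
_, rfc_label = next(csv.reader([line]))

# Reading 3: quotes carry no meaning at all, so every comma delimits.
naive_fields = next(csv.reader([line], quoting=csv.QUOTE_NONE))

print(repr(raw_label))
print(repr(rfc_label))
print(len(naive_fields), "fields")
```

The first reading keeps the surrounding quotes as part of the label, the second strips and unescapes them, and the third no longer sees a two-column record at all.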

Over time it morphs into its own mini-monster that introduces so much pain.

On a similar note, allowing alternatives (like: txid>index vs txid:index) provides no benefit, but creates additional work for implementations (who quite likely only test formats they produce) and future incompatibilities.

I know everyone loves to hate on it, but really (line-separated?) json is the way to go.

{ "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
{ "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
{ "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
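
A minimal Python sketch of an importer for line-separated JSON of this shape. The function name `parse_label_lines` and the choices to skip blank lines and to require a `label` key are assumptions for the example; the key names `tx`, `txout`, `txin` and `label` are just the ones used in the sample records.

```python
import json


def parse_label_lines(text):
    """Parse one JSON object per line; raise ValueError naming the bad line."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number}: {exc.msg}") from exc
        if not isinstance(record, dict) or "label" not in record:
            raise ValueError(f"line {number}: expected an object with a 'label' key")
        records.append(record)
    return records


txid = "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"
sample = "\n".join([
    json.dumps({"tx": txid, "label": "wow, such label"}),
    json.dumps({"tx": txid, "txout": 4, "label": "commas, \"quotes\" and all"}),
])
for record in parse_label_lines(sample):
    print(record.get("txout"), record["label"])
```

Escaping is handled entirely by the JSON library, and a malformed line fails loudly with its line number instead of being silently misread.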

-Ryan

------- Original Message -------
On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:

> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and import of labels from a wallet. While transferring access to funds across wallet applications has been made simple through standards such as BIP39, wallet labels remain siloed and difficult to extract despite their value, particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to a transaction, address, input or output in the first column, and the label in the second column. CSV was chosen for its wide accessibility, especially to users without specific technical expertise. Similarly, the CSV file may be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
> BIP: wallet-labels
> Layer: Applications
> Title: Wallet Labels Export Format
> Author: Craig Raw <craig@sparrowwallet.com>
> Comments-Summary: No comments yet.
> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> Status: Draft
> Type: Informational
> Created: 2022-08-23
> License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be attached to the transactions, addresses, input and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet applications is well defined through standards such as BIP39, BIP32, BIP44 etc.
> These standards are well supported and allow users to move easily between different wallets.
> There is, however, no defined standard to transfer any labels the user may have applied to the transactions, addresses, inputs or outputs in their wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable as they may indicate the source of funds, whether received externally or as a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks of private information.
> Labels provide valuable guidance in this regard, and have even become mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or bulk management of labels accessible to users without specific technical expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as possible, this BIP uses the comma separated values (CSV) format, which is widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always followed, the application of the format in this BIP is simple enough that compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example JSON) lends itself well to bulk label editing using spreadsheet and text editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file, containing one record per line, with records containing two fields delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied in the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction, address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV file may be compressed using the ZIP file format, using the <tt>.zip</tt> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or AES-256 encryption, which is supported by numerous applications including Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers following this standard must refuse to import <tt>.zip</tt> files encrypted with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference, but it is possible to disambiguate between transactions, addresses, inputs and outputs.
> For example in the following pseudocode:
> <pre>
> if reference length < 64
> Set address label
> else if reference length == 64
> Set transaction label
> else if reference contains '<'
> Set input label
> else
> Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD

[-- Attachment #2: Type: text/html, Size: 9932 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24  9:18 Craig Raw
  2022-08-24 13:53 ` Clark Moody
@ 2022-08-24 15:57 ` Brandon Black
  2022-08-24 19:01 ` rhavar
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Brandon Black @ 2022-08-24 15:57 UTC (permalink / raw)
  To: Craig Raw, Bitcoin Protocol Discussion

On 2022-08-24 (Wed) at 11:18:43 +0200, Craig Raw via bitcoin-dev wrote:
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.

I like the idea of standardizing the transfer of this valuable
information.

> The proposed format is a simple two column CSV file, with the reference to
> a transaction, address, input or output in the first column, and the label
> in the second column. CSV was chosen for its wide accessibility, especially
> to users without specific technical expertise. Similarly, the CSV file may
> be compressed using the ZIP format, and optionally encrypted using AES.

It seems like the format would be more useful if it also included
descriptors so that a single file could be used to transfer a wallet. I
think such an addition would improve usability for advanced users who
might have many such CSVs to manage, and would then be able to more
easily select between them. Descriptor,Label pairs could also be useful
in the format for the transfer of a wallet with several sub accounts.

Thanks,

--Brandon


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
  2022-08-24  9:18 Craig Raw
@ 2022-08-24 13:53 ` Clark Moody
  2022-08-25  8:59   ` Craig Raw
  2022-08-24 15:57 ` Brandon Black
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Clark Moody @ 2022-08-24 13:53 UTC (permalink / raw)
  To: Craig Raw, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 6360 bytes --]

Craig,

Thanks for the proposal.

How does this proposal compare with SLIP-0015, which provides encryption by default? Would it be worth exploring a merge of the two approaches?

https://github.com/satoshilabs/slips/blob/master/slip-0015.md

Clark

------- Original Message -------
On Wednesday, August 24th, 2022 at 4:18 AM, Craig Raw via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:

> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and import of labels from a wallet. While transferring access to funds across wallet applications has been made simple through standards such as BIP39, wallet labels remain siloed and difficult to extract despite their value, particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to a transaction, address, input or output in the first column, and the label in the second column. CSV was chosen for its wide accessibility, especially to users without specific technical expertise. Similarly, the CSV file may be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
> BIP: wallet-labels
> Layer: Applications
> Title: Wallet Labels Export Format
> Author: Craig Raw <craig@sparrowwallet.com>
> Comments-Summary: No comments yet.
> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> Status: Draft
> Type: Informational
> Created: 2022-08-23
> License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be attached to the transactions, addresses, input and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet applications is well defined through standards such as BIP39, BIP32, BIP44 etc.
> These standards are well supported and allow users to move easily between different wallets.
> There is, however, no defined standard to transfer any labels the user may have applied to the transactions, addresses, inputs or outputs in their wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable as they may indicate the source of funds, whether received externally or as a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks of private information.
> Labels provide valuable guidance in this regard, and have even become mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or bulk management of labels accessible to users without specific technical expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as possible, this BIP uses the comma separated values (CSV) format, which is widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always followed, the application of the format in this BIP is simple enough that compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example JSON) lends itself well to bulk label editing using spreadsheet and text editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file, containing one record per line, with records containing two fields delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied in the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction, address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV file may be compressed using the ZIP file format, using the <tt>.zip</tt> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or AES-256 encryption, which is supported by numerous applications including Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers following this standard must refuse to import <tt>.zip</tt> files encrypted with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference, but it is possible to disambiguate between transactions, addresses, inputs and outputs.
> For example in the following pseudocode:
> <pre>
> if reference length < 64
> Set address label
> else if reference length == 64
> Set transaction label
> else if reference contains '<'
> Set input label
> else
> Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD

[-- Attachment #2: Type: text/html, Size: 8476 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
@ 2022-08-24  9:18 Craig Raw
  2022-08-24 13:53 ` Clark Moody
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Craig Raw @ 2022-08-24  9:18 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5969 bytes --]

Hi all,

I would like to propose a BIP that specifies a format for the export and
import of labels from a wallet. While transferring access to funds across
wallet applications has been made simple through standards such as BIP39,
wallet labels remain siloed and difficult to extract despite their value,
particularly in a privacy context.

The proposed format is a simple two column CSV file, with the reference to
a transaction, address, input or output in the first column, and the label
in the second column. CSV was chosen for its wide accessibility, especially
to users without specific technical expertise. Similarly, the CSV file may
be compressed using the ZIP format, and optionally encrypted using AES.

The full text of the BIP can be found at
https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
and also copied below.

Feedback is appreciated.

Thanks,
Craig Raw

---

<pre>
  BIP: wallet-labels
  Layer: Applications
  Title: Wallet Labels Export Format
  Author: Craig Raw <craig@sparrowwallet.com>
  Comments-Summary: No comments yet.
  Comments-URI:
https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
  Status: Draft
  Type: Informational
  Created: 2022-08-23
  License: BSD-2-Clause
</pre>

==Abstract==

This document specifies a format for the export of labels that may be
attached to the transactions, addresses, input and outputs in a wallet.

==Copyright==

This BIP is licensed under the BSD 2-clause license.

==Motivation==

The export and import of funds across different Bitcoin wallet applications
is well defined through standards such as BIP39, BIP32, BIP44 etc.
These standards are well supported and allow users to move easily between
different wallets.
There is, however, no defined standard to transfer any labels the user may
have applied to the transactions, addresses, inputs or outputs in their
wallet.
The UTXO model that Bitcoin uses makes these labels particularly valuable
as they may indicate the source of funds, whether received externally or as
a result of change from a prior transaction.
In both cases, care must be taken when spending to avoid undesirable leaks
of private information.
Labels provide valuable guidance in this regard, and have even become
mandatory when spending in several Bitcoin wallets.
Allowing users to export their labels in a standardized way ensures that
they do not experience lock-in to a particular wallet application.
In addition, by using common formats, this BIP seeks to make manual or bulk
management of labels accessible to users without specific technical
expertise.

==Specification==

In order to make the import and export of labels as widely accessible as
possible, this BIP uses the comma separated values (CSV) format, which is
widely supported by consumer, business, and scientific applications.
Although the technical specification of CSV in RFC4180 is not always
followed, the application of the format in this BIP is simple enough that
compatibility should not present a problem.
Moreover, the simplicity and forgiving nature of CSV (over for example
JSON) lends itself well to bulk label editing using spreadsheet and text
editing tools.

A CSV export of labels from a wallet must be a UTF-8 encoded text file,
containing one record per line, with records containing two fields
delimited by a comma.
The fields may be quoted, but this is unnecessary, as the first comma in
the line will always be the delimiter.
The first line in the file is a header, and should be ignored on import.
Thereafter, each line represents a record that refers to a label applied in
the wallet.
The order in which these records appear is not defined.
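
As a rough illustration of the rules above, the following Python sketch reads an export by decoding it as UTF-8, ignoring the header line, and splitting each record on its first comma. The function name <tt>parse_export</tt> is hypothetical, and the decisions to skip empty lines, reject lines without a comma, and leave any quote characters untouched are assumptions of the sketch rather than requirements of this document.

```python
def parse_export(data):
    """Return (reference, label) pairs from the raw bytes of a labels export."""
    text = data.decode("utf-8")  # the export must be UTF-8 encoded
    records = []
    lines = text.splitlines()
    # The first line is a header and is ignored on import.
    for number, line in enumerate(lines[1:], start=2):
        if not line:
            continue  # assumption: empty lines are skipped
        # The first comma in the line is always the delimiter,
        # so any later commas belong to the label.
        reference, comma, label = line.partition(",")
        if not comma:
            raise ValueError(f"line {number}: no delimiter found")
        records.append((reference, label))
    return records
```

For example, a record such as <tt>1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Rent, August</tt> yields the label <tt>Rent, August</tt>.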

The first field in the record contains a reference to the transaction,
address, input or output in the wallet.
This is specified as one of the following:
* Transaction ID (<tt>txid</tt>)
* Address
* Input (rendered as <tt>txid<index</tt>)
* Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)

The second field contains the label applied to the reference.
Exporting applications may omit records with no labels or labels of zero
length.
Files exported should use the <tt>.csv</tt> file extension.

In order to reduce file size while retaining wide accessibility, the CSV
file may be compressed using the ZIP file format, using the <tt>.zip</tt>
file extension.
This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
AES-256 encryption, which is supported by numerous applications including
Winzip and 7-zip.
In order to ensure that weak encryption does not proliferate, importers
following this standard must refuse to import <tt>.zip</tt> files encrypted
with the weaker Zip 2.0 standard.
The textual representation of the wallet's extended public key (as defined
by BIP32, with an <tt>xpub</tt> header) should be used as the password.

==Importing==

When importing, a naive algorithm may simply match against any reference,
but it is possible to disambiguate between transactions, addresses, inputs
and outputs.
For example in the following pseudocode:
<pre>
  if reference length < 64
    Set address label
  else if reference length == 64
    Set transaction label
  else if reference contains '<'
    Set input label
  else
    Set output label
</pre>
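
The pseudocode above can be sketched in Python as follows. This is a direct transliteration, with a hypothetical function name and return values; it inherits the pseudocode's assumptions (for instance, that every address is shorter than 64 characters) and performs no validation of the reference itself.

```python
def classify_reference(reference):
    """Guess what a reference points to, using only its length and contents."""
    if len(reference) < 64:
        return "address"
    if len(reference) == 64:
        return "transaction"  # a bare txid is 64 hex characters
    if "<" in reference:
        return "input"        # rendered as txid<index
    return "output"           # rendered as txid>index or txid:index


txid = "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"
for reference in (txid, "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", txid + "<0", txid + ">0", txid + ":0"):
    print(classify_reference(reference), reference)
```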

Importing applications may truncate labels if necessary.

==Test Vectors==

The following fragment represents a wallet label export:
<pre>
Reference,Label
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
</pre>

==Reference Implementation==

TBD

[-- Attachment #2: Type: text/html, Size: 6744 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-09-26  8:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <mailman.11604.1661435396.956.bitcoin-dev@lists.linuxfoundation.org>
2022-08-26  6:06 ` [bitcoin-dev] BIP Proposal: Wallet Labels Export Format Ali Sherief
     [not found] <mailman.13106.1661772392.956.bitcoin-dev@lists.linuxfoundation.org>
2022-08-29 15:46 ` Ali Sherief
2022-08-29 18:19   ` Christopher Allen
     [not found] <mailman.12565.1661634459.956.bitcoin-dev@lists.linuxfoundation.org>
2022-08-27 21:26 ` Ali Sherief
     [not found] <mailman.9.1661342403.3868.bitcoin-dev@lists.linuxfoundation.org>
2022-08-24 19:10 ` Ali Sherief
2022-08-27 21:03   ` Billy Tetrud
2022-08-29 11:25   ` Craig Raw
2022-09-21  6:07     ` Hugo Nguyen
2022-08-24  9:18 Craig Raw
2022-08-24 13:53 ` Clark Moody
2022-08-25  8:59   ` Craig Raw
2022-08-24 15:57 ` Brandon Black
2022-08-24 19:01 ` rhavar
2022-08-24 20:18   ` Pavol Rusnak
2022-08-25  8:59   ` Craig Raw
2022-08-25 13:48     ` rhavar
2022-08-25 22:54     ` Clark Moody
2022-08-27 22:20       ` Billy Tetrud
2022-08-29 19:52 ` NVK
2022-09-26  8:23 ` Craig Raw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox