From mboxrd@z Thu Jan 1 00:00:00 1970 Delivery-date: Fri, 02 May 2025 04:01:18 -0700 Received: from mail-oi1-f183.google.com ([209.85.167.183]) by mail.fairlystable.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1uAo8q-0001Fd-KY for bitcoindev@gnusha.org; Fri, 02 May 2025 04:01:18 -0700 Received: by mail-oi1-f183.google.com with SMTP id 5614622812f47-3f6a7ea5a60sf1584913b6e.1 for ; Fri, 02 May 2025 04:01:16 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1746183671; cv=pass; d=google.com; s=arc-20240605; b=fRurdcQf6NDx7QHw6L6ncoGUwzN4mIv0zp7sqLCAvkrC9QgIbe88NP8tRMIrHrN5+I X4wNn6D099D/YAZKFB6MRIpRCnYbKHrnlYTAKwwbclvFSh8xvtklntX9IOFar9XjmwP9 XzGntJXquZFBI1WSBy4aL/TjmTNzWiCOp3ZQ0H2wLqD69GMetljU8Y2SaPo4e+Kdo7TA QCsC20F1EQS2mHjTe0Nsio3s/mi6nihX8n0LsyrP4f938VJsAWv8uaqUldb3UYlQqa42 rslu2nsTsQPr+hS3j1nqnTzrKBZVelqdZUaNjxENSklidKF2JMFtC5He0BWUvDRvQ/s7 R6iA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post :list-id:mailing-list:precedence:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:sender:dkim-signature :dkim-signature; bh=mlbfB0i3NkvPjPj6X+fbbW/yknjOTj3Dq6y668SKX7o=; fh=xkO/uzRdFzf82dhRPY17cgM7LMU7psKr+IxXIWc1kbc=; b=RdAyt96tmZYyFkt/g7n8yO3/AjhTncxaOi4bL/4WwfCE6NVikRh4T+qL3DuXNPPdik w3rmHUGhmBGpy8TE66dRotEuGLFYdUnNPHH4sDSrq+jt1ZTWKbMrH6Gw1rmGlyXwCPDK pLnYvg8o6rIC73/Pb/4ZUF0FN7JvSD5MCm5tt+u3l3yvaB4mgErMOPzZOel4U2yIfzuj eAYUfw1d1v4oQX8g0NfBfLZiommKYhA72Y1LVsfI40c7CSWCSZJlX8htoQ4X6lbrZqbp Uc7h/ePb1P3mcKUZaKiEaB2zfSuU7L43QZ1dj3DF3HejjnP/5+DH/bEUw8lf8iR2zOui HHXQ==; darn=gnusha.org ARC-Authentication-Results: i=2; gmr-mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=hNmy9HMt; spf=pass (google.com: domain of rsomsen@gmail.com designates 2607:f8b0:4864:20::e2e as permitted sender) smtp.mailfrom=rsomsen@gmail.com; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com; dara=pass header.i=@googlegroups.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlegroups.com; s=20230601; t=1746183671; x=1746788471; darn=gnusha.org; h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post :list-id:mailing-list:precedence:x-original-authentication-results :x-original-sender:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:sender:from:to:cc:subject:date:message-id :reply-to; bh=mlbfB0i3NkvPjPj6X+fbbW/yknjOTj3Dq6y668SKX7o=; b=a7pyYadHn8PzGtfFY2Yvyu9vzk2KLFzIhb1L83z6S4zuHL5BeExjjDaZKObXYLkUJH uTPND0s4M9HzgPtH1Zq7YM2/McfiLsW2gbd+oFO9GfGqRJHk0SR1ag1+yQ5L+Qlwgn94 FJPsndjfWMtPqMDArSVzJrcMPuW8ue1SmobCfeBt05/7iordqhU0cnU8ApK7MNkrawXe v7GqUDbjrynj44yDCo0ait9jUZgu3LNaQXN9IxqufzFXuZUA910zyxTPpxC5Urnuy/OR Z+oRZZZne+m2NwoY6HOe26yTwthp5mJZVyg+DD12yhP/KjeIj4seVURQDZtKKMqQC3Oz q4Bw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1746183671; x=1746788471; darn=gnusha.org; h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post :list-id:mailing-list:precedence:x-original-authentication-results :x-original-sender:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:from:to:cc:subject:date:message-id:reply-to; bh=mlbfB0i3NkvPjPj6X+fbbW/yknjOTj3Dq6y668SKX7o=; b=aJzhbxR6YRYIKq0WooIgpz7dp2yETefhpk7e4pYnZokjdGsCMrzP6Sf5U0Jlk1JJ18 Nj9oFOVzss721wxmWIKt2R3tm/UP6DL/jx+1h24gL+Rsew1eTt/i4J7d4kyESjgYADr+ YphPSI46PraGaDA6sqrc1T70ddZg6rtODfN8TqGcMLwqlUxW2jwgZfwBCnUjZF+A0Ttt rndghHUIPpbvX+HXFPARYDjeBiCdk3yUgsnN2mAV7/0VdCMoc0mqQT5I4J7UWU92amNn FHJSO0nDt0xVmiM4N4QEFVxJKqOyisND59KGxu8ztlf5Ty9sAdR8GndSurgvWqVc6EZ5 4G1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746183671; x=1746788471; h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post :list-id:mailing-list:precedence:x-original-authentication-results :x-original-sender:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:x-beenthere:x-gm-message-state:sender:from :to:cc:subject:date:message-id:reply-to; bh=mlbfB0i3NkvPjPj6X+fbbW/yknjOTj3Dq6y668SKX7o=; b=TIiMIIlEZnuxH1+FRzIITRVnlv9bvdLpXjwOgxJe6NPTgk/FZoGt4QD6aHG3sjbpDw fyctD4yKg5lBLkDDSaRA13MdeufJtU9yPC2GFYGsZoIspFb2BUG7ayuprJniawEwE4cH nFjQJR5IfqTh0cTahEEK9dEZzD3qE7WJ0DteEFCRj/6KVgMB5XovmYBwL7dVwPFXta5o Y8oKn/DCQ/msh59S80Akx3m+ybNA/k7eL2fP9zV6Y1tHtdJe3O5zfuzeo0mYITDUrEBe kpiKE76UOEyD+XdgjNKOy+vLIaVl6uRIMkpiAyQGAg6u74Tei/CU9jAlGnd6za13doKx XXUw== Sender: bitcoindev@googlegroups.com X-Forwarded-Encrypted: i=2; AJvYcCWtj+jVxVmONdH2Qsxk21Ysq9I7YEsgTNcWQFhq/QZso1MKsIBU0UlTi2jGelTs0qD6rLL0cZbr6Xnj@gnusha.org X-Gm-Message-State: AOJu0YxDEEnLnmuLN14j+qTAKeAkttHNQtQcKy61vRJRh4z6UoCTWwud hYXZlGxv+zUandcPijiwGOeX7rIPiZjK7qgPuUK3X8wE2Pjs74Is X-Google-Smtp-Source: AGHT+IG/8kAkDiL2AfcTE03Pke/ypqiDLrOH7W/0fTB1cOub+uXdBe3VtNLHAH0NC1v/YpZy52CJIQ== X-Received: by 2002:a05:6808:2128:b0:3f6:a1f8:ce3 with SMTP id 5614622812f47-4034138a072mr1496220b6e.14.1746183670273; Fri, 02 May 2025 04:01:10 -0700 (PDT) X-BeenThere: bitcoindev@googlegroups.com; h=AVT/gBGWIHX54zD0qGilzTRtqaaBISjD04cLBvjUu/cZ/kYFkA== Received: by 2002:a4a:e715:0:b0:606:75d6:45fe with SMTP id 006d021491bc7-607ded914d3ls696473eaf.0.-pod-prod-03-us; Fri, 02 May 2025 04:01:07 -0700 (PDT) X-Received: by 2002:a05:6808:2288:b0:403:3814:b2b1 with SMTP id 5614622812f47-403413865c3mr1589665b6e.10.1746183667134; Fri, 02 May 2025 04:01:07 -0700 (PDT) Received: by 2002:a05:6808:4382:b0:3f6:a384:eb6f with SMTP id 5614622812f47-4034248474dmsb6e; Fri, 2 May 2025 03:59:35 -0700 (PDT) X-Received: by 2002:a05:6808:1649:b0:401:e5d6:31cd with SMTP id 5614622812f47-4034115b698mr1422900b6e.3.1746183573980; Fri, 02 May 2025 03:59:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1746183573; cv=none; d=google.com; s=arc-20240605; b=cgL4MwqIfNaheX0luNGmheCzxs/LN/1FmiO0HnxjjxfNGzj2qA7T1Mhs3wr3c8ppMF gIT7PGAAeiNnHjXBFHRuzRN/Ch+SeM9hgxXgyCKAVbnhSmXG9e6/osskxF/Mc3udRp5J OjZWGzGgadR15toIgutDTQHkKm+P+IJnQM7mQ+xWc+tUrMUdx9JegX6W2G8dj2/6Gcc6 HzdpyMeafyeoUh9nocFUvhTtvlkfwFWYTbU+tkSUTL5NrPfiqKdX1lPSmBb4fjrxuFuF xf9NVwTO1Z1+OtVLLcUPZVtgbdeAg8gIN8d8RdGfP92gYcezqDiZX65cCj7cGd28V0X/ 04Xg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=Bg/RGMyCisS9sF4l/cApD6HZHKpX5WZ+67msEgx8ZTw=; fh=buJUvwPPgdi5Z5zmcvUt6NajLrOVgzwZz6n1oNFrtB8=; b=FgeEeA+ae62aIoRgA9c13ITu3aNElPo9F7P9ZspdAtiViSOMvM6xd1WOMtmvcjY+fO YuR41x5ZxlCDwbzhMVqmU+b1iA5cpr7Zxe/Cmaxq3QwpeAeFZXi0wY7hRhCe2NVxasz5 6UFFyUf77r7Ux2RIOs7G4m86aAzSSGSdghmo2Yc/T/wRPs81zu5VOybhuN+qtXK9wJWe N5DdiI/A0Ftz3m42cL6Eyh4g8f9svRWrhGpIapyuqh76+ZHt5W6QDm00TTMh8YbuxVLB gipjbCo22ZfoXXb8zuYS8rnX0qgf0Iugl/I8Mfesh21zGCN8vRghgCwHykCf79g2srez /fjg==; dara=google.com ARC-Authentication-Results: i=1; gmr-mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=hNmy9HMt; spf=pass (google.com: domain of rsomsen@gmail.com designates 2607:f8b0:4864:20::e2e as permitted sender) smtp.mailfrom=rsomsen@gmail.com; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com; dara=pass header.i=@googlegroups.com Received: from mail-vs1-xe2e.google.com (mail-vs1-xe2e.google.com. [2607:f8b0:4864:20::e2e]) by gmr-mx.google.com with ESMTPS id 5614622812f47-4033dc614acsi142580b6e.5.2025.05.02.03.59.33 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 02 May 2025 03:59:33 -0700 (PDT) Received-SPF: pass (google.com: domain of rsomsen@gmail.com designates 2607:f8b0:4864:20::e2e as permitted sender) client-ip=2607:f8b0:4864:20::e2e; Received: by mail-vs1-xe2e.google.com with SMTP id ada2fe7eead31-4dae6e1c408so461157137.2 for ; Fri, 02 May 2025 03:59:33 -0700 (PDT) X-Gm-Gg: ASbGncv/qV2ehJNwxmwophpf7BxJ9rwu6AzIBlseRcBj2Gqhad42T72QpEvgHMc22Li SBIX9WC0vGasc2i5/Hs4ovRmPCNK2RfYOT/RTOmRA9rfuPxn0WOTh8CN73xzW05W9SluWHwI6/Y z/ctXPXG/EcAKuUSh+n/ddmbUZLMuRsqJ8YfrVSDH1CU8tjz3tuMjihoNh X-Received: by 2002:a05:6102:4a82:b0:4bb:dba6:99cd with SMTP id ada2fe7eead31-4dafb4f1b10mr1291692137.8.1746183573145; Fri, 02 May 2025 03:59:33 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ruben Somsen Date: Fri, 2 May 2025 12:59:22 +0200 X-Gm-Features: ATxdqUHln512qi4V1eHZDv8kVTWnv7KoTBeM6D8wsL3yqsQKuar4b73pWxIgFMI Message-ID: Subject: Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints To: Greg Maxwell Cc: Bitcoin Development Mailing List Content-Type: multipart/alternative; boundary="0000000000009712750634250b5e" X-Original-Sender: rsomsen@gmail.com X-Original-Authentication-Results: gmr-mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=hNmy9HMt; spf=pass (google.com: domain of rsomsen@gmail.com designates 2607:f8b0:4864:20::e2e as permitted sender) smtp.mailfrom=rsomsen@gmail.com; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com; dara=pass header.i=@googlegroups.com Precedence: list Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com List-ID: X-Google-Group-Id: 786775582512 List-Post: , List-Help: , List-Archive: , List-Unsubscribe: , X-Spam-Score: -0.5 (/) --0000000000009712750634250b5e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Greg, I appreciate your thoughts. >excellent idea that preserves the assume valid-like security properties Thanks. Yeah, the assumevalid version is particularly efficient. Without assumevalid there's a bandwidth tradeoff, but that also clearly seems worth it. >Also I like the lack of basically anything being normative, it's easier to feel comfortable with something when you won't be stuck with it forever Yes, that is a very pleasant property. There is essentially no difference between the end state that you reach with or without SwiftSync. >the hints themselves are not security relevant, if they're wrong you'll just fail I agree. >sha256 you should place the salt first so that it can be absorbed then the midstate reused Nice. That is a very good point that I had not yet considered. >false positives are harmless as long as they're rare, as in you can save some extra txouts and if you later find them being spent, then just go ahead and remove them I don't see a practical way to do this without compromising the benefits of SwiftSync because of the "later find them being spent" part. For one, it would require doing a lookup for each input to see if it's in your UTXO set, something we currently don't need to do at all. Secondly, it would require blocks to be processed in order, limiting parallelization. The space savings would also be negligible, considering the hint data is already <100MB (compressed). I think most of the remaining optimization potential (other than the engineering work to enable parallel validation) is in the hash aggregate, like the midstate reuse. Is there a faster alternative to sha256? Can we get away with 16 byte hashes? I could be mistaken, but in theory it seems we could even get away with 4 byte hashes if we allowed for a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving consensus to chance feels philosophically improper, but it's a pretty safe bet considering it also involves PoW to trigger and even then would only affect one or two random unlucky people on Earth. Thanks for chiming in. Cheers, Ruben On Fri, May 2, 2025 at 8:48=E2=80=AFAM Greg Maxwell wr= ote: > This sounds like an excellent idea that preserves the assume valid-like > security properties but with more performance and more optimization > oppturnities. > > I like particularly -- if I understand it correctly-- that the hints > themselves are not security relevant, if they're wrong you'll just fail > rather than end up with something incorrect. Also I like the lack of > basically anything being normative, it's easier to feel comfortable with > something when you won't be stuck with it forever... the whole scheme cou= ld > just be reworked every version with no particular harm because its effect= s > are all ephemeral. At worst you might have problems if you started IBD > with one version and tried to complete it with another. > > I haven't seen any code, but if hash() is a MD style hash function such > as sha256 you should place the salt first so that it can be absorbed then > the midstate reused. For sha256 you could get potentially double the > hashing performance and I assume/hope that hash is actually fairly high i= n > the profile cost.. maybe more like a 1/3 improvement considering the size > of the entry, care should be taken to try to minimize compression functio= n > runs. ... but at least the utxo representation there is entirely > implementation defined. > > You may be able to optimize the size of the hints further with the > observation that false positives are harmless as long as they're rare, a= s > in you can save some extra txouts and if you later find them being spent, > then just go ahead and remove them. So for example ribbon filters give y= ou > a memory space and read efficient hash table that is constructed by solvi= ng > a linear system to make sure all the inputs match. One could solve for > successively larger filters to target a specific false positive rate. (I > mean just a 2,4 cuckoo filter or similar would also work, but that's two > memory accesses required and you don't actually need runtime modificatio= n). > > > > > On Wednesday, April 9, 2025 at 10:11:29=E2=80=AFAM UTC Ruben Somsen wrote= : > >> Hi everyone, >> >> SwiftSync is a new validation method that allows for near-stateless, >> fully parallelizable validation of the Bitcoin blockchain via hints abou= t >> which outputs remain unspent (<100MB total). All other inputs/outputs ar= e >> efficiently crossed off inside a single hash aggregate that only reaches >> zero if validation was successful and the hints were correct. >> >> The main observation is that it can be much easier to validate that a >> given UTXO set is correct than to compute it yourself. It allows us to n= o >> longer require a stateful moment-to-moment UTXO set during IBD and allow= s >> everything to be order independent. I'll briefly summarize the protocol, >> before sharing the link to the full write-up. >> >> Each output gets a boolean hint (e.g. committed into Bitcoin Core) about >> whether or not it will still be in the UTXO set after validation complet= es. >> If it does, we write it to disk (append-only - it won't be used until >> SwiftSync finishes). If it does not, we hash the UTXO data and add it to= an >> aggregate. For each input, we once again hash the UTXO data and remove i= t >> from the aggregate. >> >> At the end, for every added output there should have been exactly one >> removed input, bringing the end total of the aggregate to zero. If this = is >> indeed the case, we will have validated that the hints, and the resultin= g >> UTXO set, were correct. >> >> E.g. For spent outputs A, B and inputs C, D we calculate >> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - >> hash(UTXO_D||salt) =3D=3D 0 (proving (A=3D=3DC && B=3D=3DD) || (A=3D=3DD= && B=3D=3DC)). >> >> There is one missing step. The UTXO data is only available when >> processing the output, but not when processing the input. We resolve thi= s >> by either downloading the outputs that were spent for each block >> (equivalent to the undo data, maybe 10-15% of a block), or we lean on >> assumevalid, making it sufficient to only hash the outpoints (which are >> available in both the output and input) rather than the full UTXO data. >> >> Ignoring bandwidth, the expectation is that the speedup will be most >> significant on either RAM limited devices and/or devices with many CPU >> cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up= , >> while currently still being largely sequential. >> >> Many more details are in the full write-up: >> https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd >> >> It will answer the following questions (and more): >> >> - How the hash aggregate can be made secure against the generalized >> birthday problem >> - How exactly assumevalid is utilized and what the implications are >> - How we can still check for inflation when we skip the amounts with >> assumevalid >> - How we can validate transaction order while doing everything in parall= el >> - How we can perform the BIP30 duplicate output check without the UTXO s= et >> - How this all relates to assumeutxo >> >> To my knowledge, every validation check involving the UTXO set is >> covered, but I'd be curious to hear if anything was overlooked or if you >> spot any other issues. >> >> Thanks for reading, and thanks to everyone who provided invaluable >> feedback while the idea was coming together. >> >> -- Ruben Somsen >> > -- > You received this message because you are subscribed to the Google Groups > "Bitcoin Development Mailing List" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to bitcoindev+unsubscribe@googlegroups.com. > To view this discussion visit > https://groups.google.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894e= a189a0e2n%40googlegroups.com > > . > --=20 You received this message because you are subscribed to the Google Groups "= Bitcoin Development Mailing List" group. To unsubscribe from this group and stop receiving emails from it, send an e= mail to bitcoindev+unsubscribe@googlegroups.com. To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/= CAPv7TjaDGr4HCdQ0rR6_ma5zh2umU9r3_529szdswn_GjjnuCw%40mail.gmail.com. --0000000000009712750634250b5e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Greg,

I appreciate your thoughts.


>excellent idea that preserves the= assume valid-like security properties

Thanks. Yea= h, the assumevalid version is particularly efficient. Without assumevalid t= here's a bandwidth tradeoff, but that also clearly seems=C2=A0worth it.=


>Also I like the lack of basica= lly anything being normative, it's easier to feel comfortable with some= thing when you won't be stuck with it forever

= Yes, that is a very pleasant property. There is essentially no difference b= etween the end state that you reach with or without SwiftSync.

>the hints themselves are not security relev= ant, if they're wrong you'll just fail

I agree.


>sha256 you should pla= ce the salt first so that it can be absorbed then the midstate reused
=

Nice. That is a very good point that I had not yet cons= idered.


>false positives are har= mless as long as they're rare,=C2=A0 as in you can save some extra txou= ts and if you later find them being spent, then just go ahead and remove th= em

I don't see a practical way to do this with= out compromising the benefits of SwiftSync because of the=C2=A0"later = find them being spent" part. For one, it would require doing a lookup = for each input to see if it's in your UTXO set, something we currently = don't need to do at all. Secondly, it would require blocks to be proces= sed in order, limiting parallelization. The space savings would also be neg= ligible, considering the hint data is already <100MB (compressed).
=

=C2=A0
I think most of the remaining optimiza= tion potential (other than the engineering work to enable parallel validati= on) is in the hash aggregate,=C2=A0like the midstate reuse. Is there a fast= er alternative to sha256? Can we get away with 16 byte hashes? I could be m= istaken, but in theory it seems we could even get away with 4 byte hashes i= f we allowed for a 1 in 4 billion chance of accidentally accepting an inval= id chain. Leaving consensus to chance feels philosophically improper, but i= t's a pretty safe bet considering it also involves PoW to trigger and e= ven then would only affect one or two random unlucky people on=C2=A0Earth.<= /div>


Thanks for chiming in.
Cheers,
Ruben

On Fri, May= 2, 2025 at 8:48=E2=80=AFAM Greg Maxwell <gmaxwell@gmail.com> wrote:
This sounds like an excellent idea that pres= erves the assume valid-like security properties but with more performance a= nd more optimization oppturnities.

I like particul= arly -- if I understand it correctly-- that the hints themselves are not se= curity relevant, if they're wrong you'll just fail rather than end = up with something incorrect.=C2=A0 Also I like the lack of basically anythi= ng being normative, it's easier to feel comfortable with something when= you won't be stuck with it forever... the whole scheme could just be r= eworked every version with no particular harm because its effects are all e= phemeral.=C2=A0 At worst you might have problems if you started IBD with on= e version and tried to complete it with another.

I= haven't seen any code,=C2=A0 but if hash() is a MD style hash function= such as sha256 you should place the salt first so that it can be absorbed = then the midstate reused. For sha256 you could get potentially double the h= ashing performance and I assume/hope that hash is actually fairly high in t= he profile cost.. maybe more like a 1/3 improvement considering the size of= the entry, care should be taken to try to minimize compression function ru= ns. ... but at least the utxo representation there is entirely implementati= on defined.

You may be able to optimize the size o= f the hints further with the observation that false positives are harmless = as long as they're rare,=C2=A0 as in you can save some extra txouts and= if you later find them being spent, then just go ahead and remove them.=C2= =A0 So for example ribbon filters give you a memory space and read efficien= t hash table that is constructed by solving a linear system to make sure al= l the inputs match.=C2=A0 One could solve for successively larger filters t= o target a specific false positive rate.=C2=A0 (I mean just a 2,4 cuckoo fi= lter or similar would also work, but that's two memory accesses require= d and=C2=A0 you don't actually need runtime modification).



On Wednesday, April 9, 2025 at 10:11:29=E2= =80=AFAM UTC Ruben Somsen wrote:
Hi everyone,

Swi= ftSync is a new validation method that allows for near-stateless, fully par= allelizable validation of the Bitcoin blockchain via hints about which outp= uts remain unspent (<100MB total). All other inputs/outputs are efficien= tly crossed off inside a single hash aggregate that only reaches zero if va= lidation was successful and the hints were correct.

The main observa= tion is that it can be much easier to validate that a given UTXO set is cor= rect than to compute it yourself. It allows us to no longer require a state= ful moment-to-moment UTXO set during IBD and allows everything to be order = independent. I'll briefly summarize the protocol, before sharing the li= nk to the full write-up.

Each output gets a boolean hint (e.g. commi= tted into Bitcoin Core) about whether or not it will still be in the UTXO s= et after validation completes. If it does, we write it to disk (append-only= - it won't be used until SwiftSync finishes). If it does not, we hash = the UTXO data and add it to an aggregate. For each input, we once again has= h the UTXO data and remove it from the aggregate.

At the end, for ev= ery added output there should have been exactly one removed input, bringing= the end total of the aggregate to zero. If this is indeed the case, we wil= l have validated that the hints, and the resulting UTXO set, were correct.<= br>
E.g. For spent outputs A, B and inputs C, D we calculate hash(UTXO_A= ||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - hash(UTXO_D||salt) =3D= =3D 0 (proving (A=3D=3DC && B=3D=3DD) || (A=3D=3DD && B=3D= =3DC)).

There is one missing step. The UTXO data is only available w= hen processing the output, but not when processing the input. We resolve th= is by either downloading the outputs that were spent for each block (equiva= lent to the undo data, maybe 10-15% of a block), or we lean on assumevalid,= making it sufficient to only hash the outpoints (which are available in bo= th the output and input) rather than the full UTXO data.

Ignoring ba= ndwidth, the expectation is that the speedup will be most significant on ei= ther RAM limited devices and/or devices with many CPU cores. Initial PoC be= nchmarks (thanks to theStack) show a 5.28x speed-up, while currently still = being largely sequential.

Many more details are in the full write-up= :
https://gist.github.com/Ruben= Somsen/a61a37d14182ccd78760e477c78133cd

It will answer the follo= wing questions (and more):

- How the hash aggregate can be made secu= re against the generalized birthday problem
- How exactly assumevalid i= s utilized and what the implications are
- How we can still check= for inflation when we skip the amounts with assumevalid
- How we= can validate transaction order while doing everything in parallel
- How we can perform the BIP30 duplicate output check without the UTXO se= t
- How this all relates to assumeutxo

To my knowledge, ev= ery validation check involving the UTXO set is covered, but I'd be curi= ous to hear if anything was overlooked or if you spot any other issues.
=
Thanks for reading, and thanks to everyone who provided invaluable feed= back while the idea was coming together.

-- Ruben Somsen

--
You received this message because you are subscribed to the Google Groups &= quot;Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an e= mail to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.googl= e.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegrou= ps.com.

--
You received this message because you are subscribed to the Google Groups &= quot;Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an e= mail to bitcoind= ev+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/ms= gid/bitcoindev/CAPv7TjaDGr4HCdQ0rR6_ma5zh2umU9r3_529szdswn_GjjnuCw%40mail.g= mail.com.
--0000000000009712750634250b5e--