When I was reading bitcoin-paper-errata-and-details.md written by David A. Harding, I realized that there’s probably a common misunderstanding or over-simplification about Merkle tree pruning. What Nick ODell had said might be a live example:
- A leaf (transaction) can be pruned when all of its outputs have been spent.
This once seemed to be true for me, until I read what David had written:
there is currently no way in Bitcoin to prove that a transaction has not been spent
I’m not sure whether I have grasped it, so firstly I made a diagram to illustrate (part of) my understanding to this problem:
Still, I don’t think merely this problem can kill the whole idea of Merkle tree pruning yet, I think it just means that "the reclaimable disk capacity is much lower than expectation". In other words, if I’m not mistaken, Nick ODell’s claim could be "corrected" like:
- A leaf (transaction) can be pruned when all of its outputs have been spent, and all of its previous transactions have been pruned.
However, I then think that, even if the "corrected" claim is taken into consideration, the idea of Merkle tree pruning still doesn’t seem to be feasible/useful:
Even if the problem mentioned above is avoided, a malicious node can still deceive the new full node by hiding/picking some merkle branches. A malicious node can lie about the actual ownership of coins (spent/unspent state) without breaking the Merkle tree structure at all. In other words, a new full node joining the network still needs to download & verify everything, otherwise, it could be deceived by a malicious node.
If a full node needs to enable pruning to reduce disk space requirement for itself, directly reading/modifying the blockchain files seems to be much less efficient than the current implementation that the UTXO set is completely separated from the blockchain storage, so that a full node (no matter it’s pruning or not) only needs to query and update the UTXO set database during the downloading & validation process. The blockchain itself doesn’t need to be touched once again for validation purposes at all, which is the reason why the old blocks can be simply deleted when "pruning" (not Merkle tree pruning) is enabled.
However, I’m still not sure about this conclusion. Is this related to the idea of fraud proofs, in the sense that as long as there’s still at least one honest full node, the new node would be able to spot which piece of data is the correct one? What if the UTXO set is also committed to the blockchain? What if some more commitments like the block height of previous transaction are also added to the blockchain?
Furthermore, I’ve heard that the Mimblewimble protocol enables secure blockchain pruning. I’m also curious how Mimblewimble could achieve this, and whether similar goal could be eventually achieved in Bitcoin?
From what I understand, a full node will download everything, verify everything first. Then only the pruning starts. Thus, the issue of inconsistent pruning does not arise. As off now, there is no common consensus on the UTXO sets. For example, up to block 500,000, the UTXO set is "XXXX". Just download "XXXX",it should be fine. I believe we will have this feature in the future. As off now, fullnode have to download everything, verify and then only prune.
Answered by Cisco Mmu on December 10, 2021
1 Asked on November 29, 2020 by libertarian-monarchist-bot
1 Asked on November 18, 2020 by austinfoss
3 Asked on November 8, 2020 by mirsab-jastram
2 Asked on September 22, 2020 by dujon-w
1 Asked on September 3, 2020 by adam-mitchell
1 Asked on August 29, 2020 by nicklessguy
1 Asked on August 22, 2020 by isaac-chen
Get help from others!