{"id":18941,"date":"2026-04-03T06:20:07","date_gmt":"2026-04-03T06:20:07","guid":{"rendered":"https:\/\/cryptoted.net\/index.php\/2026\/04\/03\/state-tree-pruning-ethereum-foundation-blog\/"},"modified":"2026-04-03T06:20:07","modified_gmt":"2026-04-03T06:20:07","slug":"state-tree-pruning-ethereum-foundation-blog","status":"publish","type":"post","link":"https:\/\/cryptoted.net\/index.php\/2026\/04\/03\/state-tree-pruning-ethereum-foundation-blog\/","title":{"rendered":"State Tree Pruning | Ethereum Foundation Blog"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div id=\"\">\n<p class=\"chakra-text css-gi02ar\">One of the important issues that has been brought up over the course of the Olympic stress-net release is the large amount of data that clients are required to store; over little more than three months of operation, and particularly during the last month, the amount of data in each Ethereum client&#8217;s blockchain folder has ballooned to an impressive 10-40 gigabytes, depending on which client you are using and whether or not compression is enabled. Although it is important to note that this is indeed a stress test scenario where users are incentivized to dump transactions on the blockchain paying only the free test-ether as a transaction fee, and transaction throughput levels are thus several times higher than Bitcoin, it is nevertheless a legitimate concern for users, who in many cases do not have hundreds of gigabytes to spare on storing other people&#8217;s transaction histories.<\/p>\n<p class=\"chakra-text css-gi02ar\">First of all, let us begin by exploring why the current Ethereum client database is so large. 
Ethereum, unlike Bitcoin, has the property that every block contains something called the &#8220;state root&#8221;: the root hash of a <a target=\"_blank\" rel=\"noopener\" class=\"chakra-link css-vezwxf\" href=\"https:\/\/github.com\/ethereum\/wiki\/wiki\/Patricia-Tree\">specialized kind of Merkle tree<\/a> which stores the entire state of the system: all account balances, contract storage, contract code and account nonces are inside.<\/p>\n<p><center><br \/>\n<img decoding=\"async\" src=\"https:\/\/blog.ethereum.org\/images\/posts\/2015\/06\/ethblockchain_oneblock.png\" class=\"chakra-image css-hw6q2r\"\/><br \/>\n<\/center><\/p>\n<p class=\"chakra-text css-gi02ar\">The purpose of this is simple: it allows a node given only the last block, together with some assurance that the last block actually is the most recent block, to &#8220;synchronize&#8221; with the blockchain extremely quickly without processing any historical transactions, by simply downloading the rest of the tree from nodes in the network (the proposed <span class=\"chakra-text css-ons8vw\">HashLookup<\/span> <a target=\"_blank\" rel=\"noopener\" class=\"chakra-link css-vezwxf\" href=\"https:\/\/github.com\/ethereum\/wiki\/wiki\/Ethereum-Wire-Protocol\">wire protocol message<\/a> will facilitate this), verifying that the tree is correct by checking that all of the hashes match up, and then proceeding from there. In a fully decentralized context, this will likely be done through an advanced version of Bitcoin&#8217;s headers-first-verification strategy, which will look roughly as follows:<\/p>\n<ol role=\"list\" class=\"css-vgl4zd\">\n<li class=\"css-0\">Download as many block headers as the client can get its hands on.<\/li>\n<li class=\"css-0\">Determine the header which is on the end of the longest chain. 
Starting from that header, go back 100 blocks for safety, and call the block at that position P<sup>100<\/sup>(H) (&#8220;the hundredth-generation grandparent of the head&#8221;).<\/li>\n<li class=\"css-0\">Download the state tree from the state root of P<sup>100<\/sup>(H), using the <span class=\"chakra-text css-ons8vw\">HashLookup<\/span> message (note that after the first one or two rounds, this can be parallelized among as many peers as desired). Verify that all parts of the tree match up.<\/li>\n<li class=\"css-0\">Proceed normally from there.<\/li>\n<\/ol>\n<p class=\"chakra-text css-gi02ar\">For light clients, the state root is even more advantageous: they can immediately determine the exact balance and status of any account by simply asking the network for <em class=\"chakra-text css-0\">a particular branch<\/em> of the tree, without needing to follow Bitcoin&#8217;s multi-step 1-of-N &#8220;ask for all transaction outputs, then ask for all transactions spending those outputs, and take the remainder&#8221; light-client model.<\/p>\n<p class=\"chakra-text css-gi02ar\">However, this state tree mechanism has an important disadvantage if implemented naively: the intermediate nodes in the tree greatly increase the amount of disk space required to store all the data. To see why, consider this diagram here:<\/p>\n<p><center><br \/>\n<img decoding=\"async\" src=\"https:\/\/blog.ethereum.org\/images\/posts\/2015\/06\/ethblockchain.png\" class=\"chakra-image css-hw6q2r\"\/><br \/>\n<\/center><\/p>\n<p class=\"chakra-text css-gi02ar\">The change in the tree during each individual block is fairly small, and the magic of the tree as a data structure is that most of the data can simply be referenced twice without being copied. However, even still, for every change to the state that is made, a logarithmically large number of nodes (ie. 
~5 at 1000 nodes, ~10 at 1000000 nodes, ~15 at 1000000000 nodes) need to be stored twice, one version for the old tree and one version for the new tree. Eventually, as a node processes every block, we can thus expect the total disk space utilization to be, in computer science terms, roughly <span class=\"chakra-text css-ons8vw\">O(n*log(n))<\/span>, where <span class=\"chakra-text css-ons8vw\">n<\/span> is the transaction load. In practical terms, the Ethereum blockchain is only 1.3 gigabytes, but the size of the database including all these extra nodes is 10-40 gigabytes.<\/p>\n<p class=\"chakra-text css-gi02ar\">So, what can we do? One backward-looking fix is to simply go ahead and implement headers-first syncing, essentially resetting new users&#8217; hard disk consumption to zero, and allowing users to keep their hard disk consumption low by re-syncing every one or two months, but that is a somewhat ugly solution. The alternative approach is to implement <em class=\"chakra-text css-0\">state tree pruning<\/em>: essentially, use <a target=\"_blank\" rel=\"noopener\" class=\"chakra-link css-vezwxf\" href=\"https:\/\/en.wikipedia.org\/wiki\/Reference_counting\">reference counting<\/a> to track when nodes in the tree (here using &#8220;node&#8221; in the computer-science sense of &#8220;piece of data that is somewhere in a graph or tree structure&#8221;, not &#8220;computer on the network&#8221;) drop out of the tree, and at that point put them on &#8220;death row&#8221;: unless the node somehow becomes used again within the next <span class=\"chakra-text css-ons8vw\">X<\/span> blocks (eg. <span class=\"chakra-text css-ons8vw\">X = 5000<\/span>), after that number of blocks pass the node should be permanently deleted from the database. 
Essentially, we store the tree nodes that are part of the current state, and we even store recent history, but we do not store history older than 5000 blocks.<\/p>\n<p class=\"chakra-text css-gi02ar\"><span class=\"chakra-text css-ons8vw\">X<\/span> should be set as low as possible to conserve space, but setting <span class=\"chakra-text css-ons8vw\">X<\/span> too low compromises robustness: once this technique is implemented, a node cannot revert back more than <span class=\"chakra-text css-ons8vw\">X<\/span> blocks without essentially completely restarting synchronization. Now, let&#8217;s see how this approach can be implemented fully, taking into account all of the corner cases:<\/p>\n<ol role=\"list\" class=\"css-vgl4zd\">\n<li class=\"css-0\">When processing a block with number <span class=\"chakra-text css-ons8vw\">N<\/span>, keep track of all nodes (in the state, transaction and receipt trees) whose reference count drops to zero. Place the hashes of these nodes into a &#8220;death row&#8221; database in some kind of data structure so that the list can later be recalled by block number (specifically, block number <span class=\"chakra-text css-ons8vw\">N + X<\/span>), and mark the node database entry itself as being deletion-worthy at block <span class=\"chakra-text css-ons8vw\">N + X<\/span>.<\/li>\n<li class=\"css-0\">If a node that is on death row gets re-instated (a practical example of this is account A acquiring some particular balance\/nonce\/code\/storage combination <span class=\"chakra-text css-ons8vw\">f<\/span>, then switching to a different value <span class=\"chakra-text css-ons8vw\">g<\/span>, and then account B acquiring state <span class=\"chakra-text css-ons8vw\">f<\/span> while the node for <span class=\"chakra-text css-ons8vw\">f<\/span> is on death row), then increase its reference count back to one. 
If that node is deleted again at some future block <span class=\"chakra-text css-ons8vw\">M<\/span> (with <span class=\"chakra-text css-ons8vw\">M &gt; N<\/span>), then put it back on the future block&#8217;s death row to be deleted at block <span class=\"chakra-text css-ons8vw\">M + X<\/span>.<\/li>\n<li class=\"css-0\">When you get to processing block <span class=\"chakra-text css-ons8vw\">N + X<\/span>, recall the list of hashes that you logged back during block <span class=\"chakra-text css-ons8vw\">N<\/span>. Check the node associated with each hash; if the node is still marked for deletion <em class=\"chakra-text css-0\">during that specific block<\/em> (ie. not reinstated, and importantly not reinstated and then re-marked for deletion <em class=\"chakra-text css-0\">later<\/em>), delete it. Delete the list of hashes in the death row database as well.<\/li>\n<li class=\"css-0\">Sometimes, the new head of a chain will not be on top of the previous head and you will need to revert a block. 
For these cases, you will need to keep in the database a journal of all changes to reference counts (that&#8217;s &#8220;journal&#8221; as in <a target=\"_blank\" rel=\"noopener\" class=\"chakra-link css-vezwxf\" href=\"https:\/\/en.wikipedia.org\/wiki\/Journaling_file_system\">journaling file systems<\/a>; essentially an ordered list of the changes made); when reverting a block, delete the death row list generated when producing that block, and undo the changes made according to the journal (and delete the journal when you&#8217;re done).<\/li>\n<li class=\"css-0\">When processing a block, delete the journal at block <span class=\"chakra-text css-ons8vw\">N &#8211; X<\/span>; you are not capable of reverting more than <span class=\"chakra-text css-ons8vw\">X<\/span> blocks anyway, so the journal is superfluous (and, if kept, would in fact defeat the whole point of pruning).<\/li>\n<\/ol>\n<p class=\"chakra-text css-gi02ar\">Once this is done, the database should only be storing state nodes associated with the last <span class=\"chakra-text css-ons8vw\">X<\/span> blocks, so you will still have all the information you need from those blocks but nothing more. On top of this, there are further optimizations. Particularly, after <span class=\"chakra-text css-ons8vw\">X<\/span> blocks, transaction and receipt trees should be deleted entirely, and even blocks may arguably be deleted as well &#8211; although there is an important argument for keeping some subset of &#8220;archive nodes&#8221; that store absolutely everything so as to help the rest of the network acquire the data that it needs.<\/p>\n<p class=\"chakra-text css-gi02ar\">Now, how much savings can this give us? As it turns out, quite a lot! Particularly, if we were to take the ultimate daredevil route and go <span class=\"chakra-text css-ons8vw\">X = 0<\/span> (ie. 
lose absolutely all ability to handle even single-block forks, storing no history whatsoever), then the size of the database would essentially be the size of the state: a value which, even now (this data was grabbed at block 670000) stands at roughly 40 megabytes &#8211; the majority of which is made up of <a target=\"_blank\" rel=\"noopener\" class=\"chakra-link css-vezwxf\" href=\"https:\/\/explorer.etherapps.info\/address\/0x798d86e782c8c34da97f7389a464c8af76ea3442\">accounts like this one<\/a> with storage slots filled to deliberately spam the network. At <span class=\"chakra-text css-ons8vw\">X = 100000<\/span>, we would get essentially the current size of 10-40 gigabytes, as most of the growth happened in the last hundred thousand blocks, and the extra space required for storing journals and death row lists would make up the rest of the difference. At every value in between, we can expect the disk space growth to be linear (ie. <span class=\"chakra-text css-ons8vw\">X = 10000<\/span> would take us about ninety percent of the way there to near-zero).<\/p>\n<p class=\"chakra-text css-gi02ar\">Note that we may want to pursue a hybrid strategy: keeping every <em class=\"chakra-text css-0\">block<\/em> but not every <em class=\"chakra-text css-0\">state tree node<\/em>; in this case, we would need to add roughly 1.4 gigabytes to store the block data. It&#8217;s important to note that the cause of the blockchain size is NOT fast block times; currently, the block headers of the last three months make up roughly 300 megabytes, and the rest is transactions of the last one month, so at high levels of usage we can expect to continue to see transactions dominate. 
That said, light clients will also need to prune block headers if they are to survive in low-memory circumstances.<\/p>\n<p class=\"chakra-text css-gi02ar\">The strategy described above has been implemented in a very early alpha form in <a target=\"_blank\" rel=\"noopener\" class=\"chakra-link css-vezwxf\" href=\"https:\/\/github.com\/ethereum\/pyethereum\/tree\/pruning\">pyeth<\/a>; it will be implemented properly in all clients in due time after Frontier launches, as such storage bloat is only a medium-term and not a short-term scalability concern.<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/blog.ethereum.org\/en\/2015\/06\/26\/state-tree-pruning\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the important issues that has been brought up over the course of the Olympic stress-net release is the large amount of data that clients are required to store; over little more than three months of operation, and particularly during the last month, the amount of data in each Ethereum client&#8217;s blockchain folder has 
[&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":18498,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"tdm_status":"","tdm_grid_status":"","footnotes":""},"categories":[24],"tags":[],"kronos_expire_date":[],"class_list":["post-18941","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ethereum"],"_links":{"self":[{"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/posts\/18941","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/comments?post=18941"}],"version-history":[{"count":0,"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/posts\/18941\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/media\/18498"}],"wp:attachment":[{"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/media?parent=18941"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/categories?post=18941"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/tags?post=18941"},{"taxonomy":"kronos_expire_date","embeddable":true,"href":"https:\/\/cryptoted.net\/index.php\/wp-json\/wp\/v2\/kronos_expire_date?post=18941"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}