What is Geth's "fast" sync, and why is it faster?

Question

One of the answers to this question suggested using Geth's --fast flag to help quickly synchronise the block data.

How does the flag work, and how does using it speed up the synchronisation? Are we syncing less data, or are we in some way performing fewer checks on its integrity or source?

Edit:

As of Geth version 1.6.0, the --fast flag has become --syncmode=fast (though --fast is also still usable for now).

eth · Answer

From the Geth FAQ (https://geth.ethereum.org/docs/faq):
Q. How does Ethereum syncing work?
A. The current default mode of sync for Geth is called fast sync. Instead of starting from the genesis block and reprocessing all the transactions that ever occurred (which could take weeks), fast sync downloads the blocks, and only verifies the associated proof-of-works. Downloading all the blocks is a straightforward and fast procedure and will relatively quickly reassemble the entire chain.
Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case, since no transaction was executed, so we do not have any account state available (i.e. balances, nonces, smart contract code and data). These need to be downloaded separately and cross checked with the latest blocks. This phase is called the state trie download and it actually runs concurrently with the block downloads; alas it takes a lot longer nowadays than downloading the blocks.
So, what’s the state trie? In the Ethereum mainnet, there are a ton of accounts already, which track the balance, nonce, etc. of each user/contract. The accounts themselves are however insufficient to run a node; they need to be cryptographically linked to each block so that nodes can actually verify that the accounts are not tampered with. This cryptographic linking is done by creating a tree data structure above the accounts, each level aggregating the layer below it into an ever smaller layer, until you reach the single root. This gigantic data structure containing all the accounts and the intermediate cryptographic proofs is called the state trie.
Ok, so why does this pose a problem? This trie data structure is an intricate interlink of hundreds of millions of tiny cryptographic proofs (trie nodes). To truly have a synchronized node, you need to download all the account data, as well as all the tiny cryptographic proofs to verify that no one in the network is trying to cheat you. This itself is already a crazy number of data items. The part where it gets even messier is that this data is constantly morphing: at every block (15s), about 1000 nodes are deleted from this trie and about 2000 new ones are added. This means your node needs to synchronize a dataset that is changing 200 times per second. The worst part is that while you are synchronizing, the network is moving forward, and state that you began to download might disappear while you’re downloading, so your node needs to constantly follow the network while trying to gather all the recent data. But until you actually do gather all the data, your local node is not usable since it cannot cryptographically prove anything about any accounts.
If you see that you are 64 blocks behind mainnet, you aren’t yet synchronized, not even close. You are just done with the block download phase and still running the state downloads. You can see this yourself via the seemingly endless Imported state entries [...] stream of logs. You’ll need to wait that out too before your node comes truly online.
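
The "tree data structure above the accounts" described in the FAQ excerpt can be illustrated with a much simplified sketch. This is not how Geth stores state: Ethereum uses a Merkle Patricia trie with Keccak-256 and RLP encoding, whereas the sketch below assumes a plain binary Merkle tree, SHA-256, and a made-up `address:balance:nonce` encoding. The function names and sample accounts are hypothetical. It only shows why a single root lets a node detect tampering with any account.

```python
import hashlib


def leaf_hash(address: str, balance: int, nonce: int) -> bytes:
    """Hash one account record (hypothetical encoding, not Ethereum's RLP)."""
    record = f"{address}:{balance}:{nonce}".encode()
    return hashlib.sha256(record).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a layer of hashes pairwise until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    layer = leaves
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            # Duplicate the last hash so every node has a sibling.
            layer = layer + [layer[-1]]
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]


# Sample accounts: (address, balance, nonce). Values are illustrative only.
accounts = [("0xaa", 100, 0), ("0xbb", 250, 3), ("0xcc", 0, 1)]
root = merkle_root([leaf_hash(*a) for a in accounts])

# Change a single balance and recompute the root.
tampered = [("0xaa", 100, 0), ("0xbb", 999, 3), ("0xcc", 0, 1)]
tampered_root = merkle_root([leaf_hash(*a) for a in tampered])

print(root.hex())
print(root != tampered_root)  # True: one changed balance changes the root
```

In this simplified picture, a syncing node would compare the root it computes from downloaded state against the root committed in a block it has already verified, which is the cross-check the FAQ refers to.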

Read the rest of the FAQ for further answers like:
Q: The node just hangs on importing state entries?!
Q: I’m stuck at 64 blocks behind mainnet?!
Q: Why does downloading the state take so long, I have good bandwidth?
Q: Wait, so I can’t run a full node on an HDD?
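
The "200 times per second" figure in the FAQ excerpt follows directly from the numbers it quotes (about 1000 trie nodes deleted and 2000 added per 15-second block). A small worked calculation, using only those approximate figures, makes the arithmetic explicit. The daily net growth is a derived illustration under the same assumptions, not a number from the FAQ.

```python
# Approximate figures quoted in the FAQ excerpt above.
BLOCK_TIME_SECONDS = 15
NODES_DELETED_PER_BLOCK = 1000
NODES_ADDED_PER_BLOCK = 2000


def trie_changes_per_second(deleted: int, added: int, block_time: int) -> float:
    """Total trie node changes (deletions plus additions) per second."""
    return (deleted + added) / block_time


def net_new_nodes_per_day(deleted: int, added: int, block_time: int) -> int:
    """Net trie growth per day, assuming a constant block time."""
    blocks_per_day = (24 * 60 * 60) // block_time
    return (added - deleted) * blocks_per_day


changes = trie_changes_per_second(
    NODES_DELETED_PER_BLOCK, NODES_ADDED_PER_BLOCK, BLOCK_TIME_SECONDS
)
growth = net_new_nodes_per_day(
    NODES_DELETED_PER_BLOCK, NODES_ADDED_PER_BLOCK, BLOCK_TIME_SECONDS
)

print(changes)  # 200.0
print(growth)   # 5760000
```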

Alexey Koshlatyy · Answer

"fast" is the default value for --syncmode key

This means there is no difference between passing --syncmode fast and omitting the flag.

This information comes from https://github.com/ethereum/go-ethereum/wiki/command-line-options:

--syncmode value      Blockchain sync mode ("fast", "full", or "light") (default: fast)

eth · Answer

Don't forget to use an SSD
If you're able to, using an NVMe SSD is even better.
If you have limited space on your SSD, see "Can chaindata be split across two (or more) locations?"
That said, the Ethereum state is large and getting larger. Be patient and it will be worth it.
The "How can I get a geth node to download the blockchain quickly?" wiki has been updated, and this answer has been updated for those who find it while having problems syncing.

Prior answer
As --fast is often the only thing associated with a fast sync, don't forget --cache too.
From the Homestead Guide:

Below are some flags to use when you want to sync your client more
quickly.
--fast
This flag enables fast syncing through state downloads rather than
downloading the full block data. This will also reduce the size of
your blockchain dramatically. NOTE: --fast can only be run if you are
syncing your blockchain from scratch and only the first time you
download the blockchain for security reasons. See this Reddit post for
more information.
--cache=1024
Megabytes of memory allocated to internal caching (min 16MB / database
forced). Default is 16MB, so increasing this to 256, 512, 1024 (1GB),
or 2048 (2GB) depending on how much RAM your computer has should make
a difference.

paulmorriss · Answer

There's a lot of detail in this PR on GitHub. Here's a quote:

Instead of processing the entire block-chain one link at a time, and replaying all transactions that ever happened in history, fast syncing downloads the transaction receipts along the blocks, and pulls an entire recent state database.
