From c8fbadc67bcf9acca29c8cae114c6db8533ae6d1 Mon Sep 17 00:00:00 2001
From: kaustubhkapatral
Date: Tue, 14 Oct 2025 11:23:03 +0530
Subject: [PATCH 1/3] added guide for reenabling pruning

---
 SUMMARY.md                               |  1 +
 docs/validator-guide/reenable-pruning.md | 60 ++++++++++++++++++++++++
 2 files changed, 61 insertions(+)
 create mode 100644 docs/validator-guide/reenable-pruning.md

diff --git a/SUMMARY.md b/SUMMARY.md
index 7c02c9d..153177c 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -26,6 +26,7 @@
   * [Unjailing a jailed validator](docs/validator-guide/unjail.md)
   * [Move validator to a different machine](docs/validator-guide/move-validator.md)
   * [Backup and restore node keys with Hashicorp Vault](docs/validator-guide/backup-and-restore.md)
+  * [Re-enable pruning and recovering node db](docs/validator-guide/reenable-pruning.md)
 * [Network-wide Software Upgrades](docs/upgrades/README.md)
   * [Upgrade Guides](docs/upgrades/upgrade-guides/README.md)
     * [Upgrade to v0.6.x](docs/upgrades/upgrade-guides/v0.6-upgrade.md)
diff --git a/docs/validator-guide/reenable-pruning.md b/docs/validator-guide/reenable-pruning.md
new file mode 100644
index 0000000..fb83e53
--- /dev/null
+++ b/docs/validator-guide/reenable-pruning.md
@@ -0,0 +1,60 @@
+# Guide for re-enabling pruning and recovering node db
+This guide is specifically made for the the validators/node operators affected by the pruning issue which caused a mainnet halt on September 6th 2025.
+
+Re enabling pruning (to `default`/`custom` from `nothing`) on the affected nodes will cause the nodes to halt again as the database pruning resumes its operation. To avoid this it is recommended to reset your node db and recover it using state sync or a db snapshot. The nodes that weren't affected by the pruning issue or that had recovered by the time the `nothing` pruning fix was rolled out do not have to undergo this procedure.
+
+## State Sync:-
+Stop the systemd service of the node.
+`sudo systemctl stop cheqd-cosmovisor.service`
+
+Take a backup of the priv_validator_state.json. **This step is very Important for validator nodes.**
+`cp ~/.cheqdnode/data/priv_validator_state.json ~/priv_validator_state.json`
+
+Turn the pruning strategy to default or custom based on your preference in the `~/.cheqdnode/config/app.toml` file.
+
+Reset the database.
+`cheqd-noded tendermint unsafe-reset-all --home ~/.cheqdnode/ --keep-addr-book`
+
+Restore the priv_validator_state.json . **This step is very Important for validator nodes**
+`cp ~/priv_validator_state.json ~/.cheqdnode/data/priv_validator_state.json`
+
+Enable statesync on the node and provide the required variables.
+```
+STATESYNC_RPC="https://rpc.cheqd.net:443"
+LATEST_HEIGHT=$(curl -s $STATESYNC_RPC/block | jq -r .result.block.header.height)
+BLOCK_HEIGHT=$((LATEST_HEIGHT - 2000))
+TRUST_HASH=$(curl -s "$STATESYNC_RPC/block?height=$BLOCK_HEIGHT" | jq -r .result.block_id.hash)
+sed -i.bak -E "s|^(enable[[:space:]]+=[[:space:]]+).*$|\1true| ; \
+s|^(rpc_servers[[:space:]]+=[[:space:]]+).*$|\1\"$STATESYNC_RPC,$STATESYNC_RPC\"| ; \
+s|^(trust_height[[:space:]]+=[[:space:]]+).*$|\1$BLOCK_HEIGHT| ; \
+s|^(trust_hash[[:space:]]+=[[:space:]]+).*$|\1\"$TRUST_HASH\"|" ~/.cheqdnode/config/config.toml
+```
+
+Start the node
+`sudo systemctl restart cheqd-cosmovisor.service`
+
+The node should start looking for statesync chunks from it's peers and begin the restoration process in a few minutes.
+
+## Snapshot:-
+Stop the systemd service of the node.
+`sudo systemctl stop cheqd-cosmovisor.service`
+ 
+Take a backup of the priv_validator_state.json. **This step is very Important for validator nodes.**
+`cp ~/.cheqdnode/data/priv_validator_state.json ~/priv_validator_state.json`
+ 
+Turn the pruning strategy to default or custom based on your preference in the `~/.cheqdnode/config/app.toml` file.
+ 
+Reset the database.
+`cheqd-noded tendermint unsafe-reset-all --home ~/.cheqdnode/ --keep-addr-book`
+ 
+Restore the priv_validator_state.json. **This step is very Important for validator nodes**
+`cp ~/priv_validator_state.json ~/.cheqdnode/data/priv_validator_state.json`
+
+Download the latest lz4 tar archive for mainnet from this link https://snapshots.cheqd.net/#mainnet/
+`wget https://cheqd-node-backups.ams3.digitaloceanspaces.com/mainnet//cheqd-mainnet-1_.tar.lz4`
+
+Unpack the tar archive and restore the db
+`lz4 -c -d cheqd-mainnet-1_.tar.lz4 | tar -x -C ~/.cheqdnode`
+ 
+Start the node
+`sudo systemctl restart cheqd-cosmovisor.service`
\ No newline at end of file

From 884e5b8ee0f972fb801240170fb16d007c03d7d7 Mon Sep 17 00:00:00 2001
From: filipdjokic
Date: Tue, 14 Oct 2025 14:55:12 +0200
Subject: [PATCH 2/3] Additional details & minor format changes

---
 docs/validator-guide/reenable-pruning.md | 116 ++++++++++++++++-------
 1 file changed, 80 insertions(+), 36 deletions(-)

diff --git a/docs/validator-guide/reenable-pruning.md b/docs/validator-guide/reenable-pruning.md
index fb83e53..cdbc47b 100644
--- a/docs/validator-guide/reenable-pruning.md
+++ b/docs/validator-guide/reenable-pruning.md
@@ -1,25 +1,44 @@
-# Guide for re-enabling pruning and recovering node db
-This guide is specifically made for the the validators/node operators affected by the pruning issue which caused a mainnet halt on September 6th 2025.
+# Guide for re-enabling pruning and recovering node db
 
-Re enabling pruning (to `default`/`custom` from `nothing`) on the affected nodes will cause the nodes to halt again as the database pruning resumes its operation. To avoid this it is recommended to reset your node db and recover it using state sync or a db snapshot. The nodes that weren't affected by the pruning issue or that had recovered by the time the `nothing` pruning fix was rolled out do not have to undergo this procedure.
+This guide is specifically made for the validators/node operators affected by the pruning issue encountered following our v4.x upgrade. This issue forced some validators/node operators to disable pruning entirely.
 
-## State Sync:-
-Stop the systemd service of the node.
+Re-enabling pruning (to `default`/`custom` from `nothing`) on the affected nodes will cause the nodes to halt again as the database pruning resumes its operation. To avoid this it is recommended to reset your node db and recover it using state sync or a db snapshot. The nodes that weren't affected by the pruning issue or that had recovered by the time the pruning fix was rolled out do not have to undergo this procedure.
+
+Following this procedure will significantly reduce the disk space required for your node’s regular operations, thereby lowering operational costs (on the nodes we manage, we observed storage usage drop from 700+ GB to under 10 GB). Additionally, running a node with less disk usage will likely improve performance.
+
+You have two options here:
+
+1) Reset via State Sync
+2) Reset by using DB snapshot
+
+## State Sync
+
+Stop the systemd service of the node.
 `sudo systemctl stop cheqd-cosmovisor.service`
 
-Take a backup of the priv_validator_state.json. **This step is very Important for validator nodes.**
-`cp ~/.cheqdnode/data/priv_validator_state.json ~/priv_validator_state.json`
+Take a backup of the priv_validator_state.json. **This step is very important for validator nodes:**
+
+```bash
+cp ~/.cheqdnode/data/priv_validator_state.json ~/priv_validator_state.json
+```
 
 Turn the pruning strategy to default or custom based on your preference in the `~/.cheqdnode/config/app.toml` file.
 
-Reset the database.
-`cheqd-noded tendermint unsafe-reset-all --home ~/.cheqdnode/ --keep-addr-book`
+Reset the database:
 
-Restore the priv_validator_state.json . **This step is very Important for validator nodes**
-`cp ~/priv_validator_state.json ~/.cheqdnode/data/priv_validator_state.json`
+```bash
+cheqd-noded tendermint unsafe-reset-all --home ~/.cheqdnode/ --keep-addr-book
+```
 
-Enable statesync on the node and provide the required variables.
+Restore the priv_validator_state.json. **This step is very important for validator nodes:**
+
+```bash
+cp ~/priv_validator_state.json ~/.cheqdnode/data/priv_validator_state.json
 ```
+
+Enable statesync on the node and provide the required variables:
+
+```bash
 STATESYNC_RPC="https://rpc.cheqd.net:443"
 LATEST_HEIGHT=$(curl -s $STATESYNC_RPC/block | jq -r .result.block.header.height)
 BLOCK_HEIGHT=$((LATEST_HEIGHT - 2000))
@@ -30,31 +49,56 @@ s|^(trust_height[[:space:]]+=[[:space:]]+).*$|\1$BLOCK_HEIGHT| ; \
 s|^(trust_hash[[:space:]]+=[[:space:]]+).*$|\1\"$TRUST_HASH\"|" ~/.cheqdnode/config/config.toml
 ```
 
-Start the node
-`sudo systemctl restart cheqd-cosmovisor.service`
+Start the node:
 
-The node should start looking for statesync chunks from it's peers and begin the restoration process in a few minutes.
+```bash
+sudo systemctl restart cheqd-cosmovisor.service
+```
+
+The node should start looking for statesync chunks from it's peers and begin the restoration process in a few minutes. After some time, it should catch up with the network and continue siging blocks.
+
+## Snapshot
+
+Stop the systemd service of the node:
+
+```bash
+sudo systemctl stop cheqd-cosmovisor.service
+```
+
+Take a backup of the priv_validator_state.json. **This step is very important for validator nodes:**
+
+```bash
+cp ~/.cheqdnode/data/priv_validator_state.json ~/priv_validator_state.json
+```
 
-## Snapshot:-
-Stop the systemd service of the node.
-`sudo systemctl stop cheqd-cosmovisor.service`
- 
-Take a backup of the priv_validator_state.json. **This step is very Important for validator nodes.**
-`cp ~/.cheqdnode/data/priv_validator_state.json ~/priv_validator_state.json`
- 
 Turn the pruning strategy to default or custom based on your preference in the `~/.cheqdnode/config/app.toml` file.
- 
-Reset the database.
-`cheqd-noded tendermint unsafe-reset-all --home ~/.cheqdnode/ --keep-addr-book`
- 
-Restore the priv_validator_state.json. **This step is very Important for validator nodes**
-`cp ~/priv_validator_state.json ~/.cheqdnode/data/priv_validator_state.json`
-
-Download the latest lz4 tar archive for mainnet from this link https://snapshots.cheqd.net/#mainnet/
-`wget https://cheqd-node-backups.ams3.digitaloceanspaces.com/mainnet//cheqd-mainnet-1_.tar.lz4`
-
-Unpack the tar archive and restore the db
-`lz4 -c -d cheqd-mainnet-1_.tar.lz4 | tar -x -C ~/.cheqdnode`
- 
+
+Reset the database:
+
+```bash
+cheqd-noded tendermint unsafe-reset-all --home ~/.cheqdnode/ --keep-addr-book
+```
+
+Restore the priv_validator_state.json. **This step is very important for validator nodes:**
+
+```bash
+cp ~/priv_validator_state.json ~/.cheqdnode/data/priv_validator_state.json
+```
+
+Download the latest lz4 tar archive for mainnet from [our snapshots page](https://snapshots.cheqd.net/#mainnet/)
+
+```bash
+wget https://cheqd-node-backups.ams3.digitaloceanspaces.com/mainnet//cheqd-mainnet-1_.tar.lz4
+```
+
+Unpack the tar archive and restore the db:
+
+```bash
+lz4 -c -d cheqd-mainnet-1_.tar.lz4 | tar -x -C ~/.cheqdnode
+```
+
 Start the node
-`sudo systemctl restart cheqd-cosmovisor.service`
\ No newline at end of file
+
+```bash
+sudo systemctl restart cheqd-cosmovisor.service
+```

From 9b1e4e11bdd756dec116dc21301c9ea5706a763f Mon Sep 17 00:00:00 2001
From: Filip Djokic <87134019+filipdjokic@users.noreply.github.com>
Date: Tue, 14 Oct 2025 15:27:26 +0200
Subject: [PATCH 3/3] Fix typo

Co-authored-by: Kaustubh K <54210167+kaustubhkapatral@users.noreply.github.com>
---
 docs/validator-guide/reenable-pruning.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/validator-guide/reenable-pruning.md b/docs/validator-guide/reenable-pruning.md
index cdbc47b..3312d97 100644
--- a/docs/validator-guide/reenable-pruning.md
+++ b/docs/validator-guide/reenable-pruning.md
@@ -55,7 +55,7 @@ Start the node:
 sudo systemctl restart cheqd-cosmovisor.service
 ```
 
-The node should start looking for statesync chunks from it's peers and begin the restoration process in a few minutes. After some time, it should catch up with the network and continue siging blocks.
+The node should start looking for statesync chunks from its peers and begin the restoration process in a few minutes. After some time, it should catch up with the network and continue signing blocks.
 
 ## Snapshot
 
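
Review note: the guide added by this series describes the "Turn the pruning strategy to default or custom" step as a manual edit of `~/.cheqdnode/config/app.toml`, with no command. Below is a minimal sketch of how that step might be scripted in the same `sed -i.bak -E` style the guide already uses for `config.toml`. It assumes `app.toml` carries the usual Cosmos SDK top-level `pruning = "..."` key. The helper name `set_pruning` and the temporary demo file are hypothetical and not part of the patch.

```bash
#!/usr/bin/env bash
# Sketch only: set_pruning is a hypothetical helper, not something the guide ships.
set -euo pipefail

# set_pruning FILE STRATEGY
# Rewrites the top-level `pruning = "..."` line in place and keeps a FILE.bak copy.
set_pruning() {
  local file="$1" strategy="$2"
  # The pattern requires `=` right after `pruning` (plus optional spaces), so
  # keys such as pruning-keep-recent and pruning-interval are left untouched.
  sed -i.bak -E "s|^(pruning[[:space:]]*=[[:space:]]*).*$|\1\"${strategy}\"|" "$file"
}

# Demo against a throwaway file so nothing under ~/.cheqdnode is modified.
demo="$(mktemp)"
printf 'pruning = "nothing"\npruning-keep-recent = "0"\npruning-interval = "0"\n' > "$demo"
set_pruning "$demo" "default"
grep -E '^pruning' "$demo"
```

On a real node the call would be `set_pruning ~/.cheqdnode/config/app.toml default`, run while the service is stopped and before the database reset. For `custom`, the `pruning-keep-recent` and `pruning-interval` values still need to be set by hand.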