Running a Validator
Getting a validator live is just the beginning
Getting a validator live is just the beginning. Running a production validator requires far more than the initial setup. Preparing your architecture for maximum uptime and redundancy falls outside the scope of this documentation, but a few of the bare-minimum questions to answer are:
How will you handle chain upgrades?
consider: Cosmovisor
How will you know your node is up?
consider: Monitoring and Alerts
How will you mitigate DDoS attacks?
consider: Sentry Nodes
How much storage will you need and how will you grow storage?
Answering these questions can be daunting, so there is some advice below.
To streamline chain upgrades and minimize downtime, you may want to set up Cosmovisor to manage your node.
Backups of chain state are possible. If you are using a recent version of Cosmovisor, the default configuration creates a state backup before upgrades are applied.
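A minimal sketch of the environment Cosmovisor reads, assuming the daemon binary is named `starsd` and its home is `~/.starsd` (inferred from the `.starsd/data` path mentioned in this guide). The variable names are Cosmovisor's own; the values are illustrative choices, not settings confirmed by this guide.

```shell
# Environment for Cosmovisor. All values are assumptions for illustration.
export DAEMON_NAME=starsd                           # binary Cosmovisor supervises
export DAEMON_HOME="${DAEMON_HOME:-$HOME/.starsd}"  # holds cosmovisor/ and data/
export DAEMON_ALLOW_DOWNLOAD_BINARIES=false         # place upgrade binaries yourself
export DAEMON_RESTART_AFTER_UPGRADE=true            # restart once the upgrade is applied
export UNSAFE_SKIP_BACKUP=false                     # keep the pre-upgrade data backup
```

Leaving `UNSAFE_SKIP_BACKUP` at `false` keeps the pre-upgrade backup behaviour described above. Bear in mind that the backup needs free disk space roughly the size of the data folder.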
Taking backups of the .starsd/data folder is important for quick recovery if required.
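One way to take such a backup is a timestamped tarball. This is a sketch rather than a recommended procedure: the function name `backup_data_dir`, the archive naming, and the destination path in the example are all hypothetical, and it assumes the node is stopped while the archive is written.

```shell
# Archive the data folder of a STOPPED node into a timestamped tarball.
# $1: node home (assumed to be ~/.starsd), $2: destination directory.
# Prints the path of the archive on success.
backup_data_dir() {
  node_home="$1"
  dest_dir="$2"
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest_dir/starsd-data-$stamp.tar.gz"
  mkdir -p "$dest_dir" &&
    tar -czf "$archive" -C "$node_home" data &&
    printf '%s\n' "$archive"
}

# Example (stop the node first so the databases are not mid-write):
# backup_data_dir "$HOME/.starsd" /mnt/backups
```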
Alerting and monitoring are desirable as well. You are encouraged to explore solutions and find one that works for your setup. Prometheus is available out of the box, and there are a variety of open-source tools. Recommended reading:
And for real-time alerting, consider:
Simple setup using Grafana Cloud
Using only the raw metrics endpoint provided by starsd, you can get a working dashboard and alerting setup using Grafana Cloud. This means you don't have to run Grafana on the instance.
1. First, in config.toml, enable Prometheus. The default metrics port will be 26660.
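As a sketch of that change: in Tendermint-style config.toml files the switch lives in the `[instrumentation]` section as `prometheus = false`, next to `prometheus_listen_addr`. The helper below assumes your file follows that layout and that it sits at `~/.starsd/config/config.toml`; the function name `enable_prometheus` is hypothetical.

```shell
# Flip `prometheus = false` to `prometheus = true` in a config.toml.
# $1: path to config.toml (assumed ~/.starsd/config/config.toml).
# The original file is kept alongside as config.toml.bak.
enable_prometheus() {
  sed -i.bak 's/^prometheus = false/prometheus = true/' "$1"
}

# Example:
# enable_prometheus "$HOME/.starsd/config/config.toml"
# grep -A2 '^\[instrumentation\]' "$HOME/.starsd/config/config.toml"
```

The node has to be restarted before the new setting takes effect.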
2. Download Prometheus. This is needed to ship metrics to Grafana Cloud.
3. Set up a service file with sudo nano /etc/systemd/system/prometheus.service, replacing <your-user> with your username and <prometheus-folder> with the location of Prometheus. This sets the Prometheus port to 6666.
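The unit file itself is not reproduced here, so the following is a sketch of what such a file could look like, not the original. It assumes the `prometheus` binary and `prometheus.yml` sit together in the Prometheus folder; the function name `prometheus_unit` is hypothetical, while `--config.file` and `--web.listen-address` are standard Prometheus flags.

```shell
# Print a systemd unit for Prometheus listening on port 6666.
# $1: user to run as, $2: folder holding the prometheus binary and prometheus.yml.
prometheus_unit() {
  cat <<EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=$1
WorkingDirectory=$2
ExecStart=$2/prometheus --config.file=$2/prometheus.yml --web.listen-address=:6666
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# Example: review the output, then paste it into the service file.
# prometheus_unit "$USER" "$HOME/prometheus"
```

`WorkingDirectory` is set because Prometheus stores its local data in a `data/` folder relative to where it is started.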
4. Enable and start the service.
5. Import a dashboard to your Grafana. Search for 'Cosmos Validator' to find several options. You should see metrics arriving in the dashboard after a couple of minutes.
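If nothing shows up, it can help to check the raw endpoint on the node before debugging Grafana. The helper below is a small sketch that pulls one value out of Prometheus text-format output. The function name `metric_value` is hypothetical, and the metric name in the example is an assumption based on Tendermint's default namespace, so check it against your own endpoint output.

```shell
# Read Prometheus text-format metrics on stdin and print the value of the
# first sample whose name is exactly $1 (with or without labels).
# Assumes samples carry no trailing timestamp, so the value is the last field.
metric_value() {
  awk -v name="$1" '
    $1 == name || index($1, name "{") == 1 { print $NF; exit }
  '
}

# Example (metric name is an assumption):
# curl -s http://localhost:26660/metrics | metric_value tendermint_consensus_height
```

A height that keeps increasing between two calls is a quick sign that the node is producing metrics and following the chain.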
For more info:
Disk space is likely to fill up, so having a plan for managing storage is key.
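Part of such a plan can be a simple usage check that warns before the disk is full. The sketch below uses `df -P`; the function names and the 80% threshold in the example are hypothetical, and the data path is assumed to be `~/.starsd/data`.

```shell
# Print the used-space percentage (integer, no % sign) of the filesystem
# that holds directory $1.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return non-zero and print a warning when usage is at or above $2 percent.
check_disk() {
  used="$(disk_used_pct "$1")"
  if [ "$used" -ge "$2" ]; then
    echo "WARNING: $1 is ${used}% full (threshold $2%)" >&2
    return 1
  fi
}

# Example, for instance from cron:
# check_disk "$HOME/.starsd/data" 80
```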
If you are running sentry nodes:
512GB storage for the full node will give you a lot of runway
256GB each for the sentries with pruning should be sufficient
Managing backups is outside the scope of this documentation, but several validators keep public snapshots and backups.
To give you an idea of cost, on AWS EBS (other cloud providers are available, or you can run your own hardware), with two backups a day, this runs to roughly:
$150 for 1TB
$35 for 200GB
Total cost: $220 (consistent with one 1TB volume plus two 200GB volumes: $150 + 2 × $35)
What approach you take for this will depend on whether you are running on physical hardware co-located with you, running in a data centre, or running on virtualized hardware.
It's extremely important to monitor both your node(s) and the Stargaze validator ecosystem vigilantly:
Watch for new governance votes and VOTE appropriately for you and your delegations
Create a prometheus.yml file with your Grafana Cloud credentials in the Prometheus folder. You can get these via the Grafana UI. Click 'details' on the Prometheus card:
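The file contents are not shown here, so the following is a sketch of a prometheus.yml that scrapes the node's metrics port (26660, as noted in this guide) and forwards samples with `remote_write`. The three values are placeholders for what the Prometheus card's details page shows; the job name `stargaze`, the 15s interval, and the function name `prometheus_yml` are assumptions.

```shell
# Print a prometheus.yml that scrapes the local node and forwards samples
# to a remote endpoint. Take all three arguments from the Grafana UI:
# $1 remote write URL, $2 username, $3 password / API key.
prometheus_yml() {
  cat <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: stargaze
    static_configs:
      - targets: ["localhost:26660"]

remote_write:
  - url: "$1"
    basic_auth:
      username: "$2"
      password: "$3"
EOF
}

# Example (placeholders, not real values):
# prometheus_yml "<remote-write-url>" "<username>" "<api-key>" > prometheus.yml
```

Since the file holds a secret, restricting its permissions (for example `chmod 600 prometheus.yml`) is a sensible precaution.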
If you are comfortable with server ops, you might want to build out a Sentry Node Architecture to protect your validator against DDoS attacks.
The current best practice for running mainnet nodes is a Sentry Node Architecture. There are various approaches. Some validators advocate co-locating all three nodes in virtual partitions on a single box, using Docker or other virtualization tools. However, if in doubt, just run each node on a different server.
Bear in mind that sentries can have pruning turned on. It is desirable, but not essential, to have pruning disabled on the validator node itself.
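To make the architecture more concrete, here is a sketch of the peer settings commonly used in Tendermint sentry setups. This guide does not specify these values, so treat them as an assumption: the validator talks only to its sentries and does not gossip peers, while each sentry keeps the validator's node ID private. The function names are hypothetical; the keys belong to the `[p2p]` section of config.toml.

```shell
# Print [p2p] overrides for the validator's config.toml.
# $1: comma-separated sentry peers, each as <node-id>@<host>:<port>.
validator_p2p() {
  cat <<EOF
[p2p]
pex = false
persistent_peers = "$1"
EOF
}

# Print [p2p] overrides for a sentry's config.toml.
# $1: the validator's node ID, which the sentry will not gossip to others.
sentry_p2p() {
  cat <<EOF
[p2p]
pex = true
private_peer_ids = "$1"
EOF
}

# Example (placeholders):
# validator_p2p "<sentry1-id>@<sentry1-host>:26656,<sentry2-id>@<sentry2-host>:26656"
# sentry_p2p "<validator-id>"
```

A sentry also needs the validator in its own peer list, for example via `persistent_peers`, so that blocks and votes still flow between them.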
Ensure you're alerted to all notifications on the #validator-announcements channel on the