Disasters can happen. We experienced data loss on our Elasticsearch cluster a few weeks ago after a failed upgrade. That’s why data redundancy isn’t enough: even when your data is replicated across multiple nodes, it isn’t safe!
Backing up your Elasticsearch cluster adds another layer of security in case things go wrong:
- Failed Upgrade: in our case, that’s what happened. The data was upgraded but Elasticsearch was unable to read it. Several nodes had corrupted data,
- Intrusions: what if a hacker gains access to your database?
- Multiple node failures: data is usually replicated on 1+ nodes, but what if several nodes fail simultaneously? It’s highly improbable, that’s true.
This tutorial explains how we use a shared Network File System (NFS), mounted on all our Elasticsearch nodes, to save incremental snapshots of the database every night. Let’s see how to set it up.
NFS Setup
Prerequisites
We are going to use two servers:
- NFS Server: this server shares a folder on its local disk (IP 10.0.0.1),
- NFS Client: an Elasticsearch node acting as an NFS client connected to the NFS Server (IP 10.0.0.2).
Both are assumed to run Ubuntu Linux, with a non-root user with sudo privileges.
NFS Server
The NFS server is responsible for providing a shared folder accessible from all Elasticsearch nodes. Why? Because that’s the way Elasticsearch snapshots work: all nodes must have access to shared storage to be able to write snapshot data.
First, let’s install the NFS server packages:
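```bash
# install the NFS server on Ubuntu
sudo apt-get update
sudo apt-get install nfs-kernel-server
```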
It’s now time to create and share an NFS folder on this machine. Let’s suppose we want to share /var/nfs/elasticsearch:
- Create the shared elasticsearch folder:
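```bash
# create the directory that will hold the Elasticsearch snapshots
sudo mkdir -p /var/nfs/elasticsearch
```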
- Now let’s configure the NFS Server to share the folder with our NFS client:
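```bash
# exports are declared in /etc/exports (use any editor you like)
sudo nano /etc/exports
```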
- Here is an example showing how to share the folder with our client (suppose its IP is 10.0.0.2):
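```bash
# /etc/exports
/var/nfs/elasticsearch    10.0.0.2(rw,sync,no_subtree_check,no_root_squash)
```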
Of course, replace 10.0.0.2 with the public IP of your NFS client.
Here, we’re using a handful of standard export options, including no_root_squash. Let’s take a look at what each of these options means:
- rw: client can both read and write files,
- sync: This option forces NFS to write changes to disk before replying. It improves consistency but reduces transfer speed,
- no_subtree_check: Disables the check that verifies a file is actually still in the exported subtree before serving a request. Subtree checking can cause problems when a file is renamed while the client has it open, so in almost all cases it is better to disable it,
- no_root_squash: By default, NFS translates requests from a remote root user into a non-privileged user on the server. no_root_squash disables this behavior for certain shares. Use it with caution, as it is a potential security risk.
Once the configuration is done, the NFS server must be restarted:
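```bash
sudo systemctl restart nfs-kernel-server
```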
If you want to dig further (especially regarding security via firewalls like ufw), it’s worth reading the How to Setup NFS mount on Ubuntu tutorial.
NFS Client
Each Elasticsearch node acts as an NFS Client. First, we need to install the Ubuntu packages:
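```bash
# install the NFS client utilities on Ubuntu
sudo apt-get update
sudo apt-get install nfs-common
```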
Then, we’re going to create and mount the shared NFS folder:
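```bash
# create the local mount point and mount the share exported by the NFS server
sudo mkdir -p /var/nfs/elasticsearch
sudo mount 10.0.0.1:/var/nfs/elasticsearch /var/nfs/elasticsearch
```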
These commands mount the NFS share in the /var/nfs/elasticsearch folder on the client side. We can check whether the mount succeeded, for example with df:
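```bash
df -h
# the NFS share should appear in the output, e.g.:
# 10.0.0.1:/var/nfs/elasticsearch   500G   12G   488G   3%   /var/nfs/elasticsearch
```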
As the folder has been mounted manually, the shared NFS folder will disappear on the next reboot. To fix this, we need to add the NFS share to /etc/fstab. Your fstab file should then contain a line similar to this (the mount options may vary; the defaults are a sensible starting point):
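```bash
# /etc/fstab
# <file system>                    <mount point>            <type>  <options>  <dump> <pass>
10.0.0.1:/var/nfs/elasticsearch    /var/nfs/elasticsearch   nfs     defaults   0      0
```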
Fantastic! Now we have a shared NFS folder mounted on each Elasticsearch node in /var/nfs/elasticsearch. We can test if it’s working by creating a file:
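```bash
sudo touch /var/nfs/elasticsearch/test.txt
```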
This should create a file named test.txt which can be seen from any other NFS client. Now, we’re going to see how we can use this folder to store Elasticsearch snapshots.
Elasticsearch Setup
In this section, we’re going to use Kibana to administer the Elasticsearch cluster. We assume Elasticsearch is installed directly on the server. For further information about Elasticsearch snapshots, refer to the official documentation.
We’re now going to configure and create a snapshot repository mapped to the /var/nfs/elasticsearch folder.
elasticsearch.yml
We need to declare the path.repo configuration to allow fs snapshot repositories to access the /var/nfs folder:
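```yaml
# in elasticsearch.yml (typically /etc/elasticsearch/elasticsearch.yml) on every node
path.repo: ["/var/nfs"]
```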
This is mandatory; otherwise, the repository creation will fail. Restart Elasticsearch to apply the settings.
Snapshot Repository
Next, let’s create the snapshot repository:
- Elasticsearch must be up and running,
- Start Kibana and open the dev tools,
- Create the snapshot repository (the repository name used below, my_fs_backup, is just an example; any name will do):
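```
PUT _snapshot/my_fs_backup
{
  "type": "fs",
  "settings": {
    "location": "/var/nfs/elasticsearch"
  }
}
```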
The server should answer:
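```json
{
  "acknowledged": true
}
```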
Once created, let’s verify the snapshot repository is working properly:
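```
POST _snapshot/my_fs_backup/_verify
```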
The server should answer with something similar to:
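```json
{
  "nodes": {
    "kyWLGbqFQQqMUiC0xmcYGw": {
      "name": "node-1"
    }
  }
}
```
The node IDs and names will of course differ on your cluster; what matters is that every node is listed.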
If you encounter any exception while verifying the snapshot repository:
- Check mounts: double-check that all Elasticsearch nodes have the NFS folder mounted in the same location,
- Check elasticsearch.yml: make sure the path.repo config is declared and properly set,
- Restart Cluster: restart the Elasticsearch cluster to make sure the settings declared in elasticsearch.yml are applied,
- Check Rights: it might be necessary to loosen rights on the NFS mount:
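```bash
# temporarily open up permissions to rule out a rights issue
sudo chmod -R 777 /var/nfs/elasticsearch
```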
If this fixes the issue, try to chown the folder to the elasticsearch user from the client machine:
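```bash
sudo chown -R elasticsearch:elasticsearch /var/nfs/elasticsearch
```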
To list all the snapshots currently stored on the repository (my_fs_backup in our example):
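```
GET _snapshot/my_fs_backup/_all
```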
Conclusion
As we have seen, it’s pretty easy to set up a shared NFS mount to save and restore an Elasticsearch database. I would definitely suggest using an NFS server with:
- 1Gbps+ Network Bandwidth: depending on the amount of data you have, you’d better make sure there is a Gigabit connection between the nodes and the NFS server to speed up the process,
- RAID1 HDD or SSDs: the backup server should have RAID1 (aka mirroring). Suppose your cluster data is corrupted and you need to restore a snapshot: you don’t want all your data stored on a non-redundant disk whose failure would be catastrophic,
- CPU: 4 cores or more. It’s better to have an over-sized CPU to sustain high IOPS without any issue,
- RAM: 16GB+ of RAM is fine. The more RAM you have, the more the system can use to cache the filesystem.
From our experience, restoring 300GB of data takes about an hour to transfer from our NFS server to our 5-node Elasticsearch cluster. Each machine has a 1Gbps network connection.
Once the snapshot has been restored, it still takes some time for Elasticsearch to replicate the primary shards. Make sure to check the process using Kibana:
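```
GET _cat/recovery?v
GET _cat/indices?v
```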
I would advise restarting document indexing only once the recovery process has completed (all indices shown as green by GET _cat/indices).