It seems even the CryptPad team does not know how to do that.
They are currently doing what looks like a daily backup with manual failover:
"We have a replicate server. So if a server goes down with all the data of CryptPad, we have the data from the day before, that is ready on a 2nd server. So we are able to restore the service on a new server."
source
Until the application is designed to run concurrently on multiple servers, the most you can do safely is some kind of automatic failover.
1 main server
1 failover server
File replication between the two. ("unison" for example)
CryptPad app started only on the main server.
A failover/loadbalancing service, monitoring and forwarding web traffic to the main server.
When the failover service detects that the main server has gone down, it switches the web traffic to the failover server.
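The switching decision can be sketched in a few lines of Python. This is only an illustration: `FailoverRouter`, `max_failures` and the `is_healthy` callback are hypothetical names, and a real setup would rely on the health checks of whatever failover/loadbalancing service is used. The sketch assumes the switch should happen only after several consecutive failed checks, and that traffic should not bounce back automatically once it has moved.

```python
from typing import Callable


class FailoverRouter:
    """Tracks which server should receive web traffic (hypothetical helper)."""

    def __init__(self, main: str, failover: str, max_failures: int = 3):
        self.active = main
        self.standby = failover
        self.max_failures = max_failures
        self.failures = 0

    def check(self, is_healthy: Callable[[str], bool]) -> str:
        """Run one health check and return the server to route traffic to."""
        if is_healthy(self.active):
            self.failures = 0
            return self.active
        self.failures += 1
        # Switch only after several consecutive failures, and only if the
        # standby itself answers, to avoid flapping on a single timeout.
        if self.failures >= self.max_failures and is_healthy(self.standby):
            self.active, self.standby = self.standby, self.active
            self.failures = 0
        return self.active
```

After a switch, the former main server becomes the standby and traffic stays where it is even if that server comes back. This is a deliberate choice in the sketch, since the failover server then holds the most recent data.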
The tricky part is to never have the application running on both servers at the same time, to avoid data corruption from background tasks, like cleaning expired data.
So some other automation mechanism should be in place to start the app only on the active server, and stop it on the other server.
Taking and refreshing a lock in an external (HA) database could help with this.
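A minimal sketch of such a lock, written as a lease with an expiry time. `LeaseStore` is an in-memory stand-in for the external database, and every name here is an assumption made for illustration. In a real deployment the check-and-write inside `acquire_or_refresh` would have to be a single atomic operation on the database side.

```python
import time
from typing import Callable, Dict, Tuple


class LeaseStore:
    """In-memory stand-in for an external HA database (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # lease name -> (owner, expiry timestamp)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire_or_refresh(self, name: str, owner: str, ttl: float) -> bool:
        """Grant the lease if it is free, expired, or already held by owner.

        A real database must perform this check-and-write atomically.
        """
        now = self._clock()
        holder = self._leases.get(name)
        if holder is None or holder[0] == owner or holder[1] <= now:
            self._leases[name] = (owner, now + ttl)
            return True
        return False


def should_run_app(store: LeaseStore, hostname: str, ttl: float = 30.0) -> bool:
    """True if this server holds the lease and may run the application."""
    return store.acquire_or_refresh("cryptpad-active", hostname, ttl)
```

Each server would call `should_run_app` at an interval well below `ttl`, starting the app when it returns `True` and stopping it when it returns `False`. If the active server dies, its lease expires and the other server picks it up on its next call.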
If an application does not have built-in high availability, then it's not production-ready.
CryptPad could gain high availability by storing all data in a MongoDB cluster (with files in GridFS), with some kind of database locking mechanism to coordinate background tasks between app servers.
The main challenge would be the websocket servers, because all users of a channelid (collaborating on the same document) must be able to exchange messages through the same websocket server, without transiting via the database, for performance reasons.
"ZeroMQ" could be a solution for that.
Or some kind of internal websocket channel-balancing.
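One possible form of channel-balancing, sketched here with rendezvous hashing: every frontend computes the same websocket server for a given channelid, with no shared state. This is not something CryptPad does today, and `server_for_channel` and the server names are hypothetical.

```python
import hashlib
from typing import Sequence


def server_for_channel(channel_id: str, servers: Sequence[str]) -> str:
    """Pick the websocket server for a channel via rendezvous hashing."""
    if not servers:
        raise ValueError("no websocket servers available")

    def score(server: str) -> int:
        # Hash the (server, channel) pair; the highest score wins.
        digest = hashlib.sha256(f"{server}|{channel_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)
```

The result does not depend on the order of the server list, and removing a server only moves the channels that were assigned to it. Users already connected to a removed server would still have to reconnect.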
In any case, until the CryptPad team is interested in going the HA road, we are stuck with wacky failover solutions.