Failover and Data Recovery in the Search Service

If a failure occurs, one or more of your ingress replicators or search service nodes may become unreachable. This topic describes what happens during such an outage and how the search service recovers.

Note: To avoid non-recoverable disk failures, Jive Software recommends that you configure the ingress replicator journals and the search service indexes to be written to durable storage. Allocate at least 20 GB of journal storage for each ingress replicator and at least 50 GB of index storage for each search service. Monitor these storage volumes and keep at least 25% of their capacity free.
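The 25% free-capacity guideline in the note above can be checked with a small script. The following is a minimal sketch that uses only the Python standard library; it is not a Jive tool. The function names are hypothetical, and you would need to replace the example path with the actual mount points of your journal and index volumes.

```python
import shutil

MIN_FREE_FRACTION = 0.25  # 25% free capacity, per the recommendation above


def free_fraction(path):
    """Return the free share (0.0 to 1.0) of the volume that holds path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def volumes_below_threshold(paths, min_free=MIN_FREE_FRACTION, probe=free_fraction):
    """Return the paths whose free share is below min_free.

    probe is injectable so the check can be exercised without real volumes.
    """
    return [path for path in paths if probe(path) < min_free]


if __name__ == "__main__":
    # "." is a placeholder; substitute your journal and index mount points.
    for path in volumes_below_threshold(["."]):
        print(f"WARNING: less than 25% free on the volume holding {path}")
```

A script like this could be run from a scheduler and wired into whatever alerting you already use; the threshold is a parameter so that you can warn earlier if your volumes fill quickly.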

In the case of a failure of any given node in your HA search configuration, here's what happens:

Ingress replicator node fails: The ingress replicator journals everything to disk to guarantee that all ingressed activities are delivered at least once. If the service fails or is stopped, it sends any remaining journaled events when it starts back up. If the service cannot come back up because of a non-recoverable disk failure, a full rebuild is required (see Rebuilding an On-Premise HA Search Service). If both ingress replicators fail (or you have only one and it fails), no new content is indexed for the duration of the outage. However, because of local caching on the web application nodes, the search service catches up on the content it has not yet indexed once an ingress replicator is back online, so nothing is missed.
Search service node fails: If search service 1 or 2 is offline for any reason, the ingress replicator retains the undelivered activities and sends them to the service once it is restored to a healthy state. While this backlog is being fed into the restored service, the search indexes are out of sync; they are back in sync once the restored service has received all undelivered activities. If the service cannot be restored because of a non-recoverable disk failure, you need to remove and re-add the affected search service (see Adding an On-Premise HA Search Service Node). If you leave a search service down for a very long time (for example, many weeks), you may run out of disk space, because the ingress replicators keep persisting undelivered activities to disk until the configured search service is restored. If you do not plan to restore the offline search service, remove it from all ingress replicator configuration files and restart the ingress replicators.
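
The retain-and-redeliver behavior described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the actual ingress replicator implementation: the class and method names are hypothetical, the per-node queues stand in for the on-disk journal, and whether the real service journals per node is an assumption made here for illustration.

```python
from collections import deque


class SearchNode:
    """Hypothetical stand-in for one search service node."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.index = []  # activities received so far

    def receive(self, activity):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.index.append(activity)


class IngressReplicator:
    """Toy replicator: journals each activity per node, then delivers it."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        # One queue per node; the real service journals to disk instead.
        self.journal = {node.name: deque() for node in nodes}

    def ingest(self, activity):
        for queue in self.journal.values():
            queue.append(activity)  # journal first, deliver second
        self.flush()

    def flush(self):
        for name, queue in self.journal.items():
            node = self.nodes[name]
            while queue:
                try:
                    node.receive(queue[0])
                except ConnectionError:
                    break  # keep the backlog until the node is restored
                queue.popleft()  # drop only after a successful delivery

    def backlog(self, name):
        return len(self.journal[name])

    def remove_node(self, name):
        """Stop journaling for a node that will not be restored."""
        del self.nodes[name]
        del self.journal[name]


if __name__ == "__main__":
    s1, s2 = SearchNode("search1"), SearchNode("search2")
    replicator = IngressReplicator([s1, s2])
    s2.online = False  # simulate an outage of search service 2
    for activity in ["doc-1", "doc-2"]:
        replicator.ingest(activity)
    print(s1.index, s2.index, replicator.backlog("search2"))
    s2.online = True  # node restored; replay the backlog
    replicator.flush()
    print(s2.index == s1.index, replicator.backlog("search2"))
```

In this model the backlog for an offline node grows with every ingested activity, which mirrors why a long outage can exhaust journal disk space, and remove_node corresponds to removing the offline search service from the ingress replicator configuration.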