Failover and data recovery in Search Service

If a failure occurs, your ingress replicators or search service nodes may become unreachable. This topic describes what happens during such an outage and how the search data is recovered afterward.

Note: To protect against non-recoverable disk failures, we recommend that you configure the ingress replicator journals and search service indexes so that they are written to durable storage. For each ingress replicator, allocate at least 20 GB for journal storage. For each search service, allocate at least 50 GB for index storage. Monitor these storage volumes and keep at least 25% of their capacity free.
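The following is a minimal sketch of how the sizing and free-capacity recommendations in the note could be checked from a monitoring script. It is not part of the product: the function names are hypothetical, and you would pass in the paths of your own journal and index volumes.

```python
import shutil

GB = 10 ** 9               # decimal gigabytes, matching the sizes in the note
MIN_FREE_FRACTION = 0.25   # keep at least 25% of the volume free


def evaluate(total_bytes, free_bytes, min_total_gb,
             min_free_fraction=MIN_FREE_FRACTION):
    """Return a list of warnings for a volume with the given size figures."""
    warnings = []
    if total_bytes < min_total_gb * GB:
        warnings.append("volume is smaller than %d GB" % min_total_gb)
    if total_bytes > 0 and free_bytes / total_bytes < min_free_fraction:
        warnings.append("free capacity is below %d%%"
                        % round(min_free_fraction * 100))
    return warnings


def check_volume(path, min_total_gb, min_free_fraction=MIN_FREE_FRACTION):
    """Check the volume that holds `path` (a hypothetical journal or index directory)."""
    usage = shutil.disk_usage(path)
    return evaluate(usage.total, usage.free, min_total_gb, min_free_fraction)
```

For example, check_volume(journal_dir, 20) would apply to an ingress replicator journal and check_volume(index_dir, 50) to a search service index, where journal_dir and index_dir are placeholders for your own paths.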

The following sections describe what happens when a node in your HA search configuration fails.

Ingress replicator node fails

The ingress replicator journals everything to disk to guarantee all ingressed activities are delivered at least once. If the service fails or is stopped, it sends any remaining journaled events when it starts back up. If the service cannot come back up due to a non-recoverable disk failure, then a full rebuild is required.
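To illustrate the idea of journaling for at-least-once delivery, here is a simplified sketch. It does not show the ingress replicator's actual implementation or file format: the Journal class, the file names, and the send callback are all assumptions made for the example. Events are written to disk before they are sent, and the acknowledgement position is advanced only after a successful send, so anything unacknowledged is sent again after a restart.

```python
import json
import os


class Journal:
    """Hypothetical append-only journal with a persisted acknowledgement count."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.log_path = os.path.join(directory, "journal.log")
        self.ack_path = os.path.join(directory, "journal.ack")

    def append(self, event):
        # Write and fsync before returning so the event survives a crash.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def _acked(self):
        try:
            with open(self.ack_path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def _set_acked(self, count):
        # Replace the file atomically so a crash never leaves a partial count.
        tmp = self.ack_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(str(count))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ack_path)

    def pending(self):
        """Return the journaled events that have not been acknowledged yet."""
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        return events[self._acked():]

    def deliver_pending(self, send):
        """Send unacknowledged events in order; stop at the first failure."""
        acked = self._acked()
        delivered = 0
        for event in self.pending():
            if not send(event):
                break
            acked += 1
            delivered += 1
            self._set_acked(acked)
        return delivered
```

In this sketch, calling deliver_pending at startup replays whatever was left in the journal. If the process stops after a send but before the acknowledgement is recorded, the event is sent a second time, which is why the guarantee is at least once rather than exactly once.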

If both ingress replicators fail (or you have only one and it fails), no new content is indexed for the duration of the outage. However, because the web application nodes cache content locally, the search service catches up on that content when an ingress replicator comes back online. As a result, the search service does not miss anything.

For more information on rebuilding the search index, see Rebuilding On-prem HA Search Service.

Search service node fails

If search service 1 or 2 is offline for any reason, the ingress replicator retains the undelivered activities. When search service 1 or 2 is restored to a healthy state, the undelivered activities are sent to the restored service. While previously undelivered activities are being fed into the newly restored service, the search indexes will be out of sync. After all undelivered activities have been received by the restored service, the indexes are synced.
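The behavior described above can be modeled with a small in-memory sketch. It assumes one backlog per search service and is only an illustration: the Replicator class and its method names are hypothetical, and the real ingress replicator persists its backlog to disk rather than holding it in memory.

```python
from collections import deque


class Replicator:
    """Toy model of an ingress replicator feeding several search services."""

    def __init__(self, targets):
        self.backlog = {name: deque() for name in targets}
        self.online = {name: True for name in targets}
        # A plain list stands in for each search service's index.
        self.indexed = {name: [] for name in targets}

    def publish(self, activity):
        # Queue the activity for every target, then deliver where possible.
        for name in self.backlog:
            self.backlog[name].append(activity)
            self.flush(name)

    def flush(self, name, limit=None):
        """Deliver queued activities to an online target, oldest first."""
        sent = 0
        while (self.online[name] and self.backlog[name]
               and (limit is None or sent < limit)):
            self.indexed[name].append(self.backlog[name].popleft())
            sent += 1
        return sent

    def set_online(self, name, is_online):
        self.online[name] = is_online

    def in_sync(self):
        # The indexes are in sync once no target has undelivered activities.
        return all(not queue for queue in self.backlog.values())
```

While one target is offline, its backlog grows and in_sync returns False. After the target is marked online and its backlog is flushed, both stand-in indexes hold the same activities in the same order.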

If the service cannot be restored due to a non-recoverable disk failure, then you need to remove and re-add the affected search service.

If a search service stays down for an extended period (for example, several weeks), the ingress replicators may run out of disk space because they continue to persist undelivered activities to disk until the configured search service is restored. If you don't plan to restore the offline search service, remove it from all ingress replicator configuration files, and then restart the ingress replicators.
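As a rough way to judge how long an outage can last before disk space becomes a problem, you can divide the free space on the journal volume by the rate at which the journal grows. The sketch below assumes that you have measured the growth rate yourself. The function name is hypothetical, and the figures in the usage note are illustrative only.

```python
def days_until_full(free_bytes, journal_growth_bytes_per_day):
    """Estimate the days left before the journal volume fills up."""
    if journal_growth_bytes_per_day <= 0:
        # The journal is not growing, so the volume will not fill up.
        return float("inf")
    return free_bytes / journal_growth_bytes_per_day
```

For example, with 15 GB free and a journal that grows by 500 MB per day, days_until_full(15 * 10**9, 500 * 10**6) gives an estimate of 30 days.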

For more information, see Adding an On-Premise HA Search server.