Redundant Server

A redundant server is a server that is currently running 0 cameras. When a server is detected as down, a redundant server takes over the down server's camera and device processing in its entirety: all of the down server's cameras are moved to, and run on, the redundant server. Video loss can be as little as 15 seconds.

To enable redundancy:

        You must have at least 1 redundant server (one with 0 cameras) available at all times.

        The redundant server must be in the same Redundancy Group as the potential down server.

        Redundancy must be turned On for that Redundancy Group.
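
The three conditions above can be read as a simple eligibility check. The following is a minimal sketch only, not Symphony's actual failover code; the Server class, the redundancy_on mapping, and the function name are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Server:
    # Hypothetical model of a farm member, for illustration only.
    name: str
    group: int          # Redundancy Group number
    camera_count: int   # 0 cameras means the server is redundant
    is_up: bool = True


def find_failover_target(down: Server, farm: List[Server],
                         redundancy_on: Dict[int, bool]) -> Optional[Server]:
    """Return a redundant server that could take over `down`, or None."""
    # Condition 3: redundancy must be turned On for the group.
    if not redundancy_on.get(down.group, False):
        return None
    for candidate in farm:
        if (candidate is not down
                and candidate.is_up
                and candidate.group == down.group    # Condition 2: same group
                and candidate.camera_count == 0):    # Condition 1: 0 cameras
            return candidate
    return None


farm = [Server("A", 1, 10), Server("B", 1, 8), Server("C", 1, 0)]
target = find_failover_target(farm[0], farm, {1: True})
print(target.name if target else "no failover possible")
```

If any one of the three conditions is not met, the function returns None, which corresponds to no failover taking place.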

 

Example

Server Farm configuration:

If either of the first 2 servers fails, its cameras will failover to the 3rd (redundant) server.

If the first server (Redundancy Group 7) fails, no failover will occur, as there is no Redundant server in Group 7.

Different Redundancy Groups "1" and "7"

 

Example

Typical Symphony Server Farm:

This configuration depicts use of an external database cluster for configuration data redundancy, and a NAS or SAN for historical footage file access after failover.

Multi-server Farm with configuration database existing on one of the Symphony Servers:

If server redundancy is a requirement, this setup is not recommended, since it has a single point of failure, namely Server 1. If this server fails, the configuration is not accessible to the remaining servers.

 

Redundancy Groups

Due to geographical constraints for file storage, it may be necessary for certain servers to failover only to specific servers. A redundancy group allows you to group your servers so that failover happens only among servers within the same group. Ensure that there is at least 1 redundant server within each redundancy group.
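
As a rough illustration of the last point, the sketch below lists the redundancy groups that have no redundant server. It assumes the farm can be described as (name, group, camera count) tuples; that representation and the function name are hypothetical and are not part of Symphony.

```python
from collections import defaultdict


def groups_without_redundant_server(servers):
    """servers: iterable of (name, group, camera_count) tuples.

    Returns the sorted list of groups in which no server has 0 cameras.
    """
    redundant_per_group = defaultdict(int)
    for _name, group, camera_count in servers:
        # Touch every group so that groups with no redundant server
        # still appear in the result with a count of 0.
        redundant_per_group[group] += 1 if camera_count == 0 else 0
    return sorted(g for g, n in redundant_per_group.items() if n == 0)


# One server in group 7, two servers in group 1 (one of them redundant).
servers = [("Server1", 7, 12), ("Server2", 1, 9), ("Server3", 1, 0)]
print(groups_without_redundant_server(servers))
```

Any group reported by this check would have no failover target if one of its servers went down.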

Buddy System

A Redundancy Group uses a buddy system in which each server monitors the health of its neighbors (or buddies). Each server broadcasts an Alive status every second to each of its buddy servers, and each server listens for Alive messages from its buddies. The buddies form a connected graph, so that even if more than one server is down there will always be another server to detect it.

Each server runs a monitoring thread that receives UDP socket messages from each of its buddies.

        If the detection threshold time expires without an Alive message being received from a particular buddy, then that server may be down. A "possible down server" message is sent to the Master server.

        If more than half of the buddies notify the Master of this down server, it is confirmed to be down. In this case a failover camera swapping algorithm transfers all of the down server's camera processing to a redundant server, if one is available.
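
The two steps above (timeout detection on each server, then majority confirmation on the Master) can be sketched as follows. This is an illustrative model under stated assumptions, not the actual monitoring thread: the class and function names are hypothetical, no UDP sockets are opened, and the 30000 ms default simply mirrors the FarmHealthMissedUdpMs default listed in the settings further down.

```python
import time


class BuddyMonitor:
    """Tracks when each buddy was last heard from (hypothetical helper)."""

    def __init__(self, buddies, missed_ms=30000, clock=time.monotonic):
        self._clock = clock
        self._missed_s = missed_ms / 1000.0
        now = clock()
        self._last_alive = {buddy: now for buddy in buddies}

    def on_alive(self, buddy):
        # Called whenever an Alive message arrives from `buddy`.
        self._last_alive[buddy] = self._clock()

    def suspects(self):
        # Buddies whose last Alive message is older than the threshold.
        now = self._clock()
        return [b for b, seen in self._last_alive.items()
                if now - seen > self._missed_s]


def is_confirmed_down(reporting_buddies, buddy_count):
    """Master-side rule: more than half of the buddies must report it."""
    return len(reporting_buddies) > buddy_count / 2


# A server with 3 buddies is confirmed down once 2 of them report it.
print(is_confirmed_down({"B", "C"}, buddy_count=3))
```

Requiring more than half of the buddies to agree means that a single server with a network problem of its own cannot trigger a failover by itself.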

Redundancy Configuration Settings

The following are the configurable farm redundancy settings.

 

Setting

Description

FarmHealthStartDelayMs

The amount of time in milliseconds that a server waits after startup before it begins monitoring its buddies for being down.

FarmHealthSockTimeoutMs

UDP sockets are used to receive Alive messages from all buddies. Each socket uses this timeout, in milliseconds. (You should not have to change this.)

FarmHealthMissedUdpMs

The amount of time in milliseconds that a server can go without sending Alive messages before it is considered down and failover is performed. Some customers may want to set this to several minutes so that a server can reboot for a Windows update without triggering failover.

FarmHealthUdpPort

The UDP port used for Alive messages. Only change this if failover is not working at all and the is* log files indicate that there are port conflicts.

These settings are NOT in the database by default. To add them, run the following commands. The last value in each command is the default.

 

dbupdater "insert into Settings (Type,ID,Section,K,V) values ('Global','','Main','FarmHealthStartDelayMs','5000')"

dbupdater "insert into Settings (Type,ID,Section,K,V) values ('Global','','Main','FarmHealthSockTimeoutMs','1500')"

dbupdater "insert into Settings (Type,ID,Section,K,V) values ('Global','','Main','FarmHealthMissedUdpMs','30000')"

dbupdater "insert into Settings (Type,ID,Section,K,V) values ('Global','','Main','FarmHealthUdpPort','5045')"
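
If you want to add a setting with a value other than the default, the command text follows the same pattern. The helper below is a hypothetical convenience sketch that only builds and prints the command string in the format shown above; it does not run dbupdater or touch the database. The 180000 ms value (3 minutes) is just an example of a longer timeout to allow for a Windows update reboot.

```python
# Defaults as listed in the commands above.
DEFAULTS = {
    "FarmHealthStartDelayMs": 5000,
    "FarmHealthSockTimeoutMs": 1500,
    "FarmHealthMissedUdpMs": 30000,
    "FarmHealthUdpPort": 5045,
}


def dbupdater_command(key, value):
    """Build the dbupdater command text for one global farm setting."""
    if key not in DEFAULTS:
        raise ValueError("unknown setting: " + key)
    return ('dbupdater "insert into Settings (Type,ID,Section,K,V) '
            f"values ('Global','','Main','{key}','{value}')\"")


# Example: allow a server to be silent for 3 minutes before failover.
print(dbupdater_command("FarmHealthMissedUdpMs", 180000))
```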