Every effort has been made to ensure that an IncrediBuild environment remains highly reliable and available, withstanding scenarios such as network disconnects or server/client nodes becoming unavailable during the execution of a distributed job.
Agent Disconnect Recovery
XGE (Xoreax Grid Engine) technology uses a transactional model to prevent incomplete execution of build tasks. Accordingly, if an Agent executing a remote task becomes unavailable (for any reason) and is unable to complete the task or to send back its output, any output files created during that task's execution are discarded and the task is assigned to another Agent. The integrity of the distributed job is thus fully preserved.
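The discard-and-reassign behavior described above can be illustrated with a minimal Python sketch. It is not XGE code: the names `AgentUnavailable`, `run_transactionally`, `flaky_agent` and `healthy_agent` are hypothetical, and each Agent is modeled as a plain function that writes its output files into a staging dictionary.

```python
class AgentUnavailable(Exception):
    """Raised when a (hypothetical) Agent drops out before finishing a task."""


def run_transactionally(task, agents):
    """Try each Agent in turn and commit output only from a completed run."""
    for agent in agents:
        staged = {}  # output files produced by this attempt only
        try:
            agent(task, staged)
        except AgentUnavailable:
            # The attempt did not complete: its staged output is dropped
            # and the task is handed to the next Agent.
            continue
        return staged  # the attempt completed, so its output is committed
    raise RuntimeError("no Agent was able to complete task " + repr(task))


def flaky_agent(task, staged):
    """Hypothetical Agent that writes partial output, then disconnects."""
    staged[task + ".obj"] = "partial"
    raise AgentUnavailable("connection lost")


def healthy_agent(task, staged):
    """Hypothetical Agent that completes the task."""
    staged[task + ".obj"] = "complete"


result = run_transactionally("main", [flaky_agent, healthy_agent])
```

In this sketch the partial file written by `flaky_agent` never reaches the result, because each attempt works on its own staging dictionary and only a completed attempt is returned.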
Dynamic Resource Assignment
In the event that an Agent becomes unavailable during a distributed job's execution, the job will not simply "lose" a computing resource. Taking into account currently running jobs along with the connected Agents' processing power and availability, the Coordinator may dynamically re-assign Agents to running jobs in order to ensure all jobs are utilizing the optimal set of resources.
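The Coordinator's actual assignment policy is not described here, so the following Python sketch only illustrates the general idea of recomputing assignments when the set of connected Agents changes. The function `assign_agents`, the numeric "power" values, and the greedy balancing rule are all assumptions made for illustration.

```python
def assign_agents(jobs, agents):
    """Spread Agents over running jobs, balancing total processing power.

    jobs   -- list of job names
    agents -- dict mapping Agent name to a numeric processing-power score
    Returns a dict mapping each job to the list of Agents assigned to it.
    """
    assignment = {job: [] for job in jobs}
    load = {job: 0 for job in jobs}
    # Hand out the strongest Agents first, each to the job that currently
    # has the least total power assigned (a simple greedy heuristic).
    for name, power in sorted(agents.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(jobs, key=lambda job: load[job])
        assignment[target].append(name)
        load[target] += power
    return assignment


agents = {"a": 8, "b": 4, "c": 4, "d": 2}
before = assign_agents(["job1", "job2"], agents)

# Agent "a" becomes unavailable: recompute with the remaining Agents.
remaining = {name: power for name, power in agents.items() if name != "a"}
after = assign_agents(["job1", "job2"], remaining)
```

Recomputing from scratch keeps the sketch short; a real scheduler would presumably also weigh the cost of moving Agents that are in the middle of a task.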
Backup Coordinator
Since IncrediBuild uses the central "Coordinator" component to handle resource assignment, it is crucial for the system to remain operational even if the Coordinator becomes unavailable. To achieve this, a Backup Coordinator may be set up. The Backup Coordinator assumes control whenever the primary Coordinator becomes unavailable for any reason, alerting users to the condition but otherwise maintaining all system functionality. Once the primary Coordinator is restored, normal operation resumes.
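The failover rule in the paragraph above can be written down as a small decision function. This is a sketch under stated assumptions, not IncrediBuild's implementation: `active_coordinator`, its boolean inputs, and the alert text are hypothetical.

```python
def active_coordinator(primary_up, backup_up):
    """Return (coordinator_in_control, alert_message_or_None)."""
    if primary_up:
        # Normal operation, including after the primary has been restored.
        return "primary", None
    if backup_up:
        # The backup takes over and users are alerted to the condition.
        return "backup", "Primary Coordinator unavailable; Backup Coordinator in control"
    raise RuntimeError("no Coordinator is available")


# Primary goes down, then is restored.
timeline = [
    active_coordinator(True, True),
    active_coordinator(False, True),
    active_coordinator(True, True),
]
```

The function is stateless on purpose: control returns to the primary as soon as it reports as available again, which matches the "normal operation resumes" behavior described above.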








