Recovering a Failed Physical Machine (Manual)
Recover a physical machine (PM), or node, when it cannot boot or when it fails to become a PM in the ztC Edge system. In some cases, the ztC Console displays the state of a failed PM as Unreachable (Syncing/Evacuating).
To recover a PM, you must reinstall the Stratus Redundant Linux release that the PM has been running. Recovering a failed PM, though, is different from installing the software for the first time. The recovery preserves all data, but it re-creates the /boot and root file systems, re-installs the Stratus Redundant Linux system software, and attempts to connect to the existing system. (If you need to replace the physical PM hardware instead of recovering the system software, see Replacing Physical Machines (Manual).)
To reinstall the system software, you can allow the system to automatically boot the replacement node from a temporary Preboot Execution Environment (PXE) server on the primary PM. As long as each PM contains a full copy of the most recently installed software kit (as displayed on the Upgrade Kits page of the ztC Console), either PM can initiate the recovery of its partner PM with PXE boot installation. If needed, you can also manually boot the replacement node from USB installation media.
Use one of the following procedures, depending on whether you want to reinstall the system software with PXE boot installation or from a USB medium.
Caution: The recovery procedure deletes any software installed in the host operating system of the PM and all PM configuration information entered before the recovery. After you complete this procedure, you must manually re-install all of your host-level software and reconfigure the PM to match your original settings.
Prerequisites:
- Determine which PM you need to recover.
- If you want to use a USB medium to install the system software on the replacement PM, create a bootable USB medium as described in Creating a USB Medium with System Software.
  When creating the USB medium, ensure that it contains the most recently installed upgrade kit. For example, if the release shown in the masthead of the ztC Console window is version 1.2.0-550, where 550 is the build number, the kit you select to create the USB medium on the Upgrade Kits page must also be version 1.2.0-550. If the system detects a different build on the target PM, it automatically overrides the recovery process, initializes all data on the target PM, and uses PXE boot installation to reinstall the most recently installed software kit on the PM with no user interaction.
- If using a USB medium, connect a keyboard and monitor to the replacement PM to monitor the installation process and specify settings.
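The version-matching prerequisite above can be checked with a small script before you create the USB medium. The following is a minimal sketch, not a Stratus tool: the function names `parse_kit_version` and `usb_kit_matches` are hypothetical, and it assumes that version strings always take the release-build form shown in the example (1.2.0-550), copied by hand from the ztC Console masthead and the Upgrade Kits page.

```python
import re
from typing import NamedTuple, Tuple


class KitVersion(NamedTuple):
    """A kit version split into its release numbers and build number."""
    release: Tuple[int, ...]
    build: int


# Assumed format: dotted release numbers, a hyphen, then the build number,
# for example "1.2.0-550". This pattern is an assumption based on that example.
_VERSION_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)-(\d+)$")


def parse_kit_version(text: str) -> KitVersion:
    """Parse a string such as "1.2.0-550" into release (1, 2, 0) and build 550."""
    match = _VERSION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized kit version: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    return KitVersion(release=release, build=int(match.group(2)))


def usb_kit_matches(console_version: str, usb_kit_version: str) -> bool:
    """Return True only if both the release and the build number are identical."""
    return parse_kit_version(console_version) == parse_kit_version(usb_kit_version)


if __name__ == "__main__":
    # Values typed in by the operator; the script does not query the system.
    print(usb_kit_matches("1.2.0-550", "1.2.0-550"))
    print(usb_kit_matches("1.2.0-550", "1.2.0-551"))
```

Comparing the full release and build together mirrors the requirement that the kit on the USB medium be the same build as the installed software, not merely the same release.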
To recover a PM (with PXE boot installation)
Use the following procedure to recover a PM by using PXE boot installation to reinstall the system software from the software kit on the primary PM.
- In the ztC Console, click Physical Machines in the left-hand navigation panel.
- Select the appropriate PM (node0 or node1) and then click Work On, which changes the PM’s Overall State to Maintenance Mode and the Activity state to running (in Maintenance).
- After the PM displays running (in Maintenance), click Recover.
- When prompted to select the type of repair, click PXE PM Recover - Preserve Data.
  Caution: It is important to select PXE PM Recover: Preserve data; otherwise, the installation process may delete data on the target PM.
- Click Continue to begin the recovery process. The system reboots the target PM in preparation for the system software reinstallation.
- The recovery process continues with no user interaction, as follows:
  - The target PM begins to boot from a PXE server that temporarily runs on the primary node.
  - The target PM automatically starts the system software installation, which runs from a copy of the installation kit on the primary node.
  - The installation process reinstalls the system software, while preserving all data.
  You do not need to monitor the progress of the software installation or respond to prompts at the physical console of the target PM. The recovery process is automated, and it is normal for the PM to display a blank screen for a long period of time during the software installation.
- When the software installation is complete, the target PM reboots from the newly installed system software.
- As the target PM boots, you can view its activity on the Physical Machines page of the ztC Console. The Activity column displays the PM as (in Maintenance) after the recovery is complete.
- If applicable, manually reinstall applications and any other host-level software, and reconfigure the PM to match your original settings.
- When you are ready to bring the target PM online, click Finalize to exit maintenance mode. Verify that both PMs return to the running state and that the PMs finish synchronizing.
Note: When the target PM exits maintenance mode, the system automatically disables the PXE server on the primary node that was used for the recovery process.
To recover a PM (with USB installation)
Use the following procedure to recover a PM by reinstalling the system software from a USB medium.
- In the ztC Console, click Physical Machines in the left-hand navigation panel.
- Select the appropriate PM (node0 or node1) and then click Work On, which changes the PM’s Overall State to Maintenance Mode and the Activity state to running (in Maintenance).
- After the PM displays running (in Maintenance), click Recover.
- When prompted to select the type of repair, click USB PM Recover - Preserve Data.
  Caution: It is important to select USB PM Recover: Preserve data; otherwise, the installation process may delete data on the target PM.
- Click Continue to begin the recovery process. The system shuts down the target PM in preparation for the system software reinstallation.
- Connect the bootable USB medium to the target PM, and then manually power on the PM.
- As the target PM powers on, enter the firmware (UEFI) setup utility. In the Save & Exit menu, under Boot Override, select the UEFI entry for the USB medium to boot from the device one time during the next boot sequence. The PM restarts.
  Note: Use the Boot Override property to temporarily change the boot device instead of modifying the persistent BOOT ORDER Priorities in the Boot menu. The top boot priority must remain UEFI Network (default) to support the automated node replacement that is typically performed on ztC Edge systems.
- Monitor the installation process at the physical console of the target PM.
- At the Welcome screen, use the arrow keys to select the country keyboard map for the installation.
- At the Install or Recovery screen, select Recover PM, Join system: Preserving data and press Enter. The recovery process continues with no user interaction.
  Caution: It is important to select Recover PM, Join system: Preserving data; otherwise, the installation process may delete data on the target PM.
- When the software installation is complete, the target PM reboots from the newly installed system software.
- As the target PM boots, you can view its activity on the Physical Machines page of the ztC Console. The Activity column displays the PM as (in Maintenance) after the recovery is complete.
- If applicable, manually reinstall applications and any other host-level software, and reconfigure the PM to match your original settings.
- When you are ready to bring the target PM online, click Finalize to exit maintenance mode. Verify that both PMs return to the running state and that the PMs finish synchronizing.