Replacing Physical Machines (Manual)
You can replace a physical machine (PM), or node, of a dual-node ztC Edge system while the system is running. (If you need to recover the system software on a failed PM instead of replacing the PM hardware, see Recovering a Failed Physical Machine (Manual).)
When you remove and replace a PM, the system completely erases all of the disks in the replacement PM in preparation for a full installation of the Stratus Redundant Linux system software. To install the software, you can allow the system to automatically boot the replacement node from a temporary Preboot Execution Environment (PXE) server on the primary PM. As long as each PM contains a full copy of the most recently installed software kit (as displayed on the Upgrade Kits page of the ztC Console), either PM can initiate the replacement of its partner PM with PXE boot installation. If needed, you can also manually boot the replacement node from USB installation media.
Use one of the following procedures, depending on whether you want to install the system software by PXE boot or from USB installation media.
Caution: The replacement procedure deletes any software installed in the host operating system of the PM and all PM configuration information entered before the replacement. After you complete this procedure, you must manually re-install all of your host-level software and reconfigure the PM to match your original settings.
Caution: To prevent data loss, if the system log indicates that manual intervention is necessary to assemble a disk mirror, contact your authorized Stratus service representative for assistance. You may lose valuable data if you force a resynchronization and overwrite the most recent disk in the mirror.
Prerequisite: To request a replacement ztC Edge node, log on to the Stratus Customer Service Portal, expand Customer Support, and click Add Issue. When creating the issue, please have the following information ready:
- Asset ID—Locate the Asset ID for your system in the masthead of the ztC Console window.
- Diagnostic file—Generate and download a diagnostic file on the Support Logs page of the ztC Console, as described in Creating a Diagnostic File. Attach the diagnostic file to the issue that you add in the Service Portal.
A customer service representative will contact you to diagnose the issue and provide a replacement node, if necessary.
Prerequisites: If you want to use a USB medium to install the system software on the replacement PM:
- Create a bootable USB medium as described in Creating a USB Medium with System Software.
When creating the USB medium, ensure that it contains the most recently installed upgrade kit. For example, if the release shown in the masthead of the ztC Console window is version 1.2.0-550, where 550 is the build number, the kit you select to create the USB medium on the Upgrade Kits page must also be version 1.2.0-550. If the system detects a different build on the replacement PM, it automatically restarts the replacement process, initializes all data on the replacement PM, and uses PXE boot installation to reinstall the most recently installed software kit on the PM with no user interaction.
- Connect a keyboard and monitor to the replacement PM to monitor the installation process and specify settings.
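The version-matching requirement for the USB medium can be checked before you start. The following is a minimal sketch, not a Stratus tool: it assumes version strings always follow the `release-build` pattern of the single example given above (`1.2.0-550`), and the names `KitVersion`, `parse_kit_version`, and `usb_kit_matches` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple


class KitVersion(NamedTuple):
    """Hypothetical container for a version such as 1.2.0-550."""
    release: str  # dotted release number, e.g. "1.2.0"
    build: int    # build number, e.g. 550


# Assumed format: dotted digits, a hyphen, then the build number.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+)$")


def parse_kit_version(text: str) -> KitVersion:
    """Split a version string into its release and build parts."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return KitVersion(release=match.group(1), build=int(match.group(2)))


def usb_kit_matches(installed: str, usb_kit: str) -> bool:
    """Return True only if both release and build number are identical."""
    return parse_kit_version(installed) == parse_kit_version(usb_kit)


if __name__ == "__main__":
    # Version shown in the masthead versus the kit chosen for the USB medium.
    print(usb_kit_matches("1.2.0-550", "1.2.0-550"))
    print(usb_kit_matches("1.2.0-550", "1.2.0-551"))
```

Comparing the build number as well as the release number matters here because the system restarts the replacement with PXE boot installation whenever it detects a different build on the replacement PM.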
To remove and replace a failed PM (with PXE boot installation)
Use the following procedure to replace a failed PM and reinstall the system software by using PXE boot installation from the software kit on the primary PM.
- In the ztC Console, click Physical Machines in the left-hand navigation panel.
- Select the appropriate PM (node0 or node1) and then click Work On, which changes the PM’s Overall State to Maintenance Mode and the Activity state to running (in Maintenance).
- After the PM displays running (in Maintenance), click Recover.
- When prompted to select the type of repair, click PXE PM Replace - Initialize All Disks.
Caution: Selecting PXE PM Replace - Initialize All Disks deletes all data on the replacement PM.
- Select one of the following PXE Settings:
  - Only respond to PXE requests from the current partner node.
    Waits for a PXE boot request from the MAC address of the current partner node. Select this option if you are recovering the existing PM by completely wiping and reinstalling it. This process deletes all data on the PM, but restores its current network configuration.
  - Only respond to PXE requests from the following MAC address.
    Waits for a PXE boot request from the MAC address that you specify. Select this option if you are replacing the PM with a new PM. Enter the MAC address of the specific network adapter that will initiate PXE boot.
  - Accept PXE requests from any system on priv0.
    Waits for a PXE boot request from priv0, the private network that connects the two ztC Edge nodes. Select this option if you are replacing the PM with a new PM, but you do not know the MAC address for the new PM.
- Click Continue to begin the replacement process. The system shuts down and powers off the PM.
- After the PM is powered off, install the replacement PM, if applicable:
  - Disconnect and remove the old PM, and then install the replacement PM.
  - Reconnect the network cables to their original ports, and then reconnect power.
  - If the PM does not automatically power on, press the power button.
- The replacement process continues with no user interaction, as follows:
  - The replacement PM begins to boot from a PXE server that temporarily runs on the primary node.
  - The system automatically deletes all of the data on disks in the replacement PM.
  - The replacement PM reboots again and automatically starts the system software installation, which runs from a copy of the installation kit on the primary node.
  You do not need to monitor the progress of the software installation or respond to prompts at the physical console of the replacement PM. The replacement process is automated, and it is normal for the PM to display a blank screen for a long period of time during the software installation.
- When the software installation is complete, the replacement PM reboots from the newly installed system software.
Note: After the system software installation, the replacement PM may take up to 20 minutes to join the system and appear in the ztC Console.
- As the replacement PM joins the system, you can view its activity on the Physical Machines page of the ztC Console. The Activity column displays the PM as (in Maintenance), and then as running after the replacement is complete. The PM automatically exits maintenance mode and begins load balancing the VMs on the system.
- If applicable, manually reinstall applications and any other host-level software, and reconfigure the replacement PM to match your original settings.
Note: When the replacement PM exits maintenance mode, the system automatically disables the PXE server on the primary node that was used for the replacement process.
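The three PXE Settings in the procedure above differ only in which boot requests the temporary PXE server answers. The following sketch models that choice. It is an illustration of the documented behavior, not the actual Stratus implementation: the names `PxeSetting`, `normalize_mac`, and `should_answer_pxe` are hypothetical, and the code assumes every request it sees has already arrived on priv0.

```python
import re
from enum import Enum
from typing import Optional


class PxeSetting(Enum):
    """The three options offered in the PXE Settings step (names are hypothetical)."""
    PARTNER_ONLY = "Only respond to PXE requests from the current partner node"
    SPECIFIC_MAC = "Only respond to PXE requests from the following MAC address"
    ANY_ON_PRIV0 = "Accept PXE requests from any system on priv0"


# Six colon-separated hexadecimal octets, e.g. 00:1a:2b:3c:4d:5e.
_MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def normalize_mac(mac: str) -> str:
    """Lowercase a MAC address and accept '-' or ':' as the separator."""
    candidate = mac.strip().lower().replace("-", ":")
    if not _MAC_RE.match(candidate):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return candidate


def should_answer_pxe(
    setting: PxeSetting,
    request_mac: str,
    partner_mac: str,
    specified_mac: Optional[str] = None,
) -> bool:
    """Decide whether a PXE boot request seen on priv0 should be answered."""
    request = normalize_mac(request_mac)
    if setting is PxeSetting.ANY_ON_PRIV0:
        return True  # any system on the private network is accepted
    if setting is PxeSetting.PARTNER_ONLY:
        return request == normalize_mac(partner_mac)
    # SPECIFIC_MAC: the operator must have entered the new adapter's address.
    if specified_mac is None:
        raise ValueError("SPECIFIC_MAC requires a MAC address to be specified")
    return request == normalize_mac(specified_mac)
```

The sketch makes the trade-off visible: the first two options restrict installation to one known network adapter, whereas the priv0 option trades that restriction for convenience when the MAC address of the new PM is unknown.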
To remove and replace a failed PM (with USB installation)
Use the following procedure to replace a failed PM and reinstall the system software by using a USB medium.
- In the ztC Console, click Physical Machines in the left-hand navigation panel.
- Select the appropriate PM (node0 or node1) and then click Work On, which changes the PM’s Overall State to Maintenance Mode and the Activity state to running (in Maintenance).
- After the PM displays running (in Maintenance), click Recover.
- When prompted to select the type of repair, click USB PM Replace - Initialize All Disks.
Caution: Selecting USB PM Replace - Initialize All Disks deletes all data on the replacement PM.
- Click Continue to begin the replacement process. The system shuts down the PM in preparation for the system software reinstallation.
- After the PM is powered off, install the replacement PM, if applicable:
  - Disconnect and remove the old PM, and then install the replacement PM. Connect a monitor and keyboard.
  - Reconnect the network cables to their original ports.
- Connect the bootable USB medium to the replacement PM, and then reconnect the power cable. If the PM does not automatically power on, press the power button.
- As the replacement PM powers on, enter the firmware (UEFI) setup utility. In the Save & Exit menu, under Boot Override, select the UEFI entry for the USB medium to boot from the device one time during the next boot sequence. The PM restarts.
Note: Use the Boot Override property to temporarily change the boot device instead of modifying the persistent BOOT ORDER Priorities in the Boot menu. The top boot priority must remain UEFI Network (default) to support the automated node replacement that is typically performed on ztC Edge systems.
- Monitor the installation process at the physical console of the replacement PM.
- At the Welcome screen, use the arrow keys to select the country keyboard map for the installation.
- At the Install or Recovery screen, select Replace PM, Join system: Initialize Data and press Enter. The replacement process continues with no user interaction.
Caution: Selecting Replace PM, Join system: Initialize Data deletes all data on the replacement PM.
- When the software installation is complete, the replacement PM reboots from the newly installed system software.
Note: After the system software installation, the replacement PM may take up to 20 minutes to join the system and appear in the ztC Console.
- As the replacement PM joins the system, you can view its activity on the Physical Machines page of the ztC Console. The Activity column displays the PM as (in Maintenance), and then as running after the replacement is complete. The PM automatically exits maintenance mode and begins load balancing the VMs on the system.
- If applicable, manually reinstall applications and any other host-level software, and reconfigure the replacement PM to match your original settings.