
Troubleshooting Guide for TQ8x00 MetroX®-2 HDR 200 Gb/s InfiniBand Switch Systems

This guide helps identify and resolve common issues on MetroX-2 HDR 200 Gb/s InfiniBand Switch Systems, focusing on LED indicators and boot failures.

1. System Status LED has been blinking for more than 5 minutes
Cause:
The MLNX-OS software did not boot properly; only firmware is running.
Solution:
Connect to the system through the console port and check the software status.
If MLNX-OS did not load properly, contact your Field Application Engineer (FAE) for assistance.

2. System Status LED is red
Cause:
A critical system fault has occurred, such as:
CPU error or corrupted firmware
Over-temperature condition
Solution:
Check environmental conditions: confirm that the room temperature is within the switch's operating range and that airflow is adequate.
If the problem persists, contact technical support.

3. Fan Status LED is red
Cause:
A fan fault is detected.
Solution:
Verify that the fan is fully inserted and unobstructed.
Replace the fan FRU (Field Replaceable Unit) if necessary.

4. Front PSU Status LED is red
Cause:
A power-supply-unit (PSU) issue is present.
Solution:
Check or replace the power cable.
Replace the PSU FRU if required.

5. InfiniBand activity LED does not light up
Cause:
The Subnet Manager (SM) is not active in the fabric.
Solution:
Verify that an SM is running in your InfiniBand network.
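
One way to script this check is to run the `sminfo` utility from the infiniband-diags package on a host attached to the fabric and parse what it prints. The sketch below is a minimal example under stated assumptions, not an official procedure: it assumes `sminfo` is installed on that host, and the line format matched by the regular expression is an assumption based on typical `sminfo` output, so adjust the pattern if your version prints something different. The function names `parse_sminfo` and `query_subnet_manager` are hypothetical.

```python
import re
import shutil
import subprocess

# Assumed shape of an `sminfo` line (not guaranteed for every version), e.g.:
# sminfo: sm lid 1 sm guid 0x2c903000a0b1c, activity count 12345 priority 0 state 3 SMINFO_MASTER
SMINFO_PATTERN = re.compile(
    r"sm lid (?P<lid>\d+) sm guid (?P<guid>0x[0-9a-fA-F]+).*?"
    r"priority (?P<priority>\d+) state (?P<state>\d+) (?P<state_name>\w+)"
)


def parse_sminfo(output):
    """Return a dict describing the reported SM, or None if no SM line is found."""
    match = SMINFO_PATTERN.search(output)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("lid", "priority", "state"):
        info[key] = int(info[key])
    # A master SM is the one actively managing the subnet.
    info["is_master"] = info["state_name"] == "SMINFO_MASTER"
    return info


def query_subnet_manager(timeout_s=10):
    """Run `sminfo` locally and parse its output; None means no SM was reported."""
    if shutil.which("sminfo") is None:
        raise RuntimeError("sminfo not found; install infiniband-diags on a fabric-attached host")
    result = subprocess.run(
        ["sminfo"], capture_output=True, text=True, timeout=timeout_s
    )
    if result.returncode != 0:
        return None
    return parse_sminfo(result.stdout)
```

If `query_subnet_manager()` returns `None`, no SM answered the query from that host, which matches the cause described above. A result with `is_master` set to `True` suggests that an SM is running and the LED issue lies elsewhere.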

6. System Boot Failure
Symptom:
The last software upgrade failed on an x86-based system.
Solution:
(1) Connect an RS232 console cable to a laptop.
(2) Press the system’s reset button.
(3) During boot, press the Arrow Up or Arrow Down key to open the GRUB menu, which lists the installed software images.
(4) Use the arrow keys to select a previous working image, then press Enter to boot.

7. Preventive Recommendations
Keep the switch in a temperature-controlled environment.
Regularly check that fans and PSUs are fully seated.
Ensure firmware and MLNX-OS are up to date.
Maintain a console connection during upgrades to monitor boot progress.

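8. Summary Table

Symptom | Likely cause | Action
System Status LED blinking for more than 5 minutes | MLNX-OS did not boot properly; only firmware is running | Check the software status through the console port; contact your FAE if MLNX-OS did not load
System Status LED is red | Critical system fault (CPU error, corrupted firmware, or over-temperature) | Check room temperature and airflow; contact technical support if the problem persists
Fan Status LED is red | Fan fault | Verify that the fan is fully inserted and unobstructed; replace the fan FRU if necessary
Front PSU Status LED is red | PSU issue | Check or replace the power cable; replace the PSU FRU if required
InfiniBand activity LED does not light up | No active Subnet Manager in the fabric | Verify that an SM is running in the InfiniBand network
System boot failure after a software upgrade (x86-based system) | The last software upgrade failed | Open the GRUB menu from the console during boot and select a previous working image
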