Troubleshooting for the NVIDIA Spectrum-4 SN5000 2U Switch System

This document outlines key troubleshooting procedures for the NVIDIA Spectrum-4 SN5000 switch. The sections below cover issues indicated by the system’s status LEDs, as well as recovery from a failed software upgrade.

  1. Symptom: System Status LED is blinking for more than 5 minutes

Cause: This pattern indicates that the main operating system (e.g., Cumulus Linux) has failed to boot properly. The system is running only its basic firmware.

Solution: Physical inspection alone is not enough. Connect a console cable to the console port, then check the software status and boot logs for errors. The specific procedure for Cumulus Linux is described in its official "Monitoring and Troubleshooting" documentation.
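Once the console output has been captured to a text file, a small script can help spot the lines worth reading first. The following is a minimal sketch, not a vendor tool: the keyword pattern and the sample log lines are assumptions made up for illustration, and real boot logs may use different wording.

```python
import re

# Hypothetical keyword pattern; adjust it to what your boot logs actually contain.
SUSPECT = re.compile(r"\b(error|fail(ed|ure)?|panic|timeout)\b", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return (line_number, line) pairs whose text matches a suspect keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if SUSPECT.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Made-up sample text standing in for a captured console log.
    sample = "\n".join([
        "Booting kernel",
        "Mounting root filesystem",
        "Failed to start example.service",
    ])
    for number, line in suspicious_lines(sample):
        print(f"{number}: {line}")
```

A keyword scan only narrows down where to look; the surrounding log lines usually carry the context needed to identify the actual boot failure.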

  2. Symptom: System Status LED is amber

Cause: A solid amber light signals a critical hardware fault. This could be a serious internal error such as a CPU malfunction or corrupted firmware. Alternatively, it may indicate that the system is operating at a dangerously high temperature.

Solution: First, check the environmental conditions. Ensure the server room or closet is within the recommended temperature range and that there is adequate airflow around the device. If the temperature is normal, the issue is likely an internal hardware fault requiring advanced technical support.

  3. Symptom: Fan Status LED is amber

Cause: This warns of a potential cooling system failure.

Solution: 

  • Physical Check: Ensure the fan module is fully seated in its tray and that no cables or obstructions are blocking the air intake or exhaust.
  • Replacement: If the fan is properly installed but the error persists, the fan itself has likely failed. It should be replaced with a new Fan Field-Replaceable Unit (FRU).

  4. Symptom: PSU (Power Supply Unit) Status LED is amber

Cause: This indicates a problem with the power supply.

Solution:

  • Check the Cable: The simplest fix is often to check the power cable at both ends (the outlet and the PSU) for a secure connection. Try replacing the cable with a known-good one.
  • Replace the PSU: If the cable is not the issue, the power supply unit itself is likely faulty and should be replaced.

  5. Symptom: Software upgrade failed on x86-based systems

Cause: A failed software upgrade can leave the system in an unbootable state.

Solution: This is a software issue, not a hardware fault. The recovery steps are specific to the Cumulus Linux operating system and its installation and upgrade procedures. Consult the dedicated "Monitoring and Troubleshooting" guide for detailed instructions on recovering from a failed upgrade.
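For teams that keep runbooks as code, the LED symptoms above can be condensed into a small lookup table. This is only a sketch of the guidance in this post: the `(indicator, state)` key names and the `triage` helper are hypothetical, and the "blinking" entry assumes the LED has already been blinking for more than 5 minutes.

```python
# Lookup table transcribed from the symptoms in this guide.
# The (indicator, state) key names are a hypothetical naming scheme.
TRIAGE = {
    ("system", "blinking"): (  # blinking for more than 5 minutes
        "Operating system failed to boot",
        "Connect through the console port and check the boot logs",
    ),
    ("system", "amber"): (
        "Critical hardware fault or overheating",
        "Check room temperature and airflow, then contact support",
    ),
    ("fan", "amber"): (
        "Potential cooling system failure",
        "Reseat the fan module, clear obstructions, then replace the fan FRU",
    ),
    ("psu", "amber"): (
        "Power supply problem",
        "Check or swap the power cable, then replace the PSU",
    ),
}

UNKNOWN = ("Not covered by this guide", "Consult the vendor documentation")


def triage(indicator, state):
    """Return (likely cause, first action) for an indicator and its state."""
    key = (indicator.strip().lower(), state.strip().lower())
    return TRIAGE.get(key, UNKNOWN)


if __name__ == "__main__":
    cause, action = triage("Fan", "Amber")
    print(f"{cause}: {action}")
```

Keeping the table as plain data makes it easy to extend with site-specific entries without touching the lookup logic.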
