Huawei Firewall Board Faults

Troubleshoot and recover from firewall board faults as follows.

No.

Troubleshooting step

Handling approach

1

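Master MPU fault. Two kinds of fault are covered here: the master MPU is in an abnormal state, or its memory usage has reached or exceeded 90%. Either one prevents the master MPU from working properly. Check for them as follows: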

  1. Run display device and check whether the hardware status of the master MPU, shown as MPU with Primary set to Master, is Abnormal.
    <sysname> display device
     xxxx's Device status:
     Slot #    Type       Online    Register      Status     Primary
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     1         SPU        Present   Registered    Normal     NA
     2         LPU        Present   Registered    Normal     NA
     4         MPU        Present   Registered    Normal     Slave
     5         MPU        Present   NA            Abnormal   Master
     6         CLK        Present   Registered    Normal     Master
     8         PWR        Present   Registered    Normal     NA
     9         PWR        Present   Registered    Normal     NA
     10        FAN        Present   Registered    Normal     NA
  2. Run display health and check whether the memory usage of the master MPU, shown as MPU(Master), has reached or exceeded 90%.
     <sysname> display health
      Slot                CPU Usage  Memory Usage(Used/Total)
     ------------------------------------------------------------
      5 MPU(Master)          6%           95%    1575MB/1712MB 
      1 SPU-CPU0             1%           35%   5517MB/15703MB
      1 SPU-CPU1             1%           34%   5471MB/15703MB
      1 SPU-CPU2             1%           27%   4277MB/15703MB
      1 SPU-CPU3             1%           27%   4282MB/15703MB
      1 SPU-CPU6             5%           16%     56MB/338MB  
      2 LPU                 20%           22%    468MB/2098MB 
     ------------------------------------------------------------
     SPU CPU Average Utilization: 1 %
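
The two checks above can also be scripted against captured command output. The following is a minimal sketch, assuming each `display device` row sits on a single line with six columns and that `display health` rows look like the sample above. The function names and the trimmed sample strings are hypothetical, not part of any Huawei tool.

```python
import re

# Trimmed sample outputs, modelled on the command output shown above.
DEVICE_SAMPLE = """\
Slot #    Type       Online    Register      Status     Primary
1         SPU        Present   Registered    Normal     NA
4         MPU        Present   Registered    Normal     Slave
5         MPU        Present   NA            Abnormal   Master
"""

HEALTH_SAMPLE = """\
 Slot                CPU Usage  Memory Usage(Used/Total)
------------------------------------------------------------
 5 MPU(Master)          6%           95%    1575MB/1712MB
 1 SPU-CPU0             1%           35%   5517MB/15703MB
 2 LPU                 20%           22%    468MB/2098MB
------------------------------------------------------------
"""

# slot, board name, CPU %, memory %, used MB, total MB
HEALTH_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)%\s+(\d+)%\s+(\d+)MB/(\d+)MB")


def master_mpu_status(device_output):
    """Return (slot, status) of the MPU whose Primary column is Master, or None."""
    for line in device_output.splitlines():
        fields = line.split()
        # Data rows have exactly six columns and start with a slot number.
        if len(fields) == 6 and fields[0].isdigit():
            slot, board_type, _online, _register, status, primary = fields
            if board_type == "MPU" and primary == "Master":
                return int(slot), status
    return None


def master_mpu_memory_percent(health_output):
    """Return the memory usage percentage of the MPU(Master) row, or None."""
    for line in health_output.splitlines():
        match = HEALTH_ROW.match(line)
        if match and match.group(2) == "MPU(Master)":
            return int(match.group(4))
    return None


def master_mpu_is_faulty(device_output, health_output, memory_limit=90):
    """True if the master MPU is not Normal or its memory usage is at or above the limit."""
    status = master_mpu_status(device_output)
    memory = master_mpu_memory_percent(health_output)
    abnormal = status is not None and status[1] != "Normal"
    memory_high = memory is not None and memory >= memory_limit
    return abnormal or memory_high


if __name__ == "__main__":
    print(master_mpu_status(DEVICE_SAMPLE))
    print(master_mpu_memory_percent(HEALTH_SAMPLE))
    print(master_mpu_is_faulty(DEVICE_SAMPLE, HEALTH_SAMPLE))
```

The 90% limit is passed as a parameter so the same check can be reused with a stricter threshold.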

Perform a master/slave MPU switchover:

<sysname> system-view 
[sysname] slave switchover enable 
[sysname] slave switchover 
Caution!!! Confirm switch slave to master[Y/N] Y
Note:

During a master/slave MPU switchover, the MPU that was previously the master restarts.

2

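Repeated resets of an SPU/LPU on the active device: in a dual-device deployment, a reset of an SPU/LPU board, or of one of its CPUs, lowers the HRP priority and triggers an active/standby switchover. When the board finishes resetting, active/standby preemption is triggered in turn. If the board keeps resetting, the active/standby state keeps changing and services are disrupted.

In the fault management view, run display alarm all history verbose and check whether any board or CPU is resetting repeatedly.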

[sysname] fm
[sysname-fm] display alarm all history verbose
----------------------------------------------------------------------------    
Index  Level      Date      Time        ErrCode                      Info       
                                                                          
1      Major      18-07-09  05:13:43    0x11010a09    SPU 7/1 : LINK of ILK is  
                                                       abnormal, Resume[OID:1.  
                                                      3.6.1.4.1.2011.5.25.219.  
                                                      2.2.5,EntCode:140289]     
2      Major      18-07-09  05:13:43    0x11010a09    SPU 7/0 : LINK of ILK is  
                                                       abnormal, Resume[OID:1.  
                                                      3.6.1.4.1.2011.5.25.219.  
                                                      2.2.5,EntCode:140289]     
3      Major      18-07-09  05:18:27    0x11010a09    SPU 7/1 : LINK of ILK is  
                                                       abnormal[OID:1.3.6.1.4.  
                                                      1.2011.5.25.219.2.2.5,En  
                                                      tCode:140289]             
4      Major      18-07-09  05:18:27    0x11010a07    SPU 7/1 : Max3997_SubNod  
                                                      e of MAX3997 is abnormal  
                                                      [OID:1.3.6.1.4.1.2011.5.  
                                                      25.219.2.2.5,EntCode:132  
                                                      613]                      
5      Major      18-07-09  05:18:28    0x11010a09    SPU 7/0 : LINK of ILK is  
                                                       abnormal[OID:1.3.6.1.4.  
                                                      1.2011.5.25.219.2.2.5,En  
                                                      tCode:140289]             
6      Major      18-07-09  05:18:28    0x11010a07    SPU 7/0 : Max3997_SubNod  
                                                      e of MAX3997 is abnormal  
                                                      [OID:1.3.6.1.4.1.2011.5.  
                                                      25.219.2.2.5,EntCode:132  
                                                      613]                      
----------------------------------------------------------------------------
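
Scanning a long alarm history by eye is error-prone, so the repetition check can be sketched in a few lines. This is a rough heuristic, assuming each alarm record starts with a line laid out like the sample above (index, level, date, time, error code, board) and that the same error code recurring on the same board hints at a flapping board. The names `count_alarms` and `repeated_alarms` and the threshold of 2 are hypothetical.

```python
import re
from collections import Counter

# First lines of the alarm records shown above, plus one wrapped continuation line.
ALARM_SAMPLE = """\
1      Major      18-07-09  05:13:43    0x11010a09    SPU 7/1 : LINK of ILK is
                                                       abnormal, Resume[OID:1.
2      Major      18-07-09  05:13:43    0x11010a09    SPU 7/0 : LINK of ILK is
3      Major      18-07-09  05:18:27    0x11010a09    SPU 7/1 : LINK of ILK is
4      Major      18-07-09  05:18:27    0x11010a07    SPU 7/1 : Max3997_SubNod
5      Major      18-07-09  05:18:28    0x11010a09    SPU 7/0 : LINK of ILK is
6      Major      18-07-09  05:18:28    0x11010a07    SPU 7/0 : Max3997_SubNod
"""

# index, level, date, time, error code, board (for example "SPU 7/1")
RECORD_START = re.compile(
    r"^\s*(\d+)\s+(\w+)\s+(\d{2}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+"
    r"(0x[0-9a-fA-F]+)\s+(\w+ \d+/\d+)"
)


def count_alarms(history_output):
    """Count alarm records per (board, error code); wrapped Info lines are skipped."""
    counts = Counter()
    for line in history_output.splitlines():
        match = RECORD_START.match(line)
        if match:
            counts[(match.group(6), match.group(5))] += 1
    return counts


def repeated_alarms(history_output, threshold=2):
    """Return the (board, error code) pairs seen at least `threshold` times."""
    counts = count_alarms(history_output)
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    for (board, code), n in repeated_alarms(ALARM_SAMPLE).items():
        print(board, code, n)
```

A pair that shows up here is only a candidate; the full Info text and timestamps still need to be read to confirm that the board is actually resetting.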

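Manually perform an active/standby switchover, then power off the faulty board to isolate it:

Notice:

An active/standby switchover requires that the two devices have consistent configurations and consistent software and hardware states. Otherwise, services may be disrupted after the switchover.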

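  1. Run display hrp state verbose to check the priorities of the active and standby devices. If the priorities are equal, run hrp switch standby on the active device to switch over. If the active device's priority is higher than the standby device's, use hrp track to monitor interfaces that are in the down state; each monitored down interface lowers the priority by 2.

    # Check the priorities of the active and standby devices.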

    HRP_M<sysname> display hrp state verbose 
    Role: active, peer: standby  
    Running priority: 45000, peer: 45000  
    Backup channel usage: 0.00%  
    Stable time: 0 days, 0 hours, 11 minutes

    # Switch the firewall to the standby role.

    HRP_M<sysname> system-view
    HRP_M[sysname] hrp switch standby
  2. Run power off to power off and isolate the faulty board.

    # Power off the LPU in slot 7 of the firewall.

    HRP_M<sysname> system-view
    HRP_M<sysname> power off slot 7 
    Certain hardware faults may cause the slot 7 not to register.    
    // This message is printed only when the slot currently has a hardware alarm
    Info:Caution!!! This command may affect operation by wrong use,
     please carefully use it with HUAWEI engineer's direction. Are you sure to do this operation?[Y/N]?
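
The priority comparison in step 1 comes down to simple arithmetic, sketched below. It assumes the `Running priority: <local>, peer: <peer>` line format shown above and takes the goal to be lowering the local priority to the peer's value or below, at 2 per monitored down interface as stated in step 1. The function names are hypothetical.

```python
import math
import re

# Sample taken from the display hrp state verbose output shown above.
HRP_SAMPLE = """\
Role: active, peer: standby
Running priority: 45000, peer: 45000
Backup channel usage: 0.00%
Stable time: 0 days, 0 hours, 11 minutes
"""

PRIORITY_LINE = re.compile(r"Running priority:\s*(\d+),\s*peer:\s*(\d+)")
PRIORITY_STEP = 2  # priority drop per monitored down interface, per step 1


def parse_priorities(hrp_output):
    """Return (local_priority, peer_priority) from the command output."""
    match = PRIORITY_LINE.search(hrp_output)
    if match is None:
        raise ValueError("no 'Running priority' line found")
    return int(match.group(1)), int(match.group(2))


def down_interfaces_to_track(local, peer):
    """Number of down interfaces to monitor so that local priority <= peer priority."""
    if local <= peer:
        return 0  # priorities already allow hrp switch standby
    return math.ceil((local - peer) / PRIORITY_STEP)


if __name__ == "__main__":
    local, peer = parse_priorities(HRP_SAMPLE)
    print(local, peer, down_interfaces_to_track(local, peer))
```

For the sample output the two priorities are equal, so no interface needs to be monitored before running hrp switch standby.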