lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 30 Nov 2020 18:29:10 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     George Cherian <george.cherian@...vell.com>
Cc:     <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <davem@...emloft.net>, <sgoutham@...vell.com>,
        <lcherian@...vell.com>, <gakula@...vell.com>,
        <masahiroy@...nel.org>, <willemdebruijn.kernel@...il.com>,
        <saeed@...nel.org>, <jiri@...nulli.us>
Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
 reporters for NPA

On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> Add health reporters for RVU NPA block.
> NPA Health reporters handle following HW event groups
>  - GENERAL events
>  - ERROR events
>  - RAS events
>  - RVU event
> An event counter per event is maintained in SW.
> 
> Output:
>  # devlink health
>  pci/0002:01:00.0:
>    reporter hw_npa
>      state healthy error 0 recover 0
>  # devlink  health dump show pci/0002:01:00.0 reporter hw_npa
>  NPA_AF_GENERAL:
>         Unmap PF Error: 0
>         NIX:
>         0: free disabled RX: 0 free disabled TX: 0
>         1: free disabled RX: 0 free disabled TX: 0
>         Free Disabled for SSO: 0
>         Free Disabled for TIM: 0
>         Free Disabled for DPI: 0
>         Free Disabled for AURA: 0
>         Alloc Disabled for Resvd: 0
>   NPA_AF_ERR:
>         Memory Fault on NPA_AQ_INST_S read: 0
>         Memory Fault on NPA_AQ_RES_S write: 0
>         AQ Doorbell Error: 0
>         Poisoned data on NPA_AQ_INST_S read: 0
>         Poisoned data on NPA_AQ_RES_S write: 0
>         Poisoned data on HW context read: 0
>   NPA_AF_RVU:
>         Unmap Slot Error: 0

You seem to have missed the feedback Saeed and I gave you on v2.

Did you test this with the errors actually triggering? Devlink should
store only one dump, are the counters not going to get out of sync
unless something clears the dump every time it triggers?

Powered by blists - more mailing lists