Message-ID: <20201130182910.49ea8c8c@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>
Date: Mon, 30 Nov 2020 18:29:10 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: George Cherian <george.cherian@...vell.com>
Cc: <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<davem@...emloft.net>, <sgoutham@...vell.com>,
<lcherian@...vell.com>, <gakula@...vell.com>,
<masahiroy@...nel.org>, <willemdebruijn.kernel@...il.com>,
<saeed@...nel.org>, <jiri@...nulli.us>
Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> Add health reporters for RVU NPA block.
> NPA Health reporters handle following HW event groups
> - GENERAL events
> - ERROR events
> - RAS events
> - RVU event
> An event counter per event is maintained in SW.
>
> Output:
> # devlink health
> pci/0002:01:00.0:
> reporter hw_npa
> state healthy error 0 recover 0
> # devlink health dump show pci/0002:01:00.0 reporter hw_npa
> NPA_AF_GENERAL:
> Unmap PF Error: 0
> NIX:
> 0: free disabled RX: 0 free disabled TX: 0
> 1: free disabled RX: 0 free disabled TX: 0
> Free Disabled for SSO: 0
> Free Disabled for TIM: 0
> Free Disabled for DPI: 0
> Free Disabled for AURA: 0
> Alloc Disabled for Resvd: 0
> NPA_AF_ERR:
> Memory Fault on NPA_AQ_INST_S read: 0
> Memory Fault on NPA_AQ_RES_S write: 0
> AQ Doorbell Error: 0
> Poisoned data on NPA_AQ_INST_S read: 0
> Poisoned data on NPA_AQ_RES_S write: 0
> Poisoned data on HW context read: 0
> NPA_AF_RVU:
> Unmap Slot Error: 0

You seem to have missed the feedback Saeed and I gave you on v2.

Did you test this with the errors actually triggering? Devlink should
store only one dump, so won't the counters get out of sync unless
something clears the dump every time it triggers?
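
To make the concern concrete, here is a toy Python model, not kernel
code. It assumes the reporter keeps at most one saved dump until it is
explicitly cleared, and that the driver bumps its SW counter before
reporting. ToyReporter and its methods are hypothetical names used only
for illustration.

```python
class ToyReporter:
    """Toy model of a health reporter that keeps at most one saved dump."""

    def __init__(self):
        self.counters = {"Unmap PF Error": 0}  # live SW event counters
        self.saved_dump = None                 # at most one dump is kept

    def report(self, event):
        """A HW event fired: bump the counter, dump only if none is saved."""
        self.counters[event] += 1
        if self.saved_dump is None:
            self.saved_dump = dict(self.counters)  # snapshot, not a reference

    def dump_show(self):
        """Return the saved dump, capturing a fresh one only if none exists."""
        if self.saved_dump is None:
            self.saved_dump = dict(self.counters)
        return self.saved_dump

    def dump_clear(self):
        """Drop the saved dump so the next show or report captures anew."""
        self.saved_dump = None


r = ToyReporter()
for _ in range(3):
    r.report("Unmap PF Error")

stale = r.dump_show()["Unmap PF Error"]  # snapshot taken at the first event
live = r.counters["Unmap PF Error"]      # what the SW counter holds now
r.dump_clear()
fresh = r.dump_show()["Unmap PF Error"]  # re-captured after the clear
print(stale, live, fresh)
```

Under these assumptions the dump shown after three events still carries
the count from the first one, and only matches the live counter again
once the dump has been cleared.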