[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1627a52089f421c8594c9a99fd9137caa6485572.camel@kernel.org>
Date: Wed, 04 Nov 2020 21:08:52 -0800
From: Saeed Mahameed <saeed@...nel.org>
To: George Cherian <george.cherian@...vell.com>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Jiri Pirko <jiri@...dia.com>
Cc: kuba@...nel.org, davem@...emloft.net, sgoutham@...vell.com,
lcherian@...vell.com, gakula@...vell.com, masahiroy@...nel.org,
willemdebruijn.kernel@...il.com
Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health
reporters for NIX
On Wed, 2020-11-04 at 17:57 +0530, George Cherian wrote:
> Add health reporters for RVU NPA block.
^^^ NIX ?
Cc: Jiri
Anyway, could you please spare some words on what is NPA and what is
NIX?
Regarding the reporters names, all drivers register well known generic
names such as (fw,hw,rx,tx), I don't know if it is a good idea to use
vendor specific names, if you are reporting for hw/fw units then just
use "hw" or "fw" as the reporter name and append the unit NPA/NIX to
the counter/error names.
> Only reporter dump is supported.
>
> Output:
> # ./devlink health
> pci/0002:01:00.0:
> reporter npa
> state healthy error 0 recover 0
> reporter nix
> state healthy error 0 recover 0
> # ./devlink health dump show pci/0002:01:00.0 reporter nix
> NIX_AF_GENERAL:
> Memory Fault on NIX_AQ_INST_S read: 0
> Memory Fault on NIX_AQ_RES_S write: 0
> AQ Doorbell error: 0
> Rx on unmapped PF_FUNC: 0
> Rx multicast replication error: 0
> Memory fault on NIX_RX_MCE_S read: 0
> Memory fault on multicast WQE read: 0
> Memory fault on mirror WQE read: 0
> Memory fault on mirror pkt write: 0
> Memory fault on multicast pkt write: 0
> NIX_AF_RAS:
> Poisoned data on NIX_AQ_INST_S read: 0
> Poisoned data on NIX_AQ_RES_S write: 0
> Poisoned data on HW context read: 0
> Poisoned data on packet read from mirror buffer: 0
> Poisoned data on packet read from mcast buffer: 0
> Poisoned data on WQE read from mirror buffer: 0
> Poisoned data on WQE read from multicast buffer: 0
> Poisoned data on NIX_RX_MCE_S read: 0
> NIX_AF_RVU:
> Unmap Slot Error: 0
>
Now i am a little bit skeptic here, devlink health reporter
infrastructure was never meant to deal with dump op only, the main
purpose is to diagnose/dump and recover.
especially in your use case where you only report counters, i don't
believe devlink health dump is a proper interface for this.
Many of these counters if not most are data path packet based and maybe
they should belong to ethtool.
Powered by blists - more mailing lists