[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1627a52089f421c8594c9a99fd9137caa6485572.camel@kernel.org>
Date:   Wed, 04 Nov 2020 21:08:52 -0800
From:   Saeed Mahameed <saeed@...nel.org>
To:     George Cherian <george.cherian@...vell.com>,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jiri Pirko <jiri@...dia.com>
Cc:     kuba@...nel.org, davem@...emloft.net, sgoutham@...vell.com,
        lcherian@...vell.com, gakula@...vell.com, masahiroy@...nel.org,
        willemdebruijn.kernel@...il.com
Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health
 reporters for NIX
On Wed, 2020-11-04 at 17:57 +0530, George Cherian wrote:
> Add health reporters for RVU NPA block.
                               ^^^ NIX ?
Cc: Jiri 
Anyway, could you please spare some words on what is NPA and what is
NIX?
Regarding the reporters names, all drivers register well known generic
names such as (fw,hw,rx,tx), I don't know if it is a good idea to use
vendor specific names, if you are reporting for hw/fw units then just
use "hw" or "fw" as the reporter name and append the unit NPA/NIX to
the counter/error names.
> Only reporter dump is supported.
> 
> Output:
>  # ./devlink health
>  pci/0002:01:00.0:
>    reporter npa
>      state healthy error 0 recover 0
>    reporter nix
>      state healthy error 0 recover 0
>  # ./devlink  health dump show pci/0002:01:00.0 reporter nix
>   NIX_AF_GENERAL:
>          Memory Fault on NIX_AQ_INST_S read: 0
>          Memory Fault on NIX_AQ_RES_S write: 0
>          AQ Doorbell error: 0
>          Rx on unmapped PF_FUNC: 0
>          Rx multicast replication error: 0
>          Memory fault on NIX_RX_MCE_S read: 0
>          Memory fault on multicast WQE read: 0
>          Memory fault on mirror WQE read: 0
>          Memory fault on mirror pkt write: 0
>          Memory fault on multicast pkt write: 0
>    NIX_AF_RAS:
>          Poisoned data on NIX_AQ_INST_S read: 0
>          Poisoned data on NIX_AQ_RES_S write: 0
>          Poisoned data on HW context read: 0
>          Poisoned data on packet read from mirror buffer: 0
>          Poisoned data on packet read from mcast buffer: 0
>          Poisoned data on WQE read from mirror buffer: 0
>          Poisoned data on WQE read from multicast buffer: 0
>          Poisoned data on NIX_RX_MCE_S read: 0
>    NIX_AF_RVU:
>          Unmap Slot Error: 0
> 
Now i am a little bit skeptic here, devlink health reporter
infrastructure was never meant to deal with dump op only, the main
purpose is to diagnose/dump and recover.
especially in your use case where you only report counters, i don't
believe devlink health dump is a proper interface for this.
Many of these counters if not most are data path packet based and maybe
they should belong to ethtool.
Powered by blists - more mailing lists
 
