lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 5 Nov 2020 13:36:56 +0000
From:   George Cherian <gcherian@...vell.com>
To:     Saeed Mahameed <saeed@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Jiri Pirko <jiri@...dia.com>
CC:     "kuba@...nel.org" <kuba@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        Sunil Kovvuri Goutham <sgoutham@...vell.com>,
        "Linu Cherian" <lcherian@...vell.com>,
        Geethasowjanya Akula <gakula@...vell.com>,
        "masahiroy@...nel.org" <masahiroy@...nel.org>,
        "willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health
 reporters for NIX

Hi Saeed,

Thanks for the review.

> -----Original Message-----
> From: Saeed Mahameed <saeed@...nel.org>
> Sent: Thursday, November 5, 2020 10:39 AM
> To: George Cherian <gcherian@...vell.com>; netdev@...r.kernel.org;
> linux-kernel@...r.kernel.org; Jiri Pirko <jiri@...dia.com>
> Cc: kuba@...nel.org; davem@...emloft.net; Sunil Kovvuri Goutham
> <sgoutham@...vell.com>; Linu Cherian <lcherian@...vell.com>;
> Geethasowjanya Akula <gakula@...vell.com>; masahiroy@...nel.org;
> willemdebruijn.kernel@...il.com
> Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health
> reporters for NIX
> 
> On Wed, 2020-11-04 at 17:57 +0530, George Cherian wrote:
> > Add health reporters for RVU NPA block.
>                                ^^^ NIX ?
> 
Yes, it's NIX.

> Cc: Jiri
> 
> Anyway, could you please spare some words on what is NPA and what is
> NIX?
> 
> Regarding the reporters names, all drivers register well known generic names
> such as (fw,hw,rx,tx), I don't know if it is a good idea to use vendor specific
> names, if you are reporting for hw/fw units then just use "hw" or "fw" as the
> reporter name and append the unit NPA/NIX to the counter/error names.
Okay. These are hw units, I will rename them as hw_npa/hw_nix.
> 
> > Only reporter dump is supported.
> >
> > Output:
> >  # ./devlink health
> >  pci/0002:01:00.0:
> >    reporter npa
> >      state healthy error 0 recover 0
> >    reporter nix
> >      state healthy error 0 recover 0
> >  # ./devlink  health dump show pci/0002:01:00.0 reporter nix
> >   NIX_AF_GENERAL:
> >          Memory Fault on NIX_AQ_INST_S read: 0
> >          Memory Fault on NIX_AQ_RES_S write: 0
> >          AQ Doorbell error: 0
> >          Rx on unmapped PF_FUNC: 0
> >          Rx multicast replication error: 0
> >          Memory fault on NIX_RX_MCE_S read: 0
> >          Memory fault on multicast WQE read: 0
> >          Memory fault on mirror WQE read: 0
> >          Memory fault on mirror pkt write: 0
> >          Memory fault on multicast pkt write: 0
> >    NIX_AF_RAS:
> >          Poisoned data on NIX_AQ_INST_S read: 0
> >          Poisoned data on NIX_AQ_RES_S write: 0
> >          Poisoned data on HW context read: 0
> >          Poisoned data on packet read from mirror buffer: 0
> >          Poisoned data on packet read from mcast buffer: 0
> >          Poisoned data on WQE read from mirror buffer: 0
> >          Poisoned data on WQE read from multicast buffer: 0
> >          Poisoned data on NIX_RX_MCE_S read: 0
> >    NIX_AF_RVU:
> >          Unmap Slot Error: 0
> >
> 
> Now i am a little bit skeptic here, devlink health reporter infrastructure was
> never meant to deal with dump op only, the main purpose is to
> diagnose/dump and recover.
> 
> especially in your use case where you only report counters, i don't believe
> devlink health dump is a proper interface for this.
These are not counters. These are error interrupts raised by HW blocks.
The count is provided to understand on how frequently the errors are seen.
Error recovery for some of the blocks happen internally. That is the reason,
Currently only dump op is added.
> Many of these counters if not most are data path packet based and maybe
> they should belong to ethtool.

Regards,
-George



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ