Date:   Thu, 05 Nov 2020 11:23:54 -0800
From:   Saeed Mahameed <saeed@...nel.org>
To:     Jakub Kicinski <kuba@...nel.org>,
        George Cherian <gcherian@...vell.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Jiri Pirko <jiri@...dia.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        Sunil Kovvuri Goutham <sgoutham@...vell.com>,
        Linu Cherian <lcherian@...vell.com>,
        Geethasowjanya Akula <gakula@...vell.com>,
        "masahiroy@...nel.org" <masahiroy@...nel.org>,
        "willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX

On Thu, 2020-11-05 at 09:07 -0800, Jakub Kicinski wrote:
> On Thu, 5 Nov 2020 13:36:56 +0000 George Cherian wrote:
> > > Now i am a little bit skeptic here, devlink health reporter
> > > infrastructure was
> > > never meant to deal with dump op only, the main purpose is to
> > > diagnose/dump and recover.
> > > 
> > > especially in your use case where you only report counters, i
> > > don't believe
> > > devlink health dump is a proper interface for this.  
> > These are not counters. These are error interrupts raised by HW
> > blocks.
> > The count is provided to understand on how frequently the errors
> > are seen.
> > Error recovery for some of the blocks happen internally. That is
> > the reason,
> > Currently only dump op is added.
> 
> The previous incarnation was printing messages to logs, so I assume
> these errors are expected to be relatively low rate.
> 
> The point of using devlink health was that you can generate a netlink
> notification when the error happens. IOW you need some calls to
> devlink_health_report() or such.
> 
> At least that's my thinking, others may disagree.

If you report an error without recovering, devlink health will report a
bad device state:

$ ./devlink health
   pci/0002:01:00.0:
     reporter npa
       state error error 1 recover 0

So you will need to implement an empty recover op.
But if these events are informational only and do not indicate
device health issues, why would you report them via devlink health?
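
To illustrate the state change described above, here is a toy user-space
model. It is only a sketch and not the kernel's devlink code: every name
in it (toy_reporter, toy_report, toy_recover_noop) is hypothetical, and
it assumes that a report with no recover op leaves the reporter in the
error state, while a successful recover op returns it to healthy.

```c
/* Toy user-space model; NOT the kernel's devlink implementation.
 * All type and function names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_HEALTHY, TOY_ERROR };

struct toy_reporter {
	enum toy_state state;
	unsigned int error_count;
	unsigned int recover_count;
	/* Optional recover callback; returns 0 on success. */
	int (*recover)(struct toy_reporter *r);
};

/* An "empty" recover op: nothing to fix, so just report success. */
static int toy_recover_noop(struct toy_reporter *r)
{
	(void)r;
	return 0;
}

/* Record one error and, if a recover op exists, try to recover at once. */
static int toy_report(struct toy_reporter *r)
{
	int err;

	assert(r != NULL);
	r->error_count++;
	r->state = TOY_ERROR;

	if (!r->recover)
		return -1;	/* no recover op: reporter stays in error */

	err = r->recover(r);
	if (err)
		return err;	/* recovery failed: reporter stays in error */

	r->recover_count++;
	r->state = TOY_HEALTHY;
	return 0;
}
```

In this model, one toy_report() on a reporter with recover == NULL ends
at "state error, error 1, recover 0", which matches the output shown
above. The same call with toy_recover_noop installed ends at "state
healthy, error 1, recover 1".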
