Message-ID: <20201105124204.4dbea042@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Thu, 5 Nov 2020 12:42:04 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Saeed Mahameed <saeed@...nel.org>
Cc: George Cherian <gcherian@...vell.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Jiri Pirko <jiri@...dia.com>,
"davem@...emloft.net" <davem@...emloft.net>,
Sunil Kovvuri Goutham <sgoutham@...vell.com>,
Linu Cherian <lcherian@...vell.com>,
Geethasowjanya Akula <gakula@...vell.com>,
"masahiroy@...nel.org" <masahiroy@...nel.org>,
"willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health
reporters for NIX
On Thu, 05 Nov 2020 11:23:54 -0800 Saeed Mahameed wrote:
> If you report an error without recovering, devlink health will report a
> bad device state
>
> $ ./devlink health
> pci/0002:01:00.0:
> reporter npa
> state error error 1 recover 0
Actually, the counter in the driver is unnecessary, right? Devlink
counts errors.
> So you will need to implement an empty recover op.
> so if these events are informational only and they don't indicate
> device health issues, why would you report them via devlink health ?
I see devlink health reporters as a way of collecting error reports which
for the most part are just shared with the vendor. IOW firmware (or
hardware) bugs.
Obviously, as you say, without a recover op and additional context in the
report the value is quite diminished. But _if_ these are indeed "report
me to the vendor" kind of events then at least they should use our
current mechanics for such reports - which is dl-health.
Without knowing what these events are it's quite hard to tell if
devlink health is overkill or a counter is sufficient.
Either way - printing these to the logs is definitely the worst choice
:)
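
FWIW, a toy userspace model of the state handling discussed above may help
picture it. This is not kernel code and every name in it is made up for
illustration; it also ignores auto_recover and the grace period. It only
assumes what is said in this thread: the health core counts errors itself,
and a reporter with no recover op stays in the error state after a report,
while an empty recover op lets it go back to healthy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a health reporter; not the kernel's structures. */
enum reporter_state { STATE_HEALTHY = 0, STATE_ERROR };

struct reporter {
	enum reporter_state state;
	unsigned int error_count;   /* counted by the core, not the driver */
	unsigned int recover_count;
	int (*recover)(struct reporter *r); /* NULL when the driver has no op */
};

/* Empty recover op for purely informational events: nothing to fix. */
static int noop_recover(struct reporter *r)
{
	(void)r;
	return 0;
}

/* Core-side handling of one reported error (simplified assumption). */
static void report(struct reporter *r)
{
	assert(r != NULL);

	r->error_count++;
	r->state = STATE_ERROR;

	/* Without a recover op the reporter is left in the error state. */
	if (r->recover && r->recover(r) == 0) {
		r->recover_count++;
		r->state = STATE_HEALTHY;
	}
}
```

In this model a reporter with recover == NULL ends up in the error state
with error 1 recover 0 after a single report(), matching the output quoted
above, while one using noop_recover ends up healthy with error 1 recover 1.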