Message-ID: <20210309145206.43091cdb@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Tue, 9 Mar 2021 14:52:06 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Eran Ben Elisha <eranbe@...dia.com>
Cc: <netdev@...r.kernel.org>, <jiri@...nulli.us>, <saeedm@...dia.com>,
<andrew.gospodarek@...adcom.com>, <jacob.e.keller@...el.com>,
<guglielmo.morandin@...adcom.com>, <eugenem@...com>,
<eranbe@...lanox.com>
Subject: Re: [RFC] devlink: health: add remediation type
On Tue, 9 Mar 2021 16:18:58 +0200 Eran Ben Elisha wrote:
> >> DLH_REMEDY_LOCAL_FIX: the associated component will undergo a local,
> >> harmless fix attempt.
> >> (e.g. look for a lost interrupt in mlx5e_tx_reporter_timeout_recover())
> >
> > Should we make it more specific? Maybe DLH_REMEDY_STALL: device stall
> > detected, resumed by re-triggering processing, without a reset?
>
> Sounds good.
FWIW I ended up calling it:
+ * @DLH_REMEDY_KICK: device stalled, processing will be re-triggered
> >> The assumption here is that a reporter's recovery function has one
> >> remedy. But it can have a few remedies and escalate between them. Did you
> >> consider a bitmask?
> >
> > Yes, I tried to explain this in the commit message. If we wanted to support
> > escalating remediations we'd also need separate counters, etc. I think
> > having a health reporter per remediation should actually work fairly
> > well.
>
> That would require a failure of one reporter's recovery procedure to
> trigger the health flow of another reporter.
> So we could end up with 2 RX reporters sharing the same diagnose
> and dump callbacks, each with a different recovery flow.
> Seems a bit counterintuitive.
Let's talk about particular cases. Otherwise it's too easy to
misunderstand each other. I can't think of any practical case
where escalation makes sense.
> Maybe exposing, per reporter, a counter for each supported remedy is not
> that bad?
It's a large change to the uAPI, and it makes vendors more likely
to lump different problems under a single reporter (I take your
point that a reporter per remediation may cause over-splitting, but
if we have to choose between the two, my preference is "too granular").