Message-ID: <20240703064347.1929a75b@kernel.org>
Date: Wed, 3 Jul 2024 06:43:47 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Edward Cree <ecree.xilinx@...il.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
pabeni@...hat.com, michael.chan@...adcom.com
Subject: Re: [PATCH net-next 01/11] net: ethtool: let drivers remove lost RSS contexts
On Wed, 3 Jul 2024 12:08:36 +0100 Edward Cree wrote:
> On 03/07/2024 00:47, Jakub Kicinski wrote:
> > RSS contexts may get lost from a device in various extreme circumstances.
> > Specifically, if the firmware leaks resources and resets, or crashes and
> > either recovers in a partially working state or the crash causes a
> > different FW version to run, creating the context again may fail.
>
> So, I deliberately *didn't* do this, on the grounds that if the user
> fixed things by updating FW and resetting again, their contexts could
> get restored. I suppose big users like Meta will have orchestration
> doing all that work anyway, so it doesn't matter.
"We" don't reset FW while a workload is running; I'm only speculating
about why bnxt may lose the contexts. From my perspective, if contexts
get lost, the machine should be taken out of production and at least
power cycled.
> > Drivers should do their absolute best to prevent this from happening.
> > When it does happen, however, telling the user that a context exists
> > when it can't possibly be used any more is counterproductive. Add a
> > helper for drivers to discard contexts. Print an error; in the future,
> > a netlink notification will also be sent.
>
> The possibility of a netlink notification makes the idea of a broken
> flag a bit more workable, imho. But it's up to you which way to go.
Oh, have we talked about this? Now that you mention the broken flag,
I recall talking about a devlink health reporter a while back.
I don't have a preference on how we deal with the lost contexts;
the more obvious we make it to orchestration that the machine is
broken, the better. Can you point me to the discussion, or describe
the broken flag?
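The flow the quoted patch text describes can be modeled roughly in userspace C: after a FW reset the driver fails to recreate a context, calls a core helper, and the core prints an error and stops reporting that context. This is a hedged sketch only. `struct fake_netdev`, `rss_ctx_add()`, `rss_ctx_lost()`, `fw_recreate_ctx()` and the fixed-size table are hypothetical stand-ins, not the helper or data structures the patch actually adds to the kernel.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_RSS_CTX 8 /* hypothetical limit; ID 0 is the default context */

/* Hypothetical per-context state kept by the core. */
struct rss_ctx {
	uint32_t id;
};

/* Hypothetical device holding a table of additional RSS contexts. */
struct fake_netdev {
	const char *name;
	struct rss_ctx *ctx[MAX_RSS_CTX];
};

/* Record a newly created context (IDs 1..MAX_RSS_CTX-1). */
static int rss_ctx_add(struct fake_netdev *dev, uint32_t id)
{
	if (id == 0 || id >= MAX_RSS_CTX || dev->ctx[id])
		return -1;
	dev->ctx[id] = calloc(1, sizeof(*dev->ctx[id]));
	if (!dev->ctx[id])
		return -1;
	dev->ctx[id]->id = id;
	return 0;
}

/* Hypothetical helper a driver calls once the device has lost a context:
 * print an error and drop the context so it is no longer reported to
 * the user. A netlink notification could also be sent from here. */
static void rss_ctx_lost(struct fake_netdev *dev, uint32_t id)
{
	if (id >= MAX_RSS_CTX || !dev->ctx[id])
		return;
	fprintf(stderr, "%s: device error, RSS context %u lost\n",
		dev->name, (unsigned int)id);
	free(dev->ctx[id]);
	dev->ctx[id] = NULL;
}

/* Hypothetical firmware call: pretend that recreating context 2 fails. */
static int fw_recreate_ctx(uint32_t id)
{
	return id == 2 ? -1 : 0;
}

/* Driver reset path: try to restore every context and discard the ones
 * the (possibly different) firmware refuses to recreate. */
static void driver_restore_contexts(struct fake_netdev *dev)
{
	for (uint32_t id = 1; id < MAX_RSS_CTX; id++)
		if (dev->ctx[id] && fw_recreate_ctx(id) != 0)
			rss_ctx_lost(dev, id);
}

/* Number of contexts still reported to the user. */
static int rss_ctx_count(const struct fake_netdev *dev)
{
	int n = 0;

	for (uint32_t id = 1; id < MAX_RSS_CTX; id++)
		n += dev->ctx[id] != NULL;
	return n;
}
```

With contexts 1, 2 and 3 created, a call to `driver_restore_contexts()` logs one error for context 2 and leaves two contexts visible. Keeping ID 0 out of the table follows ethtool's convention that context 0 is the device's default RSS context. The "broken flag" alternative discussed above would instead keep the entry and mark it, rather than freeing it.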