Open Source and information security mailing list archives
 
Message-ID: <20240703064347.1929a75b@kernel.org>
Date: Wed, 3 Jul 2024 06:43:47 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Edward Cree <ecree.xilinx@...il.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
 pabeni@...hat.com, michael.chan@...adcom.com
Subject: Re: [PATCH net-next 01/11] net: ethtool: let drivers remove lost
 RSS contexts

On Wed, 3 Jul 2024 12:08:36 +0100 Edward Cree wrote:
> On 03/07/2024 00:47, Jakub Kicinski wrote:
> > RSS contexts may get lost from a device, in various extreme circumstances.
> > Specifically if the firmware leaks resources and resets, or crashes and
> > either recovers in partially working state or the crash causes a
> > different FW version to run - creating the context again may fail.  
> 
> So, I deliberately *didn't* do this, on the grounds that if the user
>  fixed things by updating FW and resetting again, their contexts could
>  get restored.  I suppose big users like Meta will have orchestration
>  doing all that work anyway so it doesn't matter.

"We" don't reset FW while a workload is running. I'm only speculating about
why bnxt may lose the contexts. From my perspective, if contexts get lost the
machine should be taken out of production and at least power cycled.

> > Drivers should do their absolute best to prevent this from happening.
> > When it does, however, telling user that a context exists, when it can't
> > possibly be used any more is counter productive. Add a helper for
> > drivers to discard contexts. Print an error, in the future netlink
> > notification will also be sent.  
>
> Possibility of a netlink notification makes the idea of a broken flag
>  a bit more workable imho.  But it's up to you which way to go.

Oh, have we talked about this? Now that you mention the broken flag,
I recall talking about a devlink health reporter a while back.

I don't have a preference on how we deal with the lost contexts.
The more obvious we make it to orchestration that the machine is broken
the better. Can you point me to the discussion / describe the broken
flag?
