Message-ID: <CO1PR11MB50894F87FB9571AD4C2FF4A2D6019@CO1PR11MB5089.namprd11.prod.outlook.com>
Date: Thu, 10 Nov 2022 21:13:56 +0000
From: "Keller, Jacob E" <jacob.e.keller@...el.com>
To: Leon Romanovsky <leon@...nel.org>, Jakub Kicinski <kuba@...nel.org>
CC: ivecera <ivecera@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"sassmann@...hat.com" <sassmann@...hat.com>,
"Piotrowski, Patryk" <patryk.piotrowski@...el.com>,
SlawomirX Laba <slawomirx.laba@...el.com>,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
open list <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH net] iavf: Do not restart Tx queues after reset task
failure
> -----Original Message-----
> From: Leon Romanovsky <leon@...nel.org>
> Sent: Thursday, November 10, 2022 1:07 PM
> To: Jakub Kicinski <kuba@...nel.org>
> Cc: ivecera <ivecera@...hat.com>; Keller, Jacob E <jacob.e.keller@...el.com>;
> netdev@...r.kernel.org; sassmann@...hat.com; Piotrowski, Patryk
> <patryk.piotrowski@...el.com>; SlawomirX Laba <slawomirx.laba@...el.com>;
> Brandeburg, Jesse <jesse.brandeburg@...el.com>; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; David S. Miller <davem@...emloft.net>; Eric
> Dumazet <edumazet@...gle.com>; Paolo Abeni <pabeni@...hat.com>;
> intel-wired-lan@...ts.osuosl.org; open list <linux-kernel@...r.kernel.org>
> Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
>
> On Thu, Nov 10, 2022 at 12:24:18PM -0800, Jakub Kicinski wrote:
> > On Thu, 10 Nov 2022 19:07:02 +0200 Leon Romanovsky wrote:
> > > > > Yes I think you're right. A ton of people check it without the
> > > > > lock but I think that's not strictly safe. Is dev_close safe to
> > > > > call when netif_running is false? Why not just remove the check
> > > > > and always call dev_close then.
> > > >
> > > > Checking a bit value (like netif_running()) is much cheaper than
> > > > unconditionally taking a global lock like RTNL.
> > >
> > > This cheap operation is racy, and it is performed in a
> > > non-performance-critical path.
> >
> > To be clear - the rtnl_lock around the entire if is still racy
> > in the grand scheme of things, no? What's stopping someone from
> > bringing the device right back up after you drop the lock?
>
I think the reset flow uses netif_device_detach() to detach the device before reset. Is that enough to prevent other calls to dev_close outside the driver?
Also, perhaps we should avoid re-attaching the device if the reset fails...
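Here is a minimal user-space sketch of the reset ordering discussed above. It is not kernel or iavf code, and it rests on these assumptions:

- A pthread mutex stands in for RTNL.
- The booleans "running" and "present" stand in for netif_running() and netif_device_present().
- model_netdev, model_init() and model_reset() are hypothetical names.

The sketch tests and closes under the same lock, so the check cannot go stale. It re-attaches only when the simulated reset succeeds, so a failed reset leaves the device closed and detached.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, not kernel API:
 *   rtnl    -> stands in for the RTNL lock
 *   running -> stands in for netif_running()
 *   present -> stands in for netif_device_present()
 */
struct model_netdev {
    pthread_mutex_t rtnl;
    bool running;
    bool present;
};

static void model_init(struct model_netdev *nd)
{
    pthread_mutex_init(&nd->rtnl, NULL);
    nd->running = true;
    nd->present = true;
}

/* Returns true if the interface was up and had to be closed. */
static bool model_reset(struct model_netdev *nd, bool reset_ok)
{
    bool closed = false;

    assert(nd != NULL);

    pthread_mutex_lock(&nd->rtnl);
    nd->present = false;          /* models netif_device_detach() */
    if (nd->running) {            /* netif_running() tested under the lock */
        nd->running = false;      /* models dev_close() */
        closed = true;
    }
    pthread_mutex_unlock(&nd->rtnl);

    /* The simulated hardware reset would run here, outside the lock. */

    if (reset_ok) {
        pthread_mutex_lock(&nd->rtnl);
        nd->present = true;       /* models netif_device_attach() */
        pthread_mutex_unlock(&nd->rtnl);
    }
    /* On failure the device is deliberately left closed and detached. */
    return closed;
}
```

The lock only makes the test and the close atomic. Once the lock is dropped, something else has to keep the interface down: state that the open path checks, which in this model is "present".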
> I want to believe that there is some sort of state machine that won't
> allow simple toggling of dev_close/dev_open. If it doesn't, rtnl_lock
> users should audit their code for possible toggling right after that
> lock is dropped.
>
I think the key is that dev_open and dev_close are normally triggered by iproute2 netlink messages, so if we close the device, userspace could immediately reopen it. However, I don't think that is allowed while the device is detached. It should therefore stay closed until we re-attach, at which point dev_open can fail by noticing that the VF is disabled.
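Here is a companion user-space sketch of the open side, under the same modelling assumptions as before, plus two more:

- "vf_disabled" is an invented flag for whatever state the driver records when the VF is disabled.
- -EIO is an arbitrary choice of error.

A detached device is refused with -ENODEV before the driver is consulted. As far as I can tell, mainline __dev_open() does a netif_device_present() check along these lines. Once the device is re-attached, the driver's open hook can still refuse a disabled VF. model_dev_open() and model_ndo_open() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; vf_disabled is an invented stand-in, not iavf state. */
struct model_netdev {
    pthread_mutex_t rtnl;   /* stands in for RTNL */
    bool running;           /* stands in for netif_running() */
    bool present;           /* stands in for netif_device_present() */
    bool vf_disabled;
};

static void model_init(struct model_netdev *nd, bool present, bool vf_disabled)
{
    pthread_mutex_init(&nd->rtnl, NULL);
    nd->running = false;
    nd->present = present;
    nd->vf_disabled = vf_disabled;
}

/* Driver-level open hook: refuses a VF that is known to be disabled. */
static int model_ndo_open(struct model_netdev *nd)
{
    return nd->vf_disabled ? -EIO : 0;
}

/* Core open path, e.g. reached via 'ip link set dev X up' (RTM_NEWLINK). */
static int model_dev_open(struct model_netdev *nd)
{
    int ret;

    assert(nd != NULL);
    pthread_mutex_lock(&nd->rtnl);
    if (nd->running) {
        ret = 0;                    /* already up: nothing to do */
    } else if (!nd->present) {
        ret = -ENODEV;              /* detached: refused before the driver */
    } else {
        ret = model_ndo_open(nd);   /* attached: the driver decides */
        if (ret == 0)
            nd->running = true;
    }
    pthread_mutex_unlock(&nd->rtnl);
    return ret;
}
```

The model has two layers of gating. Detach covers the window while the reset is in progress. The driver's own check covers the case where the reset path re-attaches the device even though the VF is unusable.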
> Anyway, this discussion reminds me our devl_lock debate where we had
> completely opposite views if rtnl_lock model is the right one.
> https://lore.kernel.org/netdev/20211101073259.33406da3@kicinski-fedora-PC1C0HJN/
>
> Let's not start arguing again; we had enough of that back then. :)
>
> Thanks