[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230613071959.GU12152@unreal>
Date: Tue, 13 Jun 2023 10:19:59 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Asmaa Mnebhi <asmaa@...dia.com>, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, cai.huoqing@...ux.dev, brgl@...ev.pl,
chenhao288@...ilicon.com, huangguangbin2@...wei.com,
David Thompson <davthompson@...dia.com>
Subject: Re: [PATCH net v2 1/1] mlxbf_gige: Fix kernel panic at shutdown
On Mon, Jun 12, 2023 at 05:05:21PM +0300, Vladimir Oltean wrote:
> On Mon, Jun 12, 2023 at 04:38:53PM +0300, Leon Romanovsky wrote:
> > On Mon, Jun 12, 2023 at 04:28:41PM +0300, Vladimir Oltean wrote:
> > > The sequence of operations is:
> > >
> > > * device_shutdown() walks the devices_kset backwards, thus shutting down
> > > children before parents
> > > * .shutdown() method of child gets called
> > > * .shutdown() method of parent gets called
> > > * parent implements .shutdown() as .remove()
> > > * the parent's .remove() logic calls device_del() on its children
> > > * .remove() method of child gets called
> >
> > But both child and parent are locked so they parent can't call to
> > child's remove while child is performing shutdown.
>
> Please view the call chain I've posted in an email client capable of
> showing the indentation correctly.
Thanks for the suggestion, right now I'm using mutt and lore to read
emails. Should I use another email client?
> The 2 lines:
>
> * .shutdown() method of child gets called
> * .shutdown() method of parent gets called
>
> have the same level of indentation because they occur sequentially
> within the same function.
Right
>
> This means 2 things:
>
> 1. when the parent runs its .shutdown(), the .shutdown() of the child
> has already finished
Right, it is done to make sure we release childs before parents.
>
> 2. device_shutdown() only locks "dev" and "dev->parent" for the duration
> of the "dev->driver->shutdown(dev)" procedure. However, the situation
> that you deem impossible due to locking is the dev->driver->shutdown(dev)
> of the parent device. That parent wasn't found through any dev->parent
> pointer, instead it is just another device in the devices_kset list.
> The logic of locking "dev" and "dev->parent" there won't help, since
> we would be locking the parent and the parent of the parent. This
> will obviously not prevent the original parent from calling any
> method of the original child - we're already one step higher in the
> hierarchy.
But once child finishes device_shutdown(), it will be removed from devices_kset
list and dev->driver should be NULL at that point for the child. In driver core,
dev->driver is the marker if driver is bound. It means parent/bus won't/shouldn't
call to anything driver related to child which doesn't have valid dev->driver pointer.
>
> So your objection above does not really apply.
We have a different opinion here.
>
> > BTW, I read the same device_shutdown() function before my first reply
> > and came to different conclusions from you.
>
> Well, you could try to experiment with putting ".shutdown = xxx_remove,"
> in some bus drivers and see what happens.
Like I said, this is a bug in bus logic which allows calls to device
which doesn't have driver bound to it.
>
> Admittedly it was a few years ago, but I did study this problem and I
> did have to fix real issues reported by real people based on the above
> observations (which here are reproduced only from memory):
> https://lore.kernel.org/all/20210920214209.1733768-2-vladimir.oltean@nxp.com/
I believe you, just think that behaviour found in i2c/spi isn't how
device model works.
Thanks
Powered by blists - more mailing lists