[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230612140521.tzhgliaok5u3q67o@skbuf>
Date: Mon, 12 Jun 2023 17:05:21 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Asmaa Mnebhi <asmaa@...dia.com>, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, cai.huoqing@...ux.dev, brgl@...ev.pl,
chenhao288@...ilicon.com, huangguangbin2@...wei.com,
David Thompson <davthompson@...dia.com>
Subject: Re: [PATCH net v2 1/1] mlxbf_gige: Fix kernel panic at shutdown
On Mon, Jun 12, 2023 at 04:38:53PM +0300, Leon Romanovsky wrote:
> On Mon, Jun 12, 2023 at 04:28:41PM +0300, Vladimir Oltean wrote:
> > The sequence of operations is:
> >
> > * device_shutdown() walks the devices_kset backwards, thus shutting down
> > children before parents
> > * .shutdown() method of child gets called
> > * .shutdown() method of parent gets called
> > * parent implements .shutdown() as .remove()
> > * the parent's .remove() logic calls device_del() on its children
> > * .remove() method of child gets called
>
> But both child and parent are locked so they parent can't call to
> child's remove while child is performing shutdown.
Please view the call chain I've posted in an email client capable of
showing the indentation correctly. The 2 lines:
* .shutdown() method of child gets called
* .shutdown() method of parent gets called
have the same level of indentation because they occur sequentially
within the same function.
This means 2 things:
1. when the parent runs its .shutdown(), the .shutdown() of the child
has already finished
2. device_shutdown() only locks "dev" and "dev->parent" for the duration
of the "dev->driver->shutdown(dev)" procedure. However, the situation
that you deem impossible due to locking is the dev->driver->shutdown(dev)
of the parent device. That parent wasn't found through any dev->parent
pointer, instead it is just another device in the devices_kset list.
The logic of locking "dev" and "dev->parent" there won't help, since
we would be locking the parent and the parent of the parent. This
will obviously not prevent the original parent from calling any
method of the original child - we're already one step higher in the
hierarchy.
So your objection above does not really apply.
> BTW, I read the same device_shutdown() function before my first reply
> and came to different conclusions from you.
Well, you could try to experiment with putting ".shutdown = xxx_remove,"
in some bus drivers and see what happens.
Admittedly it was a few years ago, but I did study this problem and I
did have to fix real issues reported by real people based on the above
observations (which here are reproduced only from memory):
https://lore.kernel.org/all/20210920214209.1733768-2-vladimir.oltean@nxp.com/
Powered by blists - more mailing lists