Message-ID: <20230613071959.GU12152@unreal>
Date: Tue, 13 Jun 2023 10:19:59 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
	Asmaa Mnebhi <asmaa@...dia.com>, davem@...emloft.net,
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
	netdev@...r.kernel.org, cai.huoqing@...ux.dev, brgl@...ev.pl,
	chenhao288@...ilicon.com, huangguangbin2@...wei.com,
	David Thompson <davthompson@...dia.com>
Subject: Re: [PATCH net v2 1/1] mlxbf_gige: Fix kernel panic at shutdown

On Mon, Jun 12, 2023 at 05:05:21PM +0300, Vladimir Oltean wrote:
> On Mon, Jun 12, 2023 at 04:38:53PM +0300, Leon Romanovsky wrote:
> > On Mon, Jun 12, 2023 at 04:28:41PM +0300, Vladimir Oltean wrote:
> > > The sequence of operations is:
> > > 
> > > * device_shutdown() walks the devices_kset backwards, thus shutting down
> > >   children before parents
> > >   * .shutdown() method of child gets called
> > >   * .shutdown() method of parent gets called
> > >     * parent implements .shutdown() as .remove()
> > >       * the parent's .remove() logic calls device_del() on its children
> > >         * .remove() method of child gets called
> > 
> > But both child and parent are locked, so the parent can't call the
> > child's remove while the child is performing shutdown.
> 
> Please view the call chain I've posted in an email client capable of
> showing the indentation correctly. 

Thanks for the suggestion, right now I'm using mutt and lore to read
emails. Should I use another email client?

> The 2 lines:
> 
>    * .shutdown() method of child gets called
>    * .shutdown() method of parent gets called
> 
> have the same level of indentation because they occur sequentially
> within the same function.

Right

> 
> This means 2 things:
> 
> 1. when the parent runs its .shutdown(), the .shutdown() of the child
>    has already finished

Right, this is done to make sure we release children before parents.

> 
> 2. device_shutdown() only locks "dev" and "dev->parent" for the duration
>    of the "dev->driver->shutdown(dev)" procedure. However, the situation
>    that you deem impossible due to locking is the dev->driver->shutdown(dev)
>    of the parent device. That parent wasn't found through any dev->parent
>    pointer, instead it is just another device in the devices_kset list.
>    The logic of locking "dev" and "dev->parent" there won't help, since
>    we would be locking the parent and the parent of the parent. This
>    will obviously not prevent the original parent from calling any
>    method of the original child - we're already one step higher in the
>    hierarchy.

But once the child finishes device_shutdown(), it will be removed from the
devices_kset list, and dev->driver should be NULL at that point for the child.
In the driver core, dev->driver is the marker of whether a driver is bound. It
means the parent/bus won't/shouldn't call anything driver-related on a child
which doesn't have a valid dev->driver pointer.
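
To make the two readings concrete, here is a minimal userspace sketch, not
kernel code. The toy struct device, model_shutdown() and the
unbind_after_shutdown flag are all hypothetical names made up for this
illustration. The flag only selects which assumption is modelled: whether
dev->driver is still set on the child by the time the parent's .shutdown()
(implemented as .remove()) runs.

```c
#include <stdio.h>
#include <string.h>

struct device;

struct driver {
	void (*shutdown)(struct device *dev);
	void (*remove)(struct device *dev);
};

/* Toy stand-in for struct device; at most one child per device. */
struct device {
	const char *name;
	struct device *parent;
	struct device *child;
	const struct driver *driver;	/* non-NULL while a driver is bound */
};

static char trace[256];			/* ordered log of callback invocations */

static void log_call(const char *what, const struct device *dev)
{
	size_t used = strlen(trace);

	snprintf(trace + used, sizeof(trace) - used, "%s(%s);", what, dev->name);
}

static void child_shutdown(struct device *dev) { log_call("shutdown", dev); }
static void child_remove(struct device *dev) { log_call("remove", dev); }

static void parent_remove(struct device *dev)
{
	struct device *child = dev->child;

	log_call("remove", dev);
	/* Stands in for device_del(child): unbind only if a driver is bound. */
	if (child && child->driver) {
		child->driver->remove(child);
		child->driver = NULL;
	}
}

/* The parent implements .shutdown() as .remove(). */
static void parent_shutdown(struct device *dev)
{
	log_call("shutdown", dev);
	parent_remove(dev);
}

static const struct driver child_drv = { child_shutdown, child_remove };
static const struct driver parent_drv = { parent_shutdown, parent_remove };

/* Walk the devices child-first and return the resulting call trace. */
static const char *model_shutdown(int unbind_after_shutdown)
{
	struct device parent = { "parent", NULL, NULL, &parent_drv };
	struct device child = { "child", &parent, NULL, &child_drv };
	struct device *order[] = { &child, &parent };
	size_t i;

	parent.child = &child;
	trace[0] = '\0';

	for (i = 0; i < 2; i++) {
		struct device *dev = order[i];

		/* The real loop holds dev and dev->parent locked around this call. */
		if (dev->driver && dev->driver->shutdown)
			dev->driver->shutdown(dev);
		/* Assumption under discussion: is the driver pointer cleared here? */
		if (unbind_after_shutdown)
			dev->driver = NULL;
	}
	return trace;
}
```

In this model, model_shutdown(0) gives
"shutdown(child);shutdown(parent);remove(parent);remove(child);" while
model_shutdown(1) gives "shutdown(child);shutdown(parent);remove(parent);".
So the disagreement comes down to which value of the flag matches the
driver core.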

> 
> So your objection above does not really apply.

We have a different opinion here.

> 
> > BTW, I read the same device_shutdown() function before my first reply
> > and came to different conclusions from you.
> 
> Well, you could try to experiment with putting ".shutdown = xxx_remove,"
> in some bus drivers and see what happens.

Like I said, this is a bug in the bus logic, which allows calls to a device
which doesn't have a driver bound to it.

> 
> Admittedly it was a few years ago, but I did study this problem and I
> did have to fix real issues reported by real people based on the above
> observations (which here are reproduced only from memory):
> https://lore.kernel.org/all/20210920214209.1733768-2-vladimir.oltean@nxp.com/

I believe you, I just think that the behaviour found in i2c/spi isn't how
the device model works.

Thanks
