lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 12 Jun 2023 17:05:21 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
	Asmaa Mnebhi <asmaa@...dia.com>, davem@...emloft.net,
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
	netdev@...r.kernel.org, cai.huoqing@...ux.dev, brgl@...ev.pl,
	chenhao288@...ilicon.com, huangguangbin2@...wei.com,
	David Thompson <davthompson@...dia.com>
Subject: Re: [PATCH net v2 1/1] mlxbf_gige: Fix kernel panic at shutdown

On Mon, Jun 12, 2023 at 04:38:53PM +0300, Leon Romanovsky wrote:
> On Mon, Jun 12, 2023 at 04:28:41PM +0300, Vladimir Oltean wrote:
> > The sequence of operations is:
> > 
> > * device_shutdown() walks the devices_kset backwards, thus shutting down
> >   children before parents
> >   * .shutdown() method of child gets called
> >   * .shutdown() method of parent gets called
> >     * parent implements .shutdown() as .remove()
> >       * the parent's .remove() logic calls device_del() on its children
> >         * .remove() method of child gets called
> 
> But both child and parent are locked so they parent can't call to
> child's remove while child is performing shutdown.

Please view the call chain I've posted in an email client capable of
showing the indentation correctly. The 2 lines:

   * .shutdown() method of child gets called
   * .shutdown() method of parent gets called

have the same level of indentation because they occur sequentially
within the same function.

This means 2 things:

1. when the parent runs its .shutdown(), the .shutdown() of the child
   has already finished

2. device_shutdown() only locks "dev" and "dev->parent" for the duration
   of the "dev->driver->shutdown(dev)" procedure. However, the situation
   that you deem impossible due to locking is the dev->driver->shutdown(dev)
   of the parent device. That parent wasn't found through any dev->parent
   pointer, instead it is just another device in the devices_kset list.
   The logic of locking "dev" and "dev->parent" there won't help, since
   we would be locking the parent and the parent of the parent. This
   will obviously not prevent the original parent from calling any
   method of the original child - we're already one step higher in the
   hierarchy.

So your objection above does not really apply.

> BTW, I read the same device_shutdown() function before my first reply
> and came to different conclusions from you.

Well, you could try to experiment with putting ".shutdown = xxx_remove,"
in some bus drivers and see what happens.

Admittedly it was a few years ago, but I did study this problem and I
did have to fix real issues reported by real people based on the above
observations (which here are reproduced only from memory):
https://lore.kernel.org/all/20210920214209.1733768-2-vladimir.oltean@nxp.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ