lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 11 Nov 2021 08:57:54 +0100
From:   Uwe Kleine-König <u.kleine-koenig@...gutronix.de>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "David S. Miller" <davem@...emloft.net>, kernel@...gutronix.de,
        netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
        Vivien Didelot <vivien.didelot@...il.com>
Subject: Re: [PATCH net-next] net: dsa: Some cleanups in remove code

Hello Vladimir,

On Thu, Nov 11, 2021 at 12:56:11AM +0200, Vladimir Oltean wrote:
> On Wed, Nov 10, 2021 at 10:03:46PM +0100, Uwe Kleine-König wrote:
> > Hello Vladimir,
> > 
> > On Wed, Nov 10, 2021 at 03:15:40PM +0200, Vladimir Oltean wrote:
> > > On Tue, Nov 09, 2021 at 06:50:55PM +0100, Uwe Kleine-König wrote:
> > > > On Tue, Nov 09, 2021 at 01:54:34PM +0200, Vladimir Oltean wrote:
> > > > > Your commit prefix does not reflect the fact that you are touching the
> > > > > vsc73xx driver. Try "net: dsa: vsc73xx: ".
> > > > 
> > > > Oh, I missed that indeed.
> > > > 
> > > > > On Tue, Nov 09, 2021 at 12:39:21PM +0100, Uwe Kleine-König wrote:
> > > > > > vsc73xx_remove() returns zero unconditionally and no caller checks the
> > > > > > returned value. So convert the function to return no value.
> > > > > 
> > > > > This I agree with.
> > > > > 
> > > > > > For both the platform and the spi variant ..._get_drvdata() will never
> > > > > > return NULL in .remove() because the remove callback is only called after
> > > > > > the probe callback returned successfully and in this case driver data was
> > > > > > set to a non-NULL value.
> > > > > 
> > > > > Have you read the commit message of 0650bf52b31f ("net: dsa: be
> > > > > compatible with masters which unregister on shutdown")?
> > > > 
> > > > No. But I did now. I consider it very surprising that .shutdown() calls
> > > > the .remove() callback and would recommend to not do this. The commit
> > > > log seems to prove this being difficult.
> > > 
> > > Why do you consider it surprising?
> > 
> > In my book .shutdown should be minimal and just silence the device, such
> > that it e.g. doesn't do any DMA any more.
> 
> To me, the more important thing to consider is that many drivers lack
> any ->shutdown hook at all, and making their ->shutdown simply call
> ->remove is often times the least-effort path of doing something
> reasonable towards quiescing the hardware. Not to mention the lesser
> evil compared to not having a ->shutdown at all.
> 
> That's not to say I am not in favor of a minimal shutdown procedure if
> possible. Notice how DSA has dsa_switch_shutdown() vs dsa_unregister_switch().
> But judging what should go into dsa_switch_shutdown() was definitely not
> simple and there might still be corner cases that I missed - although it
> works for now, knock on wood.
> 
> The reality is that you'll have a very hard time convincing people to
> write a dedicated code path for shutdown, if you can convince them to
> write one at all. They wouldn't even know if it does all the right
> things - it's not like you kexec every day (unless you're using Linux as
> a bootloader - but then again, if you do that you're kind of asking for
> trouble - the reason why this is the case is exactly because not having
> a ->shutdown hook implemented by drivers is an option, and the driver
> core doesn't e.g. fall back to calling the ->remove method, even with
> all the insanity that might ensue).

Maybe I'm missing an important point here, but I thought it to be fine
for most drivers not to have a .shutdown hook.

> > > Many drivers implement ->shutdown by calling ->remove for the simple
> > > reason that ->remove provides for a well-tested code path already, and
> > > leaves the hardware in a known state, workable for kexec and others.
> > > 
> > > Many drivers have buses beneath them. Those buses go away when these
> > > drivers unregister, and so do their children.
> > > 
> > > ==============================================
> > > 
> > > => some drivers do both => children of these buses should expect to be
> > > potentially unregistered after they've been shut down.
> > 
> > Do you know this happens, or do you "only" fear it might happen?
> 
> Are you asking whether there are SPI controllers that implement
> ->shutdown as ->remove?

No I ask if it happens a lot / sometimes / ever that a driver's remove
callback is run for a device that was already shut down.

> Just search for "\.shutdown" in drivers/spi.
> 3 out of 3 implementations call ->remove.
> 
> If you really have time to waste, here, have fun: Lino Sanfilippo had
> not one, but two (!!!) reboot problems with his ksz9897 Ethernet switch
> connected to a Raspberry Pi, both of which were due to other drivers
> implementing their ->shutdown as ->remove. First driver was the DSA
> master/host port (bcmgenet), the other was the bcm2835_spi controller.
> https://patchwork.kernel.org/project/netdevbpf/cover/20210909095324.12978-1-LinoSanfilippo@gmx.de/
> https://patchwork.kernel.org/project/netdevbpf/cover/20210912120932.993440-1-vladimir.oltean@nxp.com/
> https://patchwork.kernel.org/project/netdevbpf/cover/20210917133436.553995-1-vladimir.oltean@nxp.com/
> As soon as we implemented ->shutdown in DSA drivers (which we had mostly
> not done up until that thread) we ran into the surprise that ->remove
> will get called too. Yay. It wasn't trivial to sort out, but we did it
> eventually in a more systematic way. Not sure whether there's anything
> to change at the drivers/base/ level.

What I wonder is: There are drivers that call .remove from .shutdown. Is
the right action to make other parts of the kernel robust with this
behaviour, or should the drivers changed to not call .remove from
.shutdown?

IMHO this is a question of promises of/expectations against the core
device layer. It must be known if for a shut down device there is (and
should be) a possibility that .remove is called. Depending on that
device drivers must be ready for this to happen, or can rely on it not
to happen.

From a global maintenance POV it would be good if it could not happen,
because then the complexity is concentrated to a small place (i.e. the
driver core, or maybe generic code in all subsystems) instead of making
each and every driver robust to this possible event that a considerable
part of the driver writers isn't aware of.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ