Message-ID: <35466c02-16da-0305-6d53-1c3bbf326418@gmail.com>
Date:   Thu, 9 Sep 2021 19:15:40 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Vladimir Oltean <olteanv@...il.com>,
        Lino Sanfilippo <LinoSanfilippo@....de>,
        Saravana Kannan <saravanak@...gle.com>
Cc:     p.rosenberger@...bus.com, woojung.huh@...rochip.com,
        UNGLinuxDriver@...rochip.com, andrew@...n.ch,
        vivien.didelot@...il.com, davem@...emloft.net, kuba@...nel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown



On 9/9/2021 3:54 PM, Vladimir Oltean wrote:
[snip]
> Question 6: How about a patch on the device core that is more lightweight?
> Wouldn't it be sensible for device_shutdown() to just call ->remove if
> the device's bus has no ->shutdown, and the device's driver doesn't have
> a ->shutdown either?
> 
> Answer: This would work only sometimes. The vast majority of DSA switch
> drivers and Ethernet controllers (in this case used as DSA masters) do
> not implement a .shutdown method, but their bus does: PCI does, and SPI
> controllers do most of the time. So it would work for limited
> scenarios, but would be ineffective in the general sense.

Having wondered about that question as well, I don't really see a
compelling reason why we do not default to calling .remove() when
.shutdown() is not implemented. In almost all cases, the semantics
of .remove() are superior to those required by .shutdown().
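To make the Question 6 trade-off concrete, here is a minimal user-space model of the dispatch that device_shutdown() performs: bus ->shutdown first, otherwise driver ->shutdown. It is extended with the proposed fallback to driver ->remove. All struct and function names are hypothetical stand-ins, not the kernel's definitions. The model shows why the fallback never fires for a remove-only driver that sits on a bus implementing ->shutdown, such as PCI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's bus and
 * driver structures; only the callbacks relevant here are modeled. */
struct model_device;

struct model_bus {
	void (*shutdown)(struct model_device *dev);
};

struct model_driver {
	void (*shutdown)(struct model_device *dev);
	void (*remove)(struct model_device *dev);
};

struct model_device {
	const struct model_bus *bus;
	const struct model_driver *driver;
	const char *last_action; /* which callback ended up running */
};

static void record_bus_shutdown(struct model_device *dev)
{
	dev->last_action = "bus shutdown";
}

static void record_driver_remove(struct model_device *dev)
{
	dev->last_action = "driver remove";
}

/* A bus whose ->shutdown never reaches the driver's ->remove (PCI-like),
 * and a bus with no ->shutdown at all. */
static const struct model_bus bus_with_shutdown = { .shutdown = record_bus_shutdown };
static const struct model_bus bus_without_shutdown = { .shutdown = NULL };

/* A driver with ->remove but no ->shutdown, like most DSA switch drivers. */
static const struct model_driver remove_only_driver = {
	.shutdown = NULL,
	.remove = record_driver_remove,
};

/* Bus ->shutdown wins, then driver ->shutdown; the third branch is the
 * fallback proposed in Question 6. */
static void model_device_shutdown(struct model_device *dev)
{
	assert(dev != NULL);

	if (dev->bus && dev->bus->shutdown)
		dev->bus->shutdown(dev);
	else if (dev->driver && dev->driver->shutdown)
		dev->driver->shutdown(dev);
	else if (dev->driver && dev->driver->remove)
		dev->driver->remove(dev); /* proposed fallback */
	else
		dev->last_action = "nothing";
}
```

When the bus has its own ->shutdown, the remove-only driver's teardown is skipped entirely. Only on a bus without ->shutdown does the fallback reach ->remove.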

> 
> Question 7: I said that .shutdown, as opposed to .remove, doesn't really
> care so much about the integrity of data structures. So how far should
> we really go to fix this issue? Should we even bother to unbind the
> whole DSA tree, when the sole problem is that we are the DSA master's
> upper, and that is keeping a reference on it?
> 
> Answer: Well, any solution that does unnecessary data structure teardown
> only delays the reboot for nothing. Lino's patch just bluntly calls
> dsa_tree_teardown() from the switch .shutdown method, and this leaks
> memory, namely dst->ports. But does this really matter? Nope, so let's
> extrapolate. In this case, IMO, the simplest possible solution would be
> to patch bcmgenet to not unregister the net device. Then treat every
> other DSA master driver the same way, one by one, as they come.
> Do you need to unregister_netdevice() at shutdown? No. Then don't.
> Is it nice? Probably not, but I'm not seeing alternatives.

It does not really scale, but we also don't have that many DSA masters to
support; I believe I can name them all: bcmgenet, stmmac, bcmsysport,
enetc, mv643xx_eth, cpsw, macb. If you want me to patch bcmgenet, give
me a few days to test and make sure there is no power management
regression; that is my primary concern.

> 
> Also, unless I'm missing something, Lino probably still sees the WARN_ON
> in unregister_netdevice(), called from bcmgenet, about eth0 getting
> unregistered while having an upper interface. If not, it's by sheer luck
> that the DSA switch's ->shutdown gets called before bcmgenet's ->shutdown.
> For this reason, it isn't a great solution either. Ideally, the device
> links would guarantee us some sort of shutdown ordering (what we ideally
> want, as mentioned, is for the DSA switch driver to get _unbound_
> (->remove) before the DSA master gets unbound or shut down).
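The ordering described above can be modeled roughly as follows. The model assumes, as in the kernel, that shutdown walks the global device list from the tail, and that adding a device link moves the consumer to the tail, after its supplier. Everything here is a hypothetical simplification: model_dev, model_link_add(), and treating the switch's shutdown as a full unbind. The real reordering also moves children and transitive consumers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical model, not kernel code: a DSA master whose net device has
 * "uppers" (the switch's user ports), and a switch that holds them. */
struct model_dev {
	const char *name;
	struct model_dev *master; /* non-NULL for the switch */
	int uppers;               /* master only: upper interfaces still attached */
	int warned;               /* master only: "WARN_ON" hit during unregister */
};

/* Global device list, in registration order. */
static struct model_dev *dev_list[MAX_DEVS];
static int dev_count;

static void model_register(struct model_dev *d)
{
	assert(dev_count < MAX_DEVS);
	dev_list[dev_count++] = d;
}

/* Loosely mirrors a device link reordering its consumer to the tail,
 * so the consumer now sits after its supplier. */
static void model_link_add(struct model_dev *consumer)
{
	int i, j;

	for (i = 0; i < dev_count; i++) {
		if (dev_list[i] == consumer) {
			for (j = i; j < dev_count - 1; j++)
				dev_list[j] = dev_list[j + 1];
			dev_list[dev_count - 1] = consumer;
			return;
		}
	}
}

static void model_shutdown_one(struct model_dev *d)
{
	if (d->master) {
		/* Switch: unbinding releases its uppers on the master. */
		d->master->uppers = 0;
	} else if (d->uppers > 0) {
		/* Master: unregistering while an upper exists would WARN. */
		d->warned = 1;
	}
}

/* Shut devices down in reverse list order, tail first. */
static void model_device_shutdown(void)
{
	int i;

	for (i = dev_count - 1; i >= 0; i--)
		model_shutdown_one(dev_list[i]);
}
```

If the switch happens to be registered before the master and there is no link, the master is torn down first and the warning fires. With the link added, the switch moves behind the master and is handled first.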
> 

All of your questions are good, and I don't have answers to any of them.
However, I would like you and others to reason about .shutdown() not just
in the context of a reboot or a kexec'd kernel, but also in the context of
putting the system into ACPI S5 (via poweroff). In that case, the goal is
not only to quiesce the device but also to put it in a low-power mode.

For bcmgenet specifically, the code path that leads to a driver remove is
well tested and guarantees the network device unregistration, thus
putting the PHY into suspend, shutting down DMAs, and turning off clocks.
This is a big hammer, but it gets the job done and does not introduce
yet another code path to test: it's the same as module removal.
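Below is a minimal model of the approach described above, in which .shutdown simply reuses the remove path. The struct and the function names are hypothetical and are not bcmgenet's actual code. It relies on one property of the real kernel: unregistering a net device that is up closes it first. Because of that, a single path covers PHY suspend, DMA shutdown and clock gating for both module removal and poweroff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state; not bcmgenet's real private data. */
struct mac_priv {
	bool up;                /* interface administratively up */
	bool netdev_registered;
	bool phy_suspended;
	bool dma_running;
	bool clocks_on;
};

/* ndo_stop-like close: quiesce DMA and suspend the PHY. */
static void mac_stop(struct mac_priv *priv)
{
	priv->dma_running = false;
	priv->phy_suspended = true;
	priv->up = false;
}

/* Unregistering an interface that is up closes it first. */
static void mac_unregister_netdev(struct mac_priv *priv)
{
	if (priv->up)
		mac_stop(priv);
	priv->netdev_registered = false;
}

/* Remove path, also used for module removal. */
static void mac_remove(struct mac_priv *priv)
{
	mac_unregister_netdev(priv);
	priv->clocks_on = false; /* gate clocks once nothing uses them */
}

/* Shutdown reuses remove: no extra code path to test. */
static void mac_shutdown(struct mac_priv *priv)
{
	mac_remove(priv);
}
```

The design choice is the one argued for in the mail: shutdown inherits whatever low-power handling the remove path already has, at the cost of doing a full teardown.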
-- 
Florian
