Message-ID: <35466c02-16da-0305-6d53-1c3bbf326418@gmail.com>
Date: Thu, 9 Sep 2021 19:15:40 -0700
From: Florian Fainelli <f.fainelli@...il.com>
To: Vladimir Oltean <olteanv@...il.com>,
Lino Sanfilippo <LinoSanfilippo@....de>,
Saravana Kannan <saravanak@...gle.com>
Cc: p.rosenberger@...bus.com, woojung.huh@...rochip.com,
UNGLinuxDriver@...rochip.com, andrew@...n.ch,
vivien.didelot@...il.com, davem@...emloft.net, kuba@...nel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown
On 9/9/2021 3:54 PM, Vladimir Oltean wrote:
[snip]
> Question 6: How about a patch on the device core that is more lightweight?
> Wouldn't it be sensible for device_shutdown() to just call ->remove if
> the device's bus has no ->shutdown, and the device's driver doesn't have
> a ->shutdown either?
>
> Answer: This would sometimes work, the vast majority of DSA switch
> drivers, and Ethernet controllers (in this case used as DSA masters) do
> not have a .shutdown method implemented. But their bus does: PCI does,
> SPI controllers do, most of the time. So it would work for limited
> scenarios, but would be ineffective in the general sense.
Having wondered about that question as well, I don't really see a
compelling reason not to default to calling .remove() when .shutdown()
is not implemented. In almost all cases the semantics of .remove() are
stronger than those required by .shutdown().
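
To make the proposal and its limitation concrete, here is a minimal userspace sketch. It is a model only: every struct and function name in it (model_device, model_bus, model_driver, model_device_shutdown() and so on) is hypothetical and is not the driver core's code. It assumes the dispatch priority discussed above, bus ->shutdown first and driver ->shutdown second, and adds the suggested last-resort call to ->remove.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: simplified stand-ins, not the kernel's
 * struct device, struct bus_type or struct device_driver. */
enum model_call { CALL_NONE, CALL_BUS_SHUTDOWN, CALL_DRV_SHUTDOWN, CALL_DRV_REMOVE };

struct model_device;

struct model_bus {
	void (*shutdown)(struct model_device *dev);
};

struct model_driver {
	void (*remove)(struct model_device *dev);
	void (*shutdown)(struct model_device *dev);
};

struct model_device {
	const struct model_bus *bus;
	const struct model_driver *drv;
	enum model_call last;	/* last callback that ran on this device */
};

/* Bus ->shutdown wins, then driver ->shutdown; the final branch is the
 * proposed (hypothetical) fallback to ->remove. */
static void model_device_shutdown(struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->bus && dev->bus->shutdown)
		dev->bus->shutdown(dev);
	else if (dev->drv && dev->drv->shutdown)
		dev->drv->shutdown(dev);
	else if (dev->drv && dev->drv->remove)
		dev->drv->remove(dev);
}

/* A bus whose ->shutdown only forwards to the driver's ->shutdown. */
static void forwarding_bus_shutdown(struct model_device *dev)
{
	dev->last = CALL_BUS_SHUTDOWN;
	if (dev->drv && dev->drv->shutdown)
		dev->drv->shutdown(dev);
}

static void model_drv_remove(struct model_device *dev)
{
	dev->last = CALL_DRV_REMOVE;
}

static const struct model_bus bus_with_shutdown = {
	.shutdown = forwarding_bus_shutdown,
};

/* A driver that implements ->remove but no ->shutdown. */
static const struct model_driver remove_only_driver = {
	.remove = model_drv_remove,
	.shutdown = NULL,
};
```

With a device that has no bus-level ->shutdown, the model falls through to ->remove. With bus_with_shutdown the fallback branch is never reached, which is the limitation Vladimir points out: the fallback would have to live in each bus's ->shutdown as well to be effective in general.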
>
> Question 7: I said that .shutdown, as opposed to .remove, doesn't really
> care so much about the integrity of data structures. So how far should
> we really go to fix this issue? Should we even bother to unbind the
> whole DSA tree, when the sole problem is that we are the DSA master's
> upper, and that is keeping a reference on it?
>
> Answer: Well, any solution that does unnecessary data structure teardown
> only delays the reboot for nothing. Lino's patch just bluntly calls
> dsa_tree_teardown() from the switch .shutdown method, and this leaks
> memory, namely dst->ports. But does this really matter? Nope, so let's
> extrapolate. In this case, IMO, the simplest possible solution would be
> to patch bcmgenet to not unregister the net device. Then treat every
> other DSA master driver in the same way as they come, one by one.
> Do you need to unregister_netdevice() at shutdown? No. Then don't.
> Is it nice? Probably not, but I'm not seeing alternatives.
It does not really scale, but we also don't have that many DSA masters
to support. I believe I can name them all: bcmgenet, stmmac, bcmsysport,
enetc, mv643xx_eth, cpsw, macb. If you want me to patch bcmgenet, give
me a few days to test and make sure there is no power management
regression; that's the primary concern I have.
>
> Also, unless I'm missing something, Lino probably still sees the WARN_ON
> in bcmgenet's unregister_netdevice() about eth0 getting unregistered
> while having an upper interface. If not, it's by sheer luck that the DSA
> switch's ->shutdown gets called before bcmgenet's ->shutdown. But for
> this reason, it isn't a great solution either. If the device links can't
> guarantee us some sort of shutdown ordering (what we ideally want, as
> mentioned, is for the DSA switch driver to get _unbound_ (->remove)
> before the DSA master gets unbound or shut down).
>
All of your questions are good and I don't have answers to any of them.
However, I would like you and others to reason about .shutdown() not
just in the context of a reboot or a kexec'd kernel, but also in the
context of putting the system into ACPI S5 (via poweroff). In that case
the goal is not only to quiesce the device but also to put it in a low
power mode.

For bcmgenet specifically, the code path that leads to a driver remove
is well tested and guarantees the network device unregistration, thus
putting the PHY into suspend, shutting down DMAs, and turning off clocks.
This is a big hammer, but it gets the job done and does not introduce
yet another code path to test: it's the same as the module removal.
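
As a rough illustration of that "big hammer", the toy model below routes shutdown through the same path as remove. It is a userspace sketch, not the bcmgenet code: struct toy_nic, its fields and the toy_nic_*() functions are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an Ethernet controller's state; all names are
 * hypothetical and do not come from bcmgenet. */
struct toy_nic {
	bool netdev_registered;
	bool phy_suspended;
	bool dma_running;
	bool clocks_on;
};

/* What closing the interface does in this model: quiesce and power down. */
static void toy_nic_stop(struct toy_nic *nic)
{
	nic->dma_running = false;
	nic->phy_suspended = true;
	nic->clocks_on = false;
}

/* Remove path: unregistering the net device closes it first, so the
 * hardware ends up quiesced and in a low power state. */
static void toy_nic_remove(struct toy_nic *nic)
{
	assert(nic != NULL);
	if (!nic->netdev_registered)
		return;		/* already torn down, nothing to do */
	toy_nic_stop(nic);
	nic->netdev_registered = false;
}

/* Shutdown reuses the remove path rather than adding a second one. */
static void toy_nic_shutdown(struct toy_nic *nic)
{
	toy_nic_remove(nic);
}
```

The point of the design is that toy_nic_shutdown() has no logic of its own, so testing module removal also exercises the poweroff path. The cost, as discussed above, is that the net device gets unregistered at shutdown.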
--
Florian