lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 30 Nov 2021 11:46:56 +0000
From:   "Russell King (Oracle)" <linux@...linux.org.uk>
To:     Heiner Kallweit <hkallweit1@...il.com>,
        Andrew Lunn <andrew@...n.ch>, Yinbo Zhu <zhuyinbo@...ngson.cn>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Michal Marek <michal.lkml@...kovi.net>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-kbuild@...r.kernel.org
Subject: Re: [PATCH v2 1/2] modpost: file2alias: fixup mdio alias garbled
 code in modules.alias

On Fri, Nov 26, 2021 at 11:21:03AM +0100, Heiner Kallweit wrote:
> On 26.11.2021 10:45, Yinbo Zhu wrote:
> > After module compilation, module alias mechanism will generate a ugly
> > mdio modules alias configure if ethernet phy was selected, this patch
> > is to fixup mdio alias garbled code.
> > 
> > In addition, that ugly alias configure will cause ethernet phy module
> > doens't match udev, phy module auto-load is fail, but add this patch
> > that it is well mdio driver alias configure match phy device uevent.
> > 
> I think Andrew asked you for an example already.
> For which PHY's the driver isn't auto-loaded?
> 
> In addition your commit descriptions are hard to read, especially the
> one for patch 2. Could you please try to change them to proper English?
> Not being a native speaker myself ..

Let's clear this up. PHY module loading is quite different - it does
_not_ use MODALIAS nor does it use the usual udev approach.

The modalias strings use aliases of the form "mdio:<semi-binary-string>"
with "?" used as a wildcard for each bit not in the mask. This is an
entirely appropriate scheme to use, as it allows matching an ID with
an arbitary mask. There is nothing wrong with this format - it may be
a bit on the long side, but it is an entirely valid solution.

The kernel has never generated a MODALIAS of this form, which is fine,
because we don't use MODALIAS or the uevent/udev approach to loading
the modules.

We instead use phy_request_driver_module() at PHY device creation time
to explicitly request modprobe to load a module of the form
"mdio:<binary-id>" which we know works (I have had the marvell10g and
bcm84881 modules autoloaded as a result of inserting SFPs.)

However, this won't work for PHY devices created _before_ the kernel
has mounted the rootfs, whether or not they end up being used. So,
every PHY mentioned in DT will be created before the rootfs is mounted,
and none of these PHYs will have their modules loaded.

I believe this is the root cause of Yinbo Zhu's issue.

However, changing the modalias format that we use is not a solution -
it _will_ cause DSA module loading to break. We've been here with the
SPI subsystem, where a patch was merged to change the modalias format
allegedly to fix loading of one or two modules, resulting in the
spi-nor driver failing to load (as it had done for years) - and the
resulting change was reverted and the revert backported to all the
stable trees. It created quite a mess. Linus has always been very clear
that if fixing one issue causes regressions, then the fix is wrong and
needs to be reverted. This is exactly what happened in the case of SPI.

This teaches us a lesson: changes to any modalias scheme that has been
in use for years need _extremely_ careful consideration and thorough
testing as they risk causing regressions. Without that, such changes
can result in difficult decisions where no matter what decision is
made, some breakage occurs as a result of sorting out the resulting
mess from not having considered the change carefully enough. It is far
better to avoid boxing oneself into a corner.

We can see that Yinbo Zhu's changes to fix his issue will cause
regressions with DSA, so it is simply an unacceptable fix. Reposting
the same code will _never_ change that fact. So please, Yinbo Zhu, stop
reposting your change. It is provably regression-creating and as such
will never be accepted. You also seem to be resistant to feedback -
I've asked you to separate out the "mdio_bus" change but you still have
not in your version 3 posted today. Therefore, I will assume that you
won't read this email, and in future if I see those patches again, I
will reply with a one-line "NAK" and a reference to this email.

We instead need a different approach to solving this issue. What that
approach is, I'm not sure right now - the tooling is setup to only
permit one MODALIAS published by the kernel, so we can't publish both
a DT based modalias and a mdio: based modalias together. It's one or
the other.

What we _could_ do is review all device trees and PHY drivers to see
whether DT modaliases are ever used for module loading. If they aren't,
then we _could_ make the modalias published by the kernel conditional
on the type of mdio device - continue with the DT approach for non-PHY
devices, and switch to the mdio: scheme for PHY devices. I repeat, this
can only happen if no PHY drivers match using the DT scheme, otherwise
making this change _will_ cause a regression.

The alternative is we simply declare that udev based module auto-loading
of PHY drivers required for PHYs in DT is simply not supported, and is
something we are unable to support. For something like root-NFS or IP
autoconfiguration by the kernel, that is already the case - the PHY
driver modules _must_ be built-in to the kernel in just the same way as
the network driver modules must be.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ