lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z3fTS5hOGawu18aH@shell.armlinux.org.uk>
Date: Fri, 3 Jan 2025 12:08:43 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Niklas Cassel <cassel@...nel.org>
Cc: Andrew Lunn <andrew@...n.ch>, Francesco Valla <francesco@...la.it>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	Heiner Kallweit <hkallweit1@...il.com>, netdev@...r.kernel.org,
	Anand Moon <linux.amoon@...il.com>
Subject: Re: [PATCH] net: phy: don't issue a module request if a driver is
 available

On Fri, Jan 03, 2025 at 12:25:52PM +0100, Niklas Cassel wrote:
> I'm trying to enable async probe for my PCIe controller (pcie-dw-rockchip),
> which on the radxa rock5b has a RTL8125 NIC connected to it.
> 
> By enabling async probe for the PCIe driver I get the same splat as Francesco.
> 
> Looking at the prints, it is trying to load a module for PHY ID: 0x1cc840
> This PHY ID is defined in: drivers/net/phy/realtek.c.
> 
> Looking at my .config I have:
> CONFIG_REALTEK_PHY=y
> 
> So this is not built as a module, so I am a bit surprised to see this
> splat (since the driver is built as built-in).
> 
> 
> I think it would be nice if the phylib core could be fixed so that
> it does not try to load modules for drivers which are built as built-in.
> 
> 
> Also see this old thread that tries to enable async probe by default on
> DT systems:
> https://lore.kernel.org/linux-kernel//d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/T/#u
> 
> AFAICT, it seems that the phylib core is one of the biggest blockers from
> being able to enable async probe by default on DT systems.

Yes, we accept that phylib is incompatible with async probing. I don't
think that's going to change, because it's fundamentally baked in with
the way the whole fallback driver stuff works.

We *certainly* don't want to move the request_module() into
phy_attach*() (which is the point where we require the driver to be
bound or we fallback to the generic feature-reduced driver). First,
that *will* break SFP modules, no ifs or buts.

Second, moving it there would mean calling request_module() in many
cases with the RTNL held, which blocks things like new connections
network establishing while the module is requested (I've run into this
problem when the TI Wilink driver locks up holding the RTNL lock making
the platform impossible to remotely resolve if there isn't an already
open SSH connection.) We certainly don't want userspace to be doing
stuff while holding the "big" RTNL that affects much of networking.

Third, I suspect phylib already has a race between the PHY driver /
driver core binding the appropriate driver and phy_attach_direct()
attaching the fallback generic driver to the driverless PHY device,
and making this more "async" is going to open that race possibly to
the point where it becomes a problem. (At the moment, it doesn't
seem to cause any issue, so is theoretical right now - but if one
reads the code, it's obvious that there is no locking that prevents
a race there.)

What saves phylib right now is that by issueing the request_module(),
that will wait for the module to be loaded and initialised. The
initialisation function will register the PHY drivers in this module.
As this is synchronous, it will happen before request_module() returns,
and thus before phy_device_create() returns. Thus, if there is a module
available for the PHY, it will be loaded and available to be bound
to the PHY device by the time phy_device_register() is called. This
ensures that - in the case of an auto-loaded module, the race will
never happen.

Yes, it's weak. A scenario that could trigger this is loading PHY
driver modules in parallel with a call to the phy_attach*() functions,
e.g. when bringing up a network interface where the network driver
calls through to phy_attach*() from its .ndo_open() method. If we
simply make phylib's request_module() async, then this race will be
opened for auto-loaded modules as well.

Closing this race to give consistent results is impossible, even if
we add locking. If phy_attach*() were to complete first, the generic
driver would be used despite the PHY specific driver module being
loaded. Alternatively, if the PHY specific driver module finishes
being loaded before phy_attach*() is called, then the PHY specific
driver will be used for the device. So... it needs to be synchronous.

I also don't think "make a list of built-in drivers and omit the
request module" is an acceptable workaround - it's a sticky plaster
for the problem. If the PHY driver isn't built-in, then you have the
same problem with request_module() being issued. You could work around
that by ensuring that the PHY driver is built-in, but then we're
relying on multiple different things all being correct in diverse
areas, which is fragile.

All in all, I don't see at the moment any simple way to make phylib
async-probe compatible.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ