lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2193347.yiUUSuA9gR@fedora.fritz.box>
Date: Fri, 03 Jan 2025 18:57:17 +0100
From: Francesco Valla <francesco@...la.it>
To: Niklas Cassel <cassel@...nel.org>, Andrew Lunn <andrew@...n.ch>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>,
 Andrew Lunn <andrew+netdev@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
 netdev@...r.kernel.org, Anand Moon <linux.amoon@...il.com>,
 Luis Chamberlain <mcgrof@...nel.org>, Saravana Kannan <saravanak@...gle.com>
Subject:
 Re: [PATCH] net: phy: don't issue a module request if a driver is available

Hi,

On Friday, 3 January 2025 at 16:37:33 Andrew Lunn <andrew@...n.ch> wrote:
> On Fri, Jan 03, 2025 at 03:34:26PM +0100, Niklas Cassel wrote:
> > On Fri, Jan 03, 2025 at 03:12:14PM +0100, Andrew Lunn wrote:
> > > > FWIW, the patch in $subject does make the splat go away for me.
> > > > (I have the PHY driver built as built-in).
> > > > 
> > > > The patch in $subject does "Add a list of registered drivers and check
> > > > if one is already available before resorting to call request_module();
> > > > in this way, if the PHY driver is already there, the MDIO bus can perform
> > > > the async probe."
> > > 
> > > Lets take a step backwards.
> > > 
> > > How in general should module loading work with async probe? Lets
> > > understand that first.
> > > 
> > > Then we can think about what is special about PHYs, and how we can fit
> > > into the existing framework.
> > 
> > I agree that it might be a good idea, if possible, for request_module()
> > itself to see if the requested module is already registered, and then do
> > nothing.
> > 
> > Adding Luis (modules maintainer) to CC.
> > 
> > Luis, phylib calls request_module() unconditionally, regardless if there
> > is a driver registered already (phy driver built as built-in).
> > 
> > This causes the splat described here to be printed:
> > https://lore.kernel.org/netdev/7103704.9J7NaK4W3v@fedora.fritz.box/T/#u
> > 
> > But as far as I can tell, this is a false positive, since the call cannot
> > possibly load any module (since the driver is built as built-in and not
> > as a module).
> 
> Please be careful here. Just because it is built in for your build
> does not mean it is built in for everybody. The code needs to be able
> to load the module if it is not built in.
> 

I fully agree here - moreover, the same WARN would be triggered if async probe
is used for a NIC driver which is built as a module, with PHY drivers
themselves being modules.

> Also, i've seen broken builds where the driver is both built in and
> exists as a module. User space will try to load the module, and the
> linker will then complain about duplicate symbols and the load fails.
> So it is not a false positive. There is also the question of what
> exactly causes the deadlock. Is simply calling user space to load a
> module which does not exist sufficient to cause the deadlock? That is
> what is happening in your system, with built in modules.
> 

The deadlock (which I did NOT observed) would be caused by a module init
function calling async_synchronize_full(); quoting from __request_module():

    /*
     * We don't allow synchronous module loading from async.  Module
     * init may invoke async_synchronize_full() which will end up
     * waiting for this task which already is waiting for the module
     * loading to complete, leading to a deadlock.
     */
    WARN_ON_ONCE(wait && current_is_async());

The deadlock was observed way back and the code above was added in 2013 [1];
some more work has been done during the years [2].

> Also, the kernel probably has no idea if the module has already been
> loaded nor not when request_module() is called. What we pass to the
> request_module() is not the name of the module, but a string which
> represents one of the IDs the PHY has. It is then userspace that maps
> that string to a kernel module via modules.alias. And there is a many
> to one mapping, many IDs map to one module. So it could be the module
> has already been loaded due to some other string.
> 
> So, back to my original question. How should module loading work with
> async probe?
>

Today it is not meant to work [1]; that was mainly the reason for my
original patch. Maybe the best solution would be to work on that.


[1] https://lore.kernel.org/all/20130118221227.GG24579@htj.dyndns.org/
[2] https://lore.kernel.org/lkml/YfsruBT19o3j0KD2@bombadil.infradead.org/T/#m4a8579a9e9a2508bea66223906e1917e164e6c70


Thank you!


Regards,
Francesco




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ