Message-ID: <Y3IonmwrJ3aqDbAw@kroah.com>
Date:   Mon, 14 Nov 2022 12:38:06 +0100
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Javier Martinez Canillas <javierm@...hat.com>
Cc:     linux-kernel@...r.kernel.org,
        Saravana Kannan <saravanak@...gle.com>,
        Peter Robinson <pbrobinson@...hat.com>,
        Steev Klimaszewski <steev@...i.org>,
        Rob Herring <robh@...nel.org>,
        Sergio Lopez Pascual <slp@...hat.com>,
        Enric Balletbo i Serra <eballetbo@...hat.com>,
        John Stultz <jstultz@...gle.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH] driver core: Disable driver deferred probe timeout by
 default

On Mon, Nov 14, 2022 at 12:13:15PM +0100, Javier Martinez Canillas wrote:
> Hello Greg,
> 
> Thanks a lot for your feedback.
> 
> On 11/14/22 11:54, Greg Kroah-Hartman wrote:
> 
> [...]
> 
> >>
> >> This default value of 0 was reverted again by commit f516d01b9df2 ("Revert
> >> "driver core: Set default deferred_probe_timeout back to 0."") and set to
> >> 10 seconds instead. Which was still less than the 30 seconds that was set
> >> at some point to allow systems with drivers built as modules and loaded by
> >> user-land to probe drivers that were previously deferred.
> >>
> >> The 10 seconds timeout isn't enough for the mentioned systems, for example
> >> general purpose distributions attempt to build all the possible drivers as
> >> a module to keep the Linux kernel image small. But that means it is very
> >> likely that the probe deferral mechanism will time out and drivers won't be
> >> probed correctly.
> > 
> > What specific "mentioned systems" have deferred probe drivers that are
> 
> The "mentioned systems" are the ones mentioned in the paragraph above:
> 
> "to allow systems with drivers built as modules and loaded by user-land to
> probe drivers that were previously deferred."
> 
> I even gave an example about general purpose distributions that build as
> much as possible as a module. What more info do you think is missing?

Exact systems that this is failing on would be great to have.

> > failing on the current value?  What drivers are causing the long delay
> > here?  No one should be having to wait 10 seconds for a deferred delay
> > on a real system as that feels really wrong.
> >
> 
> Not really, it depends if the drivers are built-in, built as modules, in
> the initramfs or in the rootfs. 10 seconds might not be enough if these
> modules are in the root partition and need to wait for this to be mounted
> and udev to load the modules, etc.

How does it take 10 seconds to load the initramfs for a system that
requires deferred probe devices?  What type of hardware is this?

> Also, it may even be that the module alias is not correct and then users
> have to load them explicitly via /etc/modules-load.d/ configs and so
> on.

Then that's a totally different issue and the module alias needs to be
fixed and is not relevant here.

> > Why not fix the drivers that are causing this delay and maybe move them
> > to be async so as to not slow down the whole boot process?
> >
> 
> Yes, these drivers could be fixed to report a proper module alias or the
> dependencies can be built-in or added to the initramfs, but that does not
> change the fact that by default the kernel shouldn't make assumptions
> about when it is safe to just time out instead of returning -EPROBE_DEFER.

Please let me know which drivers these are that are causing problems so
we can fix them.

> Because with modules the kernel has no way to know when all the modules
> have already been loaded by user-space or whether more drivers are going
> to be registered in the future.

Of course that is true, so we guess, and so far, 10 seconds is a good
enough guess for normal systems out there that use deferred probe.  What
exact system and drivers do not work with this today?

> Also, that's how probe deferral always worked since the mechanism was
> introduced. It's just recently that the behavior was changed to time out.
> 
> A nice feature of the probe deferral mechanism is that it was simple and
> reliable. Adding a timeout makes it non-deterministic and more fragile IMO.

Deferred probe was never simple or reliable or deterministic.  It was a
hack we had to implement to handle complex hardware situations and
loadable drivers.  Let's not try to paper over driver bugs here by
making the timeout "forever" but rather fix the root problem in the
broken drivers.

So, what drivers do we need to fix up?

thanks,

greg k-h
