Message-ID: <e35b1031-5663-4d7f-b689-73cc25a9ecfb@oss.qualcomm.com>
Date: Thu, 5 Feb 2026 10:17:40 +0100
From: Hans de Goede <johannes.goede@....qualcomm.com>
To: Bjorn Andersson <andersson@...nel.org>
Cc: Saravana Kannan <saravanak@...nel.org>, Rob Herring <robh@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] driver core: Make deferred_probe_timeout default a
Kconfig option
Hi Bjorn,
Thank you for your comments.
On 4-Feb-26 22:52, Bjorn Andersson wrote:
> On Wed, Feb 04, 2026 at 04:00:45PM +0100, Hans de Goede wrote:
>
> Thanks for posting this, Hans. Let's loop in Saravana and Rob as well,
> who looked at this subject in the past.
>
>> Code using driver_deferred_probe_check_state() differs from most
>> EPROBE_DEFER handling in the kernel. Where other EPROBE_DEFER handling
>> (e.g. clks, gpios and regulators) waits indefinitely for suppliers to
>> show up, code using driver_deferred_probe_check_state() will fail
>> after the deferred_probe_timeout.
>>
>> This is a problem for generic distro kernels which want to support many
>> boards using a single kernel build. These kernels want as many drivers to
>> be modular as possible. The initrd also should be as small as possible,
>> so the initrd will *not* have drivers that are not needed to mount the rootfs.
>>
>
> This problem manifests itself in the upstream kernel, for upstream
> developers as well.
>
> On some platforms we have intermittent boot failures even when testing
> with a minimal ramdisk (with kernel modules overlaid), because of the
> non-deterministic module loading order it might take time before we get
> the providers lined up.
>
> Another concrete issue is that the Qualcomm CPUfreq driver, while
> builtin, on many targets has dependencies on drivers that we today mark
> as modules. So with a decently sized ramdisk we don't have time to
> unpack the ramdisk before things start breaking.
>
>
> The typical symptom I see when this happens is that the SMMU fails to
> find its power-domain provider, in some cases the result is
> non-functional system, but often the hardware state ends up such that
> the board resets...
>
>> Combine this with waiting for a full-disk encryption password in
>> the initrd and it is pretty much guaranteed that the default 10s timeout
>> will be hit, causing probe() failures when drivers on the rootfs happen
>> to get modprobe-d before other rootfs modules providing their suppliers.
>>
>
> Indeed, LUKS is a challenge, performing any form of debugging of what
> kernel modules you forgot to inject into your ramdisk is impossible.
>
>> Make the default timeout configurable from Kconfig to allow distro kernel
>> configs where many of the supplier drivers are modules to set the default
>> through Kconfig and allow using a value of -1 to disable the timeout
>> (wait indefinitely).
>>
>
> The timeout mechanism was introduced to handle those exceptional cases
> where distro-kernels are missing specific provider drivers but still
> want to roll the dice and try to reach a functional user space to allow
> the user to correct the issue.
>
> There's clearly many situations where that will not work in today's
> kernel - and as we evolve sync_state, this problem is going to grow.
>
>
> I therefore would, once again, like to see the default value to be "no
> timeout". We can keep the option for the user to opt-in to the
> alternative (riskier) path. For this the command line option would
> suffice, but with a new default.
>
>
> The added Kconfig option of course would allow distributions to set the
> default to -1, but I'd prefer to provide a sane default value.
AFAICT opinions were divided when this was discussed before,
which is why I've chosen to just make the default configurable so
that distros/people can choose.
I'm not necessarily against making -1 the default, but I think that
might be a hard sell to some people.
Note that if this lands you can always make the default -1 for
qcom specific defconfigs.
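As a rough sketch, assuming the symbol keeps the name proposed in this
patch (DRIVER_DEFERRED_PROBE_TIMEOUT) and that -1 keeps the "wait
indefinitely" meaning described in the documentation hunk, such a
defconfig override would be a single line:

```
# Hypothetical defconfig fragment; symbol name taken from this patch.
# -1 = never time out, i.e. keep returning -EPROBE_DEFER until the
# supplier shows up.
CONFIG_DRIVER_DEFERRED_PROBE_TIMEOUT=-1
```

The same value can still be overridden per boot with
deferred_probe_timeout=<seconds> on the kernel command line, since the
Kconfig option only changes the built-in default.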
Regards,
Hans
>> Signed-off-by: Hans de Goede <johannes.goede@....qualcomm.com>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 2 +-
>> drivers/base/Kconfig | 9 +++++++++
>> drivers/base/dd.c | 9 ++++-----
>> 3 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 1058f2a6d6a8..80d300c4e16b 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1250,7 +1250,7 @@ Kernel parameters
>> out hasn't expired, it'll be restarted by each
>> successful driver registration. This option will also
>> dump out devices still on the deferred probe list after
>> - retrying.
>> + retrying. Set to -1 to wait indefinitely.
>>
>> delayacct [KNL] Enable per-task delay accounting
>>
>> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
>> index 1786d87b29e2..f7d385cbd3ba 100644
>> --- a/drivers/base/Kconfig
>> +++ b/drivers/base/Kconfig
>> @@ -73,6 +73,15 @@ config DEVTMPFS_SAFE
>> with the PROT_EXEC flag. This can break, for example, non-KMS
>> video drivers.
>>
>> +config DRIVER_DEFERRED_PROBE_TIMEOUT
>> + int "Default value for deferred_probe_timeout"
>> + default 0 if !MODULES
>> + default 10 if MODULES
>> + help
>> + Set the default value for the deferred_probe_timeout kernel parameter.
>> + See Documentation/admin-guide/kernel-parameters.txt for a description
>> + of the deferred_probe_timeout kernel parameter.
>> +
>> config STANDALONE
>> bool "Select only drivers that don't need compile-time external firmware"
>> default y
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index bea8da5f8a3a..e57144aa168d 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -257,11 +257,7 @@ static int deferred_devs_show(struct seq_file *s, void *data)
>> }
>> DEFINE_SHOW_ATTRIBUTE(deferred_devs);
>>
>> -#ifdef CONFIG_MODULES
>> -static int driver_deferred_probe_timeout = 10;
>> -#else
>> -static int driver_deferred_probe_timeout;
>> -#endif
>> +static int driver_deferred_probe_timeout = CONFIG_DRIVER_DEFERRED_PROBE_TIMEOUT;
>>
>> static int __init deferred_probe_timeout_setup(char *str)
>> {
>> @@ -323,6 +319,9 @@ static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_
>>
>> void deferred_probe_extend_timeout(void)
>> {
>> + if (driver_deferred_probe_timeout < 0)
>> + return;
>> +
>> /*
>> * If the work hasn't been queued yet or if the work expired, don't
>> * start a new one.
>> --
>> 2.52.0
>>