Message-ID: <471bf84d-9d58-befc-8224-359a62e29786@collabora.com>
Date: Thu, 17 Aug 2023 17:06:57 +0530
From: Shreeya Patel <shreeya.patel@...labora.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: saravanak@...gle.com, stable@...r.kernel.org,
John Stultz <jstultz@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Jakub Kicinski <kuba@...nel.org>,
Rob Herring <robh@...nel.org>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>,
Robin Murphy <robin.murphy@....com>,
Andy Shevchenko <andy.shevchenko@...il.com>,
Sudeep Holla <sudeep.holla@....com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Naresh Kamboju <naresh.kamboju@...aro.org>,
Basil Eljuse <Basil.Eljuse@....com>,
Ferry Toth <fntoth@...il.com>, Arnd Bergmann <arnd@...db.de>,
Anders Roxell <anders.roxell@...aro.org>,
linux-pm@...r.kernel.org, Nathan Chancellor <nathan@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Geert Uytterhoeven <geert+renesas@...der.be>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Sasha Levin <sashal@...nel.org>, linux-kernel@...r.kernel.org,
"gustavo.padovan@...labora.com" <gustavo.padovan@...labora.com>,
Ricardo Cañuelo Navarro
<ricardo.canuelo@...labora.com>,
Guillaume Charles Tucker <guillaume.tucker@...labora.com>,
usama.anjum@...labora.com, kernelci@...ts.linux.dev
Subject: Re: [PATCH 5.17 127/298] driver core: Fix wait_for_device_probe() &
deferred_probe_timeout interaction
Hi Greg,
On 16/08/23 20:33, Greg Kroah-Hartman wrote:
> On Wed, Aug 16, 2023 at 03:09:27PM +0530, Shreeya Patel wrote:
>> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
>>> From: Saravana Kannan <saravanak@...gle.com>
>>>
>>> [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
>>>
>>> Mounting NFS rootfs was timing out when deferred_probe_timeout was
>>> non-zero [1]. This was because ip_auto_config() initcall times out
>>> waiting for the network interfaces to show up when
>>> deferred_probe_timeout was non-zero. While ip_auto_config() calls
>>> wait_for_device_probe() to make sure any currently running deferred
>>> probe work or asynchronous probe finishes, that wasn't sufficient to
>>> account for devices being deferred until deferred_probe_timeout.
>>>
>>> Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
>>> until the deferred_probe_timeout fires") tried to fix that by making
>>> sure wait_for_device_probe() waits for deferred_probe_timeout to expire
>>> before returning.
>>>
>>> However, if wait_for_device_probe() is called from the kernel_init()
>>> context:
>>>
>>> - Before deferred_probe_initcall() [2], it causes the boot process to
>>> hang due to a deadlock.
>>>
>>> - After deferred_probe_initcall() [3], it blocks kernel_init() from
>>> continuing till deferred_probe_timeout expires and beats the point of
>>> deferred_probe_timeout that's trying to wait for userspace to load
>>> modules.
>>>
>>> Neither of these is good. So revert the changes to
>>> wait_for_device_probe().
>>>
>>> [1] - https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
>>> [2] - https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
>>> [3] - https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
>> Hi Saravana, Greg,
>>
>>
>> KernelCI found that this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb.
>> See the details below for more information.
>>
>> KernelCI dashboard link:
>> https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
>>
>> Error messages from the logs :-
>>
>> + UUID=11236495_1.5.2.4.5
>> + set +x
>> + export 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
>> + cd /opt/bootrr/libexec/bootrr
>> + sh helpers/bootrr-auto
>> e6800000.ethernet
>> e6700000.dma-controller
>> e7300000.dma-controller
>> e7310000.dma-controller
>> ec700000.dma-controller
>> ec720000.dma-controller
>> fea20000.vsp
>> feb00000.display
>> fea28000.vsp
>> fea30000.vsp
>> fe9a0000.vsp
>> fe9af000.fcp
>> fea27000.fcp
>> fea2f000.fcp
>> fea37000.fcp
>> sound
>> ee100000.mmc
>> ee140000.mmc
>> ec500000.sound
>> /lava-11236495/1/../bin/lava-test-case
>> <8>[ 17.476741] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=deferred-probe-empty RESULT=fail>
>>
>> Test case failing :-
>> Baseline Bootrr deferred-probe-empty test - https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
>>
>> Regression Reproduced :-
>>
>> Lava job after reverting the commit 5ee76c256e92
>> https://lava.collabora.dev/scheduler/job/11292890
>>
>>
>> Bisection report from KernelCI can be found at the bottom of the email.
>>
>> Thanks,
>> Shreeya Patel
>>
>> #regzbot introduced: 5ee76c256e92
>> #regzbot title: KernelCI: Multiple devices deferring on r8a77960-ulcb
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * If you do send a fix, please include this trailer: *
>> * Reported-by: "kernelci.org bot" <bot@...> *
>> * *
>> * Hope this helps! *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> stable-rc/linux-5.10.y bisection: baseline.bootrr.deferred-probe-empty on
>> r8a77960-ulcb
> You are testing 5.10.y, yet the subject says 5.17?
>
> Which is it here?
Sorry, I accidentally used the lore link for 5.17 when reporting this
issue. The test fails on all stable releases from 5.10 onwards.
stable 5.15 :-
https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
mainline :-
https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
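For anyone who wants to check a board by hand, here is a minimal sketch of
what the test amounts to. It assumes the check comes down to the kernel's
debugfs file devices_deferred being empty, and that debugfs is mounted at
/sys/kernel/debug. The function names are made up for illustration and are
not taken from bootrr.

```python
from pathlib import Path

# Assumption: debugfs is mounted at the usual /sys/kernel/debug location.
DEFERRED_PATH = "/sys/kernel/debug/devices_deferred"


def deferred_devices(path=DEFERRED_PATH):
    """Return the names of devices still waiting on deferred probe.

    Each non-empty line starts with a device name; some kernels append a
    tab-separated reason, so only the first field is kept.
    """
    names = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def deferred_probe_empty(path=DEFERRED_PATH):
    """True when no device is left on the deferred probe list."""
    return not deferred_devices(path)


if __name__ == "__main__":
    # Reading debugfs normally requires root.
    pending = deferred_devices()
    for name in pending:
        print(name)
    print("deferred-probe-empty:", "pass" if not pending else "fail")
```

Run as root on the target after boot, it prints each device that is still
deferred followed by a pass/fail line, similar in spirit to the log above.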
Thanks,
Shreeya Patel
>
> confused,
>
> greg k-h
>