Message-ID: <625d7464-cc31-4cac-bd50-7bed75212143@huawei.com>
Date: Fri, 15 Aug 2025 09:43:29 +0800
From: "wangwensheng (C)" <wangwensheng4@...wei.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: <rafael@...nel.org>, <dakr@...nel.org>, <robh@...nel.org>,
<broonie@...nel.org>, <saravanak@...gle.com>, <linux-kernel@...r.kernel.org>,
<chenjun102@...wei.com>
Subject: Re: [PATCH v2] driver core: Fix concurrent problem of
deferred_probe_extend_timeout()
On 2025/8/14 22:19, Greg KH wrote:
> On Thu, Aug 14, 2025 at 08:29:49PM +0800, Wang Wensheng wrote:
>> The deferred_probe_timeout_work may unexpectedly be canceled forever when
>> deferred_probe_extend_timeout() executes concurrently. Starting with
>> deferred_probe_timeout_work pending, the problem occurs after the
>> following sequence.
>>
>> CPU0                                 CPU1
>> deferred_probe_extend_timeout
>>   -> cancel_delayed_work => true
>>                                      deferred_probe_extend_timeout
>>                                        -> cancel_delayed_work
>>                                          -> __cancel_work
>>                                            -> try_grab_pending
>>   -> schedule_delayed_work
>>     -> queue_delayed_work_on
>>        since pending bit is grabbed,
>>        just return without doing anything
>>                                          -> set_work_pool_and_clear_pending
>>                                      this __cancel_work returns false and
>>                                      the work would never be queued again
>>
>> The root cause is that the PENDING_BIT of the work_struct is set
>> temporarily in __cancel_work, and this bit can prevent the work_struct
>> from being queued on another CPU.
>>
>> Use deferred_probe_mutex to protect the cancel and queue operations for
>> the deferred_probe_timeout_work to fix this problem.
>>
>> Fixes: 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
>> Cc: <stable@...r.kernel.org>
>> Signed-off-by: Wang Wensheng <wangwensheng4@...wei.com>
>> ---
>> drivers/base/dd.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index 13ab98e033ea..00419d2ee910 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -323,6 +323,7 @@ static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_
>>
>> void deferred_probe_extend_timeout(void)
>> {
>> + guard(mutex)(&deferred_probe_mutex);
>
> But if you grab the lock here, in the probe timeout function, the lock
> will be grabbed again, causing a deadlock, right? If not, why not?
It's not a sync version of cancel_work, so the execution of the work
function doesn't block us here, nor does schedule_delayed_work.
Indeed, deferred_probe_mutex is used to protect the
deferred_probe_*_list, so it would look better to use a new lock here. Right?
>
> Have you run this patch with lockdep enabled?
>
> This feels broken to me, what am I missing?
>
> thanks,
>
> greg k-h
>