linux-kernel - Re: [PATCH v3] workqueue: introduce queue_delayed_work_on_offline

Message-ID: <c842fb2d-47e9-4b20-a5db-f55730b2c093@oracle.com>
Date: Wed, 5 Feb 2025 11:54:20 +1100
From: imran.f.khan@...cle.com
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, haakon.bugge@...cle.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] workqueue: introduce queue_delayed_work_on_offline_safe

Hello Tejun,
Thanks for getting back on this.
On 5/2/2025 6:17 am, Tejun Heo wrote:
> On Tue, Feb 04, 2025 at 10:36:35PM +1100, Imran Khan wrote:
>> Currently, users of queue_delayed_work_on need to ensure
>> that the specified CPU is, and remains, online. Failure to
>> do so may result in the delayed_work getting queued on an
>> offlined CPU and hence never being executed.
>>
>> The current users of queue_delayed_work_on seem to ensure
>> the above-mentioned criterion, but for those users (existing
>> ones we don't know about, or new ones) who can't conform to
>> this, we need another interface.
>>
>> So introduce queue_delayed_work_on_offline_safe, which
>> is a wrapper around queue_delayed_work_on to ensure that
>> the specified cpu is and remains online.
>>
>> Signed-off-by: Imran Khan <imran.f.khan@...cle.com>
>> Acked-by: Haakon Bugge <haakon.bugge@...cle.com>
> 
> So, idk, do we really need this? Can't we just add a debug warning which
> triggers when CPU goes down with delayed works queued on it?
> 
Actually, the case where a CPU goes offline with delayed works already
queued on it is fine, because the associated timers are migrated to
another CPU (the BP).
The problem is the case where a CPU is already offline, or is in the
middle of being offlined and has already passed the timers_dead_cpu
callback. In that scenario, if someone queues a delayed work on this CPU,
we have a problem. The WARN_ON_ONCE in [1] can flag this, but the dwork's
timer would still end up on the offlined CPU and would not be migrated
(since the CPU was already past the timers_dead_cpu step when the dwork
was queued).

One way to avoid this is to ask callers to do the needful (take the hotplug
lock, register hotplug callbacks) and ensure that the dwork does not end up
on such an offlined CPU. The other way (as attempted in this patch) is to
give such users an interface that ensures the dwork never ends up on an
offlined CPU.

Thanks,
Imran

[1]: https://lore.kernel.org/all/20250109232711.2081259-1-imran.f.khan@oracle.com/
> Thanks.
>