Message-ID: <c842fb2d-47e9-4b20-a5db-f55730b2c093@oracle.com>
Date: Wed, 5 Feb 2025 11:54:20 +1100
From: imran.f.khan@...cle.com
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, haakon.bugge@...cle.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] workqueue: introduce
 queue_delayed_work_on_offline_safe

Hello Tejun,
Thanks for getting back on this.
On 5/2/2025 6:17 am, Tejun Heo wrote:
> On Tue, Feb 04, 2025 at 10:36:35PM +1100, Imran Khan wrote:
>> Currently, users of queue_delayed_work_on need to ensure
>> that the specified cpu is and remains online. Failure to
>> do so may result in the delayed_work getting queued on an
>> offlined cpu and hence never getting executed.
>>
>> The current users of queue_delayed_work_on seem to satisfy
>> this requirement, but callers that can't conform to it,
>> whether existing ones we are unaware of or new ones,
>> need another interface.
>>
>> So introduce queue_delayed_work_on_offline_safe, which
>> is a wrapper around queue_delayed_work_on to ensure that
>> the specified cpu is and remains online.
>>
>> Signed-off-by: Imran Khan <imran.f.khan@...cle.com>
>> Acked-by: Haakon Bugge <haakon.bugge@...cle.com>
> 
> So, idk, do we really need this? Can't we just add a debug warning which
> triggers when CPU goes down with delayed works queued on it?
> 
Actually, we are fine in cases where a CPU goes offline with delayed
works already queued on it, because the associated timers migrate to
another CPU (the BP).
The problem arises when a CPU is already offline, or is in the middle of
being offlined and past the timers_dead_cpu callback. If someone queues
a delayed work on the CPU at that point, we have a problem.
The WARN_ON_ONCE in [1] can flag this, but the dwork's timer would still
end up on the offlined CPU and would not be migrated (since the CPU was
already past the timer_dead state when the dwork was queued).
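
To make the two cases concrete, here is a small userspace model (not
kernel code). Every name in it is hypothetical, and it assumes that
pending timers are moved to CPU 0 exactly once, at the moment the CPU is
taken offline. It only shows why a timer armed after that point is never
picked up.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace model of per-CPU timer state; not kernel code. */
struct model_cpu {
	bool online;        /* CPU is currently online */
	int pending_timers; /* delayed-work timers armed on this CPU */
};

static struct model_cpu cpus[MODEL_NR_CPUS] = {
	{ true, 0 }, { true, 0 }, { true, 0 }, { true, 0 },
};

/* Arm a timer on the requested CPU without checking whether it is online. */
static void model_arm_timer(int cpu)
{
	cpus[cpu].pending_timers++;
}

/*
 * Model of the offline path: pending timers are moved to CPU 0 once,
 * then the CPU is marked offline. Assumes cpu != 0.
 */
static void model_offline_cpu(int cpu)
{
	cpus[0].pending_timers += cpus[cpu].pending_timers;
	cpus[cpu].pending_timers = 0;
	cpus[cpu].online = false;
}

/* Count timers sitting on offline CPUs; in this model they never fire. */
static int model_stranded_timers(void)
{
	int stranded = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (!cpus[cpu].online)
			stranded += cpus[cpu].pending_timers;
	return stranded;
}
```

In this model, a timer armed on CPU 2 before model_offline_cpu(2) ends up
on CPU 0, while one armed afterwards stays on CPU 2 and is counted by
model_stranded_timers().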

One way to avoid this is to ask callers to do what is needed themselves
(hotplug lock, hotplug callbacks) and ensure that the dwork does not end
up on such an offlined CPU.
The other way (as attempted in this patch) is to give such users an
interface that ensures the dwork never ends up on an offlined CPU.
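
The second option can be sketched with another userspace model. All
names are hypothetical. The fallback to the first online CPU is an
assumption made for illustration and is not necessarily what the patch
does. The reader count stands in for holding the hotplug lock (in the
kernel, cpus_read_lock()/cpus_read_unlock()) around the online check and
the queueing.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace model; not the interface from the patch. */
static bool model_online[MODEL_NR_CPUS] = { true, true, true, true };
static int model_timers[MODEL_NR_CPUS];
static int model_hotplug_readers; /* > 0 blocks offlining in this model */

static void model_hotplug_read_lock(void)   { model_hotplug_readers++; }
static void model_hotplug_read_unlock(void) { model_hotplug_readers--; }

/*
 * Offline a CPU unless a reader holds the model hotplug lock.
 * Pending timers are moved to CPU 0. Assumes cpu != 0.
 */
static bool model_offline_cpu(int cpu)
{
	if (model_hotplug_readers > 0)
		return false;
	model_timers[0] += model_timers[cpu];
	model_timers[cpu] = 0;
	model_online[cpu] = false;
	return true;
}

/*
 * Offline-safe queueing: check and queue under the read lock so the
 * target cannot go offline in between. Returns the CPU actually used,
 * or -1 if no CPU is online.
 */
static int model_queue_offline_safe(int cpu)
{
	int target = cpu;

	model_hotplug_read_lock();
	if (!model_online[cpu]) {
		target = -1; /* assumed fallback: first online CPU */
		for (int i = 0; i < MODEL_NR_CPUS; i++) {
			if (model_online[i]) {
				target = i;
				break;
			}
		}
	}
	if (target >= 0)
		model_timers[target]++;
	model_hotplug_read_unlock();
	return target;
}
```

The point of the model is that the online check and the queueing happen
under the same lock, so callers do not need their own hotplug lock or
hotplug callbacks.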

Thanks,
Imran

[1]: https://lore.kernel.org/all/20250109232711.2081259-1-imran.f.khan@oracle.com/
> Thanks.
> 

