Date: Thu, 25 Jan 2024 08:38:25 -1000
From: Tejun Heo <tj@...nel.org>
To: Leonardo Bras <leobras@...hat.com>
Cc: Lai Jiangshan <jiangshanlai@...il.com>,
	Marcelo Tosatti <mtosatti@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 1/1] wq: Avoid using isolated cpus' timers on
 unbounded queue_delayed_work

Hello,

On Wed, Jan 24, 2024 at 10:45:50PM -0300, Leonardo Bras wrote:
> That's a good suggestion, but looking at workqueue_init_early() I see that, 
> in short:
> wq_unbound_cpumask = 	cpu_possible_mask & 
> 			housekeeping_cpumask(HK_TYPE_WQ) & 
> 			housekeeping_cpumask(HK_TYPE_DOMAIN) &
> 			wq_cmdline_cpumask
> 
> So wq_unbound_cpumask relates to domain and workqueue cpu isolation.
> 
> In our case, we are using this to choose the cpu whose timer we want to 
> use, so it makes sense to use timer-related cpu isolation instead.

- In the proposed code, when cpu == WORK_CPU_UNBOUND, it's always setting
  cpu to housekeeping_any_cpu(HK_TYPE_TIMER). This may unnecessarily move
  the timer and task away from the local CPU. Preferring the local CPU
  would likely make sense.

- If HK_TYPE_TIMER and workqueue masks may not agree, setting dwork->cpu to
  the one returned from HK_TYPE_TIMER is likely problematic. That would
  force __queue_work() to use that CPU instead of picking one from
  wq_unbound_cpumask.

> As of today, your suggestion would work the same, as the only way to enable 
> WQ cpu isolation is to use nohz_full, which also enables TIMER cpu 
> isolation. But since that can change in the future, for any reason, I would 
> suggest that we stick to using the HK_TYPE_TIMER cpumask.
> 
> I can now notice that this can end up introducing an issue: possibly 
> running on a workqueue on a cpu outside of a valid wq_cmdline_cpumask.

Yeap.

> I would suggest fixing this in a couple ways:
> 1 - We introduce a new cpumask which is basically 
>     housekeeping_cpumask(HK_TYPE_DOMAIN) & wq_cmdline_cpumask, allowing us 
>     to keep the timer interrupt in the same cpu as the scheduled function,
> 2 - We use the resulting cpu only to pick the right timer.
> 
> What are your thoughts on that?

How about something like the following instead?

- If current CPU is in HK_TYPE_TIMER, pick that CPU.

- If not, pick a CPU from HK_TYPE_TIMER.

- Do add_timer_on() on the selected CPU but leave dwork->cpu as
  WORK_CPU_UNBOUND and leave that part to __queue_work().
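
The three steps above could be modeled roughly as below. This is only a
userspace sketch, not kernel code: the uint64_t mask type, the *_model
names, mask_test_cpu(), mask_first_cpu() and the timer_cpu field are
hypothetical stand-ins invented for illustration, and WORK_CPU_UNBOUND is
given an arbitrary placeholder value. Picking the first set bit merely
stands in for whatever housekeeping_any_cpu(HK_TYPE_TIMER) would return.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for this model only; not the kernel's definition. */
#define WORK_CPU_UNBOUND (-1)

/* Hypothetical model: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Returns 1 if @cpu is set in @mask, 0 otherwise. */
static int mask_test_cpu(int cpu, cpumask_model_t mask)
{
	return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1);
}

/* Returns the lowest CPU set in @mask, or -1 if the mask is empty. */
static int mask_first_cpu(cpumask_model_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if ((mask >> cpu) & 1)
			return cpu;
	return -1;
}

struct delayed_work_model {
	int cpu;	/* what __queue_work() would later be handed */
	int timer_cpu;	/* CPU the timer would be armed on */
};

/*
 * Model of the unbound case: prefer the local CPU when it is a timer
 * housekeeping CPU, otherwise fall back to one from the timer mask.
 * Only the timer placement changes; dwork->cpu stays unbound so that
 * CPU selection for the work item itself is left to __queue_work().
 */
static void queue_delayed_unbound(struct delayed_work_model *dwork,
				  int this_cpu,
				  cpumask_model_t hk_timer_mask)
{
	int timer_cpu = this_cpu;

	assert(hk_timer_mask != 0);	/* assumed: at least one timer CPU */

	if (!mask_test_cpu(this_cpu, hk_timer_mask))
		timer_cpu = mask_first_cpu(hk_timer_mask);

	dwork->cpu = WORK_CPU_UNBOUND;	/* do not pin the work item */
	dwork->timer_cpu = timer_cpu;	/* where add_timer_on() would go */
}
```

In the real code the last step would presumably be an add_timer_on() call
on the selected CPU; the model just records that CPU so the choice can be
inspected.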

Thanks.

-- 
tejun
