lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <yt9dzftp3gh2.fsf@linux.ibm.com>
Date: Fri, 19 Apr 2024 10:27:05 +0200
From: Sven Schnelle <svens@...ux.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: Lai Jiangshan <jiangshanlai@...il.com>, linux-kernel@...r.kernel.org,
        Heiko Carstens <hca@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] workqueue: fix selection of wake_cpu in kick_pool()

Hi Tejun,

Tejun Heo <tj@...nel.org> writes:

> On Wed, Apr 17, 2024 at 05:36:38PM +0200, Sven Schnelle wrote:
>> > This generally seems like a good idea but isn't this still racy? The CPU may
>> > go down between setting p->wake_cpu and wake_up_process().
>> 
>> Don't know without reading the source, but how does this code normally
>> protect against that?
>
> Probably by wrapping determining the wake_cpu and the wake_up inside
> cpu_read_lock() section.

Do you mean rcu_read_lock()? cpus_read_lock() takes a mutex, and the
crash happens in softirq context - so cpus_read_lock() can't be the
correct lock.

If i read the code correctly, cpu hotplug uses stop_machine_cpuslocked()
- so rcu_read_lock() should be sufficient for non-atomic context.

Looking at the backtrace the crash is actually happening in
arch_vpu_is_preempted(). I don't know the semantics of that function,
whether it is ok to call it for offline CPUs, or whether the calling
code should make sure that the cpu is online (which would be my guess).

Following the backtrace from my initial mail, I can't find a place where
a check is done whether p->wake_cpu is actually online. Eventually
available_idle_cpu() is calling vcpu_is_preempted(). I wonder whether
available_idle_cpu() should do a cpu_online() check right at the
beginning?

Adding Peter to CC, he probably knows.

Thanks,
Sven

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ