Message-ID: <12d876ba-a325-4442-9526-3ea9e2117c0b@gmx.de>
Date: Mon, 5 Feb 2024 10:58:26 +0100
From: Helge Deller <deller@....de>
To: Tejun Heo <tj@...nel.org>
Cc: Helge Deller <deller@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
 linux-kernel@...r.kernel.org, linux-parisc@...r.kernel.org
Subject: Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug

Hi Tejun,

On 2/2/24 18:29, Tejun Heo wrote:
> Hello, Helge.
>
> On Fri, Feb 02, 2024 at 09:41:38AM +0100, Helge Deller wrote:
>> In a second step I extended your patch to print the present
>> and online CPUs too. Below is the relevant dmesg part.
>>
>> Note, that on parisc the second CPU will be activated later in the
>> boot process, after the kernel has the inventory.
>> This I think differs vs x86, where all CPUs are available earlier
>> in the boot process.
>> ...
>> [    0.000000] XXX workqueue_init_early: possible_cpus=ffff  present=0001  online=0001
> ...
>> [    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001
> ...
>> [    0.263466] XXX workqueue_init_topology: possible_cpus=ffff  present=0001  online=0001
>
> So, what's bothersome is that when the wq_dump.py script printing each cpu's
> pwq, it's only printing for CPU 0 and 1. The for_each_possible_cpu() drgn
> helper reads cpu_possible_mask from the kernel and iterates that, so that
> most likely indicates at some point the cpu_possible_mask becomes 0x3
> instead of the one used during boot - 0xffff, which is problematic.
>
> Can you please sprinkle more printks to find out whether and when the
> cpu_possible_mask changes during boot?

It seems commit 0921244f6f4f ("parisc: Only list existing CPUs in cpu_possible_mask")
is the culprit. Reverting that patch makes CPU hot-unplug work again.
Furthermore, this commit breaks the cpumask KUnit test, as reported by Guenter:
https://lkml.org/lkml/2024/2/4/146
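
To illustrate the mismatch discussed above, here is a minimal userspace
sketch (plain Python, not kernel or drgn code). It decodes the two mask
values mentioned in this thread: 0xffff, as printed during boot, and 0x3,
which Tejun inferred from wq_dump.py listing only CPU 0 and 1. The helper
name cpus_in_mask() is made up for this example.

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU numbers whose bit is set in an integer bitmask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]

# Value printed by the XXX debug lines during boot.
boot_possible = 0xffff
# Value implied by wq_dump.py only showing CPU 0 and 1 (an inference, not a log).
later_possible = 0x3

# CPUs that were "possible" at boot but are no longer visited afterwards.
dropped = sorted(set(cpus_in_mask(boot_possible)) - set(cpus_in_mask(later_possible)))
print("boot :", cpus_in_mask(boot_possible))
print("later:", cpus_in_mask(later_possible))
print("no longer iterated:", dropped)
```

Anything set up by iterating the boot-time mask but later walked with the
smaller mask would skip the CPUs in the last list, which is the kind of
inconsistency the question about cpu_possible_mask changing points at.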

So, I've added the revert to the parisc git tree and if my further tests
go well I'll push it upstream.

Thanks for your help!!
Helge
