Message-ID: <CANZk6aTS9BODJiqtDSHxwhz2dV3RmaxRautR8WZfH5aYYhcQJw@mail.gmail.com>
Date: Tue, 21 Nov 2023 12:01:56 +0800
From: zhuangel570 <zhuangel570@...il.com>
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] workqueue: fix invalid cpu in kick_pool
Thanks. I have uploaded my configuration and console logs to the following
links; please take a look.
https://raw.githubusercontent.com/zhuangel/misc/main/debug/workqueue/console.log
https://raw.githubusercontent.com/zhuangel/misc/main/debug/workqueue/config-6.7.rc1
https://raw.githubusercontent.com/zhuangel/misc/main/debug/workqueue/config-4.18.0-348.el8.x86_64
The issue was first discovered on my bare-metal machine. For ease of
debugging, I set up a virtual machine with the same configuration and
reproduced it there. The test virtual machine was installed from the CentOS
8.5.2111 DVD (original kernel is 4.18.0-348), and the kernel was then updated
from the 6.7.rc1 source code. The virtual machine has 4 CPUs, 8G of memory
and some virtio devices.
My investigation shows that when "workqueue.unbound_cpus" and "isolcpus" are
configured with the same cpuset, "wq_unbound_cpumask" becomes an empty set.
When the "wake_cpu" of some idle worker task is then set from
"cpumask_any_distribute", an invalid CPU is chosen, which may trigger a
panic.
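To make the failure mode above concrete, here is a small userspace model in
Python. It is only a sketch, not kernel code: the helper names
(unbound_cpumask, any_distribute, is_valid_cpu) are made up for illustration,
cpumasks are modelled as Python sets, and the CPU numbers are example values.
It assumes the unbound mask is the housekeeping (non-isolated) CPUs restricted
by "workqueue.unbound_cpus", and that picking a CPU from an empty mask yields
a value >= nr_cpu_ids, which is how I understand cpumask_any_distribute() to
behave. The real function also spreads its choices across CPUs; the model just
takes the lowest one.

```python
NR_CPU_IDS = 4  # the test VM in this report has 4 CPUs


def unbound_cpumask(possible, isolcpus, unbound_cpus):
    """Simplified model: housekeeping CPUs restricted by workqueue.unbound_cpus."""
    housekeeping = possible - isolcpus  # CPUs not isolated by isolcpus=
    return housekeeping & unbound_cpus


def any_distribute(mask, nr_cpu_ids=NR_CPU_IDS):
    """Stand-in for cpumask_any_distribute(): a CPU from mask, or
    nr_cpu_ids (an invalid CPU number) when the mask is empty."""
    return min(mask) if mask else nr_cpu_ids


def is_valid_cpu(cpu, nr_cpu_ids=NR_CPU_IDS):
    """A CPU number is usable only if it is below nr_cpu_ids."""
    return 0 <= cpu < nr_cpu_ids


if __name__ == "__main__":
    possible = set(range(NR_CPU_IDS))
    same = {2, 3}  # example: isolcpus=2,3 workqueue.unbound_cpus=2,3
    mask = unbound_cpumask(possible, isolcpus=same, unbound_cpus=same)
    wake_cpu = any_distribute(mask)
    print("mask:", sorted(mask), "wake_cpu:", wake_cpu,
          "valid:", is_valid_cpu(wake_cpu))
```

In this model, passing disjoint sets for the two parameters leaves a non-empty
mask and a valid CPU, while passing the same set for both leaves an empty mask,
so the chosen "wake_cpu" falls outside the valid range.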
To be honest, I do not really know why there is a "not-present page"
exception. After I remove "workqueue.unbound_cpus" from the command line or
apply this patch to the running kernel, the system boots successfully.
On Tue, Nov 21, 2023 at 3:07 AM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Mon, Nov 20, 2023 at 08:16:23PM +0800, Yong He wrote:
> > With incorrect unbound workqueue configurations, this may introduce kernel
> > panic, because cpumask_any_distribute() will not always return a valid cpu,
> > such as one set the 'isolcpus' and 'workqueue.unbound_cpus' into the same
> > cpuset, and this will make the @pool->attrs->__pod_cpumask an empty set,
> > then trigger panic like this:
>
> This shouldn't have happened. Can you share the configuration and the full
> dmesg? Let's fix the problem at the source.
>
> Thanks.
>
> --
> tejun
--
——————————
zhuangel570
——————————