Message-ID: <fd5a27de-c8a9-892c-f413-66ea41221fdd@amd.com>
Date: Fri, 9 Jun 2023 09:13:15 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Tejun Heo <tj@...nel.org>
Cc: Sandeep Dhavale <dhavale@...gle.com>, jiangshanlai@...il.com,
torvalds@...ux-foundation.org, peterz@...radead.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com,
joshdon@...gle.com, brho@...gle.com, briannorris@...omium.org,
nhuck@...gle.com, agk@...hat.com, snitzer@...nel.org,
void@...ifault.com, kernel-team@...roid.com,
Swapnil Sapkal <swapnil.sapkal@....com>
Subject: Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Hello Tejun,
On 6/9/2023 4:20 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 08, 2023 at 08:31:34AM +0530, K Prateek Nayak wrote:
>> [..snip..]
>> o I consistently see a WARN_ON_ONCE() in kick_pool() being hit when I
>> run "sudo ./stress-ng --iomix 96 --timeout 1m". I've seen few
>> different stack traces so far. Including all below just in case:
> ...
>> This is the same WARN_ON_ONCE() you had added in the HEAD commit:
>>
>> $ scripts/faddr2line vmlinux kick_pool+0xdb
>> kick_pool+0xdb/0xe0:
>> kick_pool at kernel/workqueue.c:1130 (discriminator 1)
>>
>> $ sed -n 1130,1132p kernel/workqueue.c
>> if (!WARN_ON_ONCE(wake_cpu >= nr_cpu_ids))
>> p->wake_cpu = wake_cpu;
>> get_work_pwq(work)->stats[PWQ_STAT_REPATRIATED]++;
>>
>> Let me know if you need any more data from my test setup.
>> P.S. The kernel is still up and running (~30min) despite hitting this
>> WARN_ON_ONCE() in my case :)
>
> Okay, that was me being stupid and not initializing the new fields for
> per-cpu workqueues. Can you please test the following branch? It should have
> both bugs fixed properly.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
I've not run into any panics or warnings with this one. The kernel has been
stable for ~30min while running stress-ng iomix. We'll resume our testing
with v2 :)
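
For reference, these are roughly the steps I used to pick up the branch and
rerun the reproducer (fetching straight into FETCH_HEAD and the build/install
commands below are just my local convention, nothing specific to your tree):

$ git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
$ git checkout FETCH_HEAD
$ make -j$(nproc) && sudo make modules_install install  # then reboot into the new kernel
$ sudo ./stress-ng --iomix 96 --timeout 1m
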
>
> If that doesn't crash, I'd love to hear how it affects the perf regressions
> reported over the past few months.
>
> Thanks.
>
--
Thanks and Regards,
Prateek