Message-ID: <fd5a27de-c8a9-892c-f413-66ea41221fdd@amd.com>
Date: Fri, 9 Jun 2023 09:13:15 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Tejun Heo <tj@...nel.org>
Cc: Sandeep Dhavale <dhavale@...gle.com>, jiangshanlai@...il.com,
torvalds@...ux-foundation.org, peterz@...radead.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com,
joshdon@...gle.com, brho@...gle.com, briannorris@...omium.org,
nhuck@...gle.com, agk@...hat.com, snitzer@...nel.org,
void@...ifault.com, kernel-team@...roid.com,
Swapnil Sapkal <swapnil.sapkal@....com>
Subject: Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Hello Tejun,
On 6/9/2023 4:20 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 08, 2023 at 08:31:34AM +0530, K Prateek Nayak wrote:
>> [..snip..]
>> o I consistently see a WARN_ON_ONCE() in kick_pool() being hit when I
>> run "sudo ./stress-ng --iomix 96 --timeout 1m". I've seen few
>> different stack traces so far. Including all below just in case:
> ...
>> This is the same WARN_ON_ONCE() you had added in the HEAD commit:
>>
>> $ scripts/faddr2line vmlinux kick_pool+0xdb
>> kick_pool+0xdb/0xe0:
>> kick_pool at kernel/workqueue.c:1130 (discriminator 1)
>>
>> $ sed -n 1130,1132p kernel/workqueue.c
>> if (!WARN_ON_ONCE(wake_cpu >= nr_cpu_ids))
>> p->wake_cpu = wake_cpu;
>> get_work_pwq(work)->stats[PWQ_STAT_REPATRIATED]++;
>>
>> Let me know if you need any more data from my test setup.
>> P.S. The kernel is still up and running (~30min) despite hitting this
>> WARN_ON_ONCE() in my case :)
>
> Okay, that was me being stupid and not initializing the new fields for
> per-cpu workqueues. Can you please test the following branch? It should have
> both bugs fixed properly.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
I've not run into any panics or warnings with this one. The kernel has been
stable for ~30min while running stress-ng iomix. We'll resume our testing
with v2 :)
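
For reference, these are roughly the steps I used to pick up the branch and
rerun the reproducer (fetching straight into FETCH_HEAD and the build/install
commands below are just my local convention, nothing specific to your tree):

$ git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
$ git checkout FETCH_HEAD
$ make -j$(nproc) && sudo make modules_install install  # then reboot into the new kernel
$ sudo ./stress-ng --iomix 96 --timeout 1m
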
>
> If that doesn't crash, I'd love to hear how it affects the perf regressions
> reported over the past few months.
>
> Thanks.
>
--
Thanks and Regards,
Prateek