linux-kernel - Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJhGHyCXYhfPX-_TivLP9rs8AYPnF8qBZKgp-4yZ-aJun6DHfg@mail.gmail.com>
Date: Thu, 25 Jul 2024 08:11:34 +0800
From: Lai Jiangshan <jiangshanlai@...il.com>
To: Marc Hartmayer <mhartmay@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, Lai Jiangshan <jiangshan.ljs@...group.com>, 
	Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>, Heiko Carstens <hca@...ux.ibm.com>, 
	Sven Schnelle <svens@...ux.ibm.com>, Mete Durlu <meted@...ux.ibm.com>
Subject: Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion

Hello Marc

Thank you for the report.

On Wed, Jul 24, 2024 at 12:19 AM Marc Hartmayer <mhartmay@...ux.ibm.com> wrote:

> Hi Lai,
>
> a bisect of a regression in our CI on s390x led to this patch. The bug
> is pretty easy to reproduce (currently, I only tested it on s390x - will
> try to test it on x86 as well):

I can't reproduce it in x86 after testing it for only 30 minutes.
It can definitely theoretically happen in x86.

>
> 1. Start a Linux QEMU/KVM guest with 2 cores using this patch and enable
>    `panic_on_warn=1` for the guest kernel.
> 2. Run the following command in the KVM guest:
>
>   $  dd if=/dev/zero of=/dev/null & while : ; do chcpu -d 1; chcpu -e 1; done
>
> 3. Wait for the crash. e.g.:
>
> 2024/07/23 18:01:21 [M83LP63]: [  157.267727] ------------[ cut here ]------------
> 2024/07/23 18:01:21 [M83LP63]: [  157.267735] WARNING: CPU: 21 PID: 725 at kernel/workqueue.c:3340 worker_thread+0x54e/0x558


> @@ -3330,7 +3338,6 @@ static int worker_thread(void *__worker)
>                 ida_free(&pool->worker_ida, worker->id);
>                 worker_detach_from_pool(worker);
>                 WARN_ON_ONCE(!list_empty(&worker->entry));
> -               kfree(worker);
>                 return 0;
>         }

The condition "!list_empty(&worker->entry)" can be true when the
worker is still in the cull_list awaiting being reaped by
reap_dying_workers() after
this change.

I will remove the WARN_ON_ONCE().

Thanks
Lai