linux-kernel - Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJhGHyBVq84_q5FevmpHPeseAudyf2gOX2LcWt0fMhNddkiz1w@mail.gmail.com>
Date: Wed, 11 Sep 2024 11:32:17 +0800
From: Lai Jiangshan <jiangshanlai@...il.com>
To: Marc Hartmayer <mhartmay@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, Lai Jiangshan <jiangshan.ljs@...group.com>, 
	Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>, Heiko Carstens <hca@...ux.ibm.com>, 
	Sven Schnelle <svens@...ux.ibm.com>, Mete Durlu <meted@...ux.ibm.com>, 
	Christian Borntraeger <borntraeger@...ux.ibm.com>
Subject: Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion

On Wed, Sep 11, 2024 at 11:23 AM Lai Jiangshan <jiangshanlai@...il.com> wrote:
>
> Hello, Marc
>
> On Wed, Sep 11, 2024 at 12:29 AM Marc Hartmayer <mhartmay@...ux.ibm.com>
> > Code starting with the faulting instruction
> > ===========================================
> > 000002d8c205ef20: a7180000            lhi     %r1,0
> > #000002d8c205ef24: 582083ac            l       %r2,940(%r8)
> > >000002d8c205ef28: ba12a000            cs      %r1,%r2,0(%r10)
> > 000002d8c205ef2c: a77400cf            brc     7,000002d8c205f0ca
> > 000002d8c205ef30: 5800b078            l       %r0,120(%r11)
> > 000002d8c205ef34: a7010002            tmll    %r0,2
> > 000002d8c205ef38: a77400d4            brc     7,000002d8c205f0e0
> > [   14.271766] Call Trace:
> > [   14.271769] worker_thread (./arch/s390/include/asm/atomic_ops.h:198 ./arch/s390/include/asm/spinlock.h:61 ./arch/s390/include/asm/spinlock.h:66 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:120 kernel/workqueue.c:3346)
> > [   14.271774] worker_thread (./arch/s390/include/asm/lowcore.h:226 ./arch/s390/include/asm/spinlock.h:61 ./arch/s390/include/asm/spinlock.h:66 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:120 kernel/workqueue.c:3346)
> > [   14.271777] kthread (kernel/kthread.c:389)
> > [   14.271781] __ret_from_fork (arch/s390/kernel/process.c:62)
> > [   14.271784] ret_from_fork (arch/s390/kernel/entry.S:309)
> > [   14.271806] Last Breaking-Event-Address:
> > [   14.271807] mutex_unlock (kernel/locking/mutex.c:549)
> >
> > So it seems to me that `worker->pool` is NULL in the
> > `workqueue.c:worker_thread` function and this leads to the crash.
> >
>
> I'm not familiar with s390 asm code, but it might be the case that
> `worker->pool` is NULL in the in worker_thread() since detach_worker()
> resets worker->pool to NULL.
>
> If it is the case, READ_ONCE(worker->pool) should be used in worker_thread()
> to fix the problem.
>
> (It is weird to me if worker->pool is read multi-time in worker_thread()
> since it is used many times, but since READ_ONCE() is not used, it can
> be possible).

Oh, it can be possible that the worker is created and then destroyed before
being waken-up. And if it is the case, READ_ONCE() won't help. I'm going to
explore if "worker->pool = NULL;" can be moved out from detach_worker().

Thanks
Lai