Message-ID: <YpMKY88/2tTK319E@geday>
Date:   Sun, 29 May 2022 02:53:39 -0300
From:   Geraldo Nascimento <geraldogabriel@...il.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] workqueue: missing NOT while checking if Workqueue is
 offline

On Sat, May 28, 2022 at 07:24:41PM -1000, Tejun Heo wrote:
> On Sun, May 29, 2022 at 01:29:32AM -0300, Geraldo Nascimento wrote:
> > I would like very much to hear the opinion of the maintainers!
> 
> I have a hard time understanding what you're trying to do. Can you please
> slow down and start from describing the problem itself?

Hi Tejun,

Sorry for the hurry.

The problem is best described in https://gitlab.freedesktop.org/drm/amd/-/issues/1898

From my reading of __cancel_work_timer(), we should normally never call
__flush_work() from it, but I may be wrong. In the case described in
AMD's GitLab, __cancel_work_timer() is called by
cancel_delayed_work_sync() inside kfd_process_notifier_release(), at
drivers/gpu/drm/amd/amdkfd/kfd_process.c:1157 (Linux 5.18).

We should only call __flush_work() from __cancel_work_timer() if the
workqueue subsystem is not yet initialized, i.e. workqueue_init() has
not run yet. That is possible during early boot, though not very likely.
In any case that is before kthreads are spawned, so we can be sure that
the particular work isn't executing, which is why it's safe to call
__flush_work() in this particular case.
The comment at kernel/workqueue.c:3157 (for Linux 5.18) says it best:

	/*
	 * This allows canceling during early boot.  We know that @work
	 * isn't executing.
	 */
	if (wq_online)
		__flush_work(work, true);

If __flush_work() is ever called during early boot, it will trigger a
WARN_ON because the workqueue is not online. I have no idea whether
that's OK, though it hasn't harmed my machine. Of course I don't want to
introduce bugs, I want to fix them, and I appreciate your cautious
approach. Thank you for the work.
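
To make the two code paths above concrete, here is a minimal userspace
sketch. It is not kernel code: model_flush_work(), model_cancel_sync(),
model_demo(), warn_count and flush_count are hypothetical names made up
for illustration, and the plain bool wq_online only stands in for the
kernel flag of the same name. It models just the quoted check and the
WARN_ON described above, nothing else from kernel/workqueue.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins only; hypothetical names, not kernel symbols. */
static bool wq_online;   /* models the kernel's wq_online flag */
static int warn_count;   /* number of simulated WARN_ON hits */
static int flush_count;  /* number of flushes actually performed */

/* Models the guard described above: flushing while offline warns and bails. */
static bool model_flush_work(void)
{
	if (!wq_online) {
		warn_count++;
		fprintf(stderr, "WARN: flush while workqueue is offline\n");
		return false;
	}
	flush_count++;
	return true;
}

/* Models the quoted snippet: only flush when the workqueue is online. */
static void model_cancel_sync(void)
{
	if (wq_online)
		model_flush_work();
}

/* Walks through early boot (offline) and normal operation (online). */
static void model_demo(void)
{
	wq_online = false;
	warn_count = 0;
	flush_count = 0;

	model_cancel_sync();                    /* early boot: flush skipped */
	assert(flush_count == 0 && warn_count == 0);

	assert(!model_flush_work());            /* direct flush while offline warns */
	assert(warn_count == 1);

	wq_online = true;
	model_cancel_sync();                    /* online: flush happens */
	assert(flush_count == 1);
}
```

Calling model_demo() from a main() exercises both states: with the check
as quoted, the offline case skips the flush silently, and only a direct
flush while offline produces the warning.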

What is not OK apparently is trying to use amdkfd without HSA_AMD_SVM configured! :)

Thank you,
Geraldo Nascimento

> 
> Thanks.
> 
> -- 
> tejun
