Message-ID: <YpL2rHUXd0vf8IML@geday>
Date:   Sun, 29 May 2022 01:29:32 -0300
From:   Geraldo Nascimento <geraldogabriel@...il.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] workqueue: missing NOT while checking if Workqueue is
 offline

On Sat, May 28, 2022 at 10:25:55PM -0300, Geraldo Nascimento wrote:
> On Sat, May 28, 2022 at 05:07:08PM -0300, Geraldo Nascimento wrote:
> > Greetings,
> > 
> 
> Hi, again,
> 

And again, Hi!

Doing my due diligence, it seems.

> > This is a one-character patch but very important as the kernel workqueue
> > __cancel_work_timer will cancel active work

It won't cancel important work; it seems to be just a WARN_ON, but it's very
annoying. My understanding was that the NOT was needed so that __flush_work()
is called before the kthreads are spawned, that is, during early boot, as the
comment says, before workqueue_init() fires.
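
To make the two readings concrete, here is a minimal user-space sketch of the
condition, not kernel code. The helper names flush_called_current(),
flush_called_patched() and check_truth_table() are hypothetical and exist only
for illustration; each helper reports whether __flush_work() would be reached
for a given value of wq_online.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the check in __cancel_work_timer().
 * Nothing here is kernel API; the helpers only mirror the `if` condition.
 */

/* Current code: `if (wq_online) __flush_work(work, true);` */
static bool flush_called_current(bool wq_online)
{
	return wq_online;
}

/* Proposed patch: `if (!wq_online) __flush_work(work, true);` */
static bool flush_called_patched(bool wq_online)
{
	return !wq_online;
}

/* Truth table for both variants: early boot (false) vs. after boot (true). */
static void check_truth_table(void)
{
	assert(!flush_called_current(false)); /* early boot: flush skipped */
	assert(flush_called_current(true));   /* after boot: flush runs    */
	assert(flush_called_patched(false));  /* early boot: flush runs    */
	assert(!flush_called_patched(true));  /* after boot: flush skipped */
}
```

Calling check_truth_table() from a test program passes all four assertions.
The patch simply inverts when the flush happens; whether that inversion is the
right behaviour is the question for the maintainers.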

There's a bug report at https://gitlab.freedesktop.org/drm/amd/-/issues/1898
and Felix Kuehling is right that this bug is only triggered when you try
to use amdkfd (and the Kconfigs that implies) without HSA_AMD_SVM
configured.

It still makes sense to me that the NOT operator is missing, however, since in
the warning I was coming from __cancel_work_timer() to __flush_work(),
something that should not be done?

I would like very much to hear the opinion of the maintainers!

Thanks,
Geraldo Nascimento

> > without the NOT operator
> > added.
> > 
> > During early boot wq_online is false so with the NOT added it will evaluate
> > to true. Conversely, after boot is done, workqueue
> 
> I meant wq_online. After boot, wq_online will evaluate to true, current
> code might as well have an if (true) there. I hurried up the patch
> because if I'm right this is a major show stopper to drivers that make
> use of cancel_work_timer(). I hit it through amdgpu in conjunction with amdkfd.
> 
> > is now true and we want
> > it to evaluate to false because otherwise it will cancel important work.
> > 
> > Signed-off-by: Geraldo Nascimento <geraldogabriel@...il.com>
> > 
> > --- workqueue.c	2022-05-28 16:54:12.024176123 -0300
> > +++ workqueue.c	2022-05-28 16:54:37.698176135 -0300
> > @@ -3158,7 +3158,7 @@ static bool __cancel_work_timer(struct w
> >  	 * This allows canceling during early boot.  We know that @work
> >  	 * isn't executing.
> >  	 */
> > -	if (wq_online)
> > +	if (!wq_online)
> >  		__flush_work(work, true);
> >  
> >  	clear_work_data(work);
