[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YpMPPyIZVlBwUrNe@slm.duckdns.org>
Date: Sat, 28 May 2022 20:14:23 -1000
From: Tejun Heo <tj@...nel.org>
To: Geraldo Nascimento <geraldogabriel@...il.com>
Cc: Lai Jiangshan <jiangshanlai@...il.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] workqueue: missing NOT while checking if Workqueue is
offline
On Sun, May 29, 2022 at 02:53:39AM -0300, Geraldo Nascimento wrote:
> On Sat, May 28, 2022 at 07:24:41PM -1000, Tejun Heo wrote:
> > On Sun, May 29, 2022 at 01:29:32AM -0300, Geraldo Nascimento wrote:
> > > I would like very much to hear the opinion of the maintainers!
> >
> > I have a hard time understanding what you're trying to do. Can you please
> > slow down and start from describing the problem itself?
>
> Hi Tejun,
>
> Sorry for the hurry.
>
> The problem is best described in https://gitlab.freedesktop.org/drm/amd/-/issues/1898
>
> From my understanding from the context of __cancel_work_timer() we should not
> ever call __flush_work() but I may be wrong. In the present case as
Yeah, you're wrong.
> described in AMD's GitLab __cancel_work_timer() is being called by
> cancel_delayed_work_sync() inside kfd_process_notifier_release()
> from drivers/gpu/drm/amd/amdkfd/kfd_process.c:1157 (Linux 5.18).
Have you confirmed that that actually is the warning which is triggering? I
don't see how that condition would trigger that late during the boot and the
warning line being reported doesn't match v5.16 source code, so I'm not sure
but skimming the instructon sequence, that's the second UD2 sequence, so I'm
gonna guess that's the second WARN_ON - the !work->func one and someone else
on the gitlab bug report seems to agree too.
It's usually a lot more helpful if the bug report is complete - include the
full warning message with some context at least, make sure that the kernel
you're using is an upstream one or something close enough. If not, point to
the source tree. Also, try to clearly distinguish what you know and what you
suspect. Both can help but mixing them up together tends to cause confusion
for everyone involved.
It just looks like the code is trying to cancel a work item which hasn't
been initialized and what it prolly needs is an ifdef around that cancel
call depending on the config option.
Thanks.
--
tejun
Powered by blists - more mailing lists