Date:   Thu, 12 May 2022 06:56:45 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Dmitry Torokhov <dmitry.torokhov@...il.com>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global
 workqueues

Hello, Dmitry.

On Thu, May 12, 2022 at 06:13:35AM -0700, Dmitry Torokhov wrote:
> > > This means that now the code has to keep track of all work items that it
> > > allocated, instead of being able to "fire and forget" work items (when
> > > dealing with extremely infrequent events) and rely on flush_workqueue()
> > > to clean up.
> > 
> > Yes. Moreover, a patch to catch and refuse such usage at compile time was
> > proposed at
> > https://lkml.kernel.org/r/738afe71-2983-05d5-f0fc-d94efbdf7634@I-love.SAKURA.ne.jp .
> 
> My comment was not a wholesale endorsement of Tejun's statement, but
> rather a note of the fact that it again adds complexity (at least as far
> as driver writers are concerned) to the kernel code.

I was thinking more about cases where there are a small number of static
work items. If there are multiple dynamic work items, creating a workqueue
as a flush domain is the way to go. It does add a bit of complexity but
shouldn't be too bad - it just adds an alloc_workqueue() during init, and
the exit path can simply use destroy_workqueue() instead of flush_workqueue().
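
Something like this should do (a minimal, untested sketch; the "foo" names
are hypothetical and not taken from any real driver):

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *foo_wq;

struct foo_work {
        struct work_struct work;
        /* event data ... */
};

static void foo_work_fn(struct work_struct *work)
{
        struct foo_work *fw = container_of(work, struct foo_work, work);

        /* handle the infrequent event, then free the work item */
        kfree(fw);
}

/* event path: fire and forget, nothing to track */
static int foo_handle_event(void)
{
        struct foo_work *fw = kzalloc(sizeof(*fw), GFP_KERNEL);

        if (!fw)
                return -ENOMEM;

        INIT_WORK(&fw->work, foo_work_fn);
        queue_work(foo_wq, &fw->work);  /* instead of system_wq */
        return 0;
}

static int __init foo_init(void)
{
        foo_wq = alloc_workqueue("foo", 0, 0);
        return foo_wq ? 0 : -ENOMEM;
}

static void __exit foo_exit(void)
{
        /* flushes whatever is still pending and frees the flush domain */
        destroy_workqueue(foo_wq);
}

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");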

> > >          That flush typically happens in the module unload path, and I
> > > wonder if the restriction on flush_workqueue() could be relaxed to allow
> > > calling it on unload.
> > 
> > A patch for drivers/input/mouse/psmouse-smbus.c is waiting for your response at
> > https://lkml.kernel.org/r/25e2b787-cb2c-fb0d-d62c-6577ad1cd9df@I-love.SAKURA.ne.jp .
> > As in many modules, flush_workqueue() happens only on module unload in your case.
> 
> Yes, I saw that patch, and that is what prompted my response. I find that
> it adds complexity and I was wondering if it could be avoided. It is also
> unclear to me whether there is an additional cost coming from allocating a
> dedicated workqueue.

A workqueue without WQ_MEM_RECLAIM is really cheap. All it does is track
what's in flight for that particular frontend while interfacing with the
shared worker pools.
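
For example, the difference is just the flags argument (hypothetical
names, untested):

        /* shares the system-wide worker pools, no dedicated kthreads */
        wq = alloc_workqueue("foo", 0, 0);

        /* only WQ_MEM_RECLAIM adds a dedicated rescuer thread */
        wq = alloc_workqueue("foo_reclaim", WQ_MEM_RECLAIM, 0);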

> I understand that for some of them the change makes sense, but it would
> be nice to continue using the simple API under limited circumstances.

Hmmm... unfortunately, I can't think of a way to guarantee that a module
unloading path can't get involved in a deadlock scenario through system_wq.
Given that the added complexity should be something like half a dozen lines
of code, switching to separate workqueues feels like the right direction to
me.

Thanks.

-- 
tejun
