linux-kernel - Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e4203c4f-9c3d-6e33-06e8-052676cc5af1@I-love.SAKURA.ne.jp>
Date:   Sun, 20 Mar 2022 15:06:30 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global
 workqueues

On 2022/03/20 2:15, Linus Torvalds wrote:
> On Fri, Mar 18, 2022 at 9:43 PM Tetsuo Handa
> <penguin-kernel@...ove.sakura.ne.jp> wrote:
>>
>> Since flush operation synchronously waits for completion, flushing
>> kernel-global WQs (e.g. system_wq) might introduce possibility of deadlock
>> due to unexpected locking dependency. Tejun Heo commented that it makes no
>> sense at all to call flush_workqueue() on the shared WQs as the caller has
>> no idea what it's gonna end up waiting for.
> 
> NAK on this patch for a very simple reason:
> 
>   static inline void flush_scheduled_work(void)
>   {
>         flush_workqueue(system_wq);
>   }
> 
> and now grep for flush_scheduled_work().

I know there are in-tree flush_scheduled_work() users.

> 
> The *other* system workqueue flushes may be rare and "that subsystem
> should just be converted to do its own workqueue".
> 
> But flush_scheduled_work() is literally exported as an explicit and
> separate interface,
> 
> The fact that the function has a big warning in the comment above it
> doesn't change that fact. At the very least, this patch has to be
> preceded by a couple of other patches that fix a couple of subsystems
> and document "this is what you should do".

This change was in Tejun's todo list
( https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org ).
But scrubbing the existing users will need some time.

This patch is intended for

  (a) preventing developers from adding more flush_workqueue() calls on
      kernel-global WQs.

  (b) serving as an anchor to be referenced from individual patches

.

> 
> Because suddenly saying "hey, we gave you this interface, now we're
> warning about it because it's going to go away" without actually
> showing how to do it instead is not acceptable.

This patch avoids emitting "WARNING:" in order not to disturb kernel testing.
This change is not as urgent as security fix. We can wait for several release
cycles until all in-tree users update their modules. I don't want to send
blind conversion patches, for what the proper fix is depends on what that
module is doing. For example, commit 081bdc9fe05bb232 ("RDMA/ib_srp: Fix a
deadlock") chose just removing flush_workqueue(system_long_wq) call. This
change is expected to be carefully handled by each module's maintainers.

> 
> And honestly, I don't personally see a good conversion. We literally
> have all those "schedule_{delayed_}work{_on}()" etc helper functions
> that are *designed* to use this system_wq. People *need* the ability
> to flush those things, even if it's only for module unload.

Users of all those "schedule_{delayed_}work{_on}()" etc helper functions
need to be updated only if flush_workqueue() is needed. And even if
flush_workqueue() happens only upon module unload, there is possibility
of deadlock.

> 
> So I really think this patch on its own is completely broken. It'd not
> pointing to a way forward, it's just saying "don't do this" with no
> really acceptable way to not do it.

If you want how to convert, I can include it into patch description.

> 
> Removing flush_scheduled_work() needs to be paired with removing the
> "schedule_{delayed_}work{_on}()" helpers too.

No need to remove the "schedule_{delayed_}work{_on}()" helpers.
Those who don't need flush_workqueue() can continue using these helpers.

Those who need flush_workqueue() can convert

  schedule_work(some_work);

into

  queue_work(some_workqueue_for_that_module, some_work);

.

> 
> And I don't see you having a good alternative.

What alternative are you expecting? We already have alternatives.
This change (replacing system_wq with module's local workqueue as
an example) is a matter of doing

  (1) add

        some_workqueue_for_that_module = alloc_workqueue("some_name", 0, 0);

      into module's __init function.

  (2) add

        destroy_workqueue(some_workqueue_for_that_module);

      into module's __exit function.

  (3) replace

        schedule_work(some_work);

      with

        queue_work(some_workqueue_for_that_module, some_work);

      throughout that module.

  (4) replace

        flush_scheduled_work();

      with

        flush_workqueue(some_workqueue_for_that_module);

      throughout that module.

if flush_scheduled_work() cannot be avoided.

> 
> So until that clear way forward exists, NAK.

We know that allocating a !WQ_MEM_RECLAIM workqueue consumes little
resource ( https://lore.kernel.org/all/YhUq70Y%2FtKGGNbR0@slm.duckdns.org ).
This is a step by step conversion if flush_workqueue() is unavoidable.

Again, we already have alternatives. Why NAK?

> 
>              Linus