Date: Tue, 20 Feb 2024 10:01:20 -1000
From: Tejun Heo <tj@...nel.org>
To: Lai Jiangshan <jiangshanlai@...il.com>
Cc: torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
	allen.lkml@...il.com, kernel-team@...a.com
Subject: Re: [PATCH 16/17] workqueue: Allow cancel_work_sync() and
 disable_work() from atomic contexts on BH work items

Hello,

On Tue, Feb 20, 2024 at 03:33:34PM +0800, Lai Jiangshan wrote:
> Hello, Tejun
> 
> On Sat, Feb 17, 2024 at 2:06 AM Tejun Heo <tj@...nel.org> wrote:
> 
> > @@ -4072,7 +4070,32 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
> >         if (!pool)
> >                 return false;
> >
> > -       wait_for_completion(&barr.done);
> > +       if ((pool->flags & POOL_BH) && from_cancel) {
> 
> pool pointer might be invalid here, please check POOL_BH before
> rcu_read_unlock()

Right, it had a local variable caching the test result from inside the
rcu_read_lock() section and I removed it by mistake while splitting patches.
Will fix.
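For readers following along, the fix Tejun describes (re-adding a local variable that caches the POOL_BH test while the pool pointer is still RCU-protected) might look roughly like this sketch against the quoted hunk, not the final code:

```c
	/* Sketch only: cache the POOL_BH test while @pool is still
	 * protected by RCU, so the flag isn't read through a
	 * possibly-stale pointer after rcu_read_unlock(). */
	rcu_read_lock();
	pool = get_work_pool(work);
	is_bh = pool && (pool->flags & POOL_BH);
	/* ... insert barrier, validate @pool ... */
	rcu_read_unlock();

	if (!pool)
		return false;

	if (is_bh && from_cancel) {
		/* busy-wait path for a canceled BH work item */
	} else {
		wait_for_completion(&barr.done);
	}
```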

> > +               /*
> > +                * We're flushing a BH work item which is being canceled. It
> > +                * must have been executing during start_flush_work() and can't
> > +                * currently be queued. If @work is still executing, we know it
> > +                * is running in the BH context and thus can be busy-waited.
> > +                *
> > +                * On RT, prevent a live lock when current preempted soft
> > +                * interrupt processing or prevents ksoftirqd from running by
> > +                * keeping flipping BH. If the tasklet runs on a different CPU
> > +                * then this has no effect other than doing the BH
> > +                * disable/enable dance for nothing. This is copied from
> > +                * kernel/softirq.c::tasklet_unlock_spin_wait().
> > +                */
> 
> s/tasklet/BH work/g

Updated.

> Although the comment is copied from kernel/softirq.c, I can't
> envision the scenario in which the current task
> "prevents ksoftirqd from running by keeping flipping BH"

Yeah, that sentence is not the easiest to parse. The following parentheses
might be helpful:

 prevent a live lock (when current (preempted soft interrupt processing) or
 (prevents ksoftirqd from running)) by keeping flipping BH.

> since the @work is still executing or the tasklet is running.

eb2dafbba8b8 ("tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RT")
is the commit which added the flipping to tasklet_unlock_spin_wait(). My
understanding of RT isn't great, but it sounds like BH execution can be
preempted by someone else who then does the busy wait, which would be
sad. IIUC, flipping BH off/on makes the busy waiter yield to BH
processing.
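Concretely, the pattern copied from tasklet_unlock_spin_wait() amounts to something like the following sketch (the actual flush path would poll the barrier completion rather than a tasklet state bit):

```c
	while (!try_wait_for_completion(&barr.done)) {
		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
			/*
			 * On RT, softirq handling is preemptible.
			 * Briefly disabling and re-enabling BH makes
			 * the busy waiter yield so the preempted BH
			 * work (or ksoftirqd) can run, avoiding a
			 * live lock.
			 */
			local_bh_disable();
			local_bh_enable();
		} else {
			cpu_relax();
		}
	}
```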

> > @@ -4179,6 +4203,11 @@ static bool __cancel_work_sync(struct work_struct *work, u32 cflags)
> >
> >         ret = __cancel_work(work, cflags | WORK_CANCEL_DISABLE);
> >
> > +       if (*work_data_bits(work) & WORK_OFFQ_BH)
> > +               WARN_ON_ONCE(in_hardirq());
> 
> When !PREEMPT_RT, this check is sufficient.
> 
> But when PREEMPT_RT, it should only be called in contexts that allow
> local_bh_disable() when syncing a BH work item, although I'm not sure
> what the proper check code would be.
> 
> In PREEMPT_RT, local_bh_disable() is disallowed not only in hardirq
> context but also in !preemptible() context (I'm not sure about it).
>
> __local_bh_disable_ip() (the PREEMPT_RT version) doesn't contain a
> full check either, except for "WARN_ON_ONCE(in_hardirq())".
> 
> Since the check is just for debugging, I'm OK with the current check.

We should have enough test coverage with !RT kernels. If you can think of a
better notation for RT, please be my guest.
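A stricter RT-side assertion along the lines Lai describes might look like this hypothetical sketch (untested, and not part of the posted patch):

```c
	if (*work_data_bits(work) & WORK_OFFQ_BH)
		WARN_ON_ONCE(in_hardirq() ||
			     (IS_ENABLED(CONFIG_PREEMPT_RT) &&
			      !preemptible()));
```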

Thanks.

-- 
tejun
