Message-ID: <20250902155634.enTifVKX@linutronix.de>
Date: Tue, 2 Sep 2025 17:56:34 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-rt-devel@...ts.linux.dev, linux-kernel@...r.kernel.org,
Clark Williams <clrkwllms@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>, Tejun Heo <tj@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 1/3] workqueue: Provide a handshake for canceling BH
workers
On 2025-09-02 22:19:26 [+0800], Lai Jiangshan wrote:
> Hello, Sebastian
Hi Lai,
> On Tue, Sep 2, 2025 at 7:17 PM Sebastian Andrzej Siewior
> <bigeasy@...utronix.de> wrote:
> > > Is it possible to use rt_mutex_init_proxy_locked(), rt_mutex_proxy_unlock()
> > > and rt_mutex_wait_proxy_lock()?
> > >
> > > Or is it possible to add something like rt_spinlock_init_proxy_locked(),
> > > rt_spinlock_proxy_unlock() and rt_spinlock_wait_proxy_lock() which work
> > > the same as the rt_mutex's proxy lock primitives but for non-sleep context?
> >
> > I don't think so. I think the non-sleep context is the killer part.
> > Those primitives are for PI and work by assigning the waiter's
> > priority and going to sleep until "it" is done. Now if you want
> > non-sleep then you would have to remain on the CPU and spin until
> > the "work" is done. This spinning would work if the other task is on
> > a remote CPU. But if both are on the same CPU then spinning does not
> > work.
> >
>
> I meant to say that the supposed rt_spinlock_wait_proxy_lock() would
> work similarly to the rt_mutex proxy lock, which would wait until the
> boosted task (in this case, the kthread running the BH work) calls
> rt_spinlock_proxy_unlock(). It would also behave like the PREEMPT_RT
> version of spin_lock, where a task blocked on a spin_lock has a
> special style of blocking/sleeping instead of spinning on the CPU;
> this is what the prefix "rt_spinlock" means.
That interface actually implements the boosting for users which don't
use a proper rtmutex directly.
It is used by the PI futex code, where an rtmutex is created as a
substitute for the lock that is held by the user in userland. The lock
is acquired in userland without the kernel's involvement. So in case of
contention the user goes into the kernel and creates an rtmutex as a
representation of the userland lock in the kernel and assigns it to the
task that is holding the userland lock. Now you can block on that
rtmutex and the userland task is forced to go into the kernel for
unlocking.
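Condensed, the contended FUTEX_LOCK_PI path looks roughly like this (a
from-memory sketch; error handling, requeueing and cleanup omitted, see
kernel/futex/pi.c and kernel/locking/rtmutex_api.c for the real thing):

	int ret;
	struct futex_pi_state *pi_state;  /* in-kernel state of the ulock */
	struct rt_mutex_waiter rt_waiter;

	/* Make the userland lock holder the owner of the proxy rtmutex. */
	rt_mutex_init_proxy_locked(&pi_state->pi_mutex, owner_task);

	/* Enqueue ourselves as a waiter; this boosts owner_task via PI. */
	ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex, &rt_waiter,
					current);

	/* Sleep until the owner unlocks via FUTEX_UNLOCK_PI. */
	ret = rt_mutex_wait_proxy_lock(&pi_state->pi_mutex, to, &rt_waiter);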
For RCU, as far as I remember, a task within an RCU read-side critical
section can get preempted and in that case it adds itself to a list
during schedule() so it can be identified later on. There can be more
than one task preempted within an RCU section, blocking a grace period
(GP). The boost mutex is assigned to the first task currently blocking
the GP and then to the next one if needed.
A poor substitute would be something like taking a lock during
schedule() and keeping a list of all those locks in case boosting is
needed, so they can be acquired one by one.
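In code the substitute would be something like the snippet below. All
names are made up for illustration and it hand-waves over the detail
that you can't simply acquire a sleeping lock from within schedule():

	/* Hypothetical sketch, not the actual RCU boosting code. */
	struct blocked_reader {
		struct list_head node;
		struct rt_mutex lock;	/* held by the preempted reader */
	};

	/* schedule(), preempted within an RCU read-side section: */
	rt_mutex_lock(&reader->lock);
	list_add_tail(&reader->node, &blocked_readers);

	/*
	 * Boosting side: blocking on the rtmutex is what lends our
	 * priority to the reader, which unlocks in rcu_read_unlock().
	 * Repeat for the next reader if the GP is still blocked.
	 */
	reader = list_first_entry(&blocked_readers, struct blocked_reader,
				  node);
	rt_mutex_lock(&reader->lock);
	rt_mutex_unlock(&reader->lock);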
> By the way, I’m not objecting to this patch — I just want to explore
> whether there might be other options.
Right. So you would avoid taking the cb_lock in bh_worker(). Instead
you would need to assign the "acquired" lock to the bh_worker() task in
__flush_work() and then block on that lock in __flush_work(). In order
to figure out which task it is, you need some bookkeeping for it. And
you need a lock to synchronise adding/removing tasks on that list (the
bookkeeping) as well as accessing the lock itself in case of
"contention".
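Roughly, with all names made up (this is not actual workqueue code):

	struct bh_proxy {
		raw_spinlock_t lock;		/* protects ->running */
		struct task_struct *running;	/* task inside bh_worker() */
		struct rt_mutex_base handover;	/* proxy lock to block on */
	};

	/* __flush_work(): */
	raw_spin_lock(&proxy->lock);
	owner = proxy->running;
	if (owner)
		rt_mutex_init_proxy_locked(&proxy->handover, owner);
	raw_spin_unlock(&proxy->lock);
	/*
	 * ... then block on &proxy->handover until bh_worker() is done
	 * and performs the proxy unlock.
	 */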
So given all this, that approach looks slightly more complicated. You
would avoid the need to acquire the lock in bh_worker() but you would
also substitute it with bookkeeping and its locking elsewhere. So I am
not sure it is worth it.
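For reference, condensed to its core, the handshake in this patch is
simply the following (where exactly the lock lives is trimmed here; on
PREEMPT_RT spinlock_t sleeps and applies PI):

	/* bh_worker(): */
	spin_lock(&worker->cb_lock);
	/* ... run the queued BH work items ... */
	spin_unlock(&worker->cb_lock);

	/* __flush_work(), canceling a BH work item on PREEMPT_RT: */
	spin_lock(&worker->cb_lock);	/* boosts and waits for bh_worker() */
	spin_unlock(&worker->cb_lock);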
On !RT you can have only one softirq running at a time. On RT, with the
removal of the lock in local_bh_disable() (patch #3), there can be
multiple softirq instances running in parallel on the same CPU. The
primary goal is to avoid a central bottleneck and to make it possible
to have one NIC doing throughput and another NIC doing low-latency
packets, allowing the low-latency NIC to preempt the throughput NIC in
the middle of sending/receiving packets instead of waiting for NAPI to
do a handover.
The lock I'm adding introduces some synchronisation here. So you see
how this requirement for the three legacy users makes it slightly more
complicated, especially after the cleanup years ago…
However, I hope to now come up with an atomic API as Tejun suggested
and push it behind Kconfig bars or so.
> Thanks
> Lai
Sebastian