Message-ID: <aYxx6cq6he6jTIZI@tpad>
Date: Wed, 11 Feb 2026 09:11:21 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Michal Hocko <mhocko@...e.com>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
linux-mm@...ck.org, Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <cl@...ux.com>, Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Vlastimil Babka <vbabka@...e.cz>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
Leonardo Bras <leobras@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
Frederic Weisbecker <fweisbecker@...e.de>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
On Wed, Feb 11, 2026 at 09:01:12AM -0300, Marcelo Tosatti wrote:
> On Tue, Feb 10, 2026 at 03:01:10PM +0100, Michal Hocko wrote:
> > On Fri 06-02-26 11:34:30, Marcelo Tosatti wrote:
> > > The problem:
> > > Some places in the kernel implement a parallel programming strategy
> > > consisting of local_locks() for most of the work, with the rare remote
> > > operations scheduled on the target cpu. This keeps cache bouncing low,
> > > since the cacheline tends to stay local, and avoids the cost of locks in
> > > non-RT kernels, even though the few remote operations are expensive due
> > > to scheduling overhead.
> > >
> > > On the other hand, for RT workloads this can represent a problem: getting
> > > an important workload scheduled out to deal with remote requests is
> > > sure to introduce unexpected deadline misses.
> > >
> > > The idea:
> > > Currently, with PREEMPT_RT=y, local_locks() become per-cpu spinlocks.
> > > In this case, instead of scheduling work on a remote cpu, it should
> > > be safe to grab that remote cpu's per-cpu spinlock and run the required
> > > work locally. The major cost, un/locking in every local function,
> > > is already paid on PREEMPT_RT.
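> > >
> > > For reference, on PREEMPT_RT local_lock_t is typedef'd to spinlock_t,
> > > and local_lock() boils down to migrate_disable() +
> > > spin_lock(this_cpu_ptr(lock)); the sketch below is paraphrased from
> > > include/linux/local_lock_internal.h, not quoted verbatim:
> > >
> > >         /* paraphrase of the PREEMPT_RT local_lock mapping */
> > >         typedef spinlock_t local_lock_t;
> > >
> > >         #define local_lock(lock)                        \
> > >                 do {                                    \
> > >                         migrate_disable();              \
> > >                         spin_lock(this_cpu_ptr(lock));  \
> > >                 } while (0)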
> > >
> > > Also, there is no need to worry about extra cache bouncing:
> > > The cacheline invalidation already happens due to schedule_work_on().
> > >
> > > This will avoid schedule_work_on(), and thus avoid scheduling-out an
> > > RT workload.
> > >
> > > Proposed solution:
> > > A new interface called Queue PerCPU Work (QPW), meant to replace
> > > the workqueue in the use case described above.
> > >
> > > If PREEMPT_RT=n, this interface just wraps the current
> > > local_locks + workqueue behavior, so no change in runtime is expected.
> > >
> > > If PREEMPT_RT=y, or CONFIG_QPW=y, queue_percpu_work_on(cpu,...) will
> > > lock that cpu's per-cpu structure and perform the work on it locally.
> > > This is possible because, in the functions that may perform work on
> > > remote per-cpu structures, the local_lock (which on PREEMPT_RT is
> > > already a this_cpu spinlock()) is replaced by a qpw_spinlock(), which
> > > can take the per_cpu spinlock() of the cpu passed as a parameter.
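> > >
> > > To make the intended semantics concrete, here is a minimal sketch of
> > > the dispatch; the struct layout and field names are illustrative
> > > placeholders, not necessarily the exact API of this series:
> > >
> > >         struct qpw_work {
> > >                 struct work_struct work;        /* workqueue path (QPW off) */
> > >                 spinlock_t __percpu *lock;      /* per-cpu lock for the data */
> > >                 void (*func)(struct qpw_work *qpw);
> > >         };
> > >
> > >         static inline void queue_percpu_work_on(int cpu,
> > >                                                 struct workqueue_struct *wq,
> > >                                                 struct qpw_work *qpw)
> > >         {
> > >                 if (IS_ENABLED(CONFIG_QPW)) {
> > >                         /*
> > >                          * Take the target CPU's per-cpu spinlock and run
> > >                          * the work in place: no scheduling on @cpu, so an
> > >                          * RT or isolated task there is not disturbed.
> > >                          */
> > >                         spin_lock(per_cpu_ptr(qpw->lock, cpu));
> > >                         qpw->func(qpw);
> > >                         spin_unlock(per_cpu_ptr(qpw->lock, cpu));
> > >                 } else {
> > >                         /* existing behaviour: schedule on the target cpu */
> > >                         queue_work_on(cpu, wq, &qpw->work);
> > >                 }
> > >         }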
> >
> > What about !PREEMPT_RT? We have people running isolated workloads and
> > these sorts of pcp disruptions are really unwelcome there as well. They
> > do not have requirements as strong as RT workloads, but the underlying
> > fundamental problem is the same. Frederic (now CCed) is working on
> > moving those pcp bookkeeping activities to be executed on return to
> > userspace, which should take care of both RT and non-RT configurations
> > AFAICS.
>
> Michal,
>
> For !PREEMPT_RT, _if_ you select CONFIG_QPW=y, then there is a kernel
> boot option qpw=y/n, which controls whether the behaviour matches
> PREEMPT_RT (i.e. the spinlock is taken on local_lock).
>
> If CONFIG_QPW=n, or with the kernel boot option qpw=n, only local_lock
> (with remote work via the workqueue) is used.
OK, this is not true. There is only CONFIG_QPW and the qpw=yes/no kernel
boot option for control.
CONFIG_PREEMPT_RT should probably select CONFIG_QPW=y and
CONFIG_QPW_DEFAULT=y.
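
In code form, the control could look something like this; qpw_enabled,
qpw_setup() and qpw_active() are names made up here for illustration,
not the actual symbols in the series:

        /* sketch: qpw= boot option parsing (illustrative names) */
        static bool qpw_enabled __read_mostly;  /* CONFIG_QPW_DEFAULT would preset this */

        static int __init qpw_setup(char *str)
        {
                /* accepts y/n, yes/no, 1/0, on/off */
                return kstrtobool(str, &qpw_enabled);
        }
        early_param("qpw", qpw_setup);

        /* qpw_spinlock()/queue_percpu_work_on() would pick a path with this */
        static inline bool qpw_active(void)
        {
                return IS_ENABLED(CONFIG_QPW) && qpw_enabled;
        }
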
> What "pcp bookkeeping activities" are you referring to? I don't see
> how moving certain activities that happen under SLUB or LRU spinlocks
> to before the return to userspace changes anything with respect to
> avoiding CPU interruptions.
>
> Thanks
>