Message-ID: <CAO_YeogikhpZjg4Nhcdd0AKjRFCtZ4ohvVN5Y9DZgqmNiP8FRg@mail.gmail.com>
Date: Thu, 26 Jun 2025 19:30:23 +0100
From: Dylan <dyudaken@...il.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: paulmck@...nel.org, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
shuah@...nel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 1/2] membarrier: allow cpu_id to be set on more commands
On Thu, Jun 26, 2025 at 5:07 PM Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> On 2025-06-26 11:52, Dylan Yudaken wrote:
> > No reason to not allow MEMBARRIER_CMD_FLAG_CPU on
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED or
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
> >
> > If it is known specifically what cpu you want to interrupt then there
> > is a decent efficiency saving in not interrupting all the other ones.
> >
> > Also - the code already works as is for them.
>
> Can you elaborate on a concrete use-case justifying adding this ?
>
> Thanks,
>
> Mathieu
>
So my use case is for core-local data such as performance counters.
I have a library that allows a fast thread to "lock" a core -> do
some work (probably incrementing some performance counters) -> unlock.
The "lock" uses restartable sequences (i.e. no serializing
instructions), and the unlock just writes a 0 to memory (again, no
serializing instructions).
A slow thread will occasionally (say every few minutes) try to read
the data computed in the work section.
It does this by disabling locking and firing off a membarrier(RSEQ) on
that core to be sure that the core is either "locked" or "unlocked".
It then spins waiting for it to be unlocked.
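As a rough sketch of the handshake described above (not the actual
library): the names core_slot, fast_try_lock, fast_unlock,
slow_disable_locking and slow_wait_unlocked are hypothetical, and plain
C11 atomics stand in for the restartable sequence. In particular, the
check-then-set in fast_try_lock is not safe against preemption the way a
real rseq critical section would be.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-core slot; field names are illustrative only. */
struct core_slot {
    atomic_int locking_enabled; /* cleared by the slow thread to stop new lockers */
    atomic_int locked;          /* 1 while the fast thread is in its work section */
    unsigned long counter;      /* core-local data updated under the "lock" */
};

static inline void core_slot_init(struct core_slot *s)
{
    atomic_init(&s->locking_enabled, 1);
    atomic_init(&s->locked, 0);
    s->counter = 0;
}

/*
 * Fast path stand-in. In the design described above this would be an
 * rseq critical section, so the check and the store would be restarted
 * as a unit if the thread were interrupted in between.
 */
static inline bool fast_try_lock(struct core_slot *s)
{
    if (!atomic_load_explicit(&s->locking_enabled, memory_order_relaxed))
        return false;
    atomic_store_explicit(&s->locked, 1, memory_order_relaxed);
    return true;
}

/* Unlock is a plain store of 0, with no barrier on the fast path. */
static inline void fast_unlock(struct core_slot *s)
{
    atomic_store_explicit(&s->locked, 0, memory_order_relaxed);
}

/* Slow path, step 1: stop new lock attempts. */
static inline void slow_disable_locking(struct core_slot *s)
{
    atomic_store_explicit(&s->locking_enabled, 0, memory_order_seq_cst);
}

/*
 * Slow path, step 3: spin until the core is unlocked. Step 2, the
 * membarrier(RSEQ) call on that core, would go between steps 1 and 3.
 */
static inline void slow_wait_unlocked(struct core_slot *s)
{
    while (atomic_load_explicit(&s->locked, memory_order_acquire))
        ; /* busy-wait */
}
```

The relaxed ordering on the fast path is the point of the design: the
cost of ordering is pushed onto the rare slow path.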
At this point my understanding is a bit fuzzy - but I believe you need
that core to have a memory barrier since there is no serializing
instruction and the processor would happily reorder some "work" after
the "unlock" instruction.
That serializing instruction is what I want from this. But since I
know the cpu_id that I am working with I don't need to do a barrier on
_all_ the cores.
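For reference, a minimal sketch of a single-CPU barrier as it can be
issued today with the RSEQ command. The wrapper names sys_membarrier,
rseq_barrier_register and rseq_barrier_on_cpu are hypothetical, and the
sketch assumes kernel headers from Linux 5.10 or later, which define
MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ and MEMBARRIER_CMD_FLAG_CPU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no membarrier() wrapper, so go through syscall(2). */
static inline int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/*
 * One-time, per-process registration. The kernel rejects the expedited
 * RSEQ command with EPERM until this has been done.
 * Returns 0 on success, -1 with errno set on failure.
 */
static inline int rseq_barrier_register(void)
{
    return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0, 0);
}

/*
 * Target only cpu_id instead of every CPU running a thread of this
 * process. Returns 0 on success, -1 with errno set on failure.
 */
static inline int rseq_barrier_on_cpu(int cpu_id)
{
    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
                          MEMBARRIER_CMD_FLAG_CPU, cpu_id);
}
```

A caller would run rseq_barrier_register() once at startup and then
rseq_barrier_on_cpu(cpu) from the slow thread, checking errno if either
returns -1.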
To be clear: (1) I don't have a current real-world use case, (2) my
library/design/understanding might be buggy, and (3) I don't have a use
case for the SYNC_CORE part, but it seemed easy enough to add and I
presume others might have one.