[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <ad9b8e41-5897-4ad6-8113-2812301c0c93@app.fastmail.com>
Date: Thu, 10 Aug 2023 21:13:20 +0200
From: "Arnd Bergmann" <arnd@...db.de>
To: "Palmer Dabbelt" <palmer@...osinc.com>,
"Leonardo Bras" <leobras@...hat.com>
Cc: "Will Deacon" <will@...nel.org>,
"Peter Zijlstra" <peterz@...radead.org>,
"Boqun Feng" <boqun.feng@...il.com>,
"Mark Rutland" <mark.rutland@....com>,
"Paul Walmsley" <paul.walmsley@...ive.com>,
"Albert Ou" <aou@...s.berkeley.edu>,
"Andrea Parri" <parri.andrea@...il.com>,
"Andrzej Hajda" <andrzej.hajda@...el.com>,
guoren <guoren@...nel.org>, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1
and 2
On Thu, Aug 10, 2023, at 18:23, Palmer Dabbelt wrote:
> On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@...hat.com wrote:
>> On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
>>> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
>>> > xchg for variables of size 1-byte and 2-bytes is not yet available for
>>> > riscv, even though its present in other architectures such as arm64 and
>>> > x86. This could lead to not being able to implement some locking mechanisms
>>> > or requiring some rework to make it work properly.
>>> >
>>> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
>>> > architectures.
>>
>>> Parity with other architectures by itself is not a reason to do this,
>>> in particular the other architectures you listed have the instructions
>>> in hardware while riscv does not.
>>
>> Sure, I understand RISC-V don't have native support for xchg on variables of
>> size < 4B. My argument is that it's nice to have even an emulated version for
>> this in case any future mechanism wants to use it.
>>
>> Not having it may mean we won't be able to enable given mechanism in RISC-V.
>
> IIUC the ask is to have a user within the kernel for these functions.
> That's the general thing to do, and last time this came up there was no
> in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> it yet because we're worried about the performance/fairness stuff that
> other ports have seen and nobody's got concrete benchmarks yet (though
> there's another patch set out that I haven't had time to look through,
> so that may have changed).
Right. In particular the qspinlock is a good example for something
where having the emulated 16-bit xchg() may end up less efficient
than a natively supported instruction.
The xchg() here is a performance optimization for CPUs that can
do this without touching the other half of the 32-bit word.
>>
>> Didn't get this part:
>> By "emulating small xchg() through cmpxchg()", did you mean like emulating an
>> xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
>>
>> If so, yeah, it's a fair point: in some extreme case we could have multiple
>> threads accessing given cacheline and have sc always failing. On the other hand,
>> there are 2 arguments on that:
>>
>> 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
>> also seem to rely in this mechanism for every xchg size. Another archs like csky
>> and loongarch use asm that look like mine to handle size < 4B xchg.
I think you misread the arm64 code, which should use native instructions
for all sizes, in both the armv8.0 and LSE atomics.
PowerPC does use the masking for xchg, but I suspect there are no
actual users, at least it actually has its own qspinlock implementation
that avoids xchg().
>>> This is also something that almost no architecture
>>> specific code relies on (generic qspinlock being a notable exception).
>>>
>>
>> 2 - As you mentioned, there should be very little code that will actually make
>> use of xchg for vars < 4B, so it should be safe to assume its fine to not
>> guarantee forward progress for those rare usages (like some of above mentioned
>> archs).
I don't this this is a safe assumption, we've had endless discussions
about using qspinlock on architectures without a native xchg(), which
needs either hardware guarantees or special countermeasures in xchg() itself
to avoid this.
What I'd actually like to do here is to remove the special 8-bit and
16-bit cases from the xchg() and cmpxchg() interfaces at all, leaving
only fixed 32-bit and native wordsize (either 32 or 64) as the option,
while dealing with the others the same way we treat the fixed
64-bit cases that hardcode the 64-bit argument types and are only
usable on architectures that provide them.
Arnd
Powered by blists - more mailing lists