Message-ID: <41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com>
Date: Wed, 20 Apr 2022 13:03:12 -0400
From: Dan Lustig <dlustig@...dia.com>
To: Guo Ren <guoren@...nel.org>
Cc: Boqun Feng <boqun.feng@...il.com>,
Andrea Parri <parri.andrea@...il.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Arnd Bergmann <arnd@...db.de>,
Palmer Dabbelt <palmer@...belt.com>,
Mark Rutland <mark.rutland@....com>,
Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
linux-arch <linux-arch@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-riscv <linux-riscv@...ts.infradead.org>,
Guo Ren <guoren@...ux.alibaba.com>
Subject: Re: [PATCH V2 0/3] riscv: atomic: Optimize AMO instructions usage
On 4/20/2022 1:33 AM, Guo Ren wrote:
> Thx Dan,
>
> On Wed, Apr 20, 2022 at 1:12 AM Dan Lustig <dlustig@...dia.com> wrote:
>>
>> On 4/17/2022 12:51 AM, Guo Ren wrote:
>>> Hi Boqun & Andrea,
>>>
>>> On Sun, Apr 17, 2022 at 10:26 AM Boqun Feng <boqun.feng@...il.com> wrote:
>>>>
>>>> On Sun, Apr 17, 2022 at 12:49:44AM +0800, Guo Ren wrote:
>>>> [...]
>>>>>
>>>>> If both the aq and rl bits are set, the atomic memory operation is
>>>>> sequentially consistent and cannot be observed to happen before any
>>>>> earlier memory operations or after any later memory operations in the
>>>>> same RISC-V hart and to the same address domain.
>>>>> "0: lr.w %[p], %[c]\n"
>>>>> " sub %[rc], %[p], %[o]\n"
>>>>> " bltz %[rc], 1f\n".
>>>>> - " sc.w.rl %[rc], %[rc], %[c]\n"
>>>>> + " sc.w.aqrl %[rc], %[rc], %[c]\n"
>>>>> " bnez %[rc], 0b\n"
>>>>> - " fence rw, rw\n"
>>>>> "1:\n"
>>>>> So .rl + fence rw, rw is over-constrained; using sc.w.aqrl alone is
>>>>> more appropriate.
>>>>>
>>>>
>>>> Can .aqrl order memory accesses before and after it (not against itself,
>>>> against each other), i.e. act as a full memory barrier? For example, can
>>> From the RVWMO spec description, the .aqrl annotation gives the AMO
>>> instruction the same ordering effect as a "fence rw, rw", so it's RCsc.
>>>
>>> Not only .aqrl; I think the sequence below could also be RCsc when the
>>> sc.w.aq is executed:
>>> A: Pre-Access
>>> B: lr.w.rl ADDR-0
>>> ...
>>> C: sc.w.aq ADDR-0
>>> D: Post-Access
>>> Because sc.w.aq has an overlapping-address and data dependency on lr.w.rl,
>>> the global memory order should be A->B->C->D when sc.w.aq is executed. For
>>> the amoswap
>>
>> These opcodes aren't actually meaningful, unfortunately.
>>
>> Quoting the ISA manual chapter 10.2: "Software should not set the rl bit
>> on an LR instruction unless the aq bit is also set, nor should software
>> set the aq bit on an SC instruction unless the rl bit is also set."
> 1. Oh, I had missed the latter half of that sentence in the ISA manual. But
> why can't we use lr.rl & sc.aq in software to guarantee the
> ordering?
lr.aq and sc.rl map more naturally to hardware than lr.rl and sc.aq.
Plus, lr.rl and sc.aq just aren't common operations to begin with; e.g., there
is no smp_store_acquire() or smp_load_release() in the kernel, nor are there
equivalents in C/C++ atomics.
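
For context, the acquire/release primitives that do exist attach acquire to
loads and release to stores, which is exactly the lr.aq / sc.rl shape.  A
rough, illustrative sketch (the function names here are made up; compare the
fence-based definitions in arch/riscv/include/asm/barrier.h):

static inline int load_acquire_int(const int *p)
{
	int v = *(const volatile int *)p;
	__asm__ __volatile__ ("fence r, rw" ::: "memory");	/* load-acquire */
	return v;
}

static inline void store_release_int(int *p, int v)
{
	__asm__ __volatile__ ("fence rw, w" ::: "memory");	/* store-release */
	*(volatile int *)p = v;
}

Nothing in the kernel API or in C/C++ atomics wants the mirrored pairing, so
lr.rl and sc.aq have no obvious consumer.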
> 2. Using .aqrl to replace the fence rw, rw is okay according to the ISA
> manual, right? And it removes one fence instruction, which should improve
> performance:
> "0: lr.w %[p], %[c]\n"
> " sub %[rc], %[p], %[o]\n"
> " bltz %[rc], 1f\n".
> - " sc.w.rl %[rc], %[rc], %[c]\n"
> + " sc.w.aqrl %[rc], %[rc], %[c]\n"
> " bnez %[rc], 0b\n"
> - " fence rw, rw\n"
Yes, using .aqrl is valid.
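
To make that concrete, here is a minimal sketch of the shape the change
produces (illustrative only -- the name and operand constraints below are
mine, not the actual arch/riscv/include/asm/cmpxchg.h macro):

static inline int cmpxchg32_fully_ordered(int *ptr, int old, int new)
{
	int prev, rc;

	__asm__ __volatile__ (
		"0:	lr.w      %[p], %[c]\n"
		"	bne       %[p], %[o], 1f\n"
		"	sc.w.aqrl %[rc], %[n], %[c]\n"	/* was sc.w.rl + fence rw, rw */
		"	bnez      %[rc], 0b\n"
		"1:\n"
		: [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (*ptr)
		: [o] "r" (old), [n] "r" (new)
		: "memory");

	return prev;
}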
Dan
>>
>> Dan
>>
>>> The purpose of the whole patchset is to reduce the usage of
>>> standalone fence rw, rw instructions and to maximize the usage of the
>>> RISC-V .aq/.rl/.aqrl annotations.
>>>
>>> __asm__ __volatile__ ( \
>>> "0: lr.w %0, %2\n" \
>>> " bne %0, %z3, 1f\n" \
>>> " sc.w.rl %1, %z4, %2\n" \
>>> " bnez %1, 0b\n" \
>>> " fence rw, rw\n" \
>>> "1:\n" \
>>>
>>>> we end up with u == 1, v == 1, r1 on P0 is 0 and r1 on P1 is 0, for the
>>>> following litmus test?
>>>>
>>>> C lr-sc-aqrl-pair-vs-full-barrier
>>>>
>>>> {}
>>>>
>>>> P0(int *x, int *y, atomic_t *u)
>>>> {
>>>> 	int r0;
>>>> 	int r1;
>>>>
>>>> 	WRITE_ONCE(*x, 1);
>>>> 	r0 = atomic_cmpxchg(u, 0, 1);
>>>> 	r1 = READ_ONCE(*y);
>>>> }
>>>>
>>>> P1(int *x, int *y, atomic_t *v)
>>>> {
>>>> 	int r0;
>>>> 	int r1;
>>>>
>>>> 	WRITE_ONCE(*y, 1);
>>>> 	r0 = atomic_cmpxchg(v, 0, 1);
>>>> 	r1 = READ_ONCE(*x);
>>>> }
>>>>
>>>> exists (u=1 /\ v=1 /\ 0:r1=0 /\ 1:r1=0)
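
(As an aside: the quoted test is written in LKMM litmus syntax, so it can be
checked against the kernel memory model with herd7 from tools/memory-model,
e.g. "herd7 -conf linux-kernel.cfg <name>.litmus".  That verifies what the
generic model requires of atomic_cmpxchg(), not the RISC-V mapping itself,
which is the question being debated here.)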
>>> I think my patchset won't affect the guarantee for the above sequence. The
>>> current RISC-V implementation only gives RCsc ordering when the compared
>>> value matches at least once. So I prefer that the RISC-V cmpxchg be:
>>>
>>>
>>> - "0: lr.w %0, %2\n" \
>>> + "0: lr.w.rl %0, %2\n" \
>>> " bne %0, %z3, 1f\n" \
>>> " sc.w.rl %1, %z4, %2\n" \
>>> " bnez %1, 0b\n" \
>>> - " fence rw, rw\n" \
>>> "1:\n" \
>>> + " fence w, rw\n" \
>>>
>>> to give unconditional RCsc ordering for atomic_cmpxchg.
>>>
>>>>
>>>> Regards,
>>>> Boqun
>>>
>>>
>>>
>
>
>