linux-kernel - Re: [PATCH V2 0/3] riscv: atomic: Optimize AMO instructions usage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAJF2gTSmVa9iKzojvrrQghkzBZao4eQwKBTKQfSPFqSR667bTg@mail.gmail.com>
Date:   Sun, 24 Apr 2022 16:33:53 +0800
From:   Guo Ren <guoren@...nel.org>
To:     Andrea Parri <parri.andrea@...il.com>
Cc:     Boqun Feng <boqun.feng@...il.com>,
        Daniel Lustig <dlustig@...dia.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Arnd Bergmann <arnd@...db.de>,
        Palmer Dabbelt <palmer@...belt.com>,
        Mark Rutland <mark.rutland@....com>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        Guo Ren <guoren@...ux.alibaba.com>
Subject: Re: [PATCH V2 0/3] riscv: atomic: Optimize AMO instructions usage

On Tue, Apr 19, 2022 at 7:41 AM Andrea Parri <parri.andrea@...il.com> wrote:
>
> > > Seems to me that you are basically reverting 5ce6c1f3535f
> > > ("riscv/atomic: Strengthen implementations with fences"). That commit
> > > fixed an memory ordering issue, could you explain why the issue no
> > > longer needs a fix?
> >
> > I'm not reverting the prior patch, just optimizing it.
> >
> > In RISC-V “A” Standard Extension for Atomic Instructions spec, it said:
>
> With reference to the RISC-V herd specification at:
>
>   https://github.com/riscv/riscv-isa-manual.git
>
> the issue, better, lr-sc-aqrl-pair-vs-full-barrier seems to _no longer_
> need a fix since commit:
                        "0:     lr.w %0, %2\n"                          \
                        "       bne  %0, %z3, 1f\n"                     \
                        "       sc.w.rl %1, %z4, %2\n"                  \
                        "       bnez %1, 0b\n"                          \
                        "       fence rw, rw\n"                         \
Above is the current implementation, and the logic is in conflict. If
we want full-barrier, we should implement like below:
                        fence rw, w
                        "0:     lr.w %0, %2\n"                          \
                        "       bne  %0, %z3, 1f\n"                     \
                        "       sc.w %1, %z4, %2\n"                  \
                        "       bnez %1, 0b\n"                          \
                        "       fence rw, rw\n"                         \
Above we could let lr.w & sc.w executed fastest. If we think .aq/.rl
won't affect forward guarantee, we should implement like below:
                        "0:     lr.w %0, %2\n"                          \
                        "       bne  %0, %z3, 1f\n"                     \
                        "       sc.w.aqrl %1, %z4, %2\n"                  \
                        "       bnez %1, 0b\n"                          \

Using .aqrl is better than sc.w.rl + fence rw, rw, because lr/sc.rl
pair forward guarantee is the same with lr/sw.aqrl and only sc.rl part
would affect the speed of lr/sc speed. Second, it could reduce one
fence rw, rw overhead. So for riscv, we needn't put a full-barrier
after sc like arm64 and use .aqrl instead.

>
>   03a5e722fc0f ("Updates to the memory consistency model spec")
>
> (here a template, to double check:
>
>   https://github.com/litmus-tests/litmus-tests-riscv/blob/master/tests/non-mixed-size/HAND/LR-SC-NOT-FENCE.litmus )
>
> I defer to Daniel/others for a "bi-section" of the prose specification.
> ;-)
>
>   Andrea



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/