linux-kernel - Re: [PATCH V2 3/3] riscv: xchg: Prefetch the destination word for sc.w

Message-ID: <ZZa96DEQhYnnGi51@LeoBras>
Date: Thu,  4 Jan 2024 11:17:12 -0300
From: Leonardo Bras <leobras@...hat.com>
To: Guo Ren <guoren@...nel.org>
Cc: Leonardo Bras <leobras@...hat.com>,
	Andrew Jones <ajones@...tanamicro.com>,
	paul.walmsley@...ive.com,
	palmer@...belt.com,
	panqinglin2020@...as.ac.cn,
	bjorn@...osinc.com,
	conor.dooley@...rochip.com,
	peterz@...radead.org,
	keescook@...omium.org,
	wuwei2016@...as.ac.cn,
	xiaoguang.xing@...hgo.com,
	chao.wei@...hgo.com,
	unicorn_wang@...look.com,
	uwu@...nowy.me,
	jszhang@...nel.org,
	wefu@...hat.com,
	atishp@...shpatra.org,
	linux-riscv@...ts.infradead.org,
	linux-kernel@...r.kernel.org,
	Guo Ren <guoren@...ux.alibaba.com>
Subject: Re: [PATCH V2 3/3] riscv: xchg: Prefetch the destination word for sc.w

On Thu, Jan 04, 2024 at 04:14:27PM +0800, Guo Ren wrote:
> On Thu, Jan 4, 2024 at 11:56 AM Leonardo Bras <leobras@...hat.com> wrote:
> >
> > On Thu, Jan 04, 2024 at 09:24:40AM +0800, Guo Ren wrote:
> > > On Thu, Jan 4, 2024 at 3:45 AM Leonardo Bras <leobras@...hat.com> wrote:
> > > >
> > > > On Wed, Jan 03, 2024 at 02:15:45PM +0800, Guo Ren wrote:
> > > > > On Tue, Jan 2, 2024 at 7:19 PM Andrew Jones <ajones@...tanamicro.com> wrote:
> > > > > >
> > > > > > On Sun, Dec 31, 2023 at 03:29:53AM -0500, guoren@...nel.org wrote:
> > > > > > > From: Guo Ren <guoren@...ux.alibaba.com>
> > > > > > >
> > > > > > > The cost of changing a cacheline from shared to exclusive state can be
> > > > > > > significant, especially when this is triggered by an exclusive store,
> > > > > > > since it may result in having to retry the transaction.
> > > > > > >
> > > > > > > This patch makes use of prefetch.w to prefetch cachelines for write
> > > > > > > prior to lr/sc loops when using the xchg_small atomic routine.
> > > > > > >
> > > > > > > This patch is inspired by commit: 0ea366f5e1b6 ("arm64: atomics:
> > > > > > > prefetch the destination word for write prior to stxr").
> > > > > > >
> > > > > > > Signed-off-by: Guo Ren <guoren@...ux.alibaba.com>
> > > > > > > Signed-off-by: Guo Ren <guoren@...nel.org>
> > > > > > > ---
> > > > > > >  arch/riscv/include/asm/cmpxchg.h | 4 +++-
> > > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > > > > > index 26cea2395aae..d7b9d7951f08 100644
> > > > > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > > > > @@ -10,6 +10,7 @@
> > > > > > >
> > > > > > >  #include <asm/barrier.h>
> > > > > > >  #include <asm/fence.h>
> > > > > > > +#include <asm/processor.h>
> > > > > > >
> > > > > > >  #define __arch_xchg_masked(prepend, append, r, p, n)                 \
> > > > > >
> > > > > > Are you sure this is based on v6.7-rc7? Because I don't see this macro.
> > > > > Oh, it is based on Leobras' patches. I would remove it in the next of version.
> > > >
> > > > I would say this next :)
> > > Thx for the grammar correction.
> >
> > Oh, I was not intending to correct grammar.
> > I just meant the next thing I would mention is that it was based on top of
> > my patchset instead of v6.7-rc7:
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >  ({                                                                   \
> > > > > > > @@ -23,6 +24,7 @@
> > > > > > >                                                                       \
> > > > > > >       __asm__ __volatile__ (                                          \
> > > > > > >              prepend                                                  \
> > > > > > > +            PREFETCHW_ASM(%5)                                        \
> > > > > > >              "0:      lr.w %0, %2\n"                                  \
> > > > > > >              "        and  %1, %0, %z4\n"                             \
> > > > > > >              "        or   %1, %1, %z3\n"                             \
> > > > > > > @@ -30,7 +32,7 @@
> > > > > > >              "        bnez %1, 0b\n"                                  \
> > > > > > >              append                                                   \
> > > > > > >              : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))       \
> > > > > > > -            : "rJ" (__newx), "rJ" (~__mask)                          \
> > > > > > > +            : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b)         \
> > > > > >
> > > > > > I'm pretty sure we don't want to allow the J constraint for __ptr32b.
> > > > > >
> > > > > > >              : "memory");                                             \
> > > > > > >                                                                       \
> > > > > > >       r = (__typeof__(*(p)))((__retx & __mask) >> __s);               \
> > > > > > > --
> > > > > > > 2.40.1
> > > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > drew
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards
> > > > >  Guo Ren
> > > > >
> > > >
> > > > Nice patch :)
> > > > Any reason it's not needed in __arch_cmpxchg_masked(), and __arch_cmpxchg() ?
> > > CAS is a conditional AMO, unlike xchg (Stand AMO). Arm64 is wrong, or
> > > they have a problem with the hardware.
> >
> > Sorry, I was unable to fully understand the reason here.
> >
> > You suggest that the PREFETCH.W was inserted on xchg_masked because it will
> > always switch the variable (no compare, blind CAS), but not on cmpxchg.
> >
> > Is this because cmpxchg will depend on a compare, and thus it does not
> > guarantee a write? So it would be unwise to always prefetch cacheline
> Yes, it has a comparison, so a store may not exist there.
> 
> > exclusiveness for this cpu, where shared state would be enough.
> > Is that correct?
> Yes, exclusiveness would invalidate other harts' cache lines.

I see.

I recall a previous computer architecture discussion which stated that any
LR would need to acquire the cacheline in exclusive state for lr/sc to work,
but I went through the RISC-V lr/sc documentation and could not find any
information about its cacheline behavior.

If that holds, the PREFETCH.W could be useful before every lr, right?
(Maybe that's the case for arm64 that you mentioned before.)
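
To make the distinction discussed above concrete, here is a minimal userspace
sketch, not the kernel macros. It assumes GCC/Clang builtins; the function
names xchg_with_prefetchw() and cmpxchg_no_prefetch() are hypothetical, and
whether __builtin_prefetch() turns into a real prefetch-for-write instruction
depends on the target and compiler flags:

```c
#include <stdint.h>

/*
 * Unconditional exchange: a store always happens, so hinting that the
 * line will be written cannot be wasted work.
 * (Hypothetical helper, for illustration only.)
 */
uint32_t xchg_with_prefetchw(uint32_t *p, uint32_t newval)
{
	__builtin_prefetch(p, 1);	/* second argument 1 = prefetch for write */
	return __atomic_exchange_n(p, newval, __ATOMIC_SEQ_CST);
}

/*
 * Compare-and-swap: the store only happens when *p == old, so this
 * sketch issues no write prefetch and leaves the line state to the
 * hardware. Returns the value observed in *p before the operation.
 * (Hypothetical helper, for illustration only.)
 */
uint32_t cmpxchg_no_prefetch(uint32_t *p, uint32_t old, uint32_t newval)
{
	/* On failure the builtin writes the current value of *p into 'old'. */
	__atomic_compare_exchange_n(p, &old, newval, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}
```

If an LR does need the line in exclusive state anyway, the same hint would
simply move in front of the compare-and-swap as well.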

Thanks!
Leo

> 
> >
> > Thanks!
> > Leo
> >
> >
> > >
> > > >
> > > > Thanks!
> > > > Leo
> > > >
> > >
> > >
> > > --
> > > Best Regards
> > >  Guo Ren
> > >
> >
> 
> 
> -- 
> Best Regards
>  Guo Ren
>