Message-ID: <aKRTRyQAaWFtRvDv@arm.com>
Date: Tue, 19 Aug 2025 11:34:47 +0100
From: Catalin Marinas <catalin.marinas@....com>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org, bpf@...r.kernel.org,
	arnd@...db.de, will@...nel.org, peterz@...radead.org,
	akpm@...ux-foundation.org, mark.rutland@....com,
	harisokn@...zon.com, cl@...two.org, ast@...nel.org,
	memxor@...il.com, zhenglifeng1@...wei.com,
	xueshuai@...ux.alibaba.com, joao.m.martins@...cle.com,
	boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
	rafael@...nel.org, daniel.lezcano@...aro.org
Subject: Re: [PATCH v3 1/5] asm-generic: barrier: Add
 smp_cond_load_relaxed_timewait()

On Mon, Aug 18, 2025 at 12:15:29PM -0700, Ankur Arora wrote:
> Catalin Marinas <catalin.marinas@....com> writes:
> > On Sun, Aug 17, 2025 at 03:14:26PM -0700, Ankur Arora wrote:
> >> __cmpwait_relaxed() will need adjustment to set a deadline for WFET.
> >
> > Yeah, __cmpwait_relaxed() doesn't use WFET as it doesn't need a timeout
> > (it just happens to have one with the event stream).
> >
> > We could extend this or create a new one that uses WFET and takes an
> > argument. If extending this one, for example a timeout argument of 0
> > means WFE, non-zero means WFET cycles. This adds a couple more
> > instructions.
> 
> Though then we would need an ALTERNATIVE for WFET to fall back to WFE
> where not available. This is a minor point, but how about just always
> using WFE or WFET appropriately instead of choosing between the two
> based on etime?
> 
>   static inline void __cmpwait_case_##sz(volatile void *ptr,              \
>                                   unsigned long val,                      \
>                                   unsigned long etime)                    \
>   {                                                                       \
>           unsigned long tmp;                                              \
>                                                                           \
>           const unsigned long ecycles = xloops_to_cycles(nsecs_to_xloops(etime)); \
>           asm volatile(                                                   \
>           "       sevl\n"                                                 \
>           "       wfe\n"                                                  \
>           "       ldxr" #sfx "\t%" #w "[tmp], %[v]\n"                     \
>           "       eor     %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n"     \
>           "       cbnz    %" #w "[tmp], 1f\n"                             \
>           ALTERNATIVE("wfe\n",                                            \
>                   "msr s0_3_c1_c0_0, %[ecycles]\n", /* WFET */            \
>                   ARM64_HAS_WFXT)                                         \
>           "1:"                                                            \
>           : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr)                   \
>           : [val] "r" (val), [ecycles] "r" (ecycles));                    \
>   }
> 
> This would cause us to compute the end time unnecessarily for WFE but,
> given that nothing will use the output of that computation, wouldn't
> WFE be able to execute before the result of that computation is available?
> (Though I guess WFE is somewhat special, so the usual rules might not
> apply.)

The compiler cannot tell what's happening inside the asm block, so it
will compute ecycles and place it in a register before the asm. The
hardware won't do anything smarter either, such as skipping the
computation because the register holding ecycles is never read (or is
overwritten later). So I wouldn't want to penalise the existing
smp_cond_load_acquire() which only needs a WFE.
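
As a rough illustration of that point (a sketch only, not the kernel
code): in the portable C below, ns_to_cycles_sketch() is a hypothetical
stand-in for xloops_to_cycles(nsecs_to_xloops(etime)) with a made-up
conversion factor, and the empty asm template stands in for a WFE path
that never reads its operand. The call counter exists only so the
effect can be observed; the "r" constraint alone is what obliges the
compiler to have the value in a register before the asm.

```c
#include <stdint.h>

/* Counts how often the (hypothetical) deadline conversion runs. */
static unsigned long deadline_computations;

/* Hypothetical stand-in for xloops_to_cycles(nsecs_to_xloops(etime)). */
static unsigned long ns_to_cycles_sketch(unsigned long ns)
{
	deadline_computations++;
	return ns * 3;	/* made-up factor, for illustration only */
}

static inline void wait_sketch(unsigned long etime)
{
	const unsigned long ecycles = ns_to_cycles_sketch(etime);

	/*
	 * Empty template: models a WFE path that ignores ecycles. The
	 * compiler cannot see that, so the "r" input operand still has
	 * to be computed and placed in a register first.
	 */
	__asm__ __volatile__("" : : "r" (ecycles) : "memory");
}
```

Every call to wait_sketch() pays for the conversion, whether or not the
instruction inside the asm consumes the result.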

We could patch WFET in and always pass -1UL in the non-timeout case but
I think we are better off just duplicating the whole thing. It's going
to be inlined anyway, so it's not like we end up with lots of these
functions.
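
A minimal sketch of what the duplication could look like, assuming
hypothetical names and modelling the wait as bounded polling in portable
C11 rather than LDXR plus WFE/WFET. EVENT_STREAM_POLLS_SKETCH is a
made-up stand-in for the wake-up the event stream happens to provide,
and the "polls" argument is a made-up stand-in for a WFET deadline.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the periodic event-stream wake-up. */
#define EVENT_STREAM_POLLS_SKETCH 1000UL

/* Untimed variant: no deadline argument, so nothing is computed for it. */
static inline void cmpwait_sketch(_Atomic uint32_t *ptr, uint32_t val)
{
	for (unsigned long i = 0; i < EVENT_STREAM_POLLS_SKETCH; i++)
		if (atomic_load_explicit(ptr, memory_order_relaxed) != val)
			return;
}

/* Timed variant: a separate copy that takes the caller's bound. */
static inline void cmpwait_timeout_sketch(_Atomic uint32_t *ptr,
					  uint32_t val, unsigned long polls)
{
	for (unsigned long i = 0; i < polls; i++)
		if (atomic_load_explicit(ptr, memory_order_relaxed) != val)
			return;
}
```

Both variants return after a bounded wait and leave it to the caller to
re-check the condition; only callers of the timed copy carry a deadline.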

-- 
Catalin
