Message-ID: <87cy47t0ua.fsf@oracle.com>
Date: Sun, 21 Dec 2025 23:47:09 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-pm@...r.kernel.org,
bpf@...r.kernel.org, arnd@...db.de, catalin.marinas@....com,
will@...nel.org, peterz@...radead.org, akpm@...ux-foundation.org,
mark.rutland@....com, harisokn@...zon.com, cl@...two.org,
ast@...nel.org, rafael@...nel.org, daniel.lezcano@...aro.org,
memxor@...il.com, zhenglifeng1@...wei.com, xueshuai@...ux.alibaba.com,
joao.m.martins@...cle.com, boris.ostrovsky@...cle.com,
konrad.wilk@...cle.com,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: [PATCH v8 01/12] asm-generic: barrier: Add
smp_cond_load_relaxed_timeout()

Ankur Arora <ankur.a.arora@...cle.com> writes:

> Add smp_cond_load_relaxed_timeout(), which extends
> smp_cond_load_relaxed() to allow waiting for a duration.
>
> We loop around waiting for the condition variable to change while
> periodically doing a time-check. The loop uses cpu_poll_relax()
> to slow down the busy-waiting, which, unless overridden by the
> architecture code, amounts to a cpu_relax().
>
> Note that there are two ways for the time-check to fail: either we
> have timed out, or time_expr_ns returns an invalid value (negative
> or zero).
>
> The number of times we spin before doing the time-check is specified
> by SMP_TIMEOUT_POLL_COUNT (chosen to be 200 by default) which,
> assuming each cpu_poll_relax() iteration takes ~20-30 cycles (measured
> on a variety of x86 platforms), amounts to ~4000-6000 cycles between
> time-checks.
>
> That is also the outer limit of the overshoot when working with the
> parameters above. This might be higher or lower depending on the
> implementation of cpu_poll_relax() across architectures.
>
> Cc: Arnd Bergmann <arnd@...db.de>
> Cc: Will Deacon <will@...nel.org>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: linux-arch@...r.kernel.org
> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> ---
>
> Notes:
>
> - the interface now breaks the time_check_expr into two parts:
> time_expr_ns (evaluates to current time) and remaining_ns. The main
> reason for doing this was to support WFET and similar primitives
> which can do timed waiting.
>
> - cpu_poll_relax() now takes an additional parameter to handle that.
>
> - time_expr_ns can now return failure, which needs a little more
> reorganization. This was needed because the rqspinlock check_timeout()
> logic mapped naturally onto the unified check in time_check_expr.
> Breaking up time_check_expr, however, requires check_timeout() to be
> split into a clock interface (which can fail on deadlock or on its
> internal timeout check) and the timeout duration.
>
> - given the changes in logic, I've removed Catalin's and Haris'
> Reviewed-by and Tested-by tags.
>
> include/asm-generic/barrier.h | 58 +++++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index d4f581c1e21d..e25592f9fcbf 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -273,6 +273,64 @@ do { \
> })
> #endif
>
> +/*
> + * Number of times we iterate in the loop before doing the time check.
> + */
> +#ifndef SMP_TIMEOUT_POLL_COUNT
> +#define SMP_TIMEOUT_POLL_COUNT 200
> +#endif
> +
> +#ifndef cpu_poll_relax
> +#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
> +#endif
> +
> +/**
> + * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
> + * guarantees until a timeout expires.
> + * @ptr: pointer to the variable to wait on
> + * @cond: boolean expression to wait for
> + * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
> + * on failure, returns a negative value.
> + * @timeout_ns: timeout value in ns
> + * (Both of the above are assumed to be compatible with s64.)
> + *
> + * Equivalent to using READ_ONCE() on the condition variable.
> + */
> +#ifndef smp_cond_load_relaxed_timeout
> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
> + time_expr_ns, timeout_ns) \
> +({ \
> + __label__ __out, __done; \
> + typeof(ptr) __PTR = (ptr); \
> + __unqual_scalar_typeof(*ptr) VAL; \
> + u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
> + s64 __time_now = (s64)(time_expr_ns); \
> + s64 __timeout = (s64)timeout_ns; \
> + s64 __time_end = __time_now + __timeout; \
> + \
> + if (__time_now <= 0) \
> + goto __out; \
> + \
> + for (;;) { \
> + VAL = READ_ONCE(*__PTR); \
> + if (cond_expr) \
> + goto __done; \
> + cpu_poll_relax(__PTR, VAL, __timeout); \
> + if (++__n < __spin) \
> + continue; \
> + __time_now = (s64)(time_expr_ns); \
> + __timeout = __time_end - __time_now; \
> + if (__time_now <= 0 || __timeout <= 0) \
> + goto __out; \
> + __n = 0; \
> + } \
> +__out: \
> + VAL = READ_ONCE(*__PTR); \
> +__done: \
> + (typeof(*ptr))VAL; \
> +})
> +#endif
> +
> /*
> * pmem_wmb() ensures that all stores for which the modification
> * are written to persistent storage by preceding instructions have

Alexei Starovoitov pointed out in his review comment on patch 10 that
evaluating time_expr_ns on entry means that this is unusable in the
fastpath of something like lock acquisition (as rqspinlock does).

Something like the following should work better (it also gets rid of
the gotos). The only negative is that in the slowpath this would have
a larger overshoot, but I think that's a reasonable tradeoff.

I'll be sending out the next version with this change, but just wanted
to gauge any preliminary opinions on it first.

Thanks
Ankur

---
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index d4f581c1e21d..278eb51cf354 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -273,6 +273,73 @@ do { \
})
#endif
+/*
+ * Number of times we iterate in the loop before doing the time check.
+ */
+#ifndef SMP_TIMEOUT_POLL_COUNT
+#define SMP_TIMEOUT_POLL_COUNT 200
+#endif
+
+#ifndef cpu_poll_relax
+#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
+#endif
+
+/**
+ * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
+ * guarantees until a timeout expires.
+ * @ptr: pointer to the variable to wait on
+ * @cond: boolean expression to wait for
+ * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
+ * on failure, returns a negative value.
+ * @timeout_ns: timeout value in ns
+ * (Both of the above are assumed to be compatible with s64.)
+ *
+ * Equivalent to using READ_ONCE() on the condition variable.
+ *
+ * Note on timeout overshoot: in the fastpath, we want to keep this as
+ * close in performance to smp_cond_load_relaxed() as possible. To do
+ * that we defer the evaluation of the potentially expensive
+ * @time_expr_ns until it is first needed.
+ *
+ * This means that there will always be some hardware-dependent duration
+ * that has already passed inside cpu_poll_relax() by the time of the
+ * first evaluation. In addition, cpu_poll_relax() is not guaranteed to
+ * return at the timeout boundary.
+ *
+ * In sum, expect timeout overshoot when we return due to expiration of
+ * the timeout.
+ */
+#ifndef smp_cond_load_relaxed_timeout
+#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
+ time_expr_ns, timeout_ns) \
+({ \
+ typeof(ptr) __PTR = (ptr); \
+ __unqual_scalar_typeof(*ptr) VAL; \
+ u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
+ s64 __timeout = (s64)timeout_ns; \
+ s64 __time_now, __time_end = 0; \
+ \
+ for (;;) { \
+ VAL = READ_ONCE(*__PTR); \
+ if (cond_expr) \
+ break; \
+ cpu_poll_relax(__PTR, VAL, __timeout); \
+ if (++__n < __spin) \
+ continue; \
+ __time_now = (s64)(time_expr_ns); \
+ if (unlikely(__time_end == 0)) \
+ __time_end = __time_now + __timeout; \
+ __timeout = __time_end - __time_now; \
+ if (__time_now <= 0 || __timeout <= 0) { \
+ VAL = READ_ONCE(*__PTR); \
+ break; \
+ } \
+ __n = 0; \
+ } \
+ (typeof(*ptr))VAL; \
+})
+#endif
+
/*
* pmem_wmb() ensures that all stores for which the modification
* are written to persistent storage by preceding instructions have
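
For illustration (not part of the patch), here is a caller-side sketch of
how the interface might be used. The flag, the choice of
ktime_get_mono_fast_ns() as the clock expression, and the error handling
are assumptions made up for the example:

	/*
	 * Illustrative only: wait up to 1ms for a flag to become non-zero,
	 * using ktime_get_mono_fast_ns() as the clock expression. The macro
	 * returns the last value observed via READ_ONCE(), so the caller
	 * re-checks it to distinguish success from timeout (or a failed
	 * clock expression).
	 */
	static int wait_for_flag(u32 *flag)
	{
		u32 val;

		val = smp_cond_load_relaxed_timeout(flag, VAL != 0,
						    ktime_get_mono_fast_ns(),
						    NSEC_PER_MSEC);

		return val ? 0 : -ETIMEDOUT;
	}

And since cpu_poll_relax() now receives the (remaining) timeout, an
architecture override could consume that hint. A purely hypothetical
sketch, where arch_wait_for_update_until() is a made-up stand-in for a
WFET-style timed wait on *ptr:

	#define cpu_poll_relax(ptr, val, timeout_ns)			\
		arch_wait_for_update_until((ptr), (val), (timeout_ns))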