linux-kernel - Re: [RESEND PATCH v7 2/7] arm64: barrier: Support smp_cond_load_relaxed_timeout()

Message-ID: <87ikfqesr2.fsf@oracle.com>
Date: Mon, 03 Nov 2025 13:00:33 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Arnd Bergmann <arnd@...db.de>
Cc: Catalin Marinas <catalin.marinas@....com>,
        Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
        Linux-Arch <linux-arch@...r.kernel.org>,
        linux-arm-kernel@...ts.infradead.org, linux-pm@...r.kernel.org,
        bpf@...r.kernel.org, Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mark Rutland <mark.rutland@....com>,
        Haris Okanovic <harisokn@...zon.com>,
        "Christoph Lameter (Ampere)" <cl@...two.org>,
        Alexei Starovoitov <ast@...nel.org>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>, zhenglifeng1@...wei.com,
        xueshuai@...ux.alibaba.com, Joao Martins <joao.m.martins@...cle.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Subject: Re: [RESEND PATCH v7 2/7] arm64: barrier: Support smp_cond_load_relaxed_timeout()


Arnd Bergmann <arnd@...db.de> writes:

> On Tue, Oct 28, 2025, at 22:17, Catalin Marinas wrote:
>> On Tue, Oct 28, 2025 at 11:01:22AM -0700, Ankur Arora wrote:
>>> Arnd Bergmann <arnd@...db.de> writes:
>>> > On Tue, Oct 28, 2025, at 06:31, Ankur Arora wrote:
>>> >> +
>>> >
>>> > Since the caller knows exactly how long it wants to wait for,
>>> > we should be able to fit a 'wfet' based primitive in here and
>>> > pass the timeout as another argument.
>>>
>>> Per se, I don't disagree with this when it comes to WFET.
>>>
>>> Handling a timeout, however, is messier when we use other mechanisms.
>>>
>>> Some problems that came up in my earlier discussions with Catalin:
>>>
>>>   - when using WFE, we also need some notion of slack
>>>     - and if a caller specifies only a small or no slack, then we need
>>>       to combine WFE+cpu_relax()
>
> I don't see the difference to what you have: with the event stream,
> you implicitly define a slack to be the programmed event stream rate
> of ~100µs.

True. The thinking was that adding an explicit timeout just raises the
question of how closely the interface adheres to the timeout, and I guess
the final interface tried to sidestep all of that.

> I'm not asking for anything better in this case, only for machines
> with WFET but no event stream to also avoid the spin loop.

That makes sense. It's a good point that the WFET+event-stream-off case
would just end up using the spin loop, which is quite suboptimal.

>>>   - for platforms that only use a polling primitive, we want to check
>>>     the clock only intermittently for power reasons.
>
> Right, I missed that bit.
>
>>>     Now, this could be done with an architecture specific spin-count.
>>>     However, if the caller specifies a small slack, then we might need
>>>     to check the clock more often as we get closer to the deadline, etc.
>
> Again, I think this is solved by defining the slack as architecture
> specific as well rather than an explicit argument, which is essentially
> what we already have.

Great. I think that means I can keep more or less the same interface
with an explicit time_end, which allows WFET to do the right thing,
while WFE can have an architecture-specific slack (the event-stream period).

>>> A smaller problem was that different users want different clocks, so
>>> folding the timeout into a 'timeout_cond_expr' lets us avoid having the
>>> interface handle any of that.
>>>
>>> I had earlier versions [v2] [v3] which had rather elaborate policies for
>>> handling timeout, slack etc. But, given that the current users of the
>>> interface don't actually care about precision, all of that seemed
>>> a little overengineered.
>>
>> Indeed, we've been through all these options and without a concrete user
>> that needs a more precise timeout, we decided it's not worth it. It can,
>> however, be improved later if such users appear.
>
> The main worry I have is that if we get too many users of cpu_poll_relax()
> hardcoding the use of the event stream without a timeout argument, it
> becomes too hard to change later without introducing regressions
> from the behavior change.

True.

> As far as I can tell, the only place that currently uses the
> event stream on a functional level is the delay() loop, and that
> has a working wfet based version.

Will send out the next version with an interface along the following lines:

    /**
     * smp_cond_load_relaxed_timeout() - (Spin) wait for @cond_expr with no
     * ordering guarantees until a timeout expires.
     * @ptr: pointer to the variable to wait on
     * @cond_expr: boolean expression to wait for
     * @time_expr: time expression in the caller's preferred clock, in
     * nanoseconds
     * @time_end_ns: end time in nanoseconds (compared against @time_expr;
     * might also be used for setting up a future event.)
     *
     * Equivalent to using READ_ONCE() on the condition variable.
     *
     * Note that the expiration of the timeout might have an
     * architecture-specific delay.
     *
     * Return: the last value read from @ptr; callers re-check @cond_expr
     * to tell a satisfied condition from a timeout.
     */
    #ifndef smp_cond_load_relaxed_timeout
    #define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, time_end_ns)	\
    ({									\
            typeof(ptr) __PTR = (ptr);					\
            __unqual_scalar_typeof(*ptr) VAL;				\
            u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT;		\
            u64 __time_end_ns = (time_end_ns);				\
                                                                        \
            for (;;) {							\
                    VAL = READ_ONCE(*__PTR);				\
                    if (cond_expr)					\
                            break;					\
                    cpu_poll_relax(__PTR, VAL, __time_end_ns);		\
                    if (++__n < __spin)				\
                            continue;					\
                    if ((time_expr) >= __time_end_ns) {		\
                            VAL = READ_ONCE(*__PTR);			\
                            break;					\
                    }							\
                    __n = 0;						\
            }								\
            (typeof(*ptr))VAL;						\
    })
    #endif

That allows for a __cmpwait_timeout(), as you had outlined, similar to
these two patches:

 https://lore.kernel.org/lkml/20241107190818.522639-15-ankur.a.arora@oracle.com/
 https://lore.kernel.org/lkml/20241107190818.522639-16-ankur.a.arora@oracle.com/
 (this one incorporating some changes that Catalin had suggested:
  https://lore.kernel.org/lkml/aKRTRyQAaWFtRvDv@arm.com/)

--
ankur