Message-ID: <87a512eqvj.fsf@oracle.com>
Date: Mon, 03 Nov 2025 13:41:04 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Arnd Bergmann <arnd@...db.de>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
        Linux-Arch <linux-arch@...r.kernel.org>,
        linux-arm-kernel@...ts.infradead.org, linux-pm@...r.kernel.org,
        bpf@...r.kernel.org, Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mark Rutland <mark.rutland@....com>,
        Haris Okanovic <harisokn@...zon.com>,
        "Christoph Lameter (Ampere)" <cl@...two.org>,
        Alexei Starovoitov <ast@...nel.org>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>, zhenglifeng1@...wei.com,
        xueshuai@...ux.alibaba.com, Joao Martins <joao.m.martins@...cle.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Subject: Re: [RESEND PATCH v7 1/7] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()


Arnd Bergmann <arnd@...db.de> writes:

> On Wed, Oct 29, 2025, at 04:17, Ankur Arora wrote:
>> Arnd Bergmann <arnd@...db.de> writes:
>>> On Tue, Oct 28, 2025, at 06:31, Ankur Arora wrote:
>>> The FEAT_WFXT version would then look something like
>>>
>>> static inline void __cmpwait_u64_timeout(volatile u64 *ptr, unsigned long val, __u64 ns)
>>> {
>>>    unsigned long tmp;
>>>    asm volatile ("sev; wfe; ldxr; eor; cbnz; wfet; 1:"
>>>         : "=&r" (tmp), "+Q" (*ptr)
>>>         : "r" (val), "r" (ns));
>>> }
>>> #define cpu_poll_relax_timeout_wfet(__PTR, VAL, TIMECHECK)	\
>>> ({                                                    		\
>>>        u64 __t = TIMECHECK;                           		\
>>>        if (__t)                                       		\
>>>             __cmpwait_u64_timeout(__PTR, VAL, __t);   		\
>>> })
>>>
>>> while the 'wfe' version would continue to do the timecheck after the
>>> wait.
>>
>> I think this is a good way to do it if we need the precision
>> at some point in the future.
>
> I'm sorry I wrote this in a confusing way: I really don't
> care about precision here. For power-savings reasons I
> would like to use a wfe/wfet based wait here (like you do),
> while at the same time being able to turn off the event
> stream entirely as long as FEAT_WFXT is available, saving
> additional power.
>
>>> I have two lesser concerns with the generic definition here:
>>>
>>> - having both a timeout and a spin counter in the same loop
>>>   feels redundant and error-prone, as the behavior in practice
>>>   would likely depend a lot on the platform. What is the reason
>>>   for keeping the counter if we already have a fixed timeout
>>>   condition?
>>
>> The main reason was that the time check is expensive in power terms.
>> Which is fine for platforms with a WFE like primitive but others
>> want to do the time check only infrequently. That's why poll_idle()
>> introduced a rate limit on polling (which the generic definition
>> reused here.)
>
> Right, I misunderstood how this works, so this part is fine.
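
(For reference, a rough user-space model of that rate-limited time check.
All names and the granularity constant here are illustrative, not the
kernel's, and a simple counter stands in for the expensive clock read:)

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model of the polling rate limit discussed above: the
 * (notionally expensive) time check only runs once every
 * POLL_GRANULARITY spins, so most iterations are just a cheap load
 * plus cpu_relax().
 */
#define POLL_GRANULARITY 200

static inline void cpu_relax(void) { /* arch pause/yield hint elided */ }

/* Returns true if *ptr became nonzero before the deadline. */
static bool poll_until(volatile uint64_t *ptr, uint64_t *now,
		       uint64_t deadline)
{
	unsigned int spins = 0;

	for (;;) {
		if (*ptr)
			return true;
		cpu_relax();
		if (++spins < POLL_GRANULARITY)
			continue;
		spins = 0;
		(*now)++;	/* stand-in for the expensive clock read */
		if (*now >= deadline)
			return false;
	}
}
```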
>
>>> - I generally dislike the type-agnostic macros like this one,
>>>   it adds a lot of extra complexity here that I feel can be
>>>   completely avoided if we make explicitly 32-bit and 64-bit
>>>   wide versions of these macros. We probably won't be able
>>>   to resolve this as part of your series, but ideally I'd like
>>>   have explicitly-typed versions of cmpxchg(), smp_load_acquire()
>>>   and all the related ones, the same way we do for atomic_*()
>>>   and atomic64_*().
>>
>> Ah. And the caller uses say smp_load_acquire_long() or whatever, and
>> that resolves to whatever makes sense for the arch.
>>
>> The __unqual_scalar_typeof() does look pretty ugly in the
>> preprocessed version, but other than that smp_cond_load() etc. look
>> pretty straightforward. Just for my curiosity, could you elaborate on
>> the complexity?
>
> The nested macros with __unqual_scalar_typeof() make the
> preprocessed version completely unreadable, especially when
> combined with other sets of complex macros like min()/max(),
> tracepoints or arm64 atomics.
>
> If something goes wrong inside of it, it becomes rather
> hard for people to debug or even read the compiler warnings.
> Since I do a lot of build testing, I do tend to run into those
> more than others do.
>
> I've also seen cases where excessively complex macros get
> nested to the point of slowing down the kernel build, which
> can happen once the nesting expands to megabytes of source
> code.

Thanks for the detail. Yeah, looking at the preprocessed code (as I had
to do when working on this), it all really does feel quite excessive,
especially when combined with other complex macro constructs.

--
ankur
