lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <84914748-4bf9-44a5-9e08-80528ca27177@app.fastmail.com>
Date: Sun, 02 Nov 2025 22:52:08 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "Ankur Arora" <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, Linux-Arch <linux-arch@...r.kernel.org>,
 linux-arm-kernel@...ts.infradead.org, linux-pm@...r.kernel.org,
 bpf@...r.kernel.org, "Catalin Marinas" <catalin.marinas@....com>,
 "Will Deacon" <will@...nel.org>, "Peter Zijlstra" <peterz@...radead.org>,
 "Andrew Morton" <akpm@...ux-foundation.org>,
 "Mark Rutland" <mark.rutland@....com>,
 "Haris Okanovic" <harisokn@...zon.com>,
 "Christoph Lameter (Ampere)" <cl@...two.org>,
 "Alexei Starovoitov" <ast@...nel.org>,
 "Rafael J . Wysocki" <rafael@...nel.org>,
 "Daniel Lezcano" <daniel.lezcano@...aro.org>,
 "Kumar Kartikeya Dwivedi" <memxor@...il.com>, zhenglifeng1@...wei.com,
 xueshuai@...ux.alibaba.com, "Joao Martins" <joao.m.martins@...cle.com>,
 "Boris Ostrovsky" <boris.ostrovsky@...cle.com>,
 "Konrad Rzeszutek Wilk" <konrad.wilk@...cle.com>
Subject: Re: [RESEND PATCH v7 1/7] asm-generic: barrier: Add
 smp_cond_load_relaxed_timeout()

On Wed, Oct 29, 2025, at 04:17, Ankur Arora wrote:
> Arnd Bergmann <arnd@...db.de> writes:
>> On Tue, Oct 28, 2025, at 06:31, Ankur Arora wrote:
>> The FEAT_WFXT version would then look something like
>>
>> static inline void __cmpwait_u64_timeout(volatile u64 *ptr, unsigned long val, __u64 ns)
>> {
>>    unsigned long tmp;
>>    asm volatile ("sev; wfe; ldxr; eor; cbnz; wfet; 1:"
>>         : "=&r" (tmp), "+Q" (*ptr)
>>         : "r" (val), "r" (ns));
>> }
>> #define cpu_poll_relax_timeout_wfet(__PTR, VAL, TIMECHECK) \
>> ({                                                    \
>>        u64 __t = TIMECHECK;
>>        if (__t)
>>             __cmpwait_u64_timeout(__PTR, VAL, __t);
>> })
>>
>> while the 'wfe' version would continue to do the timecheck after the
>> wait.
>
> I think this is a good way to do it if we need the precision
> at some point in the future.

I'm sorry I wrote this in a confusing way, but I really don't
care about precision here, but for power savings reasons I
would like to use a wfe/wfet based wait here (like you do)
while at the same time being able to turn off the event
stream entirely as long as FEAR_WFXT is available, saving
additional power.

>> I have two lesser concerns with the generic definition here:
>>
>> - having both a timeout and a spin counter in the same loop
>>   feels redundant and error-prone, as the behavior in practice
>>   would likely depend a lot on the platform. What is the reason
>>   for keeping the counter if we already have a fixed timeout
>>   condition?
>
> The main reason was that the time check is expensive in power terms.
> Which is fine for platforms with a WFE like primitive but others
> want to do the time check only infrequently. That's why poll_idle()
> introduced a rate limit on polling (which the generic definition
> reused here.)

Right, I misunderstood how this works, so this part is fine.

>> - I generally dislike the type-agnostic macros like this one,
>>   it adds a lot of extra complexity here that I feel can be
>>   completely avoided if we make explicitly 32-bit and 64-bit
>>   wide versions of these macros. We probably won't be able
>>   to resolve this as part of your series, but ideally I'd like
>>   have explicitly-typed versions of cmpxchg(), smp_load_acquire()
>>   and all the related ones, the same way we do for atomic_*()
>>   and atomic64_*().
>
> Ah. And the caller uses say smp_load_acquire_long() or whatever, and
> that resolves to whatever makes sense for the arch.
>
> The __unqual_scalar_typeof() does look pretty ugly when looking at the
> preprocesed version but other than that smp_cond_load() etc look
> pretty straight forward. Just for my curiousity could you elaborate on
> the complexity?

The nested macros with __unqual_scalar_typeof() make the
preprocessed version completely unreadable, especially when
combined with other sets of complex macros like min()/max(),
tracepoints or arm64 atomics.

If something goes wrong inside of it, it becomes rather
hard for people to debug or even read the compiler warnings.
Since I do a lot of build testing, I do tend to run into those
more than others do.

I've also seen cases where excessively complex macros get
nested to the point of slowing down the kernel build, which
can happen once the nesting expands to megabytes of source
code.

     Arnd

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ