lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <877cgba5xn.fsf@oracle.com>
Date: Thu, 02 May 2024 21:13:40 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-pm@...r.kernel.org,
        kvm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, catalin.marinas@....com, will@...nel.org,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
        hpa@...or.com, pbonzini@...hat.com, wanpengli@...cent.com,
        vkuznets@...hat.com, rafael@...nel.org, daniel.lezcano@...aro.org,
        peterz@...radead.org, arnd@...db.de, lenb@...nel.org,
        mark.rutland@....com, harisokn@...zon.com, joao.m.martins@...cle.com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH 1/9] cpuidle: rename ARCH_HAS_CPU_RELAX to
 ARCH_HAS_OPTIMIZED_POLL


Christoph Lameter (Ampere) <cl@...two.org> writes:

> On Tue, 30 Apr 2024, Ankur Arora wrote:
>
>> ARCH_HAS_CPU_RELAX is a bit of a misnomer since all architectures
>> define cpu_relax(). Not all, however, have a performant version, with
>> some only implementing it as a compiler barrier.
>>
>> In contexts that this config option is used, it is expected to provide
>> an architectural primitive that can be used as part of a polling
>> mechanism -- one that would be cheaper than spinning in a tight loop.
>
> The intend of cpu_relax() is not a polling mechanism. Initial AFAICT it was
> introduced on x86 as the REP NOP instruction. Aka as PAUSE. And it was part of a
> spin loop. So there was no connection to polling anything.

Agreed, cpu_relax() is just a mechanism to tell the pipeline that
we are in a spin-loop.

> The intend was to make the processor aware that we are in a spin loop. Various
> processors have different actions that they take upon encountering such a cpu
> relax operation.

Sure, though most processors don't have a nice mechanism to do that.
x86 clearly has the REP; NOP thing. arm64 only has a YIELD which from my
measurements is basically a NOP when executed on a system without
hardware threads.

And that's why only x86 defines ARCH_HAS_CPU_RELAX.

> The polling (WFE/WFI) available on ARM (and potentially other platforms) is a
> different mechanism that is actually intended to reduce the power requirement of
> the processor until a certain condition is met and that check is done in
> hardware.

Sure. Which almost exactly fits the bill for the poll-idle loop -- except for the
timeout part.

> These are not the same and I think we need both config options.

My main concern is that poll_idle() conflates polling in idle with
ARCH_HAS_CPU_RELAX, when they aren't really related.

So, poll_idle(), and its users should depend on ARCH_HAS_OPTIMIZED_POLL
which, if defined by some architecture, means that poll_idle() would
be better than a spin-wait loop.

Beyond that I'm okay to keep ARCH_HAS_CPU_RELAX around.

That said, do you see a use for ARCH_HAS_CPU_RELAX? The only current
user is the poll-idle path.

> The issues that you have with WFET later in the patchset arise from not making
> this distinction.

Did you mean the issue with WFE? I'm not using WFET in this patchset at all.

With WFE, sure there's a problem in that you depend on an interrupt or
the event-stream to get out of the wait. And, so sometimes you would
overshoot the target poll timeout.

> The polling (waiting for an event) could be implemented for a processor not
> supporting that in hardware by using a loop that checks for the condition and
> then does a cpu_relax().

Yeah. That's exactly what patch-6 does. smp_cond_load_relaxed() uses
cpu_relax() internally in its spin-loop variant (non arm64).

On arm64, this would use LDXR; WFE. Or are you suggesting implementing
the arm64 loop via cpu_relax() (and thus YIELD?)

Ankur

> With that you could f.e. support the existing cpu_relax() and also have some
> form of cpu_poll() interface.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ