[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87zfm9z812.fsf@oracle.com>
Date: Fri, 08 Nov 2024 14:15:53 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-pm@...r.kernel.org,
kvm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
catalin.marinas@....com, will@...nel.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, pbonzini@...hat.com,
vkuznets@...hat.com, rafael@...nel.org, daniel.lezcano@...aro.org,
peterz@...radead.org, arnd@...db.de, lenb@...nel.org,
mark.rutland@....com, harisokn@...zon.com, mtosatti@...hat.com,
sudeep.holla@....com, maz@...nel.org, misono.tomohiro@...itsu.com,
maobibo@...ngson.cn, zhenglifeng1@...wei.com,
joao.m.martins@...cle.com, boris.ostrovsky@...cle.com,
konrad.wilk@...cle.com
Subject: Re: [PATCH v9 01/15] asm-generic: add barrier
smp_cond_load_relaxed_timeout()
Christoph Lameter (Ampere) <cl@...two.org> writes:
> On Thu, 7 Nov 2024, Ankur Arora wrote:
>
>> > Calling the clock retrieval function repeatedly should be fine and is
>> > typically done in user space as well as in kernel space for functions that
>> > need to wait short time periods.
>>
>> The problem is that you might have multiple CPUs polling in idle
>> for prolonged periods of time. And, so you want to minimize
>> your power/thermal envelope.
>
> On ARM that maps to YIELD which does not do anything for the power
> envelope AFAICT. It switches to the other hyperthread.
Agreed. For arm64 patch-5 adds a specialized version.
For the fallback case when we don't have an event stream, the
arm64 version does use the same cpu_relax() loop but that's
not a production thing.
>> For instance see commit 4dc2375c1a4e "cpuidle: poll_state: Avoid
>> invoking local_clock() too often" which originally added a similar
>> rate limit to poll_idle() where they saw exactly that issue.
>
> Looping w/o calling local_clock may increase the wait period etc.
Yeah. I don't think that's a real problem for the poll_idle()
case as the only thing waiting on the other side of the possibly
delayed timer is a deeper idle state.
But, for any other potential users the looping duration might be
too long (the generated code for x86 will execute around 200 * 7
instructions before checking the timer, so a worst case delay of
say around 1-2us.)
I'll note that in the comment around smp_cond_time_check_count
just to warn any future users.
> For power saving most arches have special instructions like ARMS
> WFE/WFET. These are then causing more accurate wait times than the looping
> thing?
Definitely true for WFET. The WFE can still overshoot because the
eventstream has a period of 100us.
--
ankur
Powered by blists - more mailing lists