Message-ID: <87h69amjng.fsf@oracle.com>
Date: Thu, 17 Oct 2024 15:47:31 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: "Okanovic, Haris" <harisokn@...zon.com>,
"ankur.a.arora@...cle.com"
<ankur.a.arora@...cle.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"rafael@...nel.org" <rafael@...nel.org>,
"sudeep.holla@....com"
<sudeep.holla@....com>,
"joao.m.martins@...cle.com"
<joao.m.martins@...cle.com>,
"dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>,
"konrad.wilk@...cle.com"
<konrad.wilk@...cle.com>,
"wanpengli@...cent.com" <wanpengli@...cent.com>,
"cl@...two.org" <cl@...two.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"maobibo@...ngson.cn" <maobibo@...ngson.cn>,
"pbonzini@...hat.com"
<pbonzini@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"misono.tomohiro@...itsu.com" <misono.tomohiro@...itsu.com>,
"daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
"arnd@...db.de"
<arnd@...db.de>, "lenb@...nel.org" <lenb@...nel.org>,
"will@...nel.org"
<will@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"peterz@...radead.org"
<peterz@...radead.org>,
"boris.ostrovsky@...cle.com"
<boris.ostrovsky@...cle.com>,
"vkuznets@...hat.com" <vkuznets@...hat.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-pm@...r.kernel.org"
<linux-pm@...r.kernel.org>,
"bp@...en8.de" <bp@...en8.de>,
"mtosatti@...hat.com" <mtosatti@...hat.com>,
"x86@...nel.org"
<x86@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>
Subject: Re: [PATCH v8 01/11] cpuidle/poll_state: poll via
smp_cond_load_relaxed()
Catalin Marinas <catalin.marinas@....com> writes:
> On Wed, Oct 16, 2024 at 03:13:33PM +0000, Okanovic, Haris wrote:
>> On Tue, 2024-10-15 at 13:04 +0100, Catalin Marinas wrote:
>> > On Wed, Sep 25, 2024 at 04:24:15PM -0700, Ankur Arora wrote:
>> > > diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> > > index 9b6d90a72601..fc1204426158 100644
>> > > --- a/drivers/cpuidle/poll_state.c
>> > > +++ b/drivers/cpuidle/poll_state.c
>> > > @@ -21,21 +21,20 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> > >
>> > > raw_local_irq_enable();
>> > > if (!current_set_polling_and_test()) {
>> > > - unsigned int loop_count = 0;
>> > > u64 limit;
>> > >
>> > > limit = cpuidle_poll_time(drv, dev);
>> > >
>> > > while (!need_resched()) {
>> > > - cpu_relax();
>> > > - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> > > - continue;
>> > > -
>> > > - loop_count = 0;
>> > > + unsigned int loop_count = 0;
>> > > if (local_clock_noinstr() - time_start > limit) {
>> > > dev->poll_time_limit = true;
>> > > break;
>> > > }
>> > > +
>> > > + smp_cond_load_relaxed(&current_thread_info()->flags,
>> > > + VAL & _TIF_NEED_RESCHED ||
>> > > + loop_count++ >= POLL_IDLE_RELAX_COUNT);
>> >
>> > The above is not guaranteed to make progress if _TIF_NEED_RESCHED is
>> > never set. With the event stream enabled on arm64, the WFE will
>> > eventually be woken up, loop_count incremented and the condition would
>> > become true. However, the smp_cond_load_relaxed() semantics require that
>> > a different agent updates the variable being waited on, not the waiting
>> > CPU updating it itself. Also note that the event stream can be disabled
>> > on arm64 on the kernel command line.
>>
>> Alternately could we condition arch_haltpoll_want() on
>> arch_timer_evtstrm_available(), like v7?
>
> No. The problem is about the smp_cond_load_relaxed() semantics - it
> can't wait on a variable that's only updated in its exit condition. We
> need a new API for this, especially since we are changing generic code
> here (even if it was arm64 code only, I'd still object to such
> smp_cond_load_*() constructs).
Right. The problem is that smp_cond_load_relaxed(), as used in this
context, depends on the event-stream side effect even though the
interface does not encode those semantics anywhere.

So, a smp_cond_load_timeout() like in [1] that continues to depend on
the event-stream is better because it explicitly accounts for the side
effect from the timeout.

This would cover both the WFxT and the event-stream case.

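For illustration only, here is a minimal userspace sketch of the semantics being discussed. It is not the code from [1]. The names now_ns() and cond_load_relaxed_timeout() are hypothetical, C11 atomics stand in for the kernel's READ_ONCE(), and plain busy polling stands in for the WFE/WFET wait. The point it tries to show is that the deadline is an explicit second exit path, so the wait condition itself only refers to state that another agent writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace stand-in for a local clock: monotonic nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper, not a kernel API: poll *flags until (value & mask)
 * is nonzero or timeout_ns has elapsed. Returns the last value loaded and
 * reports through *timed_out which of the two exits was taken.
 */
static unsigned long cond_load_relaxed_timeout(_Atomic unsigned long *flags,
					       unsigned long mask,
					       uint64_t timeout_ns,
					       bool *timed_out)
{
	uint64_t deadline;
	unsigned long val;

	assert(flags && timed_out);
	deadline = now_ns() + timeout_ns;
	*timed_out = false;

	for (;;) {
		/* Exit 1: another agent set one of the bits we wait for. */
		val = atomic_load_explicit(flags, memory_order_relaxed);
		if (val & mask)
			break;

		/*
		 * Assumption: a kernel implementation could wait in WFE or
		 * WFET at this point; this sketch simply polls again.
		 */

		/* Exit 2: the explicit deadline, part of the interface. */
		if (now_ns() >= deadline) {
			*timed_out = true;
			break;
		}
	}
	return val;
}
```

With an interface of this shape, the poll_idle() loop would not need a self-incremented loop_count in the exit condition; the caller would set dev->poll_time_limit when the timeout exit is reported.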
The part I'm a little less sure about is the case where WFxT and the
event-stream are absent.

As you said earlier, for that case on arm64, we use either short
__delay() calls or spin in cpu_relax(), both of which are essentially
the same thing.

Now on x86, cpu_relax() is quite optimal. The spec explicitly recommends
it, and from my measurements a loop doing "while (!cond) cpu_relax()"
gets an IPC of something like 0.1.

On my arm64 systems, however, the same loop gets an IPC of 2. This
likely varies greatly, but it seems like it would run pretty hot some of
the time.

So maybe the right thing to do would be to keep smp_cond_load_timeout()
but only allow polling if WFxT or the event-stream is enabled, and to
enhance cpuidle_poll_state_init() so that it fails if that condition is
not met.

Does that make sense?

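A rough sketch of that gating idea, written as standalone C rather than kernel code. Everything named here is hypothetical: struct wait_caps, poll_wait_is_bounded() and poll_state_init_checked() only model the proposal, and returning -ENODEV is an assumed convention, not something taken from the patch series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what the CPU offers for a bounded wait. */
struct wait_caps {
	bool has_wfxt;		/* wait instructions that take a timeout */
	bool has_evtstrm;	/* event stream that periodically wakes WFE */
};

/* Polling is only cheap if something bounds each wait. */
static bool poll_wait_is_bounded(const struct wait_caps *caps)
{
	return caps->has_wfxt || caps->has_evtstrm;
}

/*
 * Hypothetical analogue of an init routine that is allowed to fail:
 * refuse to enable the polling state when neither mechanism is present,
 * instead of falling back to a hot cpu_relax() spin.
 */
static int poll_state_init_checked(const struct wait_caps *caps,
				   bool *poll_enabled)
{
	assert(caps && poll_enabled);

	if (!poll_wait_is_bounded(caps)) {
		*poll_enabled = false;
		return -ENODEV;	/* assumed error code */
	}

	*poll_enabled = true;
	return 0;
}
```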
Thanks
Ankur
[1] https://lore.kernel.org/lkml/87edae3a1x.fsf@oracle.com/