Message-ID: <87wmi8ou7g.fsf@oracle.com>
Date: Wed, 16 Oct 2024 10:04:19 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: "Okanovic, Haris" <harisokn@...zon.com>
Cc: "catalin.marinas@....com" <catalin.marinas@....com>,
"ankur.a.arora@...cle.com" <ankur.a.arora@...cle.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"rafael@...nel.org"
<rafael@...nel.org>,
"sudeep.holla@....com" <sudeep.holla@....com>,
"joao.m.martins@...cle.com" <joao.m.martins@...cle.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"konrad.wilk@...cle.com" <konrad.wilk@...cle.com>,
"wanpengli@...cent.com"
<wanpengli@...cent.com>,
"cl@...two.org" <cl@...two.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"maobibo@...ngson.cn"
<maobibo@...ngson.cn>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"misono.tomohiro@...itsu.com"
<misono.tomohiro@...itsu.com>,
"daniel.lezcano@...aro.org"
<daniel.lezcano@...aro.org>,
"arnd@...db.de" <arnd@...db.de>, "lenb@...nel.org" <lenb@...nel.org>,
"will@...nel.org" <will@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"peterz@...radead.org"
<peterz@...radead.org>,
"boris.ostrovsky@...cle.com"
<boris.ostrovsky@...cle.com>,
"vkuznets@...hat.com" <vkuznets@...hat.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-pm@...r.kernel.org"
<linux-pm@...r.kernel.org>,
"bp@...en8.de" <bp@...en8.de>,
"mtosatti@...hat.com" <mtosatti@...hat.com>,
"x86@...nel.org"
<x86@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>
Subject: Re: [PATCH v8 01/11] cpuidle/poll_state: poll via
smp_cond_load_relaxed()

Okanovic, Haris <harisokn@...zon.com> writes:

> On Tue, 2024-10-15 at 13:04 +0100, Catalin Marinas wrote:
>> On Wed, Sep 25, 2024 at 04:24:15PM -0700, Ankur Arora wrote:
>> > diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> > index 9b6d90a72601..fc1204426158 100644
>> > --- a/drivers/cpuidle/poll_state.c
>> > +++ b/drivers/cpuidle/poll_state.c
>> > @@ -21,21 +21,20 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> >
>> > raw_local_irq_enable();
>> > if (!current_set_polling_and_test()) {
>> > - unsigned int loop_count = 0;
>> > u64 limit;
>> >
>> > limit = cpuidle_poll_time(drv, dev);
>> >
>> > while (!need_resched()) {
>> > - cpu_relax();
>> > - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> > - continue;
>> > -
>> > - loop_count = 0;
>> > + unsigned int loop_count = 0;
>> > if (local_clock_noinstr() - time_start > limit) {
>> > dev->poll_time_limit = true;
>> > break;
>> > }
>> > +
>> > + smp_cond_load_relaxed(&current_thread_info()->flags,
>> > + VAL & _TIF_NEED_RESCHED ||
>> > + loop_count++ >= POLL_IDLE_RELAX_COUNT);
>>
>> The above is not guaranteed to make progress if _TIF_NEED_RESCHED is
>> never set. With the event stream enabled on arm64, the WFE will
>> eventually be woken up, loop_count incremented and the condition would
>> become true. However, the smp_cond_load_relaxed() semantics require that
>> a different agent updates the variable being waited on, not the waiting
>> CPU updating it itself. Also note that the event stream can be disabled
>> on arm64 on the kernel command line.
>
> Alternately could we condition arch_haltpoll_want() on
> arch_timer_evtstrm_available(), like v7?

Yes, I'm thinking of staging it somewhat like that. First an
smp_cond_load_relaxed() which gets rid of this issue, followed by
one based on smp_cond_load_relaxed_timeout().

That said, conditioning just arch_haltpoll_want() won't suffice since
what Catalin pointed out affects all users of poll_idle(), not just
haltpoll.

Right now there's only haltpoll but there are future users like
zhenglifeng with a patch for acpi-idle here:
https://lore.kernel.org/all/f8a1f85b-c4bf-4c38-81bf-728f72a4f2fe@huawei.com/

>> Does the code above break any other architecture? I'd say if you want
>> something like this, better introduce a new smp_cond_load_timeout()
>> API. The above looks like a hack that may only work on arm64 when the
>> event stream is enabled.
>>
>> A generic option is udelay() (on arm64 it would use WFE/WFET by
>> default). Not sure how important it is for poll_idle() but the downside
>> of udelay() is that it won't be able to also poll need_resched() while
>> waiting for the timeout. If this matters, you could instead make smaller
>> udelay() calls. Yet another problem, I don't know how energy efficient
>> udelay() is on x86 vs cpu_relax().
>>
>> So maybe an smp_cond_load_timeout() would be better, implemented with
>> cpu_relax() generically and the arm64 would use LDXR, WFE and rely on
>> the event stream (or fall back to cpu_relax() if the event stream is
>> disabled).
>>
>> --
>> Catalin
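
To make the generic cpu_relax() fallback described above concrete, here is a
rough userspace sketch of a conditional load with a timeout. It is not the
kernel implementation and not the proposed API: cond_load_relaxed_timeout(),
relax(), now_ns(), FLAG_NEED_RESCHED and SPINS_PER_TIME_CHECK are all
hypothetical names and values chosen for illustration, built on C11 atomics
and clock_gettime(). The point it tries to show is that the exit condition
depends only on the flag word (written by some other agent) or on the clock,
never on a counter that the waiter bumps inside the condition expression.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for kernel names; the values are illustrative only. */
#define FLAG_NEED_RESCHED    (1u << 3)
#define SPINS_PER_TIME_CHECK 200u

/* Monotonic clock in nanoseconds (userspace substitute for local_clock()). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Placeholder for cpu_relax(): only a compiler barrier in this sketch. */
static inline void relax(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Spin until (*flags & mask) is non-zero or limit_ns has elapsed.
 * Returns the last value loaded and reports a timeout via *timed_out.
 * The clock is only read once every SPINS_PER_TIME_CHECK iterations.
 */
static unsigned int cond_load_relaxed_timeout(_Atomic unsigned int *flags,
					      unsigned int mask,
					      uint64_t limit_ns,
					      bool *timed_out)
{
	uint64_t start = now_ns();
	unsigned int spins = 0;
	unsigned int val;

	assert(flags != NULL && timed_out != NULL);
	*timed_out = false;

	for (;;) {
		val = atomic_load_explicit(flags, memory_order_relaxed);
		if (val & mask)
			break;		/* another agent set the flag */

		relax();
		if (++spins < SPINS_PER_TIME_CHECK)
			continue;	/* keep clock reads off the fast path */

		spins = 0;
		if (now_ns() - start > limit_ns) {
			*timed_out = true;
			break;
		}
	}
	return val;
}
```

A caller shaped like poll_idle() would presumably set dev->poll_time_limit
when *timed_out comes back true. An arm64 variant could replace relax() with
a WFE-based wait, but only where the event stream (or WFET) guarantees a
wakeup so the clock check is still reached.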
--
ankur