[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46086489-5a01-16e1-9314-70ae53c01952@gmail.com>
Date: Thu, 16 Nov 2017 17:12:28 +0800
From: Quan Xu <quan.xu0@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>
Cc: Quan Xu <quan.xu03@...il.com>, kvm@...r.kernel.org,
linux-doc@...r.kernel.org, linux-fsdevel@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>,
virtualization@...ts.linux-foundation.org, x86@...nel.org,
xen-devel@...ts.xenproject.org,
Yang Zhang <yang.zhang.wz@...il.com>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
Kyle Huey <me@...ehuey.com>, Len Brown <len.brown@...el.com>,
Andy Lutomirski <luto@...nel.org>,
Tom Lendacky <thomas.lendacky@....com>,
Tobias Klauser <tklauser@...tanz.ch>,
Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter
real idle path
On 2017-11-16 06:03, Thomas Gleixner wrote:
> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>
>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
>>> From: Yang Zhang <yang.zhang.wz@...il.com>
>>>
>>> Implement a generic idle poll which resembles the functionality
>>> found in arch/. Provide weak arch_cpu_idle_poll function which
>>> can be overridden by the architecture code if needed.
>> No, we want less of those magic hooks, not more.
>>
>>> Interrupts arrive which may not cause a reschedule in idle loops.
>>> In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
>>> for interrupts and VM-exit immediately. Also this becomes more
>>> expensive than bare metal. Add a generic idle poll before enter
>>> real idle path. When a reschedule event is pending, we can bypass
>>> the real idle path.
>> Why not do a HV specific idle driver?
> If I understand the problem correctly then he wants to avoid the heavy
> lifting in tick_nohz_idle_enter() in the first place, but there is already
> an interesting quirk there which makes it exit early. See commit
> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
> looks similar. But lets not proliferate that. I'd rather see that go away.
agreed.
Even we can get more benifit than commit 3c5d92a0cfb5 ("nohz: Introduce
arch_needs_cpu")
in kvm guest. I won't proliferate that..
> But the irq_timings stuff is heading into the same direction, with a more
> complex prediction logic which should tell you pretty good how long that
> idle period is going to be and in case of an interrupt heavy workload this
> would skip the extra work of stopping and restarting the tick and provide a
> very good input into a polling decision.
interesting. I have tested with IRQ_TIMINGS related code, which seems
not working so far.
Also I'd like to help as much as I can.
> This can be handled either in a HV specific idle driver or even in the
> generic core code. If the interrupt does not arrive then you can assume
> within the predicted time then you can assume that the flood stopped and
> invoke halt or whatever.
>
> That avoids all of that 'tunable and tweakable' x86 specific hackery and
> utilizes common functionality which is mostly there already.
here is some sample code. Poll for a while before enter halt in
cpuidle_enter_state()
If I get a reschedule event, then don't try to enter halt. (I hope this
is the right direction as Peter mentioned in another email)
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
target_state = &drv->states[index];
}
+#ifdef CONFIG_PARAVIRT
+ paravirt_idle_poll();
+
+ if (need_resched())
+ return -EBUSY;
+#endif
+
/* Take note of the planned idle state. */
sched_idle_set_state(target_state);
thanks,
Quan
Alibaba Cloud
Powered by blists - more mailing lists