Message-ID: <a9f02b7f-04e2-41e9-38c5-f770f40f6faf@gmail.com>
Date: Fri, 17 Nov 2017 20:21:42 +0800
From: Quan Xu <quan.xu0@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>,
Quan Xu <quan.xu03@...il.com>, kvm@...r.kernel.org,
linux-doc@...r.kernel.org, linux-fsdevel@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>,
virtualization@...ts.linux-foundation.org, x86@...nel.org,
xen-devel@...ts.xenproject.org,
Yang Zhang <yang.zhang.wz@...il.com>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
Kyle Huey <me@...ehuey.com>, Len Brown <len.brown@...el.com>,
Andy Lutomirski <luto@...nel.org>,
Tom Lendacky <thomas.lendacky@....com>,
Tobias Klauser <tklauser@...tanz.ch>,
Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter
real idle path
On 2017-11-17 19:36, Thomas Gleixner wrote:
> On Fri, 17 Nov 2017, Quan Xu wrote:
>> On 2017-11-16 17:53, Thomas Gleixner wrote:
>>> That's just plain wrong. We don't want to see any of this PARAVIRT crap in
>>> anything outside the architecture/hypervisor interfacing code which really
>>> needs it.
>>>
>>> The problem can and must be solved at the generic level in the first place
>>> to gather the data which can be used to make such decisions.
>>>
>>> How that information is used might be either completely generic or requires
>>> system specific variants. But as long as we don't have any information at
>>> all we cannot discuss that.
>>>
>>> Please sit down and write up which data needs to be considered to make
>>> decisions about probabilistic polling. Then we need to compare and contrast
>>> that with the data which is necessary to make power/idle state decisions.
>>>
>>> I would be very surprised if this data would not overlap by at least 90%.
>>>
>> 1. Which data needs to be considered to make decisions about probabilistic polling
>>
>> I really need to write up which data needs to be considered to make
>> decisions about probabilistic polling. For the last several months,
>> I have focused on the data _from idle to reschedule_, in order to bypass
>> the idle loops. Unfortunately, this inevitably makes me touch the
>> scheduler/idle/nohz code.
>>
>> Following tglx's suggestion, the data which is necessary to make power/idle
>> state decisions is the last idle state's residency time. IIUC this data
>> is the duration from idle to wakeup, where the wakeup may be triggered by
>> the reschedule irq or by another irq.
> That's part of the picture, but not complete.
tglx, could you share more? I am very curious about it.
>> I also tested that the reschedule irq overlaps by more than 90% (tracing
>> the need_resched status after cpuidle_idle_call) when running ctxsw/netperf
>> for one minute.
>>
>> Given that overlap, I think I can use the last idle state's residency time
>> to make decisions about probabilistic polling, as @dev->last_residency does.
>> That data is much easier to get.
> That's only true for your particular use case.
>
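For illustration only, the decision described in point 1 can be sketched in user-space C. Everything here is hypothetical: `struct idle_hist`, `should_poll()` and the nanosecond units are assumptions, not the patch's code and not a kernel API. The only idea taken from the thread is "poll if the last idle residency was short".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record, standing in for what dev->last_residency tracks. */
struct idle_hist {
    uint64_t last_residency_ns; /* length of the previous idle period */
};

/*
 * Hypothetical policy: poll only if the previous idle period was shorter
 * than the poll budget, assuming the next idle period will look similar.
 */
static bool should_poll(const struct idle_hist *h, uint64_t poll_budget_ns)
{
    assert(h != NULL);
    return h->last_residency_ns < poll_budget_ns;
}
```

As tglx notes, a single previous sample only fits particular workloads: one long idle period switches polling off for the next wakeup, and one short period switches it back on.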
>> 2. Do an HV-specific idle driver (function)
>>
>> So far, power management is not exposed to the guest, so idle is simple for
>> a KVM guest: it calls "sti" / "hlt" (cpuidle_idle_call() --> default_idle_call()).
>> Thanks to the Xen guys, who implemented the paravirt framework, I can
>> implement it as easily as the following:
>>
>> --- a/arch/x86/kernel/kvm.c
> Your email client is using a very strange formatting.
My bad, I inserted spaces to highlight the code.
> This is definitely better than what you proposed so far and implementing it
> as a proof of concept seems to be worthwhile.
>
> But I doubt that this is the final solution. It's not generic and not
> necessarily suitable for all use case scenarios.
>
>
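The kvm.c diff itself is not preserved in this archive. As a stand-in, here is a hedged user-space model of the general "poll for a bounded time, then halt" idea from point 2. `struct idle_ops`, `poll_then_halt()` and the `fake_cpu` scaffolding are all hypothetical; the callbacks merely stand in for need_resched() and the guest's sti/hlt path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: a clock, a "work pending" check and the halt path. */
struct idle_ops {
    uint64_t (*now_ns)(void *ctx);
    bool (*work_pending)(void *ctx);
    void (*halt)(void *ctx);
    void *ctx;
};

/* Returns true if polling found work, false if it fell back to halt. */
static bool poll_then_halt(const struct idle_ops *ops, uint64_t poll_ns)
{
    assert(ops != NULL);
    uint64_t start = ops->now_ns(ops->ctx);

    while (ops->now_ns(ops->ctx) - start < poll_ns) {
        if (ops->work_pending(ops->ctx))
            return true; /* woken inside the window: halt is skipped */
        /* in-kernel code would call cpu_relax() here */
    }
    ops->halt(ops->ctx); /* budget exhausted: take the expensive path */
    return false;
}

/* Simulated CPU, only used to exercise the loop outside a guest. */
struct fake_cpu {
    uint64_t clock_ns;   /* advances 100 ns per now_ns() call */
    uint64_t work_at_ns; /* time at which a wakeup "arrives" */
    int halts;           /* number of times halt() was reached */
};

static uint64_t fake_now(void *ctx)
{
    struct fake_cpu *c = ctx;
    c->clock_ns += 100;
    return c->clock_ns;
}

static bool fake_pending(void *ctx)
{
    struct fake_cpu *c = ctx;
    return c->clock_ns >= c->work_at_ns;
}

static void fake_halt(void *ctx)
{
    struct fake_cpu *c = ctx;
    c->halts++;
}
```

The trade-off is that a wakeup arriving during the window is handled without the guest executing HLT, which typically causes a VM exit, at the cost of burned cycles when no wakeup comes.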
Yes, I am exhausted :):)
Could you tell me what the gap is to being generic and suitable for
all use case scenarios? Is it the lack of irq/idle predictors?
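One possible reading of "irq/idle predictors" is a predictor that looks at more than the last sample. The sketch below is hypothetical (`struct idle_predictor`, the 1/8 weight and the nanosecond units are assumptions). It smooths past residencies with an exponentially weighted moving average so that a single outlier does not flip the polling decision. The in-kernel cpuidle governors use their own, more elaborate heuristics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical EWMA of idle residencies; newest sample weighted 1/8. */
struct idle_predictor {
    uint64_t avg_ns; /* smoothed residency estimate */
    int primed;      /* 0 until the first sample has been seen */
};

static void predictor_update(struct idle_predictor *p, uint64_t residency_ns)
{
    assert(p != NULL);
    if (!p->primed) {
        p->avg_ns = residency_ns; /* seed with the first observation */
        p->primed = 1;
        return;
    }
    /* avg += (sample - avg) / 8, split into two cases to stay unsigned */
    if (residency_ns >= p->avg_ns)
        p->avg_ns += (residency_ns - p->avg_ns) >> 3;
    else
        p->avg_ns -= (p->avg_ns - residency_ns) >> 3;
}

static uint64_t predictor_next(const struct idle_predictor *p)
{
    return p->avg_ns; /* predicted length of the next idle period */
}
```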
I really want to upstream it for all public cloud users/providers.
As the KVM host has a similar mechanism, is it possible to upstream it
under the following conditions?
1). Add a QEMU configuration option to enable it, disabled by default.
2). Add some "TODO" comments near the code.
3). ...
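Condition 1) and the comparison with the KVM host can be combined into a small sketch: a default-off switch plus a poll window that adapts after each idle period, in the spirit of the host's halt polling (tuned there through the `halt_poll_ns` module parameters). The names, the doubling/halving factors, and how the switch would be wired to a QEMU option are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knobs; the real option name and plumbing are unspecified. */
struct poll_cfg {
    bool enabled;      /* condition 1): off unless explicitly enabled */
    uint64_t max_ns;   /* upper bound on the poll window */
    uint64_t start_ns; /* first non-zero window */
};

struct poll_state {
    uint64_t window_ns; /* current poll budget, 0 means do not poll */
};

/* Adapt the window after an idle period of residency_ns. */
static void poll_adjust(const struct poll_cfg *cfg, struct poll_state *st,
                        uint64_t residency_ns)
{
    assert(cfg != NULL && st != NULL);
    if (!cfg->enabled) {
        st->window_ns = 0; /* disabled: never poll */
        return;
    }
    if (residency_ns > cfg->max_ns) {
        /* Long sleep: polling could not have helped, so shrink. */
        st->window_ns /= 2;
    } else if (residency_ns > st->window_ns) {
        /* Wakeup came after the window closed but within max: grow. */
        st->window_ns = st->window_ns ? st->window_ns * 2 : cfg->start_ns;
        if (st->window_ns > cfg->max_ns)
            st->window_ns = cfg->max_ns;
    }
    /* Otherwise the wakeup landed inside the window: keep it. */
}
```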
Anyway, thanks for your help.
Quan
Alibaba Cloud