Message-ID: <f8c63af5-2509-310d-7ba0-7687b20e3b44@oracle.com>
Date: Thu, 29 Aug 2019 22:12:56 +0100
From: Joao Martins <joao.m.martins@...cle.com>
To: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: Marcelo Tosatti <mtosatti@...hat.com>,
"Rafael J. Wysocki" <rjw@...ysocki.net>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, linux-pm@...r.kernel.org,
Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: Default governor regardless of cpuidle driver

On 8/29/19 9:22 PM, Daniel Lezcano wrote:
> On 29/08/2019 21:11, Joao Martins wrote:
>> On 8/29/19 7:28 PM, Daniel Lezcano wrote:
>>> On 29/08/2019 20:07, Joao Martins wrote:
>>>> On 8/29/19 6:42 PM, Daniel Lezcano wrote:
>>>>> On 29/08/2019 19:16, Joao Martins wrote:
>>>>>> On 8/29/19 4:10 PM, Joao Martins wrote:
>>>>>>> When cpus != maxcpus cpuidle-haltpoll will fail to register all vcpus
>>>>>>> past the online ones and thus fail to register the idle driver.
>>>>>>> This is because cpuidle_add_sysfs() will return with -ENODEV as a
>>>>>>> consequence of get_cpu_device() returning no device for a non-existent
>>>>>>> CPU.
>>>>>>>
>>>>>>> Instead switch to cpuidle_register_driver() and manually register each
>>>>>>> of the present cpus through cpuhp_setup_state() callback and future
>>>>>>> ones that get onlined. This mimics similar logic that intel_idle does.
>>>>>>>
>>>>>>> Fixes: fa86ee90eb11 ("add cpuidle-haltpoll driver")
>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@...cle.com>
>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@...cle.com>
>>>>>>> ---
>>>>>>
>>>>>> While testing the above, I found out another issue on the haltpoll series.
>>>>>> But I am not sure what is best suited to cpuidle framework, hence requesting
>>>>>> some advice if below is a reasonable solution or something else is preferred.
>>>>>>
>>>>>> Essentially after haltpoll governor got introduced and regardless of the cpuidle
>>>>>> driver the default governor is gonna be haltpoll for a guest (given haltpoll
>>>>>> governor doesn't get registered for baremetal). Right now, for a KVM guest, the
>>>>>> idle governors have these ratings:
>>>>>>
>>>>>> * ladder -> 10
>>>>>> * teo -> 19
>>>>>> * menu -> 20
>>>>>> * haltpoll -> 21
>>>>>> * ladder + nohz=off -> 25
>>>>>>
>>>>>> When a guest is booted with MWAIT and intel_idle is probed and successfully
>>>>>> registered, we will end up with a haltpoll governor being used as opposed to
>>>>>> 'menu' (which used to be the default case). This would prevent IIUC that other
>>>>>> C-states get used other than poll_state (state 0) and state 1.
>>>>>>
>>>>>> Given that haltpoll governor is largely only useful with a cpuidle-haltpoll
>>>>>> it doesn't look reasonable to be the default? What about using haltpoll governor
>>>>>> as default when haltpoll idle driver registers or modload?
>>>>>
>>>>> Are the guest and host kernel the same? IOW compiled with the same
>>>>> kernel config?
>>>>>
>>>> You just need to toggle this (regardless of CONFIG_HALTPOLL_CPUIDLE):
>>>>
>>>> CONFIG_CPU_IDLE_GOV_HALTPOLL=y
>>>>
>>>> And *if you are a KVM guest* it will be the default (unless using nohz=off in
>>>> which case ladder gets the highest rating -- see the listing right above).
>>>>
>>>> Host will just behave differently because the haltpoll governor is checking if
>>>> it is running as kvm guest, and only registering in that case.
>>>
>>> I understood the problem. Actually my question was about if the kernels
>>> are compiled for host and guest, and can be run indifferently.
>>
>> /nods Correct.
>>
>>> In this
>>> case a runtime detection must be done as you propose, otherwise that can
>>> be done at config time. I am pretty sure it is the former but before
>>> thinking about the runtime side, I wanted to double check.
>>>
>> Hmm, but even with separate kernels/configs for guest and host I think we would
>> still have the same issue.
>>
>> What I was trying to convey is that even when running with a config solely for
>> KVM guests (that is different than baremetal) you can have today various ways of
>> idling. An Intel x86 kvm guest can have no idle driver (but arch-specific),
>> intel_idle (like baremetal config) and haltpoll. There are usecases for these
>> three, and makes sense to consolidate all.
>>
>> Say you wanted to have a kvm specific config, you would still see the same
>> problem if you happen to compile intel_idle together with haltpoll
>> driver+governor.
>
> Can a guest work with an intel_idle driver?
>
Yes.
If you use Qemu you would add '-overcommit cpu-pm=on' to try it out. ofc,
assuming you're on a relatively recent Qemu (v3.0+) and a fairly recent kernel
version as host (v4.17+).
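
To see which idle driver and governor such a guest ends up with, a small helper like the one below could be run inside the guest. This is a minimal sketch, not part of the patch: the two sysfs paths (current_driver and current_governor_ro under /sys/devices/system/cpu/cpuidle/) are what the cpuidle core is assumed to expose on kernels of this era, and the function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print "<label>: <value>" for one cpuidle sysfs attribute. */
static void show(const char *label, const char *path)
{
    char val[64];

    if (read_first_line(path, val, sizeof(val)) == 0)
        printf("%s: %s\n", label, val);
    else
        printf("%s: <unavailable>\n", label);
}

/* Assumed sysfs locations; adjust if your kernel lays them out differently. */
static void show_cpuidle_setup(void)
{
    show("driver", "/sys/devices/system/cpu/cpuidle/current_driver");
    show("governor", "/sys/devices/system/cpu/cpuidle/current_governor_ro");
}
```

Calling show_cpuidle_setup() from a main() in the guest would print one line for the driver and one for the governor, or "<unavailable>" where the attribute is missing.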
>> Creating two separate configs here, with and without haltpoll
>> for VMs doesn't sound effective for distros.
>
> Agree
>
>> Perhaps decreasing the rating of
>> haltpoll governor, but while a short term fix it wouldn't give much sensible
>> defaults without the one-off runtime switch.
>
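
For reference, the rating-based choice discussed in this thread can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct gov and pick_default() are hypothetical names and not the kernel's types, and the rule modelled (a governor replaces the current one only when its rating is strictly higher) is how the registration path is understood to behave. The ratings are the ones listed earlier in the thread for the nohz-enabled case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a governor entry; not the kernel's struct. */
struct gov {
    const char *name;
    unsigned int rating;
};

/*
 * Return the governor with the highest rating, or NULL if none are
 * registered. On a tie the earlier entry wins, because a later one must be
 * strictly higher to replace it.
 */
static const struct gov *pick_default(const struct gov *govs, size_t n)
{
    const struct gov *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (best == NULL || govs[i].rating > best->rating)
            best = &govs[i];
    }
    return best;
}

/* KVM guest: haltpoll registers itself, so it is part of the set. */
static const struct gov kvm_guest_govs[] = {
    { "ladder", 10 }, { "teo", 19 }, { "menu", 20 }, { "haltpoll", 21 },
};

/* Baremetal: haltpoll does not register, so it never competes. */
static const struct gov baremetal_govs[] = {
    { "ladder", 10 }, { "teo", 19 }, { "menu", 20 },
};

/* Hypothetical variant with haltpoll's rating lowered below the others. */
static const struct gov lowered_haltpoll_govs[] = {
    { "ladder", 10 }, { "teo", 19 }, { "menu", 20 }, { "haltpoll", 9 },
};
```

With these tables, pick_default() selects haltpoll for the guest set and menu for the other two. The lowered-rating variant restores menu as the default, but haltpoll would then need an explicit switch when cpuidle-haltpoll registers, which is the runtime step the thread is asking about.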