Message-ID: <b44f8fad-55ee-ff03-faf9-d8ef4b8f4ab8@gmail.com>
Date: Wed, 15 Nov 2017 11:15:30 +0800
From: "quan.xu04@...il.com" <quan.xu04@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
kvm <kvm@...r.kernel.org>, Quan Xu <quan.xu0@...il.com>
Subject: Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart
idle poll
On 2017-11-14 15:44, Ingo Molnar wrote:
> * Quan Xu <quan.xu0@...il.com> wrote:
>
>>
>> On 2017/11/13 23:08, Ingo Molnar wrote:
>>> * Quan Xu <quan.xu04@...il.com> wrote:
>>>
>>>> From: Quan Xu <quan.xu0@...il.com>
>>>>
>>>> To reduce the cost of polling, we introduce three sysctls to control
>>>> the poll time when running as a virtual machine with paravirt.
>>>>
>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@...il.com>
>>>> Signed-off-by: Quan Xu <quan.xu0@...il.com>
>>>> ---
>>>> Documentation/sysctl/kernel.txt | 35 +++++++++++++++++++++++++++++++++++
>>>> arch/x86/kernel/paravirt.c | 4 ++++
>>>> include/linux/kernel.h | 6 ++++++
>>>> kernel/sysctl.c | 34 ++++++++++++++++++++++++++++++++++
>>>> 4 files changed, 79 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
>>>> index 694968c..30c25fb 100644
>>>> --- a/Documentation/sysctl/kernel.txt
>>>> +++ b/Documentation/sysctl/kernel.txt
>>>> @@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this one.
>>>> ==============================================================
>>>> +paravirt_poll_grow: (X86 only)
>>>> +
>>>> +Multiplier used to increase the poll time. This is expected to take
>>>> +effect only when running as a virtual machine with CONFIG_PARAVIRT
>>>> +enabled. It brings no benefit on bare metal even with
>>>> +CONFIG_PARAVIRT enabled.
>>>> +
>>>> +By default this value is 2. Possible values to set are in range {2..16}.
>>>> +
>>>> +==============================================================
>>>> +
>>>> +paravirt_poll_shrink: (X86 only)
>>>> +
>>>> +Divisor used to reduce the poll time. This is expected to take effect
>>>> +only when running as a virtual machine with CONFIG_PARAVIRT enabled.
>>>> +It brings no benefit on bare metal even with CONFIG_PARAVIRT
>>>> +enabled.
>>>> +
>>>> +By default this value is 2. Possible values to set are in range {2..16}.
>>>> +
>>>> +==============================================================
>>>> +
>>>> +paravirt_poll_threshold_ns: (X86 only)
>>>> +
>>>> +Controls the maximum poll time before entering the real idle path. This
>>>> +is expected to take effect only when running as a virtual machine with
>>>> +CONFIG_PARAVIRT enabled. It brings no benefit on bare metal even
>>>> +with CONFIG_PARAVIRT enabled.
>>>> +
>>>> +By default this value is 0, which means no polling. Possible values to
>>>> +set are in range {0..500000}. Change the value to non-zero if running
>>>> +latency-bound workloads in a virtual machine.
>>> I absolutely hate it how this hybrid idle loop polling mechanism is not
>>> self-tuning!
>> Ingo, actually it is self-tuning..
> Then why the hell does it touch the syscall ABI?
just to gather more data about performance and CPU utilization with
different maximum poll times.

there are 3 parameters, paravirt_poll_{grow|shrink|threshold_ns}; we
haven't touched paravirt_poll_{grow|shrink} since we sent out v1. The
grow/shrink pair is what drives the self-tuning, roughly as sketched below.
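For context, the self-tuning works roughly like KVM's halt_poll_ns
grow/shrink logic: widen the poll window when polling caught a wakeup,
narrow it when polling was wasted. A minimal illustrative sketch -- the
names poll_ns, poll_update() and the seed value are my assumptions, not
the patch's actual code:

    /* Tunables from the patch; defaults per the documentation above. */
    extern unsigned int paravirt_poll_grow;         /* default 2  */
    extern unsigned int paravirt_poll_shrink;       /* default 2  */
    extern unsigned int paravirt_poll_threshold_ns; /* max window */

    static unsigned int poll_ns;    /* current poll window, starts at 0 */

    static void poll_update(bool woke_during_poll)
    {
            if (woke_during_poll) {
                    /* Polling paid off: widen the window, capped at
                     * the threshold. */
                    if (!poll_ns)
                            poll_ns = 10000;        /* arbitrary seed */
                    else
                            poll_ns *= paravirt_poll_grow;
                    if (poll_ns > paravirt_poll_threshold_ns)
                            poll_ns = paravirt_poll_threshold_ns;
            } else {
                    /* Polling was wasted CPU time: shrink the window
                     * back toward zero. */
                    poll_ns /= paravirt_poll_shrink;
            }
    }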
We tested it with the context-switch benchmark and netperf, using
different paravirt_poll_threshold_ns values.
Here is the data we got when running the context-switch benchmark to
measure latency (lower is better):
halt_poll_threshold=0      -- 3402.9 ns/ctxsw -- 199.8 %CPU
halt_poll_threshold=10000  -- 1151.4 ns/ctxsw -- 200.1 %CPU
halt_poll_threshold=20000  -- 1149.7 ns/ctxsw -- 199.9 %CPU
halt_poll_threshold=30000  -- 1151.0 ns/ctxsw -- 199.9 %CPU
halt_poll_threshold=40000  -- 1155.4 ns/ctxsw -- 199.3 %CPU
halt_poll_threshold=50000  -- 1161.0 ns/ctxsw -- 200.0 %CPU
halt_poll_threshold=100000 -- 1163.8 ns/ctxsw -- 200.4 %CPU
halt_poll_threshold=200000 -- 1163.8 ns/ctxsw -- 201.4 %CPU
halt_poll_threshold=300000 -- 1159.4 ns/ctxsw -- 201.9 %CPU
halt_poll_threshold=500000 -- 1163.5 ns/ctxsw -- 205.5 %CPU
Here is the data we got when running netperf (higher throughput is better):
halt_poll_threshold=0      -- 29031.6 bit/s -- 76.1 %CPU
halt_poll_threshold=10000  -- 29021.7 bit/s -- 105.1 %CPU
halt_poll_threshold=20000  -- 33463.5 bit/s -- 128.2 %CPU
halt_poll_threshold=30000  -- 34436.4 bit/s -- 127.8 %CPU
halt_poll_threshold=40000  -- 35563.3 bit/s -- 129.6 %CPU
halt_poll_threshold=50000  -- 35787.7 bit/s -- 129.4 %CPU
halt_poll_threshold=100000 -- 35477.7 bit/s -- 130.0 %CPU
halt_poll_threshold=200000 -- 35877.7 bit/s -- 131.0 %CPU
halt_poll_threshold=300000 -- 35730.0 bit/s -- 132.4 %CPU
halt_poll_threshold=500000 -- 34978.4 bit/s -- 134.2 %CPU
Considering the default value (200000, for x86) of KVM dynamic poll,
I'll use the same value as the default here.

I also tested an idle VM with different halt_poll_threshold values,
which didn't make CPU utilization fluctuate..
>> could I keep only the paravirt_poll_threshold_ns parameter (the maximum poll
>> time), which is similar to the "adaptive halt-polling" Wanpeng mentioned..
>> then users can turn it off, or find an appropriate threshold for some odd
>> scenario..
> That way lies utter madness. Maybe add it as a debugfs knob, but exposing it to
> userspace: NAK.
>
.. so, in the next v4 I will hard-code these 3 parameters to the
following defaults:
    paravirt_poll_threshold_ns = 200000
    paravirt_poll_shrink = 2
    paravirt_poll_grow = 2
and neither touch the syscall ABI nor expose them to userspace again.
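In kernel terms, a sketch of what that hard-coding could look like
(illustrative only; the names follow the v3 patch, the actual v4 code
may differ):

    /* v4 plan (sketch): fixed defaults, no sysctl exposure. */
    #define PARAVIRT_POLL_THRESHOLD_NS  200000  /* same as KVM dynamic poll */
    #define PARAVIRT_POLL_GROW          2
    #define PARAVIRT_POLL_SHRINK        2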
Quan