Date:   Wed, 15 Nov 2017 11:15:30 +0800
From:   "quan.xu04@...il.com" <quan.xu04@...il.com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        kvm <kvm@...r.kernel.org>, Quan Xu <quan.xu0@...il.com>
Subject: Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart
 idle poll



On 2017-11-14 15:44, Ingo Molnar wrote:
> * Quan Xu <quan.xu0@...il.com> wrote:
>
>>
>> On 2017/11/13 23:08, Ingo Molnar wrote:
>>> * Quan Xu <quan.xu04@...il.com> wrote:
>>>
>>>> From: Quan Xu <quan.xu0@...il.com>
>>>>
>>>> To reduce the cost of poll, we introduce three sysctl to control the
>>>> poll time when running as a virtual machine with paravirt.
>>>>
>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@...il.com>
>>>> Signed-off-by: Quan Xu <quan.xu0@...il.com>
>>>> ---
>>>>    Documentation/sysctl/kernel.txt |   35 +++++++++++++++++++++++++++++++++++
>>>>    arch/x86/kernel/paravirt.c      |    4 ++++
>>>>    include/linux/kernel.h          |    6 ++++++
>>>>    kernel/sysctl.c                 |   34 ++++++++++++++++++++++++++++++++++
>>>>    4 files changed, 79 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
>>>> index 694968c..30c25fb 100644
>>>> --- a/Documentation/sysctl/kernel.txt
>>>> +++ b/Documentation/sysctl/kernel.txt
>>>> @@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this one.
>>>>    ==============================================================
>>>> +paravirt_poll_grow: (X86 only)
>>>> +
>>>> +Multiplied value to increase the poll time. This is expected to take
>>>> +effect only when running as a virtual machine with CONFIG_PARAVIRT
>>>> +enabled. This can't bring any benefit on bare metal even with
>>>> +CONFIG_PARAVIRT enabled.
>>>> +
>>>> +By default this value is 2. Possible values to set are in range {2..16}.
>>>> +
>>>> +==============================================================
>>>> +
>>>> +paravirt_poll_shrink: (X86 only)
>>>> +
>>>> +Divided value to reduce the poll time. This is expected to take effect
>>>> +only when running as a virtual machine with CONFIG_PARAVIRT enabled.
>>>> +This can't bring any benefit on bare metal even with CONFIG_PARAVIRT
>>>> +enabled.
>>>> +
>>>> +By default this value is 2. Possible values to set are in range {2..16}.
>>>> +
>>>> +==============================================================
>>>> +
>>>> +paravirt_poll_threshold_ns: (X86 only)
>>>> +
>>>> +Controls the maximum poll time before entering real idle path. This is
>>>> +expected to take effect only when running as a virtual machine with
>>>> +CONFIG_PARAVIRT enabled. This can't bring any benefit on bare metal
>>>> +even with CONFIG_PARAVIRT enabled.
>>>> +
>>>> +By default, this value is 0 means not to poll. Possible values to set
>>>> +are in range {0..500000}. Change the value to non-zero if running
>>>> +latency-bound workloads in a virtual machine.
>>> I absolutely hate it how this hybrid idle loop polling mechanism is not
>>> self-tuning!
>> Ingo, actually it is self-tuning..
> Then why the hell does it touch the syscall ABI?


Just to collect more data about performance and CPU utilization with
different maximum poll times.

There are 3 parameters, paravirt_poll_{grow|shrink|threshold_ns}.
We have left paravirt_poll_{grow|shrink} untouched since we sent out v1.

We tested it with the contextswitch and netperf benchmarks, using
different values of paravirt_poll_threshold_ns.

Here is the data from the contextswitch benchmark, which measures
latency (lower is better):
       halt_poll_threshold=0      -- 3402.9 ns/ctxsw -- 199.8 %CPU
       halt_poll_threshold=10000  -- 1151.4 ns/ctxsw -- 200.1 %CPU
       halt_poll_threshold=20000  -- 1149.7 ns/ctxsw -- 199.9 %CPU
       halt_poll_threshold=30000  -- 1151.0 ns/ctxsw -- 199.9 %CPU
       halt_poll_threshold=40000  -- 1155.4 ns/ctxsw -- 199.3 %CPU
       halt_poll_threshold=50000  -- 1161.0 ns/ctxsw -- 200.0 %CPU
       halt_poll_threshold=100000 -- 1163.8 ns/ctxsw -- 200.4 %CPU
       halt_poll_threshold=200000 -- 1163.8 ns/ctxsw -- 201.4 %CPU
       halt_poll_threshold=300000 -- 1159.4 ns/ctxsw -- 201.9 %CPU
       halt_poll_threshold=500000 -- 1163.5 ns/ctxsw -- 205.5 %CPU
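
To make the size of the improvement explicit, the table can be reduced to a percentage relative to the no-poll baseline. The following is only a minimal sketch over the numbers quoted above; the names `ctxsw_ns` and `latency_reduction_pct` are illustrative and do not come from the patch or the benchmark.

```python
# Context-switch latency (ns/ctxsw) per halt_poll_threshold, copied from the table above.
ctxsw_ns = {
    0: 3402.9,
    10000: 1151.4,
    20000: 1149.7,
    30000: 1151.0,
    40000: 1155.4,
    50000: 1161.0,
    100000: 1163.8,
    200000: 1163.8,
    300000: 1159.4,
    500000: 1163.5,
}


def latency_reduction_pct(baseline_ns, measured_ns):
    """Percentage by which measured_ns is lower than baseline_ns."""
    return (baseline_ns - measured_ns) / baseline_ns * 100.0


baseline = ctxsw_ns[0]  # threshold 0 means polling is disabled
for threshold, latency in ctxsw_ns.items():
    if threshold == 0:
        continue
    pct = latency_reduction_pct(baseline, latency)
    print(f"threshold={threshold:>6}: {pct:.1f}% lower latency than no polling")
```

Every non-zero threshold in the table lands at roughly 66% lower latency than the baseline, so for this benchmark the exact threshold matters far less than whether polling is enabled at all.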


Here is the data from the netperf benchmark:
       halt_poll_threshold=0      -- 29031.6 bit/s -- 76.1  %CPU
       halt_poll_threshold=10000  -- 29021.7 bit/s -- 105.1 %CPU
       halt_poll_threshold=20000  -- 33463.5 bit/s -- 128.2 %CPU
       halt_poll_threshold=30000  -- 34436.4 bit/s -- 127.8 %CPU
       halt_poll_threshold=40000  -- 35563.3 bit/s -- 129.6 %CPU
       halt_poll_threshold=50000  -- 35787.7 bit/s -- 129.4 %CPU
       halt_poll_threshold=100000 -- 35477.7 bit/s -- 130.0 %CPU
       halt_poll_threshold=200000 -- 35877.7 bit/s -- 131.0 %CPU
       halt_poll_threshold=300000 -- 35730.0 bit/s -- 132.4 %CPU
       halt_poll_threshold=500000 -- 34978.4 bit/s -- 134.2 %CPU
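
For netperf the trade-off is less one-sided, because throughput and CPU utilization both rise with the threshold. The sketch below, again using only the figures quoted above, puts the two changes side by side relative to the no-poll baseline; `netperf` and `relative_change_pct` are illustrative names, not part of the patch.

```python
# (throughput in bit/s, CPU utilization in %) per halt_poll_threshold, from the table above.
netperf = {
    0: (29031.6, 76.1),
    10000: (29021.7, 105.1),
    20000: (33463.5, 128.2),
    30000: (34436.4, 127.8),
    40000: (35563.3, 129.6),
    50000: (35787.7, 129.4),
    100000: (35477.7, 130.0),
    200000: (35877.7, 131.0),
    300000: (35730.0, 132.4),
    500000: (34978.4, 134.2),
}


def relative_change_pct(baseline, measured):
    """Signed percentage change of measured relative to baseline."""
    return (measured - baseline) / baseline * 100.0


base_tput, base_cpu = netperf[0]  # threshold 0 means polling is disabled
for threshold, (tput, cpu) in netperf.items():
    if threshold == 0:
        continue
    tput_gain = relative_change_pct(base_tput, tput)
    cpu_cost = relative_change_pct(base_cpu, cpu)
    print(f"threshold={threshold:>6}: throughput {tput_gain:+.1f}%, CPU {cpu_cost:+.1f}%")
```

At a threshold of 200000 the throughput gain is about 24% for about 72% more CPU, which is the cost side of the trade-off that the contextswitch numbers do not show.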


Considering the default value of KVM dynamic polling (200000 on x86),
I'll set the threshold to the same value.

I also tested an idle VM with different halt_poll_threshold values;
CPU utilization did not fluctuate.


>> could I only leave paravirt_poll_threshold_ns parameter (the maximum poll time),
>> which is as similar as "adaptive halt-polling" Wanpeng mentioned.. then user can
>> turn it off, or find an appropriate threshold for some odd scenario..
> That way lies utter madness. Maybe add it as a debugfs knob, but exposing it to
> userspace: NAK.
>
So, in v4 I will fix these 3 parameters at their default values:
      paravirt_poll_threshold_ns = 200000
      paravirt_poll_shrink = 2
      paravirt_poll_grow = 2

v4 will neither touch the syscall ABI nor expose the parameters to userspace.
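
For readers following the thread, here is a rough sketch of how three such parameters could drive a self-tuning poll window. It is not the patch's code, which does not appear in this message. It assumes a scheme similar to the adaptive halt-polling mentioned earlier: grow the window when the wakeup came shortly after polling ended, shrink it when the idle period was longer than the maximum. The function names, the conditions for growing and shrinking, and the starting value `POLL_START_NS` are all assumptions.

```python
# Defaults proposed for v4 in the message above.
POLL_THRESHOLD_NS = 200_000  # maximum poll time before entering the real idle path
POLL_GROW = 2                # multiplier used to increase the poll time
POLL_SHRINK = 2              # divisor used to reduce the poll time

# Assumption: first non-zero window when growing from 0 (not stated in the thread).
POLL_START_NS = 10_000


def grow_poll_ns(poll_ns):
    """Multiply the poll window, capped at the threshold."""
    if poll_ns == 0:
        return min(POLL_START_NS, POLL_THRESHOLD_NS)
    return min(poll_ns * POLL_GROW, POLL_THRESHOLD_NS)


def shrink_poll_ns(poll_ns):
    """Divide the poll window."""
    return poll_ns // POLL_SHRINK


def update_poll_ns(poll_ns, idle_ns):
    """Return the next poll window given how long the CPU actually stayed idle.

    Assumed policy (hypothetical, modeled on adaptive halt-polling):
    - wakeup arrived while polling: keep the window
    - wakeup arrived after polling but within the threshold: grow
    - idle period exceeded the threshold: shrink
    """
    if POLL_THRESHOLD_NS == 0:
        return 0  # a threshold of 0 means "do not poll"
    if idle_ns <= poll_ns:
        return poll_ns
    if idle_ns <= POLL_THRESHOLD_NS:
        return grow_poll_ns(poll_ns)
    return shrink_poll_ns(poll_ns)


# Example: replay a short, made-up sequence of idle periods (ns).
poll = 0
for idle in (5_000, 50_000, 50_000, 1_000_000):
    poll = update_poll_ns(poll, idle)
    print(f"idle={idle:>9} ns -> next poll window {poll} ns")
```

With the parameters fixed at compile-time defaults like this, the window still adapts at run time, which is the sense in which the mechanism can be self-tuning without any userspace knob.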


Quan
