linux-kernel - Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

Message-ID: <CALzav=c6HkSjYKVNYQ23kzhgi9hOW_Q8OzRsGE7cL0eOGg70Lw@mail.gmail.com>
Date:	Tue, 1 Sep 2015 15:34:59 -0700
From:	David Matlack <dmatlack@...gle.com>
To:	Wanpeng Li <wanpeng.li@...mail.com>
Cc:	Paolo Bonzini <pbonzini@...hat.com>,
	kvm list <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li <wanpeng.li@...mail.com> wrote:
> On 9/2/15 5:45 AM, David Matlack wrote:
>>
>> On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li <wanpeng.li@...mail.com> wrote:
>>>
>>> v3 -> v4:
>>>   * bring back growing vcpu->halt_poll_ns when an interrupt arrives and
>>>     shrinking it when an idle VCPU is detected
>>>
>>> v2 -> v3:
>>>   * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or
>>>     /halt_poll_ns_shrink
>>>   * drop the macros and hard-code the numbers in the param definitions
>>>   * update the comments "5-7 us"
>>>   * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns
>>>     time, vcpu->halt_poll_ns starts at zero
>>>   * drop the wrappers
>>>   * move the grow/shrink logic before "out:" w/ "if (waited)"
>>
>> I posted a patchset which adds dynamic poll toggling (on/off switch). I think
>> this gives you a good place to build your dynamic growth patch on top. The
>> toggling patch has close to zero overhead for idle VMs and performance
>> equivalent to always-poll for VMs doing message passing. It's a patch that's
>> been in my queue for a few weeks, but I just haven't had the time to send it
>> out. We can win even more with your patchset by only polling as much as we
>> need (via dynamic growth/shrink). It also gives us a better place to stand
>> for choosing a default for halt_poll_ns. (We can run experiments and see how
>> high vcpu->halt_poll_ns tends to grow.)
>>
>> The reason I posted a separate patch for toggling is because it adds timers
>> to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
>> called multiple times for one halt). To do dynamic poll adjustment correctly,
>> we have to time the length of each halt. Otherwise we hit some bad edge cases:
>>
>>    v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew
>>    every time we had a long halt. So idle VMs looked like: 0 us -> 500 us ->
>>    1 ms -> 2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay
>>    at 0 when the halts are long.
>>
>>    v4: v4 fixed the idle overhead problem but broke dynamic growth for
>>    message passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns
>>    would grow. That means vcpu->halt_poll_ns will always be maxed out, even
>>    when the halt time is much less than the max.
>>
>> I think we can fix both edge cases if we make grow/shrink decisions based on
>> the length of kvm_vcpu_block rather than the arrival of a guest interrupt
>> during polling.
>>
>> Some thoughts for dynamic growth:
>>    * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
>>      less than 1ms. 200 us has been a good value for always-poll. We can
>>      probably go a bit higher once we have your patch. Maybe 500 us?
>>
>>    * The base case of dynamic growth (the first grow() after being at 0)
>>      should be small. 500 us is too big. When I run TCP_RR in my guest I see
>>      poll times of < 10 us. TCP_RR is on the lower-end of message passing
>>      workload latency, so 10 us would be a good base case.
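
To make the proposal above concrete, here is a minimal sketch of a policy
that decides grow/shrink from the measured halt length rather than from
interrupt arrival. It is not code from either patchset: next_poll_ns(),
POLL_BASE_NS, POLL_MAX_NS and POLL_GROW are hypothetical names, the 10 us
and 500 us values are the ones floated above, and the doubling factor is
an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tunables, using the numbers discussed in this thread. */
#define POLL_BASE_NS 10000ULL   /* 10 us: first step after being at 0 */
#define POLL_MAX_NS  500000ULL  /* 500 us: below a 1 ms guest timer tick */
#define POLL_GROW    2ULL       /* multiplicative growth factor (assumed) */

/*
 * Return the next poll window, given the current window and the measured
 * length of the halt that just finished (both in nanoseconds).
 */
static uint64_t next_poll_ns(uint64_t cur_ns, uint64_t block_ns)
{
	assert(cur_ns <= POLL_MAX_NS);

	/* Long halt: the vCPU is idle, so polling would be pure overhead. */
	if (block_ns > POLL_MAX_NS)
		return 0;

	/* The wakeup arrived inside the current window: it is big enough. */
	if (block_ns <= cur_ns)
		return cur_ns;

	/* Short halt that outlasted the window: grow, starting small. */
	if (cur_ns == 0)
		return POLL_BASE_NS;
	cur_ns *= POLL_GROW;
	return cur_ns > POLL_MAX_NS ? POLL_MAX_NS : cur_ns;
}
```

With this shape, an idle vCPU (halts longer than the max) stays at 0, which
avoids the v3 problem, and a vCPU whose halts already fit inside the window
stops growing, which avoids the v4 problem.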
>
>
> How can I get your TCP_RR benchmark?
>
> Regards,
> Wanpeng Li

Install the netperf package, or build from here:
http://www.netperf.org/netperf/DownloadNetperf.html

In the VM:

# ./netserver
# ./netperf -t TCP_RR

Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
passing workload in order to test halt-polling).

>
>
>>> v1 -> v2:
>>>   * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
>>>     the module parameter
>>>   * use the shrink/grow matrix suggested by David
>>>   * set halt_poll_ns_max to 2ms
>>>
>>> A downside of halt_poll_ns is that polling still happens for idle VCPUs,
>>> which wastes CPU time. This patchset adds the ability to adjust
>>> halt_poll_ns dynamically: it grows halt_poll_ns if an interrupt arrives and
>>> shrinks halt_poll_ns when an idle VCPU is detected.
>>>
>>> There are two new kernel parameters for changing the halt_poll_ns:
>>> halt_poll_ns_grow and halt_poll_ns_shrink.
>>>
>>>
>>> Tested with a high CPU overcommit ratio and pinned vCPUs. The halt_poll_ns
>>> for always halt-poll is the default 500000 ns, and the max halt_poll_ns for
>>> dynamic halt-poll is 2 ms. Then watch the %C0 in the dump from the Powertop
>>> tool. The test method is mostly taken from David.
>>>
>>> +-----------------+----------------+-------------------+
>>> |                 |                |                   |
>>> |  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
>>> +-----------------+----------------+-------------------+
>>> |                 |                |                   |
>>> |    ~0.9%        |    ~1.8%       |     ~1.2%         |
>>> +-----------------+----------------+-------------------+
>>>
>>> Always halt-poll increases CPU usage for idle vCPUs by ~0.9% (from ~0.9% to
>>> ~1.8%), and dynamic halt-poll drops that increase to ~0.3% (~1.2%), which
>>> means it removes about 67% of the overhead introduced by always halt-poll.
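
For anyone re-running the measurement, the 67% figure follows from the three
%C0 values in the table. A small sketch of the arithmetic (the function name
overhead_reduction_pct() is made up for illustration):

```c
#include <assert.h>

/*
 * Percentage of the always-poll overhead that dynamic polling removes.
 * All three arguments are CPU usage in percent (e.g. %C0 from Powertop).
 */
static double overhead_reduction_pct(double base, double always, double dynamic)
{
	double always_overhead = always - base;   /* e.g. 1.8 - 0.9 */
	double dynamic_overhead = dynamic - base; /* e.g. 1.2 - 0.9 */

	assert(always_overhead > 0.0); /* otherwise there is nothing to reduce */
	return 100.0 * (always_overhead - dynamic_overhead) / always_overhead;
}
```

Calling overhead_reduction_pct(0.9, 1.8, 1.2) gives roughly 66.7.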
>>>
>>> Wanpeng Li (3):
>>>    KVM: make halt_poll_ns per-VCPU
>>>    KVM: dynamic halt_poll_ns adjustment
>>>    KVM: trace kvm_halt_poll_ns grow/shrink
>>>
>>>   include/linux/kvm_host.h   |  1 +
>>>   include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++
>>>   virt/kvm/kvm_main.c        | 50 +++++++++++++++++++++++++++++++++++++++++++---
>>>   3 files changed, 78 insertions(+), 3 deletions(-)
>>> --
>>> 1.9.1
>>>
>