linux-kernel - Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

Message-ID: <BLU436-SMTP27C779DB55C4A295C6085680690@phx.gbl>
Date:	Wed, 2 Sep 2015 08:29:46 +0800
From:	Wanpeng Li <wanpeng.li@...mail.com>
To:	David Matlack <dmatlack@...gle.com>
CC:	Paolo Bonzini <pbonzini@...hat.com>,
	kvm list <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Kieser <peter@...ser.ca>
Subject: Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

On 9/2/15 7:24 AM, David Matlack wrote:
> On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li <wanpeng.li@...mail.com> wrote:
>> On 9/2/15 6:34 AM, David Matlack wrote:
>>> On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li <wanpeng.li@...mail.com> wrote:
>>>> On 9/2/15 5:45 AM, David Matlack wrote:
>>>>> On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li <wanpeng.li@...mail.com> wrote:
>>>>>> v3 -> v4:
>>>>>>     * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
>>>>>>       when idle VCPU is detected
>>>>>>
>>>>>> v2 -> v3:
>>>>>>     * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
>>>>>>     * drop the macros and hard coding the numbers in the param definitions
>>>>>>     * update the comments "5-7 us"
>>>>>>     * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
>>>>>>       vcpu->halt_poll_ns start at zero
>>>>>>     * drop the wrappers
>>>>>>     * move the grow/shrink logic before "out:" w/ "if (waited)"
>>>>> I posted a patchset which adds dynamic poll toggling (on/off switch). I think
>>>>> this gives you a good place to build your dynamic growth patch on top. The
>>>>> toggling patch has close to zero overhead for idle VMs and equivalent
>>>>> performance VMs doing message passing as always-poll. It's a patch that's been
>>>>> in my queue for a few weeks but just haven't had the time to send out. We can
>>>>> win even more with your patchset by only polling as much as we need (via
>>>>> dynamic growth/shrink). It also gives us a better place to stand for choosing
>>>>> a default for halt_poll_ns. (We can run experiments and see how high
>>>>> vcpu->halt_poll_ns tends to grow.)
>>>>>
>>>>> The reason I posted a separate patch for toggling is because it adds timers
>>>>> to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
>>>>> called multiple times for one halt). To do dynamic poll adjustment
>>
>> Why can this happen?
> Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
> of TSC deadline timer interrupt). I don't think the edge case exists in
> the latest kernel.

Yeah, I hope all of us (including Peter Kieser) can test against the latest
kvm tree to avoid confusion. The reason to introduce the adaptive
halt-polling toggle is to handle the "edge case" you mentioned above,
so I think we should put more effort into improving v4 instead. I will
improve v4 to handle short halts today. ;-)

>
>>
>>>>> correctly, we have to time the length of each halt. Otherwise we hit some
>>>>> bad edge cases:
>>>>>
>>>>>      v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
>>>>>      time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
>>>>>      2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
>>>>>      the halts are long.
>>>>>
>>>>>      v4: v4 fixed the idle overhead problem but broke dynamic growth for message
>>>>>      passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
>>>>>      That means vcpu->halt_poll_ns will always be maxed out, even when the halt
>>>>>      time is much less than the max.
>>>>>
>>>>> I think we can fix both edge cases if we make grow/shrink decisions based on
>>>>> the length of kvm_vcpu_block rather than the arrival of a guest interrupt
>>>>> during polling.
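
For illustration only, the length-based decision described in the quoted
paragraph could be sketched roughly as below. This is not code from either
patchset: `poll_action`, `decide_poll_action` and its parameters are
hypothetical names, and the exact thresholds (compare against the current
window and against the maximum) are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not taken from either patchset. */
enum poll_action { POLL_KEEP, POLL_GROW, POLL_SHRINK };

/*
 * Decide how to adjust the per-vCPU poll window from the measured
 * length of one block (halt), rather than from interrupt arrival.
 *
 *   block_ns: how long the vCPU stayed blocked, in ns
 *   poll_ns:  current poll window (vcpu->halt_poll_ns in the thread)
 *   max_ns:   upper bound on the poll window (halt_poll_ns in the thread)
 */
static enum poll_action decide_poll_action(uint64_t block_ns,
                                           uint64_t poll_ns,
                                           uint64_t max_ns)
{
    assert(poll_ns <= max_ns);

    if (block_ns > max_ns)
        return POLL_SHRINK;   /* long halt: even a maxed-out poll would miss it */
    if (block_ns > poll_ns)
        return POLL_GROW;     /* short halt that the current window missed */
    return POLL_KEEP;         /* wakeup arrived inside the current window */
}
```

Under these assumptions an idle vCPU (long halts) never grows away from 0,
which is the v3 problem, and a message-passing vCPU stops growing once its
window covers the observed halt length, which is the v4 problem.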
>>>>>
>>>>> Some thoughts for dynamic growth:
>>>>>      * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
>>>>>        less than 1ms. 200 us has been a good value for always-poll. We can
>>>>>        probably go a bit higher once we have your patch. Maybe 500 us?
>>
>> Did you test your patch against a windows guest?
> I have not. I tested against a 250HZ linux guest to check how it performs
> against a ticking guest. Presumably, windows should be the same, but at a
> higher tick rate. Do you have a test for Windows?

I only tested idle vCPU usage.


V4 for Windows 10 guest:

+-----------------+----------------+------------------------+
|                 |                |                        |
|  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll  |
+-----------------+----------------+------------------------+
|                 |                |                        |
|    ~2.1%        |    ~3.0%       |     ~2.4%              |
+-----------------+----------------+------------------------+

V4 for Linux guest:

+-----------------+----------------+-------------------+
|                 |                |                   |
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|                 |                |                   |
|    ~0.9%        |    ~1.8%       |     ~1.2%         |
+-----------------+----------------+-------------------+


Regards,
Wanpeng Li

>
>>>>>      * The base case of dynamic growth (the first grow() after being at 0) should
>>>>>        be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
>>>>>        of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
>>>>>        so 10 us would be a good base case.
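
As a rough sketch of the growth rule suggested in the quoted list (a small
base case, then multiplicative growth up to a cap below the guest tick),
something like the following would do. The function name `grow_poll_ns` and
the two macros are hypothetical; the 10 us base and 500 us cap are just the
values floated in this thread, not defaults from any patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values from the discussion; not defaults from any patch. */
#define POLL_BASE_NS  10000ULL    /* 10 us base case */
#define POLL_MAX_NS   500000ULL   /* 500 us cap, below a 1 ms guest tick */

/*
 * Grow the poll window: 0 jumps to the small base case, anything
 * else is multiplied by `factor` and clamped to the cap.
 */
static uint64_t grow_poll_ns(uint64_t cur_ns, unsigned int factor)
{
    uint64_t next;

    assert(factor >= 1);

    if (cur_ns == 0)
        return POLL_BASE_NS;

    next = cur_ns * factor;
    return next > POLL_MAX_NS ? POLL_MAX_NS : next;
}
```

With a factor of 2 this would go 0 -> 10 us -> 20 us -> 40 us -> 80 us ->
160 us -> 320 us -> 500 us, so a vCPU that needs the full window reaches
the cap after seven grow steps.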
>>>>
>>>> How can I get your TCP_RR benchmark?
>>>>
>>>> Regards,
>>>> Wanpeng Li
>>> Install the netperf package, or build from here:
>>> http://www.netperf.org/netperf/DownloadNetperf.html
>>>
>>> In the vm:
>>>
>>> # ./netserver
>>> # ./netperf -t TCP_RR
>>>
>>> Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
>>> passing workload in order to test halt-polling).
>>
>> Ah, ok, I use the same benchmark as yours.
>>
>> Regards,
>> Wanpeng Li
>>
>>
