Date:	Thu, 27 Sep 2012 15:51:37 +0530
From:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To:	Andrew Jones <drjones@...hat.com>
CC:	Peter Zijlstra <peterz@...radead.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Ingo Molnar <mingo@...hat.com>, Avi Kivity <avi@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Srikar <srikar@...ux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
	chegu vinod <chegu_vinod@...com>,
	"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>,
	Gleb Natapov <gleb@...hat.com>
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios
 in PLE handler

On 09/26/2012 06:27 PM, Andrew Jones wrote:
> On Mon, Sep 24, 2012 at 02:36:05PM +0200, Peter Zijlstra wrote:
>> On Mon, 2012-09-24 at 17:22 +0530, Raghavendra K T wrote:
>>> On 09/24/2012 05:04 PM, Peter Zijlstra wrote:
>>>> On Fri, 2012-09-21 at 17:29 +0530, Raghavendra K T wrote:
>>>>> In some special scenarios like #vcpu <= #pcpu, the PLE handler may
>>>>> prove very costly, because there is no need to iterate over vcpus
>>>>> and do unsuccessful yield_to() calls, burning CPU.
>>>>
>>>> What's the costly thing? The vm-exit, the yield (which should be a nop
>>>> if it's the only task there) or something else entirely?
>>>>
>>> Both the vmexit and yield_to(), actually,
>>>
>>> because an unsuccessful yield_to() is costly overall in the PLE handler.
>>>
>>> With large guests, say 32/16 vcpus, when one vcpu is holding a lock and
>>> the rest of the vcpus are waiting for it, each vcpu that takes a PLE exit
>>> iterates over the rest of the vcpu list in the VM and attempts a directed
>>> yield, unsuccessfully (O(n^2) attempts).
>>>
>>> This results in a fairly high amount of CPU burning and double run-queue
>>> lock contention.
>>>
>>> (If they had kept spinning, lock progress would probably have been faster.)
>>> As Avi/Chegu Vinod felt, it would be better to avoid the vmexit itself,
>>> but that seems a little complex to achieve currently.
>>
>> OK, so the vmexit stays and we need to improve yield_to.
>
> Can't we do this check sooner as well, as it only requires per-cpu data?
> If we do it way back in kvm_vcpu_on_spin, then we avoid get_pid_task()
> and a bunch of read barriers from kvm_for_each_vcpu. Also, moving the test
> into kvm code would allow us to do other kvm things as a result of the
> check in order to avoid some vmexits. It looks like we should be able to
> avoid some without much complexity by just making a per-vm ple_window
> variable, and then, when we hit the nr_running == 1 condition, also doing
> vmcs_write32(PLE_WINDOW, (kvm->ple_window += PLE_WINDOW_BUMP))
> and resetting the window to the default value when we successfully yield
> (and maybe we should limit the number of bumps).
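
To make the cost we are discussing concrete, here is roughly what each
PLE-exiting vcpu runs today (a condensed sketch of kvm_vcpu_on_spin(), not
verbatim kernel code; the real function does a two-pass round-robin starting
from kvm->last_boosted_vcpu and applies extra eligibility checks):

/* Condensed sketch of the current PLE handler path, not verbatim code. */
void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	struct kvm_vcpu *vcpu;
	int i;

	/*
	 * Every vcpu that PLE-exits walks the whole vcpu list trying to
	 * boost a preempted lock holder.  With n spinning vcpus that is
	 * O(n^2) directed-yield attempts, and in the #vcpu <= #pcpu case
	 * they all fail because the other vcpus are already running, so
	 * each attempt only burns CPU and contends on two run-queue locks.
	 */
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (vcpu == me)
			continue;
		if (kvm_vcpu_yield_to(vcpu))	/* directed yield succeeded */
			break;
	}
}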

We did indeed check early in the original undercommit patch, and it gave
results close to the PLE-disabled case. But I agree with Peter that it is
ugly to export nr_running info to the PLE handler.
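
For reference, the check in that patch (A in the table below) was roughly of
the following shape -- not the exact patch; rq_nr_running_on_this_cpu() is an
illustrative stand-in for the scheduler information (rq->nr_running of the
current cpu) that Peter objects to exporting:

void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
	/*
	 * Undercommit short-circuit: if this pcpu has nothing else
	 * runnable, the other vcpus are most likely already running on
	 * their own pcpus, so every yield_to() below would fail anyway --
	 * skip the whole walk.
	 */
	if (rq_nr_running_on_this_cpu() == 1)	/* illustrative helper */
		return;

	/* ... existing directed-yield search over the vcpu list ... */
}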

Looking at the results and comparing A and C:
> Base = 3.6.0-rc5 + ple handler optimization patches
> A = Base + checking rq_running in vcpu_on_spin() patch
> B = Base + checking rq->nr_running in sched/core
> C = Base - PLE
>
>    % improvements w.r.t BASE
> ---+------------+------------+------------+
>    |      A     |    B       |     C      |
> ---+------------+------------+------------+
> 1x | 206.37603  |  139.70410 |  210.19323 |

I have a feeling that the vmexit itself has not caused significant overhead
compared to iterating over the vcpus in the PLE handler. Does that not sound
right?

But
> vmcs_write32(PLE_WINDOW, (kvm->ple_window += PLE_WINDOW_BUMP))

is worth trying. I will have to look into it eventually.
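
If I understood the idea right, it would be roughly something like this in
vmx.c (a rough sketch only: kvm->ple_window, PLE_WINDOW_BUMP and
PLE_WINDOW_MAX below are placeholders, not existing fields; ple_window is the
existing module parameter):

#define PLE_WINDOW_BUMP		(ple_window / 2)	/* growth step, illustrative */
#define PLE_WINDOW_MAX		(16 * ple_window)	/* limits the number of bumps */

/* Called from the PLE-exit path when the nr_running == 1 condition hits. */
static void grow_ple_window(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;

	kvm->ple_window += PLE_WINDOW_BUMP;
	if (kvm->ple_window > PLE_WINDOW_MAX)
		kvm->ple_window = PLE_WINDOW_MAX;
	vmcs_write32(PLE_WINDOW, kvm->ple_window);
}

/* Called when a directed yield succeeds, as you suggest. */
static void reset_ple_window(struct kvm_vcpu *vcpu)
{
	vcpu->kvm->ple_window = ple_window;
	vmcs_write32(PLE_WINDOW, ple_window);
}

One thing to keep in mind with a per-vm value is that vmcs_write32() only
updates the VMCS that is currently loaded, so the other vcpus of the VM would
pick up the new window lazily, the next time they run this path themselves.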

