linux-kernel - Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

Message-ID: <50615EE4.1040809@linux.vnet.ibm.com>
Date:	Tue, 25 Sep 2012 13:06:04 +0530
From:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To:	Avi Kivity <avi@...hat.com>
CC:	Rik van Riel <riel@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Srikar <srikar@...ux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
	chegu vinod <chegu_vinod@...com>,
	"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>,
	Gleb Natapov <gleb@...hat.com>
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE
 handler

On 09/24/2012 09:11 PM, Avi Kivity wrote:
> On 09/21/2012 08:24 PM, Raghavendra K T wrote:
>> On 09/21/2012 06:32 PM, Rik van Riel wrote:
>>> On 09/21/2012 08:00 AM, Raghavendra K T wrote:
>>>> From: Raghavendra K T<raghavendra.kt@...ux.vnet.ibm.com>
>>>>
>>>> When total number of VCPUs of system is less than or equal to physical
>>>> CPUs, PLE exits become costly since each VCPU can have dedicated PCPU, and
>>>> trying to find a target VCPU to yield_to just burns time in PLE handler.
>>>>
>>>> This patch reduces overhead, by simply doing a return in such scenarios by
>>>> checking the length of current cpu runqueue.
>>>
>>> I am not convinced this is the way to go.
>>>
>>> The VCPU that is holding the lock, and is not releasing it,
>>> probably got scheduled out. That implies that VCPU is on a
>>> runqueue with at least one other task.
>>
>> I see your point here, we have two cases:
>>
>> case 1)
>>
>> rq1 : vcpu1->wait(lockA) (spinning)
>> rq2 : vcpu2->holding(lockA) (running)
>>
>> Here, ideally vcpu1 should not enter the PLE handler, since it would surely
>> get the lock within ple_window cycles (assuming ple_window is tuned
>> perfectly for that workload).
>>
>> Maybe this explains why we are not seeing a benefit with kernbench.
>>
>> On the other hand, since we cannot have a ple_window tuned perfectly for
>> all types of workloads, we gain for those workloads that may need more
>> than 4096 cycles. I wonder whether that is what we are seeing in the
>> cases that benefited?
>
> Maybe we need to increase the ple window regardless.  4096 cycles is 2
> microseconds or less (call it t_spin).  The overhead from
> kvm_vcpu_on_spin() and the associated task switches is at least a few
> microseconds, increasing as contention is added (call it t_yield).  The
> time for a natural context switch is several milliseconds (call it
> t_slice).  There is also the time the lock holder owns the lock,
> assuming no contention (t_hold).
>
> If t_yield > t_spin, then in the undercommitted case it dominates
> t_spin.  If t_hold > t_spin we lose badly.
>
> If t_spin > t_yield, then the undercommitted case doesn't suffer as much
> as most of the spinning happens in the guest instead of the host, so it
> can pick up the unlock timely.  We don't lose too much in the
> overcommitted case provided the values aren't too far apart (say a
> factor of 3).
>
> Obviously t_spin must be significantly smaller than t_slice, otherwise
> it accomplishes nothing.
>
> Regarding t_hold: if it is small, then a larger t_spin helps avoid false
> exits.  If it is large, then we're not very sensitive to t_spin.  It
> doesn't matter if it takes us 2 usec or 20 usec to yield, if we end up
> yielding for several milliseconds.
>
> So I think it's worth trying again with ple_window of 20000-40000.
>
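
To put rough numbers on the windows discussed above, here is a small sketch
of the cycles-to-time arithmetic. The 2 GHz clock rate, the 5 usec t_yield
used in the usage note, and the helper names are assumptions for
illustration only; none of them are taken from the KVM code.

```c
#include <assert.h>

/* Convert a PLE window given in cycles to microseconds.
 * cpu_mhz is an assumed clock rate: cycles / MHz = microseconds. */
static double ple_window_usec(unsigned long cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}

/* Nonzero when t_spin exceeds t_yield, i.e. the regime in which most of
 * the spinning stays in the guest rather than the host.
 * t_yield_usec is a hypothetical measured value, not a known constant. */
static int spin_exceeds_yield(double t_spin_usec, double t_yield_usec)
{
    return t_spin_usec > t_yield_usec;
}
```

At the assumed 2 GHz, ple_window_usec(4096, 2000.0) is about 2 usec, while
20000-40000 cycles comes to 10-20 usec. With a hypothetical t_yield of
5 usec, only the larger windows fall in the t_spin > t_yield regime.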

Agreed that spinning is not costly, and I have tried increasing
ple_window earlier. I'll give it one more shot.

I was thinking that unnecessary spinning of vcpus (spinning when the
lock holder is preempted) adds up to significant degradation, and that
this is especially problematic in the ticketlock scenario, no?
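
To illustrate the ticketlock concern, here is a toy userspace sketch of a
ticket lock using C11 atomics. It is not the kernel's implementation; the
struct and function names are hypothetical. The point is only that tickets
are served strictly in FIFO order, so a waiter cannot proceed until every
earlier ticket holder has taken and released the lock, even if one of
those earlier vcpus is currently preempted.

```c
#include <stdatomic.h>

/* Toy ticket lock (illustrative only, not the kernel's layout). */
struct ticket_lock {
    atomic_uint next;   /* next ticket number to hand out */
    atomic_uint owner;  /* ticket number currently allowed to hold the lock */
};

/* Take a ticket; the caller must wait until it is this ticket's turn. */
static unsigned ticket_take(struct ticket_lock *l)
{
    return atomic_fetch_add(&l->next, 1);
}

/* Nonzero when the given ticket is the one being served. */
static int ticket_is_my_turn(struct ticket_lock *l, unsigned ticket)
{
    return atomic_load(&l->owner) == ticket;
}

/* Release the lock by passing ownership to the next ticket in line. */
static void ticket_unlock(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}
```

If three vcpus take tickets 0, 1 and 2, and the vcpu holding ticket 1 is
preempted after ticket 0 unlocks, the vcpu with ticket 2 keeps spinning
even though the lock itself is free.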
