Message-ID: <5045EC91.9050406@linux.vnet.ibm.com>
Date: Tue, 04 Sep 2012 17:27:05 +0530
From: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To: Rik van Riel <riel@...hat.com>, Gleb Natapov <gleb@...hat.com>
CC: Avi Kivity <avi@...hat.com>, Marcelo Tosatti <mtosatti@...hat.com>,
Srikar <srikar@...ux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
KVM <kvm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>
Subject: Re: [PATCH RFC 1/1] kvm: Use vcpu_id as pivot instead of last boosted
vcpu in PLE handler
On 09/02/2012 09:59 PM, Rik van Riel wrote:
> On 09/02/2012 06:12 AM, Gleb Natapov wrote:
>> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>>> The idea of starting from the next vcpu (source of yield_to + 1) seems
>>> to work well for an overcommitted guest, rather than using the last
>>> boosted vcpu. We can also remove the per-VM variable with this approach.
>>>
>>> After this patch, the iteration for an eligible candidate starts from
>>> vcpu source+1 and ends at source-1 (after wrapping).
>>>
>>> Thanks to Nikunj for his quick verification of the patch.
>>>
>>> Please let me know if this patch is interesting and makes sense.
>>>
>> This last_boosted_vcpu thing caused us trouble during an attempt to
>> implement vcpu destruction. From that POV, it is good to see it removed.
>
> I like this implementation. It should achieve pretty much
> the same as my old code, but without the downsides and without
> having to keep the same amount of global state.
>
My theoretical understanding of how it would help is this: suppose that,
in some window between T0 and T1, 4 vcpus (v1..v4) out of 32/64 vcpus
simultaneously enter the directed yield handler. If last_boosted_vcpu = i,
then v1..v4 will all start their search from i, and there may be some
unnecessary attempts at directed yields. We may not see such attempts with
the above patch. But again I agree that the whole directed_yield logic is
quite complicated, because each vcpu can be in a different state
(running / pause-loop exited while spinning / eligible) and because of
how the vcpus are located w.r.t. each other.
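To make the iteration order concrete, here is a minimal sketch (not the
actual patch; nr_vcpus, src and the try_yield_to() helper are stand-ins
rather than real KVM code) of the candidate scan the vcpuid approach
implies: each yielding vcpu starts at its own index + 1 and wraps around,
ending at index - 1.

/* Hypothetical eligibility check + yield; stands in for the real logic. */
static int try_yield_to(int candidate)
{
	return 0;	/* pretend the yield did not succeed */
}

static void directed_yield_scan(int nr_vcpus, int src)
{
	int i;

	for (i = 1; i < nr_vcpus; i++) {
		/* candidates visited: src+1, src+2, ..., src-1 (mod nr_vcpus) */
		int candidate = (src + i) % nr_vcpus;

		if (try_yield_to(candidate))
			break;
	}
}

With a shared last_boosted_vcpu pivot, v1..v4 would all compute the same
first candidate; with src+1 each yielder begins at a different point in
the ring, which is where the reduced contention should come from.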
Here are the results I got for ebizzy, with a 32-vcpu guest on a 32-core
PLE machine, for 1x, 2x and 3x overcommit.

base    = 3.5-rc5 kernel with the PLE handler improvement patches applied
patched = base + vcpuid patch
        base        stdev     patched     stdev     %improvement
1x      1955.6250    39.8961  1863.3750    37.8302      -4.71716
2x      2475.3750   165.0307  3078.8750   341.9500      24.38014
3x      2071.5556    91.5370  2112.6667    56.6171       1.98455
Note:
I have to admit that I am seeing very inconsistent results while
experimenting with the 3.6-rc kernel (not specific to the vcpuid patch,
but as a whole). I am not sure whether something is wrong in my config or
whether I should spend some time debugging. Has anybody observed the same?