Message-Id: <20120619202047.26191.40429.sendpatchset@codeblue>
Date: Wed, 20 Jun 2012 01:50:50 +0530
From: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To: Avi Kivity <avi@...hat.com>, Marcelo Tosatti <mtosatti@...hat.com>,
Rik van Riel <riel@...hat.com>
Cc: Srikar <srikar@...ux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
KVM <kvm@...r.kernel.org>,
Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...hat.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Regarding improving ple handler (vcpu_on_spin)
In the PLE handler code, the last_boosted_vcpu (lbv) variable serves
as the reference point for where to start when we enter the handler:
    lbv = kvm->lbv;
    for each vcpu i of kvm
        if i is eligible
            if yield_to(i) is success
                lbv = i
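
As a compilable user-space model of that logic (a sketch only, not the
kernel code; NR_VCPUS, vcpu_eligible() and try_yield_to() are stand-ins
for the real vcpu count, the eligibility check and the yield_to() call):

    #include <stdbool.h>

    #define NR_VCPUS 32

    struct vm {
        int last_boosted_vcpu;          /* the per-VM reference point (lbv) */
    };

    /* stand-ins for the real eligibility check and the yield_to() call */
    bool vcpu_eligible(struct vm *vm, int i);
    bool try_yield_to(struct vm *vm, int i);

    void ple_handler(struct vm *vm, int me)
    {
        int lbv = vm->last_boosted_vcpu;
        int i;

        /* walk all vcpus once, starting just after lbv and wrapping around */
        for (i = 1; i <= NR_VCPUS; i++) {
            int target = (lbv + i) % NR_VCPUS;

            if (target == me || !vcpu_eligible(vm, target))
                continue;
            if (try_yield_to(vm, target)) {
                /* lbv is published only after a successful yield_to */
                vm->last_boosted_vcpu = target;
                break;
            }
        }
    }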
Currently this variable is per VM, and it is set only after a
successful yield_to(target). Unfortunately, after a successful yield it
can take longer than we expect to come back (depending on the yielding
task's lag in the rb-tree) and store the new value. So when several
PLE handler entries happen before it is updated, all of them start from
the same place (and the overall round robin is also slower).
Statistical analysis (below) also shows that last_boosted_vcpu is not
well distributed with the current approach.
Naturally, the first approach is to update lbv before the yield_to(),
without bothering about the failure case, to make the round robin fast
(this was in Rik's V4 vcpu_on_spin patch series).
But when I did performance analysis of that, in the no-overcommit
scenario I saw violent/cascaded directed yields happening, leading to
more CPU wasted in spinning (a huge degradation at 1x and an
improvement at 3x; I assume this is why the update was moved back after
yield_to() in V5 of the vcpu_on_spin series).
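
In terms of the model above, the first approach roughly amounts to
publishing the reference point before the yield (my reconstruction of
the idea, not Rik's actual patch):

    for (i = 1; i <= NR_VCPUS; i++) {
        int target = (lbv + i) % NR_VCPUS;

        if (target == me || !vcpu_eligible(vm, target))
            continue;
        /* update moved before yield_to: concurrent entries fan out
         * even if this particular yield_to later fails */
        vm->last_boosted_vcpu = target;
        if (try_yield_to(vm, target))
            break;
    }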
The second approach I tried was to:
(1) get rid of the per-kvm lbv variable;
(2) have everybody who enters the handler start from a random vcpu as
    the reference point.
This gave a good distribution of starting points (and a performance
improvement in the 32-vcpu guest I tested), and IMO it also scales well
for larger VMs.
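
Again in terms of the model above, the second approach looks roughly
like this (rand() stands in for whatever PRNG helper would be used on
the kernel side; a sketch, not the actual patch):

    int start = rand() % NR_VCPUS;      /* fresh random reference point per entry */

    for (i = 0; i < NR_VCPUS; i++) {
        int target = (start + i) % NR_VCPUS;

        if (target == me || !vcpu_eligible(vm, target))
            continue;
        if (try_yield_to(vm, target))
            break;                      /* no per-VM state left to update */
    }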
Analysis
=============
Four 32-vcpu guests were running, with one of them running kernbench.

"PLE handler yield stat" counts, per vcpu index, the successful yields
to that vcpu (for all 32 vcpus).
"PLE handler start stat" counts, per vcpu index, how often that index
was used as the starting point (for all 32 vcpus).
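
(My reconstruction of how such counters could be gathered, not the
actual instrumentation: two per-index arrays bumped inside the handler,
continuing the model above.)

    unsigned long start_stat[NR_VCPUS]; /* how often index i was the starting point */
    unsigned long yield_stat[NR_VCPUS]; /* how often a yield_to(i) succeeded */

    /* inside the handler */
    start_stat[start]++;                /* bumped once per handler entry */

    if (try_yield_to(vm, target)) {
        yield_stat[target]++;           /* bumped once per successful directed yield */
        break;
    }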
snapshot1
=============
PLE handler yield stat :
274391 33088 32554 46688 46653 48742 48055 37491
38839 31799 28974 30303 31466 45936 36208 51580
32754 53441 28956 30738 37940 37693 26183 40022
31725 41879 23443 35826 40985 30447 37352 35445
PLE handler start stat :
433590 383318 204835 169981 193508 203954 175960 139373
153835 125245 118532 140092 135732 134903 119349 149467
109871 160404 117140 120554 144715 125099 108527 125051
111416 141385 94815 138387 154710 116270 123130 173795
snapshot2
============
PLE handler yield stat :
1957091 59383 67866 65474 100335 77683 80958 64073
53783 44620 80131 81058 66493 56677 74222 74974
42398 132762 48982 70230 78318 65198 54446 104793
59937 57974 73367 96436 79922 59476 58835 63547
PLE handler start stat :
2555089 611546 461121 346769 435889 452398 407495 314403
354277 298006 364202 461158 344783 288263 342165 357270
270887 451660 300020 332120 378403 317848 307969 414282
351443 328501 352840 426094 375050 330016 347540 371819
So the questions I have in mind are:

1. Do you think randomizing the starting point and getting rid of the
   per-VM last_boosted_vcpu variable is the better approach?

2. Can we have (or do we already have) a mechanism by which we decide
   not to yield to a vcpu that is itself doing frequent PLE exits
   (possibly because it is doing unnecessary busy-waiting), or to
   yield_to a better candidate instead?
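
Purely as a hypothetical illustration of the kind of check meant in (2)
(the helper name and threshold are made up):

    /* hypothetical: skip targets that are themselves PLE-exiting heavily,
     * since they are probably busy-waiting rather than doing useful work */
    if (vcpu_ple_exit_rate(vm, target) > PLE_EXIT_THRESHOLD)
        continue;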
On a side note: with the pv patches I have tried doing a yield_to to
the kicked VCPU in the vcpu_block path, and it gives some performance
improvement.
Please let me know if you have any comments/suggestions.