Message-ID: <50642139.80309@linux.vnet.ibm.com>
Date: Thu, 27 Sep 2012 15:19:45 +0530
From: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To: dlaor@...hat.com
CC: Chegu Vinod <chegu_vinod@...com>,
Peter Zijlstra <peterz@...radead.org>,
"H. Peter Anvin" <hpa@...or.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Ingo Molnar <mingo@...hat.com>, Avi Kivity <avi@...hat.com>,
Rik van Riel <riel@...hat.com>,
Srikar <srikar@...ux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>,
Gleb Natapov <gleb@...hat.com>,
Andrew Jones <drjones@...hat.com>
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios
in PLE handler
On 09/25/2012 08:30 PM, Dor Laor wrote:
> On 09/24/2012 02:02 PM, Raghavendra K T wrote:
>> On 09/24/2012 02:12 PM, Dor Laor wrote:
>>> In order to help PLE and pvticketlock converge, I thought that a small
>>> piece of test code should be developed to test this in a predictable,
>>> deterministic way.
>>>
>>> The idea is to have a guest kernel module that spawns a new thread each
>>> time you write to a /sys/.... entry.
>>>
>>> Each such thread spins on a spin lock. The specific spin lock is
>>> also chosen via the /sys/ interface. Let's say we have an array of
>>> spin locks, 10 times the number of vcpus.
>>>
>>> All the threads run:
>>>
>>>     while (1) {
>>>             spin_lock(&my_lock);
>>>             sum += execute_dummy_cpu_computation(time);
>>>             spin_unlock(&my_lock);
>>>
>>>             if (sys_tells_thread_to_die())
>>>                     break;
>>>     }
>>>
>>>     print_result(sum);
>>>
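Just so I am sure I understand the shape of it: a bare-bones version of
such a module could look something like the sketch below (untested;
names like spin_test_fn() and dummy_compute() are placeholders I made
up, and most error handling and locking around nr_threads is omitted):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/sched.h>
#include <linux/err.h>

#define NR_LOCKS	1024	/* roughly 10x a large vcpu count */
#define MAX_THREADS	512

static spinlock_t locks[NR_LOCKS];
static struct task_struct *threads[MAX_THREADS];
static int nr_threads;			/* single writer assumed */
static struct kobject *st_kobj;

/* dummy work done while the lock is held */
static unsigned long dummy_compute(unsigned long loops)
{
	unsigned long i, sum = 0;

	for (i = 0; i < loops; i++)
		sum += i * i;
	return sum;
}

static int spin_test_fn(void *data)
{
	spinlock_t *lock = &locks[(long)data % NR_LOCKS];
	unsigned long sum = 0;

	while (!kthread_should_stop()) {
		spin_lock(lock);
		sum += dummy_compute(10000);	/* lock hold time */
		spin_unlock(lock);
		cond_resched();
	}
	pr_info("spin_test: sum=%lu\n", sum);
	return 0;
}

/* echo <lock index> > /sys/kernel/spin_test/spawn starts one thread */
static ssize_t spawn_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t count)
{
	long idx;

	if (kstrtol(buf, 10, &idx) || idx < 0 || nr_threads >= MAX_THREADS)
		return -EINVAL;
	threads[nr_threads] = kthread_run(spin_test_fn, (void *)idx,
					  "spin_test/%d", nr_threads);
	if (IS_ERR(threads[nr_threads]))
		return PTR_ERR(threads[nr_threads]);
	nr_threads++;
	return count;
}

static struct kobj_attribute spawn_attr = __ATTR(spawn, 0200, NULL, spawn_store);

static int __init spin_test_init(void)
{
	int i;

	for (i = 0; i < NR_LOCKS; i++)
		spin_lock_init(&locks[i]);
	st_kobj = kobject_create_and_add("spin_test", kernel_kobj);
	if (!st_kobj)
		return -ENOMEM;
	return sysfs_create_file(st_kobj, &spawn_attr.attr);
}

static void __exit spin_test_exit(void)
{
	int i;

	for (i = 0; i < nr_threads; i++)
		kthread_stop(threads[i]);
	kobject_put(st_kobj);
}

module_init(spin_test_init);
module_exit(spin_test_exit);
MODULE_LICENSE("GPL");

With that in place, #threads, the target lock, and the thread lifetime
can all be driven from a user-space script.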
>>> Instead of calling the kernel's spin_lock functions, clone them and make
>>> the ticket lock order deterministic and known (like a linear walk of all
>>> the threads trying to catch that lock).
>>
>> By cloning, do you mean a hierarchy of locks?
>
> No, I meant cloning the implementation of the current spin lock code in
> order to set whatever order you like for the ticket selection
> (even for a non-pvticketlock version).
>
> For instance, let's say you have N threads trying to grab the lock; you
> can always make the ticket go linearly from 1->2->...->N.
> Not sure it's a good idea, just a suggestion.
>
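Ok, got it. Even something as simple as a round-robin handoff would give
a fixed, linear order, I guess. An untested sketch, meant to live in the
same test module as above (ordered_lock()/ordered_unlock() are made-up
names):

/*
 * N threads always acquire in the fixed order 0, 1, ..., N-1, 0, ...
 * my_id is the calling thread's index, nr the number of threads.
 */
static unsigned int turn;

static void ordered_lock(unsigned int my_id, unsigned int nr)
{
	/* spin until it is this thread's turn (acquire semantics) */
	while (smp_load_acquire(&turn) % nr != my_id)
		cpu_relax();
}

static void ordered_unlock(void)
{
	/* hand over to the next thread id (release semantics) */
	smp_store_release(&turn, turn + 1);
}

That is not a ticket lock proper, but it makes the acquisition order
fully deterministic.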
>> Also, I believe the hold time should be passed via sysfs, or hardcoded
>> for each type of lock we are mimicking.
>
> Yep
>
>>
>>>
>>> This way you can easily calculate:
>>> 1. the score of a single vcpu running a single thread
>>> 2. the sum of all thread scores when #threads == #vcpus, all
>>> taking the same spin lock. The overall sum should be as close as
>>> possible to #1.
>>> 3. like #2, but with #threads > #vcpus, and other variants where the
>>> total #vcpus (belonging to all VMs) > #pcpus.
>>> 4. create #threads == #vcpus, but let each thread have its own spin
>>> lock
>>> 5. like #4 + #2
>>>
>>> Hopefully this will allow you to judge and evaluate the exact
>>> overhead of scheduling VMs and threads, since you have the ideal result
>>> in hand and you know what the threads are doing.
>>>
>>> My 2 cents, Dor
>>>
>>
>> Thank you,
>> I think this is an excellent idea (though I am still trying to put
>> together all the pieces you mentioned). Overall, we should be able to
>> measure the performance of pvspinlock/PLE improvements with a
>> deterministic load in the guest.
>>
>> The only thing I am missing is
>> how to generate the different lock combinations.
>>
>> Okay, let me see if I can come up with a solid model for this.
>>
>
> Do you mean the various options for PLE/pvticket/other? I haven't
> thought about it and assumed it's static, but it could also be controlled
> through the temporary /sys interface.
>
No, I am not there yet.
So, in summary: we are suffering from inconsistent benchmark results
while measuring the benefit of our improvements in PLE/pvlock etc.
The good points of your suggestion are:
- it gives predictability to the workload that runs in the guest, so
that we get an apples-to-apples comparison of improvements.
- we can easily tune the workload via sysfs, and we can have a script
to automate the runs.
What is complicated is:
- How can we simulate a workload close to what we measure with
benchmarks?
- How can we mimic lock holding times / lock hierarchies close to the
way they are seen with real workloads (e.g. a highly contended zone
lru lock with similar lock holding times)? Perhaps with per-thread
knobs, as sketched below.
- How close would it be once we leave out the other types of spinning
(e.g. flush_tlb)?
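E.g. the thread loop in the module sketched earlier could take two
per-thread knobs, settable via /sys (untested; hold_loops/think_loops
are made-up names):

/* replaces spin_test_fn() above: tunable hold/think times to roughly
 * mimic a given lock's behaviour (e.g. a hot zone lru lock) */
static unsigned long hold_loops = 10000;	/* work while holding  */
static unsigned long think_loops = 50000;	/* work between holds  */

static int spin_test_fn(void *data)
{
	spinlock_t *lock = &locks[(long)data % NR_LOCKS];
	unsigned long sum = 0;

	while (!kthread_should_stop()) {
		spin_lock(lock);
		sum += dummy_compute(hold_loops);	/* lock hold time */
		spin_unlock(lock);
		sum += dummy_compute(think_loops);	/* inter-arrival  */
		cond_resched();
	}
	pr_info("spin_test: sum=%lu\n", sum);
	return 0;
}

But picking numbers that match a real workload is exactly the hard part.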
So I feel it is not as trivial as it looks.