linux-kernel - Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <50641569.9060305@redhat.com>
Date:	Thu, 27 Sep 2012 10:59:21 +0200
From:	Avi Kivity <avi@...hat.com>
To:	Gleb Natapov <gleb@...hat.com>
CC:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Rik van Riel <riel@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Srikar <srikar@...ux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
	chegu vinod <chegu_vinod@...com>,
	"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE
 handler

On 09/27/2012 09:44 AM, Gleb Natapov wrote:
> On Tue, Sep 25, 2012 at 10:54:21AM +0200, Avi Kivity wrote:
>> On 09/25/2012 10:09 AM, Raghavendra K T wrote:
>> > On 09/24/2012 09:36 PM, Avi Kivity wrote:
>> >> On 09/24/2012 05:41 PM, Avi Kivity wrote:
>> >>>
>> >>>>
>> >>>> case 2)
>> >>>> rq1 : vcpu1->wait(lockA) (spinning)
>> >>>> rq2 : vcpu3 (running) ,  vcpu2->holding(lockA) [scheduled out]
>> >>>>
>> >>>> I agree that checking rq1 length is not proper in this case, and as
>> >>>> you
>> >>>> rightly pointed out, we are in trouble here.
>> >>>> nr_running()/num_online_cpus() would give more accurate picture here,
>> >>>> but it seemed costly. May be load balancer save us a bit here in not
>> >>>> running to such sort of cases. ( I agree load balancer is far too
>> >>>> complex).
>> >>>
>> >>> In theory preempt notifier can tell us whether a vcpu is preempted or
>> >>> not (except for exits to userspace), so we can keep track of whether
>> >>> it's we're overcommitted in kvm itself.  It also avoids false positives
>> >>> from other guests and/or processes being overcommitted while our vm
>> >>> is fine.
>> >>
>> >> It also allows us to cheaply skip running vcpus.
>> >
>> > Hi Avi,
>> >
>> > Could you please elaborate on how preempt notifiers can be used
>> > here to keep track of overcommit or skip running vcpus?
>> >
>> > Are we planning set some flag in sched_out() handler etc?
>> >
>> 
>> Keep a bitmap kvm->preempted_vcpus.
>> 
>> In sched_out, test whether we're TASK_RUNNING, and if so, set a vcpu
>> flag and our bit in kvm->preempted_vcpus.  On sched_in, if the flag is
>> set, clear our bit in kvm->preempted_vcpus.  We can also keep a counter
>> of preempted vcpus.
>> 
>> We can use the bitmap and the counter to quickly see if spinning is
>> worthwhile (if the counter is zero, better to spin).  If not, we can use
>> the bitmap to select target vcpus quickly.
>> 
>> The only problem is that in order to keep this accurate we need to keep
>> the preempt notifiers active during exits to userspace.  But we can
>> prototype this without this change, and add it later if it works.
>> 
> Can user return notifier can be used instead? Set bit in
> kvm->preempted_vcpus on return to userspace.
> 

User return notifier is per-cpu, not per-task.  There is a new task_work
(<linux/task_work.h>) that does what you want.  With these
technicalities out of the way, I think it's the wrong idea.  If a vcpu
thread is in userspace, that doesn't mean it's preempted, there's no
point in boosting it if it's already running.

btw, we can have secondary effects.  A vcpu can be waiting for a lock in
the host kernel, or for a host page fault.  There's no point in boosting
anything for that.  Or a vcpu in userspace can be waiting for a lock
that is held by another thread, which has been preempted.  This is (like
I think Peter already said) a priority inheritance problem.  However
with fine-grained locking in userspace, we can make it go away.  The
guest kernel is unlikely to access one device simultaneously from two
threads (and if it does, we just need to improve the threading in the
device model).

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/