linux-kernel - Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

Message-ID: <1348502600.11847.90.camel@twins>
Date:	Mon, 24 Sep 2012 18:03:20 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Avi Kivity <avi@...hat.com>
Cc:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Ingo Molnar <mingo@...hat.com>, Rik van Riel <riel@...hat.com>,
	Srikar <srikar@...ux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
	chegu vinod <chegu_vinod@...com>,
	"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>,
	Gleb Natapov <gleb@...hat.com>,
	Andrew Jones <drjones@...hat.com>
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios
 in PLE handler

On Mon, 2012-09-24 at 17:51 +0200, Avi Kivity wrote:
> On 09/24/2012 03:54 PM, Peter Zijlstra wrote:
> > On Mon, 2012-09-24 at 18:59 +0530, Raghavendra K T wrote:
> >> However Rik had a genuine concern in the cases where runqueue is not
> >> equally distributed and lockholder might actually be on a different run 
> >> queue but not running.
> > 
> > Load should eventually get distributed equally -- that's what the
> > load-balancer is for -- so this is a temporary situation.
> 
> What's the expected latency?  This is the whole problem.  Eventually the
> scheduler would pick the lock holder as well, the problem is that it's
> in the millisecond scale while lock hold times are in the microsecond
> scale, leading to a 1000x slowdown.

Yeah I know.. Heisenberg's uncertainty applied to SMP computing becomes
something like accurate or fast, never both.

> If we want to yield, we really want to boost someone.

Now if only you knew which someone ;-) This non-modified guest nonsense
is such a snake pit.. but you know how I feel about all that.

> > We already try and favour the non running vcpu in this case, that's what
> > yield_to_task_fair() is about. If its still not eligible to run, tough
> > luck.
> 
> Crazy idea: instead of yielding, just run that other vcpu in the thread
> that would otherwise spin.  I can see about a million objections to this
> already though.

Yah.. you want me to list a few? :-) It would require synchronization
with the other cpu to pull its task -- one really wants to avoid that
cpu also running it.

Do this at a high enough frequency and you're dead too.

Anyway, you can do this inside the KVM stuff, simply flip the vcpu state
associated with a vcpu thread and use the preemption notifiers to sort
things against the scheduler or somesuch.

> >> Do you think instead of using rq->nr_running, we could get a global 
> >> sense of load using avenrun (something like avenrun/num_onlinecpus) 
> > 
> > To what purpose? Also, global stuff is expensive, so you should try and
> > stay away from it as hard as you possibly can.
> 
> Spinning is also expensive.  How about we do the global stuff every N
> times, to amortize the cost (and reduce contention)?

Nah, spinning isn't expensive, it's a waste of time; similar end result
for someone who wants to do useful work though, but not the same cause.

Pick N and I'll come up with a scenario for which it's wrong ;-)
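
[Illustration, not part of the original patch set: a minimal userspace
sketch of the "do the global stuff every N times" idea quoted above.
All names here (struct ple_sampler, ple_sampler_load, fake_global_load)
are hypothetical and do not exist in KVM or the scheduler; the
function-pointer argument merely stands in for an expensive global read
such as something derived from avenrun.]

```c
#include <assert.h>

/* Hypothetical per-vcpu sampler state; not taken from the kernel. */
struct ple_sampler {
	unsigned int period;     /* N: refresh the cached value every N calls */
	unsigned int calls;      /* calls since the last refresh, modulo N */
	unsigned long cached;    /* last value returned by the global read */
	unsigned int refreshes;  /* how often the expensive path actually ran */
};

/* Stand-in type for an expensive, contended global read. */
typedef unsigned long (*global_load_fn)(void);

static void ple_sampler_init(struct ple_sampler *s, unsigned int period)
{
	assert(period > 0);      /* N == 0 would mean "never refresh" */
	s->period = period;
	s->calls = 0;            /* forces a refresh on the first call */
	s->cached = 0;
	s->refreshes = 0;
}

/* Return the cached global load, re-reading it only once every N calls. */
static unsigned long ple_sampler_load(struct ple_sampler *s,
				      global_load_fn read_global)
{
	if (s->calls == 0) {
		s->cached = read_global();
		s->refreshes++;
	}
	s->calls = (s->calls + 1) % s->period;
	return s->cached;
}

/* Fake global source so the sketch can be exercised in userspace. */
static unsigned long fake_load_value = 100;

static unsigned long fake_global_load(void)
{
	return fake_load_value;
}
```

[The cost of the global read is paid once per N calls, but the value
handed back can be up to N-1 calls stale, which is exactly the kind of
window the reply above alludes to: whatever N is chosen, the load can
change right after a refresh.]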

Anyway, it's an ugly problem and one I really want to contain inside the
insanity that created it (virt); let's not taint the rest of the kernel
more than we need to.