Message-ID: <AANLkTikFRLXrdY6_qV5PHeSgH1Dd24ApaHs8R-ixOed=@mail.gmail.com>
Date: Thu, 14 Oct 2010 03:25:33 -0700
From: Paul Turner <pjt@...gle.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc: Peter Zijlstra <peterz@...radead.org>, bharata@...ux.vnet.ibm.com,
linux-kernel@...r.kernel.org,
Dhaval Giani <dhaval.giani@...il.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@...ibm.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...e.hu>,
Pavel Emelyanov <xemul@...nvz.org>,
Herbert Poetzl <herbert@...hfloor.at>,
Avi Kivity <avi@...hat.com>,
Chris Friesen <cfriesen@...tel.com>,
Paul Menage <menage@...gle.com>,
Mike Waychison <mikew@...gle.com>,
Nikhil Rao <ncrao@...gle.com>
Subject: Re: [PATCH v3 3/7] sched: throttle cfs_rq entities which exceed their
local quota
On Thu, Oct 14, 2010 at 3:08 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@...fujitsu.com> wrote:
> On Thu, 14 Oct 2010 11:59:55 +0200
> Peter Zijlstra <peterz@...radead.org> wrote:
>
>> On Thu, 2010-10-14 at 18:50 +0900, KAMEZAWA Hiroyuki wrote:
>> > On Thu, 14 Oct 2010 11:12:22 +0200
>> > Peter Zijlstra <peterz@...radead.org> wrote:
>> >
>> > > On Wed, 2010-10-13 at 15:34 +0900, KAMEZAWA Hiroyuki wrote:
>> > > > Can cpu.share and bandwidth control be used simultaneously, or...
>> > > > is this fair? I'm not familiar with the scheduler, but it looks like
>> > > > this could boost this tg. Could you add a brief documentation of the
>> > > > spec/feature in the next post?
>> > >
>> > > Like explained, shares control the proportional distribution of time
>> > > between groups, bandwidth puts a limit on how much time a group can
>> > > take. It can cause a group to receive less than its fair share, but
>> > > never more.
>> > >
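(For illustration, the distinction above in terms of the cgroup cpu controller's knobs; the mount point and values below are assumed, not taken from the patchset:)

```shell
# Illustrative only -- assumes the cgroup cpu controller is mounted
# at /dev/cgroup and a group "groupA" exists.

# shares: proportional weight relative to sibling groups (default 1024).
# Can only redistribute time; under contention groupA gets ~2x a
# default-weight sibling, never an absolute guarantee.
echo 2048 > /dev/cgroup/groupA/cpu.shares

# bandwidth: a hard cap -- at most quota us of cpu time per period us,
# regardless of how idle the rest of the machine is.
echo 100000 > /dev/cgroup/groupA/cpu.cfs_period_us
echo  50000 > /dev/cgroup/groupA/cpu.cfs_quota_us   # cap at half a cpu
```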
>> > > There is, however, a problem with all of this, which is that the
>> > > explicit idling of tasks can lead to a form of priority inversion.
>> > > Regular preemptive scheduling already suffers from this, but
>> > > explicitly idling tasks exacerbates the situation.
>> > >
>> > > You basically get to add the longest induced idle time to all your
>> > > lock hold times.
>> > >
>> >
>> > What is the user-visible difference between the problem with
>> > 1) limiting the share to be very small, and
>> > 2) using throttling?
>> >
>> > If shares are used, is the lock-holder's priority boosted?
>>
>> No, both lead to the same problem, it's just that this adds another
>> dimension to it.. and I'm fairly sure people won't realise this until it
>> bites them in the ass.
>>
> Hmm, then it's an existing problem, but this adds a new pitfall.
>
> What's your recommendation for making progress on this work?
>
> I think the 1st step will be:
> - explain the problem of priority inversion in the cgroup+cfs
>   documentation, with a !!CAUTION!!
>
> I'm sorry, I'm not sure whether there have been attempts at fixing
> priority inversion in Linux scheduler development.
>
> To explain my motivation: one of my customers who wants this feature runs
> a virtual machine rental service. So, some functionality like
> "when a vcpu holds a spinlock in the kernel, please don't sleep" would be
> nice. Is there a patch for that already?
>
Per the above:

When a group exceeds its bandwidth we don't actively force it off the
cpu; we only set TIF_RESCHED, and we won't process the throttling until
we drop back down to userspace and handle the flag.

This means we'll never throttle a task while it holds a spinlock.

We'll also only throttle the holder of a sleepable lock (one that doesn't
disable preemption) when it voluntarily reschedules without releasing the
lock, at which point it has chosen to open itself to an arbitrary latency
period anyway.
**
The case of a guest cpu holding spinlocks is part of the much larger
rabbit hole that is spinlock enlightenment, which should occur via
pvops/etc. interaction. The sane thing for such a hook to do would be to
(at least) preempt_disable(), at which point the vcpu will be protected
from throttling.

This seems somewhat orthogonal to this patchset, however.
**
Agreed that priority inversion across threads and across vcpus are rather
sickly beasts; especially given how bare the curtains are on the first
case (which the second can only really build upon).
>
> Thanks,
> -Kame
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/