Message-Id: <200808271641.46359.nickpiggin@yahoo.com.au>
Date: Wed, 27 Aug 2008 16:41:46 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Gregory Haskins <ghaskins@...ell.com>
Cc: mingo@...e.hu, srostedt@...hat.com, peterz@...radead.org,
linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
npiggin@...e.de, gregory.haskins@...il.com
Subject: Re: [PATCH 2/5] sched: pull only one task during NEWIDLE balancing to limit critical section
On Tuesday 26 August 2008 21:36, Gregory Haskins wrote:
> Nick Piggin wrote:
> > On Tuesday 26 August 2008 06:15, Gregory Haskins wrote:
> >> git-id c4acb2c0669c5c5c9b28e9d02a34b5c67edf7092 attempted to limit
> >> newidle critical section length by stopping after at least one task
> >> was moved. Further investigation has shown that there are other
> >> paths nested deeper inside the algorithm that still allow long
> >> latencies to occur with newidle balancing. This patch applies
> >> the same technique inside balance_tasks() to limit the duration of
> >> this optional balancing operation.
> >>
> >> Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
> >> CC: Nick Piggin <npiggin@...e.de>
> >
> > Hmm, this (and c4acb2c0669c5c5c9b28e9d02a34b5c67edf7092) still could
> > increase the amount of work to do significantly for workloads where
> > the CPU is going idle and pulling tasks over frequently. I don't
> > really like either of them too much.
>
> I had a feeling you might object to this patch based on your comments on
> the first one. That's why I CC'd you so you wouldn't think I was trying
> to sneak something past ;)
Appreciated.
> > Maybe increasing the limit would effectively amortize most of the
> > problem (say, limit to move 16 tasks at most).
>
> The problem I was seeing was that even moving 2 was too many in the
> ftrace traces I looked at. I think the idea of making a variable limit
> (set via a sysctl, etc) here is a good one, but I would recommend we
> have the default be "1" for CONFIG_PREEMPT (or at least
> CONFIG_PREEMPT_RT) based on what I know right now. I know last time
> you objected to any kind of special cases for the preemptible kernels,
> but I think this is a good compromise. Would this be acceptable?
Well I _prefer_ not to have a special case for preemptible kernels, but
we already have similar arbitrary kinds of changes, like in TLB flushing,
so...
I understand and accept there are some places where fundamentally you
have to trade latency for throughput, so at some point we have to have a
config and/or sysctl for that.
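To make it concrete, something like the below is roughly what I have in
mind (the sysctl name and the pick_next_pullable()/move_one_task_to()
helpers are invented for the example, this is not the actual sched.c
code): a per-pass quota on how many tasks a newly idle CPU may pull,
defaulting to 1 on preemptible kernels and a larger batch otherwise.

/*
 * Illustrative only, not the actual sched.c code: the sysctl name and the
 * pick_next_pullable()/move_one_task_to() helpers are invented.
 */
#ifdef CONFIG_PREEMPT
unsigned int sysctl_sched_newidle_nr_migrate __read_mostly = 1;
#else
unsigned int sysctl_sched_newidle_nr_migrate __read_mostly = 16;
#endif

static int pull_tasks(struct rq *this_rq, struct rq *busiest,
		      enum cpu_idle_type idle)
{
	unsigned int quota = (idle == CPU_NEWLY_IDLE) ?
			sysctl_sched_newidle_nr_migrate : UINT_MAX;
	unsigned int pulled = 0;
	struct task_struct *p;

	/*
	 * Stop as soon as the per-pass quota is used up, so the time spent
	 * holding both runqueue locks stays bounded.
	 */
	while (pulled < quota && (p = pick_next_pullable(busiest, this_rq))) {
		move_one_task_to(p, busiest, this_rq);
		pulled++;
	}

	return pulled;
}

That keeps the lock hold time bounded for -rt while still letting
throughput-oriented configs amortise the balancing work over a batch.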
I'm surprised 2 is too much but 1 is OK. Seems pretty fragile to me. Are
you just running insane tests that load up the runqueues heaps and test
latency? -rt users will have to understand that some algorithms scale
linearly or so with the number of a particular resource allocated, so
they aren't going to get a constant low latency under arbitrary
conditions.
FWIW, if you haven't already, for -rt you might want to look at a more
advanced data structure than a simple run-ordered list for moving tasks
from one rq to the other. A simple one I was looking at is a time-ordered
list to pull the most cache-cold tasks (and thus we can stop searching
when we encounter the first cache-hot task, in situations where it is
appropriate, etc).
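Rough sketch of what I mean (the cold_entry struct, the cache_hot_ns
cutoff and the migrate_to() helper are made up for the example, not from
sched.c): keep the pullable tasks on a list sorted by when they last ran,
pull from the cold end, and bail out at the first task that is still
cache hot, since everything behind it is hotter still.

/*
 * Illustrative only: cold_entry, cache_hot_ns and migrate_to() are made up
 * for the example.  The list is kept sorted so the coldest (oldest
 * ->last_ran) tasks sit at the head.
 */
struct cold_entry {
	struct list_head node;
	struct task_struct *task;
	u64 last_ran;			/* timestamp of last execution */
};

static int pull_cache_cold(struct rq *this_rq, struct list_head *cold_list,
			   u64 now, u64 cache_hot_ns, unsigned int max_pull)
{
	struct cold_entry *e, *tmp;
	unsigned int pulled = 0;

	list_for_each_entry_safe(e, tmp, cold_list, node) {
		/*
		 * Time-ordered list: the first task that is still cache hot
		 * means every later entry is hotter, so stop searching.
		 */
		if (now - e->last_ran < cache_hot_ns)
			break;

		migrate_to(this_rq, e->task);	/* hypothetical helper */
		list_del(&e->node);

		if (++pulled >= max_pull)
			break;
	}

	return pulled;
}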
Anyway... yeah I'm OK with this if it is under a config option.