Message-ID: <48B53F97.20101@novell.com>
Date: Wed, 27 Aug 2008 07:50:47 -0400
From: Gregory Haskins <ghaskins@...ell.com>
To: Nick Piggin <nickpiggin@...oo.com.au>
CC: mingo@...e.hu, srostedt@...hat.com, peterz@...radead.org,
linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
npiggin@...e.de, gregory.haskins@...il.com
Subject: Re: [PATCH 2/5] sched: pull only one task during NEWIDLE balancing
to limit critical section
Nick Piggin wrote:
> On Tuesday 26 August 2008 21:36, Gregory Haskins wrote:
>
>> Nick Piggin wrote:
>>
>>> On Tuesday 26 August 2008 06:15, Gregory Haskins wrote:
>>>
>>>> git-id c4acb2c0669c5c5c9b28e9d02a34b5c67edf7092 attempted to limit
>>>> newidle critical section length by stopping after at least one task
>>>> was moved. Further investigation has shown that there are other
>>>> paths nested further inside the algorithm which still remain that allow
>>>> long latencies to occur with newidle balancing. This patch applies
>>>> the same technique inside balance_tasks() to limit the duration of
>>>> this optional balancing operation.
>>>>
>>>> Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
>>>> CC: Nick Piggin <npiggin@...e.de>
>>>>
>>> Hmm, this (and c4acb2c0669c5c5c9b28e9d02a34b5c67edf7092) still could
>>> increase the amount of work to do significantly for workloads where
>>> the CPU is going idle and pulling tasks over frequently. I don't
>>> really like either of them too much.
>>>
>> I had a feeling you may object to this patch based on your comments on
>> the first one. That's why I CC'd you so you wouldn't think I was trying
>> to sneak something past ;)
>>
>
> Appreciated.
>
>
>
>>> Maybe increasing the limit would effectively amortize most of the
>>> problem (say, limit to move 16 tasks at most).
>>>
>> The problem I was seeing was that even moving 2 was too many in the
>> ftrace traces I looked at. I think the idea of making a variable limit
>> (set via a sysctl, etc) here is a good one, but I would recommend we
>> have the default be "1" for CONFIG_PREEMPT (or at least
>> CONFIG_PREEMPT_RT) based on what I know right now. I know last time
>> you objected to any kind of special cases for the preemptible kernels,
>> but I think this is a good compromise. Would this be acceptable?
>>
>
> Well, I _prefer_ not to have a special case for preemptible kernels, but
> we already have similar kinds of arbitrary changes, like in TLB flushing,
> so...
>
> I understand and accept there are some places where fundamentally you
> have to trade latency for throughput, so at some point we have to have a
> config and/or sysctl for that.
>
> I'm surprised 2 is too much but 1 is OK. Seems pretty fragile to me.
It's not that 1 is magically "ok". It's simply that newidle balancing
hurts latency, and 1 is the minimum to pull to reasonably reduce the
critical section. I already check if we NEEDS_RESCHED before taking the
rq->lock in newidle, so waiting for one task to pull is the first
opportunity I have to end the section as quickly as possible. It would
be nice to keep going when I can detect that there is no real
contention. Let me give this angle some more thought.
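
To make the idea concrete, here is a minimal user-space sketch of capping
the number of tasks pulled in one pass. It is not the patch itself: the
names toy_rq, toy_newidle_pull and max_pull are made up for illustration,
and the need_resched field only stands in for the check done before
taking rq->lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a runqueue; NOT the kernel's struct rq. */
struct toy_rq {
	int nr_running;     /* tasks currently queued on this CPU */
	bool need_resched;  /* stand-in for the pre-lock resched check */
};

/*
 * Move at most max_pull tasks from busiest to this_rq and return how
 * many were moved. The loop body models the work done inside the
 * critical section, so a smaller max_pull means a shorter section.
 */
static int toy_newidle_pull(struct toy_rq *this_rq, struct toy_rq *busiest,
			    int max_pull)
{
	int moved = 0;

	assert(max_pull >= 1);

	/* Something already wants this CPU: skip the optional balancing. */
	if (this_rq->need_resched)
		return 0;

	/* Never drain the busiest queue completely; leave it one task. */
	while (moved < max_pull && busiest->nr_running > 1) {
		busiest->nr_running--;
		this_rq->nr_running++;
		moved++;
	}

	return moved;
}
```

With max_pull set to 1 this models the behavior proposed here; a larger
value models the "limit to move 16 tasks at most" alternative.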
> Are
> you just running insane tests that load up the runqueues heaps and test
> latency? -rt users will have to understand that some algorithms scale
> linearly or so with the number of a particular resource allocated, so
> they aren't going to get a constant low latency under arbitrary
> conditions.
>
> FWIW, if you haven't already, then for -rt you might want to look at a
> more advanced data structure than a simple run-ordered list for moving tasks
> from one rq to the other. A simple one I was looking at is a time ordered
> list to pull the most cache cold tasks (and thus we can stop searching
> when we encounter the first cache hot task, in situations where it is
> appropriate, etc).
>
I'm not sure I follow your point, but if I do, note that the RT scheduler
uses a completely different load balancer (one that is priority ordered).
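
For reference, the time-ordered list suggested above might look roughly
like the following user-space sketch. It assumes a singly linked list
kept sorted oldest-first by last run time and a fixed "hot window"; the
names toy_task, toy_count_cold and hot_window are hypothetical and do not
correspond to anything in the scheduler.

```c
#include <stddef.h>

/* Hypothetical task record; last_ran is an abstract timestamp. */
struct toy_task {
	unsigned long last_ran;   /* when the task last ran */
	struct toy_task *next;    /* next task, list sorted oldest-first */
};

/*
 * Count the cache-cold tasks at the front of the list, i.e. those that
 * last ran at least hot_window time units before now. Because the list
 * is time ordered, the walk can stop at the first cache-hot task:
 * every task after it ran even more recently.
 */
static size_t toy_count_cold(const struct toy_task *head,
			     unsigned long now, unsigned long hot_window)
{
	size_t cold = 0;

	for (const struct toy_task *t = head; t != NULL; t = t->next) {
		if (now - t->last_ran < hot_window)
			break;  /* first hot task ends the search */
		cold++;
	}

	return cold;
}
```

The early break is the point of the ordering: the search cost is bounded
by the number of cold tasks rather than by the length of the queue.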
> Anyway... yeah I'm OK with this if it is under a config option.
>
Cool.. See v2 ;)
Thanks Nick,
-Greg