Message-ID: <464DA995.8060400@bigpond.net.au>
Date: Fri, 18 May 2007 23:26:45 +1000
From: Peter Williams <pwil3058@...pond.net.au>
To: Ingo Molnar <mingo@...e.hu>
CC: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [patch] CFS scheduler, -v12
Peter Williams wrote:
> Ingo Molnar wrote:
>> * Peter Williams <pwil3058@...pond.net.au> wrote:
>>
>>> I've now done this test on a number of kernels: 2.6.21 and 2.6.22-rc1
>>> with and without CFS; and the problem is always present. It's not
>>> "nice" related as all four tasks are run at nice == 0.
>>
>> could you try -v13 and see whether this behavior gets better in any way?
>
> It's still there but I've got a theory about what the problem is, and
> it's supported by some other tests I've done.
>
> What I'd forgotten is that I had gkrellm running as well as top (to
> observe which CPUs the tasks were on) at the same time as the spinners
> were running. This meant that between them top, gkrellm and X were
> using about 2% of the CPU -- not much but enough to make it possible
> that at least one of them was running when the load balancer was
> trying to do its thing.
>
> This raises two possibilities: 1. the system looked balanced to the
> load balancer; or 2. it didn't look balanced but one of top, gkrellm
> or X was moved instead of one of the spinners.
>
> If it's 1 then there's not much we can do about it except say that it
> only happens in these strange circumstances. If it's 2 then we may have
> to modify the way move_tasks() selects which tasks to move (if we think
> that the circumstances warrant it -- I'm not sure that this is the case).
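To make possibility 2 a bit more concrete, the kind of selection change
I have in mind is for the balancer to prefer tasks that have actually
been burning CPU over nearly idle ones. The fragment below is only a
rough user-space sketch -- the struct, its fields and the 5% threshold
are all made up for illustration, it's not a patch against the real
move_tasks():

/*
 * Illustrative sketch only -- not the real move_tasks().  The point is
 * the selection policy: skip tasks that have used almost no CPU lately,
 * so the balancer moves a spinner rather than top, gkrellm or X.
 * The struct, its fields and the 5% threshold are all made up.
 */
#include <stddef.h>
#include <stdio.h>

struct fake_task {
    const char *name;
    unsigned int recent_cpu_pct;        /* recent CPU usage, 0..100 */
};

/* Pick the busiest task worth moving, or NULL if none qualifies. */
static struct fake_task *pick_task_to_move(struct fake_task *tasks, size_t n)
{
    struct fake_task *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Moving a nearly idle task won't fix the imbalance. */
        if (tasks[i].recent_cpu_pct < 5)
            continue;
        if (!best || tasks[i].recent_cpu_pct > best->recent_cpu_pct)
            best = &tasks[i];
    }
    return best;
}

int main(void)
{
    struct fake_task rq[] = {
        { "top",      1 },
        { "gkrellm",  1 },
        { "spinner", 99 },
    };
    struct fake_task *victim =
        pick_task_to_move(rq, sizeof(rq) / sizeof(rq[0]));

    printf("would move: %s\n", victim ? victim->name : "(none)");
    return 0;
}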
>
> To examine these possibilities I tried two variations of the test.
>
> a. run the spinners at nice == -10 instead of nice == 0. When I did
> this the load balancing was perfect on 10 consecutive runs, which
> according to my calculations makes it 99.9999997% certain that this
> didn't happen by chance (the arithmetic is sketched below, after point
> b). This supports theory 2 above.
>
> b. run the tests without gkrellm running but use nice == 0 for the
> spinners. When I did this the load balancing was mostly perfect but
> quite volatile (switching between a 2/2 and a 1/3 allocation of
> spinners to CPUs); even so, the %CPU allocation was quite good, with
> each spinner getting approximately 49% of a CPU. This also supports
> theory 2 above and gives weak support to theory 1 above.
>
> This leaves the question of what to do about it. Given that most CPU
> intensive tasks on a real system probably only run for a few tens of
> milliseconds at a time, it probably won't matter much in practice,
> except that a malicious user could exploit it to disrupt a system.
>
> So my opinion is that we probably do need to do something about it but
> that it's not urgent.
>
> One thing that might work is to jitter the load balancing interval a
> bit. The reason I say this is that top and gkrellm both run at a more
> or less constant interval (and, in this case, X would be following the
> same pattern as it's doing the screen updates for top and gkrellm), so
> it's possible for the load balancing interval to synchronize with
> their intervals, which in turn causes the observed problem. A jittered
> load balancing interval should break the synchronization. It would
> certainly be simpler than trying to change the move_tasks() logic for
> selecting which tasks to move.
I should have added that the reason I think this mooted synchronization
is the cause of the problem is that I can think of no other way that
tasks with such low activity (about 2% between the three of them) could
make the imbalance in the spinner to CPU allocation so persistent.
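For what it's worth, here's a rough user-space illustration of the kind
of jitter I mean. None of this is real scheduler code -- the function
name and the 64ms/10% numbers are made up purely for illustration:

/*
 * Illustrative sketch only -- not kernel code.  The idea is to perturb
 * each load-balancing interval by a small random amount so the balancer
 * can't stay phase-locked with periodic tasks like top, gkrellm and X.
 * The names (base_ms, jitter_pct) and the 64ms/10% numbers are made up.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Return base_ms perturbed by up to +/- jitter_pct percent of itself. */
static unsigned long jittered_interval(unsigned long base_ms,
                                       unsigned int jitter_pct)
{
    long span = (long)(base_ms * jitter_pct / 100);
    long offset = (rand() % (2 * span + 1)) - span;    /* [-span, +span] */

    return (unsigned long)((long)base_ms + offset);
}

int main(void)
{
    unsigned long next_balance_ms = 0;
    int i;

    srand((unsigned int)time(NULL));

    /* Schedule ten balancing points from a nominal 64ms interval. */
    for (i = 0; i < 10; i++) {
        next_balance_ms += jittered_interval(64, 10);
        printf("balance #%d at %lums\n", i, next_balance_ms);
    }
    return 0;
}

The exact distribution of the jitter presumably doesn't matter much;
the point is just to stop the balancing times from keeping a fixed
phase relative to top's and gkrellm's update ticks.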
>
> What do you think?
Peter
--
Peter Williams pwil3058@...pond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce