[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.0.999.0710021514040.3579@woody.linux-foundation.org>
Date: Tue, 2 Oct 2007 15:32:53 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: Anders Bostr?m <anders@...trom.dyndns.org>,
linux-kernel@...r.kernel.org, arjan@...ux.intel.com
Subject: Re: PROBLEM: high load average when idle
On Tue, 2 Oct 2007, Andrew Morton wrote:
>
> This is unexpected. High load average is due to either a task chewing a
> lot of CPU time or a task stuck in uninterruptible sleep.
Not necessarily.
We saw high loadaverages with the timer bogosity with "gettimeofday()" and
"select()" not agreeing, so they would do things like
date = time(..)
select(.. , timeout = <time + 1> )
and when "date" wasn't taking the jiffies offset into account, and thus
mixing these kinds of different time sources, the select ended up
returning immediately because they effectively used different clocks, and
suddenly we had some applications chewing up 30% CPU time, because they
were in a loop that *tried* to sleep.
And I wonder if the same kind thing is effectively happening here: the
code is written so that it *tries* to sleep, but the rounding of the clock
basically means that it's trying to sleep using a different clock than the
one we're using to wake things up with, so some percentage of the time it
doesn't sleep at all!
I wonder if the whole "round_jiffies()" thing should be written so that it
never rounds down, or at least never rounds down to before the current
second!
I have to say, I also think it's a bit iffy to do "round_jiffies()" at all
in that per-CPU kind of way. The "per-cpu" thing is quite possibly going
to change by the time we actually add the timer, so the goal of trying to
get wakeups to happen in "bunches" per CPU should really be done by
setting a flag on the timer itself - so that we could do that rounding
when the timer is actually added to the per-cpu queues!
Now, I think the JBD "t_expires" field should never be "near" in seconds,
so I do find it a bit surprising that this rounding can have any effect,
but on the other hand it clearly *does* have some effect, so.. It migt
just be interacting with some other use, of course.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists