Message-ID: <h2s40ec3ea41004011300zaa5699f0kcf07434fec9b1aab@mail.gmail.com>
Date: Thu, 1 Apr 2010 16:00:24 -0400
From: Chase Douglas <chase.douglas@...onical.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
kernel-team <kernel-team@...ts.ubuntu.com>
Subject: Re: [REGRESSION 2.6.30][PATCH 1/1] sched: defer idle accounting till
after load update period
On Thu, Apr 1, 2010 at 3:37 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Thu, 1 Apr 2010, Andrew Morton wrote:
>> On Mon, 29 Mar 2010 09:41:12 -0400
>> Chase Douglas <chase.douglas@...onical.com> wrote:
>>
>> > There's a period of 10 ticks where calc_load_tasks is updated by all the
>> > cpus for the load avg. Usually all the cpus do this during the first
>> > tick. If any cpus go idle, calc_load_tasks is decremented accordingly.
>> > However, if they wake up, calc_load_tasks is not incremented. Thus, if
>> > cpus go idle during the 10 tick period, calc_load_tasks may be
>> > decremented to a non-representative value. This issue can lead to
>> > systems having a load avg of exactly 0, even though the real load avg
>> > could theoretically be up to NR_CPUS.
>> >
>> > This change defers any further calc_load_tasks accounting, once each
>> > cpu has updated the count, until after the 10 tick period.
>> >
>> > BugLink: http://bugs.launchpad.net/bugs/513848
>> >
>>
>> There was useful information in the [patch 0/1] email, such as the
>> offending commit ID. If possible, it's best to avoid the [patch 0/n]
>> thing altogether - that information either has to be moved into the
>> [patch 1/n] changelog by someone (ie: me), or it just gets omitted and
>> lost.
>>
>>
>> > ---
>> > kernel/sched.c | 16 ++++++++++++++--
>> > 1 files changed, 14 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/sched.c b/kernel/sched.c
>> > index 9ab3cd7..c0aedac 100644
>> > --- a/kernel/sched.c
>> > +++ b/kernel/sched.c
>> > @@ -3064,7 +3064,8 @@ void calc_global_load(void)
>> > */
>> > static void calc_load_account_active(struct rq *this_rq)
>> > {
>> > - long nr_active, delta;
>> > + static atomic_long_t deferred;
>> > + long nr_active, delta, deferred_delta;
>> >
>> > nr_active = this_rq->nr_running;
>> > nr_active += (long) this_rq->nr_uninterruptible;
>> > @@ -3072,6 +3073,17 @@ static void calc_load_account_active(struct rq *this_rq)
>> > if (nr_active != this_rq->calc_load_active) {
>> > delta = nr_active - this_rq->calc_load_active;
>> > this_rq->calc_load_active = nr_active;
>> > +
>> > + /* Need to defer idle accounting during load update period: */
>> > + if (unlikely(time_before(jiffies, this_rq->calc_load_update) &&
>> > + time_after_eq(jiffies, calc_load_update))) {
>> > + atomic_long_add(delta, &deferred);
>> > + return;
>> > + }
>>
>> That seems a sensible way to avoid the gross-looking "10 ticks" thing.
>>
>> What was the reason for "update the avenrun load estimates 10 ticks
>> after the CPUs have updated calc_load_tasks"? Can we do something
>> smarter there to fix this?
>
> The reason was to make sure that all cpus have updated their stuff
> halfway before we start to calculate, and we spread the updates out
> among cpus to avoid contention on those global atomic variables.
>
> Yes, we can do something smarter. First thing is to fix the
> nr_uninterruptible accounting which can become negative. Peter looked
> into that already and we really need to fix this first before doing
> anything smart about that load avg stuff.
I noticed this too, and figured it was some clever accounting :). If
this is a real bug, I'd be happy to take a look at it as well.
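My guess at the "clever accounting", in case it's useful: a task bumps
nr_uninterruptible on the runqueue it blocks on, but the wakeup path
decrements whichever runqueue the task gets woken onto, so an individual
cpu's counter can dip below zero even though the sum across all cpus
stays consistent. A toy model of that (plain userspace C, my own names,
not kernel code):

/* Toy model of per-runqueue nr_uninterruptible skew -- illustrative
 * only; nothing below is actual kernel code. */
#include <stdio.h>

int main(void)
{
	long nr_uninterruptible[2] = { 0, 0 };

	nr_uninterruptible[0]++;	/* task enters D state on cpu0     */
	nr_uninterruptible[1]--;	/* wakeup lands on cpu1's runqueue */

	printf("cpu0=%ld cpu1=%ld sum=%ld\n",
	       nr_uninterruptible[0], nr_uninterruptible[1],
	       nr_uninterruptible[0] + nr_uninterruptible[1]);
	/* prints: cpu0=1 cpu1=-1 sum=0 */
	return 0;
}

If what you mean is the summed count itself going negative, then I've
misread it and the above isn't the whole story.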
What do you have in mind as a smarter way of fixing this issue, and
why is it dependent on fixing the absolute values of each runqueue's
nr_uninterruptible count?
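For reference while we hash that out, here's the arithmetic that makes
the symptom so visible. avenrun[] is a fixed-point exponential average
fed from calc_load_tasks, so a sample window that happens to see zero
tasks knocks roughly 8% off the 1-minute figure, and a run of zero
samples walks it down to 0.00. A quick userspace rendition (constants
copied from include/linux/sched.h, the rest is just a sketch):

#include <stdio.h>

#define FSHIFT	11			/* bits of fixed-point precision  */
#define FIXED_1	(1UL << FSHIFT)		/* 1.0 in fixed-point             */
#define EXP_1	1884			/* 1-min decay: 2048/exp(5/60)    */

/* same update the kernel applies to avenrun[] every LOAD_FREQ (~5s) */
static unsigned long calc_load(unsigned long load, unsigned long exp,
			       unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	return load >> FSHIFT;
}

int main(void)
{
	unsigned long avenrun = 2 * FIXED_1;	/* pretend loadavg is 2.00 */
	int i;

	for (i = 1; i <= 12; i++) {		/* ~1 minute of zero samples */
		avenrun = calc_load(avenrun, EXP_1, 0);
		printf("sample %2d: %lu.%02lu\n", i, avenrun >> FSHIFT,
		       (avenrun & (FIXED_1 - 1)) * 100 / FIXED_1);
	}
	return 0;
}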
-- Chase