Message-ID: <4B5E88EB.4020001@linux.vnet.ibm.com>
Date: Tue, 26 Jan 2010 11:47:15 +0530
From: Balbir Singh <balbir@...ux.vnet.ibm.com>
To: Andrew Morton <akpm@...ux-foundation.org>
CC: Anton Blanchard <anton@...ba.org>,
Bharata B Rao <bharata@...ux.vnet.ibm.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Ingo Molnar <mingo@...e.hu>, mingo@...hat.com, hpa@...or.com,
linux-kernel@...r.kernel.org, a.p.zijlstra@...llo.nl,
schwidefsky@...ibm.com, balajirrao@...il.com,
dhaval@...ux.vnet.ibm.com, tglx@...utronix.de,
kamezawa.hiroyu@...fujitsu.com, Tony Luck <tony.luck@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
Heiko Carstens <heiko.carstens@...ibm.com>, linux390@...ibm.com
Subject: Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values
for stats counters
On Tuesday 26 January 2010 04:44 AM, Andrew Morton wrote:
> On Mon, 18 Jan 2010 15:41:42 +1100
> Anton Blanchard <anton@...ba.org> wrote:
>
>> When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
>> call cpuacct_update_stats with values much larger than percpu_counter_batch.
>> This means the call to percpu_counter_add will always add to the global count
>> which is protected by a spinlock and we end up with a global spinlock in
>> the scheduler.
>
> When one looks at the end result:
>
> : static void cpuacct_update_stats(struct task_struct *tsk,
> : 		enum cpuacct_stat_index idx, cputime_t val)
> : {
> : 	struct cpuacct *ca;
> : 	int batch;
> :
> : 	if (unlikely(!cpuacct_subsys.active))
> : 		return;
> :
> : 	rcu_read_lock();
> : 	ca = task_ca(tsk);
> :
> : 	batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> : 	do {
> : 		__percpu_counter_add(&ca->cpustat[idx], val, batch);
> : 		ca = ca->parent;
> : 	} while (ca);
> : 	rcu_read_unlock();
> : }
>
> the code (which used to be quite obvious) becomes pretty unobvious. In
> fact it looks quite wrong.
>
> Shouldn't there be a comment there explaining wtf is going on?

Andrew,

I guess a lot of the changelog and comments are in the email history,
but your point about the comment is valid. Why does it look quite wrong to you?

cputime_one_jiffy tells us how many cputime_t units we get in one
jiffy. If virtual accounting is enabled, this number is quite large; it is
1 if virtual accounting is not enabled. Overall, the batch value is 32
on systems without virtual accounting. On systems that support
virtual accounting, the batch value is 32 * cputime_one_jiffy, so the
per-cpu counter syncs up roughly once every 32 jiffies, assuming
cpuacct_update_stats is called once per jiffy, as for non-virtual machines.

If the above comment pleases you, I'll polish it and send it across.

Anton, could you please confirm that what I've said above is indeed correct?
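
To make the arithmetic concrete, here is a rough userspace sketch of the
batching rule, not the kernel code itself. The struct and the helper names
(sim_counter, sim_counter_add, scaled_batch, count_global_updates) are
hypothetical, and the figure of 10000 cputime_t units per jiffy used in the
note below is only an assumed illustration of "quite large".

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical userspace model of one CPU's view of a percpu counter. */
struct sim_counter {
	int64_t global;       /* shared count; spinlock-protected in the kernel */
	int64_t local;        /* this CPU's delta not yet folded into global */
	long global_updates;  /* number of times the shared count was touched */
};

/*
 * Batching rule: accumulate locally and fold into the shared count only
 * once the local delta reaches the batch size in either direction.
 */
static void sim_counter_add(struct sim_counter *c, int64_t val, int32_t batch)
{
	int64_t count = c->local + val;

	if (count >= batch || count <= -batch) {
		c->global += count;
		c->local = 0;
		c->global_updates++;
	} else {
		c->local = count;
	}
}

/* Scale the base batch by cputime units per jiffy, clamped to INT_MAX. */
static int32_t scaled_batch(int64_t base_batch, int64_t cputime_one_jiffy)
{
	int64_t b = base_batch * cputime_one_jiffy;

	return b > INT_MAX ? INT_MAX : (int32_t)b;
}

/* Simulate `ticks` updates, each adding one jiffy's worth of cputime. */
static long count_global_updates(int ticks, int64_t cputime_one_jiffy,
				 int32_t batch)
{
	struct sim_counter c = { 0, 0, 0 };
	int i;

	for (i = 0; i < ticks; i++)
		sim_counter_add(&c, cputime_one_jiffy, batch);
	return c.global_updates;
}
```

With the assumed 10000 units per jiffy and an unscaled batch of 32,
count_global_updates(320, 10000, 32) touches the shared count on all 320
updates. With scaled_batch(32, 10000), which is 320000, the same 320 updates
touch it only 10 times, which matches the non-virtual case of
count_global_updates(320, 1, 32).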
--
Three Cheers,
Balbir Singh