Message-ID: <20090716083948.GA2950@kryten>
Date: Thu, 16 Jul 2009 18:39:48 +1000
From: Anton Blanchard <anton@...ba.org>
To: Bharata B Rao <bharata@...ux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Ingo Molnar <mingo@...e.hu>,
Balbir Singh <balbir@...ux.vnet.ibm.com>, mingo@...hat.com,
hpa@...or.com, linux-kernel@...r.kernel.org,
a.p.zijlstra@...llo.nl, schwidefsky@...ibm.com,
balajirrao@...il.com, dhaval@...ux.vnet.ibm.com,
tglx@...utronix.de, kamezawa.hiroyu@...fujitsu.com,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter
batch values for stats counters
Hi,
> On ppc64, calling jiffies_to_cputime() from sched_init() is too early because
> jiffies_to_cputime() needs tb_ticks_per_sec which gets initialized only
> later in time_init(). Because of this, I see that cpuacct_batch will always
> be zero, effectively negating what this patch is trying to do.
>
> As explained by you earlier, we too are finding the default batch value to
> be too low for ppc64 with VIRT_CPU_ACCOUNTING turned on. Hence I guess
> if this patch is taken in (of course with the above issue fixed), it will
> benefit ppc64 also.
I created this patch earlier today when I hit the problem. Thoughts?
Anton
--
When CONFIG_VIRT_CPU_ACCOUNTING is enabled, cpuacct_update_stats can be
called with values much larger than percpu_counter_batch. Every call to
percpu_counter_add then exceeds the batch and adds to the global count,
which is protected by a spinlock.
Since reading the CPU accounting cgroup counters is not performance
critical, we can use the maximum batch size of INT_MAX on the update side
and percpu_counter_sum on the read side, which adds up all the percpu
counters.
With this patch, an 8 core POWER6 with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT shows the aggregate context switch rate improve from
397k/sec to 3.9M/sec, about a 10x improvement.
Signed-off-by: Anton Blanchard <anton@...ba.org>
---
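The batching behaviour described in the changelog can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not the
kernel implementation: struct model_counter, model_add, model_read, model_sum
and NR_CPUS are hypothetical names, there is no real locking or per-CPU
storage, and the global_updates field exists only to count how often the
would-be spinlocked path is taken.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical CPU count for this model */

/* Simplified userspace model of a percpu counter; not the kernel's struct. */
struct model_counter {
	int64_t count;			/* global count (spinlock-protected in the kernel) */
	int32_t local[NR_CPUS];		/* per-CPU deltas not yet folded into count */
	unsigned long global_updates;	/* times the "locked" slow path was taken */
};

/* Add amount on one CPU; fold into the global count once |delta| reaches batch. */
void model_add(struct model_counter *c, int cpu, int64_t amount, int32_t batch)
{
	int64_t delta;

	assert(cpu >= 0 && cpu < NR_CPUS);
	delta = c->local[cpu] + amount;

	if (delta >= batch || delta <= -(int64_t)batch) {
		c->count += delta;		/* slow path: would take the spinlock */
		c->local[cpu] = 0;
		c->global_updates++;
	} else {
		c->local[cpu] = (int32_t)delta;	/* fast path: CPU-local update only */
	}
}

/* Cheap read: global count only, ignores pending per-CPU deltas. */
int64_t model_read(const struct model_counter *c)
{
	return c->count;
}

/* Accurate read: global count plus every per-CPU delta. */
int64_t model_sum(const struct model_counter *c)
{
	int64_t sum = c->count;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

In this model, adding an amount larger than a small batch (32 is used purely
for illustration) takes the slow path on every call. With a batch of INT_MAX
the adds stay CPU-local, model_read can lag arbitrarily far behind, and only
model_sum returns the full total, which is why the read side has to change
along with the update side.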
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c 2009-07-16 10:11:02.000000000 +1000
+++ linux.trees.git/kernel/sched.c 2009-07-16 10:16:41.000000000 +1000
@@ -10551,7 +10551,7 @@
 	int i;
 
 	for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
-		s64 val = percpu_counter_read(&ca->cpustat[i]);
+		s64 val = percpu_counter_sum(&ca->cpustat[i]);
 		val = cputime64_to_clock_t(val);
 		cb->fill(cb, cpuacct_stat_desc[i], val);
 	}
@@ -10621,7 +10621,7 @@
 	ca = task_ca(tsk);
 
 	do {
-		percpu_counter_add(&ca->cpustat[idx], val);
+		__percpu_counter_add(&ca->cpustat[idx], val, INT_MAX);
 		ca = ca->parent;
 	} while (ca);
 	rcu_read_unlock();