lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20090428163509.EBCF.A69D9226@jp.fujitsu.com>
Date:	Tue, 28 Apr 2009 16:38:51 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	bharata@...ux.vnet.ibm.com
Cc:	kosaki.motohiro@...fujitsu.com,
	LKML <linux-kernel@...r.kernel.org>,
	Balaji Rao <balajirrao@...il.com>,
	Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	seto.hidetoshi@...fujitsu.com
Subject: Re: [PATCH] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count

> On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote:
> > 
> > I'm not cpuacct expert. please give me comment.
> > 
> > ====================
> > Subject: [PATCH] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime caching
> > 
> > impact: little performance improvement
> > 
> > cpuacct_update_stats() is called at every tick updating. and it use percpu_counter
> > for avoiding performance degression.
> > 
> > Unfortunately, it doesn't works on VIRT_CPU_ACCOUNTING=y environment properly.
> > if VIRT_CPU_ACCOUNTING=y, every tick update much than 1000 cputime.
> > Thus every percpu_counter_add() makes spinlock grabbing and update non-percpu-variable.
> > 
> > This patch change the batch rule. now, every cpu can store "percpu_counter_bach x jiffies"
> > cputime in percpu cache.
> > it mean this patch don't have behavior change if VIRT_CPU_ACCOUNTING=n, but
> > works well on VIRT_CPU_ACCOUNTING=y too.
> 
> Let me try to understand what you are saying...
> 
> For archs which define VIRT_CPU_ACCOUNTING, every tick would result
> in around 1000 units of cputime updates and since this is much much greater
> than percpu_batch_counter, we end up taking spinlock on every tick.

Correct.

> 
> If my above reading of the problem is correct, please look at my below comments.
> 
> > Index: b/kernel/sched.c
> > ===================================================================
> > --- a/kernel/sched.c	2009-04-28 14:18:36.000000000 +0900
> > +++ b/kernel/sched.c	2009-04-28 15:18:07.000000000 +0900
> > @@ -10117,6 +10117,7 @@ struct cpuacct {
> >  };
> > 
> >  struct cgroup_subsys cpuacct_subsys;
> > +static s32 cpuacct_batch;
> > 
> >  /* return cpu accounting group corresponding to this container */
> >  static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> > @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
> >  	if (!ca->cpuusage)
> >  		goto out_free_ca;
> > 
> > +	if (!cpuacct_batch)
> > +		cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> 
> You essentially end up increasing the batch value from the default value
> of max(32, nr_cpus*2). 
> 
> > +
> >  	for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
> >  		if (percpu_counter_init(&ca->cpustat[i], 0))
> >  			goto out_free_counters;
> > @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct 
> >  	ca = task_ca(tsk);
> > 
> >  	do {
> > -		percpu_counter_add(&ca->cpustat[idx], val);
> > +		__percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
> 
> And you do this unconditionally which will affect all archs ? So you make
> this behaviour default for archs which have VIRT_CPU_ACCOUNTING=n.

if VIRT_CPU_ACCOUNTING=n, jiffies_to_cputime(n) return n. 
IOW, jiffies_to_cputime() don't change value.

and percpu_counter() use percpu_counter_batch as batch.

Then, if VIRT_CPU_ACCOUNTING=n, this patch don't change behavior.



> BTW, did you observe any real problem with the percpu counter spinlock ?

No.
I review percpu_counter() caller recently and it seems stragen usage.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ