Message-ID: <1305168014.2373.7.camel@sli10-conroe>
Date:	Thu, 12 May 2011 10:40:14 +0800
From:	Shaohua Li <shaohua.li@...el.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"tj@...nel.org" <tj@...nel.org>,
	"eric.dumazet@...il.com" <eric.dumazet@...il.com>,
	"cl@...ux.com" <cl@...ux.com>,
	"npiggin@...nel.dk" <npiggin@...nel.dk>
Subject: Re: [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP

On Wed, 2011-05-11 at 17:34 +0800, Andrew Morton wrote:
> On Wed, 11 May 2011 16:10:16 +0800 Shaohua Li <shaohua.li@...el.com> wrote:
> 
> > Once the lglock protects the percpu data, the percpu_counter global lock
> > only protects updates to fbc->count, so switch the counter to atomic64,
> > which is cheaper than a spinlock. This doesn't slow down the fast path
> > (percpu_counter_read): on a 64-bit system atomic64_read is a plain read
> > of fbc->count, and on a 32-bit system it is equivalent to
> > spin_lock-read-spin_unlock.
> > 
> > Note: originally percpu_counter_read on 32-bit systems did not hold the
> > spinlock, but that is buggy and can return a badly torn value. This
> > patch fixes that issue.
> > 
> > This can also improve workloads where percpu_counter->lock is heavily
> > contended; for example, vm_committed_as sometimes causes such contention.
> > We should tune the batch count, but if we can make percpu_counter better,
> > why not? On a 24-CPU system, 24 processes each run:
> > void *p;
> > while (1) {
> > 	p = mmap(NULL, 128 << 20, PROT_READ | PROT_WRITE,
> > 		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > 	munmap(p, 128 << 20);
> > }
> > and we measure how many loop iterations each process completes:
> > orig:    1226976
> > patched: 6727264
> > The atomic method is about 5x~6x faster.
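(To make the scheme above concrete, here is a minimal sketch of the
idea; it is not the actual patch: the lglock protection of the percpu
counters from earlier in the series is omitted and preemption handling
is simplified.)

struct percpu_counter {
	atomic64_t count;		/* was: spinlock_t lock + s64 count */
	s32 __percpu *counters;
};

static inline s64 percpu_counter_read(struct percpu_counter *fbc)
{
	/* 64-bit: a plain load; 32-bit: the generic atomic64
	 * implementation takes an internal spinlock. */
	return atomic64_read(&fbc->count);
}

void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count;

	preempt_disable();
	count = __this_cpu_read(*fbc->counters) + amount;
	if (count >= batch || count <= -batch) {
		/* fold the batched percpu delta into the global count */
		atomic64_add(count, &fbc->count);
		__this_cpu_write(*fbc->counters, 0);
	} else {
		__this_cpu_write(*fbc->counters, count);
	}
	preempt_enable();
}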
> 
> How much slower did percpu_counter_sum() become?
I ran a stress test: 23 CPUs run _add while one CPU runs _sum.
In both cases (_add fast path, which doesn't take the lock, and _add
slow path, which does), _sum becomes about 2.4x slower. That is not too
bad, and _sum isn't called frequently anyway.
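
For reference, under this scheme _sum still has to walk every CPU's
counter. A rough sketch (not the actual patch; presumably the real code
also takes the per-cpu locks around these reads, which is where the
extra cost comes from):

s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
	s64 ret = atomic64_read(&fbc->count);
	int cpu;

	for_each_online_cpu(cpu) {
		/* percpu locking omitted in this sketch */
		ret += *per_cpu_ptr(fbc->counters, cpu);
	}
	return ret;
}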

Thanks,
Shaohua

