Message-ID: <20160607201340.GL13997@two.firstfloor.org>
Date:	Tue, 7 Jun 2016 13:13:40 -0700
From:	Andi Kleen <andi@...stfloor.org>
To:	Waiman Long <Waiman.Long@....com>
Cc:	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.com>,
	Jeff Layton <jlayton@...chiereds.net>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Tejun Heo <tj@...nel.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Chinner <dchinner@...hat.com>,
	Boqun Feng <boqun.feng@...il.com>,
	Scott J Norton <scott.norton@....com>,
	Douglas Hatch <doug.hatch@....com>
Subject: Re: [RESEND PATCH 1/5] lib/dlock-list: Distributed and
 lock-protected lists

On Tue, Jun 07, 2016 at 03:35:51PM -0400, Waiman Long wrote:
> Linked lists are used everywhere in the Linux kernel. However, if many
> threads try to add or delete entries on the same linked list, it can
> become a performance bottleneck.
> 
> This patch introduces a new list API that provides a set of distributed
> lists (one per CPU), each of which is protected by its own spinlock.
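
For reference, the shape of the structure being proposed is roughly the
following (a simplified sketch based on the description above, not the
exact API from the patch):

	/* One sublist + lock per CPU; insertions go to the local CPU's
	 * sublist, so writers running on different CPUs do not contend.
	 */
	struct dlock_list_head {
		struct list_head list;
		spinlock_t lock;
	};

	struct dlock_list_heads {
		struct dlock_list_head __percpu *heads;
	};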

One thing I don't like is that it is per CPU. One lock per CPU is almost
certainly overkill and not needed for true scalability, especially on
systems using SMT. It also makes the case where everything has to be
walked more and more expensive, because every one of these locks has to
be taken. Even when uncontended, that adds up.
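
To make the cost concrete, a full walk has to visit every per-CPU
sublist and take every lock, something like this (a sketch using the
structure names from above; `dlist' here is a hypothetical
struct dlock_list_heads pointer):

	int cpu;
	struct dlock_list_head *head;

	/* One lock/unlock pair per possible CPU, even for empty
	 * sublists -- on a 256-CPU machine that is 256 spinlock
	 * acquisitions per walk, contended or not.
	 */
	for_each_possible_cpu(cpu) {
		head = per_cpu_ptr(dlist->heads, cpu);
		spin_lock(&head->lock);
		/* ... visit entries on head->list ... */
		spin_unlock(&head->lock);
	}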

It would be better to do this for every Nth CPU instead. I don't have a
clear answer for what the best N is, but I'm pretty sure it's > 1. For
example, at least on SMT systems, one list per core instead of per
thread. Likely even more coarse-grained than that, although per socket
may not be good enough.
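
That is, instead of indexing the sublists by smp_processor_id()
directly, map groups of N CPUs onto one bucket. Something like this
hypothetical helper, with the bucket count fixed at init time:

	/* Map a CPU to one of nr_buckets sublists, nr_buckets <= nr_cpu_ids.
	 * A topology-aware mapping (per core or per socket) would use the
	 * CPU topology masks instead of a plain modulo.
	 */
	static inline int dlock_list_bucket(int cpu, int nr_buckets)
	{
		return cpu % nr_buckets;
	}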

-Andi
