Date:	Fri, 22 Jul 2016 16:35:56 -0400
From:	Waiman Long <Waiman.Long@....com>
To:	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.com>,
	Jeff Layton <jlayton@...chiereds.net>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Tejun Heo <tj@...nel.org>,
	Christoph Lameter <cl@...ux-foundation.org>
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Chinner <dchinner@...hat.com>,
	Boqun Feng <boqun.feng@...il.com>,
	Scott J Norton <scott.norton@....com>,
	Douglas Hatch <doug.hatch@....com>,
	Waiman Long <Waiman.Long@....com>
Subject: [PATCH v4 5/5] lib/dlock-list: Allow cacheline alignment of percpu head

Christoph Lameter raised the concern that the spinlock in the
dlock_list_head_percpu structure may cause undesirable cacheline
contention in the percpu area, which normally should not see
contention of this kind.
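
For context, a minimal sketch of the false sharing being described; the
variable names below are illustrative only and not from the kernel source:

	/* Both items may end up sharing a cacheline in the percpu area. */
	DEFINE_PER_CPU(spinlock_t, remote_lock);   /* spun on by other CPUs */
	DEFINE_PER_CPU(unsigned long, local_stat); /* touched only locally  */

	/*
	 * Whenever another CPU acquires remote_lock, the shared cacheline
	 * bounces between CPUs, slowing down the purely local updates of
	 * local_stat even though local_stat itself is never contended.
	 */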

This patch addresses the issue by adding an option to force the
dlock_list_head_percpu structure to be cacheline aligned, so that
contention on the spinlock does not affect nearby percpu data items.
Cacheline alignment is then requested when alloc_dlock_list_head() is
called by alloc_super() in fs/super.c.
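
A minimal usage sketch of the changed interface; the dlist variable and
the error handling here are illustrative, not taken from the patch:

	struct dlock_list_head dlist;

	/* A non-zero align flag pads each percpu list head to a cacheline. */
	if (alloc_dlock_list_head(&dlist, 1))
		return -ENOMEM;
	...
	free_dlock_list_head(&dlist);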

Reported-by: Christoph Lameter <cl@...ux.com>
Signed-off-by: Waiman Long <Waiman.Long@....com>
---
 fs/super.c                 |    2 +-
 include/linux/dlock-list.h |    2 +-
 lib/dlock-list.c           |   20 ++++++++++++++++++--
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 4c33204..39f2214 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -206,7 +206,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	mutex_init(&s->s_sync_lock);
 
-	if (alloc_dlock_list_head(&s->s_inodes))
+	if (alloc_dlock_list_head(&s->s_inodes, 1))
 		goto fail;
 	if (list_lru_init_memcg(&s->s_dentry_lru))
 		goto fail;
diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
index ceb4228..f0a0b2a 100644
--- a/include/linux/dlock-list.h
+++ b/include/linux/dlock-list.h
@@ -127,7 +127,7 @@ static inline void dlock_list_relock(struct dlock_list_iter *iter)
 /*
  * Allocation and freeing of dlock list
  */
-extern int  alloc_dlock_list_head(struct dlock_list_head *dlist);
+extern int alloc_dlock_list_head(struct dlock_list_head *dlist, int align);
 extern void free_dlock_list_head(struct dlock_list_head *dlist);
 
 /*
diff --git a/lib/dlock-list.c b/lib/dlock-list.c
index 54006dc..f117d11 100644
--- a/lib/dlock-list.c
+++ b/lib/dlock-list.c
@@ -26,22 +26,38 @@
  */
 static struct lock_class_key dlock_list_key;
 
+struct dlock_list_head_percpu_caligned {
+	struct dlock_list_head_percpu head;
+} ____cacheline_aligned_in_smp;
+
 /**
  * alloc_dlock_list_head - Initialize and allocate the per-cpu list head
  * @dlist: Pointer to the dlock_list_head structure to be initialized
+ * @align: A boolean flag for cacheline alignment
  * Return: 0 if successful, -ENOMEM if memory allocation error
  *
  * This function does not allocate the dlock_list_head structure itself. The
  * callers will have to do their own memory allocation, if necessary. However,
  * this allows embedding the dlock_list_head structure directly into other
  * structures.
+ *
+ * As the percpu spinlocks can be acquired remotely from other CPUs, they
+ * may have a performance impact on other percpu data items residing in the
+ * same cacheline as a spinlock. This impact can be avoided by setting the
+ * align flag to force cacheline alignment of the percpu head structure, at
+ * the expense of some wasted memory.
  */
-int alloc_dlock_list_head(struct dlock_list_head *dlist)
+int alloc_dlock_list_head(struct dlock_list_head *dlist, int align)
 {
 	struct dlock_list_head dlist_tmp;
 	int cpu;
 
-	dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
+	if (align)
+		dlist_tmp.head = (struct dlock_list_head_percpu __percpu *)
+			alloc_percpu(struct dlock_list_head_percpu_caligned);
+	else
+		dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
+
 	if (!dlist_tmp.head)
 		return -ENOMEM;
 
-- 
1.7.1
