Date:	Thu, 21 Mar 2013 22:12:43 +0000
From:	Eric Wong <normalperson@...t.net>
To:	linux-kernel@...r.kernel.org
Cc:	Davide Libenzi <davidel@...ilserver.org>,
	Al Viro <viro@...IV.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	linux-fsdevel@...r.kernel.org
Subject: [RFC v2 3/2] epoll: avoid using extra cache line on most 64-bit

By moving the event field up within struct epitem, we can avoid dirtying
(or even loading) an extra cache line on 64-bit machines with 64-byte
cache lines.  Since EPOLLWAKEUP is rarely used, we also check for the
EPOLLWAKEUP flag before touching the wakeup_source, which avoids reading
a second cache line in the common case.
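
A minimal userspace sketch of the flag check described above, not the
kernel code.  The struct, its padding, and the mock_* names are
hypothetical; only the EPOLLWAKEUP bit value (1 << 29) is taken from the
uapi header.  It assumes the cold pointer is only meaningful when the
flag is set, so skipping the load loses nothing:

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit as EPOLLWAKEUP in the uapi header; the name is a stand-in. */
#define MOCK_EPOLLWAKEUP (1u << 29)

/* Hypothetical item: "events" is hot, "ws" sits past the first 64 bytes. */
struct mock_wake_item {
	uint32_t events;	/* hot: read on every delivery */
	char pad[60];		/* stand-in for the other hot fields */
	void *ws;		/* cold: only useful with the wakeup flag */
};

/*
 * Test the flag in the hot part first; the cold pointer is loaded
 * only when the flag says it can matter.
 */
static inline void *mock_wakeup_source(const struct mock_wake_item *it)
{
	if (!(it->events & MOCK_EPOLLWAKEUP))
		return NULL;
	return it->ws;
}
```

The early return means callers that never set the flag never touch the
memory holding ws.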

This allows ep_send_events to read/write only the first 64 bytes of an
epitem in common cases.
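
The layout idea can be checked at compile time in userspace.  The sketch
below is only an illustration: every mock_* type is a hypothetical
stand-in whose field sizes are assumptions, not the real struct epitem.
It groups the hot fields first and asserts that they end within 64
bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CACHE_LINE 64

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct mock_list_head { struct mock_list_head *next, *prev; };

struct mock_filefd {
	void *file;
	int fd;
} __attribute__((packed));

struct mock_epoll_event {
	uint32_t events;
	uint64_t data;
} __attribute__((packed));

struct mock_item {
	/* hot: assumed to be touched on every event delivery */
	void *tree_node[3];
	void *queue_node;
	struct mock_filefd ffd;
	int nwait;
	int state;
	struct mock_epoll_event event;
	/* cold: assumed to be touched only on setup and teardown */
	struct mock_list_head pwqlist;
	void *ep;
	struct mock_list_head fllink;
	void *ws;
};

/* Number of bytes from the start of the item to the end of "event". */
static inline size_t mock_hot_bytes(void)
{
	return offsetof(struct mock_item, event) +
	       sizeof(struct mock_epoll_event);
}

/* Fail the build if the hot fields spill past the first cache line. */
static_assert(offsetof(struct mock_item, event) +
	      sizeof(struct mock_epoll_event) <= MOCK_CACHE_LINE,
	      "hot fields must fit in one 64-byte cache line");
```

On a real kernel build, a tool such as pahole can show the actual
offsets instead of relying on a mock like this.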

This patch was only made possible by the smaller footprint required
by wfcqueue.

epwbench test timings:

Before (without wfcq at all):
AVG: 5.448400
SIG: 0.003056

Before (with wfcq local):
AVG: 5.532024
SIG: 0.000244

After (this commit):
AVG: 5.331539
SIG: 0.000234

Even with the variability between runs on my KVM guest, I'm confident
this wfcqueue epoll series introduces no performance regressions in the
common single-threaded use cases of epoll.

ref: http://www.xmailserver.org/epwbench.c

Somewhat-tested-by: Eric Wong <normalperson@...t.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Davide Libenzi <davidel@...ilserver.org>
Cc: Al Viro <viro@...IV.linux.org.uk>
Cc: Andrew Morton <akpm@...ux-foundation.org>
---
 fs/eventpoll.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e04175..82bf483 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -155,12 +155,27 @@ struct epitem {
 	/* The file descriptor information this item refers to */
 	struct epoll_filefd ffd;
 
-	/* Number of active wait queue attached to poll operations */
+	/*
+	 * Number of active wait queue attached to poll operations
+	 * This is infrequently used, it pads well here but may be
+	 * removed in the future
+	 */
 	int nwait;
 
 	/* state of this item */
 	enum epoll_item_state state;
 
+	/* The structure that describe the interested events and the source fd */
+	struct epoll_event event;
+
+	/*
+	 * --> 64-byte boundary for 64-bit systems <--
+	 * frequently accessed (read/written) items are above this comment
+	 * infrequently accessed items are below this comment
+	 * Keeping frequently accessed items within the 64-byte boundary
+	 * prevents extra cache line usage on common x86-64 machines
+	 */
+
 	/* List containing poll wait queues */
 	struct list_head pwqlist;
 
@@ -172,9 +187,6 @@ struct epitem {
 
 	/* wakeup_source used when EPOLLWAKEUP is set */
 	struct wakeup_source __rcu *ws;
-
-	/* The structure that describe the interested events and the source fd */
-	struct epoll_event event;
 };
 
 /*
@@ -596,6 +608,13 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
 /* call only when ep->mtx is held */
 static inline struct wakeup_source *ep_wakeup_source(struct epitem *epi)
 {
+	/*
+	 * avoid loading the extra cache line on machines with
+	 * <= 64-byte cache lines
+	 */
+	if (!(epi->event.events & EPOLLWAKEUP))
+		return NULL;
+
 	return rcu_dereference_check(epi->ws, lockdep_is_held(&epi->ep->mtx));
 }
 
-- 
Eric Wong
