Message-ID: <000101c6ee55$c3e121a0$db34030a@amr.corp.intel.com>
Date:	Thu, 12 Oct 2006 16:25:54 -0700
From:	"Chen, Kenneth W" <kenneth.w.chen@...el.com>
To:	"'Zach Brown'" <zach.brown@...cle.com>,
	"'Suparna Bhattacharya'" <suparna@...ibm.com>,
	"Lahaise, Benjamin C" <benjamin.c.lahaise@...el.com>
Cc:	<linux-kernel@...r.kernel.org>, "linux-aio" <linux-aio@...ck.org>
Subject: [patch] rearrange kioctx and aio_ring_info

The following patch is the result of a kernel data latency analysis we did
on a large database setup (128 logical CPUs, 1 TB of memory, a couple
thousand disks).  The analysis was done at the individual cache line level
using the hardware performance monitoring unit.  Essentially, we collected
the data access latency for every kernel memory address and looked for the
50 cache lines with the longest average access latency.  The hardware
performance unit also records the instruction address for each collected
data access, which lets us associate each of the top 50 cache lines with
the instruction sequences that caused the long latencies.
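
The post-processing side of that is nothing fancy.  Here is a toy C sketch
of the idea, aggregating samples by 64-byte line and sorting by average
latency; the names and sample data are made up for illustration, the real
samples come from the PMU:

#include <stdio.h>
#include <stdlib.h>

#define MAX_LINES	16
#define LINE(a)		((a) & ~63UL)	/* 64-byte cache line base */

struct sample { unsigned long addr, cycles; };
struct bucket { unsigned long line, total, count; };

static int cmp_avg(const void *a, const void *b)
{
	const struct bucket *x = a, *y = b;
	double ax = (double)x->total / x->count;
	double ay = (double)y->total / y->count;
	return (ay > ax) - (ay < ax);	/* descending by average latency */
}

int main(void)
{
	/* made-up (data address, access latency in cycles) samples */
	struct sample s[] = {
		{ 0x1000, 300 }, { 0x1008, 280 }, { 0x1040, 40 },
		{ 0x2000, 35 },  { 0x2010, 600 }, { 0x2010, 580 },
	};
	struct bucket b[MAX_LINES] = {{ 0 }};
	size_t i, j, nb = 0;

	for (i = 0; i < sizeof(s) / sizeof(s[0]); i++) {
		for (j = 0; j < nb && b[j].line != LINE(s[i].addr); j++)
			;
		if (j == nb)
			b[nb++].line = LINE(s[i].addr);
		b[j].total += s[i].cycles;
		b[j].count++;
	}
	qsort(b, nb, sizeof(b[0]), cmp_avg);
	for (i = 0; i < nb; i++)	/* the real analysis keeps the top 50 */
		printf("line 0x%lx: avg %lu cycles over %lu accesses\n",
		       b[i].line, b[i].total / b[i].count, b[i].count);
	return 0;
}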

One structure that stood out from the crowd is the AIO kioctx structure.  In
particular, kioctx->users and kioctx->ctx_lock are the two hottest variables
within one cache line (there are multiple such cache lines, since the db
app uses multiple kioctx).  What appears to be an even more serious cache
line conflict is between the io_submit_one and aio_complete paths:
aio_complete() finishes processing an iocb, unlocks ctx->ctx_lock (which
implies it has just lost ownership of that cache line), then turns around
and accesses wait->task_list and also does an atomic update to ctx->users.
I think that causes the cache line to bounce back and forth whenever another
process takes ctx_lock in between.
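
To make the pattern concrete, here is a minimal userspace C sketch of that
access sequence (not the kernel code; the struct, field names, and pthread
spinlock are just stand-ins for ctx->ctx_lock and ctx->users):

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the hot part of kioctx: the lock and the refcount sit in
 * the same 64-byte cache line, so the atomic decrement done right after
 * the unlock pulls the line back in exclusive state, stealing it from
 * whichever CPU grabbed the lock in the meantime. */
struct ctx_hot {
	pthread_spinlock_t	lock;	/* plays the role of ctx->ctx_lock */
	atomic_int		users;	/* plays the role of ctx->users    */
};

static struct ctx_hot ctx;

static void complete_one(void)
{
	pthread_spin_lock(&ctx.lock);
	/* ... finish processing one iocb ... */
	pthread_spin_unlock(&ctx.lock);	 /* line ownership can move away...  */
	atomic_fetch_sub(&ctx.users, 1); /* ...and is yanked right back here */
}

int main(void)
{
	pthread_spin_init(&ctx.lock, PTHREAD_PROCESS_PRIVATE);
	atomic_init(&ctx.users, 2);
	complete_one();
	printf("lock/users offsets: %zu/%zu (same 64-byte line)\n",
	       offsetof(struct ctx_hot, lock),
	       offsetof(struct ctx_hot, users));
	return 0;
}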

So here we go: I'd like to take this opportunity to rearrange kioctx,
putting variables that are used together in close proximity.  The
rearrangement doesn't change the number of cache lines touched in
io_submit_one or io_getevents; however, it saves one cache line in the
aio_complete path.  I also think we ought to watch out for updates to
variables that share a cache line with ctx_lock.
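
A quick way to double-check a layout like this is to print which line each
field lands on with offsetof.  A rough, pahole-style sketch (it assumes
64-byte lines, and the types are simplified stand-ins, so the real kernel
offsets will differ):

#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for the rearranged kioctx; the stand-in types only
 * approximate the kernel ones, so this shows the technique, not the real
 * offsets. */
struct kioctx_demo {
	int		users;			/* atomic_t */
	int		dead;
	long		wait[3];		/* wait_queue_head_t */
	int		ctx_lock;		/* spinlock_t */
	int		reqs_active;
	void		*active_reqs[2];	/* struct list_head */
	void		*run_list[2];		/* struct list_head */
};

#define LINE_OF(field)	(offsetof(struct kioctx_demo, field) / 64)

int main(void)
{
	printf("users       -> cache line %zu\n", LINE_OF(users));
	printf("ctx_lock    -> cache line %zu\n", LINE_OF(ctx_lock));
	printf("reqs_active -> cache line %zu\n", LINE_OF(reqs_active));
	return 0;
}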

Thoughts?  Comments?  Flames?


Signed-off-by: Ken Chen <kenneth.w.chen@...el.com>


diff -Nurp linux-2.6.18/include/linux/aio.h linux-2.6.18.ken/include/linux/aio.h
--- linux-2.6.18/include/linux/aio.h	2006-09-19 20:42:06.000000000 -0700
+++ linux-2.6.18.ken/include/linux/aio.h	2006-10-12 13:28:17.000000000 -0700
@@ -156,41 +156,38 @@ struct aio_ring {
 
 #define AIO_RING_PAGES	8
 struct aio_ring_info {
-	unsigned long		mmap_base;
-	unsigned long		mmap_size;
-
-	struct page		**ring_pages;
 	spinlock_t		ring_lock;
-	long			nr_pages;
-
+	int			nr_pages;
 	unsigned		nr, tail;
 
+	struct page		**ring_pages;
 	struct page		*internal_pages[AIO_RING_PAGES];
+
+	unsigned long		mmap_base;
+	unsigned long		mmap_size;
 };
 
 struct kioctx {
 	atomic_t		users;
 	int			dead;
-	struct mm_struct	*mm;
-
-	/* This needs improving */
-	unsigned long		user_id;
-	struct kioctx		*next;
-
 	wait_queue_head_t	wait;
 
 	spinlock_t		ctx_lock;
-
 	int			reqs_active;
 	struct list_head	active_reqs;	/* used for cancellation */
 	struct list_head	run_list;	/* used for kicked reqs */
 
-	/* sys_io_setup currently limits this to an unsigned int */
-	unsigned		max_reqs;
-
 	struct aio_ring_info	ring_info;
 
+	/* This needs improving */
+	unsigned long		user_id;
+	struct kioctx		*next;
+	struct mm_struct	*mm;
+
 	struct work_struct	wq;
+
+	/* sys_io_setup currently limits this to an unsigned int */
+	unsigned		max_reqs;
 };
 
 /* prototypes */
-
