linux-kernel - [PATCH 38/46] fs: prefetch inode data in dcache lookup


Date:	Sat, 27 Nov 2010 20:45:08 +1100
From:	Nick Piggin <npiggin@...nel.dk>
To:	linux-fsdevel@...r.kernel.org
Cc:	linux-kernel@...r.kernel.org
Subject: [PATCH 38/46] fs: prefetch inode data in dcache lookup

This gains another 5% or so on the cached git diff workload by
prefetching the important first cacheline of the inode while
we do the actual name compare and other operations on the dentry.

There was no measurable slowdown in the single file stat case, or
the creat case (where negative dentries would be common). (Actually
there was about a 5 nanosecond speedup in these cases, but I can't
say it is significant.)

Workload is 100 git diffs in sequence:
real		user		sys

vanilla single thread
0m9.753s	0m1.860s 	0m7.230s
0m9.752s	0m1.960s 	0m7.270s
0m9.754s	0m1.870s 	0m7.290s
0m9.749s	0m1.910s 	0m7.330s
0m9.750s	0m2.110s 	0m7.060s

scale single thread
0m7.678s	0m1.990s 	0m5.090s
0m7.682s	0m2.090s 	0m5.000s
0m7.681s	0m1.970s 	0m5.100s
0m7.679s	0m1.810s 	0m5.280s
0m7.679s	0m1.970s 	0m5.100s

The single threaded case has about 25% higher throughput (real time).
The kernel's own throughput, going by system time, is increased by
about 45%. This is incredibly significant for a single threaded
performance increase in core kernel code in 2010.

vanilla multi thread (preloadindex=true)
0m6.517s	0m1.430s 	0m20.200s
0m6.514s	0m1.360s	0m20.230s
0m6.521s	0m1.410s 	0m20.090s
0m6.519s	0m1.410s 	0m20.060s
0m6.521s	0m1.610s 	0m20.140s

scale multi thread (preloadindex=true)
0m3.301s	0m0.840s 	0m3.300s
0m3.304s	0m0.940s 	0m3.320s
0m3.291s	0m0.930s 	0m3.170s
0m3.292s	0m0.900s 	0m3.230s
0m3.277s	0m0.770s 	0m3.230s

Parallel case throughput is very nearly doubled, despite git being
unable to produce enough work to keep all CPUs busy (118% CPU used
over the duration of the test). System time shows that scalability
of path walk has already turned to shit in the vanilla kernel.

Signed-off-by: Nick Piggin <npiggin@...nel.dk>
---
 fs/dcache.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 58faf37..fa6e7a5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1658,6 +1658,9 @@ seqretry:
 		tlen = dentry->d_name.len;
 		tname = dentry->d_name.name;
 		i = dentry->d_inode;
+		prefetch(tname);
+		if (i)
+			prefetch(i);
 		/*
 		 * This seqcount check is required to ensure name and
 		 * len are loaded atomically, so as not to walk off the
-- 
1.7.1
