Message-ID: <87y4ac2t4y.fsf@rasmusvillemoes.dk>
Date: Tue, 23 Feb 2016 00:59:57 +0100
From: Rasmus Villemoes <linux@...musvillemoes.dk>
To: Theodore Ts'o <tytso@....edu>
Cc: Al Viro <viro@...IV.linux.org.uk>,
Jeff Layton <jlayton@...chiereds.net>,
"J. Bruce Fields" <bfields@...ldses.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] vfs: make sure struct filename->iname is word-aligned
On Fri, Feb 19 2016, Theodore Ts'o <tytso@....edu> wrote:
> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>>
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?
No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "aaaa..." string in a loop. 'before' is 4.4.2
with a few unrelated patches, 'after' is that plus 1/2 and 2/2. In
perf_x_y, x is the length of the "aaa..." string and y is its alignment
mod 8 in userspace.
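For reference, a loop of this kind can be sketched roughly as follows. This is not the program used for the numbers in this mail, just a minimal illustration under a few assumptions: the helper names make_aligned_path() and lstat_loop() are made up for the example, the path does not need to exist (lstat() then fails with ENOENT, but the kernel still has to copy the path in first), and the buffer is aligned to 8 bytes before the requested offset is applied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: build a string of `len` 'a' characters whose start
 * address is `align` mod 8. The raw allocation is returned through
 * `raw_out` so the caller can free() it.
 */
char *make_aligned_path(size_t len, unsigned align, void **raw_out)
{
	void *raw = NULL;
	char *path;

	assert(align < 8);
	/* 8-byte aligned base, plus room for the offset and the NUL. */
	if (posix_memalign(&raw, 8, len + 8 + 1) != 0)
		return NULL;

	path = (char *)raw + align;
	memset(path, 'a', len);
	path[len] = '\0';
	*raw_out = raw;
	return path;
}

/*
 * Hypothetical helper: call lstat() on the same path `iters` times and
 * return the number of calls made. The result of lstat() is ignored on
 * purpose; only the path copy into the kernel is of interest here.
 */
long lstat_loop(const char *path, long iters)
{
	struct stat st;
	long calls = 0;

	for (long i = 0; i < iters; i++) {
		(void)lstat(path, &st);
		calls++;
	}
	return calls;
}
```

Calling make_aligned_path(30, 3, &raw) followed by lstat_loop(path, N) for a large N would correspond to the perf_30_3 case; the program would then be run under perf so that the share of strncpy_from_user can be read from the report.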
$ grep strncpy_from_user *.report
perf_30_0_after.report: 5.47% s_f_u [k] strncpy_from_user
perf_30_0_before.report: 7.40% s_f_u [k] strncpy_from_user
perf_30_3_after.report: 5.05% s_f_u [k] strncpy_from_user
perf_30_3_before.report: 7.29% s_f_u [k] strncpy_from_user
perf_30_4_after.report: 4.88% s_f_u [k] strncpy_from_user
perf_30_4_before.report: 7.28% s_f_u [k] strncpy_from_user
perf_30_6_after.report: 5.43% s_f_u [k] strncpy_from_user
perf_30_6_before.report: 6.74% s_f_u [k] strncpy_from_user
perf_40_0_after.report: 5.68% s_f_u [k] strncpy_from_user
perf_40_0_before.report: 10.99% s_f_u [k] strncpy_from_user
perf_40_3_after.report: 5.37% s_f_u [k] strncpy_from_user
perf_40_3_before.report: 10.62% s_f_u [k] strncpy_from_user
perf_40_4_after.report: 5.61% s_f_u [k] strncpy_from_user
perf_40_4_before.report: 10.91% s_f_u [k] strncpy_from_user
perf_40_6_after.report: 5.81% s_f_u [k] strncpy_from_user
perf_40_6_before.report: 10.84% s_f_u [k] strncpy_from_user
perf_50_0_after.report: 6.29% s_f_u [k] strncpy_from_user
perf_50_0_before.report: 12.46% s_f_u [k] strncpy_from_user
perf_50_3_after.report: 7.15% s_f_u [k] strncpy_from_user
perf_50_3_before.report: 14.09% s_f_u [k] strncpy_from_user
perf_50_4_after.report: 7.64% s_f_u [k] strncpy_from_user
perf_50_4_before.report: 14.10% s_f_u [k] strncpy_from_user
perf_50_6_after.report: 7.30% s_f_u [k] strncpy_from_user
perf_50_6_before.report: 14.10% s_f_u [k] strncpy_from_user
perf_60_0_after.report: 6.81% s_f_u [k] strncpy_from_user
perf_60_0_before.report: 13.25% s_f_u [k] strncpy_from_user
perf_60_3_after.report: 9.48% s_f_u [k] strncpy_from_user
perf_60_3_before.report: 13.26% s_f_u [k] strncpy_from_user
perf_60_4_after.report: 9.90% s_f_u [k] strncpy_from_user
perf_60_4_before.report: 15.09% s_f_u [k] strncpy_from_user
perf_60_6_after.report: 9.91% s_f_u [k] strncpy_from_user
perf_60_6_before.report: 13.85% s_f_u [k] strncpy_from_user
So the numbers are noisy, and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's fair to conclude that there's a
measurable difference.
Note the huge jump from 30_y_before to 40_y_before. I suppose that's
because we do an unaligned store crossing a cache line boundary when the
string is > 32 bytes.
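The arithmetic behind that guess can be checked with a small sketch. It assumes a 64-byte cache line, 8-byte word stores, a struct filename that itself starts on a cache line boundary, and ->iname at offset 28 before the patch (the offset on 64-bit as far as I can tell; treat that as an assumption). The function name first_straddling_store() is made up for the example.

```c
#include <stddef.h>

#define CACHE_LINE 64u	/* assumed cache line size in bytes */
#define WORD 8u		/* assumed size of one word-at-a-time store */

/*
 * Given the offset of ->iname within a cache-line-aligned struct, return
 * the string offset of the first 8-byte store that straddles a cache line
 * boundary, or -1 if no store does (i.e. ->iname is word-aligned).
 */
long first_straddling_store(size_t iname_offset)
{
	for (size_t off = 0; off < CACHE_LINE; off += WORD) {
		size_t first = iname_offset + off;	/* first byte written */
		size_t last = first + WORD - 1;		/* last byte written */

		if (first / CACHE_LINE != last / CACHE_LINE)
			return (long)off;
	}
	return -1;
}
```

Under those assumptions, first_straddling_store(28) returns 32: the store covering string bytes 32..39 lands on struct bytes 60..67 and so spans two cache lines, which fits the jump between the 30- and 40-byte cases. With ->iname at offset 32 the function returns -1, since every word store is aligned and none can straddle a boundary.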
I suppose 2/2 is also responsible for some of the above, since it not
only aligns the kernel-side stores, but also means we stay within a
single cacheline for strings up to 56 bytes. I should measure the effect
of 1/2 by itself, but compiling a kernel takes forever for me, so I
won't get to that tonight.
[It turns out that 32 is the median length from 'git ls-files' in the
kernel tree, with 33.2 being the mean, so even though I used relatively
long paths above to get strncpy_from_user to stand out, such path
lengths are not totally uncommon.]
> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.
I tried that as well; strncpy_from_user was around 0.5% both before and
after.
Rasmus