Date:	Tue, 23 Feb 2016 00:59:57 +0100
From:	Rasmus Villemoes <linux@...musvillemoes.dk>
To:	Theodore Ts'o <tytso@....edu>
Cc:	Al Viro <viro@...IV.linux.org.uk>,
	Jeff Layton <jlayton@...chiereds.net>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] vfs: make sure struct filename->iname is word-aligned

On Fri, Feb 19 2016, Theodore Ts'o <tytso@....edu> wrote:

> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>> 
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?

No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "aaaa..." string in a loop. 'before' is 4.4.2
with a few unrelated patches, 'after' is that plus 1/2 and 2/2. In
perf_x_y, x is the length of the "aaa..." string and y is its alignment
mod 8 in userspace.
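
For concreteness, a harness for this kind of test boils down to
something like the sketch below (just an illustration of the setup;
the file name, argument handling and the exact loop are made up, not
necessarily the code behind the numbers that follow):

/*
 * lstat_bench.c - sketch of an lstat() microbenchmark: put an
 * "aaa..." path of a given length at a given alignment mod 8 in
 * userspace and lstat() it in a tight loop, then look at
 * strncpy_from_user in perf report.  The target needn't exist;
 * getname() copies the string from userspace before the lookup
 * either way.
 *
 * Usage: ./lstat_bench <len> <align_mod_8> <iterations>
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <len> <align_mod_8> <iters>\n",
			argv[0]);
		return 1;
	}
	size_t len = strtoul(argv[1], NULL, 0);
	size_t align = strtoul(argv[2], NULL, 0) % 8;
	unsigned long iters = strtoul(argv[3], NULL, 0);

	/* Over-allocate so we can pick the starting offset mod 8. */
	char *buf = malloc(len + 16);
	if (!buf)
		return 1;
	char *name = buf + (8 - (uintptr_t)buf % 8) % 8 + align;
	memset(name, 'a', len);
	name[len] = '\0';

	struct stat st;
	for (unsigned long i = 0; i < iters; i++)
		lstat(name, &st);

	free(buf);
	return 0;
}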

$ grep strncpy_from_user *.report
perf_30_0_after.report:     5.47%  s_f_u    [k] strncpy_from_user            
perf_30_0_before.report:     7.40%  s_f_u    [k] strncpy_from_user         
perf_30_3_after.report:     5.05%  s_f_u    [k] strncpy_from_user             
perf_30_3_before.report:     7.29%  s_f_u    [k] strncpy_from_user          
perf_30_4_after.report:     4.88%  s_f_u    [k] strncpy_from_user            
perf_30_4_before.report:     7.28%  s_f_u    [k] strncpy_from_user        
perf_30_6_after.report:     5.43%  s_f_u    [k] strncpy_from_user            
perf_30_6_before.report:     6.74%  s_f_u    [k] strncpy_from_user        
perf_40_0_after.report:     5.68%  s_f_u    [k] strncpy_from_user            
perf_40_0_before.report:    10.99%  s_f_u    [k] strncpy_from_user        
perf_40_3_after.report:     5.37%  s_f_u    [k] strncpy_from_user            
perf_40_3_before.report:    10.62%  s_f_u    [k] strncpy_from_user        
perf_40_4_after.report:     5.61%  s_f_u    [k] strncpy_from_user            
perf_40_4_before.report:    10.91%  s_f_u    [k] strncpy_from_user        
perf_40_6_after.report:     5.81%  s_f_u    [k] strncpy_from_user            
perf_40_6_before.report:    10.84%  s_f_u    [k] strncpy_from_user          
perf_50_0_after.report:     6.29%  s_f_u    [k] strncpy_from_user            
perf_50_0_before.report:    12.46%  s_f_u    [k] strncpy_from_user        
perf_50_3_after.report:     7.15%  s_f_u    [k] strncpy_from_user            
perf_50_3_before.report:    14.09%  s_f_u    [k] strncpy_from_user                   
perf_50_4_after.report:     7.64%  s_f_u    [k] strncpy_from_user            
perf_50_4_before.report:    14.10%  s_f_u    [k] strncpy_from_user        
perf_50_6_after.report:     7.30%  s_f_u    [k] strncpy_from_user            
perf_50_6_before.report:    14.10%  s_f_u    [k] strncpy_from_user        
perf_60_0_after.report:     6.81%  s_f_u    [k] strncpy_from_user            
perf_60_0_before.report:    13.25%  s_f_u    [k] strncpy_from_user         
perf_60_3_after.report:     9.48%  s_f_u    [k] strncpy_from_user            
perf_60_3_before.report:    13.26%  s_f_u    [k] strncpy_from_user         
perf_60_4_after.report:     9.90%  s_f_u    [k] strncpy_from_user            
perf_60_4_before.report:    15.09%  s_f_u    [k] strncpy_from_user          
perf_60_6_after.report:     9.91%  s_f_u    [k] strncpy_from_user            
perf_60_6_before.report:    13.85%  s_f_u    [k] strncpy_from_user        

So the numbers vary, and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's fair to conclude that there is a
measurable difference.

Note the huge jump from 30_y_before to 40_y_before. I suppose that's
because we do an unaligned store crossing a cache line boundary when the
string is > 32 bytes.
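
To spell out the arithmetic behind that guess: the sketch below assumes
64-byte cache lines, a cache-line-aligned struct filename, and ->iname
at offset 28 before the patches (three pointers plus an int on 64-bit),
moving to offset 32 with 1/2; the offset-8 case is my reading of where
2/2 leaves it, which is also where the 56-byte figure mentioned below
would come from.

/*
 * Word-at-a-time stores into ->iname, and which of them straddle a
 * 64-byte cache line, for the assumed offsets above.  With ->iname at
 * offset 28, the store covering string bytes 32..39 lands at bytes
 * 60..67 and crosses the boundary, which is why the 40-byte cases
 * jump; at offset 32 the stores are aligned and never straddle; at
 * offset 8 everything up to 56 bytes of string stays in the first
 * cache line.
 */
#include <stdio.h>

int main(void)
{
	const unsigned int line = 64;			/* assumed cache-line size */
	const unsigned int offs[] = { 28, 32, 8 };	/* ->iname: pre-patch, after 1/2, after 2/2 (assumed) */

	for (unsigned int i = 0; i < sizeof(offs) / sizeof(offs[0]); i++) {
		unsigned int iname = offs[i];

		printf("->iname at offset %u:\n", iname);
		for (unsigned int off = iname; off < 2 * line; off += 8)
			printf("  store %3u..%3u = string bytes %2u..%2u%s\n",
			       off, off + 7, off - iname, off - iname + 7,
			       off / line != (off + 7) / line ?
			       "  <-- straddles a cache line" : "");
	}
	return 0;
}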

I suppose 2/2 is also responsible for some of the above, since it not
only aligns the kernel-side stores, but also means we stay within a
single cacheline for strings up to 56 bytes. I should measure the effect
of 1/2 by itself, but compiling a kernel takes forever for me, so I
won't get to that tonight.

[It turns out that 32 is the median path length from 'git ls-files' in
the kernel tree, with 33.2 being the mean, so even though I used
relatively long paths above to make strncpy_from_user stand out, such
path lengths are not totally uncommon.]
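
For what it's worth, those figures can be recomputed with a throwaway
helper along the lines of the sketch below (the name pathlen is made
up):

/*
 * pathlen.c - read path names on stdin, one per line, and print the
 * mean and median length, e.g.
 *
 *	git ls-files | ./pathlen
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_size(const void *a, const void *b)
{
	size_t x = *(const size_t *)a, y = *(const size_t *)b;

	return x < y ? -1 : x > y;
}

int main(void)
{
	size_t *lens = NULL, n = 0, cap = 0;
	unsigned long long total = 0;
	char line[4096];

	while (fgets(line, sizeof(line), stdin)) {
		size_t len = strcspn(line, "\n");

		if (n == cap) {
			cap = cap ? 2 * cap : 1024;
			lens = realloc(lens, cap * sizeof(*lens));
			if (!lens)
				return 1;
		}
		lens[n++] = len;
		total += len;
	}
	if (!n)
		return 1;

	qsort(lens, n, sizeof(*lens), cmp_size);
	printf("mean %.1f, median %zu (%zu paths)\n",
	       (double)total / n, lens[n / 2], n);
	free(lens);
	return 0;
}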

> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.

I tried that as well; strncpy_from_user was around 0.5% both before and
after.

Rasmus
