lists.openwall.net
Open Source and information security mailing list archives
 
Date:	Fri, 13 Nov 2009 09:10:37 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	Pavel Machek <pavel@....cz>, "Ma, Ling" <ling.ma@...el.com>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by
 fast string.


* H. Peter Anvin <hpa@...or.com> wrote:

> On 11/12/2009 11:33 PM, Ingo Molnar wrote:
> > 
> > * Pavel Machek <pavel@....cz> wrote:
> > 
> >>> Ling, if you are interested, could you send a user-space test app to 
> >>> this thread that everyone could just compile and run on various older 
> >>> boxes, to gather a performance profile of hand-coded versus string-op 
> >>> performance?
> >>>
> >>> ( And I think we can make a judgement based on cache-hot performance
> >>>   alone - if the string ops then perform comparatively better in
> >>>   cache-cold scenarios, the cache-hot numbers would be a conservative
> >>>   estimate. )
> >>
> >> Ugh, really? I'd expect cache-cold performance not to be helped at all 
> >> (memory bandwidth is the limit), and you'll get a slowdown from additional 
> >> i-cache misses...
> > 
> > That's my point - the new code is shorter, which will run comparatively 
> > faster in a cache-cold environment.
> > 
> 
> memcpy_c by itself is by far the shortest variant, of course.

Yep. The argument I made was about comparing a long function with a 
short one. As you noted, we don't actually enable the long function all 
that often - which inverts the same argument.
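The test app Ingo asked Ling for earlier in the thread could look roughly like the following. This is only a sketch, not code posted in the thread: it assumes x86-64 and GCC/Clang extended inline asm for the `rep movsb` string-op path (other targets fall back to libc `memcpy`), and `copy_string_op`, `copy_unrolled` and `bench_hot` are hypothetical names. It covers the cache-hot case only, reusing the same buffers on every iteration.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* String-op copy: "rep movsb" moves RCX bytes from [RSI] to [RDI]. */
static void copy_string_op(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* not x86-64: fall back to libc */
#endif
}

/* Hand-coded copy: four 8-byte moves per iteration, then a byte tail.
 * Note: some compilers may turn these loops back into memcpy calls. */
static void copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (; n >= 32; d += 32, s += 32, n -= 32) {
        uint64_t w0, w1, w2, w3;
        memcpy(&w0, s, 8);      memcpy(&w1, s + 8, 8);
        memcpy(&w2, s + 16, 8); memcpy(&w3, s + 24, 8);
        memcpy(d, &w0, 8);      memcpy(d + 8, &w1, 8);
        memcpy(d + 16, &w2, 8); memcpy(d + 24, &w3, 8);
    }
    while (n--)
        *d++ = *s++;
}

/* Average nanoseconds per call, reusing the same (cache-hot) buffers. */
static double bench_hot(copy_fn fn, size_t len, int iters)
{
    static unsigned char src[4096], dst[4096];
    struct timespec t0, t1;

    assert(len <= sizeof(src) && iters > 0);
    for (size_t i = 0; i < len; i++)
        src[i] = (unsigned char)(i * 31u + 7u);

    fn(dst, src, len); /* warm-up: pull data and code into the caches */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        fn(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    assert(memcmp(dst, src, len) == 0); /* the copy must be correct */
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

From a small `main()`, one would compare `bench_hot(copy_string_op, n, 1000000)` against `bench_hot(copy_unrolled, n, 1000000)` for sizes from a few bytes up to 4096 on each box. Following the reasoning quoted above, cache-hot numbers like these would be a conservative estimate if the string ops do relatively better when cold.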

> The question is if it makes sense to use the long variants for short 
> (< 1024 bytes) copies.

I'd say not - the kernel executes in an icache-cold environment most of 
the time (as user-space is far more cache-intensive in the majority of 
workloads, and kernel processing starts with a cold icache), so 
optimizing the kernel for code size is very important. (But numbers from 
real workloads could convince me otherwise.)
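A micro-benchmark is no substitute for the real-workload numbers asked for here, but one rough way to approximate a cold start in user space is to stream through a buffer larger than the last-level cache before each timed copy. The sketch below assumes a 64 MiB eviction buffer and 64-byte cache lines (both guesses to adjust per machine); `evict_caches`, `bench_cold` and `copy_libc` are hypothetical names. It mainly evicts data; whether the copy routine's own code also drops out of the instruction cache depends on the cache hierarchy, so it only approximates the icache-cold case.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

#define EVICT_BYTES ((size_t)64 << 20) /* assumed larger than the LLC */
#define LINE_BYTES 64                  /* assumed cache-line size */

/* Baseline routine: plain libc memcpy behind the copy_fn signature. */
static void copy_libc(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Write one byte per cache line of a large scratch buffer so that lines
 * cached by earlier work are (mostly) evicted. volatile stops the
 * compiler from removing these stores. */
static void evict_caches(volatile unsigned char *scratch, size_t size)
{
    for (size_t i = 0; i < size; i += LINE_BYTES)
        scratch[i] = (unsigned char)i;
}

/* Average ns per call when every copy starts after an eviction pass;
 * returns -1.0 on bad arguments, allocation failure or a wrong copy. */
static double bench_cold(copy_fn fn, size_t len, int iters)
{
    unsigned char *scratch = malloc(EVICT_BYTES);
    unsigned char *src = malloc(len), *dst = malloc(len);
    double total_ns = 0.0, result = -1.0;

    if (scratch && src && dst && len > 0 && iters > 0) {
        memset(src, 0x5a, len);
        for (int i = 0; i < iters; i++) {
            struct timespec t0, t1;
            evict_caches(scratch, EVICT_BYTES); /* not timed */
            clock_gettime(CLOCK_MONOTONIC, &t0);
            fn(dst, src, len);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            total_ns += (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                        (double)(t1.tv_nsec - t0.tv_nsec);
        }
        if (memcmp(dst, src, len) == 0)
            result = total_ns / iters;
    }
    free(scratch);
    free(src);
    free(dst);
    return result;
}
```

Any routine with the `copy_fn` signature can be passed in place of `copy_libc`, so the hand-coded and string-op variants can be timed under the same eviction conditions.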

	Ingo
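The question hpa raised and the answer above amount to a size-based dispatch: keep short copies on the compact routine and let only large copies pull the longer one into the icache. A minimal sketch of that shape follows, assuming the 1024-byte cutoff mentioned in the thread; `copy_dispatch` and its helpers are hypothetical, the byte loop merely stands in for a compact string-op routine, and the real cutoff would have to come from measurements.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cutoff taken from the "< 1024 bytes" figure in the thread; illustrative. */
#define SHORT_COPY_MAX 1024u

enum copy_path { PATH_COMPACT, PATH_UNROLLED };

/* Compact path: tiny code footprint, standing in for a string-op copy. */
static void copy_compact(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

/* Long path: larger code, 16 bytes per iteration via two 8-byte moves. */
static void copy_unrolled16(unsigned char *d, const unsigned char *s, size_t n)
{
    for (; n >= 16; d += 16, s += 16, n -= 16) {
        uint64_t a, b;
        memcpy(&a, s, 8);
        memcpy(&b, s + 8, 8);
        memcpy(d, &a, 8);
        memcpy(d + 8, &b, 8);
    }
    copy_compact(d, s, n); /* remaining 0..15 bytes */
}

/* Only copies of at least SHORT_COPY_MAX bytes use the longer routine. */
static enum copy_path copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n < SHORT_COPY_MAX) {
        copy_compact(dst, src, n);
        return PATH_COMPACT;
    }
    copy_unrolled16(dst, src, n);
    return PATH_UNROLLED;
}
```

Returning the chosen path is only there to make the dispatch easy to check; a real `memcpy` returns `dst` instead.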