Date:	Sat, 26 Jan 2013 13:08:02 -0800
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Jan Beulich <jbeulich@...e.com>, ling.ml@...pay.com,
	Steven Rostedt <rostedt@...dmis.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/asm] x86/defconfig: Turn on CONFIG_CC_OPTIMIZE_FOR_SIZE=y in the 64-bit defconfig

The fast rep movsb was introduced on Ivy Bridge, IIRC.
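
For reference, a minimal userspace sketch of how the fast-string feature
can be detected and used. It assumes GCC or Clang on x86 and relies on the
CPUID "Enhanced REP MOVSB/STOSB" (ERMS) bit, leaf 7, subleaf 0, EBX bit 9.
The names cpu_has_erms() and copy_rep_movsb() are made up for illustration
and are not kernel interfaces; on other architectures the sketch simply
falls back to memcpy().

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns 1 if CPUID.(EAX=7,ECX=0):EBX bit 9 (ERMS) is set, 0 otherwise. */
int cpu_has_erms(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when leaf 7 is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 9) & 1;
}

/*
 * Byte copy using "rep movsb": RDI = destination, RSI = source,
 * RCX = count. The direction flag is assumed clear, as the ABI requires.
 */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
	void *ret = dst;

	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(n)
			 :
			 : "memory");
	return ret;
}
#else
/* Non-x86 fallback so the sketch still builds: no ERMS, plain memcpy(). */
int cpu_has_erms(void)
{
	return 0;
}

void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}
#endif
```

A caller could check cpu_has_erms() once at startup and only then prefer
copy_rep_movsb() over an open-coded copy loop. Whether that is a win on a
given core still has to be measured.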

Linus Torvalds <torvalds@...ux-foundation.org> wrote:

>On Sat, Jan 26, 2013 at 7:18 AM, H. Peter Anvin <hpa@...or.com> wrote:
>> On the CPUs Ling is testing on the downsides of -Os probably matter
>> less, in particular since rep movsb works well.
>>
>> It is questionable as a generic default, though.
>
>So being the person who really pushed for -Os to begin with (I think
>I$ and instruction decode bandwidth is one of the most fundamental
>limits to CPU performance), I wouldn't mind it if we reintroduced it.
>
>HOWEVER.
>
>It wasn't just "rep movs". The thing that killed -Os for me was that
>it makes it impossible to try to optimize hot code, because -Os seems
>to throw out branch prediction information. So when you use "likely()"
>etc to try to teach the compiler to lay out code a certain way so that
>code that never really gets executed isn't even brought into the I$,
>-Os then screws it up completely.
>
>Of course, maybe newer versions of gcc might not suck so horribly with
>-Os, I haven't actually tried in a while.
>
>[ Just tested. Still does it ]
>
>Also, I doubt Ling was testing a SB CPU. Because "rep movsb" still
>sucks pretty bad on SB. What core *is* Ling testing? Haswell?
>
>Ugh. We could make it depend on the optimization target. I'd also wish
>there was some way to just tune gcc -Os to be closer to reasonable. Or
>make -O2 not do some of the excessive crap it does (it aligns code
>*much* too much, for example - who cares if you can do it with a
>single instruction, if that instruction is so long that it uses up
>half your decode bandwidth?)
>
>The problem, of course, is that most -O2 code generation is done
>assuming hot loops that don't show much if any I$ issues. And the -Os
>thing is done *purely* for size, not taking any performance into
>account at all. There's no balanced middle ground, which is what _we_
>would want.
>
>                  Linus
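
To make the likely()/unlikely() point above concrete, here is a small
standalone sketch. The two macros have the same shape as the kernel's
helpers (a wrapper around GCC's __builtin_expect); sum_bytes() and
handle_bad_input() are hypothetical names invented for this example. The
intent is that the compiler keeps the rarely taken error path out of the
hot instruction stream.

```c
#include <stddef.h>

/*
 * Same shape as the kernel's likely()/unlikely() helpers, defined here
 * so the sketch builds on its own with GCC or Clang.
 */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * Hypothetical slow path. "noinline" and "cold" ask the compiler to keep
 * it out of line and away from the hot code.
 */
__attribute__((noinline, cold))
static long handle_bad_input(void)
{
	return -1;
}

/* Hypothetical hot function: sums a buffer, with a rare error check. */
long sum_bytes(const unsigned char *buf, size_t len)
{
	long total = 0;
	size_t i;

	/* Hint: a NULL buffer is not expected on the hot path. */
	if (unlikely(buf == NULL))
		return handle_bad_input();

	for (i = 0; i < len; i++)
		total += buf[i];

	return total;
}
```

One way to look at the effect described above is to build the file with
"gcc -O2 -S" and again with "gcc -Os -S" and compare where the error path
ends up relative to the loop. The result depends on the gcc version, so
treat this as something to check rather than a guaranteed outcome.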

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.