lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 10 Nov 2009 12:25:02 -0800
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Willy Tarreau <w@....eu>
CC:	Avi Kivity <avi@...hat.com>, Alan Cox <alan@...rguk.ukuu.org.uk>,
	Pavel Machek <pavel@....cz>,
	Matteo Croce <technoboy85@...il.com>,
	Sven-Haegar Koch <haegar@...net.de>,
	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: i686 quirk for AMD Geode

On 11/10/2009 12:16 PM, Willy Tarreau wrote:
> 
> Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> on one side and [sse*] on the other : distros are built assuming the
> former are always available while they are not always. And the distro
> which make the difference have to provide an dedicated build for earlier
> systems just for compatibility. SSE*, 3dnow* etc... are only used by a
> handful of media players/converters/encoders which are able to detect
> themselves what to use and already have the necessary fallbacks because
> these instruction sets vary too much between processors and vendors.
> 

That is increasingly not true since gcc is now doing autovectorization.

> One could argue that cmpxchg/bswap/xadd are supported by 486 and that
> implementing them for 386 is almost useless now (though it costs almost
> nothing to provide them, I did a few years ago). 
> 
> CMOV/NOPL are rarely used, thus have no reason to cause a massive
> performance drop, but are frequent enough (at least cmov) for almost
> any program to have at least one or two inside, making it incompatible
> with a given processor, and are almost obvious to implement too.

I could 970 cmovs in libc out of 322660 instructions.  That is one in
333 instruction.  In other words, a trap-and-emulate of some 500 cycles
would add some two cycles *per instruction* during execution -- hardly
an insignificant number.  All in all, any of this is really only useful
as a limp.

> SSE*/3dnow* would be much much harder and would only serve very few
> programs, and serve them badly because when they're used, it would
> be intensive.
> 
> I personally am not against being able to emulate every optional
> instruction, quite the opposite instead. It's just that if in order
> to do this, we add cost to the other obvious ones, we lose what we
> expected to win (simplicity and efficiency).

I don't see any particular subset as being more obvious than the other,
with the *possible* exception of NOPL, simply because NOPL was
undocumented for so long.

	-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ