[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47D01868.9060609@o2.pl>
Date: Thu, 06 Mar 2008 17:14:32 +0100
From: Artur Skawina <art_k@...pl>
To: Olivier Galibert <galibert@...ox.com>,
"H. Peter Anvin" <hpa@...or.com>,
Chris Lattner <clattner@...le.com>,
Michael Matz <matz@...e.de>,
Richard Guenther <richard.guenther@...il.com>,
Joe Buck <Joe.Buck@...opsys.com>, Jan Hubicka <hubicka@....cz>,
Aurelien Jarno <aurelien@...el32.net>,
linux-kernel@...r.kernel.org, gcc@....gnu.org
Subject: Re: RELEASE BLOCKER: Linux doesn't follow x86/x86-64 ABI wrt direction
flag
Olivier Galibert wrote:
> On Wed, Mar 05, 2008 at 05:12:07PM -0800, H. Peter Anvin wrote:
>> It's a kernel bug, and it needs to be fixed.
>
> I'm not convinced. It's been that way for 15 years, it's that way in
> the BSD kernels, at that point it's a feature. The bug is in the
> documentation, nowhere else. And in gcc for blindly trusting the
> documentation.
well, you could see this either way -- either the kernel is buggy and
needs to be fixed or the current behavior is correct and the abi needs
an errata. If there were no performance implications i'd go for the
latter, mostly because of the security aspect.
But this thread made me dig up an old benchmark and apparently omitting
the cld before the string ops makes a significant difference; on P2 it
was ~8%, on P4 it's ~6% for 1480 byte copies; for 32 byte ones the gain
is more like 90% on a P4 [1].
So the impact on small structure memcpy/memset etc is significant, hence
fixing the kernel looks like a better long term plan.
artur
[1]
P4 # ./bcsp m
IACCK 0.9.29 Artur Skawina <...>
[ exec time; lower is better ] [speed ] [ time ] [ok?]
TIME-N+S TIME32 TIME33 TIME1480 MBYTES/S TIMEXXXX CSUM FUNCTION ( rdtsc_overhead=0 null=0 )
0 0 0 0 inf 0 ffff csum_partial_copy_null
1885 375 389 156 7589.74 39350 0 generic_memcpy
10894 532 666 1696 698.11 108557 0 kernel_memcpylib
1804 325 346 151 7841.06 19614 0 kernel_memcpy686
1804 325 346 151 7841.06 19693 0 kernel_memcpy686ncld
1744 323 381 148 8000.00 19687 0 kernel_memcpy686as1
1332 157 232 139 8517.99 19235 0 kernel_memcpy686as1ncld
1782 318 339 148 8000.00 19607 0 kernel_memcpy686as2
1371 168 189 139 8517.99 19221 0 kernel_memcpy686as2ncld
P2 # ./bcsp m
IACKK 0.9.28 Artur Skawina <...>
TIME-N+S TIME32 TIME33 TIME1480 MBYTES/S TIMEXXXX CKSUM FUNCTION ( rdtsc_overhead=1 null=0 )
0 0 0 0 inf 0 : ffff csum_partial_copy_null
7121 746 1215 730 1621.92 127418 : 0 generic_memcpy
43604 2032 1709 6574 180.10 416409 : 0 kernel_memcpylib
7480 771 726 684 1730.99 96084 : 0 kernel_memcpy686
7036 735 543 685 1728.47 95508 : 0 kernel_memcpy686ncld
7498 1015 711 716 1653.63 92200 : 0 kernel_memcpy686as1
5826 438 489 662 1788.52 91598 : 0 kernel_memcpy686as1ncld
6667 657 488 708 1672.32 89366 : 0 kernel_memcpy686as2
6614 456 270 658 1799.39 91203 : 0 kernel_memcpy686as2ncld
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists