linux-kernel - Re: [PATCH] x86: Change x86 to use generic find_next

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20080309201016.GA28454@elte.hu>
Date:	Sun, 9 Mar 2008 21:10:16 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Alexander van Heukelum <heukelum@...lshack.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	LKML <linux-kernel@...r.kernel.org>, heukelum@...tmail.fm
Subject: Re: [PATCH] x86: Change x86 to use generic find_next_bit


* Alexander van Heukelum <heukelum@...lshack.com> wrote:

> x86: Change x86 to use the generic find_next_bit implementation
> 
> The versions with inline assembly are in fact slower on the machines I 
> tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, 
> AMD Opteron 270). The i386-version needed a fix similar to 06024f21 to 
> avoid crashing the benchmark.
> 
> Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 
> 1...512, for each possible bitmap with one bit set, for each possible 
> offset: find the position of the first bit starting at offset. If you 
> follow ;). Times include setup of the bitmap and checking of the 
> results.
> 
> 		Athlon		Xeon		Opteron 32/64bit
> x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
> generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s

ok, that's rather convincing.

the generic version in lib/find_next_bit.c is open-coded C which gcc can 
optimize pretty nicely.

the hand-coded assembly versions in arch/x86/lib/bitops_32.c mostly use 
the special x86 'bit search forward' (BSF) instruction - which i know 
from the days when the scheduler relied on it has some non-trivial setup 
costs. So especially when there's _small_ bitmasks involved, it's more 
expensive.

> If the bitmap size is not a multiple of BITS_PER_LONG, and no set 
> (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a 
> value outside of the range [0,size]. The generic version always 
> returns exactly size. The generic version also uses unsigned long 
> everywhere, while the x86 versions use a mishmash of int, unsigned 
> (int), long and unsigned long.

i'm not surprised that the hand-coded assembly versions had a bug ...

[ this means we have to test it quite carefully though, as lots of code 
  only ever gets tested on x86 so code could have built dependency on 
  the buggy behavior. ]

> Using the generic version does give a slightly bigger kernel, though.
> 
> defconfig:	   text    data     bss     dec     hex filename
> x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
> generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
> x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
> generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)

i'd not worry about that too much. Have you tried to build with:

  CONFIG_CC_OPTIMIZE_FOR_SIZE=y
  CONFIG_OPTIMIZE_INLINING=y

(the latter only available in x86.git)

> Patch is against -x86#testing. It compiles.

i've picked it up into x86.git, lets see how it goes in practice.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/