lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1424625857-19354-1-git-send-email-yury.norov@gmail.com>
Date:	Sun, 22 Feb 2015 20:24:14 +0300
From:	Yury Norov <yury.norov@...il.com>
To:	linux@...izon.com, klimov.linux@...il.com, linux@...musvillemoes.dk
Cc:	akpm@...ux-foundation.org, davem@...emloft.net,
	dborkman@...hat.com, hannes@...essinduktion.org,
	laijs@...fujitsu.com, msalter@...hat.com,
	takahiro.akashi@...aro.org, tgraf@...g.ch,
	valentinrothberg@...il.com, yury.norov@...il.com,
	linux-kernel@...r.kernel.org, chris@...is-wilson.co.uk
Subject: [PATCH v5 0/3] lib: find_*_bit reimplementation

This patchset does rework find_bit functions family to achieve better
performance, and decrease size of text. All rework is done in patch 1.
Patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were ran on userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long) end - start);

On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz measuremets are next:
(for find_next_bit, nbits is 8M, for find_first_bit - 80K)

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

find_next_bit performance gain is 35-40%;
find_first_bit - no measurable difference.

On ARM machine, there is arch-specific implementation for find_bit.
To disable it, and use generic one, please apply next patch:
	
	---
	 arch/arm/include/asm/bitops.h   | 19 -------------------
	 arch/arm/kernel/armksyms.c      | 11 -----------
	 arch/arm/lib/Makefile           |  2 +-
	 include/asm-generic/bitops/le.h |  1 +
	 4 files changed, 2 insertions(+), 31 deletions(-)

	diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
	index 5638099..e0611d1 100644
	--- a/arch/arm/include/asm/bitops.h
	+++ b/arch/arm/include/asm/bitops.h
	@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
	 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP(test_and_clear_bit,nr,p)
	 #define test_and_change_bit(nr,p)	ATOMIC_BITOP(test_and_change_bit,nr,p)
	 
	-#ifndef __ARMEB__
	-/*
	- * These are the little endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
	-
	-#else
	-/*
	- * These are the big endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
	-
	-#endif
	 
	 #if __LINUX_ARM_ARCH__ < 5
	 
	diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
	index a88671c..22e8748 100644
	--- a/arch/arm/kernel/armksyms.c
	+++ b/arch/arm/kernel/armksyms.c
	@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
	 EXPORT_SYMBOL(_test_and_clear_bit);
	 EXPORT_SYMBOL(_change_bit);
	 EXPORT_SYMBOL(_test_and_change_bit);
	-EXPORT_SYMBOL(_find_first_zero_bit_le);
	-EXPORT_SYMBOL(_find_next_zero_bit_le);
	-EXPORT_SYMBOL(_find_first_bit_le);
	-EXPORT_SYMBOL(_find_next_bit_le);
	-
	-#ifdef __ARMEB__
	-EXPORT_SYMBOL(_find_first_zero_bit_be);
	-EXPORT_SYMBOL(_find_next_zero_bit_be);
	-EXPORT_SYMBOL(_find_first_bit_be);
	-EXPORT_SYMBOL(_find_next_bit_be);
	-#endif
	 
	 #ifdef CONFIG_FUNCTION_TRACER
	 #ifdef CONFIG_OLD_MCOUNT
	diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
	index 0573faa..de369aa 100644
	--- a/arch/arm/lib/Makefile
	+++ b/arch/arm/lib/Makefile
	@@ -6,7 +6,7 @@
	 
	 lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
			   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
	-		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
	+		   delay.o delay-loop.o memchr.o memcpy.o	      \
			   memmove.o memset.o memzero.o setbit.o              \
			   strchr.o strrchr.o                                 \
			   testchangebit.o testclearbit.o testsetbit.o        \
	diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
	index 6173154..9a8798f 100644
	--- a/include/asm-generic/bitops/le.h
	+++ b/include/asm-generic/bitops/le.h
	@@ -2,6 +2,7 @@
	 #define _ASM_GENERIC_BITOPS_LE_H_
	 
	 #include <asm/types.h>
	+#include <asm-generic/bitops/find.h>
	 #include <asm/byteorder.h>
	 
	 #if defined(__LITTLE_ENDIAN)
	-- 
	1.9.1

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.
	
v5:
* {FIRST,LAST}_BITS_MASK macros are removed. BITMAP_{FIRST,LAST}_WORD_MASK
are used instead.

v4:
* find_last_bit found buggy and replaced with George Spelvin's version,
which handles garbage at the end of word-unaligned bitmap correctly;
* find_last_bit description fixed in 'include/linux/bitops.h';
* '_find_next_bit{,_le}: first word handling turned to be unconditional;

v3:
* conditional bit inverting in _find_next_bit replaced with XORing by
  mask (patch 1);
* size/offset checkers moved from wrappers to _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* lib/find_next_bit descriptor changed to not appeal to file name and
  so let to produce clean diff in case of rename (patch 2);
* patch 3 is generated with '-M' option to detect renaming and drop
  file content;
* this cover added.

Yury Norov (3):
  lib: find_*_bit reimplementation
  lib: move find_last_bit to lib/find_next_bit.c
  lib: rename lib/find_next_bit.c to lib/find_bit.c

 include/linux/bitops.h |   4 +-
 lib/Makefile           |   2 +-
 lib/find_bit.c         | 195 +++++++++++++++++++++++++++++++++
 lib/find_last_bit.c    |  49 ---------
 lib/find_next_bit.c    | 285 -------------------------------------------------
 5 files changed, 198 insertions(+), 337 deletions(-)
 create mode 100644 lib/find_bit.c
 delete mode 100644 lib/find_last_bit.c
 delete mode 100644 lib/find_next_bit.c

-- 
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ