lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 4 Feb 2010 16:10:50 +0100
From:	Borislav Petkov <bp@...64.org>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Wu Fengguang <fengguang.wu@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Jamie Lokier <jamie@...reable.org>,
	Roland Dreier <rdreier@...co.com>,
	Al Viro <viro@...IV.linux.org.uk>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH 2/5] bitops: compile time optimization for
 hweight_long(CONSTANT)

On Wed, Feb 03, 2010 at 11:49:54AM -0800, H. Peter Anvin wrote:
> On 02/03/2010 10:47 AM, Peter Zijlstra wrote:
> > On Wed, 2010-02-03 at 19:14 +0100, Borislav Petkov wrote:
> > 
> >> alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)
> > 
> > Make sure to apply a 0xff bitmask to the popcnt r16 call for hweight8(),
> > and hweight64() needs a bit of magic for 32bit, but yes, something like
> > that ought to work nicely.
> > 
> 
> Arguably the "best" option is to have the alternative being a jump to an
> out-of-line stub which does the necessary parameter marshalling before
> calling a stub.  This technique is already used in a few other places.

Ok, here's a first alpha prototype and completely untested. The asm
output looks ok though. I've added separate 32-bit and 64-bit helpers in
order to dispense with the if-else tests. The hw-popcnt versions are the
opcodes for "popcnt %eax, %eax" and "popcnt %rax, %rax", respectively,
so %rAX has to be preloaded with the bitmask and the computed value
has to be retrieved from there afterwards. And yes, it looks not that
elegant so I'm open for suggestions.

The good thing is, this should work on any toolchain since we don't rely
on the compiler to know about popcnt and we're protected by CPUID flag
so that the hw-popcnt version is used only on processors which support
it.

Please take a good look and let me know what do you guys think.

Thanks.

--
 arch/x86/include/asm/bitops.h |    4 ++
 arch/x86/lib/Makefile         |    2 +-
 arch/x86/lib/popcnt.c         |   62 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/lib/popcnt.c

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 02b47a6..deb5013 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -434,6 +434,10 @@ static inline int fls(int x)
 #endif
 	return r + 1;
 }
+
+
+extern int arch_hweight_long(unsigned long);
+
 #endif /* __KERNEL__ */
 
 #undef ADDR
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index cffd754..c03fe2d 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,7 @@ lib-y += usercopy_$(BITS).o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_KPROBES) += insn.o inat.o
 
-obj-y += msr.o msr-reg.o msr-reg-export.o
+obj-y += msr.o msr-reg.o msr-reg-export.o popcnt.o
 
 ifeq ($(CONFIG_X86_32),y)
         obj-y += atomic64_32.o
diff --git a/arch/x86/lib/popcnt.c b/arch/x86/lib/popcnt.c
new file mode 100644
index 0000000..179a6e8
--- /dev/null
+++ b/arch/x86/lib/popcnt.c
@@ -0,0 +1,62 @@
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+
+int _hweight32(void)
+{
+	unsigned long w;
+
+	asm volatile("" : "=a" (w));
+
+	return hweight32(w);
+}
+
+int _hweight64(void)
+{
+	unsigned long w;
+
+	asm volatile("" : "=a" (w));
+
+	return hweight64(w);
+}
+
+int _popcnt32(void)
+{
+
+	unsigned long w;
+
+	asm volatile(".byte 0xf3\n\t.byte 0x0f\n\t.byte 0xb8\n\t.byte 0xc0\n\t"
+			: "=a" (w));
+
+	return w;
+}
+
+int _popcnt64(void)
+{
+
+	unsigned long w;
+
+	asm volatile(".byte 0xf3\n\t.byte 0x48\n\t.byte 0x0f\n\t."
+		     "byte 0xb8\n\t.byte 0xc0\n\t"
+		     : "=a" (w));
+
+	return w;
+}
+
+int arch_hweight_long(unsigned long w)
+{
+	if (sizeof(w) == 4) {
+		asm volatile("movl %[w], %%eax" :: [w] "r" (w));
+		alternative("call _hweight32",
+			    "call _popcnt32",
+			    X86_FEATURE_POPCNT);
+		asm volatile("" : "=a" (w));
+
+	} else {
+		asm volatile("movq %[w], %%rax" :: [w] "r" (w));
+		alternative("call _hweight64",
+			    "call _popcnt64",
+			    X86_FEATURE_POPCNT);
+		asm volatile("" : "=a" (w));
+	}
+	return w;
+}
-- 
1.6.6

-- 
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ