Message-ID: <4A2407D1.5050706@zytor.com>
Date:	Mon, 01 Jun 2009 09:54:41 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Borislav Petkov <petkovbb@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Borislav Petkov <borislav.petkov@....com>, greg@...ah.com,
	mingo@...e.hu, norsk5@...oo.com, tglx@...utronix.de,
	mchehab@...hat.com, aris@...hat.com, edt@....ca,
	linux-kernel@...r.kernel.org, randy.dunlap@...cle.com
CC:	Sam Ravnborg <sam@...nborg.org>
Subject: Re: [PATCH 0/4] amd64_edac: misc fixes

Borislav Petkov wrote:
> 
> How about we pin the src/dst into a register:
> 
> #define popcnt_spelled(x)                                       \
> ({                                                              \
>         typeof(x) __ret;                                        \
>         __asm__(".byte 0xf3\n\t.byte 0x48\n\t.byte 0x0f\n\t"    \
>                 ".byte 0xb8\n\t.byte 0xc0\n\t"                  \
>                 : "=a" (__ret)                                  \
>                 : "0" (x));                                     \
>         __ret;                                                  \
> })
> 
> which generates
> 
>   40055e:       48 8b 45 e8             mov    -0x18(%rbp),%rax
>   400562:       f3 48 0f b8 c0          popcnt %rax,%rax
>   400567:       48 89 45 f8             mov    %rax,-0x8(%rbp)
> 
> here.
> 

Yes, we would have to do something like that.

However, if you're doing that, you shouldn't use typeof() there.
Instead, this should be turned into an inline function with explicit
64-bit types.
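
A minimal sketch of what such an inline function might look like, assuming
GCC-style extended asm. The name popcnt64_spelled() and the non-x86-64
fallback are illustrative assumptions, not taken from the thread or from the
kernel tree:

```c
#include <stdint.h>

/*
 * Hypothetical helper: popcnt with source and destination pinned to %rax,
 * spelled out as raw bytes so that an assembler without popcnt support
 * can still build it. The explicit uint64_t types replace typeof().
 */
static inline uint64_t popcnt64_spelled(uint64_t x)
{
#if defined(__x86_64__)
	uint64_t ret;

	/* f3 48 0f b8 c0 = popcnt %rax,%rax */
	__asm__(".byte 0xf3, 0x48, 0x0f, 0xb8, 0xc0"
		: "=a" (ret)	/* result comes back in %rax */
		: "0" (x)	/* input shares the same register */
		: "cc");	/* popcnt rewrites the flags */
	return ret;
#else
	/* Assumed fallback so the sketch also builds on other targets. */
	return (uint64_t)__builtin_popcountll(x);
#endif
}
```

The byte sequence only works on CPUs that implement popcnt, so real code
would still need a feature check before taking this path.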

It would be good if we could get Kbuild to export some kind of macro
that we can use to test the binutils version, so we can do something like:

#if BINUTILS_VERSION >= KERNEL_VERSION(2,18,50)
/* Do the right thing */
#else
/* Do the wrong thing */
#endif
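
For illustration, the version test above could be fleshed out roughly as
follows. KERNEL_VERSION() is reproduced here as linux/version.h defined it at
the time; BINUTILS_VERSION does not exist and is assumed to be passed in by
the build system, and HAVE_AS_POPCNT is a made-up name for the result:

```c
/* Same encoding as the KERNEL_VERSION() macro in linux/version.h. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical: Kbuild would supply this, e.g. via -DBINUTILS_VERSION=...
 * The default below exists only so that the sketch compiles on its own.
 */
#ifndef BINUTILS_VERSION
#define BINUTILS_VERSION KERNEL_VERSION(2, 19, 0)
#endif

#if BINUTILS_VERSION >= KERNEL_VERSION(2, 18, 50)
#define HAVE_AS_POPCNT 1	/* assembler knows the mnemonic */
#else
#define HAVE_AS_POPCNT 0	/* fall back to .byte encoding */
#endif
```

Each component is packed into its own byte, so the comparison is a plain
integer compare as long as minor and patch numbers stay below 256.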

> For < 64bit operand sizes, the operands get zero-extended so that
> garbage in the high 32/48 bits of %rax doesn't corrupt the result.
> We might even want to do the movzwq explicitly so that some compiler
> doesn't decide to take the version with the "0f b6" opcode which
> zero-extends only the 16-/32-bit register. This way, you can popcnt even
> single bytes although the popcnt implementation doesn't allow single
> byte operands.
> 
>   400572:       0f b7 45 f2             movzwl -0xe(%rbp),%eax
>   400579:       f3 48 0f b8 c0          popcnt %rax,%rax
>   40057e:       66 89 45 f6             mov    %ax,-0xa(%rbp)
> 
> 
> So, in addition to popcnt itself, we have two movs added. This is still
> less than the 30+ ops (+ function call overhead) that hweight* get
> translated into. I'll redo my kernel build benchmarks tomorrow to get
> some more recent numbers on the performance gain.

With explicit types, the compiler should do the right thing.
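
A small sketch of why explicit types are enough: when a narrower value is
passed to a function taking a 64-bit unsigned parameter, C's integer
conversion rules zero-extend it, so no manual movzx is needed. The function
names are hypothetical, and popcnt64() here is a plain software stand-in for
the real instruction so that the example builds anywhere:

```c
#include <stdint.h>

/* Software stand-in for the 64-bit popcnt primitive (assumption). */
static inline uint64_t popcnt64(uint64_t x)
{
	uint64_t n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* uint16_t -> uint64_t conversion zero-extends; upper 48 bits are 0. */
static inline unsigned int popcnt16(uint16_t x)
{
	return (unsigned int)popcnt64(x);
}

/* Same for single bytes, which popcnt itself cannot take as operands. */
static inline unsigned int popcnt8(uint8_t x)
{
	return (unsigned int)popcnt64(x);
}
```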

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

