Message-ID: <4A2407D1.5050706@zytor.com>
Date: Mon, 01 Jun 2009 09:54:41 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Borislav Petkov <petkovbb@...il.com>,
"H. Peter Anvin" <hpa@...or.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Borislav Petkov <borislav.petkov@....com>, greg@...ah.com,
mingo@...e.hu, norsk5@...oo.com, tglx@...utronix.de,
mchehab@...hat.com, aris@...hat.com, edt@....ca,
linux-kernel@...r.kernel.org, randy.dunlap@...cle.com
CC: Sam Ravnborg <sam@...nborg.org>
Subject: Re: [PATCH 0/4] amd64_edac: misc fixes
Borislav Petkov wrote:
>
> How about we pin the src/dst into a register:
>
> #define popcnt_spelled(x)                                     \
> ({                                                            \
>         typeof(x) __ret;                                      \
>         __asm__(".byte 0xf3\n\t.byte 0x48\n\t.byte 0x0f\n\t"  \
>                 ".byte 0xb8\n\t.byte 0xc0\n\t"                \
>                 : "=a" (__ret)                                \
>                 : "0" (x));                                   \
>         __ret;                                                \
> })
>
> which generates
>
> 40055e: 48 8b 45 e8 mov -0x18(%rbp),%rax
> 400562: f3 48 0f b8 c0 popcnt %rax,%rax
> 400567: 48 89 45 f8 mov %rax,-0x8(%rbp)
>
> here.
>
Yes, we would have to do something like that.

However, if you're doing that you shouldn't use typeof() there...
instead this should be turned into an inline function with explicit
64-bit types.
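
A minimal sketch of what such an inline function could look like. The
names popcnt64_bytes() and popcnt64_ref() are made up for illustration,
the byte sequence is the one from the quoted macro, and the code assumes
the CPU really implements POPCNT (no runtime check is done here):

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference count: cross-check and non-x86-64 fallback. */
static inline uint64_t popcnt64_ref(uint64_t x)
{
	uint64_t n = 0;

	for (; x; x &= x - 1)	/* clears the lowest set bit each round */
		n++;
	return n;
}

/*
 * Hypothetical name. Emits f3 48 0f b8 c0 (popcnt %rax,%rax), so both
 * source and destination are pinned to %rax via the "=a"/"0" constraints.
 * Assumption: the CPU supports POPCNT.
 */
static inline uint64_t popcnt64_bytes(uint64_t x)
{
#if defined(__x86_64__)
	uint64_t ret;

	__asm__(".byte 0xf3, 0x48, 0x0f, 0xb8, 0xc0"
		: "=a" (ret)
		: "0" (x)
		: "cc");
	return ret;
#else
	return popcnt64_ref(x);
#endif
}

/* Usage: the hand-encoded version should agree with the reference loop. */
static inline void popcnt64_bytes_check(void)
{
	assert(popcnt64_bytes(0xf0f0ULL) == popcnt64_ref(0xf0f0ULL));
}
```

Since the parameter is a uint64_t, the width of the operand no longer
depends on whatever the caller happens to pass in.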

It would be good if we could get Kbuild to export some kind of macro
that we can use to test the binutils version, so we can do something like:

#if BINUTILS_VERSION >= KERNEL_VERSION(2,18,50)
	/* Do the right thing */
#else
	/* Do the wrong thing */
#endif

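
Roughly, the version test could select between the real mnemonic and the
hand-encoded bytes. This is only a sketch: BINUTILS_VERSION does not
exist today, the default below is an arbitrary placeholder so the example
compiles, and popcnt64_versioned() is a made-up name. The KERNEL_VERSION()
packing is the usual (a << 16) + (b << 8) + c.

```c
#include <assert.h>
#include <stdint.h>

#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif

/*
 * Assumption: Kbuild would define this (for instance with -D).
 * The fallback value is a placeholder, not a detected version.
 */
#ifndef BINUTILS_VERSION
#define BINUTILS_VERSION KERNEL_VERSION(2, 17, 0)
#endif

static inline uint64_t popcnt64_versioned(uint64_t x)
{
#if !defined(__x86_64__)
	/* Not x86-64: plain software count. */
	uint64_t n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
#elif BINUTILS_VERSION >= KERNEL_VERSION(2, 18, 50)
	/* "Right thing": let the assembler encode it and pick registers. */
	uint64_t ret;

	__asm__("popcntq %1, %0" : "=r" (ret) : "r" (x) : "cc");
	return ret;
#else
	/* "Wrong thing": spell out popcnt %rax,%rax byte by byte. */
	uint64_t ret;

	__asm__(".byte 0xf3, 0x48, 0x0f, 0xb8, 0xc0"
		: "=a" (ret) : "0" (x) : "cc");
	return ret;
#endif
}

/* Usage: the threshold from the #if above packs to 0x21232. */
static inline void popcnt64_versioned_check(void)
{
	assert(KERNEL_VERSION(2, 18, 50) == 0x21232);
	assert(popcnt64_versioned(0xf0f0ULL) == 8);
}
```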
> For < 64bit operand sizes, the operands get zero-extended so that
> garbage in the high 32/48 bits of %rax doesn't corrupt the result.
> We might even want to do the movzwq explicitly so that some compiler
> doesn't decide to take the version with the "0f b6" opcode which
> zero-extends only the 16-/32-bit register. This way, you can popcnt even
> single bytes although the popcnt implementation doesn't allow single
> byte operands.
>
> 400572: 0f b7 45 f2 movzwl -0xe(%rbp),%eax
> 400579: f3 48 0f b8 c0 popcnt %rax,%rax
> 40057e: 66 89 45 f6 mov %ax,-0xa(%rbp)
>
>
> So, in addition to popcnt itself, we have two movs added. This is still
> less than the 30+ ops (+ function call overhead) that hweight* get
> translated into. I'll redo my kernel build benchmarks tomorrow to get
> some more recent numbers on the performance gain.
With explicit types, the compiler should do the right thing.
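
To illustrate with a sketch (count64() is a portable stand-in for the
64-bit popcnt helper, and count16()/count8() are hypothetical wrappers):
once the parameter types are explicit and unsigned, the C conversion
rules do the zero-extension, so no explicit movzx is needed in the asm.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the 64-bit popcnt helper; a plain loop keeps it portable. */
static inline uint64_t count64(uint64_t x)
{
	uint64_t n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

/* Hypothetical narrow wrappers: the unsigned parameter type guarantees
 * the value is zero-extended when it is widened to uint64_t. */
static inline unsigned int count16(uint16_t w)
{
	return (unsigned int)count64(w);
}

static inline unsigned int count8(uint8_t b)
{
	return (unsigned int)count64(b);
}

/* Usage: a signed -1 widened directly becomes all ones (64 bits set),
 * while going through the uint16_t parameter keeps only 16 of them. */
static inline void count_check(void)
{
	int16_t v = -1;

	assert(count64((uint64_t)v) == 64);
	assert(count16((uint16_t)v) == 16);
}
```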
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.