Message-ID: <20100212174751.GD3114@aftab>
Date: Fri, 12 Feb 2010 18:47:51 +0100
From: Borislav Petkov <bp@...64.org>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Borislav Petkov <petkovbb@...glemail.com>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Wu Fengguang <fengguang.wu@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Jamie Lokier <jamie@...reable.org>,
Roland Dreier <rdreier@...co.com>,
Al Viro <viro@...IV.linux.org.uk>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>, Brian Gerst <brgerst@...il.com>
Subject: Re: [PATCH 2/5] bitops: compile time optimization for
 hweight_long(CONSTANT)

On Fri, Feb 12, 2010 at 09:28:32AM -0800, H. Peter Anvin wrote:
> On 02/12/2010 09:06 AM, Borislav Petkov wrote:
> > On Thu, Feb 11, 2010 at 09:33:49AM -0800, H. Peter Anvin wrote:
> >> You don't do the push/pop inline -- if you're going to take the hit of
> >> pushing this into the caller, it's better to list them as explicit
> >> clobbers and let the compiler figure out how to do it. The point of
> >> doing an explicit push/pop is that it can be pushed into the out-of-line
> >> subroutine.
> >
> > Doh! Of course, this was wrong. See below.
> >
> > diff --git a/arch/x86/lib/hweight.c b/arch/x86/lib/hweight.c
> > new file mode 100644
> > index 0000000..9b4bfa5
> > --- /dev/null
> > +++ b/arch/x86/lib/hweight.c
> > @@ -0,0 +1,72 @@
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/bitops.h>
> > +
> > +#ifdef CONFIG_64BIT
> > +/* popcnt %rdi, %rax */
> > +#define POPCNT ".byte 0xf3\n\t.byte 0x48\n\t.byte 0x0f\n\t.byte 0xb8\n\t.byte 0xc7"
> > +#define REG_IN "D"
> > +#define REG_OUT "a"
> > +#else
> > +/* popcnt %eax, %eax */
> > +#define POPCNT ".byte 0xf3\n\t.byte 0x0f\n\t.byte 0xb8\n\t.byte 0xc0"
> > +#define REG_IN "a"
> > +#define REG_OUT "a"
> > +#endif
> > +
> > +/*
> > + * These are called out-of-line by the alternative below and are added here
> > + * only so that gcc can figure out which registers __sw_hweightXX clobbers
> > + * and restore their values before returning from the __arch_hweightXX
> > + * versions. See also <arch/x86/lib/Makefile>.
> > + */
> > +unsigned int _sw_hweight32(unsigned int w)
> > +{
> > + return __sw_hweight32(w);
> > +}
> > +
> > +unsigned long _sw_hweight64(__u64 w)
> > +{
> > + return __sw_hweight64(w);
> > +}
> > +
> > +unsigned int __arch_hweight32(unsigned int w)
> > +{
> > + unsigned int res = 0;
> > +
> > + asm (ALTERNATIVE("call _sw_hweight32", POPCNT, X86_FEATURE_POPCNT)
> > + : "="REG_OUT (res)
> > + : REG_IN (w));
> > +
>
> Now you have no clobbers and no assembly wrapper. That really doesn't work.

I see, this means we have to build lib/hweight.c with -fcall-saved*. I
just did a test, and the compiled function uses %rcx and %rdx only
temporarily, saving their values on the stack and restoring them before
it returns.

However, this is generic code, and for the above to work we have to
enforce x86-specific CFLAGS for it. What is the preferred way to do
that?
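The only thing I can think of is something along the lines of kbuild's
per-object flags - just a sketch, with a made-up variable name and only the
two registers from the test above:

	# arch/x86/Makefile (sketch; hypothetical variable name)
	export ARCH_HWEIGHT_CFLAGS := -fcall-saved-rcx -fcall-saved-rdx

	# lib/Makefile (sketch): apply the arch-supplied flags to this one
	# object only; other architectures can leave the variable empty
	CFLAGS_hweight.o := $(ARCH_HWEIGHT_CFLAGS)

but I'm not sure that's the preferred way, hence the question.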
--
Regards/Gruss,
Boris.
--
Advanced Micro Devices, Inc.
Operating Systems Research Center
--