Message-ID: <Z-mTVyOb8h1wYxvt@gmail.com>
Date: Sun, 30 Mar 2025 20:54:15 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip 2/2] x86/hweight: Use POPCNT when available with
 X86_NATIVE_CPU option


* Uros Bizjak <ubizjak@...il.com> wrote:

> On Sun, Mar 30, 2025 at 11:56 AM Ingo Molnar <mingo@...nel.org> wrote:
> >
> >
> > * Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > > > So a better optimization I think would be to declare and implement
> > > > __sw_hweight32 with a different, less intrusive function call ABI
> > > > that
> > >
> > > With an external function, the ABI specifies the location of input
> > > argument and function result.
> >
> > This is all within the kernel, and __sw_hweight32() is implemented in
> > the kernel as well, entirely in assembly, and the ALTERNATIVE*() macros
> > are fully under our control as well - so we have full control over the
> > calling convention.
> 
> There is a minor issue with a generic prototype in <linux/bitops.h>,
> where we declare:
> 
> extern unsigned int __sw_hweight32(unsigned int w);
> extern unsigned long __sw_hweight64(__u64 w);
> 
> This creates a bit of a mix-up, so perhaps it is better to define and use
> an x86-specific function name.

Yes, I alluded to this complication:

> > For example, we could make a version of __sw_hweight32 that is a
> > largely no-clobber function that only touches a single register, which

That version of __sw_hweight32 would be a different symbol.

> > I'm not saying it's *worth* it for POPCNTL emulation alone:
> >
> >  - The code generation benefits might or might not be there. Needs to
> >    be examined.
> 
> Matching inputs with output will actually make the instruction
> "destructive", so the compiler will have to copy the input argument
> when it won't die in the instruction. This is not desirable.

Yeah, absolutely - it was mainly a demonstration that even 
single-clobber functions are possible. (There are even zero-clobber 
functions, such as __fentry__.)

> I think that adding a __POPCNT__ version (similar to my original 
> patch) would bring the most benefit, because we could use "rm" input 
> and "=r" output operands without any of the constraints enforced by 
> the fallback function call. This is only possible with the new 
> -march=native functionality.

Yeah, -march=native might be nice for local tinkering, but it won't 
reach 99.999% of Linux users - so it's immaterial to this particular 
discussion.

Also, is POPCNTL the best example for this? Are there no other, more 
frequently used ALTERNATIVE() patching sites with function call 
alternatives that disturb the register state of important kernel 
functions? (And I don't know the answer.)

Thanks,

	Ingo
