Message-ID: <Z-kVT4ROZJXx6kui@gmail.com>
Date: Sun, 30 Mar 2025 11:56:31 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip 2/2] x86/hweight: Use POPCNT when available with
 X86_NATIVE_CPU option


* Uros Bizjak <ubizjak@...il.com> wrote:

> > So a better optimization I think would be to declare and implement 
> > __sw_hweight32 with a different, less intrusive function call ABI 
> > that
> 
> With an external function, the ABI specifies the location of input 
> argument and function result.

This is all within the kernel, and __sw_hweight32() is implemented in 
the kernel as well, entirely in assembly, and the ALTERNATIVE*() macros 
are fully under our control as well - so we have full control over the 
calling convention.

Ie. in principle there's no need for the __sw_hweight32 function 
called via ALTERNATIVE() to be a C-call-ABI external function, with all 
the call-clobbering constraints that implies for the registers governed 
by the C-call-ABI. (RDI RSI RDX RCX R8 R9)

The calling convention used is the kernel's choice, which we can 
re-evaluate.

For example, we could make a version of __sw_hweight32 that is a 
largely no-clobber function that only touches a single register: it 
receives its input in RAX and returns the result in RAX (as usual), and 
saves/restores everything else. This pushes overhead into the uncommon 
case (__sw_hweight32 users) and reduces register pressure at the 
call site.
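
A rough user-space sketch of that idea, not kernel code and not a 
proposal for the actual implementation. The names hweight32_noclobber, 
sw_hweight32_c and the hweight32() wrapper are made up for illustration, 
and the x86-64 path assumes GCC or Clang on Linux. The asm routine takes 
its input in EAX, returns the count in EAX, and saves/restores the one 
scratch register it uses, so the call site only tells the compiler about 
RAX and the flags:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain C bit-parallel population count, used as reference and fallback. */
static uint32_t sw_hweight32_c(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    w = (w + (w >> 4)) & 0x0f0f0f0fu;
    return (w * 0x01010101u) >> 24;
}

#if defined(__x86_64__) && defined(__GNUC__) && defined(__linux__)
/*
 * Hypothetical custom convention: input in EAX, result in EAX.
 * RDX is the only scratch register and is saved/restored here,
 * so the caller sees nothing but RAX and the flags change.
 */
__asm__(
    ".pushsection .text\n"
    "hweight32_noclobber:\n"
    "    push  %rdx\n"
    "    mov   %eax, %edx\n"
    "    shr   $1, %edx\n"
    "    and   $0x55555555, %edx\n"
    "    sub   %edx, %eax\n"
    "    mov   %eax, %edx\n"
    "    shr   $2, %edx\n"
    "    and   $0x33333333, %eax\n"
    "    and   $0x33333333, %edx\n"
    "    add   %edx, %eax\n"
    "    mov   %eax, %edx\n"
    "    shr   $4, %edx\n"
    "    add   %edx, %eax\n"
    "    and   $0x0f0f0f0f, %eax\n"
    "    imul  $0x01010101, %eax, %eax\n"
    "    shr   $24, %eax\n"
    "    pop   %rdx\n"
    "    ret\n"
    ".popsection\n"
);

static inline uint32_t hweight32(uint32_t w)
{
    uint32_t res = w;

    /*
     * The compiler does not know a call happens inside the asm, so in
     * user space it may keep data in the 128-byte red zone below RSP.
     * Step over it before the call pushes a return address.
     */
    __asm__("lea -128(%%rsp), %%rsp\n\t"
            "call hweight32_noclobber\n\t"
            "lea 128(%%rsp), %%rsp"
            : "+a"(res)
            :
            : "cc", "memory");
    return res;
}
#else
/* Other targets: just use the C version with the normal C call ABI. */
static inline uint32_t hweight32(uint32_t w)
{
    return sw_hweight32_c(w);
}
#endif

/* Cross-check the custom-convention path against the C reference. */
void hweight32_selfcheck(void)
{
    static const uint32_t samples[] = { 0u, 1u, 0xffu, 0x80000001u, 0xffffffffu };

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(hweight32(samples[i]) == sw_hweight32_c(samples[i]));
}
```

Note that the asm statement lists only "cc" and "memory" as clobbers, 
where an ordinary C call would force the compiler to assume that all 
call-clobbered registers are gone.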

I'm not saying it's *worth* it for POPCNTL emulation alone:

 - The code generation benefits might or might not be there. Needs to 
   be examined.

 - There may be some trouble with on-stack red zones used by the 
   compiler, if the compiler doesn't know that a call was done.

 - Plus threading a different calling convention through the 
   alternatives patching macros will have some maintenance overhead. 
   Other usecases would possibly need to be found as well for this to 
   be worth it.

But I wanted to bust the false assumption you seem to be making about 
C-call-ABI constraints.

Thanks,

	Ingo
