Message-ID: <Z-kVT4ROZJXx6kui@gmail.com>
Date: Sun, 30 Mar 2025 11:56:31 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip 2/2] x86/hweight: Use POPCNT when available with
X86_NATIVE_CPU option
* Uros Bizjak <ubizjak@...il.com> wrote:
> > So a better optimization I think would be to declare and implement
> > __sw_hweight32 with a different, less intrusive function call ABI
> > that
>
> With an external function, the ABI specifies the location of input
> argument and function result.
This is all within the kernel, and __sw_hweight32() is implemented in
the kernel as well, entirely in assembly, and the ALTERNATIVE*() macros
are fully under our control as well - so we have full control over the
calling convention.
I.e. in principle there's no need for the __sw_hweight32 function
utilized by ALTERNATIVE() to be a C-call-ABI external function, with all
the call-clobbering constraints that disturb the register state covered
by the C-call-ABI. (RDI RSI RDX RCX R8 R9)
The calling convention used is the kernel's choice, which we can
re-evaluate.
For example, we could make a version of __sw_hweight32 that is a
largely no-clobber function touching only a single register: it
receives its input in RAX, returns the result in RAX (as usual), and
saves/restores everything else. This pushes overhead into the uncommon
case (__sw_hweight32 users) and reduces register pressure at the
call site.
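As a rough userspace illustration of such a convention (not the kernel's
actual hweight code; demo_hweight32_eax, hweight32_call and
hweight32_selfcheck are hypothetical names made up for this sketch), the
helper below takes its input in EAX, returns in EAX and preserves every
other register, so the call site only has to declare EAX and the flags
as modified. It assumes GCC or Clang on an x86-64 ELF target and falls
back to plain C elsewhere. Because the compiler cannot see the CALL
inside the asm statement, the sketch steps over the 128-byte red zone by
hand:

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference: the classic parallel bit count. */
static inline uint32_t hweight32_ref(uint32_t w)
{
    w = w - ((w >> 1) & 0x55555555u);
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    w = (w + (w >> 4)) & 0x0f0f0f0fu;
    return (w * 0x01010101u) >> 24;
}

#if defined(__x86_64__) && defined(__ELF__) && defined(__GNUC__)
/*
 * Hypothetical out-of-line helper with a private convention:
 * input in EAX, result in EAX, every other register preserved.
 * RDX is the only scratch register, so it is saved and restored.
 */
__asm__(
    ".pushsection .text\n"
    ".type demo_hweight32_eax, @function\n"
    "demo_hweight32_eax:\n"
    "    push %rdx\n"
    "    mov  %eax, %edx\n"
    "    shr  $1, %edx\n"
    "    and  $0x55555555, %edx\n"
    "    sub  %edx, %eax\n"          /* w -= (w >> 1) & 0x55555555 */
    "    mov  %eax, %edx\n"
    "    shr  $2, %edx\n"
    "    and  $0x33333333, %eax\n"
    "    and  $0x33333333, %edx\n"
    "    add  %edx, %eax\n"          /* 2-bit sums -> 4-bit sums */
    "    mov  %eax, %edx\n"
    "    shr  $4, %edx\n"
    "    add  %edx, %eax\n"
    "    and  $0x0f0f0f0f, %eax\n"   /* 4-bit sums -> byte sums */
    "    imul $0x01010101, %eax, %eax\n"
    "    shr  $24, %eax\n"           /* top byte holds the total */
    "    pop  %rdx\n"
    "    ret\n"
    ".size demo_hweight32_eax, .-demo_hweight32_eax\n"
    ".popsection\n"
);

static inline uint32_t hweight32_call(uint32_t w)
{
    /*
     * Only EAX and the flags are listed as modified. The LEA pair
     * moves RSP past the red zone so that the return address and the
     * saved RDX cannot overwrite the caller's locals.
     */
    __asm__("lea -128(%%rsp), %%rsp\n\t"
            "call demo_hweight32_eax\n\t"
            "lea 128(%%rsp), %%rsp"
            : "+a"(w)
            :
            : "cc");
    return w;
}
#else
/* Fallback for other targets: no special convention, just the C code. */
static inline uint32_t hweight32_call(uint32_t w)
{
    return hweight32_ref(w);
}
#endif

/* Cross-check the helper against the reference on scattered inputs. */
static inline void hweight32_selfcheck(void)
{
    for (uint32_t i = 0; i < 1000; i++) {
        uint32_t w = i * 0x9e3779b9u;
        assert(hweight32_call(w) == hweight32_ref(w));
    }
}
```

With a normal C-ABI call the compiler would have to treat RAX, RCX, RDX,
RSI, RDI and R8-R11 as clobbered around the call; here the clobber list
shrinks to the flags, at the price of a push/pop pair inside the helper.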
I'm not saying it's *worth* it for POPCNTL emulation alone:

 - The code generation benefits might or might not be there. Needs to
   be examined.

 - There may be some trouble with on-stack red zones used by the
   compiler, if the compiler doesn't know that a call was done.

 - Plus rolling a different calling convention down the alternatives
   patching macros will have some maintenance overhead side effects.
   Possibly other usecases need to be found as well for this to be
   worth it.

But I wanted to bust the false assumption you seem to be making about
C-call-ABI constraints.
Thanks,
Ingo