linux-kernel - Re: [PATCH] x86: Optimize __get_user

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20251126192325.2a017d11@pumpkin>
Date: Wed, 26 Nov 2025 19:23:25 +0000
From: david laight <david.laight@...box.com>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, Thomas Gleixner
 <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Borislav Petkov
 <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin"
 <hpa@...or.com>
Subject: Re: [PATCH] x86: Optimize __get_user_asm() assembly

On Wed, 26 Nov 2025 17:22:36 +0100
Uros Bizjak <ubizjak@...il.com> wrote:

> On Wed, Nov 26, 2025 at 4:18 PM david laight <david.laight@...box.com> wrote:
> >
> > On Wed, 26 Nov 2025 14:23:32 +0100
> > Uros Bizjak <ubizjak@...il.com> wrote:
> >  
> > > On x86, byte and word moves (MOVB, MOVW) write only to the low
> > > portion of a 32-bit register, leaving the upper bits unchanged.
> > > Modern compilers therefore prefer using MOVZBL and MOVZWL to load
> > > 8-bit and 16-bit values with zero-extension to the full register
> > > width.
> > >
> > > Update __get_user_asm() to follow this convention by explicitly
> > > zero-extending 8-bit and 16-bit loads from memory.
> > >
> > > An additional benefit of this change is that it enables the full
> > > integer register set to be used for 8-bit loads. Also, it
> > > eliminates the need for manual zero-extension of 8-bit values.
> > >
> > > There is only a minimal increase in code size:  
> >
> > Interesting, where does that come from.
> > I'd have thought it should shrink it.
> > The mov[sz]x is one byte longer.
> > Perhaps a lot of 8-bit values are never zero-extended?  
> 
> bloat-o-meter says:
> 
> add/remove: 0/0 grow/shrink: 4/0 up/down: 16/0 (16)
> Function                                     old     new   delta
> strncpy_from_kernel_nofault                  151     158      +7
> copy_from_kernel_nofault                     207     214      +7
> strncpy_from_user                            218     219      +1
> fault_in_readable                            150     151      +1
> Total: Before=23919018, After=23919034, chg +0.00%
> 
> where the difference in copy_from_kernel_nofault() is expected:
> 
> -       66 8b 45 00             mov    0x0(%rbp),%ax
> +       0f b7 45 00             movzwl 0x0(%rbp),%eax
> ...
> -       8a 45 00                mov    0x0(%rbp),%al
> +       0f b6 45 00             movzbl 0x0(%rbp),%eax
> 
> and then some NOPs at the end due to function entry alignment.
> 
> > I'm trying to remember what the magic 'k' is for.
> > Does gas treat %krxx as %dxx ?
> > Which would matter if 'x' is 64bit when using "=r" for 32bit reads.
> > But, in that case, I think you need it on the 'movl' as well.  
> 
> %k is gcc operand modifier:
> 
> ‘k’        Print the SImode name of the register.
> 
> that overrides operand width and prints a 32-bit register name.

I can never find anything in that bit of the manual.
It would be easier if there were a separate section for each architecture.

So it does just change %rxx to %exx.

But the u32 code is much the same - so I doubt you need it at all.

Looks at the full file....
The 'x' parameter is __gu_val from __inttype(*(ptr)) __gu_val;
__inttype() is a 'horrid mess' that generates char/short/long/long long.

(So the "k" are actually changing the 16bit %bx to the 32bit %ebx (etc).)

But that isn't what you need for the 'movzbl' (etc).
At most you need to pick between 'long' and 'long long'.
Put the "k" in and you can use 'long long' all the time.

But you lose at the final step:
	(x) = (__force __typeof__(*(ptr))_gu_val;
all your hard work goes away.

If the usage was:
	val = unsafe_get_user(ptr, err_label);
you could return the 32bit register containing the byte value.
If the calling code assigned it to an int, then you'd save the zero extend.

This means the entire patch is pointless - even if you change the type
of 'x' to be 32bit for the size 8/16 cases.

It also explains why there are no ripple through changes.

> 
> > There is the slight problem (which affects for of the asm) is that
> > gcc (I think) can't assume that the high 32bits of a register result
> > from an inline asm function are zero.  
> 
> No, the compiler can't assume that. It doesn't know that highpart is
> zero and ...
> 
> > But if you return 'u64' then everything at the call site has to
> > be extended as well.  
> 
> ... has to zero-extend from u32 to u64 even if we know highpart is zero.

I realised that later - so you can't win :-(

	David