Message-ID: <20250402182241.GY5880@noisy.programming.kicks-ass.net>
Date: Wed, 2 Apr 2025 20:22:41 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Andrew Cooper <andrew.cooper3@...rix.com>
Cc: mjguzik@...il.com, linux-kernel@...r.kernel.org, mingo@...hat.com,
	torvalds@...ux-foundation.org, x86@...nel.org
Subject: Re: [RFC PATCH] x86: prevent gcc from emitting rep movsq/stosq for
 inlined ops

On Wed, Apr 02, 2025 at 07:17:03PM +0100, Andrew Cooper wrote:
> > Please make this a gcc bug-report instead - I really don't want to
> > have random compiler-specific tuning options in the kernel. Because
> > that whole memcpy-strategy thing is something that gets tuned by a lot
> > of other compiler options (ie -march and different versions).
> 
> I've discussed this with PeterZ in the past, although I can't for the
> life of me find the bugzilla ticket I thought I opened on the matter. 
> (Maybe I never got that far).
> 
> The behaviour wanted is:
> 
> 1) Convert to plain accesses (so they can be merged/combined/etc), or
> 2) Emit a library call
> 
> because we do provide forms that are better than the GCC-chosen "REP
> MOVSQ with manual alignment" in the general case.
> 
> Taking a leaf out of the retpoline book, the ideal library call(s) would be:
> 
>     CALL __x86_thunk_rep_{mov,stos}sb
> 
> using the REP ABI (parameters in %rcx/%rdi/etc), rather than the SYSV ABI.
> 
> For current/future processors, which have fast reps of all short/zero
> flavours, we can even inline the REP {MOV,STO}S instruction to avoid the
> call.
> 
> For older microarchitectures, they can reuse the existing memcpy/memset
> implementations, just with marginally less parameter shuffling.
> 
> How does this sound?

Right, vague memories indeed. We do something like this manually for
copy_user_generic().

But it would indeed be very nice if the compiler were to emit such thunk
calls instead of doing rep whatever; then we can have objtool collect the
call sites and patch them at runtime to be 'rep movs' or not, depending
on CPU flags etc.
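
To make the idea concrete, here is a rough userspace sketch of the
dispatch such a thunk would stand for. It is not kernel code: the names
thunk_copy(), copy_rep_movsb(), copy_fallback() and the
cpu_has_fast_rep flag are all made up for illustration, and the
run-time branch merely stands in for what objtool plus boot-time
patching would do at each call site. The inline asm uses the REP
register convention (%rdi/%rsi/%rcx) and assumes GCC or Clang on
x86-64; elsewhere it falls back to memcpy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CPU feature bit such as FSRM. */
static bool cpu_has_fast_rep = false;

/* Inline 'rep movsb': dst in %rdi, src in %rsi, count in %rcx. */
static inline void copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ __volatile__("rep movsb"
			     : "+D"(dst), "+S"(src), "+c"(len)
			     :
			     : "memory");
#else
	/* Assumption: not x86-64, so there is no REP MOVSB to emit. */
	memcpy(dst, src, len);
#endif
}

/* Stand-in for the existing out-of-line memcpy implementation. */
static void copy_fallback(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/*
 * Hypothetical thunk. In the scheme discussed above this branch would
 * not exist at run time; the call site itself would be rewritten.
 */
static void thunk_copy(void *dst, const void *src, size_t len)
{
	if (cpu_has_fast_rep)
		copy_rep_movsb(dst, src, len);
	else
		copy_fallback(dst, src, len);
}

/* Exercise both paths and check that they copy the same bytes. */
static void thunk_selfcheck(void)
{
	char src[16] = "rep-movsb-thunk";
	char dst[16];

	cpu_has_fast_rep = false;
	memset(dst, 0, sizeof(dst));
	thunk_copy(dst, src, sizeof(src));
	assert(memcmp(dst, src, sizeof(src)) == 0);

	cpu_has_fast_rep = true;
	memset(dst, 0, sizeof(dst));
	thunk_copy(dst, src, sizeof(src));
	assert(memcmp(dst, src, sizeof(src)) == 0);
}
```

The point of a compiler-emitted thunk call is that the choice between
the two paths is made once per boot by patching, rather than per call
by a branch as in this sketch.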
