Message-ID: <20250321204723.1e21cb23@pumpkin>
Date: Fri, 21 Mar 2025 20:47:23 +0000
From: David Laight <david.laight.linux@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mateusz Guzik <mjguzik@...il.com>, x86@...nel.org, hkrzesin@...hat.com,
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, hpa@...or.com, olichtne@...hat.com,
 atomasov@...hat.com, aokuliar@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86: handle the tail in rep_movs_alternative() with an
 overlapping store

On Thu, 20 Mar 2025 16:53:32 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Thu, 20 Mar 2025 at 14:17, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > On Thu, 20 Mar 2025 at 12:33, Mateusz Guzik <mjguzik@...il.com> wrote:  
> > >
> > > I have a recollection that handling the tail after rep movsq with an
> > > overlapping store was suffering a penalty big enough to warrant a
> > > "normal" copy instead, avoiding the just written to area.  
> >
> > Ahh. Good point. The rep movsq might indeed end up having odd effects
> > with subsequent aliasing memory operations.
> >
> > Consider myself convinced.  
> 
> Actually, I think there's a solution for this.
> 
> Do not do the last 0-7 bytes as a word that overlaps with the tail of
> the 'rep movs'.
> 
> Do the last 8-15 bytes *non-overlapping* (well, they overlap each
> other, but not the 'rep movs').
> 
> Something UNTESTED like the appended, in other words. The large case
> then ends up without any conditionals, looking something like this:
> 
>         mov    %rcx,%rax
>         shr    $0x3,%rcx
>         dec    %rcx
>         and    $0x7,%eax
>         rep movsq %ds:(%rsi),%es:(%rdi)
>         mov    (%rsi),%rcx
>         mov    %rcx,(%rdi)
>         mov    (%rsi,%rax,1),%rcx
>         mov    %rcx,(%rdi,%rax,1)
>         xor    %ecx,%ecx
>         ret
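
[The quoted sequence can be sketched in C as follows. This is an editor's illustration, not code from the thread: `copy_tail_sketch` is a hypothetical name, a plain word loop stands in for 'rep movsq', and the kernel's uaccess/exception-fixup machinery is omitted. The point is the tail scheme: the last 8-15 bytes are done as two 8-byte copies that may overlap each other but never the region the word loop just wrote.]

```c
#include <stddef.h>
#include <string.h>

/* Editor's sketch of the quoted tail scheme, for n >= 8.
 * words = n/8 - 1 holds one word back from the loop, so 8..15 bytes
 * remain; those are covered by two 8-byte copies at offsets 0 and
 * rem = n & 7, which overlap each other but not the loop's writes. */
static void copy_tail_sketch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t rem = n & 7;		/* %rax: 0..7 */
	size_t words = (n >> 3) - 1;	/* %rcx after the dec */

	while (words--) {		/* stands in for 'rep movsq' */
		memcpy(d, s, 8);
		d += 8;
		s += 8;
	}
	memcpy(d, s, 8);		/* first 8 of the 8..15 tail */
	memcpy(d + rem, s + rem, 8);	/* last 8, overlapping the first */
}
```

[Note that when n is a multiple of 8, rem is 0 and the two tail copies hit the same 8 bytes twice - which is what the reply below addresses.]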

I think you can save the 'tail end' from copying the same 8 bytes twice by doing:
	sub	$9,%rcx
	mov	%rcx,%rax
	shr	$3,%rcx
	and	$7,%rax
	inc	%rax
before the 'rep movsq'.
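
[As a C sketch (editor's addition, not from the thread; `copy_tail_variant` is a hypothetical name and a word loop again stands in for 'rep movsq'), the arithmetic above works out as follows for n >= 9: subtracting 9 first makes the word count one smaller when n is a multiple of 8, and the tail offset becomes (n-9 & 7) + 1, i.e. 1..8, so the two 8-byte tail copies never cover the same bytes twice.]

```c
#include <stddef.h>
#include <string.h>

/* Editor's sketch of the suggested setup, for n >= 9.
 * For n = 8k:     words = k-2, off = 8 -> tails cover [0,8) and [8,16).
 * For n = 8k + r: words = k-1, off = r -> tails cover [0,8) and [r,r+8).
 * Either way the tails cover the remaining 8..16 bytes exactly once
 * where possible, instead of duplicating a full word. */
static void copy_tail_variant(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t t = n - 9;		/* sub $9,%rcx */
	size_t words = t >> 3;		/* shr $3,%rcx */
	size_t off = (t & 7) + 1;	/* and $7,%rax; inc %rax: 1..8 */

	while (words--) {		/* stands in for 'rep movsq' */
		memcpy(d, s, 8);
		d += 8;
		s += 8;
	}
	memcpy(d, s, 8);		/* bytes [0,8) of the tail */
	memcpy(d + off, s + off, 8);	/* bytes [off, off+8), off >= 1 */
}
```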

	David
	
> 
> with some added complexity - but not a lot - in the exception fixup cases.
> 
> This is once again intentionally whitespace-damaged, because I don't
> want people applying this mindlessly. Somebody needs to double-check
> my logic, and verify that this also avoids the cost from the aliasing
> with the rep movs.
> 
>                    Linus
...
