Message-ID: <20250321204723.1e21cb23@pumpkin>
Date: Fri, 21 Mar 2025 20:47:23 +0000
From: David Laight <david.laight.linux@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mateusz Guzik <mjguzik@...il.com>, x86@...nel.org, hkrzesin@...hat.com,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, olichtne@...hat.com,
atomasov@...hat.com, aokuliar@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86: handle the tail in rep_movs_alternative() with an
overlapping store
On Thu, 20 Mar 2025 16:53:32 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Thu, 20 Mar 2025 at 14:17, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > On Thu, 20 Mar 2025 at 12:33, Mateusz Guzik <mjguzik@...il.com> wrote:
> > >
> > > I have a recollection that handling the tail after rep movsq with an
> > > overlapping store was suffering a penalty big enough to warrant a
> > > "normal" copy instead, avoiding the just written to area.
> >
> > Ahh. Good point. The rep movsq might indeed end up having odd effects
> > with subsequent aliasing memory operations.
> >
> > Consider myself convinced.
>
> Actually, I think there's a solution for this.
>
> Do not do the last 0-7 bytes as a word that overlaps with the tail of
> the 'rep movs'
>
> Do the last 8-15 bytes *non-overlapping* (well, they overlap each
> other, but not the 'rep movs')
>
> Something UNTESTED like the appended, in other words. The large case
> then ends up without any conditionals, looking something like this:
>
> mov %rcx,%rax
> shr $0x3,%rcx
> dec %rcx
> and $0x7,%eax
> rep movsq %ds:(%rsi),%es:(%rdi)
> mov (%rsi),%rcx
> mov %rcx,(%rdi)
> mov (%rsi,%rax,1),%rcx
> mov %rcx,(%rdi,%rax,1)
> xor %ecx,%ecx
> ret
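To make the quoted tail handling easier to check, here is a minimal C model of that instruction sequence. It is a hypothetical sketch, not kernel code: it assumes len >= 8 so that len/8 - 1 cannot underflow, uses memcpy in place of the 8-byte loads and stores, and ignores the exception fixups entirely. The function name model_linus_tail is invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model (not kernel code) of the quoted large-copy path.
 * Assumes len >= 8. Returns the offset of the second tail store
 * relative to the first; 0 means both stores write the same 8 bytes.
 */
static size_t model_linus_tail(unsigned char *dst, const unsigned char *src,
                               size_t len)
{
    size_t rax = len & 7;            /* mov %rcx,%rax ; and $0x7,%eax */
    size_t qwords = (len >> 3) - 1;  /* shr $0x3,%rcx ; dec %rcx */
    uint64_t tmp;

    /* rep movsq: copy qwords * 8 bytes and advance both pointers */
    memcpy(dst, src, qwords * 8);
    dst += qwords * 8;
    src += qwords * 8;

    /* mov (%rsi),%rcx ; mov %rcx,(%rdi): the next 8 bytes, which do
     * not overlap anything the rep movsq wrote */
    memcpy(&tmp, src, 8);
    memcpy(dst, &tmp, 8);

    /* mov (%rsi,%rax,1),%rcx ; mov %rcx,(%rdi,%rax,1): the last 8 bytes,
     * overlapping only the previous store */
    memcpy(&tmp, src + rax, 8);
    memcpy(dst + rax, &tmp, 8);

    return rax;
}
```

With len = 8q + r, the rep movsq moves 8(q - 1) bytes and the two stores cover the remaining 8 + r bytes (8..15). When len is a multiple of 8, r is 0 and the second store repeats the first.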
I think you can avoid the 'tail end' copying the same 8 bytes twice (which happens when the length is a multiple of 8) by doing:
sub $9,%rcx
mov %rcx,%rax
shr $3,%rcx
and $7,%rax
inc %rax
before the 'rep movsq'.
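A minimal C model of the suggested setup can show why the duplicate store goes away. It is a hypothetical sketch, not kernel code: it assumes len >= 9 so that len - 9 does not wrap, keeps the same rep movsq and two 8-byte tail copies as the quoted sequence, uses memcpy for the loads and stores, and ignores exception fixups. The name model_david_tail is invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model (not kernel code) of:
 *   sub $9,%rcx ; mov %rcx,%rax ; shr $3,%rcx ; and $7,%rax ; inc %rax
 * followed by rep movsq and the two 8-byte tail copies.
 * Assumes len >= 9. Returns the offset of the second tail store
 * relative to the first, which is always in 1..8.
 */
static size_t model_david_tail(unsigned char *dst, const unsigned char *src,
                               size_t len)
{
    size_t rcx = len - 9;          /* sub $9,%rcx */
    size_t rax = (rcx & 7) + 1;    /* mov %rcx,%rax ; and $7,%rax ; inc %rax */
    size_t qwords = rcx >> 3;      /* shr $3,%rcx */
    uint64_t tmp;

    /* rep movsq: copy qwords * 8 bytes and advance both pointers */
    memcpy(dst, src, qwords * 8);
    dst += qwords * 8;
    src += qwords * 8;

    /* first tail qword, directly after the rep movsq region */
    memcpy(&tmp, src, 8);
    memcpy(dst, &tmp, 8);

    /* last tail qword, starting 1..8 bytes after the first */
    memcpy(&tmp, src + rax, 8);
    memcpy(dst + rax, &tmp, 8);

    return rax;
}
```

Writing len - 9 = 8q + r, the rep movsq moves 8q bytes and leaves 9 + r bytes (9..16) for the tail. The second store starts r + 1 bytes in, so it never exactly repeats the first, and for r = 7 the two stores do not overlap at all.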
David
>
> with some added complexity - but not a lot - in the exception fixup cases.
>
> This is once again intentionally whitespace-damaged, because I don't
> want people applying this mindlessly. Somebody needs to double-check
> my logic, and verify that this also avoids the cost from the aliasing
> with the rep movs.
>
> Linus
...