[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200107223606.GA32598@agluck-desk2.amr.corp.intel.com>
Date: Tue, 7 Jan 2020 14:36:06 -0800
From: "Luck, Tony" <tony.luck@...el.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/cpufeatures: Add support for fast short rep mov
On Tue, Jan 07, 2020 at 07:40:03PM +0100, Borislav Petkov wrote:
> I don't mind this graph being part of the commit message - it shows
> nicely the speedup even if with some microbenchmark. Or you're not
> adding it just because it is a microbenchmark and not something more
> representative?
I'm not sure it should be archived forever in the commit message.
The benchmark was run on A-step silicon, so may not be representative
of production results.
> > .Lmemmove_begin_forward:
> > + ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
>
> So the enhancement is for string lengths up to two cachelines. Why
> are you limiting this to 32 bytes?
>
> I know, the function handles 32-bytes at a time but what I'd imagine
> here is having the fastest variant upfront which does REP; MOVSB for all
> lengths since FSRM means fast short strings and ERMS - and I'm strongly
> assuming here FSRM *implies* ERMS - means fast "longer" strings, so to
> speak, so FSRM would mean fast *all length* strings in the end, no?
>
> Also, does the copy direction influence the FSRM's REP; MOVSB variant's
> performance? If not, you can do something like this:
Yes FSRM implies ERMS
You can't use REP MOVS for overlapping src/dst strings (not even with the fancy
newer, faster, shinier FSRM version). So your suggestion will not work.
The old memmove code looked something like:
if (len < 32)
copy tail (backwards ... 8/4/2/1 bytes. works for both overlap & non-overlap case)
return
else if overlap src/dst
copy backwards 32-byte unrolled
copy tail
return
else if (ERMS)
REP MOVS;
return
else
unrolled copy 32-byte
copy tail
The new one with my changes looks something like
if (! overlap src/dst)
if (FSRM)
rep movs
return
if (len < 32)
copy tail
return
if (ERMS)
rep movs
return
unrolled copy
else
if (len < 32)
copy tail
return
copy backwards 32-byte unrolled
copy tail
-Tony
Powered by blists - more mailing lists