lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200107223606.GA32598@agluck-desk2.amr.corp.intel.com>
Date:   Tue, 7 Jan 2020 14:36:06 -0800
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/cpufeatures: Add support for fast short rep mov

On Tue, Jan 07, 2020 at 07:40:03PM +0100, Borislav Petkov wrote:
> I don't mind this graph being part of the commit message - it shows
> nicely the speedup even if with some microbenchmark. Or you're not
> adding it just because it is a microbenchmark and not something more
> representative?

I'm not sure it should be archived forever in the commit message.
The benchmark was run on A-step silicon, so may not be representative
of production results.

> >  .Lmemmove_begin_forward:
> > +	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> 
> So the enhancement is for string lengths up to two cachelines. Why
> are you limiting this to 32 bytes?
> 
> I know, the function handles 32-bytes at a time but what I'd imagine
> here is having the fastest variant upfront which does REP; MOVSB for all
> lengths since FSRM means fast short strings and ERMS - and I'm strongly
> assuming here FSRM *implies* ERMS - means fast "longer" strings, so to
> speak, so FSRM would mean fast *all length* strings in the end, no?
> 
> Also, does the copy direction influence the FSRM's REP; MOVSB variant's
> performance? If not, you can do something like this:

Yes FSRM implies ERMS

You can't use REP MOVS for overlapping src/dst strings (not even with the fancy
newer, faster, shinier FSRM version). So your suggestion will not work.

The old memmove code looked something like:

	if (len < 32)
		copy tail (backwards ... 8/4/2/1 bytes. works for both overlap & non-overlap case)
		return
	else if overlap src/dst
		copy backwards 32-byte unrolled
		copy tail
		return
	else if (ERMS)
		REP MOVS;
		return
	else
		unrolled copy 32-byte
		copy tail

The new one with my changes looks something like

	if (! overlap src/dst)
		if (FSRM)
			rep movs
			return
		if (len < 32)
			copy tail
			return
		if (ERMS)
			rep movs
			return
		unrolled copy
	else
		if (len < 32)
			copy tail
			return
		copy backwards 32-byte unrolled
		copy tail

-Tony

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ