Message-ID: <20200618210151.GA2212102@localhost.localdomain>
Date:   Fri, 19 Jun 2020 00:01:51 +0300
From:   Alexey Dobriyan <adobriyan@...il.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     'Matt Fleming' <matt@...eblueprint.co.uk>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Grimm, Jon" <Jon.Grimm@....com>,
        "Kumar, Venkataramanan" <Venkataramanan.Kumar@....com>,
        Jan Kara <jack@...e.cz>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH] x86/asm/64: Align start of __clear_user() loop to
 16-bytes

On Thu, Jun 18, 2020 at 04:39:35PM +0000, David Laight wrote:
> From: Alexey Dobriyan 
> > Sent: 18 June 2020 14:17
> ...
> > > > diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
> > > > index fff28c6f73a2..b0dfac3d3df7 100644
> > > > --- a/arch/x86/lib/usercopy_64.c
> > > > +++ b/arch/x86/lib/usercopy_64.c
> > > > @@ -24,6 +24,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
> > > >  	asm volatile(
> > > >  		"	testq  %[size8],%[size8]\n"
> > > >  		"	jz     4f\n"
> > > > +		"	.align 16\n"
> > > >  		"0:	movq $0,(%[dst])\n"
> > > >  		"	addq   $8,%[dst]\n"
> > > >  		"	decl %%ecx ; jnz   0b\n"
> > >
> > > You can do better than that loop.
> > > Change 'dst' to point to the end of the buffer, negate the count,
> > > divide by 8, and you get:
> > > 		"0:	movq $0,(%[dst],%%ecx,8)\n"
> > > 		"	add $1,%%ecx\n"
> > > 		"	jnz 0b\n"
> > > which might run at one iteration per clock, especially on CPUs that
> > > pair the add and jnz into a single uop.
> > > (You need to use add, not inc.)
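
[ For reference, the same idiom as a standalone C function; the name
zero_qwords and the operand constraints are illustrative only, and the
code is an untested sketch: ]

#include <stddef.h>

/* Zero 'qwords' 8-byte words using the end-pointer/negative-index
 * trick described above: 'dst' points one past the end, 'idx' counts
 * up from -qwords toward 0, so the flags set by the add drive the
 * branch directly and no separate compare is needed. */
static void zero_qwords(unsigned long *dst, size_t qwords)
{
	long idx = -(long)qwords;

	if (!qwords)
		return;
	dst += qwords;				/* one past the end */
	asm volatile(
		"0:	movq $0,(%[dst],%[idx],8)\n"
		"	addq $1,%[idx]\n"	/* add, not inc */
		"	jnz 0b\n"
		: [idx] "+r" (idx)
		: [dst] "r" (dst)
		: "memory");
}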
> > 
> > /dev/zero should probably use REP STOSB etc. just like everything else.
> 
> Almost certainly it shouldn't, and neither should anything else.
> Potentially it could use whatever memset() is patched to.
> That MIGHT be 'rep stos' on some CPU variants, but in general
> it is slow.

Yes, that's what I meant: alternatives choosing the REP variant.
memset loops are so 21st century.
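
As a concrete sketch of "alternatives choosing the REP variant": use
REP STOSB only when the CPU advertises ERMS, else fall back to a plain
loop. The kernel makes this choice once, at alternatives-patching time
(ALTERNATIVE() keyed on X86_FEATURE_ERMS); the runtime check and the
helper names below (cpu_has_erms, zero_bytes) are illustrative and
untested:

#include <cpuid.h>
#include <stdbool.h>
#include <stddef.h>

/* ERMS (Enhanced REP MOVSB/STOSB) is CPUID leaf 7, subleaf 0,
 * EBX bit 9. */
static bool cpu_has_erms(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ebx & (1u << 9);
}

static void zero_bytes(void *dst, size_t len)
{
	if (cpu_has_erms()) {
		/* RDI = dst, RCX = count, AL = fill byte */
		asm volatile("rep stosb"
			     : "+D" (dst), "+c" (len)
			     : "a" (0)
			     : "memory");
	} else {
		unsigned char *p = dst;

		while (len--)
			*p++ = 0;
	}
}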
