Message-ID: <20200618210151.GA2212102@localhost.localdomain>
Date: Fri, 19 Jun 2020 00:01:51 +0300
From: Alexey Dobriyan <adobriyan@...il.com>
To: David Laight <David.Laight@...lab.com>
Cc: 'Matt Fleming' <matt@...eblueprint.co.uk>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Grimm, Jon" <Jon.Grimm@....com>,
"Kumar, Venkataramanan" <Venkataramanan.Kumar@....com>,
Jan Kara <jack@...e.cz>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH] x86/asm/64: Align start of __clear_user() loop to 16-bytes
On Thu, Jun 18, 2020 at 04:39:35PM +0000, David Laight wrote:
> From: Alexey Dobriyan
> > Sent: 18 June 2020 14:17
> ...
> > > > diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
> > > > index fff28c6f73a2..b0dfac3d3df7 100644
> > > > --- a/arch/x86/lib/usercopy_64.c
> > > > +++ b/arch/x86/lib/usercopy_64.c
> > > > @@ -24,6 +24,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
> > > > asm volatile(
> > > > " testq %[size8],%[size8]\n"
> > > > " jz 4f\n"
> > > > + " .align 16\n"
> > > > "0: movq $0,(%[dst])\n"
> > > > " addq $8,%[dst]\n"
> > > > " decl %%ecx ; jnz 0b\n"
> > >
> > > > You can do better than that loop.
> > > Change 'dst' to point to the end of the buffer, negate the count
> > > and divide by 8 and you get:
> > > "0: movq $0,($[dst],%%ecx,8)\n"
> > > " add $1,%%ecx"
> > > " jnz 0b\n"
> > > > which might run at one iteration per clock, especially on CPUs that
> > > > fuse the add and jnz into a single uop.
> > > (You need to use add not inc.)
> >
> > /dev/zero should probably use REP STOSB etc just like everything else.
>
> Almost certainly it shouldn't, and neither should anything else.
> Potentially it could use whatever memset() is patched to.
> That MIGHT be 'rep stos' on some cpu variants, but in general
> it is slow.
Yes, that's what I meant: alternatives choosing REP variant.
memset loops are so 21-st century.
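
For reference, the two approaches discussed above could be sketched in
user-space C roughly as follows. This is only an illustration under stated
assumptions, not the kernel's __clear_user(): both function names are
hypothetical, there is no fault handling or __user annotation, and the
"rep stosb" variant assumes GCC/Clang inline asm on x86-64 with the
direction flag clear (as the ABI guarantees), falling back to memset()
elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: zero `qwords` 8-byte words starting at dst.
 * The pointer is moved to the END of the buffer and a negative index
 * counts up to zero, so one add both advances the store address and
 * produces the loop-exit condition (store, add, jnz).
 */
static void clear_qwords_negidx(uint64_t *dst, size_t qwords)
{
	uint64_t *end = dst + qwords;      /* one past the last word */
	ptrdiff_t i = -(ptrdiff_t)qwords;  /* negated count */

	while (i != 0) {
		end[i] = 0;                /* like movq $0,(end,i,8) */
		i++;                       /* like add $1,i ; jnz */
	}
}

/*
 * Hypothetical helper: zero `len` bytes with REP STOSB where available.
 * RDI = destination, RCX = count, AL = fill byte.
 */
static void clear_bytes_rep_stosb(void *dst, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__asm__ volatile("rep stosb"
			 : "+D"(dst), "+c"(len)
			 : "a"(0)
			 : "memory");
#else
	memset(dst, 0, len);               /* portable fallback */
#endif
}
```

Which of the two is faster depends on the CPU and the buffer size, which
is the point of selecting the variant via alternatives rather than
hard-coding one.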