Message-ID: <d6b241979664402e907064245ebe5578@AcuMS.aculab.com>
Date: Thu, 3 Jun 2021 08:45:07 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Robin Murphy' <robin.murphy@....com>,
Sunil Kovvuri <sunil.kovvuri@...il.com>,
Oliver Swede <oli.swede@....com>
CC: Catalin Marinas <catalin.marinas@....com>,
"will@...nel.org" <will@...nel.org>,
"linux-arm-kernel@...ts.indradead.org"
<linux-arm-kernel@...ts.indradead.org>,
LKML <linux-kernel@...r.kernel.org>,
Sunil Goutham <sgoutham@...vell.com>,
George Cherian <gcherian@...vell.com>
Subject: RE: [PATCH v5 08/14] arm64: Import latest optimization of memcpy

From: Robin Murphy
> Sent: 01 June 2021 13:07
>
> On 2021-06-01 11:03, Sunil Kovvuri wrote:
> > On Mon, Sep 14, 2020 at 8:44 PM Oliver Swede <oli.swede@....com> wrote:
> >>
> >> From: Sam Tebbs <sam.tebbs@....com>
> >>
> >> Import the latest memcpy implementation into memcpy,
> >> copy_{from, to and in}_user.
> >> The implementation of the user routines is separated into two forms:
> >> one for when UAO is enabled and one for when UAO is disabled, with
> >> the two being chosen between with a runtime patch.
> >> This avoids executing the many NOPs emitted when UAO is disabled.
> >>
> >> The project containing optimized implementations for various library
> >> functions has now been renamed from 'cortex-strings' to
> >> 'optimized-routines', and the new upstream source is
> >> string/aarch64/memcpy.S as of commit 4c175c8be12 in
> >> https://github.com/ARM-software/optimized-routines.
> >>
...
> >
> > Do you have any performance data with this patch ?
> > I see these patches are still not pushed to mainline, any reasons ?
>
> Funny you should pick up on the 6-month-old thread days after I've been
> posting new versions of the relevant parts[1] :)
>
> I think this series mostly stalled on the complexity of the usercopy
> parts, which then turned into even more of a moving target anyway, hence
> why I decided to split it up.

It is also worth checking what copy lengths the 'optimised' routines
are actually optimised for.

For instance, a sendmsg() system call is likely to do at least 3 short
copy_from_user() requests before it even thinks about reading the data buffer.
Even the cost of the comparisons that select between the short and long copy
paths becomes significant on short copies.

I'm not sure you want to be calling
https://github.com/ARM-software/optimized-routines/blob/master/string/aarch64/memcpy.S
for 3 bytes!
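
As a rough illustration of the concern (a sketch only, not code from the
patch series or from optimized-routines; short_copy() and the 8-byte
cut-off are invented for the example), a wrapper could handle tiny
lengths inline and only call the general routine for longer copies:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel or library API.
 * Lengths up to an arbitrary, assumed cut-off of 8 bytes are copied
 * with a plain byte loop; anything longer goes to the general memcpy().
 */
static inline void *short_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (len <= 8) {
		/* Short path: one length test, then a trivial loop. */
		while (len--)
			*d++ = *s++;
		return dst;
	}

	/* Long path: let the size-dispatching routine do the work. */
	return memcpy(dst, src, len);
}
```

Whether a cut-off like this pays for itself depends on the real
distribution of copy lengths, so it would need measuring rather than
assuming.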

David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)