Message-ID: <5156db7f-09a7-b0fa-d246-b024e40775fc@arm.com>
Date: Tue, 1 Jun 2021 13:06:32 +0100
From: Robin Murphy <robin.murphy@....com>
To: Sunil Kovvuri <sunil.kovvuri@...il.com>,
Oliver Swede <oli.swede@....com>
Cc: Catalin Marinas <catalin.marinas@....com>, will@...nel.org,
linux-arm-kernel@...ts.infradead.org,
LKML <linux-kernel@...r.kernel.org>,
Sunil Goutham <sgoutham@...vell.com>,
George Cherian <gcherian@...vell.com>
Subject: Re: [PATCH v5 08/14] arm64: Import latest optimization of memcpy
On 2021-06-01 11:03, Sunil Kovvuri wrote:
> On Mon, Sep 14, 2020 at 8:44 PM Oliver Swede <oli.swede@....com> wrote:
>>
>> From: Sam Tebbs <sam.tebbs@....com>
>>
>> Import the latest memcpy implementation into memcpy and
>> copy_{from,to,in}_user.
>> The implementation of the user routines is separated into two forms:
>> one for when UAO is enabled and one for when it is disabled, with the
>> appropriate one selected at boot time by runtime patching.
>> This avoids executing the many NOPs that are otherwise emitted when
>> UAO is disabled.
>>
>> The project containing optimized implementations for various library
>> functions has now been renamed from 'cortex-strings' to
>> 'optimized-routines', and the new upstream source is
>> string/aarch64/memcpy.S as of commit 4c175c8be12 in
>> https://github.com/ARM-software/optimized-routines.
>>
>> Signed-off-by: Sam Tebbs <sam.tebbs@....com>
>> [ rm: add UAO fixups, streamline copy_exit paths, expand commit message ]
>> Signed-off-by: Robin Murphy <robin.murphy@....com>
>> [ os: import newer memcpy algorithm, update commit message ]
>> Signed-off-by: Oliver Swede <oli.swede@....com>
>> ---
>>  arch/arm64/include/asm/alternative.h |  36 ---
>>  arch/arm64/lib/copy_from_user.S      | 113 ++++++--
>>  arch/arm64/lib/copy_in_user.S        | 129 +++++++--
>>  arch/arm64/lib/copy_template.S       | 375 +++++++++++++++------------
>>  arch/arm64/lib/copy_template_user.S  |  24 ++
>>  arch/arm64/lib/copy_to_user.S        | 112 ++++++--
>>  arch/arm64/lib/copy_user_fixup.S     |  14 +
>>  arch/arm64/lib/memcpy.S              |  47 ++--
>> 8 files changed, 557 insertions(+), 293 deletions(-)
>> create mode 100644 arch/arm64/lib/copy_template_user.S
>> create mode 100644 arch/arm64/lib/copy_user_fixup.S
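For context, the UAO selection described in the commit message above
boils down to something like the following, expressed in C for clarity.
This is only a sketch: the two helper names are invented stand-ins for
the two assembly bodies, and the actual patch does the equivalent
dispatch in assembly.

#include <linux/uaccess.h>
#include <asm/cpufeature.h>

/* Invented names: stand-ins for the UAO and non-UAO assembly bodies */
unsigned long __copy_to_user_uao(void __user *to, const void *from,
				 unsigned long n);
unsigned long __copy_to_user_nouao(void __user *to, const void *from,
				   unsigned long n);

unsigned long __arch_copy_to_user(void __user *to, const void *from,
				  unsigned long n)
{
	/*
	 * cpus_have_const_cap() compiles down to a boot-time-patched
	 * branch rather than a runtime load-and-test, so the selected
	 * body runs without any per-access NOP padding.
	 */
	if (cpus_have_const_cap(ARM64_HAS_UAO))
		return __copy_to_user_uao(to, from, n);
	return __copy_to_user_nouao(to, from, n);
}
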
>
> Do you have any performance data with this patch?
> I see these patches are still not pushed to mainline; any reasons?

Funny you should pick up on this 6-month-old thread just days after I've
been posting new versions of the relevant parts[1] :)

I think this series mostly stalled on the complexity of the usercopy
parts, which then turned into even more of a moving target anyway; that
is why I decided to split it up.

> Also curious to know why 128-bit registers are not considered, similar to
> https://android.googlesource.com/platform/bionic.git/+/a71b4c3f144a516826e8ac5b262099b920c49ce0/libc/arch-arm64/generic-neon/bionic/memcpy.S

The overhead of kernel_neon_begin() etc. is significant, and usually
only worth paying in places like the crypto routines, where there's
enough benefit from actual ASIMD computation to outweigh the
save/restore cost. On smaller cores where the L1 interface is only 128
bits wide anyway, there is no possible gain in memcpy() throughput to
ever offset that cost, and even on wider microarchitectures it is only
likely to start breaking even at relatively large copy sizes. Plus we
can't necessarily assume the ASIMD registers are even present
(apparently the lack of a soft-float ABI hasn't stopped people from
wanting to run Linux on such systems...).
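
To put that in code terms, any SIMD-using copy routine has to be
bracketed roughly like below. Again only a sketch: may_use_simd(),
kernel_neon_begin() and kernel_neon_end() are the real interfaces, but
the function itself, the break-even threshold and the memcpy() standing
in for a q-register loop are all made up for illustration.

#include <linux/string.h>
#include <asm/neon.h>
#include <asm/simd.h>

#define NEON_COPY_BREAK_EVEN	4096	/* made-up break-even size */

void *memcpy_neon_sketch(void *dst, const void *src, size_t len)
{
	/* SIMD may be unusable in this context, or not worth the cost */
	if (!may_use_simd() || len < NEON_COPY_BREAK_EVEN)
		return memcpy(dst, src, len);	/* integer-register path */

	kernel_neon_begin();	/* saves the task's FPSIMD state */
	memcpy(dst, src, len);	/* stand-in for a real ASIMD copy loop */
	kernel_neon_end();
	return dst;
}

The begin/end pair is where the save/restore cost lives, and it is paid
on every call regardless of size; hence the break-even point.
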
Robin.
[1]
https://lore.kernel.org/linux-arm-kernel/cover.1622128527.git.robin.murphy@arm.com/