Message-ID: <67c2f851-9a4d-e18d-4664-c07287e72ebf@arm.com>
Date: Wed, 15 May 2019 12:02:43 +0100
From: Robin Murphy <robin.murphy@....com>
To: Will Deacon <will.deacon@....com>
Cc: Zhangshaokun <zhangshaokun@...ilicon.com>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
linux-arm-kernel@...ts.infradead.org, netdev@...r.kernel.org,
ilias.apalodimas@...aro.org,
"huanglingyan (A)" <huanglingyan2@...wei.com>, steve.capper@....com
Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version

On 15/05/2019 10:47, Will Deacon wrote:
> On Mon, Apr 15, 2019 at 07:18:22PM +0100, Robin Murphy wrote:
>> On 12/04/2019 10:52, Will Deacon wrote:
>>> I'm waiting for Robin to come back with numbers for a C implementation.
>>>
>>> Robin -- did you get anywhere with that?
>>
>> Still not what I would call finished, but where I've got so far (besides an
>> increasingly elaborate test rig) is as below - it still wants some unrolling
>> in the middle to really fly (and actual testing on BE), but the worst-case
>> performance already equals or just beats this asm version on Cortex-A53 with
>> GCC 7 (by virtue of being alignment-insensitive and branchless except for
>> the loop). Unfortunately, the advantage of C code being instrumentable does
>> also come around to bite me...
>
> Is there any interest from anybody in spinning a proper patch out of this?
> Shaokun?

FWIW I've learned how to fix the KASAN thing now, so I'll try giving
this some more love while I've got other outstanding optimisation stuff
to look at anyway.

Robin.