Message-ID: <mhng-7b8d3a12-e223-4b69-a35a-617b0d7ac8f7@palmerdabbelt-glaptop>
Date: Wed, 04 Aug 2021 13:40:16 -0700 (PDT)
From: Palmer Dabbelt <palmer@...belt.com>
To: mcroce@...ux.microsoft.com, mcroce@...ux.microsoft.com
CC: linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org,
Paul Walmsley <paul.walmsley@...ive.com>,
aou@...s.berkeley.edu, Atish Patra <Atish.Patra@....com>,
kernel@...il.dk, akira.tsukamoto@...il.com, drew@...gleboard.org,
bmeng.cn@...il.com, David.Laight@...lab.com, guoren@...nel.org,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] riscv: use the generic string routines

On Tue, 03 Aug 2021 09:54:34 PDT (-0700), mcroce@...ux.microsoft.com wrote:
> On Mon, Jul 19, 2021 at 1:44 PM Matteo Croce <mcroce@...ux.microsoft.com> wrote:
>>
>> From: Matteo Croce <mcroce@...rosoft.com>
>>
>> Use the generic routines which handle alignment properly.
>>
>> These are the performance figures measured on a BeagleV machine for a
>> 32 MB buffer:
>>
>> memcpy:
>> original aligned: 75 Mb/s
>> original unaligned: 75 Mb/s
>> new aligned: 114 Mb/s
>> new unaligned: 107 Mb/s
>>
>> memset:
>> original aligned: 140 Mb/s
>> original unaligned: 140 Mb/s
>> new aligned: 241 Mb/s
>> new unaligned: 241 Mb/s
>>
>> TCP throughput with iperf3 gives a similar improvement as well.
>>
>> This is the binary size increase according to bloat-o-meter:
>>
>> add/remove: 0/0 grow/shrink: 4/2 up/down: 432/-36 (396)
>> Function old new delta
>> memcpy 36 324 +288
>> memset 32 148 +116
>> strlcpy 116 132 +16
>> strscpy_pad 84 96 +12
>> strlcat 176 164 -12
>> memmove 76 52 -24
>> Total: Before=1225371, After=1225767, chg +0.03%
>>
>> Signed-off-by: Matteo Croce <mcroce@...rosoft.com>
>> Signed-off-by: Emil Renner Berthing <kernel@...il.dk>
>> ---
>
> Hi,
>
> can someone have a look at this change and share opinions?

This LGTM. How are the generic string routines landing? I'm happy to
take this into my for-next, but IIUC we need the optimized generic
versions first so we don't have a performance regression falling back to
the trivial ones for a bit. Is there a shared tag I can pull in?
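
For context on "generic routines which handle alignment properly", here is
a minimal C sketch of the general word-at-a-time idea. It is not the code
from the patch or from the kernel's generic string routines; the names
sketch_memcpy, sketch_memcpy_selfcheck and word_t are hypothetical, and it
assumes a GCC/Clang-style may_alias attribute. It only uses word accesses
when source and destination share the same alignment and falls back to
bytes otherwise, which an optimized implementation would presumably
improve on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the kernel implementation. */
typedef unsigned long __attribute__((__may_alias__)) word_t;

#define WORD_MASK (sizeof(word_t) - 1)

void *sketch_memcpy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Copy leading bytes until the destination is word-aligned. */
	while (count && ((uintptr_t)d & WORD_MASK)) {
		*d++ = *s++;
		count--;
	}

	/* Use word accesses only if the source is now aligned as well. */
	if (((uintptr_t)s & WORD_MASK) == 0) {
		for (; count >= sizeof(word_t); count -= sizeof(word_t)) {
			*(word_t *)d = *(const word_t *)s;
			d += sizeof(word_t);
			s += sizeof(word_t);
		}
	}

	/* Remaining tail, or the whole copy when the source is misaligned. */
	while (count--)
		*d++ = *s++;

	return dest;
}

/* Compare against libc memcpy for small offsets and lengths; 0 on success. */
int sketch_memcpy_selfcheck(void)
{
	unsigned char src[64], dst[64], ref[64];
	size_t soff, doff, len, i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i * 7 + 1);

	for (soff = 0; soff < 8; soff++) {
		for (doff = 0; doff < 8; doff++) {
			for (len = 0; len <= 40; len++) {
				memset(dst, 0xAA, sizeof(dst));
				memset(ref, 0xAA, sizeof(ref));
				sketch_memcpy(dst + doff, src + soff, len);
				memcpy(ref + doff, src + soff, len);
				if (memcmp(dst, ref, sizeof(dst)) != 0)
					return -1;
			}
		}
	}
	return 0;
}
```

The self-check exercises every source/destination offset from 0 to 7, so
both the co-aligned word path and the byte fallback are covered.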