Message-ID: <CACuRN0MV4zNj1rBTnppoSudy98aOj2Pj6Ld1+D8mz0fn8kxGtg@mail.gmail.com>
Date: Sat, 5 Jun 2021 17:02:44 +0900
From: Akira Tsukamoto <akira.tsukamoto@...il.com>
To: Palmer Dabbelt <palmer@...belt.com>
Cc: Paul Walmsley <paul.walmsley@...ive.com>,
Albert Ou <aou@...s.berkeley.edu>, Gary Guo <gary@...yguo.net>,
Nick Hu <nickhu@...estech.com>,
Nylon Chen <nylon7@...estech.com>,
linux-riscv@...ts.infradead.org,
Linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/1] riscv: better network performance with memcpy, uaccess
On Sat, Jun 5, 2021 at 1:19 AM Palmer Dabbelt <palmer@...belt.com> wrote:
>
> On Fri, 04 Jun 2021 02:53:33 PDT (-0700), akira.tsukamoto@...il.com wrote:
> > I am adding a cover letter to explain the history and details, since
> > the improvement comes from combining this patch with Gary's memcpy patch [1].
> >
> > Comparison of iperf3 benchmark results after applying Gary's memcpy patch
> > and my uaccess optimization patch. All results are from the same base
> > kernel, same rootfs and same BeagleV beta board.
> >
> > First (left) column: BeagleV 5.13-rc4 kernel [2]
> > Second column: with Palmer's memcpy in C + my uaccess patch [3]
> > Third column: with Gary's memcpy + my uaccess patch [4]
> >
> > --- TCP recv ---
> > 686 Mbits/sec | 700 Mbits/sec | 904 Mbits/sec
> > 683 Mbits/sec | 701 Mbits/sec | 898 Mbits/sec
> > 695 Mbits/sec | 702 Mbits/sec | 905 Mbits/sec
> >
> > --- TCP send ---
> > 383 Mbits/sec | 390 Mbits/sec | 393 Mbits/sec
> > 384 Mbits/sec | 393 Mbits/sec | 392 Mbits/sec
> >
> > --- UDP send ---
> > 307 Mbits/sec | 358 Mbits/sec | 402 Mbits/sec
> > 307 Mbits/sec | 359 Mbits/sec | 402 Mbits/sec
> >
> > --- UDP recv ---
> > 630 Mbits/sec | 799 Mbits/sec | 875 Mbits/sec
> > 730 Mbits/sec | 796 Mbits/sec | 873 Mbits/sec
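As a quick check on the table above, the relative gain of each column can be computed from the mean of its runs. This is only arithmetic on the numbers reported here; `mean` and `percent_gain` are hypothetical helper names used for illustration.

```c
#include <stddef.h>

/* Mean of n throughput samples (Mbits/sec). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Percentage change of the mean of `runs` relative to the mean of `base`. */
static double percent_gain(const double *base, const double *runs, size_t n)
{
	return 100.0 * (mean(runs, n) / mean(base, n) - 1.0);
}
```

Applied to the TCP recv rows, the third column comes out about 31% above the baseline, and UDP send also gains about 31%, while TCP send gains only about 2%. With just two or three runs per cell (and a 630 vs 730 Mbits/sec spread in the UDP recv baseline), these figures are rough.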
> >
> >
> > The uaccess patch reduces read-after-write (RAW) pipeline stalls by
> > unrolling the loads and stores.
> > The main reason for writing uaccess.S in assembly is that
> > __asm_to/copy_from_user() must handle page faults manually inside the
> > functions.
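The unrolling idea can be illustrated in C. This is a hypothetical sketch (`copy_words_unrolled` is an invented name), not the code in uaccess.S: it assumes 8-byte aligned, non-overlapping buffers and leaves out the page-fault handling that keeps the real routine in assembly. Each group of four loads is issued before any of the matching stores, so no store depends on the load immediately before it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy n_words 64-bit words, unrolled by four. All loads of a group are
 * done before its stores, so each store's data was loaded several
 * instructions earlier instead of right before it.
 * Assumes dst and src are 8-byte aligned and do not overlap.
 */
static void copy_words_unrolled(uint64_t *dst, const uint64_t *src, size_t n_words)
{
	size_t i = 0;

	for (; i + 4 <= n_words; i += 4) {
		uint64_t a = src[i];
		uint64_t b = src[i + 1];
		uint64_t c = src[i + 2];
		uint64_t d = src[i + 3];

		dst[i] = a;
		dst[i + 1] = b;
		dst[i + 2] = c;
		dst[i + 3] = d;
	}
	/* Tail: fewer than four words remain. */
	for (; i < n_words; i++)
		dst[i] = src[i];
}
```

In C the compiler is free to reorder these accesses, so the sketch only shows the intended ordering; assembly pins it down.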
> >
> > The results above come from combining Gary's memcpy, which speeds things
> > up by reducing S-mode and M-mode switching, with my uaccess patch, which
> > reduces pipeline stalls when user space issues syscalls with large data.
> >
> > We had a discussion of improving network performance on the BeagleV beta
> > board with Palmer.
> >
> > Palmer suggested using C-based string routines that check for unaligned
> > addresses, use an 8-byte aligned copy if both src and dest are aligned,
> > and otherwise fall back to the current copy function.
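Palmer's suggestion can be sketched roughly as follows. The function name `memcpy_aligned_first` is hypothetical and this is not the code from [3]; it assumes a 64-bit target and, like kernel code, relies on `-fno-strict-aliasing` semantics for the word-sized accesses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * If dst and src are both 8-byte aligned, copy 64-bit words and finish
 * the remaining tail byte by byte; otherwise do a plain byte copy.
 */
void *memcpy_aligned_first(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if ((((uintptr_t)d | (uintptr_t)s) & (sizeof(uint64_t) - 1)) == 0) {
		uint64_t *dw = (uint64_t *)d;
		const uint64_t *sw = (const uint64_t *)s;

		for (; n >= sizeof(uint64_t); n -= sizeof(uint64_t))
			*dw++ = *sw++;
		d = (unsigned char *)dw;
		s = (const unsigned char *)sw;
	}
	/* Byte copy: the whole buffer when misaligned, otherwise just the tail. */
	while (n--)
		*d++ = *s++;
	return dst;
}
```

When either pointer is misaligned this degrades to the byte copy, which is exactly the case that network headers hit.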
> >
> > Gary's assembly version of memcpy gets its improvement by avoiding
> > unaligned 64-bit accesses: it reads from aligned offsets and shifts the
> > loaded data into place, because every misaligned access traps and switches
> > to OpenSBI in M-mode. The main speedup comes from avoiding switches between
> > S-mode (kernel) and M-mode (OpenSBI).
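The shift-based technique can be sketched in C for the case where dst is already 8-byte aligned but src is not. `copy_words_shifted` is a hypothetical name and this is not Gary's assembly from [1]; it assumes a little-endian machine (as RISC-V Linux is) and omits the byte-wise head and tail handling. Every load is an aligned 64-bit load, and each output word is assembled from two neighbouring aligned words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy n_words 64-bit words to an 8-byte-aligned dst from a src of any
 * alignment, using only aligned 64-bit loads (little-endian assumed).
 */
static void copy_words_shifted(uint64_t *dst, const unsigned char *src, size_t n_words)
{
	size_t off = (uintptr_t)src & 7;                     /* misalignment in bytes */
	const uint64_t *s = (const uint64_t *)(src - off);   /* aligned base */

	if (off == 0) {
		for (size_t i = 0; i < n_words; i++)
			dst[i] = s[i];
		return;
	}

	unsigned int rshift = 8 * off;      /* bits dropped from the lower word */
	unsigned int lshift = 64 - rshift;  /* bits taken from the next word */
	uint64_t lo = s[0];

	for (size_t i = 0; i < n_words; i++) {
		uint64_t hi = s[i + 1];

		dst[i] = (lo >> rshift) | (hi << lshift);
		lo = hi;  /* reuse: each aligned word is loaded only once */
	}
}
```

Because whole aligned words are read, the first and last loads can touch a few bytes just outside the requested range. Those bytes sit in aligned words that also hold requested bytes, so they never cross into another page, but in portable C they must belong to the same object.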
> >
> > Processing network packets requires many unaligned accesses to the packet
> > headers, and the header formats cannot be redesigned to be aligned.
> > In addition, user applications pass large buffers to send()/recv() and
> > sendto()/recvfrom() so that fewer calls are needed to read and write the
> > data.
>
> Makes sense. I'm still not opposed to moving to a C version, but it'd
> need to be a fairly complicated one. I think having a fast C memcpy
> would likely benefit a handful of architectures, as everything we're
> talking about is an algorithmic improvement that can be expressed in C.
>
> Given that the simple memcpy doesn't perform well for your workload, I'm
> fine taking the assembly version.
Thanks for merging them.
I agree that having a fast C memcpy would benefit many architectures.
I will prepare patches for lib/string.c that extend your memcpy and send
them after I finish other priorities. The current functions in lib/string.c
copy byte by byte, while most Linux-capable CPUs have moved to 64 bits.
Akira
>
> Thanks!
>
> >
> > Akira
> >
> > [1] https://lkml.org/lkml/2021/2/16/778
> > [2] https://github.com/mcd500/linux-jh7100/tree/starlight-sdimproved
> > [3] https://github.com/mcd500/linux-jh7100/tree/starlight-sd-palmer-string
> > [4] https://github.com/mcd500/linux-jh7100/tree/starlight-sd-gary
> >
> > Akira Tsukamoto (1):
> > riscv: prevent pipeline stall in __asm_to/copy_from_user
> >
> > arch/riscv/lib/uaccess.S | 106 +++++++++++++++++++++++++++------------
> > 1 file changed, 73 insertions(+), 33 deletions(-)