linux-kernel - Re: [PATCH] riscv: use the generic string routines

Message-ID: <CAFnufp2M_9_TRxoXbRK0bggPXyTgffYnA4moez=uWDNNb=aT8w@mail.gmail.com>
Date:   Sun, 19 Sep 2021 21:13:24 +0200
From:   Matteo Croce <mcroce@...ux.microsoft.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     Guo Ren <guoren@...nel.org>, Palmer Dabbelt <palmer@...belt.com>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Atish Patra <Atish.Patra@....com>,
        Emil Renner Berthing <kernel@...il.dk>,
        Akira Tsukamoto <akira.tsukamoto@...il.com>,
        Drew Fustini <drew@...gleboard.org>,
        Bin Meng <bmeng.cn@...il.com>,
        Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] riscv: use the generic string routines

On Mon, Sep 13, 2021 at 1:35 PM David Laight <David.Laight@...lab.com> wrote:
>
> > > These ended up getting rejected by Linus, so I'm going to hold off on
> > > this for now.  If they're really out of lib/ then I'll take the C
> > > routines in arch/riscv, but either way it's an issue for the next
> > > release.
> > Agree, we should take the C routine in arch/riscv for common
> > implementation. If any vendor wants a custom implementation they could
> > use the alternative framework in errata for string operations.
>
> I thought the asm ones were significantly faster because
> they were less affected by read latency.
>
> (But they were horribly broken for misaligned transfers.)
>

I can get exactly the same performance (and very similar machine code)
in C with this on top of the C memset implementation:

--- a/arch/riscv/lib/string.c
+++ b/arch/riscv/lib/string.c
@@ -112,9 +112,12 @@ EXPORT_SYMBOL(__memmove);
 void *memmove(void *dest, const void *src, size_t count) __weak __alias(__memmove);
 EXPORT_SYMBOL(memmove);

+#define BATCH 4
+
 void *__memset(void *s, int c, size_t count)
 {
  union types dest = { .as_u8 = s };
+ int i;

  if (count >= MIN_THRESHOLD) {
  unsigned long cu = (unsigned long)c;
@@ -138,8 +141,12 @@ void *__memset(void *s, int c, size_t count)
  }

  /* Copy using the largest size allowed */
- for (; count >= BYTES_LONG; count -= BYTES_LONG)
- *dest.as_ulong++ = cu;
+ for (; count >= BYTES_LONG * BATCH; count -= BYTES_LONG * BATCH) {
+#pragma GCC unroll 4
+     for (i = 0; i < BATCH; i++)
+         dest.as_ulong[i] = cu;
+     dest.as_ulong += BATCH;
+ }
  }

On the BeagleV, the memset speeds with the different batch sizes are:

1 (stock): 267 Mb/s
2: 272 Mb/s
4: 276 Mb/s
8: 276 Mb/s

The problem with a bigger batch size is that it will fall back to a
single-byte copy if the buffers are too small.
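One possible way to soften the fallback described above is to keep a
plain word-sized loop after the batched one, so that only the last
few bytes (fewer than one long) are written byte by byte. What follows
is a minimal userspace sketch under that assumption, not the kernel
code: batched_memset() and store_long() are hypothetical names, the
word store goes through memcpy() to stay valid without
-fno-strict-aliasing, and whether this keeps the speedup measured on
the BeagleV has not been tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH 4
#define BYTES_LONG sizeof(unsigned long)

/* Hypothetical helper: one long-sized store, expressed via memcpy(). */
static inline void store_long(unsigned char *p, unsigned long v)
{
	memcpy(p, &v, sizeof(v));
}

/* Hypothetical userspace sketch; not the kernel's __memset(). */
static void *batched_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;
	unsigned long cu = (unsigned char)c;
	int i;

	assert(s != NULL || count == 0);

	/* Replicate the fill byte into every byte of a long. */
	cu *= ~0UL / 0xff;

	/* Byte stores until the pointer is long-aligned. */
	while (count && ((uintptr_t)p & (BYTES_LONG - 1))) {
		*p++ = (unsigned char)c;
		count--;
	}

	/* Batched loop: BATCH longs per iteration. */
	for (; count >= BYTES_LONG * BATCH; count -= BYTES_LONG * BATCH) {
#pragma GCC unroll 4
		for (i = 0; i < BATCH; i++)
			store_long(p + i * BYTES_LONG, cu);
		p += BYTES_LONG * BATCH;
	}

	/* Word-sized tail: at most BATCH - 1 iterations. */
	for (; count >= BYTES_LONG; count -= BYTES_LONG, p += BYTES_LONG)
		store_long(p, cu);

	/* Byte tail: fewer than BYTES_LONG bytes remain. */
	while (count--)
		*p++ = (unsigned char)c;

	return s;
}
```

The extra loop costs one more compare-and-branch on the exit path, so
it trades a little code size for fewer byte stores on short buffers.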

Regards,
-- 
per aspera ad upstream