lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFnufp2UaAEq8FCxSeX5xCOZYu4wJ783gy35RZF-D626XiF8MQ@mail.gmail.com>
Date:   Wed, 23 Jun 2021 00:00:06 +0200
From:   Matteo Croce <mcroce@...ux.microsoft.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Atish Patra <atish.patra@....com>,
        Emil Renner Berthing <kernel@...il.dk>,
        Akira Tsukamoto <akira.tsukamoto@...il.com>,
        Drew Fustini <drew@...gleboard.org>,
        Bin Meng <bmeng.cn@...il.com>,
        David Laight <David.Laight@...lab.com>,
        Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH v3 1/3] riscv: optimized memcpy

On Mon, Jun 21, 2021 at 4:26 PM Christoph Hellwig <hch@...radead.org> wrote:
>
> On Thu, Jun 17, 2021 at 05:27:52PM +0200, Matteo Croce wrote:
> > +extern void *memcpy(void *dest, const void *src, size_t count);
> > +extern void *__memcpy(void *dest, const void *src, size_t count);
>
> No need for externs.
>

Right.

> > +++ b/arch/riscv/lib/string.c
>
> Nothing in her looks RISC-V specific.  Why doesn't this go into lib/ so
> that other architectures can use it as well.
>

Technically it could go into lib/ and be generic.
If you think it's worth it, I have just to handle the different
left/right shift because of endianness.

> > +#include <linux/module.h>
>
> I think you only need export.h.
>

Nice.

> > +void *__memcpy(void *dest, const void *src, size_t count)
> > +{
> > +     const int bytes_long = BITS_PER_LONG / 8;
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > +     const int mask = bytes_long - 1;
> > +     const int distance = (src - dest) & mask;
> > +#endif
> > +     union const_types s = { .u8 = src };
> > +     union types d = { .u8 = dest };
> > +
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > +     if (count < MIN_THRESHOLD)
>
> Using IS_ENABLED we can avoid a lot of the mess in this
> function.
>
>         int distance = 0;
>
>         if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
>                 if (count < MIN_THRESHOLD)
>                         goto copy_remainder;
>
>                 /* copy a byte at time until destination is aligned */
>                 for (; count && d.uptr & mask; count--)
>                         *d.u8++ = *s.u8++;
>                 distance = (src - dest) & mask;
>         }
>

Cool. What about putting this check in the very start:

        if (count < MIN_THRESHOLD)
                goto copy_remainder;

And since count is at least twice bytes_long, remove count from the check below?
Also, setting distance after d is aligned is as simple as getting the
lower bits of s:

        if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
                /* Copy a byte at time until destination is aligned. */
                for (; d.uptr & mask; count--)
                        *d.u8++ = *s.u8++;

                distance = s.uptr & mask;
        }

>         if (distance) {
>                 ...
>
> > +             /* 32/64 bit wide copy from s to d.
> > +              * d is aligned now but s is not, so read s alignment wise,
> > +              * and do proper shift to get the right value.
> > +              * Works only on Little Endian machines.
> > +              */
>
> Normal kernel comment style always start with a:
>

Right, I was used to netdev ones :)

>                 /*
>
>
> > +             for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {
>
> Please avoid the pointlessly overlong line.  And (just as a matter of
> personal preference) I find for loop that don't actually use a single
> iterator rather confusing.  Wjy not simply:
>
>                 next = s.ulong[0];
>                 while (count >= bytes_long + mask) {
>                         ...
>                         count -= bytes_long;
>                 }

My fault, in a previous version it was:

    next = s.ulong[0];
    for (; count >= bytes_long + mask; count -= bytes_long) {

So to have a single `count` counter for the loop.

Regards,
-- 
per aspera ad upstream

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ