lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 12 Jul 2022 17:08:06 +0200
From:   Andy Shevchenko <andy.shevchenko@...il.com>
To:     Yu-Jen Chang <arthurchang09@...il.com>
Cc:     Andrey Semashev <andrey.semashev@...il.com>,
        Andy Shevchenko <andy@...nel.org>,
        Akinobu Mita <akinobu.mita@...il.com>,
        Ching-Chun Huang <jserv@...s.ncku.edu.tw>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] lib/string.c: Optimize memchr()

On Tue, Jul 12, 2022 at 4:58 PM Yu-Jen Chang <arthurchang09@...il.com> wrote:
> Andrey Semashev <andrey.semashev@...il.com> 於 2022年7月11日 週一 晚上11:00寫道:
> > On 7/11/22 17:52, Yu-Jen Chang wrote:
> > > Andrey Semashev <andrey.semashev@...il.com> 於 2022年7月11日 週一 凌晨4:01寫道:
> > >> On 7/10/22 17:28, Yu-Jen Chang wrote:

...

> > >>> +             for (; p <= end - 8; p += 8) {
> > >>> +                     val = *(u64 *)p ^ mask;
> > >>
> > >> What if p is not aligned to 8 (or 4 on 32-bit targets) bytes? Not all
> > >> targets support (efficient) unaligned loads, do they?
> > >
> > > I think it works if p is not aligned to 8 or 4 bytes.
> > >
> > > Let's say the string is 10 bytes. The for loop here will search the first
> > > 8 bytes. If the target character is in the last 2 bytes, the second for
> > > loop will find it. It also work like this on 32-bit machine.
> >
> > I think you're missing the point. Loads at unaligned addresses may not
> > be allowed by hardware using conventional load instructions or may be
> > inefficient. Given that this memchr implementation is used as a fallback
> > when no hardware-specific version is available, you should be
> > conservative wrt. hardware capabilities and behavior. You should
> > probably have a pre-alignment loop.
>
> Got it. I add  pre-alignment loop. It aligns the address to 8 or 4bytes.

Still far from what can be accepted. Have you had a chance to read how
strscpy() is implemented? Do you understand why it's done that way?

> void *memchr(const void *p, int c, size_t length)
> {
>     u64 mask, val;
>     const void *end = p + length;
>     c &= 0xff;

>     while ((long ) p & (sizeof(long) - 1)) {
>         if (p >= end)
>             return NULL;
>         if (*(unsigned char *)p == c)
>             return (void *) p;
>         p++;
>     }
>     if (p <= end - 8) {
>         mask = c;
>         MEMCHR_MASK_GEN(mask);
>
>         for (; p <= end - 8; p += 8) {

Why you decided that this code will be run explicitly on 64-bit arch?

>             val = *(u64*)p ^ mask;
>             if ((val + 0xfefefefefefefeffull)
> & (~val & 0x8080808080808080ull))
>                 break;
>         }
>     }
>
>     for (; p < end; p++)
>         if (*(unsigned char *)p == c)
>             return (void *)p;
>
>     return NULL;
> }

-- 
With Best Regards,
Andy Shevchenko

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ