[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHp75Vc0s6bFmNWtnVJEowJsDUPO6RnNqPgnYFQmV6A2P0mtYQ@mail.gmail.com>
Date: Tue, 12 Jul 2022 17:08:06 +0200
From: Andy Shevchenko <andy.shevchenko@...il.com>
To: Yu-Jen Chang <arthurchang09@...il.com>
Cc: Andrey Semashev <andrey.semashev@...il.com>,
Andy Shevchenko <andy@...nel.org>,
Akinobu Mita <akinobu.mita@...il.com>,
Ching-Chun Huang <jserv@...s.ncku.edu.tw>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] lib/string.c: Optimize memchr()
On Tue, Jul 12, 2022 at 4:58 PM Yu-Jen Chang <arthurchang09@...il.com> wrote:
> Andrey Semashev <andrey.semashev@...il.com> 於 2022年7月11日 週一 晚上11:00寫道:
> > On 7/11/22 17:52, Yu-Jen Chang wrote:
> > > Andrey Semashev <andrey.semashev@...il.com> 於 2022年7月11日 週一 凌晨4:01寫道:
> > >> On 7/10/22 17:28, Yu-Jen Chang wrote:
...
> > >>> + for (; p <= end - 8; p += 8) {
> > >>> + val = *(u64 *)p ^ mask;
> > >>
> > >> What if p is not aligned to 8 (or 4 on 32-bit targets) bytes? Not all
> > >> targets support (efficient) unaligned loads, do they?
> > >
> > > I think it works if p is not aligned to 8 or 4 bytes.
> > >
> > > Let's say the string is 10 bytes. The for loop here will search the first
> > > 8 bytes. If the target character is in the last 2 bytes, the second for
> > > loop will find it. It also work like this on 32-bit machine.
> >
> > I think you're missing the point. Loads at unaligned addresses may not
> > be allowed by hardware using conventional load instructions or may be
> > inefficient. Given that this memchr implementation is used as a fallback
> > when no hardware-specific version is available, you should be
> > conservative wrt. hardware capabilities and behavior. You should
> > probably have a pre-alignment loop.
>
> Got it. I add pre-alignment loop. It aligns the address to 8 or 4bytes.
Still far from what can be accepted. Have you had a chance to read how
strscpy() is implemented? Do you understand why it's done that way?
> void *memchr(const void *p, int c, size_t length)
> {
> u64 mask, val;
> const void *end = p + length;
> c &= 0xff;
> while ((long ) p & (sizeof(long) - 1)) {
> if (p >= end)
> return NULL;
> if (*(unsigned char *)p == c)
> return (void *) p;
> p++;
> }
> if (p <= end - 8) {
> mask = c;
> MEMCHR_MASK_GEN(mask);
>
> for (; p <= end - 8; p += 8) {
Why you decided that this code will be run explicitly on 64-bit arch?
> val = *(u64*)p ^ mask;
> if ((val + 0xfefefefefefefeffull)
> & (~val & 0x8080808080808080ull))
> break;
> }
> }
>
> for (; p < end; p++)
> if (*(unsigned char *)p == c)
> return (void *)p;
>
> return NULL;
> }
--
With Best Regards,
Andy Shevchenko
Powered by blists - more mailing lists