Message-ID: <YM4pJpNphEwvUF2F@yury-ThinkPad>
Date: Sat, 19 Jun 2021 10:28:06 -0700
From: Yury Norov <yury.norov@...il.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
Lucas Stach <l.stach@...gutronix.de>,
Russell King <linux+etnaviv@...linux.org.uk>,
Christian Gmeiner <christian.gmeiner@...il.com>,
David Airlie <airlied@...ux.ie>,
Daniel Vetter <daniel@...ll.ch>,
Jean Delvare <jdelvare@...e.com>,
Guenter Roeck <linux@...ck-us.net>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
David Woodhouse <dwmw@...zon.co.uk>,
Andrew Morton <akpm@...ux-foundation.org>,
Wei Yang <richard.weiyang@...ux.alibaba.com>,
Geert Uytterhoeven <geert+renesas@...der.be>,
Alexey Klimov <aklimov@...hat.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, etnaviv@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, linux-hwmon@...r.kernel.org
Subject: Re: [PATCH 2/3] find: micro-optimize for_each_{set,clear}_bit()

On Sat, Jun 19, 2021 at 05:24:15PM +0100, Marc Zyngier wrote:
> On Fri, 18 Jun 2021 20:57:34 +0100,
> Yury Norov <yury.norov@...il.com> wrote:
> >
> > The macros iterate thru all set/clear bits in a bitmap. They search a
> > first bit using find_first_bit(), and the rest bits using find_next_bit().
> >
> > Since find_next_bit() is called shortly after find_first_bit(), we can
> > save few lines of I-cache by not using find_first_bit().
>
> Really?
>
> >
> > Signed-off-by: Yury Norov <yury.norov@...il.com>
> > ---
> > include/linux/find.h | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/find.h b/include/linux/find.h
> > index 4500e8ab93e2..ae9ed52b52b8 100644
> > --- a/include/linux/find.h
> > +++ b/include/linux/find.h
> > @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
> > #endif
> >
> > #define for_each_set_bit(bit, addr, size) \
> > - for ((bit) = find_first_bit((addr), (size)); \
> > + for ((bit) = find_next_bit((addr), (size), 0); \
>
> On which architecture do you observe a gain? Only 32bit ARM and m68k
> implement their own version of find_first_bit(), and everyone else
> uses the canonical implementation:

And the architectures that enable GENERIC_FIND_FIRST_BIT: x86, arm64,
arc, mips and s390.

> #ifndef find_first_bit
> #define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
> #endif
>
> These architectures explicitly have different implementations for
> find_first_bit() and find_next_bit() because they can do better
> (whether that is true or not is another debate). I don't think you
> should remove this optimisation until it has been measured on these
> two architectures.

This patch is based on a series that enables a separate implementation
of find_first_bit() for all architectures. According to my tests,
find_first* is roughly twice as fast as find_next* on arm64 and x86:

https://lore.kernel.org/lkml/20210612123639.329047-1-yury.norov@gmail.com/T/#t
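
For illustration only, here is a rough userspace sketch of where such a
difference could come from. The sk_* names are made up for this example
and this is not the code from the series or from lib/find_bit.c. The
assumption is that a first-bit search can walk whole words from index 0,
while a next-bit search must first locate the start word and mask off
the bits below the start offset.

```c
#include <limits.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit in a non-zero word (portable, not optimised). */
static unsigned long sk_lowest_bit(unsigned long word)
{
	unsigned long n = 0;

	while (!(word & 1UL)) {
		word >>= 1;
		n++;
	}
	return n;
}

/* First-bit search: always starts at word 0, no offset handling needed. */
static unsigned long sk_find_first_bit(const unsigned long *addr,
				       unsigned long size)
{
	unsigned long idx;

	for (idx = 0; idx * SK_BITS_PER_LONG < size; idx++) {
		if (addr[idx]) {
			unsigned long bit = idx * SK_BITS_PER_LONG +
					    sk_lowest_bit(addr[idx]);
			return bit < size ? bit : size;
		}
	}
	return size;	/* nothing set: return size, as the kernel helpers do */
}

/* Next-bit search: has to find the start word and mask bits below start. */
static unsigned long sk_find_next_bit(const unsigned long *addr,
				      unsigned long size, unsigned long start)
{
	unsigned long idx, word;

	if (start >= size)
		return size;

	idx = start / SK_BITS_PER_LONG;
	word = addr[idx] & (~0UL << (start % SK_BITS_PER_LONG));

	while (!word) {
		idx++;
		if (idx * SK_BITS_PER_LONG >= size)
			return size;
		word = addr[idx];
	}

	start = idx * SK_BITS_PER_LONG + sk_lowest_bit(word);
	return start < size ? start : size;
}
```

Both helpers return the same index when the search starts at 0; the
sketch only shows the extra bookkeeping on the find_next side and says
nothing about actual timings.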

After applying the series, I noticed that my small kernel module that
calls for_each_set_bit() now uses find_first_bit() to find just the
first bit, and find_next_bit() for all the others. I think it's better
to always use find_next_bit() in this case, to minimize the chance of
a cache miss. But if that's not obvious, I'll try to write a test.
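
To make the two loop shapes concrete, here is a minimal userspace
sketch. The demo_* helpers and the _old/_new macro names are
hypothetical stand-ins written for this example; the helpers scan bit
by bit and are not the kernel implementations. The only point is that
the "old" form references two search functions, while the "new" form
references one.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit-by-bit stand-in; returns size when no set bit is found. */
static unsigned long demo_find_next_bit(const unsigned long *addr,
					unsigned long size, unsigned long start)
{
	unsigned long i;

	for (i = start; i < size; i++)
		if (addr[i / DEMO_BITS_PER_LONG] &
		    (1UL << (i % DEMO_BITS_PER_LONG)))
			return i;
	return size;
}

/* Separate function, standing in for an out-of-line find_first_bit(). */
static unsigned long demo_find_first_bit(const unsigned long *addr,
					 unsigned long size)
{
	unsigned long i;

	for (i = 0; i < size; i++)
		if (addr[i / DEMO_BITS_PER_LONG] &
		    (1UL << (i % DEMO_BITS_PER_LONG)))
			return i;
	return size;
}

/* Old shape: first bit via find_first, remaining bits via find_next. */
#define for_each_set_bit_old(bit, addr, size) \
	for ((bit) = demo_find_first_bit((addr), (size)); \
	     (bit) < (size); \
	     (bit) = demo_find_next_bit((addr), (size), (bit) + 1))

/* New shape: find_next only, so the loop touches a single function. */
#define for_each_set_bit_new(bit, addr, size) \
	for ((bit) = demo_find_next_bit((addr), (size), 0); \
	     (bit) < (size); \
	     (bit) = demo_find_next_bit((addr), (size), (bit) + 1))

/* Record every set bit visited by the old-shape loop. */
static size_t demo_collect_old(const unsigned long *map, unsigned long size,
			       unsigned long *out, size_t cap)
{
	unsigned long bit;
	size_t n = 0;

	for_each_set_bit_old(bit, map, size) {
		assert(n < cap);
		out[n++] = bit;
	}
	return n;
}

/* Record every set bit visited by the new-shape loop. */
static size_t demo_collect_new(const unsigned long *map, unsigned long size,
			       unsigned long *out, size_t cap)
{
	unsigned long bit;
	size_t n = 0;

	for_each_set_bit_new(bit, map, size) {
		assert(n < cap);
		out[n++] = bit;
	}
	return n;
}
```

Both loops visit the same bits in the same order, so the change is
about which functions end up being called, not about the result.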