Message-ID: <CAFULd4ZQGG+N3f7xDuoiNG1jY128pqaH0F4eLKO+fhvSNAbKfA@mail.gmail.com>
Date: Wed, 18 Jan 2023 22:47:06 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll
On Wed, Jan 18, 2023 at 10:18 PM Andrew Morton
<akpm@...ux-foundation.org> wrote:
>
> On Wed, 18 Jan 2023 16:07:03 +0100 Uros Bizjak <ubizjak@...il.com> wrote:
>
> > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > {set,clear}_bits_ll. x86 CMPXCHG instruction returns success in ZF
> > flag, so this change saves a compare after cmpxchg (and related move
> > instruction in front of cmpxchg).
> >
> > Also, try_cmpxchg implicitly assigns old *ptr value to "old"
> > when cmpxchg fails.
> >
> > Note that the value from *ptr should be read using READ_ONCE to prevent
> > the compiler from merging, refetching or reordering the read.
> >
> > The patch also declares these two functions inline, to ensure inlining.
>
> But why is that better? This adds a few hundred bytes more text, which
> has a cost.

Originally, both functions are inlined and the size of an object file
is (gcc version 12.2.1, x86_64):

   text    data     bss     dec     hex filename
   4661     480       0    5141    1415 genalloc-orig.o

When try_cmpxchg is used, gcc chooses to not inline set_bits_ll (its
estimate of code size is not very precise when multi-line assembly is
involved), resulting in:

   text    data     bss     dec     hex filename
   4705     488       0    5193    1449 genalloc-noinline.o

And with an inline added to avoid gcc's quirks:

   text    data     bss     dec     hex filename
   4629     480       0    5109    13f5 genalloc.o

Since these two functions are used only in genalloc.o, adding the
inline qualifier is a win even compared to the original: text shrinks
by 32 bytes (4661 -> 4629).
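
For readers unfamiliar with the pattern, here is a rough user-space
sketch of the loop shape. It uses C11 atomics rather than the kernel's
try_cmpxchg, on the assumption that atomic_compare_exchange_strong is a
fair stand-in: it also returns a success flag and writes the observed
value back into its "expected" argument on failure. The names
set_bits_sketch and clear_bits_sketch and the -1 return value are made
up for illustration; this is not the code from the patch.

```c
#include <stdatomic.h>

/*
 * Hypothetical user-space analogue of set_bits_ll(): atomically set
 * the bits in mask_to_set, or fail if any of them is already set.
 */
static inline int set_bits_sketch(_Atomic unsigned long *addr,
				  unsigned long mask_to_set)
{
	/* Single explicit load, playing the role of READ_ONCE(*addr). */
	unsigned long val = atomic_load_explicit(addr, memory_order_relaxed);

	do {
		if (val & mask_to_set)
			return -1;	/* illustrative error value */
		/*
		 * On failure the call below stores the current *addr
		 * into val, so the loop needs no separate reload or
		 * compare against the old value.
		 */
	} while (!atomic_compare_exchange_strong(addr, &val,
						 val | mask_to_set));

	return 0;
}

/*
 * Hypothetical analogue of clear_bits_ll(): atomically clear the bits
 * in mask_to_clear, or fail unless all of them are currently set.
 */
static inline int clear_bits_sketch(_Atomic unsigned long *addr,
				    unsigned long mask_to_clear)
{
	unsigned long val = atomic_load_explicit(addr, memory_order_relaxed);

	do {
		if ((val & mask_to_clear) != mask_to_clear)
			return -1;	/* illustrative error value */
	} while (!atomic_compare_exchange_strong(addr, &val,
						 val & ~mask_to_clear));

	return 0;
}
```

The success flag returned by the compare-exchange is what lets the
compiler branch directly on the result instead of comparing the
returned value against the old one.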
Uros.