Message-ID: <CAFULd4Y8huj1iZo0yE9A7Ww5Z7WWoBK2KvG90s75OU-s7pL90Q@mail.gmail.com>
Date:   Wed, 18 Jan 2023 23:01:29 +0100
From:   Uros Bizjak <ubizjak@...il.com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mateusz Guzik <mjguzik@...il.com>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll

On Wed, Jan 18, 2023 at 10:55 PM Uros Bizjak <ubizjak@...il.com> wrote:
>
> On Wed, Jan 18, 2023 at 10:47 PM Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > On Wed, Jan 18, 2023 at 10:18 PM Andrew Morton
> > <akpm@...ux-foundation.org> wrote:
> > >
> > > On Wed, 18 Jan 2023 16:07:03 +0100 Uros Bizjak <ubizjak@...il.com> wrote:
> > >
> > > > > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > > > > {set,clear}_bits_ll.  The x86 CMPXCHG instruction returns success in
> > > > > the ZF flag, so this change saves a compare after cmpxchg (and the
> > > > > related move instruction in front of cmpxchg).
> > > > >
> > > > > Also, try_cmpxchg implicitly assigns the old *ptr value to "old"
> > > > > when cmpxchg fails.
> > > >
> > > > Note that the value from *ptr should be read using READ_ONCE to prevent
> > > > the compiler from merging, refetching or reordering the read.
> > > >
> > > > The patch also declares these two functions inline, to ensure inlining.
> > >
> > > But why is that better?  This adds a few hundred bytes more text, which
> > > has a cost.
> >
> > Originally, both functions are inlined, and the size of the object
> > file is (gcc version 12.2.1, x86_64):
> >
> >   text    data     bss     dec     hex filename
> >   4661     480       0    5141    1415 genalloc-orig.o
> >
> > When try_cmpxchg is used, gcc chooses to not inline set_bits_ll (its
> > estimate of code size is not very precise when multi-line assembly is
> > involved), resulting in:
> >
> >   text    data     bss     dec     hex filename
> >   4705     488       0    5193    1449 genalloc-noinline.o
> >
> > And with an inline added to avoid gcc's quirks:
> >
> >   text    data     bss     dec     hex filename
> >   4629     480       0    5109    13f5 genalloc.o
> >
> > Considering that these two changed functions are used only in
> > genalloc.o, adding the inline qualifier is a win, even when compared
> > to the original size.
>
> BTW: Recently, it was determined [1] that using cpu_relax() inside a
> cmpxchg loop can be harmful for performance. We have the same
> situation here, so perhaps cpu_relax() should be removed in the same
> way it was removed from lockref.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f5fe24ef17b5fbe6db49534163e77499fb10ae8c

I forgot to add some CCs that may be interested in the above.

Uros.
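
For readers following the thread, here is a minimal userspace sketch of
the two loop shapes discussed above. It is not the actual patch. It
assumes C11 <stdatomic.h> as a stand-in for the kernel primitives, and
model_cmpxchg(), model_try_cmpxchg(), set_bits_cmpxchg() and
set_bits_try_cmpxchg() are hypothetical names used only for
illustration. A relaxed atomic load stands in for READ_ONCE(), and
cpu_relax() is left out because it has no direct equivalent here.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of cmpxchg(): returns the value found at *ptr. */
static inline unsigned long
model_cmpxchg(_Atomic unsigned long *ptr, unsigned long old, unsigned long new)
{
	/* On failure, C11 stores the current value of *ptr into "old". */
	atomic_compare_exchange_strong(ptr, &old, new);
	return old;
}

/*
 * Hypothetical model of try_cmpxchg(): returns true on success and
 * updates *old with the current value of *ptr on failure.
 */
static inline bool
model_try_cmpxchg(_Atomic unsigned long *ptr, unsigned long *old,
		  unsigned long new)
{
	return atomic_compare_exchange_strong(ptr, old, new);
}

/* Loop shape built on cmpxchg(): needs an explicit compare and copy. */
static inline int
set_bits_cmpxchg(_Atomic unsigned long *addr, unsigned long mask_to_set)
{
	unsigned long val, nval;

	nval = atomic_load_explicit(addr, memory_order_relaxed);
	do {
		val = nval;
		if (val & mask_to_set)
			return -EBUSY;	/* some requested bit is already set */
	} while ((nval = model_cmpxchg(addr, val, val | mask_to_set)) != val);

	return 0;
}

/* Loop shape built on try_cmpxchg(): "val" is refreshed on failure. */
static inline int
set_bits_try_cmpxchg(_Atomic unsigned long *addr, unsigned long mask_to_set)
{
	unsigned long val = atomic_load_explicit(addr, memory_order_relaxed);

	do {
		if (val & mask_to_set)
			return -EBUSY;	/* some requested bit is already set */
	} while (!model_try_cmpxchg(addr, &val, val | mask_to_set));

	return 0;
}
```

In the second form, a failed compare-exchange already writes the
current value of *addr into val, so the loop needs neither the separate
comparison nor the val = nval copy. That mirrors the saving described
in the commit message, although the code actually generated depends on
the compiler and the architecture.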
