lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whnTRF5yA2MrPGcmMm=hXaGHfC2HEDtNzA=_1=szhJ4-w@mail.gmail.com>
Date:   Tue, 24 Mar 2020 13:57:19 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH 5/7] x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()

On Tue, Mar 24, 2020 at 1:45 PM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> OK...  BTW, I'd been trying to recall the reasons for the way
> __futex_atomic_op2() loop is done; ISTR some discussion along
> the lines of cacheline ping-pong prevention, but I'd been unable
> to reconstruct enough details to find it and I'm not sure it
> hadn't been about some other code ;-/

No, that doesn't look like any cacheline advantage I can think of -
quite the reverse.

I suspect it's just lazy code, with the reload being unnecessary. Or
it might be written that way because you avoid an extra variable.

In fact, from a cacheline optimization standpoint, there are
advantages to not doing the load even on the initial run: if you know
a certain value is particularly likely, there are advantages to just
_assuming_ that value, rather than loading it. So no initial load at
all, and just initialize the first value to the likely case.

That can avoid an unnecessary "load for shared ownership" cacheline
state transition (since the cmpxchg will want to turn it into an
exclusive modified cacheline anyway).

But I don't think that optimization is likely the case here, and
you're right, the loop would be better written with the initial load
outside the loop.

           Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ