linux-kernel - Re: [RFC][PATCH 5/7] x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whnTRF5yA2MrPGcmMm=hXaGHfC2HEDtNzA=_1=szhJ4-w@mail.gmail.com>
Date:   Tue, 24 Mar 2020 13:57:19 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH 5/7] x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()

On Tue, Mar 24, 2020 at 1:45 PM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> OK...  BTW, I'd been trying to recall the reasons for the way
> __futex_atomic_op2() loop is done; ISTR some discussion along
> the lines of cacheline ping-pong prevention, but I'd been unable
> to reconstruct enough details to find it and I'm not sure it
> hadn't been about some other code ;-/

No, that doesn't look like any cacheline advantage I can think of -
quite the reverse.

I suspect it's just lazy code, with the reload being unnecessary. Or
it might be written that way because you avoid an extra variable.

In fact, from a cacheline optimization standpoint, there are
advantages to not doing the load even on the initial run: if you know
a certain value is particularly likely, there are advantages to just
_assuming_ that value, rather than loading it. So no initial load at
all, and just initialize the first value to the likely case.

That can avoid an unnecessary "load for shared ownership" cacheline
state transition (since the cmpxchg will want to turn it into an
exclusive modified cacheline anyway).

But I don't think that optimization is likely the case here, and
you're right, the loop would be better written with the initial load
outside the loop.

           Linus