linux-kernel - Re: locking/atomic: Introduce atomic_try_cmpxchg()

Message-ID: <CA+55aFyCkDTxEWPKYW5Z8VroCvp3K++fRdGu2kwY8NVJgWdL4Q@mail.gmail.com>
Date:   Fri, 24 Mar 2017 12:17:28 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: locking/atomic: Introduce atomic_try_cmpxchg()

On Fri, Mar 24, 2017 at 11:45 AM, Andy Lutomirski <luto@...capital.net> wrote:
>
> Is there some hack like if __builtin_is_unescaped(*val) *val = old;
> that would work?

See my recent email suggesting a completely different interface, which
avoids this problem.

My interface generates:

0000000000000000 <T_refcount_inc>:
   0: 8b 07                 mov    (%rdi),%eax
   2: 83 f8 ff              cmp    $0xffffffff,%eax
   5: 74 12                 je     19 <T_refcount_inc+0x19>
   7: 85 c0                 test   %eax,%eax
   9: 74 0a                 je     15 <T_refcount_inc+0x15>
   b: 8d 50 01              lea    0x1(%rax),%edx
   e: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
  12: 75 ee                 jne    2 <T_refcount_inc+0x2>
  14: c3                    retq
  15: 31 c0                 xor    %eax,%eax
  17: 0f 0b                 ud2
  19: c3                    retq

for PeterZ's test-case, which seems optimal.
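For readers following the disassembly, this is a rough C rendering of what the listing above appears to compute. It is a sketch under stated assumptions, not PeterZ's actual test case (which is not quoted in this message): the name refcount_inc_sketch is hypothetical, it uses C11 <stdatomic.h> instead of kernel primitives, abort() stands in for the ud2 trap, and relaxed memory ordering is an assumption. The property it illustrates is the one the atomic_try_cmpxchg() interface is built around: on failure, atomic_compare_exchange_weak_explicit() writes the value it actually found back into old, so the retry loop needs no separate reload.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

/*
 * Hypothetical userspace sketch of the refcount_inc semantics inferred
 * from the disassembly (not the kernel's or PeterZ's actual code):
 *   - UINT_MAX means "saturated": return silently, leave the count alone;
 *   - 0 means use-after-free: trap (abort() here, ud2 in the listing);
 *   - otherwise increment with a compare-and-exchange retry loop.
 * Relaxed ordering is an assumption of this sketch.
 */
static void refcount_inc_sketch(atomic_uint *r)
{
	unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

	for (;;) {
		if (old == UINT_MAX)	/* saturated: stay pinned, silently */
			return;
		if (old == 0)		/* use-after-free: trap */
			abort();
		/*
		 * On failure this stores the value currently in *r into 'old',
		 * so the next iteration re-checks it without reloading *r.
		 */
		if (atomic_compare_exchange_weak_explicit(r, &old, old + 1,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return;
	}
}
```

Mapping back to the listing: `cmp $0xffffffff,%eax` is the saturation test, `test %eax,%eax` followed by `ud2` is the zero check, and the `jne` after `lock cmpxchg` loops back to the saturation test with %eax already holding the freshly observed value.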

Of course, PeterZ used -Os, which isn't actually very natural for the
kernel. Using -O2 I get something else. It turns out that my macro
should use

        if (likely(__txchg_success)) goto success_label;

(that "likely()" is critical) to make gcc not try to optimize for the
looping case.
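The goto-based interface can be approximated in userspace C roughly as follows. Apart from the __txchg_success variable and the likely() test quoted above, everything here is hypothetical: the macro name try_cmpxchg_goto, its argument order, the use of GCC's __atomic_compare_exchange_n() builtin with sequentially consistent ordering, and the refcount_inc_goto() caller, which returns status codes instead of trapping so that it is easy to exercise. likely() is spelled with __builtin_expect() as in the kernel; it hints to GCC that the exchange usually succeeds, which is what steers it away from optimizing for the looping case.

```c
#include <limits.h>
#include <stdbool.h>

#define likely(x)	__builtin_expect(!!(x), 1)

/*
 * Hypothetical macro: atomically replace *ptr with new_val if it still
 * holds *oldp. On success, jump to success_label. On failure, the builtin
 * stores the value actually found into *oldp and execution falls through,
 * so the caller can re-check its conditions and retry.
 */
#define try_cmpxchg_goto(ptr, oldp, new_val, success_label)		\
do {									\
	bool __txchg_success = __atomic_compare_exchange_n(		\
		(ptr), (oldp), (new_val), false,			\
		__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);			\
	if (likely(__txchg_success))					\
		goto success_label;					\
} while (0)

/*
 * Hypothetical caller: returns 0 after incrementing, 1 if the count is
 * saturated at UINT_MAX (left unchanged), -1 if the count is 0
 * (use-after-free; the kernel version would trap instead).
 */
static int refcount_inc_goto(unsigned int *r)
{
	unsigned int old = __atomic_load_n(r, __ATOMIC_RELAXED);

	for (;;) {
		if (old == UINT_MAX)
			return 1;
		if (old == 0)
			return -1;
		try_cmpxchg_goto(r, &old, old + 1, done);
	}
done:
	return 0;
}
```

Because success leaves the loop through a label rather than through a returned boolean, the compiler sees the success and retry paths as distinct control flow and can lay out each one independently.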

So with that "likely()" fixed, with -O2 I get:

0000000000000000 <T_refcount_inc>:
   0: 8b 07                 mov    (%rdi),%eax
   2: 83 f8 ff              cmp    $0xffffffff,%eax
   5: 74 0d                 je     14 <T_refcount_inc+0x14>
   7: 85 c0                 test   %eax,%eax
   9: 74 12                 je     1d <T_refcount_inc+0x1d>
   b: 8d 50 01              lea    0x1(%rax),%edx
   e: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
  12: 75 02                 jne    16 <T_refcount_inc+0x16>
  14: f3 c3                 repz retq
  16: 83 f8 ff              cmp    $0xffffffff,%eax
  19: 75 ec                 jne    7 <T_refcount_inc+0x7>
  1b: f3 c3                 repz retq
  1d: 31 c0                 xor    %eax,%eax
  1f: 0f 0b                 ud2
  21: c3                    retq

which again looks pretty optimal (it did indeed actually generate
bigger but potentially higher-performance code by making the good case
be a fallthrough, and the unlikely case be a _forward_ jump that will
be predicted not-taken in the absence of other prediction information).

(Of course, this also depends on the exact behavior that PeterZ's code
had, namely an exception for use-after-free, but a silent saturation.)

            Linus