Message-ID: <CA+55aFyCkDTxEWPKYW5Z8VroCvp3K++fRdGu2kwY8NVJgWdL4Q@mail.gmail.com>
Date: Fri, 24 Mar 2017 12:17:28 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andy Lutomirski <luto@...capital.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
Dmitry Vyukov <dvyukov@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: locking/atomic: Introduce atomic_try_cmpxchg()
On Fri, Mar 24, 2017 at 11:45 AM, Andy Lutomirski <luto@...capital.net> wrote:
>
> Is there some hack like if __builtin_is_unescaped(*val) *val = old;
> that would work?
See my recent email suggesting a completely different interface, which
avoids this problem.
My interface generates:
0000000000000000 <T_refcount_inc>:
0: 8b 07 mov (%rdi),%eax
2: 83 f8 ff cmp $0xffffffff,%eax
5: 74 12 je 19 <T_refcount_inc+0x19>
7: 85 c0 test %eax,%eax
9: 74 0a je 15 <T_refcount_inc+0x15>
b: 8d 50 01 lea 0x1(%rax),%edx
e: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
12: 75 ee jne 2 <T_refcount_inc+0x2>
14: c3 retq
15: 31 c0 xor %eax,%eax
17: 0f 0b ud2
19: c3 retq
for PeterZ's test-case, which seems optimal.
Of course, PeterZ used -Os, which isn't actually very natural for the
kernel. Using -O2 I get something else. It turns out that my macro
should use
if (likely(__txchg_success)) goto success_label;
(that "likely()" is critical) to make gcc not try to optimize for the
looping case.
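As a rough illustration of that line (not the actual macro from the earlier email, which is not quoted here), a user-space sketch might look like the following. It assumes gcc or clang, uses the `__atomic_compare_exchange_n` builtin in place of the kernel's cmpxchg primitives, and the names `SKETCH_TRY_CMPXCHG_GOTO` and `sketch_inc` are made up for this example.

```c
/* User-space stand-in for the kernel's likely() annotation. */
#define likely(x) __builtin_expect(!!(x), 1)

/*
 * Hypothetical macro: try one compare-and-exchange and jump to
 * success_label if it worked. On failure the builtin stores the value
 * it actually found into *oldp, so the caller can simply retry.
 */
#define SKETCH_TRY_CMPXCHG_GOTO(ptr, oldp, newval, success_label)	\
	do {								\
		if (likely(__atomic_compare_exchange_n((ptr), (oldp),	\
				(newval), 0, __ATOMIC_SEQ_CST,		\
				__ATOMIC_RELAXED)))			\
			goto success_label;				\
	} while (0)

/* Hypothetical caller: atomically increment *p with a retry loop. */
static void sketch_inc(unsigned int *p)
{
	unsigned int old = __atomic_load_n(p, __ATOMIC_RELAXED);

	for (;;) {
		/* old is refreshed by the failed exchange, then we loop. */
		SKETCH_TRY_CMPXCHG_GOTO(p, &old, old + 1, done);
	}
done:
	return;
}
```

The likely() hint marks the successful exchange as the expected path, which is what lets the compiler lay out the retry as the out-of-line case.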
So with that "likely()" fixed, with -O2 I get:
0000000000000000 <T_refcount_inc>:
0: 8b 07 mov (%rdi),%eax
2: 83 f8 ff cmp $0xffffffff,%eax
5: 74 0d je 14 <T_refcount_inc+0x14>
7: 85 c0 test %eax,%eax
9: 74 12 je 1d <T_refcount_inc+0x1d>
b: 8d 50 01 lea 0x1(%rax),%edx
e: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
12: 75 02 jne 16 <T_refcount_inc+0x16>
14: f3 c3 repz retq
16: 83 f8 ff cmp $0xffffffff,%eax
19: 75 ec jne 7 <T_refcount_inc+0x7>
1b: f3 c3 repz retq
1d: 31 c0 xor %eax,%eax
1f: 0f 0b ud2
21: c3 retq
which again looks pretty optimal (it did indeed actually generate
bigger but potentially higher-performance code by making the good case
be a fallthrough, and the unlikely case be a _forward_ jump that will
be predicted not-taken in the absence of other prediction information).
(Of course, this also depends on the exact behavior that PeterZ's code
had, namely an exception for use-after-free, but a silent saturation)
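For reference, the behavior described there (silent saturation at 0xffffffff, an exception when the count is already zero) can be sketched in user-space C roughly as below. This is a reconstruction from the disassembly, not PeterZ's actual test-case: the name `sketch_refcount_inc` is hypothetical, the `false` return stands in for the trap (ud2) the kernel code takes, and the relaxed memory ordering is an assumption.

```c
#include <limits.h>
#include <stdbool.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical saturating increment.
 * Returns false where the kernel version would raise an exception
 * (count already zero, i.e. a likely use-after-free), true otherwise.
 */
static bool sketch_refcount_inc(unsigned int *r)
{
	unsigned int val = __atomic_load_n(r, __ATOMIC_RELAXED);

	for (;;) {
		if (unlikely(val == UINT_MAX))
			return true;	/* saturated: stay there silently */
		if (unlikely(val == 0))
			return false;	/* stand-in for the ud2 trap */
		/* On failure, val is updated to the value actually seen. */
		if (likely(__atomic_compare_exchange_n(r, &val, val + 1, false,
				__ATOMIC_RELAXED, __ATOMIC_RELAXED)))
			return true;
	}
}
```

Because a failed exchange reloads val, the retry re-runs both checks, matching the backward jumps in the listings above.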
Linus