[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrUwYUzTk_453WaxOE32RD1i=wBdQUcsrc9OvvtmPtoQ_Q@mail.gmail.com>
Date: Fri, 24 Mar 2017 12:16:11 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Dmitry Vyukov <dvyukov@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: locking/atomic: Introduce atomic_try_cmpxchg()
On Fri, Mar 24, 2017 at 11:13 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, Mar 24, 2017 at 07:08:38PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 24, 2017 at 06:51:15PM +0100, Dmitry Vyukov wrote:
>> > On Fri, Mar 24, 2017 at 6:23 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> > > On Fri, Mar 24, 2017 at 09:54:46AM -0700, Andy Lutomirski wrote:
>> > >> > So the first snipped I tested regressed like so:
>> > >> >
>> > >> >
>> > >> > 0000000000000000 <T_refcount_inc>: 0000000000000000 <T_refcount_inc>:
>> > >> > 0: 8b 07 mov (%rdi),%eax 0: 8b 17 mov (%rdi),%edx
>> > >> > 2: 83 f8 ff cmp $0xffffffff,%eax 2: 83 fa ff cmp $0xffffffff,%edx
>> > >> > 5: 74 13 je 1a <T_refcount_inc+0x1a> 5: 74 1a je 21 <T_refcount_inc+0x21>
>> > >> > 7: 85 c0 test %eax,%eax 7: 85 d2 test %edx,%edx
>> > >> > 9: 74 0d je 18 <T_refcount_inc+0x18> 9: 74 13 je 1e <T_refcount_inc+0x1e>
>> > >> > b: 8d 50 01 lea 0x1(%rax),%edx b: 8d 4a 01 lea 0x1(%rdx),%ecx
>> > >> > e: f0 0f b1 17 lock cmpxchg %edx,(%rdi) e: 89 d0 mov %edx,%eax
>> > >> > 12: 75 ee jne 2 <T_refcount_inc+0x2> 10: f0 0f b1 0f lock cmpxchg %ecx,(%rdi)
>> > >> > 14: ff c2 inc %edx 14: 74 04 je 1a <T_refcount_inc+0x1a>
>> > >> > 16: 75 02 jne 1a <T_refcount_inc+0x1a> 16: 89 c2 mov %eax,%edx
>> > >> > 18: 0f 0b ud2 18: eb e8 jmp 2 <T_refcount_inc+0x2>
>> > >> > 1a: c3 retq 1a: ff c1 inc %ecx
>> > >> > 1c: 75 03 jne 21 <T_refcount_inc+0x21>
>> > >> > 1e: 0f 0b ud2
>> > >> > 20: c3 retq
>> > >> > 21: c3 retq
>> > >>
>>
>> > This seems to help ;)
>> >
>> > #define try_cmpxchg(ptr, pold, new) __atomic_compare_exchange_n(ptr, pold, new, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)
>>
>> That gets me:
>>
>> 0000000000000000 <T_refcount_inc>:
>> 0: 8b 07 mov (%rdi),%eax
>> 2: 89 44 24 fc mov %eax,-0x4(%rsp)
>> 6: 8b 44 24 fc mov -0x4(%rsp),%eax
>> a: 83 f8 ff cmp $0xffffffff,%eax
>> d: 74 1c je 2b <T_refcount_inc+0x2b>
>> f: 85 c0 test %eax,%eax
>> 11: 75 07 jne 1a <T_refcount_inc+0x1a>
>> 13: 8b 44 24 fc mov -0x4(%rsp),%eax
>> 17: 0f 0b ud2
>> 19: c3 retq
>> 1a: 8d 50 01 lea 0x1(%rax),%edx
>> 1d: 8b 44 24 fc mov -0x4(%rsp),%eax
>> 21: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
>> 25: 75 db jne 2 <T_refcount_inc+0x2>
>> 27: ff c2 inc %edx
>> 29: 74 e8 je 13 <T_refcount_inc+0x13>
>> 2b: c3 retq
>>
>>
>> Which is even worse... (I did double check it actually compiled)
>
> Not to mention we cannot use the C11 atomics in kernel because we want
> to be able to runtime patch LOCK prefixes when only 1 CPU is available.
Is this really a show-stopper? I bet that objtool could be persuaded
to emit a list of the locations of all those LOCK prefixes.
--Andy
Powered by blists - more mailing lists