Message-ID: <2236FBA76BA1254E88B949DDB74E612B41C4BA42@IRSMSX102.ger.corp.intel.com>
Date: Wed, 15 Feb 2017 11:17:19 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: Ingo Molnar <mingo@...nel.org>
CC: Peter Zijlstra <peterz@...radead.org>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"hpa@...or.com" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"keescook@...omium.org" <keescook@...omium.org>,
"dwindsor@...il.com" <dwindsor@...il.com>,
"ishkamiel@...il.com" <ishkamiel@...il.com>
Subject: RE: [tip:locking/core] refcount_t: Introduce a special purpose
refcount type
> * Reshetova, Elena <elena.reshetova@...el.com> wrote:
>
> > > Subject: refcount: Out-of-line everything
> > > From: Peter Zijlstra <peterz@...radead.org>
> > > Date: Fri Feb 10 16:27:52 CET 2017
> > >
> > > Linus asked to please make this real C code.
> >
> > Perhaps a completely stupid question, but I am going to ask anyway, since this is
> > the only way I can learn: what real difference does it make? Or are we talking more
> > about style etc.? Is it because of size concerns? This way it is certainly
> > done differently than the rest of atomic and kref...
>
> It's about generated code size mostly.
>
> This inline function is way too large to be inlined:
>
> static inline __refcount_check
> bool refcount_add_not_zero(unsigned int i, refcount_t *r)
> {
>         unsigned int old, new, val = atomic_read(&r->refs);
>
>         for (;;) {
>                 if (!val)
>                         return false;
>
>                 if (unlikely(val == UINT_MAX))
>                         return true;
>
>                 new = val + i;
>                 if (new < val)
>                         new = UINT_MAX;
>                 old = atomic_cmpxchg_relaxed(&r->refs, val, new);
>                 if (old == val)
>                         break;
>
>                 val = old;
>         }
>
>         REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
>
>         return true;
> }
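
[Side note for my own understanding: the saturation semantics of this loop can be reproduced in a standalone userspace sketch. This is only an illustration, not the kernel code: the myref_t type and function name are made up, C11 atomics stand in for the kernel's atomic_t, and a plain compare-exchange replaces atomic_cmpxchg_relaxed.]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <limits.h>

/* Userspace stand-in for refcount_t: name and layout are assumptions,
 * with a C11 atomic_uint in place of the kernel's atomic_t. */
typedef struct {
        atomic_uint refs;
} myref_t;

/*
 * Same saturation logic as the kernel function quoted above:
 * returns false if the count was zero, and pins the count at
 * UINT_MAX instead of letting it wrap around on overflow.
 */
static bool myref_add_not_zero(unsigned int i, myref_t *r)
{
        unsigned int new, val = atomic_load(&r->refs);

        for (;;) {
                if (!val)
                        return false;

                if (val == UINT_MAX)
                        return true;

                new = val + i;
                if (new < val)
                        new = UINT_MAX;

                /* On failure, compare_exchange reloads val for us. */
                if (atomic_compare_exchange_weak(&r->refs, &val, new))
                        break;
        }

        return true;
}
```

With any C11 compiler this shows the intended behaviour: a count of 0 stays at 0 (the add fails), and once the count reaches UINT_MAX it stays pinned there rather than wrapping.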
>
> When used then this function generates this much code on x86-64 defconfig:
>
> 00000000000084d0 <test>:
> 84d0: 8b 0f mov (%rdi),%ecx
> 84d2: 55 push %rbp
> 84d3: 48 89 e5 mov %rsp,%rbp
>
> 84d6: 85 c9 test %ecx,%ecx |
> 84d8: 74 30 je 850a <test+0x3a> |
> 84da: 83 f9 ff cmp $0xffffffff,%ecx |
> 84dd: be ff ff ff ff mov $0xffffffff,%esi |
> 84e2: 75 04 jne 84e8 <test+0x18> |
> 84e4: eb 1d jmp 8503 <test+0x33> |
> 84e6: 89 c1 mov %eax,%ecx |
> 84e8: 8d 51 01 lea 0x1(%rcx),%edx |
> 84eb: 89 c8 mov %ecx,%eax |
> 84ed: 39 ca cmp %ecx,%edx |
> 84ef: 0f 42 d6 cmovb %esi,%edx |
> 84f2: f0 0f b1 17 lock cmpxchg %edx,(%rdi) |
> 84f6: 39 c8 cmp %ecx,%eax |
> 84f8: 74 09 je 8503 <test+0x33> |
> 84fa: 85 c0 test %eax,%eax |
> 84fc: 74 0c je 850a <test+0x3a> |
> 84fe: 83 f8 ff cmp $0xffffffff,%eax |
> 8501: 75 e3 jne 84e6 <test+0x16> |
> 8503: b8 01 00 00 00 mov $0x1,%eax |
>
> 8508: 5d pop %rbp
> 8509: c3 retq
> 850a: 31 c0 xor %eax,%eax
> 850c: 5d pop %rbp
> 850d: c3 retq
>
>
> (I've annotated the fastpath impact with '|'. Out of line code generally does not
> count.)
>
> It's about ~50 bytes of code per usage site. It might be better in some cases, but
> not by much.
>
> This is way above any sane inlining threshold. The 'unconditionally good' inlining
> threshold is at 1-2 instructions and below ~10 bytes of code.
>
> So for example refcount_set() and refcount_read() can stay inlined:
>
> static inline void refcount_set(refcount_t *r, unsigned int n)
> {
>         atomic_set(&r->refs, n);
> }
>
> static inline unsigned int refcount_read(const refcount_t *r)
> {
>         return atomic_read(&r->refs);
> }
>
>
> ... because they compile into a single instruction with 2-5 bytes I$ overhead.
>
> Thanks,
>
> Ingo
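
As a quick sanity check on my side, the two cheap accessors behave the same way in a minimal userspace sketch (the myref_t type and myref_* names are invented here; C11 atomics stand in for the kernel's atomic_t, and plain non-const pointers are used to keep atomic_load() portable, unlike the const-taking kernel refcount_read()):

```c
#include <stdatomic.h>

/* Userspace stand-in for refcount_t (name and layout are assumptions). */
typedef struct {
        atomic_uint refs;
} myref_t;

static inline void myref_set(myref_t *r, unsigned int n)
{
        atomic_store(&r->refs, n);      /* a single store on x86-64 */
}

static inline unsigned int myref_read(myref_t *r)
{
        return atomic_load(&r->refs);   /* a single load on x86-64 */
}
```

Each body is one load or store, which matches the point above: inlining them costs only a few bytes of I$ per call site.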
Thank you very much, Ingo, for such a detailed and nice explanation!
Best Regards,
Elena