Message-ID: <20170327171500.4beef762@redhat.com>
Date: Mon, 27 Mar 2017 17:15:00 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Pankaj Gupta <pagupta@...hat.com>,
Tariq Toukan <ttoukan.linux@...il.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Tariq Toukan <tariqt@...lanox.com>, netdev@...r.kernel.org,
akpm@...ux-foundation.org, linux-mm <linux-mm@...ck.org>,
Saeed Mahameed <saeedm@...lanox.com>, brouer@...hat.com
Subject: Re: Page allocator order-0 optimizations merged
On Mon, 27 Mar 2017 07:15:18 -0700
Matthew Wilcox <willy@...radead.org> wrote:
> On Mon, Mar 27, 2017 at 02:39:47PM +0200, Jesper Dangaard Brouer wrote:
> >
> > +static __always_inline int in_irq_or_nmi(void)
> > +{
> > + return in_irq() || in_nmi();
> > +// XXX: hoping compiler will optimize this (todo verify) into:
> > +// #define in_irq_or_nmi() (preempt_count() & (HARDIRQ_MASK | NMI_MASK))
> > +
> > + /* compiler was smart enough to only read __preempt_count once
> > + * but added two branches
> > +asm code:
> > + │ mov __preempt_count,%eax
> > + │ test $0xf0000,%eax // HARDIRQ_MASK: 0x000f0000
> > + │ ┌──jne 2a
> > + │ │ test $0x100000,%eax // NMI_MASK: 0x00100000
> > + │ │↓ je 3f
> > + │ 2a:└─→mov %rbx,%rdi
> > +
> > + */
> > +}
>
> To be fair, you told the compiler to do that with your use of fancy-pants ||
> instead of optimisable |. Try this instead:
Thank you! -- good point! :-)
> static __always_inline int in_irq_or_nmi(void)
> {
> return in_irq() | in_nmi();
> }
>
> 0000000000001770 <test_fn>:
> 1770: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax # 1777 <test_fn+0x7>
> 1773: R_X86_64_PC32 __preempt_count-0x4
> #define in_nmi() (preempt_count() & NMI_MASK)
> #define in_task() (!(preempt_count() & \
> (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> static __always_inline int in_irq_or_nmi(void)
> {
> return in_irq() | in_nmi();
> 1777: 25 00 00 1f 00 and $0x1f0000,%eax
> }
> 177c: c3 retq
> 177d: 0f 1f 00 nopl (%rax)
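For reference, the single `and $0x1f0000` above works because both
helpers mask the same preempt_count() read: (pc & HARDIRQ_MASK) |
(pc & NMI_MASK) equals pc & (HARDIRQ_MASK | NMI_MASK), a rewrite the
compiler can prove for bitwise |, whereas logical || mandates
short-circuit evaluation and hence one branch per operand. A minimal
userspace sketch of the same effect (mask values as in
include/linux/preempt.h; fake_preempt_count is a stand-in for the real
per-cpu counter, only there so this compiles standalone):

#include <stdio.h>

/* Mask layout from include/linux/preempt.h; matches the asm above:
 * HARDIRQ_MASK = 0x000f0000, NMI_MASK = 0x00100000. */
#define HARDIRQ_MASK	0x000f0000
#define NMI_MASK	0x00100000

/* Stand-in for the kernel's per-cpu __preempt_count read. */
static unsigned int fake_preempt_count;
#define preempt_count()	(fake_preempt_count)

#define in_irq()	(preempt_count() & HARDIRQ_MASK)
#define in_nmi()	(preempt_count() & NMI_MASK)

/* Bitwise |: one load, one AND with the merged 0x1f0000 mask. */
static inline int in_irq_or_nmi(void)
{
	return in_irq() | in_nmi();
}

int main(void)
{
	fake_preempt_count = 0x00010000;	/* pretend: in hardirq */
	printf("in_irq_or_nmi() = %d\n", !!in_irq_or_nmi());
	return 0;
}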
And I also verified it worked:
0.63 │ mov __preempt_count,%eax
│ free_hot_cold_page():
1.25 │ test $0x1f0000,%eax
│ ↓ jne 1e4
This simplification also made the compiler turn the test into an
unlikely branch, which is a micro-optimization (that I will leave up to
the compiler).
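The hint could also be made explicit rather than left to the compiler,
via the kernel's unlikely() (which expands to __builtin_expect). A
hypothetical call-site shape, purely illustrative and not what the
posted patch does:

#define unlikely(x)	__builtin_expect(!!(x), 0)  /* as in <linux/compiler.h> */

	/* Hypothetical caller: mark the IRQ/NMI case as the cold path. */
	if (unlikely(in_irq_or_nmi()))
		goto defer_free;	/* hypothetical slow-path label */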
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer