Message-ID: <20170822190039.519c25bc@redhat.com>
Date: Tue, 22 Aug 2017 19:00:39 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-kernel@...r.kernel.org, Jiri Olsa <jolsa@...nel.org>,
Ingo Molnar <mingo@...nel.org>, brouer@...hat.com
Subject: Re: [PATCH] trace: adjust code layout in get_recursion_context
On Tue, 22 Aug 2017 17:20:25 +0200
Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, Aug 22, 2017 at 05:14:10PM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 22, 2017 at 04:40:24PM +0200, Jesper Dangaard Brouer wrote:
> > > In an XDP redirect application using the tracepoint xdp:xdp_redirect to
> > > diagnose TX overrun, I noticed perf_swevent_get_recursion_context()
> > > was consuming 2% CPU. This was reduced to 1.6% with this simple
> > > change.
> >
> > It is also incorrect. What do you suppose it now returns when the NMI
> > hits a hard IRQ which hit during a Soft IRQ?
>
> Does this help any? I can imagine the compiler could struggle to CSE
> preempt_count() seeing how it's an asm thing.
Nope, it does not help (see the assembly below, with perf percentages).
But I think I can achieve what I want with a simple unlikely(in_nmi()) annotation.
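For illustration only, here is a minimal user-space sketch of what such an annotation could look like. It is not the kernel code: the mask values are assumptions based on the x86 preempt_count() layout (they match the 0x100000 and 0xf0000 constants in the disassembly below), unlikely() is re-defined locally on top of the GCC/Clang builtin __builtin_expect, and rctx_from_preempt_count() is a hypothetical helper name.

```c
#include <stdio.h>

/* Assumed preempt_count() bit layout (x86, kernels of that era). */
#define SOFTIRQ_MASK 0x0000ff00u /* what in_softirq() tests */
#define HARDIRQ_MASK 0x000f0000u /* what in_irq() tests */
#define NMI_MASK     0x00100000u /* what in_nmi() tests */

/* Local stand-in for the kernel's unlikely(); GCC/Clang only. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: map a preempt_count value to a recursion
 * context index. The most deeply nested context is tested first, so
 * an NMI that interrupts a hard IRQ that interrupted a soft IRQ
 * still yields 3. The unlikely() hint only asks the compiler to move
 * the NMI path out of the hot path; it does not change the result.
 */
static int rctx_from_preempt_count(unsigned int pc)
{
	if (unlikely(pc & NMI_MASK))
		return 3;
	if (pc & HARDIRQ_MASK)
		return 2;
	if (pc & SOFTIRQ_MASK)
		return 1;
	return 0;
}
```

Keeping the order of the tests unchanged is the point: the hint affects only code layout, so the nested NMI-in-hardirq-in-softirq case still resolves to the NMI context.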
> ---
> kernel/events/internal.h | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/events/internal.h b/kernel/events/internal.h
> index 486fd78eb8d5..e0b5b8fa83a2 100644
> --- a/kernel/events/internal.h
> +++ b/kernel/events/internal.h
> @@ -206,13 +206,14 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs);
>
>  static inline int get_recursion_context(int *recursion)
>  {
> +	unsigned int pc = preempt_count();
>  	int rctx;
>
> -	if (in_nmi())
> +	if (pc & NMI_MASK)
>  		rctx = 3;
> -	else if (in_irq())
> +	else if (pc & HARDIRQ_MASK)
>  		rctx = 2;
> -	else if (in_softirq())
> +	else if (pc & SOFTIRQ_OFFSET)
Hmmm... shouldn't this be SOFTIRQ_MASK?
>  		rctx = 1;
>  	else
>  		rctx = 0;
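To make the SOFTIRQ_OFFSET versus SOFTIRQ_MASK question concrete, here is a small user-space sketch. The constants are assumptions mirroring include/linux/preempt.h (softirq count in bits 8..15, with local_bh_disable() adding 2 * SOFTIRQ_OFFSET); the two helper names are hypothetical.

```c
#include <stdio.h>

/* Assumed values, mirroring include/linux/preempt.h. */
#define SOFTIRQ_SHIFT          8
#define SOFTIRQ_OFFSET         (1u << SOFTIRQ_SHIFT)    /* 0x0100 */
#define SOFTIRQ_MASK           (0xffu << SOFTIRQ_SHIFT) /* 0xff00 */
#define SOFTIRQ_DISABLE_OFFSET (2u * SOFTIRQ_OFFSET)    /* 0x0200 */

/* Hypothetical helper: the test used in the quoted patch. */
static int softirq_by_offset(unsigned int pc)
{
	return (pc & SOFTIRQ_OFFSET) != 0;
}

/* Hypothetical helper: the test in_softirq() performs. */
static int softirq_by_mask(unsigned int pc)
{
	return (pc & SOFTIRQ_MASK) != 0;
}
```

Under these assumptions the two tests agree while a softirq is actually being served (pc = SOFTIRQ_OFFSET), but disagree when bottom halves are merely disabled (pc = SOFTIRQ_DISABLE_OFFSET): the mask test reports softirq context and the offset test does not. Switching from in_softirq() to the offset test therefore changes which context that case maps to.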
perf_swevent_get_recursion_context /proc/kcore
│
│
│ Disassembly of section load0:
│
│ ffffffff811465c0 <load0>:
13.32 │ push %rbp
1.43 │ mov $0x14d20,%rax
5.12 │ mov %rsp,%rbp
6.56 │ add %gs:0x7eec3b5d(%rip),%rax
0.72 │ lea 0x34(%rax),%rdx
0.31 │ mov %gs:0x7eec5db2(%rip),%eax
2.46 │ mov %eax,%ecx
6.86 │ and $0x7fffffff,%ecx
0.72 │ test $0x100000,%eax
│ ↓ jne 40
│ test $0xf0000,%eax
0.41 │ ↓ je 5b
│ mov $0x8,%ecx
│ mov $0x2,%eax
│ ↓ jmp 4a
│40: mov $0xc,%ecx
│ mov $0x3,%eax
2.05 │4a: add %rcx,%rdx
16.60 │ mov (%rdx),%ecx
2.66 │ test %ecx,%ecx
│ ↓ jne 6d
1.33 │ movl $0x1,(%rdx)
1.54 │ pop %rbp
4.51 │ ← retq
3.89 │5b: shr $0x8,%ecx
9.53 │ and $0x1,%ecx
0.61 │ movzbl %cl,%eax
0.92 │ movzbl %cl,%ecx
4.30 │ shl $0x2,%rcx
14.14 │ ↑ jmp 4a
│6d: mov $0xffffffff,%eax
│ pop %rbp
│ ← retq
│ xchg %ax,%ax
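The tail of the listing (load from (%rdx), test, store of 1, or return of -1) corresponds to the per-context recursion guard. A rough user-space sketch of that logic follows, assuming a four-slot int array indexed by the context number; the function names are hypothetical and the compiler barriers used in the kernel are left out.

```c
#include <stdio.h>

#define NR_RECURSION_CONTEXTS 4 /* task, softirq, hardirq, NMI */

/*
 * Hypothetical helper: claim the slot for context rctx.
 * Returns rctx on success, or -1 if that context is already active,
 * which is the "mov $0xffffffff,%eax" path in the listing.
 */
static int claim_recursion_slot(int *recursion, int rctx)
{
	if (recursion[rctx])
		return -1;
	recursion[rctx]++;
	return rctx;
}

/* Hypothetical helper: release a slot claimed earlier. */
static void release_recursion_slot(int *recursion, int rctx)
{
	recursion[rctx]--;
}
```

Each context has its own slot, so a hard IRQ arriving while the softirq slot is held can still claim slot 2; only re-entry within the same context is refused.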
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer