linux-kernel - Re: BUG: ftrace/perf dropping events at the begin of interrupt handlers

Message-ID: <e9375c7c-745d-fac3-6e16-539712ceaaea@redhat.com>
Date:   Fri, 14 Dec 2018 11:21:33 +0100
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Clark Williams <williams@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Marko Pusch <marko.pusch@...mens.com>,
        Tommaso Cucinotta <tommaso.cucinotta@...up.it>,
        Rômulo Silva de Oliveira 
        <romulo.deoliveira@...c.br>, Ingo Molnar <mingo@...hat.com>
Subject: Re: BUG: ftrace/perf dropping events at the begin of interrupt
 handlers

On 12/4/18 8:16 PM, Steven Rostedt wrote:
> Yes, it's a simple fix. The problem is that the recursion detection of
> the function tracer requires that when it's called from an interrupt,
> "in_interrupt" needs to be true, otherwise it thinks that the function
> tracer is recursing on itself (which is common).
> 
> Looking at the dropped events, and the code in __irq_enter() we have
> this:
> 
> #define __irq_enter()					\
> 	do {						\
> 		account_irq_enter_time(current);	\
> 		preempt_count_add(HARDIRQ_OFFSET);	\ <<-- in_interrupt() returns true here
> 		trace_hardirq_enter();			\
> 	} while (0)
> 
> Interestingly enough, the dropped events happen to be in
> account_irq_enter_time()!
> 
> Thus what I believe is happening is that an interrupt came in while one
> event was being recorded. When account_irq_enter_time() was called, the
> function tracer noticed that its recursion bit for the current context
> was already set, and just dropped the event because it thought it was
> just tracing itself. After we add HARDIRQ_OFFSET to preempt_count,
> "in_interrupt()" will be true and the function tracer will know it's in a
> new context where it's safe to continue tracing.
> 
> Can you try this patch to see if it fixes it for you?

Hi Steve,

I finally took some time to play with the patch, sorry for the delay. I got the
idea of the patch, but it is not working as expected :-(.

When I enable it, the system [a VM with 1 CPU] mostly freezes when I run this:

# while [ 1 ]; do echo > /dev/null; done &

I still need to investigate why.

The other point: as far as I understand, the patch would start showing
account_irq_enter_time(), but it would still not trace do_IRQ(). Right?

Wouldn't this be a case for using a per-CPU variable to set the flag right at
the beginning of the handler (in entry*.S)?

Thoughts?

-- Daniel