linux-kernel - Re: [PATCH V2 2/2] trace,x86: Add nmi to the irq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9fc0399e-ab53-1074-623a-a826e2466277@redhat.com>
Date:   Mon, 8 Apr 2019 14:24:47 +0200
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Andy Lutomirski <luto@...nel.org>,
        Clark Williams <williams@...hat.com>, x86@...nel.org
Subject: Re: [PATCH V2 2/2] trace,x86: Add nmi to the irq_vectors class

On 4/5/19 11:50 PM, Thomas Gleixner wrote:
> On Mon, 1 Apr 2019, Daniel Bristot de Oliveira wrote:
>> Currently, the irq_vector class of tracepoints does not include the NMI
>> entry. The NMI was in the first set of tracepoints for IRQs, but it was
>> dropped off because of the logic of switching IDT when enabling trace[1].
>> However, as the switching IDT logic was removed [2], it is possible to add
> 
> Well, not really. tracepoints inside of do_nmi() have been there for a long
> time.
> 
> The nasty part was that the irq_vector patches injected the tracepoints way
> down in the low level entry code, which caused more problems than it solved
> including the extra IDT switcheroo.

humm... the benefits of having the tracepoints at the very begin/end of the
handler is that it captures better the amount of interference/overhead it caused
in the current thread (and IRQ too in the case of NMIs).

>> the NMI tracepoints back.
> 
> So again we have two tracepoints. One for the entry and one for the
> handlers. Do we really need both? If so please provide some rationale.

For example:

  f-4447  [000] d.Z.  1253.162036: nmi_entry: vector=2
  f-4447  [000] d.Z.  1253.162038: nmi_handler: nmi_cpu_backtrace_handler()
delta_ns: 354 handled: 0
  f-4447  [000] d.Z.  1253.162267: nmi_exit: vector=2

The nmi_handler tracepoint points to an interference of 354 ns, what is very
good! But the difference of time between the two irq_vectors tracepoint is:

  1253.162267-1253.162036 = 231 us!

The number looks really odd, but tracing with function tracer, we see this:

Thread running:
  f-4447  [000] d.Z.  1487.689288: rcu_nmi_enter <-do_nmi
  f-4447  [000] d.Z.  1487.689289: rcu_dynticks_curr_cpu_in_eqs <-rcu_nmi_enter

NMI starts:
  f-4447  [000] d.Z.  1487.689290: nmi_entry: vector=2
  f-4447  [000] d.Z.  1487.689291: default_do_nmi <-do_nmi
  f-4447  [000] d.Z.  1487.689291: nmi_handle <-default_do_nmi
  f-4447  [000] d.Z.  1487.689291: nmi_cpu_backtrace_handler <-nmi_handle

It handles very fast:
  f-4447  [000] d.Z.  1487.689292: nmi_handler: nmi_cpu_backtrace_handler()
delta_ns: 780 handled: 0

But some other functions continues running in the NMI context:
[Dazed and confused, but trying to continue message ]
  f-4447  [000] d.Z.  1487.689292: _raw_spin_trylock <-default_do_nmi
  f-4447  [000] d.Z.  1487.689566: unknown_nmi_error <-default_do_nmi
  f-4447  [000] d.Z.  1487.689566: nmi_handle <-unknown_nmi_error
  f-4447  [000] d.Z.  1487.689567: printk <-unknown_nmi_error
  f-4447  [000] d.Z.  1487.689567: vprintk_func <-printk
  f-4447  [000] d.Z.  1487.689567: printk_safe_log_store <-vprintk_func
  f-4447  [000] d.Z.  1487.689569: arch_irq_work_raise <-irq_work_queue
  f-4447  [000] d.Z.  1487.689569: default_send_IPI_self <-arch_irq_work_raise
  f-4447  [000] d.Z.  1487.689569: __default_send_IPI_shortcut
<-default_send_IPI_self
  f-4447  [000] d.Z.  1487.689585: native_apic_wait_icr_idle <-arch_irq_work_raise
  f-4447  [000] d.Z.  1487.689588: printk <-unknown_nmi_error
  f-4447  [000] d.Z.  1487.689588: vprintk_func <-printk
  f-4447  [000] d.Z.  1487.689588: printk_safe_log_store <-vprintk_func
  f-4447  [000] d.Z.  1487.689588: printk <-unknown_nmi_error.cold.13
  f-4447  [000] d.Z.  1487.689588: vprintk_func <-printk
  f-4447  [000] d.Z.  1487.689588: printk_safe_log_store <-vprintk_func

Handler finishes:
  f-4447  [000] d.Z.  1487.689589: nmi_exit: vector=2
  f-4447  [000] d.Z.  1487.689589: rcu_nmi_exit <-do_nmi
  f-4447  [000] d.Z.  1487.689589: rcu_irq_exit <-rcu_nmi_exit

And it was actually also (potentially) delaying an timer:
  f-4447  [000] d...  1487.689590: smp_irq_work_interrupt <-irq_work_interrupt

So, the nmi_handle tracepoint would not make suspicious that the NMI caused the
delay, but the irq_vectors would (because of their position).

Finally, we would have to enable a single class of tracepoints to monitor all
interrupts.

PS: I am using the vanilla kernel in a vm, so the numbers are higher, but still,
they have the same scale in the bare metal.

Thoughts?

-- Daniel

>> The tracepoints looks like:
>>           <idle>-0     [000] d.Z.   179.594315: nmi_entry: vector=2
>>           <idle>-0     [000] d.Z.   179.594396: nmi_exit: vector=2
>>
>> [1] trace,x86: irq vector tracepoint support
>>     https://lwn.net/Articles/555465/
> 
> Please use https://lkml.kernel.org/r/$MESSAGE-ID
> 
>> [2] commit 4b9a8dca0e58 ("x86/idt: Remove the tracing IDT completely")
>>
>> Changes from V1:
>>   - Fix a compilation problem when CONFIG_X86_LOCAL_APIC is not set (kbuild test)
> 
> Please put changes below the '---' separator. They are not part of the
> final changelog and having them below spares the maintainer the time to
> remove them manually.
> 
> Thanks,
> 
> 	tglx
>