linux-kernel - Re: [PATCH v2] x86/irq: Add additional unhandled IRQ debug information

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1612082254390.3609@nanos>
Date:   Thu, 8 Dec 2016 23:39:48 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Prarit Bhargava <prarit@...hat.com>
cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
        Rui Wang <rui.y.wang@...el.com>,
        Paul Gortmaker <paul.gortmaker@...driver.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        Borislav Petkov <bp@...e.de>, Aaron Lu <aaron.lu@...el.com>
Subject: Re: [PATCH v2] x86/irq: Add additional unhandled IRQ debug
 information

On Tue, 6 Dec 2016, Prarit Bhargava wrote:

> The current unhandled IRQ warning doesn't output enough information on x86 to
> determine which device issued an interrupt.  For example,
> 
> irq 16: nobody cared (try booting with the "irqpoll" option)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #19
> Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.90 06/01/2016
>  ffff88041f803eb0 ffffffff81337a95 ffff880418607200 ffff88041860729c
>  ffff88041f803ed8 ffffffff810db845 ffff880418607200 0000000000000000
>  0000000000000010 ffff88041f803f10 ffffffff810dbb9f ffff880418607200
> Call Trace:
>  <IRQ>  [<ffffffff81337a95>] dump_stack+0x63/0x8e
>  [<ffffffff810db845>] __report_bad_irq+0x35/0xd0
>  [<ffffffff810dbb9f>] note_interrupt+0x20f/0x260
>  [<ffffffff810d8f05>] handle_irq_event_percpu+0x45/0x60
>  [<ffffffff810d8f4c>] handle_irq_event+0x2c/0x50
>  [<ffffffff810dc8ca>] handle_fasteoi_irq+0x8a/0x150
>  [<ffffffff8102ee0b>] handle_irq+0xab/0x130
>  [<ffffffff8109de6a>] ? atomic_notifier_call_chain+0x1a/0x20
>  [<ffffffff8103631d>] ? __exit_idle+0x2d/0x30
>  [<ffffffff81715bad>] do_IRQ+0x4d/0xd0
>  [<ffffffff81713c02>] common_interrupt+0x82/0x82
>  <EOI>  [<ffffffff815e3931>] ? cpuidle_enter_state+0xc1/0x280
>  [<ffffffff815e3924>] ? cpuidle_enter_state+0xb4/0x280
>  [<ffffffff815e3b27>] cpuidle_enter+0x17/0x20
>  [<ffffffff810c1813>] call_cpuidle+0x23/0x40
>  [<ffffffff810c1a79>] cpu_startup_entry+0x149/0x240
>  [<ffffffff81705fe7>] rest_init+0x77/0x80
>  [<ffffffff81d8a147>] start_kernel+0x495/0x4a2
>  [<ffffffff81d89aa0>] ? set_init_arg+0x55/0x55
>  [<ffffffff81d89120>] ? early_idt_handler_array+0x120/0x120
>  [<ffffffff81d895d6>] x86_64_start_reservations+0x2a/0x2c
>  [<ffffffff81d89715>] x86_64_start_kernel+0x13d/0x14c

Quoting the full backtrace adds which value to this changelog?

> handlers:
> [<ffffffff8155b050>] usb_hcd_irq
> Disabling IRQ #16
> 
> The unhandled interrupt is issued against IRQ 16, however, in many cases the
> handlers listed do not have anything to do with the device that has issued the
> interrupt.
> 
> This patch

Please read and finally understand Documentation/SubmittingPatches. Hint:
Search for: 'This patch'

> adds weak function arch_irq_debug(), and adds an x86 specific
> definition.  The x86 version dumps the following information: the status
> of the IO APIC pin line associated with the IRQ, the state of the CPU's
> LAPIC, and the PCI devices which have their interrupt set.

Again: You are telling what the patch does, but not WHY. Why is all of this
useful and necessary to debug such a problem?

> This information is output:
> 
> Additional x86 information on IRQ #16 ...
> IRQ16 maps to APIC0 PIN16 : pin10, enabled , level, low , V(10), IRR(1), S(1), remapped, I(98013),  Z(0)

So PIN16 is pin10? Very intuitive.

It's enabled, which we already know because otherwise we would not have
seen that interrupt.

It's level and low, which can be retrieved from /proc/interrupts as well

I can figure out what IRR and remapped means, but V, S, I and Z are telling
us what and in which way are they helpful?

> This CPU APIC ID is 0, and cpu id is 0.

We already know that we are on CPU 0. And the APIC Id is known as well.

> CPU 0 LAPIC IRR dump:
> APIC IRR 7 (IRQs 255 to 224) = 0x20008000  [ 239 253 ]
> APIC IRR 6 (IRQs 223 to 192) = 0x0
> APIC IRR 5 (IRQs 191 to 160) = 0x0
> APIC IRR 4 (IRQs 159 to 128) = 0x0
> APIC IRR 3 (IRQs 127 to 96) = 0x0
> APIC IRR 2 (IRQs 95 to 64) = 0x0
> APIC IRR 1 (IRQs 63 to 32) = 0x0
> APIC IRR 0 (IRQs 31 to 0) = 0x0

And the full IRR state is useful because this interrupt is connected to
IRR(1), right?

> Disabling IRQ #16
> PCI devices with INTERRUPT STATUS set (bit 3 of PCI_STATUS register):
> possible unhandled irq source 0000:00:16.0: pin = 1
> possible unhandled irq source 0000:05:00.0: pin = 1

You are looking at that way after the trouble happened. What guarantees
that the interrupt status bit is still set?

And aside of that, if we know that usb_hcd_irq is an installed handler on
that line, then figuring out which other PCI devices are using the same
line is a very trivial excercise:

 lspci -vvv 

contains all the required information.

> -			printk(KERN_DEBUG "%s, remapped, I(%04X),  Z(%X)\n",

> +		pr_debug("%s, remapped, I(%04X),  Z(%X)\n",

While the original code prints always (with DEBUG level) your change makes
it depend on either

   - DEBUG being defined in that file

or

   - CONFIG_DYNAMIC_DEBUG being set and the APIC/IOAPIC pr_debugs being
     enabled via the kernel command line or the debugfs interface.

That's a real achievement as it breaks apic=debug and apic=verbose !

The only benefit is that it prints less useless information in your dump
with the downside that it just dumps even more useless emergency level
messages around the void.

Congratulations for following checkpatch.pl advice blindly!

So in order to use this one has either to recompile the kernel or reboot
with magic command line options, or hope that the wreckage only happens
after enabling magic bits in debugfs or a combination of that.

Of course you went a great length to document all of this and explain how a
sysadmin is supposed to enable it and then interpret the valuable extra
information.

Maybe you can provide a real coherent explanation why we need to break
existing debug facilities and when we unbreak them dump lots of useless
information instead of using easily to access useful information.

The time to cobble this together would have been better spent to document
how this can be debugged with existing tools and existing command line
options, if further information about pin association is really required.

Thanks,

	tglx