Message-ID: <CA+55aFxd1WGNBzSHeOGiXXdUD1GqDYv9PUNGdrdiGFwaX7HYJQ@mail.gmail.com>
Date: Thu, 19 Feb 2015 13:59:46 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Rafael David Tinoco <inaddy@...ntu.com>,
    Ingo Molnar <mingo@...nel.org>, Peter Anvin <hpa@...or.com>,
    Jiang Liu <jiang.liu@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
    LKML <linux-kernel@...r.kernel.org>,
    Jens Axboe <axboe@...nel.dk>,
    Frederic Weisbecker <fweisbec@...il.com>,
    Gema Gomez <gema.gomez-solano@...onical.com>,
    Christopher Arges <chris.j.arges@...onical.com>,
    "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: smp_call_function_single lockups
On Thu, Feb 19, 2015 at 12:29 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Now, what happens if we send an EOI for an ExtINT interrupt? It
> basically ends up being a spurious IPI. And I *think* that what
> normally happens is absolutely nothing at all. But if in addition to
> the ExtINT, there was a pending IPI (or other pending ISR bit set),
> maybe we lose interrupts..
>
> .. and it's entirely possible that I'm just completely full of shit.
> Who is the poor bastard who has worked most with things like ExtINT,
> and can educate me? I'm adding Ingo, hpa and Jiang Liu as primary
> contacts..
So quite frankly, trying to follow all the logic from do_IRQ() through
handle_irq() to the actual low-level handler, I just couldn't do it.
So instead, I wrote a patch to verify that the ISR bit is actually set
when we do ack_APIC_irq().
This was complicated by the fact that we don't actually pass in the
vector number at all to the acking, so 99% of the patch is just doing
that. A couple of places we don't really have a good vector number, so
I said "screw it, a negative value means that we won't check the ISR".
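The check described above can be sketched in user space. This is not the attached patch: `fake_isr`, `isr_bit_set()` and `checked_ack()` are hypothetical names, and the array stands in for the local APIC's In-Service Register, which is 256 bits wide and exposed as eight 32-bit registers. The only things taken from the description are that the vector is passed in to the ack, and that a negative vector means "don't check".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the APIC ISR: 256 bits as eight 32-bit words.
 * On real hardware these are APIC registers, not a plain array. */
static uint32_t fake_isr[8];

/* Hypothetical helper: is the in-service bit for this vector set?
 * Vector v lives in word v / 32, bit v % 32. */
static bool isr_bit_set(int vector)
{
    unsigned int v = (unsigned int)vector;

    return (fake_isr[v / 32] >> (v % 32)) & 1u;
}

/* Hypothetical ack that takes the vector. Returns false (and warns)
 * when the vector is in range but its ISR bit is clear. A negative
 * or out-of-range vector is not checked at all. */
static bool checked_ack(int vector)
{
    if (vector >= 0 && vector < 256 && !isr_bit_set(vector)) {
        fprintf(stderr, "warning: EOI for vector %d with ISR bit clear\n",
                vector);
        return false;
    }
    /* A real implementation would write the EOI register here. */
    return true;
}
```

If the vector handed to a check like this is wrong, the warning fires even though the hardware state is fine, so a warning on its own does not prove a spurious ack.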
The attached patch is quite possibly garbage, but it gives an
interesting warning for me during i8042 probing, so who knows. Maybe
it actually shows a real problem - or maybe I just screwed up the
patch.
.. and maybe even if the patch is fine, it's actually never really a
problem to have spurious APIC ACK cycles. Maybe it cannot make
interrupts be ignored.
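To make the worry concrete, here is a toy model of what an EOI does to the ISR, assuming the documented behaviour that an EOI clears the highest-priority (highest-numbered) in-service bit. `model_eoi()` is a hypothetical name and the array is a stand-in for the hardware register. The model says nothing about whether an interrupt is actually lost on real hardware, which is the open question.

```c
#include <stdint.h>

/* Toy model of an EOI write: clear the highest set bit in a 256-bit
 * ISR held as eight 32-bit words. Returns the vector whose bit was
 * cleared, or -1 if nothing was in service. */
static int model_eoi(uint32_t isr[8])
{
    for (int word = 7; word >= 0; word--) {
        if (isr[word] == 0)
            continue;
        for (int bit = 31; bit >= 0; bit--) {
            if (isr[word] & (1u << bit)) {
                isr[word] &= ~(1u << bit);
                return word * 32 + bit;
            }
        }
    }
    return -1;
}
```

In this model an EOI with an empty ISR does nothing, which matches the "absolutely nothing at all" case. An extra EOI while some other vector is in service clears that vector's bit early, which is the case that might matter.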
Anyway, the back-trace for the warning I get is during boot:
...
PNP: No PS/2 controller found. Probing ports directly.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ./arch/x86/include/asm/apic.h:436 ir_ack_apic_edge+0x74/0x80()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-08857-g89d3fa45b4ad-dirty #2
Call Trace:
<IRQ>
dump_stack+0x45/0x57
warn_slowpath_common+0x80/0xc0
warn_slowpath_null+0x15/0x20
ir_ack_apic_edge+0x74/0x80
handle_edge_irq+0x51/0x110
handle_irq+0x74/0x140
do_IRQ+0x4a/0x140
common_interrupt+0x6a/0x6a
<EOI>
? _raw_spin_unlock_irqrestore+0x9/0x10
__setup_irq+0x239/0x5a0
request_threaded_irq+0xc2/0x180
i8042_probe+0x5b8/0x680
platform_drv_probe+0x2f/0xa0
driver_probe_device+0x8b/0x3e0
__driver_attach+0x93/0xa0
bus_for_each_dev+0x63/0xa0
driver_attach+0x19/0x20
bus_add_driver+0x178/0x250
driver_register+0x5f/0xf0
__platform_driver_register+0x45/0x50
__platform_driver_probe+0x26/0xa0
__platform_create_bundle+0xad/0xe0
i8042_init+0x3d0/0x3f6
do_one_initcall+0xb8/0x1d0
kernel_init_freeable+0x16d/0x1fa
kernel_init+0x9/0xf0
ret_from_fork+0x7c/0xb0
---[ end trace 1de82c4457c6a0f0 ]---
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
...
and it looks not entirely insane.
Is this worth looking at? Or is it something spurious? I might have
gotten the vectors wrong, and maybe the warning is not because the ISR
bit isn't set, but because I test the wrong bit.
Linus
View attachment "patch.diff" of type "text/plain" (13041 bytes)