Message-ID: <1344379147.27383.29.camel@sbsiddha-desk.sc.intel.com>
Date:	Tue, 07 Aug 2012 15:39:07 -0700
From:	Suresh Siddha <suresh.b.siddha@...el.com>
To:	Borislav Petkov <bp@...64.org>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Robert Richter <robert.richter@....com>, mingo@...nel.org,
	hpa@...or.com, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
	a.p.zijlstra@...llo.nl, tglx@...utronix.de,
	linux-tip-commits@...r.kernel.org
Subject: Re: do_IRQ: 1.55 No irq handler for vector (irq -1)

On Tue, 2012-08-07 at 22:57 +0200, Borislav Petkov wrote:
> The funny thing is, they deliver to all CPUs except the BSP.

Judging from your /proc/interrupts below, it is probably using some sort
of round-robin delivery.

> Or maybe the BSP gets that IRQ too but it actually has a handler
> registered?

From the /proc/interrupts you sent, the BSP is also getting those.

> 
> Btw, I'm stabbing in the dark here - I have been purposefully and
> willfully keeping away from all the APIC debacle until now. I guess that
> carefree time is over :(.
> 
> > Certainly outside of x2apic mode I have seen that happen and that is why
> > the reservation in lowest priority delivery mode was for the same vector
> > across all cpus.
> > 
> > This certainly looks like we have one irq going across multiple cpus
> > and the software simply appears unprepared for the irq to show up where
> > the irq is showing up.
> 
> The interesting thing is that this happens once per core early during
> boot and not anymore. I dropped the printk_ratelimit() in do_IRQ and
> still got those lines only once in dmesg.

What it says is that the interrupts are arriving at the offline cpus as
well. Before 3.6-rc1, the vectors assigned to the legacy irqs were
fixed (IRQ0_VECTOR, ...).

As of 3.6-rc1, we allow the vector to change when the IO-APIC starts
handling the irq, and that is probably a bad idea given that this
platform is spraying interrupts (mostly timer?) onto the offline cpus as
well. Before 3.6 we handled those interrupts when the cpu came online.
In the new kernels, because the vector has changed by the time the irq
is handled in io-apic mode, we get a spurious "No irq handler for
vector" message.
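
To illustrate the mechanism described above, here is a minimal
user-space sketch (not kernel code). It assumes a per-cpu table mapping
vectors to irqs, with -1 meaning "nothing mapped". The names model_*,
the table sizes, and the irq/vector numbers used in the usage note are
all hypothetical; only the shape of the message is taken from the
subject line.

```c
#include <stdio.h>

#define MODEL_NR_CPUS    4
#define MODEL_NR_VECTORS 256

/* Per-cpu vector -> irq table; -1 means "no irq mapped to this vector". */
static int model_vector_irq[MODEL_NR_CPUS][MODEL_NR_VECTORS];

/* Mark every vector on every cpu as unmapped. */
static void model_init(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        for (int v = 0; v < MODEL_NR_VECTORS; v++)
            model_vector_irq[cpu][v] = -1;
}

/* Map irq to vector on one cpu, dropping any older vector for that irq. */
static void model_assign(int cpu, int irq, int vector)
{
    for (int v = 0; v < MODEL_NR_VECTORS; v++)
        if (model_vector_irq[cpu][v] == irq)
            model_vector_irq[cpu][v] = -1;
    model_vector_irq[cpu][vector] = irq;
}

/* Look up the irq for an arriving vector; complain if there is none. */
static int model_do_irq(int cpu, int vector)
{
    int irq = model_vector_irq[cpu][vector];

    if (irq < 0)
        printf("do_IRQ: %d.%d No irq handler for vector (irq %d)\n",
               cpu, vector, irq);
    return irq;
}
```

In this model, calling model_assign(1, 7, 55) and then
model_assign(1, 7, 65) moves the irq to a new vector; an interrupt that
was already sent with the old vector then reaches model_do_irq(1, 55),
finds -1, and prints the message.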

> The other funny thing is, irq 55 is not in /proc/interrupts:

'55' is the vector number. You have to add some debug code in the kernel
to identify what irq it used to belong to.
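
As a rough idea of what such debug code could look like, here is a
user-space sketch (not a kernel patch) that remembers the last irq that
owned each vector before the mapping was cleared. All dbg_* names are
hypothetical, and a single table stands in for what would be one table
per cpu.

```c
#include <stdio.h>

#define DBG_NR_VECTORS 256

static int dbg_cur_irq[DBG_NR_VECTORS];   /* current owner, -1 if none */
static int dbg_prev_irq[DBG_NR_VECTORS];  /* last owner before a change */

/* Start with no current and no former owner for any vector. */
static void dbg_init(void)
{
    for (int v = 0; v < DBG_NR_VECTORS; v++) {
        dbg_cur_irq[v] = -1;
        dbg_prev_irq[v] = -1;
    }
}

/* Set the owner of a vector, remembering the irq it replaces (if any). */
static void dbg_set(int vector, int irq)
{
    if (dbg_cur_irq[vector] >= 0)
        dbg_prev_irq[vector] = dbg_cur_irq[vector];
    dbg_cur_irq[vector] = irq;
}

/* Clear a vector; the former owner stays recorded in dbg_prev_irq. */
static void dbg_clear(int vector)
{
    dbg_set(vector, -1);
}

/* Report which irq used to own a vector; -1 if none was ever recorded. */
static int dbg_former_owner(int vector)
{
    int irq = dbg_prev_irq[vector];

    printf("vector %d used to belong to irq %d\n", vector, irq);
    return irq;
}
```

Calling dbg_former_owner() from the unhandled-vector path would then
name the irq the stray vector came from.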

> 
>            CPU0       CPU1       CPU2       CPU3
>   0:         44          0          0          0   IO-APIC-edge      timer
>   1:          2          1          2          4   IO-APIC-edge      i8042
>   8:          6          7          6          6   IO-APIC-edge      rtc0
>   9:         22         25         24         21   IO-APIC-fasteoi   acpi
>  12:         31         23         30         30   IO-APIC-edge      i8042
>  16:         82         82         81        117   IO-APIC-fasteoi   snd_hda_intel
>  17:          0          1          1          0   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2
>  18:          3          6          8          8   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5
>  40:          0          0          0          0   PCI-MSI-edge      PCIe PME
>  41:          0          0          0          0   PCI-MSI-edge      PCIe PME
>  42:          0          0          0          0   PCI-MSI-edge      PCIe PME
>  43:          0          0          0          0   PCI-MSI-edge      PCIe PME
>  44:        675        662        676        690   PCI-MSI-edge      ahci
>  45:         41         44         38         41   PCI-MSI-edge      snd_hda_intel
>  46:      13484      13499      13501      13536   PCI-MSI-edge      eth0
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:      20719      21487      18015      16445   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> PMI:          0          0          0          0   Performance monitoring interrupts
> IWI:          0          0          0          0   IRQ work interrupts
> RTR:          0          0          0          0   APIC ICR read retries
> RES:      13744      12640      13425      12334   Rescheduling interrupts
> CAL:        571        790        539        801   Function call interrupts
> TLB:          0          0          0          0   TLB shootdowns
> TRM:          0          0          0          0   Thermal event interrupts
> THR:          0          0          0          0   Threshold APIC interrupts
> MCE:          0          0          0          0   Machine check exceptions
> MCP:         66         66         66         66   Machine check polls
> ERR:          0
> MIS:          0
> 
> so what is that thing?

And in the case of Robert's SATA hang: because we modify the vector when
the irq is handled by the io-apic, cfg->move_in_progress gets set during
setup_ioapic_irq(). Later, when we call setup_ioapic_dest() to update
the SMP affinity, we fail to update the RTEs because
cfg->move_in_progress is still set (it only gets cleared after the first
interrupt arrives).
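
The ordering problem described above can be modelled in a few lines of
user-space C. This is only a sketch: struct model_irq_cfg and the
model_* functions are hypothetical stand-ins, and returning -EBUSY while
a move is pending is an assumption of the model, not a quote of the
kernel code.

```c
#include <errno.h>
#include <stdbool.h>

struct model_irq_cfg {
    int          vector;            /* currently assigned vector */
    bool         move_in_progress;  /* set when the vector changes */
    unsigned int rte_dest;          /* destination cpu written to the RTE */
};

/* Changing the vector starts a move (cf. setup_ioapic_irq() above). */
static void model_change_vector(struct model_irq_cfg *cfg, int new_vector)
{
    if (cfg->vector != new_vector) {
        cfg->vector = new_vector;
        cfg->move_in_progress = true;
    }
}

/* Affinity update (cf. setup_ioapic_dest()): refused while a move is pending. */
static int model_set_dest(struct model_irq_cfg *cfg, unsigned int dest)
{
    if (cfg->move_in_progress)
        return -EBUSY;  /* assumption: the RTE is left untouched */
    cfg->rte_dest = dest;
    return 0;
}

/* The first interrupt on the new vector completes the move. */
static void model_first_interrupt(struct model_irq_cfg *cfg)
{
    cfg->move_in_progress = false;
}
```

If model_set_dest() runs between model_change_vector() and
model_first_interrupt(), it fails and rte_dest keeps its old value,
which is the situation described for the RTEs above.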

On Robert's system, all the interrupts go only to the last cpu (cpu-7).
As we fail to update the RTEs with the smp affinity in
setup_ioapic_dest(), the RTE is still pointing to cpu-0 (but the
vector_to_irq mapping is set on all the cpus), and most likely Robert's
platform for some reason doesn't like it (though we don't see "No irq
handler" messages on Robert's platform).

Boris, Robert, can you check whether the patch below makes both of your
systems happy again? It essentially disallows changing the vector for
legacy irqs, which also allows the RTE to be set correctly in the smp
case, etc. Based on your results and some more thinking, I will send a
detailed patch with a changelog tomorrow.

 arch/x86/kernel/apic/io_apic.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a6c64aa..4b98610 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1356,6 +1356,15 @@ static void setup_ioapic_irq(unsigned int irq, struct irq_cfg *cfg,
 	if (!IO_APIC_IRQ(irq))
 		return;
 
+	/*
+	 * For legacy irqs, cfg->domain starts with cpu 0. Now that IO-APIC
+	 * can handle this irq and the apic driver is finalized at this point,
+	 * update the cfg->domain.
+	 */
+	if (irq < legacy_pic->nr_legacy_irqs &&
+	    cpumask_equal(cfg->domain, cpumask_of(0)))
+		apic->vector_allocation_domain(0, cfg->domain, cpu_online_mask);
+
 	if (assign_irq_vector(irq, cfg, apic->target_cpus()))
 		return;
 

