lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120320094048.GC20079@phenom.dumpdata.com>
Date:	Tue, 20 Mar 2012 05:40:48 -0400
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	Josh Boyer <jwboyer@...hat.com>
Cc:	Suresh Siddha <suresh.b.siddha@...el.com>,
	Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
	yinghai@...nel.org, linux-kernel@...r.kernel.org,
	kernel-team@...oraproject.org, midgoon@...il.com
Subject: Re: 3.2.1 Unable to reset IRR messages on boot

> > > > > Yes, it was helpful. Something like the appended patch should ignore the
> > > > > bogus io-apic entry all together. As I can't test this, can you or the
> > > > > reporter give the appended patch a try and ack please?
> > > > 
> > > > Hi Suresh,
> > > > 
> > > > Apologies for the delay.  The original reporter had to return the
> > > > machine he was using.  We've since had another report where this
> > > > happened and your patch below does indeed fix the issue.
> > > > 
> > > > I'd suggest pushing this soon.
> > > > 
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=801501
> > > > 
> > > 
> > > Thanks Josh. Peter/Ingo, please queue the appended patch for -tip.
> > 
> > Hi Suresh,
> > 
> > Seems this patch and Xen don't get along very well.  See the bug link
> > below.  I've CC'd Konrad and hopefully he'll have some insight as to why
> > that might be.
> 
> Quick glance at the code tells me that the 'mp_register_ioapic' with the
> patch won't increment the gsi_top. That value (gsi_top) is used in
.. snip..
> So the IO_APIC is all 0xfff.. 

My "quick glance" was wrong. The reason we are dying is b/c the call
acpi_get_override_irq() is used, which returns the polarity and trigger for the
IRQs. But that function calls mp_find_ioapics to get the 'struct ioapic' structure
- which along with the mp_irq[x] is used to figure out the default values
and the polarity/trigger overrides. Since the mp_find_ioapics now returns -1
[b/c the IOAPIC is filled with 0xffffffff], the acpi_get_override_irq() stops
trying to lookup in the mp_irq[x] the proper INT_SRV_OVR and we can't install the
SCI interrupt.

Furthermore, we end up using that function in a loop to setup the sixteen legacy
interrupts:

    /* Pre-allocate legacy irqs */
480         for (irq = 0; irq < NR_IRQS_LEGACY; irq++) {
481                 int trigger, polarity;
482 
483                 if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
484                         continue;
485 
486                 xen_register_pirq(irq, -1 /* no GSI override */,
487                         trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE,
488                         true /* Map GSI to PIRQ */);


and since we get -1, we never end up setting any interrupts. 

Also the nr_ioapics ends up being zero, so we think we are running in legacy mode with XT-PIC
interrupts (ugh). By the time ACPI kicks in and wants to install the SCI interrupt
we blow up since we cannot get the proper irq_desc.

The interesting thing is that the same issue could be reproduced on baremetal
if the first IOAPIC had 0xfff all over it - and the call to acpi_get_override_irq()
done by the ACPI layer to setup the SCI would have triggered a similar failure.
But the baremetal failing case has the first IOAPIC with the INT_SRV_OVR with
valid values - it is just the second IOAPIC is busted.

Or if the second IOAPIC was busted and there was an INT_SRV_OVR for the second
APIC to handle the SCI.

I think there are three ways of fixing this:

 1). Revert Suresh's patch and look at just removing the "Unable to reset IRR" warning
    (perhaps by being conditional on running in kexec-env?).

 2). Make the Xen layer fake out an IOAPIC - so instead of 0xffffff, make sure to
     clear the three bits that Suresh' patch is testing for (Ewwwww, I don't actually
     like that - that stinks of a hack).

 3). Rework Suresh's patch - to only remove the IOAPIC entry if there is no
     INT_SRV_OVR that depend on it. I made a stab at it and here is draft patch, that
     looks to work on my boxes that have more than one IOAPIC and are booting under Xen:
     But I am not 100% confident about it so would appreciate somebody looking at it.

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 690d1cc..d92be91 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -170,6 +170,7 @@ extern u32 gsi_top;
 int mp_find_ioapic(u32 gsi);
 int mp_find_ioapic_pin(int ioapic, u32 gsi);
 void __init mp_register_ioapic(int id, u32 address, u32 gsi_base);
+void __init mp_erase_defective_ioapics(void);
 extern void __init pre_init_apic_IRQ0(void);
 
 extern void mp_save_irq(struct mpc_intsrc *m);
diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 9c7d95f..e886853 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -103,6 +103,7 @@ extern void mp_config_acpi_legacy_irqs(void);
 struct device;
 extern int mp_register_gsi(struct device *dev, u32 gsi, int edge_level,
 				 int active_high_low);
+extern void mp_erase_defective_ioapics(void);
 #endif /* CONFIG_ACPI */
 
 #define PHYSID_ARRAY_SIZE	BITS_TO_LONGS(MAX_LOCAL_APIC)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index c3a5b95..b5115e7 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1193,6 +1193,10 @@ static int __init acpi_parse_madt_ioapic_entries(void)
 	}
 
 	/*
+	 * Cleanup up invalid IOAPICs.
+	 */
+	mp_erase_defective_ioapics();
+	/*
 	 * If BIOS did not supply an INT_SRC_OVR for the SCI
 	 * pretend we got one so we can set the SCI flags.
 	 */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 6d10a66..77f1e84 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3989,14 +3989,43 @@ static __init int bad_ioapic_register(int idx)
 	reg_02.raw = io_apic_read(idx, 2);
 
 	if (reg_00.raw == -1 && reg_01.raw == -1 && reg_02.raw == -1) {
-		pr_warn("I/O APIC 0x%x registers return all ones, skipping!\n",
-			mpc_ioapic_addr(idx));
 		return 1;
 	}
 
 	return 0;
 }
+bool __init no_gsi_override_for_ioapic(int idx)
+{
+	unsigned int gsi, i;
+	struct mp_ioapic_gsi *gsi_cfg;
+
+	gsi_cfg = mp_ioapic_gsi_routing(idx);
+	for (gsi = gsi_cfg->gsi_base; gsi < gsi_cfg->gsi_end; gsi++) {
+		unsigned int pin = mp_find_ioapic_pin(idx, gsi);
+		for (i = 0; i < mp_irq_entries; i++) {
+			if (mp_irqs[i].dstirq == pin &&
+				mp_irqs[i].dstapic == mpc_ioapic_id(idx))
+				return false;
+		}
+	}
+	return true;
+}
+void __init mp_erase_defective_ioapics(void)
+{
+	unsigned int idx = 0;
 
+	while (idx < nr_ioapics) {
+		if (bad_ioapic_register(idx) && no_gsi_override_for_ioapic(idx)) {
+			pr_warn("I/O APIC 0x%x registers return all ones, skipping!\n",
+				mpc_ioapic_addr(idx));
+			clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+			memmove(&ioapics[idx], &(ioapics[idx+1]),
+				sizeof(struct ioapic) * (nr_ioapics - 1 - idx));
+			--nr_ioapics;
+		} else
+			idx++;
+	}
+}
 void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
 {
 	int idx = 0;
@@ -4014,11 +4043,6 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
 
 	set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);
 
-	if (bad_ioapic_register(idx)) {
-		clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
-		return;
-	}
-
 	ioapics[idx].mp_config.apicid = io_apic_unique_id(id);
 	ioapics[idx].mp_config.apicver = io_apic_get_version(idx);
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ