lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 06 Oct 2006 09:14:53 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Muli Ben-Yehuda <muli@...ibm.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Rajesh Shah <rajesh.shah@...el.com>, Andi Kleen <ak@....de>,
	"Protasevich, Natalie" <Natalie.Protasevich@...SYS.com>,
	"Luck, Tony" <tony.luck@...el.com>, Andrew Morton <akpm@...l.org>,
	Linus Torvalds <torvalds@...l.org>,
	Linux-Kernel <linux-kernel@...r.kernel.org>,
	Badari Pulavarty <pbadari@...il.com>
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Muli Ben-Yehuda <muli@...ibm.com> writes:

> My x366 no longer boots with 2.6.19-rc1. The boot either hangs in
> uhci_hcd_init or dies with 'do_IRQ: cannot handle IRQ -1". Bisection
> says this one is bad:

Ok.  So at least the second case is because some irq is being delivered
to a cpu that was not expecting it.

The hang case is weird because the kernel does not get told about
the irqs on your second ioapic.

When it gets the 'do_IRQ: cannot handle IRQ -1' how long
has the system been in user space?  (It doesn't look like
init got started but that is hard to tell, shutting off irqbalanced
for testing purposes would be interesting)

Seeing the failure case is really weird because this early in boot
everything should be routed to cpu 0.

What happens if you boot with max_cpus=1?

The change the patch introduced was that we are now always
pointing irqs towards individual cpus, and not accepting an irq
if it comes into the wrong cpu.  

The only hypothesis I have so far is that there may be an issue
with the x366 chipset ioapics that this patch reveals.

I would suspect a wider issue but in several months of testing
this is the first bug report I have seen.

If simple tests don't reveal what is going on then we will
have to instrument up that BUG and print out the per
cpu vector to irq tables, the cpu number, and the vector
the unexpected irq came in on.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ