Date:	Sat, 23 Jun 2007 18:45:05 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"Rafael J. Wysocki" <rjw@...k.pl>,
	"Siddha, Suresh B" <suresh.b.siddha@...el.com>,
	"Darrick J. Wong" <djwong@...ibm.com>,
	linux-kernel@...r.kernel.org, ak@...e.de
Subject: Re: Device hang when offlining a CPU due to IRQ misrouting

Andrew Morton <akpm@...ux-foundation.org> writes:

> On Sun, 24 Jun 2007 01:54:52 +0200 "Rafael J. Wysocki" <rjw@...k.pl> wrote:
>
>> On Wednesday, 20 June 2007 00:08, Siddha, Suresh B wrote:
>> > On Tue, Jun 19, 2007 at 01:49:30PM -0700, Darrick J. Wong wrote:
>> > > 
>> > > This fixes the problem!  Hurrah!
>> > 
>> > Great!  Andrew, please include the appended patch in -mm.
>> > 
>> > ----
>> > Subject: [patch] x86_64, irq: use mask/unmask and proper locking in fixup_irqs
>> > From: Suresh Siddha <suresh.b.siddha@...el.com>
>> > 
>> > Force irq migration path during cpu offline, is not using proper
>> > locks and irq_chip mask/unmask routines. This will result in
>> > some races(especially the device generating the interrupt can see
>> > some inconsistent state, resulting in issues like stuck irq,..).
>> > 
>> > Appended patch fixes the issue by taking proper lock and
>> > encapsulating irq_chip set_affinity() with a mask() before and an
>> > unmask() after.
>> > 
>> > This fixes a MSI irq stuck issue reported by Darrick Wong.
>> > 
>> > There are several more general bugs in this area(irq migration in the
>> > process context). For example,
>> > 
>> > 1. Possibility of missing edge triggered irq.
>> > 2. Reliable method of migrating level triggered irq in the process context.
>> > 
>> > We plan to look and close these in the near future.
>> 
>> This patch breaks hibernation on my Turion 64 X2 - based testbox (HPC nx6325).
>> 
>> _cpu_down() just hangs as though there were a deadlock in there, 100% of the
>> time.
>> 
>
> Thanks, I dropped it.

Hmm.  It looks like Siddha sent the wrong version of the patch.
The working tested version had an additional test to ensure
the mask and unmask methods were implemented.

i.e.
+		if (irq_desc[irq].chip->mask)
+			irq_desc[irq].chip->mask(irq);
and

+		if (irq_desc[irq].chip->unmask)
+			irq_desc[irq].chip->unmask(irq);
+
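
For anyone who wants to see the whole pattern in one place, here is a
minimal userspace sketch of the guarded mask / set_affinity / unmask
sequence. It is not the patch itself: mock_irq_chip, migrate_irq and the
counting helpers are hypothetical names made up for illustration, the
NULL check on set_affinity is an assumption, and the locking that the
real patch takes around this sequence is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's irq_chip: any method may be NULL. */
struct mock_irq_chip {
	void (*mask)(unsigned int irq);
	void (*unmask)(unsigned int irq);
	void (*set_affinity)(unsigned int irq, unsigned long cpumask);
};

/* Call counters so the behaviour can be observed from outside. */
static int mask_calls, unmask_calls, affinity_calls;

static void count_mask(unsigned int irq)
{
	(void)irq;
	mask_calls++;
}

static void count_unmask(unsigned int irq)
{
	(void)irq;
	unmask_calls++;
}

static void count_set_affinity(unsigned int irq, unsigned long cpumask)
{
	(void)irq;
	(void)cpumask;
	affinity_calls++;
}

/*
 * Retarget one irq, bracketing set_affinity() with mask() and unmask(),
 * and calling each method only if the chip actually implements it.
 * Without the checks, a chip that leaves mask or unmask NULL would make
 * this dereference a NULL function pointer.
 */
static void migrate_irq(const struct mock_irq_chip *chip,
			unsigned int irq, unsigned long cpumask)
{
	assert(chip != NULL);

	if (chip->mask)
		chip->mask(irq);
	if (chip->set_affinity)	/* assumption: also optional */
		chip->set_affinity(irq, cpumask);
	if (chip->unmask)
		chip->unmask(irq);
}
```

Calling migrate_irq() on a chip that only provides set_affinity simply
skips the mask and unmask steps instead of jumping through a NULL
pointer.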

Siddha, do you think you can resend the correct version?

Rafael, do you think you can add those two ifs and see if your test
box works?

I'm still not convinced that we can make fixup_irqs work in general
but if we aren't going to yank it we should at least make it
consistent with the rest of the code.

Eric
