linux-kernel - Re: USB OOPS 2.6.25-rc2-git1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200802201356.28723.david-b@pacbell.net>
Date:	Wed, 20 Feb 2008 13:56:28 -0800
From:	David Brownell <david-b@...bell.net>
To:	Alan Stern <stern@...land.harvard.edu>
Cc:	Andre Tomt <andre@...t.net>,
	Kernel development list <linux-kernel@...r.kernel.org>,
	USB list <linux-usb@...r.kernel.org>
Subject: Re: USB OOPS 2.6.25-rc2-git1

On Wednesday 20 February 2008, Alan Stern wrote:
> > ehci_hcd 0000:00:1d.7: IAA watchdog, lost IAA: status 8029 cmd 10021
> 
> lines in the log brings up some ideas that have been percolating in my 
> mind for a while.  They have to do with the possibility of a race 
> between the watchdog routine and assertion of IAA.

The curious bit IMO being STS_INT (0001), which should also have
triggered an IRQ.  Suggesting to me that the race might be lower
level than that ... at the level of a conflict between the various
mechanisms to ack irqs.

See the appended patch (Andre, this is the additional one I meant)
for a tweak at that level.


> In fact, if the timing comes out just wrong then it's possible (on SMP
> systems) for an IAA interrupt to arrive when the watchdog
> routine has already started running.  Then end_unlink_async() might get 
> called right at the start of a new IAA cycle, or when the reclaim list 
> is empty.

The driver's spinlock should prevent that particular problem from
appearing.

- Dave


========= CUT HERE
Modify EHCI irq handling on the theory that at least some of the
"lost" IRQs are caused by goofage between multiple lowlevel IRQ
acking mechanisms:  try rescanning before we exit the handler, in
case the EHCI-internal ack (by clearing the irq status) doesn't
always suffice for IRQs triggered nearly back-to-back.

---
 drivers/usb/host/ehci-hcd.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- g26.orig/drivers/usb/host/ehci-hcd.c	2008-02-20 13:26:00.000000000 -0800
+++ g26/drivers/usb/host/ehci-hcd.c	2008-02-20 13:54:37.000000000 -0800
@@ -638,6 +638,8 @@ static irqreturn_t ehci_irq (struct usb_
 		return IRQ_NONE;
 	}
 
+retrigger:
+
 	/* clear (just) interrupts */
 	ehci_writel(ehci, status, &ehci->regs->status);
 	cmd = ehci_readl(ehci, &ehci->regs->command);
@@ -725,6 +727,12 @@ dead:
 
 	if (bh)
 		ehci_work (ehci);
+
+	status = ehci_readl(ehci, &ehci->regs->status);
+	status &= INTR_MASK;
+	if (status)
+		goto retrigger;
+
 	spin_unlock (&ehci->lock);
 	if (pcd_status & STS_PCD)
 		usb_hcd_poll_rh_status(hcd);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/