[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.44L0.0802211057440.4528-100000@iolanthe.rowland.org>
Date: Thu, 21 Feb 2008 11:15:20 -0500 (EST)
From: Alan Stern <stern@...land.harvard.edu>
To: David Brownell <david-b@...bell.net>
cc: Andre Tomt <andre@...t.net>,
Kernel development list <linux-kernel@...r.kernel.org>,
USB list <linux-usb@...r.kernel.org>
Subject: Re: USB OOPS 2.6.25-rc2-git1
On Wed, 20 Feb 2008, David Brownell wrote:
> > CPU 0 CPU 1
> > ----- -----
> > Watchdog timer expires
> > Timer routine acquires spinlock
> > IAA IRQ arrives
> > ehci_irq tries to acquire
> > spinlock...
The following comment refers to the "Timer routine either sets" below,
right?
> There's another condition here, and
> another action. The condition is
> that ehci->reclaim must first be set;
> the action is to clear STS_IAA (and,
> given the previous patch, maybe IAAD).
>
> And this "either" is more concisely
> written as "call end_unlink_async()"
> (point made just for clarity).
Correct on both counts. I had forgotten that the watchdog routine
clears STS_IAA.
> > Timer routine either sets
> > ehci->reclaim to NULL
> > or else starts a new
> > IAA cycle
> > Timer routine releases spinlock
> > and returns
> > ehci_irq acquires spinlock
> > and sees IAA is set
>
> Can only happen if a new IAA
> cycle was started by CPU0, and
> the IAA condition triggered
> that quickly.
>
> > Call end_unlink_async()!
Okay, so this isn't as bad as it seemed. I don't have a copy of your
most recent patch, but it seems clear that the watchdog routine must:
First remove the circumstances that would cause the controller
to set IAA. I guess that means clearing IAAD; it's not
entirely clear from the spec whether this will do what we
want.
Then clear IAA (if it happens to be set).
This is the only way to avoid the race, and I know that my original
version of the routine does these steps in the wrong order (if at all).
That should be fixed. Given sufficiently bizarre hardware we can't be
certain that things won't still go wrong on occasion, but this is the
best we can do for now -- weird hardware can be handled as it arises.
The other change to make (which you have already anticipated) is to
guard against ehci->reclaim == NULL in end_unlink_async(). There's no
real need for a warning or stack dump; it should just return silently
when this happens. If there is a warning, maybe it should be placed at
the site of the caller (for example, in ehci_irq() when STS_IAA is
detected).
Alan Stern
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists