linux-kernel - Re: [REGRESSION] 2.6.24/25: random lockups when accessing external USB harddrive


Message-ID: <48641325.2020903@nokia.com>
Date:	Fri, 27 Jun 2008 01:07:33 +0300
From:	Stefan Becker <Stefan.Becker@...ia.com>
To:	ext Alan Stern <stern@...land.harvard.edu>,
	linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org
Subject: Re: [REGRESSION] 2.6.24/25: random lockups when accessing external
 USB harddrive

Hi,

ext Alan Stern wrote:
> 
> Do you have any of the RT patches installed?

No.

> Something else you should try is clearing your "owner" string just
> before the spinlock is released.  You could also add a check after the
> release; if the spinlock can't be locked again immediately then
> something is wrong.
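Alan's suggested check can be illustrated with a small user-space sketch. It uses C11 atomics rather than the kernel's spinlock_t, and the names dbg_spinlock, dbg_lock(), dbg_unlock() and dbg_unlock_checked() are hypothetical. The owner tag is cleared just before the lock is released, and the lock is retried right after release. On an SMP kernel another CPU may legitimately grab the lock in that window, so a failed retry is a hint rather than proof.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space debug lock: a test-and-set flag plus an owner tag. */
struct dbg_spinlock {
	atomic_flag locked;
	const char *owner;	/* who currently holds the lock, NULL when free */
};

static bool dbg_trylock(struct dbg_spinlock *l, const char *who)
{
	if (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		return false;	/* already held by someone */
	l->owner = who;
	return true;
}

static void dbg_lock(struct dbg_spinlock *l, const char *who)
{
	while (!dbg_trylock(l, who))
		;		/* busy-wait, as a spinlock does */
}

static void dbg_unlock(struct dbg_spinlock *l)
{
	l->owner = NULL;	/* clear the owner just before releasing */
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Release, then check that the lock can be taken again immediately.
 * Returns false if it cannot ("something is wrong"). Only meaningful
 * when nobody else can legitimately take the lock in between.
 */
static bool dbg_unlock_checked(struct dbg_spinlock *l)
{
	dbg_unlock(l);
	if (!dbg_trylock(l, "unlock-check"))
		return false;
	dbg_unlock(l);
	return true;
}
```

In a single-threaded run, locking, failing a second trylock, and then calling dbg_unlock_checked() should leave the owner cleared and report success.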

Yes, my initial attempt was misleading. After some more tinkering I
finally figured out that usb_hcd_unlink_urb_from_ep() itself is being
called with interrupts enabled!


So with this code in place the error disappears:

void usb_hcd_unlink_urb_from_ep(struct usb_hcd *hcd, struct urb *urb)
{
	unsigned long flags;	/* spin_lock_irqsave() needs an unsigned long */

	/* clear all state linking urb to this dev (and hcd) */
	/* save and disable local interrupts while holding the list lock */
	spin_lock_irqsave(&hcd_urb_list_lock, flags);
	list_del_init(&urb->urb_list);
	spin_unlock_irqrestore(&hcd_urb_list_lock, flags);
}
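Why the irqsave variant makes a difference can be shown with a toy, single-CPU model in user space. This is not kernel code, and every name in it is hypothetical. It assumes one plausible mechanism for the lockups: hcd_urb_list_lock is also taken from interrupt context (the stack traces further down show it being reached from the UHCI and EHCI interrupt paths), so if it is held with interrupts enabled and an interrupt on the same CPU tries to take it, that CPU spins forever. Disabling interrupts around the critical section defers the interrupt until after the unlock.

```c
#include <stdbool.h>

/* Toy single-CPU model; all names are hypothetical, not kernel APIs. */
static bool irqs_enabled;	/* CPU interrupt flag */
static bool lock_held;		/* stands in for hcd_urb_list_lock */
static bool irq_pending;	/* an interrupt is waiting to be delivered */
static bool deadlocked;		/* set when the lock is re-taken on the same CPU */

static void toy_irq_handler(void);

/* Deliver a pending interrupt only if interrupts are enabled. */
static void maybe_deliver_irq(void)
{
	if (irq_pending && irqs_enabled) {
		irq_pending = false;
		irqs_enabled = false;	/* in this model the handler runs with irqs off */
		toy_irq_handler();
		irqs_enabled = true;
	}
}

static void toy_spin_lock(void)
{
	if (lock_held) {
		/* The interrupted holder cannot run until the handler returns,
		 * so real code would spin here forever. */
		deadlocked = true;
		return;
	}
	lock_held = true;
}

static void toy_spin_unlock(void)
{
	lock_held = false;
}

/* Interrupt-context user of the same lock, e.g. another URB completion. */
static void toy_irq_handler(void)
{
	toy_spin_lock();
	if (!deadlocked)
		toy_spin_unlock();
}

/* Run the unlink critical section once; returns true if it would deadlock. */
static bool run_unlink(bool use_irqsave)
{
	irqs_enabled = true;	/* entered with interrupts enabled */
	lock_held = irq_pending = deadlocked = false;

	bool saved = irqs_enabled;
	if (use_irqsave)
		irqs_enabled = false;	/* like spin_lock_irqsave(): save, disable */
	toy_spin_lock();
	irq_pending = true;	/* an interrupt fires while the lock is held */
	maybe_deliver_irq();
	toy_spin_unlock();
	if (use_irqsave)
		irqs_enabled = saved;	/* like spin_unlock_irqrestore() */
	maybe_deliver_irq();	/* a deferred interrupt is delivered now */
	return deadlocked;
}
```

With the plain lock the model reports a deadlock (run_unlink(false) returns true); with the irqsave variant the interrupt only runs after the lock has been dropped (run_unlink(true) returns false).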

This seems to hurt USB performance, though. With 2.6.23 (which doesn't
have the problem) I get 21 MB/s with dd, but with the above "fix" only
14 MB/s. I'll recheck once we have a real fix in place.
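For scale, the two dd figures correspond to a throughput loss of about one third: (21 - 14) / 21 is roughly 33%. A trivial helper for comparing such runs (percent_drop() is a hypothetical name):

```c
/* Relative throughput loss in percent between two measurements in the same unit. */
static double percent_drop(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

For example, percent_drop(21.0, 14.0) evaluates to about 33.3.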


After that I added the following check to usb_hcd_unlink_urb_from_ep()

	/* debug: this function is expected to run with local interrupts disabled */
	if (!raw_irqs_disabled()) {
		printk(KERN_CRIT "usb_hcd_unlink_urb_from_ep called with interrupts enabled!\n");
		dump_stack();
	}

and collected the attached kernel messages. From a brief look at them,
the following code paths call usb_hcd_unlink_urb_from_ep() with
interrupts enabled:

   [<c0574d9d>] usb_hcd_unlink_urb_from_ep+0x25/0x6b
   [<de850559>] uhci_giveback_urb+0xcd/0x1e3 [uhci_hcd]
   [<de850e02>] uhci_scan_schedule+0x511/0x720 [uhci_hcd]
...
   [<de8529c3>] uhci_irq+0x131/0x142 [uhci_hcd]
   [<c05750cb>] usb_hcd_irq+0x23/0x51

and

   [<c0574d9d>] usb_hcd_unlink_urb_from_ep+0x25/0x6b
   [<de839d55>] ehci_urb_done+0x73/0x92 [ehci_hcd]
   [<de83a92f>] qh_completions+0x373/0x3eb [ehci_hcd]
   [<de83aa43>] ehci_work+0x9c/0x6a9 [ehci_hcd]
...
   [<de83ec3c>] ehci_irq+0x241/0x265 [ehci_hcd]
...
   [<c05750cb>] usb_hcd_irq+0x23/0x51
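The debug check dumps a stack on every call, which makes the log large even though only a few distinct call paths matter. A user-space sketch of a variant that reports each distinct caller only once follows. report_irqs_enabled_once() and the fixed-size table are hypothetical; in the kernel the caller key would more likely be a return address (for example GCC's __builtin_return_address(0)) than a string.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_CALLERS 16	/* hypothetical capacity for distinct call paths */

static const char *seen_callers[MAX_CALLERS];
static int n_seen;

/*
 * Report a "called with interrupts enabled" violation at most once per
 * distinct caller. Returns true if this call produced a report. Once the
 * table is full, new callers are reported on every call.
 */
static bool report_irqs_enabled_once(bool irqs_disabled, const char *caller)
{
	if (irqs_disabled)
		return false;	/* context is fine, nothing to report */
	for (int i = 0; i < n_seen; i++)
		if (strcmp(seen_callers[i], caller) == 0)
			return false;	/* this path was already reported */
	if (n_seen < MAX_CALLERS)
		seen_callers[n_seen++] = caller;
	fprintf(stderr, "usb_hcd_unlink_urb_from_ep called with interrupts enabled from %s\n",
		caller);
	return true;
}
```

Repeated calls from uhci_giveback_urb() would then produce one report, and a call from ehci_urb_done() a second one.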


Is that enough information to fix the problem?

Regards,

	Stefan

---
Stefan Becker
E-Mail: Stefan.Becker@...ia.com

Attachment: hcd.c.debug-patch (text/plain, 3100 bytes)

Attachment: dump_stack.txt.bz2 (application/x-bzip, 26275 bytes)