linux-kernel - Re: Null Pointer BUG in uhci_hcd

Message-ID: <Pine.LNX.4.44L0.0907090950590.5823-100000@iolanthe.rowland.org>
Date:	Thu, 9 Jul 2009 10:18:56 -0400 (EDT)
From:	Alan Stern <stern@...land.harvard.edu>
To:	"Michael S. Zick" <lkml@...ethan.org>
cc:	Oliver Neukum <oliver@...kum.org>, Jiri Kosina <jkosina@...e.cz>,
	<linux-kernel@...r.kernel.org>, <linux-usb@...r.kernel.org>
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed, 8 Jul 2009, Michael S. Zick wrote:

> It is unlikely that VIA Tech. will recall the CX700 chipset.
> 
> So being able to take a device off-line (like the driver claims it is doing)
> and *leave* it off-line - until told to "try again" - that would be an
> improvement.

Sorry, you lost me there.  In all the logs you have posted, I can find 
only one line where the kernel claims to be taking a device offline:

> Jun 30 10:38:31 cb01 kernel: sd 2:0:0:0: Device offlined - not ready after error recovery

And in that case it _did_ leave the device offline.  So what are you
concerned about?

> The current process of filling up the /var/log directory until the machine
> chokes is a rather fragile sort of response to a hot-plugged device, good or bad.

It isn't a response to a hot-plugged device; it's a response to broken
hardware.  If your hardware was working properly you could hot-plug
and hot-unplug devices till you turned blue in the face, without
filling up the /var/log directory.

> > > > I suspect it's worse than a simple interrupt-routing mistake.
> > > > 
> > > 
> > > I would not object to your removing that one mistake - that is one less
> > > to contend with.
> > 
> > I didn't say there was an interrupt-routing mistake; I said it was 
> > _worse_ than an interrupt-routing mistake.
> > 
> 
> Never claimed you did - the driver made that claim.
> But still, it would be nice to get rid of the interrupt-routing mistake.

How can you get rid of an interrupt-routing mistake if there is no such 
mistake in the first place?

Not that I'm claiming there is no such mistake -- the logs you have 
provided aren't clear in this respect.  So that's the first issue to 
address: Determine whether the interrupts are or aren't being routed 
correctly.

To that end, you should try doing some more directed testing.

Start with a nice cold boot, with no USB devices plugged in.  Copy the 
dmesg log and clear the kernel's log buffer.  And just to get as much 
information as possible, start a process copying usbmon's 0u file 
(you'll have to enable CONFIG_USB_MON if it isn't already enabled).

Then plug in a high-speed device.  When everything settles down, copy
the dmesg buffer again and also get a copy of the
/sys/kernel/debug/ehci/0000:00:10.4/registers file.  Those, together 
with the usbmon trace, will provide a good starting point.
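The capture steps above can be scripted. The following is a minimal Python sketch, assuming debugfs is mounted at /sys/kernel/debug, the usbmon module is loaded, and the script runs as root. The helper names (save_dmesg, start_usbmon_capture, snapshot_file, run_session) are hypothetical. The usbmon path is the standard text-interface location for bus 0 (all buses). The EHCI registers path and its PCI address are copied from the message and will differ on other hardware.

```python
import subprocess
from pathlib import Path

# Assumption: debugfs is mounted at /sys/kernel/debug and usbmon is loaded.
# "0u" is usbmon's text interface for bus 0, i.e. all USB buses at once.
USBMON_ALL_BUSES = Path("/sys/kernel/debug/usb/usbmon/0u")
# Path copied from the message; 0000:00:10.4 is the reporter's EHCI controller.
EHCI_REGISTERS = Path("/sys/kernel/debug/ehci/0000:00:10.4/registers")


def save_dmesg(dest: Path, clear: bool = True) -> Path:
    """Write the kernel log to dest; with clear=True, empty it afterwards (dmesg -c)."""
    cmd = ["dmesg", "-c"] if clear else ["dmesg"]
    # check=True raises CalledProcessError if dmesg fails (e.g. not run as root).
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    dest.write_text(out)
    return dest


def start_usbmon_capture(dest: Path) -> subprocess.Popen:
    """Stream usbmon's 0u file into dest in the background, like `cat 0u > dest`."""
    with open(dest, "wb") as log:
        # Popen gives the child its own copy of the descriptor, so closing ours is safe.
        return subprocess.Popen(["cat", str(USBMON_ALL_BUSES)], stdout=log)


def snapshot_file(src: Path, dest: Path) -> bool:
    """Copy a debugfs/sysfs file by reading it; return False if it cannot be read."""
    try:
        dest.write_bytes(Path(src).read_bytes())
    except OSError:
        return False
    return True


def run_session(outdir: Path) -> None:
    """Hypothetical driver for one experiment, to be run as root after a cold boot."""
    outdir.mkdir(parents=True, exist_ok=True)
    save_dmesg(outdir / "dmesg-boot.txt")  # copy the boot log, then clear the buffer
    mon = start_usbmon_capture(outdir / "usbmon-0u.txt")
    try:
        input("Plug in a high-speed device, wait for it to settle, then press Enter: ")
        save_dmesg(outdir / "dmesg-plug.txt", clear=False)
        snapshot_file(EHCI_REGISTERS, outdir / "ehci-registers.txt")
    finally:
        mon.terminate()  # stop the background usbmon copy
        mon.wait()
```

Calling run_session(Path("usb-test-1")) right after a cold boot would cover one experiment. Repeating it with a fresh directory covers the plug/unplug retries. The usbmon capture runs as a separate `cat` process because reading 0u blocks while it waits for new events.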

Assuming something goes wrong, of course.  If everything works okay, 
you'll have to keep trying similar experiments (plugging and unplugging 
devices) until something breaks.
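For the first question raised earlier, whether the interrupts are routed correctly, one quick check is to watch the host controller's counter in /proc/interrupts. The message itself does not prescribe this check. If the counter does not move while a device is being enumerated, interrupts are probably not arriving. The following is a rough Python sketch, assuming the usual /proc/interrupts layout: a CPU header line, then lines with an IRQ label, per-CPU counts, and chip/handler names. The function names are hypothetical.

```python
def irq_counts(text: str, driver: str) -> dict[str, int]:
    """Sum the per-CPU counts of every /proc/interrupts line that mentions driver."""
    counts: dict[str, int] = {}
    for line in text.splitlines()[1:]:  # skip the "CPU0 CPU1 ..." header line
        irq, sep, rest = line.partition(":")
        if not sep or driver not in rest:
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # the counts end where the chip/handler names begin
            total += int(field)
        counts[irq.strip()] = total
    return counts


def read_irq_counts(driver: str) -> dict[str, int]:
    """Read the live table; driver is a handler name such as "uhci_hcd"."""
    with open("/proc/interrupts") as f:
        return irq_counts(f.read(), driver)
```

Comparing read_irq_counts("uhci_hcd") or read_irq_counts("ehci_hcd") before and after plugging in a device shows whether that IRQ line is firing. A count that stays flat while the log reports activity would point toward a routing problem rather than prove one.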

Alan Stern