lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4FA118BE.5040000@fold.natur.cuni.cz>
Date:	Wed, 02 May 2012 13:21:34 +0200
From:	Martin Mokrejs <mmokrejs@...d.natur.cuni.cz>
To:	Yinghai Lu <yinghai@...nel.org>
CC:	linux-pci@...r.kernel.org,
	Sarah Sharp <sarah.a.sharp@...ux.intel.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: linux-3.4-rc5: Oops in klist_del()



Yinghai Lu wrote:
> On Mon, Apr 30, 2012 at 2:58 PM, Martin Mokrejs
> <mmokrejs@...d.natur.cuni.cz> wrote:
>> Martin Mokrejs wrote:
>>> Hi,
>>>   I was removing an ExpressCard with a NEC chipset used by xhci_hcd and the kernel crashed
>>> on my Dell Vostro 3550 laptop. Its express slot chipset somehow does not report properly
>>> that a card was unplugged, as discussed earlier here on linux-pci under Subject
>>> "Re: 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe".
>>>
>>>   Maybe this helps somebody to find the cause? Here is a bit of the attached Oops snapshot
>>> stacktrace:
>>>
>>> klist_del
>>> fdevice_del
>>> device_unregister
>>> pci_stop_bus_device
>>> pci_stop_and_remove_bus_device
>>> devehp_unconfigure_device
>>> pciehp_disable_slot
>>> ? pciehp_disable_slot
>>> pciehp_power_thread
>>> process_one_work
>>> worker_thriad
>>> ? manage_workers.clone
>>> ? manage_workers.clone
>>> kthrerd
>>> ...
>>
>> This is 100% reproducible. It happens with 3.4-rc5 and 3.3.4 kernels while not with 3.2.14.
>> I did not have the external enclosure earlier but repeated unplugs/re-plugs of the NEC-based
>> expresscard were/are just not enough to trigger the Oops. One needs the ASMedia2105 in addition.
>>
>> To hit the Ooops, the disk has to be *RE-detected* and unplugged (so not its very first plugin).
>> Express card removal after the very first plugin says usually something about the surprise
>> removal. When I re-plug the disk, it gets detected, and it removal hits the Ooops. I am attaching
>> dmesg from 3.2.14 kernel which does not have the problem. You can see how many times I did remove
>> and re-plug the card. If I waited long ebough for xhci_hcd to timeout waiting for the unplugged
>> device, it says
>>
>> xhci_hcd 0000:11:00.0: Timeout while waiting for configure endpoint command
>> xHCI xhci_drop_endpoint called for root hub
>> xHCI xhci_check_bandwidth called for root hub
>> xhci_hcd 0000:11:00.0: USB bus 5 deregistered
>> xhci_hcd 0000:11:00.0: PCI INT A disabled
>>
>> and then, subsequent re-plug of the card with the disk attached results in detected SCSI device.
>> In contrast, if I re-plug the express card earlier express card slot change is reported but the
>> disc is not detected (something is blocked in xhci_hcd I think).
>>
>> The above stacktrace (of actually the camera snapshot is not the first stacktrace, unfortunately).
>> I see there are 2-3 screen scrolled away. Hmm, but I do not have anything what could be used as
>> a remote console ...
> 
> So if you use "echo 1 > /sys/.../0000:11.00.0/remove" before surprise
> remove the card,
> 
> the oops would not happen?

You are right, it does NOT happen. I managed to get somehow a piece of the stacktrace before the usbserial
on the affected host died:

[  508.458604] pciehp 0000:00:1c.7:pcie04: Card present on Slot(7)
[  508.558838] pciehp 0000:00:1c.7:pcie04: Device 0000:11:00.0 already exists at 0000:11:00, cannot hot-add
[  508.558880] pciehp 0000:00:1c.7:pcie04: Cannot add device at 0000:11:00
[  521.584091] pciehp 0000:00:1c.7:pcie04: Card not present on Slot(7)
[  521.584319] ------------[ cut here ]------------
[  521.584325] WARNING: at fs/sysfs/group.c:142 sysfs_remove_group+0x4f/0x83()
[  521.584326] Hardware name: Vostro 3550
[  521.584328] sysfs group ffffffff8194f980 not found for kobject '0000:11:00.0'
[  521.584330] Modules linked in: iwlwifi usb_storage i2c_i801
[  521.584335] Pid: 1589, comm: kworker/0:2 Not tainted 3.4.0-rc5-default #5
[  521.584337] Call Trace:
[  521.584343]  [<ffffffff8107af23>] warn_slowpath_common+0x80/0x98
[  521.584345]  [<ffffffff8107afcf>] warn_slowpath_fmt+0x41/0x43
[  521.584348]  [<ffffffff8115e0fd>] ? sysfs_get_dirent+0x47/0x51
[  521.584351]  [<ffffffff8115f5dc>] sysfs_remove_group+0x4f/0x83
[  521.584354]  [<ffffffff8132f36a>] dpm_sysfs_remove+0x33/0x39
[  521.584357]  [<ffffffff81328486>] device_del+0x47/0x17d
[  521.584359]  [<ffffffff813285cd>] device_unregister+0x11/0x1e
[  521.584362]  [<ffffffff81242782>] pci_stop_bus_device+0x5e/0x80
[  521.584364]  [<ffffffff8124290f>] pci_stop_and_remove_bus_device+



Also, there is a message

[  525.302615] sysfs group ffffffff8194f980 not found for kobject '0000:11:00.0'

when I was rebooting the computer. I think this might be pointing to the place of crash as well.


Once again, does not happen with 3.2.14 but does with 3.3.4 as well.

I will learn to use the echo trick always for this USB3.0 express card and and we came to that
already, it only happens when xhci_hcd was loaded and claimed the device.

In the attached traces are actually two bootups. In the former I got the Ooops.
In the second I used carefully the echo command to "unplug" the card from xhci_hcd
before physically removing it. But at the very end I few times omitted teh echo trick.
So at some point you will see different messages logged. And that is why at the very
end when system was to reboot, sysfs complained.

Thanks,
Martin

View attachment "capture.txt" of type "text/plain" (56870 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ