lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 25 Jan 2013 23:54:06 +0200
From:	Artemy Lebedev <artyom.lebedev@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: PROBLEM: Permanent kernel panic in USB hub driver - 3.5.0-22

Hi,
I'm developing USB device with software implementation of USB protocol 
stack (even link level). It is not yet fully completed so it behaves 
wrong from host point of view. However now it is a good stress testing 
tool for the OS which discovered a couple of problems. I understand that 
the use case is quite unusual but still the problem is obvious. I have a 
permanent scenario for making a kernel panic.

So the scenario is quite simple - somewhere in the middle of USB control 
transaction the device stops to answer host requests (just silence) or 
in another scenario the device continuosly responds with NAKs to IN 
tokens. I have attached the file which contains some system information 
as well as description of the signaling on USB bus. There is an 
oscillogram capture in CSV format (volts on D- and D+ lines) changed in 
time (signal.csv.zip). For your convenience, I have parsed it by my 
analyzing script, see its output in signal_analysis.txt. Here it is seen 
that host sends "Get device descriptor" request, the device acknowledges 
the DATA0 packet by ACK in frame 3 and responds by NAK in frame 5 for IN 
token of status stage in frame 4. All the following IN tokens are 
ignored and host retries the whole transaction from the beginning while 
the device still keep silence.

This behavior leads to two bad effects on host. The first one is the 
following message in dmesg:

Jan 23 20:39:56 AST-mobile kernel: [ 355.978595] usb 3-2: new low-speed 
USB device number 8 using xhci_hcd
Jan 23 20:40:01 AST-mobile kernel: [ 360.977493] xhci_hcd 0000:00:14.0: 
Timeout while waiting for address device command
Jan 23 20:40:01 AST-mobile kernel: [ 361.181444] ------------[ cut here 
]------------
Jan 23 20:40:01 AST-mobile kernel: [ 361.181461] WARNING: at 
/build/buildd/linux-3.5.0/drivers/usb/host/xhci.c:3646 
xhci_address_device+0x2db/0x300()
Jan 23 20:40:01 AST-mobile kernel: [ 361.181464] Hardware name: 24382LU
Jan 23 20:40:01 AST-mobile kernel: [ 361.181466] Modules linked in: 
btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs 
xfs reiserfs bnep rfcomm parport_pc ppdev binfmt_misc ext2 coretemp 
kvm_intel kvm ghash_clmulni_intel aesni_intel cryptd arc4 aes_x86_64 
snd_hda_codec_realtek microcode joydev uvcvideo videobuf2_core videodev 
thinkpad_acpi videobuf2_vmalloc iwlwifi lpc_ich videobuf2_memops psmouse 
mac80211 ftdi_sio serio_raw usbserial btusb snd_hda_intel snd_seq_midi 
snd_hda_codec bluetooth cfg80211 snd_hwdep snd_rawmidi snd_pcm 
snd_seq_midi_event snd_seq snd_seq_device snd_timer snd_page_alloc snd 
soundcore mei nvram tpm_tis mac_hid usbmon lp parport hid_generic usbhid 
hid i915 nouveau firewire_ohci ttm drm_kms_helper firewire_core 
crc_itu_t drm i2c_algo_bit mxm_wmi sdhci_pci sdhci e1000e wmi video
Jan 23 20:40:01 AST-mobile kernel: [ 361.181552] Pid: 48, comm: khubd 
Not tainted 3.5.0-22-generic #34-Ubuntu
Jan 23 20:40:01 AST-mobile kernel: [ 361.181555] Call Trace:
Jan 23 20:40:01 AST-mobile kernel: [ 361.181567] [<ffffffff81051c1f>] 
warn_slowpath_common+0x7f/0xc0
Jan 23 20:40:01 AST-mobile kernel: [ 361.181572] [<ffffffff81051c7a>] 
warn_slowpath_null+0x1a/0x20
Jan 23 20:40:01 AST-mobile kernel: [ 361.181578] [<ffffffff814dab8b>] 
xhci_address_device+0x2db/0x300
Jan 23 20:40:01 AST-mobile kernel: [ 361.181585] [<ffffffff814ae599>] 
hub_port_init+0x229/0x9f0
Jan 23 20:40:01 AST-mobile kernel: [ 361.181594] [<ffffffff8142ba5d>] ? 
pm_runtime_set_autosuspend_delay+0x5d/0x80
Jan 23 20:40:01 AST-mobile kernel: [ 361.181600] [<ffffffff8103fa39>] ? 
default_spin_lock_flags+0x9/0x10
Jan 23 20:40:01 AST-mobile kernel: [ 361.181606] [<ffffffff814b1e03>] 
hub_port_connect_change+0x283/0x9c0
Jan 23 20:40:01 AST-mobile kernel: [ 361.181611] [<ffffffff814b2943>] 
hub_events+0x403/0x690
Jan 23 20:40:01 AST-mobile kernel: [ 361.181619] [<ffffffff8168244f>] ? 
__schedule+0x3cf/0x7c0
Jan 23 20:40:01 AST-mobile kernel: [ 361.181624] [<ffffffff814b2c05>] 
hub_thread+0x35/0x1a0
Jan 23 20:40:01 AST-mobile kernel: [ 361.181632] [<ffffffff810768b0>] ? 
finish_wait+0x80/0x80
Jan 23 20:40:01 AST-mobile kernel: [ 361.181637] [<ffffffff814b2bd0>] ? 
hub_events+0x690/0x690
Jan 23 20:40:01 AST-mobile kernel: [ 361.181642] [<ffffffff81075f13>] 
kthread+0x93/0xa0
Jan 23 20:40:01 AST-mobile kernel: [ 361.181648] [<ffffffff8168d024>] 
kernel_thread_helper+0x4/0x10
Jan 23 20:40:01 AST-mobile kernel: [ 361.181653] [<ffffffff81075e80>] ? 
kthread_freezable_should_stop+0x70/0x70
Jan 23 20:40:01 AST-mobile kernel: [ 361.181657] [<ffffffff8168d020>] ? 
gs_change+0x13/0x13
Jan 23 20:40:01 AST-mobile kernel: [ 361.181660] ---[ end trace 
f87cad0b99ce3880 ]---
Jan 23 20:40:01 AST-mobile kernel: [ 361.181665] xhci_hcd 0000:00:14.0: 
Virt dev invalid for slot_id 0x7!
Jan 23 20:40:01 AST-mobile kernel: [ 361.385433] usb 3-2: device not 
accepting address 8, error -22
Jan 23 20:40:01 AST-mobile kernel: [ 361.385456] xHCI xhci_free_dev 
called with unaddressed device
Jan 23 20:40:06 AST-mobile kernel: [ 366.384344] xhci_hcd 0000:00:14.0: 
Timeout while waiting for a slot
Jan 23 20:40:06 AST-mobile kernel: [ 366.384375] hub 3-0:1.0: couldn't 
allocate port 2 usb_device

I did not pay attention to kernel warnings about timeouts since they are 
expected, but the backtrace indicates that something bad occurred in the 
kernel (LKM crash?). After that whole the bus becomes unusable. Even if 
the normal device is plugged into the port, it is not handled and the 
following messages continuously repeating:
Jan 23 20:40:06 AST-mobile kernel: [ 366.384344] xhci_hcd 0000:00:14.0: 
Timeout while waiting for a slot
Jan 23 20:40:06 AST-mobile kernel: [ 366.384375] hub 3-0:1.0: couldn't 
allocate port 2 usb_device
This ends only after the bus controller is powered off and on (e.g. 
sleep the system and wake up again).

The second problem is even more interesting, it's a kernel panic. 
Unfortunately I did find how to get the panic messages from the kernel 
persistent ring buffer (if there is such) after reboot. So I have taken 
a photo, available here:
https://dl.dropbox.com/u/8276110/2013-01-25%2022.49.16.jpg
And, of course, the sweets - two kernel core dumps (possibly the same, 
but just for a case created them twice):
https://dl.dropbox.com/u/8276110/vmcore.gz
https://dl.dropbox.com/u/8276110/vmcore2.gz

So, that's all I have. See attached file and cores. Also if there will 
be a patch I can test it with my setup (required firmware version of my 
device is specially tagged as "Linux exploit" in my repository :).

P.S. Sorry, probably I have violated all rules of writing to mail list 
and bug reporting but I did not managed to find clear instructions how 
it should be done. Bugzilla seems to be only for 2.6 version. Where to 
put big attachments (like core dumps) is also unclear. So this procedure 
for Linux kernel is toŠ¾ unobvious for ordinary system developer.

Best regards,
Artyom.

Download attachment "report.tgz" of type "application/x-compressed-tar" (85183 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ