linux-kernel - [usb-storage] SCSI errors hang user processes indefinitely

Message-ID: <484E5F27.6010803@student.cs.york.ac.uk>
Date:	Tue, 10 Jun 2008 12:01:59 +0100
From:	Alan Jenkins <aj504@...dent.cs.york.ac.uk>
To:	linux-kernel@...r.kernel.org
CC:	usb-storage@...ts.one-eyed-alien.net, alan-jenkins@...fmail.co.uk,
	linux-scsi@...r.kernel.org
Subject: [usb-storage] SCSI errors hang user processes indefinitely

Was: "bug report: regression - USB card reader doesn't work"
<https://lists.one-eyed-alien.net/pipermail/usb-storage/2008-June/003734.html>.

I have a buggy USB card reader which responds with "Unrecoverable read
error" for particular reads (and probably writes).  The error is
triggered immediately when I insert the device and udev runs "vol_id" on
it.  The usb-storage people have been quick to figure out a workaround,
but there is also a more general problem.

The kernel's response to this unrecoverable read error is to hang the
vol_id process.  "strace" shows that vol_id is hung on sys_read() for at
least 10 minutes.  It continues to hang even when I unplug the card
reader.  This ties up the device node (/dev/sdb) so that if I re-insert
the card reader, it uses a different device node.  It also stops me
hibernating the computer because the vol_id process is "refusing to
freeze".  Nor can the hung vol_id be killed; it's stuck in D state
(uninterruptible sleep).  I have reproduced all of this on Linux 2.6.25.3.

I sent Alan Stern a copy of the kernel messages with
CONFIG_USB_STORAGE_DEBUG enabled.  His conclusion was that the hang was
not in usb-storage but elsewhere.  I had also sent him the stack traces
that were printed automatically after a failed hibernation attempt, and
these implicated a filesystem in the hang (!).

There's a lot going on here.  Can anyone help pin down the specific
problems?  I think error handling is primarily the responsibility of
the generic SCSI layer, but I don't have any insight into what it is
supposed to do.  My instinct says that SCSI error handling should be
mature and that the issue is somehow specific to usb-storage, but that's
just a feeling.

As a starting point, I've attached dmesg output showing two sets of
stack traces (from Alt-SysRq-T) taken before and after the device was
removed.

Thanks!
Alan

Attachment: "dmesg.txt" (text/plain, 124864 bytes)