linux-kernel - Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix


Message-ID: <4890D329.80106@shaw.ca>
Date:	Wed, 30 Jul 2008 14:46:33 -0600
From:	Robert Hancock <hancockr@...w.ca>
To:	Robert Hancock <hancockr@...w.ca>,
	Alan Stern <stern@...land.harvard.edu>,
	usb-storage@...ts.one-eyed-alien.net,
	Tomas Styblo <tripie@...n.org>, linux-usb@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption
 bugfix - device 152d:2338

Matthew Dharm wrote:
> On Wed, Jul 30, 2008 at 01:55:25PM -0600, Robert Hancock wrote:
>> Alan Stern wrote:
>>> On Wed, 23 Jul 2008, Robert Hancock wrote:
>>>
>>>> It remains an issue, though, that if there's no underflow, if the device 
>>>> reports an error in the CSW but doesn't provide sense data, we assume 
>>>> nothing bad happened and don't retry. That definitely does not seem 
>>>> correct. The device is not supposed to do this, but with how crappily 
>>>> some of these devices are designed we should be more defensive.
>>> The problem is, what can you do?  The device has said that something 
>>> was wrong, but it hasn't told you what.  Without knowing what went 
>>> wrong, you can't know how to recover.
> 
> Yes and no.  If ASC/ASCQ is clear, then it's telling you that nothing is
> wrong.  The device is contradicting itself. That doesn't really help us
> here, but it's a point I like to be clear on.
> 
>>> I suppose in such cases we could simply report that the command failed
>>> completely.
>> I think that is what we need to do. The SCSI/block layers should retry 
>> the command or report a failure to userspace. Above all else we can't 
>> just continue on our merry way and assume success, otherwise data will 
>> get silently corrupted.
> 
> The code path to suppress the reporting of an error when auto-sense shows no
> ASC/ASCQ was added for a reason.  That reason has likely been lost to time,
> but I worry about devices that are out there that rely on the current
> behavior to function properly....

My original comment was that this code should be removed, but that was
incorrect. In fact, that code path is unrelated to this problem, since it
only executes if no transport error was detected. The code path is
needed because sense data is retrieved for several reasons other
than a transport failure. For one, "If we're running the CB transport,
which is incapable of determining status on its own, we will auto-sense
unless the operation involved a data-in transfer." In this case, for a
successful transfer, the status must be reset to good after getting the
sense data.

In the case in question here, the BOT transport reports a failure, and 
we retrieve sense data, but the sense data doesn't indicate an error. 
This results in the failure essentially being ignored. In this case I 
think we should be doing the same thing as we do on detecting an underflow:

srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
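To make the proposed decision concrete, here is a minimal standalone sketch, not the actual usb-storage code. The enum transport_status, struct sense_summary and result_after_autosense() are hypothetical names invented for illustration, and the constant values are what I recall from the 2008-era include/scsi/scsi.h, so treat them as assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, recalled from the 2008-era include/scsi/scsi.h. */
enum { DID_OK = 0x00, DID_ERROR = 0x07 };        /* host byte   */
enum { SUGGEST_RETRY = 0x10 };                   /* driver byte */
enum { SAM_STAT_GOOD = 0x00, SAM_STAT_CHECK_CONDITION = 0x02 };

/* Hypothetical: what the transport reported before auto-sense ran. */
enum transport_status {
    XPORT_GOOD,      /* transport saw no error (e.g. CB auto-sense case) */
    XPORT_FAILED     /* BOT CSW reported a command failure */
};

/* Hypothetical summary of the fields we care about in the sense data. */
struct sense_summary {
    uint8_t key;     /* sense key */
    uint8_t asc;     /* additional sense code */
    uint8_t ascq;    /* additional sense code qualifier */
};

/* True when the sense data does not describe any error. */
static int sense_is_empty(const struct sense_summary *s)
{
    assert(s != NULL);
    return s->key == 0 && s->asc == 0 && s->ascq == 0;
}

/*
 * Decide the value to store in srb->result after auto-sense.
 *  - Real sense data: report CHECK CONDITION and let the SCSI layer act.
 *  - Empty sense data, transport was happy: reset status to good.
 *  - Empty sense data, but the transport reported a failure: the device
 *    contradicts itself, so fail the command and suggest a retry, the
 *    same result that is used for an underflow.
 */
static uint32_t result_after_autosense(enum transport_status st,
                                       const struct sense_summary *s)
{
    if (!sense_is_empty(s))
        return SAM_STAT_CHECK_CONDITION;
    if (st == XPORT_FAILED)
        return (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
    return SAM_STAT_GOOD;
}
```

Only the middle branch is new behaviour; the last branch is the existing "reset to good" path that the CB transport relies on.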

