linux-kernel - Re: [PATCH] ata: libata-scsi: fix bogus SCSI sense after abort

Message-ID: <187b1b46-470c-8fe2-9969-051abf93199b@suse.de>
Date:   Mon, 26 Jun 2023 09:46:19 +0200
From:   Hannes Reinecke <hare@...e.de>
To:     Damien Le Moal <dlemoal@...nel.org>, Lorenz Brun <lorenz@...n.one>
Cc:     linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ata: libata-scsi: fix bogus SCSI sense after abort

On 6/26/23 09:29, Damien Le Moal wrote:
> On 6/24/23 03:19, Lorenz Brun wrote:
>> Since commit 058e55e120ca which fixed that commands without valid
>> error/status codes did not result in any sense error, the returned sense
>> errors were completely bogus as ata_to_sense_error did not have valid
>> inputs in the first place.
>>
>> For example the following ATA error
>>
>> exception Emask 0x10 SAct 0x20c000 SErr 0x280100 action 0x6 frozen
>> irq_stat 0x08000000, interface fatal error
>> SError: { UnrecovData 10B8B BadCRC }
>> failed command: READ FPDMA QUEUED
>> cmd 60/e0:70:20:0a:00/00:00:00:00:00/40 tag 14 ncq dma 114688 in
>> res 40/00:ac:20:5e:50/00:00:5d:01:00/40 Emask 0x10 (ATA bus error)
>> status: { DRDY }
>>
>> got turned into the following nonsensical SCSI error
>>
>> FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>> Sense Key : Illegal Request [current]
>> Add. Sense: Unaligned write command
>> CDB: Read(16) 88 00 00 00 00 00 00 00 0a 20 00 00 00 e0 00 00
>>
>> This has nothing to do with an unaligned write command, but is due to an
>> ATA EH-triggered abort. But ata_to_sense_error only knows about
>> status and error, both of which aren't even valid here as the command
>> has been aborted.
>>
>> Add an additional section to ata_gen_ata_sense which handles
>> errors not coming from the device first, before calling into
>> ata_to_sense_error.
>>
>> According to the SAT-5 spec a reset should cause a Unit Attention event,
>> which the SCSI subsystem should handle to retry its commands but I
>> am not sure how much of that infra is present in Linux's SCSI layer, so
>> this is a simpler solution.
>>
>> Signed-off-by: Lorenz Brun <lorenz@...n.one>
>> ---
>>   drivers/ata/libata-scsi.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 551077cea4e4..61c6a4e8123a 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -13,6 +13,7 @@
>>    *  - http://www.t13.org/
>>    */
>>   
>> +#include "scsi/scsi_proto.h"
>>   #include <linux/compat.h>
>>   #include <linux/slab.h>
>>   #include <linux/kernel.h>
>> @@ -1013,6 +1014,21 @@ static void ata_gen_ata_sense(struct ata_queued_cmd *qc)
>>   		ata_scsi_set_sense(dev, cmd, NOT_READY, 0x04, 0x21);
>>   		return;
>>   	}
>> +	if (qc->err_mask & (AC_ERR_HSM | AC_ERR_ATA_BUS | AC_ERR_HOST_BUS |
>> +		AC_ERR_SYSTEM | AC_ERR_OTHER)) {
> 
> Did you check the SATA-IO and/or AHCI specs to see if they say anything about
> these? And I wonder if we should check if we have something in tf->status and
> tf->error...
> 
We really should. The above combination of error masks seems pretty
arbitrary, as actually you do _not_ want to check for the error mask,
but rather for the fact that the sense code is bogus.
So shouldn't we rather test for that directly?
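
A minimal userspace sketch of what such a direct test could look like,
assuming the check is made on the result taskfile status rather than on
err_mask. The helper name tf_status_reports_error() is hypothetical and
not an existing libata function; the ATA_* bit values are meant to
mirror include/linux/ata.h. The idea is that ata_to_sense_error() only
has valid input if the device itself flagged a problem in the status
register:

```c
#include <stdbool.h>
#include <stdint.h>

/* ATA status register bits (intended to mirror include/linux/ata.h). */
#define ATA_ERR  0x01u	/* error register is valid */
#define ATA_DRQ  0x08u	/* data request */
#define ATA_DF   0x20u	/* device fault */
#define ATA_DRDY 0x40u	/* device ready */
#define ATA_BUSY 0x80u	/* device busy */

/*
 * Hypothetical helper, for illustration only: returns true if the
 * result taskfile status carries something that could be translated
 * into SCSI sense data. A status of plain DRDY means the device did
 * not report an error, so any sense derived from it would be bogus.
 */
static bool tf_status_reports_error(uint8_t status)
{
	return (status & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) != 0;
}
```

For the failure quoted above ("res 40/00:...", status: { DRDY }),
tf_status_reports_error(0x40) is false, so no ATA-derived sense would be
generated. A status of 0x41 (DRDY | ERR) would still be passed on to
ata_to_sense_error().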

>> +		/* Command aborted because of some issue with the ATA subsystem
>> +		 * Should technically cause unit attention, but this is better
>> +		 * than nothing, which results in nonsensical errors.
>> +		 * POWER ON, RESET, OR BUS DEVICE RESET OCCURRED
>> +		 */
> 
> Multi-line comment style: start with a "/*" line please. The phrasing of the
> comment is not very clear. Maybe something like:
> 
> 		/*
> 		 * If the command aborted because of some issue with the
> 		 * adapter or link, report a POWER ON, RESET, OR BUS DEVICE
> 		 * RESET OCCURRED error.
> 		 */
> 
> Did you check that all of the above error flags lead to a drive reset? The
> issue I have with this is that the drive reset is triggered by libata EH after
> it got these bad errors, but the sense data you use here normally indicates that
> the reset was initiated by the adapter or the drive. Not sure this is ideal.
> 
Yes. We should rather return a DID_RESET status instead of a made-up
sense code for which we don't have a good justification, and which might
get invalidated by future SATL versions.
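
As a rough illustration of the DID_RESET approach, here is a userspace
sketch, not a patch. It assumes the usual layout of the SCSI result
word, with the host byte in bits 16..23, and DID_RESET == 0x08 as in
the kernel's SCSI headers. result_set_host_byte() imitates what the
kernel's set_host_byte() does on scsi_cmnd->result;
result_for_aborted_cmd() and its policy are hypothetical:

```c
#include <stdint.h>

/* SCSI host byte codes (values assumed to match the kernel headers). */
#define DID_OK    0x00u
#define DID_RESET 0x08u

/* ATA status bits that indicate a device-reported problem. */
#define ATA_ERR  0x01u
#define ATA_DRQ  0x08u
#define ATA_DF   0x20u
#define ATA_BUSY 0x80u

/* Replace the host byte (bits 16..23) of a SCSI result word. */
static uint32_t result_set_host_byte(uint32_t result, uint8_t host)
{
	return (result & 0xff00ffffu) | ((uint32_t)host << 16);
}

/*
 * Hypothetical policy: if the device did not report an error in the
 * result taskfile status, the command was aborted by EH, so flag the
 * result with DID_RESET instead of inventing sense data. Otherwise
 * leave the result untouched for the normal sense translation.
 */
static uint32_t result_for_aborted_cmd(uint32_t result, uint8_t tf_status)
{
	if (tf_status & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ))
		return result;
	return result_set_host_byte(result, DID_RESET);
}
```

With the status from the report (0x40, DRDY only), a result of 0x02
(CHECK CONDITION) becomes 0x00080002, i.e. hostbyte=DID_RESET, while a
status with ERR set leaves the result as it was.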

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@...e.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman