lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1b44168a-195d-79bd-50cc-bc8baa0d8f16@kernel.org>
Date:   Mon, 26 Jun 2023 16:29:19 +0900
From:   Damien Le Moal <dlemoal@...nel.org>
To:     Lorenz Brun <lorenz@...n.one>
Cc:     linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ata: libata-scsi: fix bogus SCSI sense after abort

On 6/24/23 03:19, Lorenz Brun wrote:
> Since commit 058e55e120ca which fixed that commands without valid
> error/status codes did not result in any sense error, the returned sense
> errors were completely bogus as ata_to_sense_error did not have valid
> inputs in the first place.
> 
> For example the following ATA error
> 
> exception Emask 0x10 SAct 0x20c000 SErr 0x280100 action 0x6 frozen
> irq_stat 0x08000000, interface fatal error
> SError: { UnrecovData 10B8B BadCRC }
> failed command: READ FPDMA QUEUED
> cmd 60/e0:70:20:0a:00/00:00:00:00:00/40 tag 14 ncq dma 114688 in
> res 40/00:ac:20:5e:50/00:00:5d:01:00/40 Emask 0x10 (ATA bus error)
> status: { DRDY }
> 
> got turned into the following nonsensical SCSI error
> 
> FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> Sense Key : Illegal Request [current]
> Add. Sense: Unaligned write command
> CDB: Read(16) 88 00 00 00 00 00 00 00 0a 20 00 00 00 e0 00 00
> 
> This has nothing to do with an unaligned write command, but is due to an
> ATA EH-triggered abort. But ata_to_sense_error only knows about
> status and error, both of which aren't even valid here as the command
> has been aborted.
> 
> Add an additional section to ata_gen_ata_sense which handles
> errors not coming from the device first, before calling into
> ata_to_sense_error.
> 
> According to the SAT-5 spec a reset should cause a Unit Attention event,
> which the SCSI subsystem should handle to retry its commands but I
> am not sure how much of that infra is present in Linux's SCSI layer, so
> this is a simpler solution.
> 
> Signed-off-by: Lorenz Brun <lorenz@...n.one>
> ---
>  drivers/ata/libata-scsi.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 551077cea4e4..61c6a4e8123a 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -13,6 +13,7 @@
>   *  - http://www.t13.org/
>   */
>  
> +#include "scsi/scsi_proto.h"
>  #include <linux/compat.h>
>  #include <linux/slab.h>
>  #include <linux/kernel.h>
> @@ -1013,6 +1014,21 @@ static void ata_gen_ata_sense(struct ata_queued_cmd *qc)
>  		ata_scsi_set_sense(dev, cmd, NOT_READY, 0x04, 0x21);
>  		return;
>  	}
> +	if (qc->err_mask & (AC_ERR_HSM | AC_ERR_ATA_BUS | AC_ERR_HOST_BUS |
> +		AC_ERR_SYSTEM | AC_ERR_OTHER)) {

Did you check SATA IO specs and/or AHCI to see if that says anything about these
? And I wonder if we should check if we have something in tf->status and
tf->error...

> +		/* Command aborted because of some issue with the ATA subsystem
> +		 * Should technically cause unit attention, but this is better
> +		 * than nothing, which results in nonsensical errors.
> +		 * POWER ON, RESET, OR BUS DEVICE RESET OCCURRED
> +		 */

Multi-line comment style: start with a "/*" line please. The phrasing of the
comment is not very clear. Maybe something like:

		/*
		 * If the command aborted because of some issue with the
		 * adapter or link, report a POWER ON, RESET, OR BUS DEVICE
		 * RESET OCCURRED error.
		 */

Did you check that all of the above error flags lead to a drive reset ? The
issue I have with this is that the drive reset is triggered by libata EH after
it got these bad errors but the sense data you use here normally indicate that
the reset was initiated by the adapter or the drive. Not sure this is ideal.

> +		ata_scsi_set_sense(dev, cmd, ABORTED_COMMAND, 0x29, 0x00);
> +		return;
> +	}
> +	if (qc->err_mask & AC_ERR_TIMEOUT) {
> +		/* COMMAND TIMEOUT DURING PROCESSING */
> +		ata_scsi_set_sense(dev, cmd, ABORTED_COMMAND, 0x2e, 0x02);
> +		return;
> +	}
>  	/* Use ata_to_sense_error() to map status register bits
>  	 * onto sense key, asc & ascq.
>  	 */

-- 
Damien Le Moal
Western Digital Research

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ