Date:   Thu, 10 Aug 2023 11:49:50 +0900
From:   Damien Le Moal <dlemoal@...nel.org>
To:     linan666@...weicloud.com
Cc:     linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org,
        linan122@...wei.com, yukuai3@...wei.com, yi.zhang@...wei.com,
        houtao1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH] scsi: ata: Fix a race condition between scsi error
 handler and ahci interrupt

On 8/10/23 10:48, linan666@...weicloud.com wrote:
> From: Li Nan <linan122@...wei.com>
> 

Please explain the problem first instead of starting with a function call
timeline, which cannot be analyzed without explanations.

> interrupt                            scsi_eh
> 
> ahci_error_intr
>   =>ata_port_freeze
>     =>__ata_port_freeze
>       =>ahci_freeze (turn IRQ off)
>     =>ata_port_abort
>       =>ata_port_schedule_eh
>         =>shost->host_eh_scheduled++;
>         host_eh_scheduled = 1
>                                      scsi_error_handler
>                                        =>ata_scsi_error
>                                          =>ata_scsi_port_error_handler
>                                            =>ahci_error_handler
>                                            . =>sata_pmp_error_handler
>                                            .   =>ata_eh_thaw_port
>                                            .     =>ahci_thaw (turn IRQ on)
> ahci_error_intr                            .
>   =>ata_port_freeze                        .
>     =>__ata_port_freeze                    .
>       =>ahci_freeze (turn IRQ off)         .
>     =>ata_port_abort                       .
>       =>ata_port_schedule_eh               .
>         =>shost->host_eh_scheduled++;      .
>         host_eh_scheduled = 2              .
>                                            =>ata_std_end_eh
>                                              =>host->host_eh_scheduled = 0;
> 
> 'host_eh_scheduled' is 0 and scsi eh thread will not be scheduled again,
> and the ata port remain freeze and will never be enabled.
> 
> If EH thread is already running, no need to freeze port and schedule
> EH again.
> 
> Reported-by: luojian <luojian5@...wei.com>
> Signed-off-by: Li Nan <linan122@...wei.com>
> ---
>  drivers/ata/libahci.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index e2bacedf28ef..0dfb0b807324 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -1840,9 +1840,17 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
>  
>  	/* okay, let's hand over to EH */
>  
> -	if (irq_stat & PORT_IRQ_FREEZE)
> +	if (irq_stat & PORT_IRQ_FREEZE) {
> +		/*
> +		 * EH already running, this may happen if the port is
> +		 * thawed in the EH. But we cannot freeze it again
> +		 * otherwise the port will never be thawed.
> +		 */
> +		if (ap->pflags & (ATA_PFLAG_EH_PENDING |
> +			ATA_PFLAG_EH_IN_PROGRESS))
> +			return;

This is definitely not correct because EH may have been scheduled for a
non-fatal action like a device revalidate or to get sense data for successful
commands. With this change, the port will NOT be frozen when a hard error IRQ
comes while EH is waiting to start, that is, while EH waits for all commands to
complete first.

Furthermore, if you get an IRQ that requires the port to be frozen, it means
that you had a failed command. In that case, the drive is in error state per
ATA specs and stops all communication until a read log 10h command is issued.
So you should never ever see 2 error IRQs one after the other. If you do, it
very likely means that you have buggy hardware.

How do you get into this situation? What adapter and disk are you using?

>  		ata_port_freeze(ap);
> -	else if (fbs_need_dec) {
> +	} else if (fbs_need_dec) {
>  		ata_link_abort(link);
>  		ahci_fbs_dec_intr(ap);
>  	} else

-- 
Damien Le Moal
Western Digital Research
