Message-ID: <c2ae28b7-a105-9cd6-bf2e-63051a4000b0@huaweicloud.com>
Date:   Mon, 14 Aug 2023 14:41:48 +0800
From:   Li Nan <linan666@...weicloud.com>
To:     Damien Le Moal <dlemoal@...nel.org>
Cc:     linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org,
        linan122@...wei.com, yukuai3@...wei.com, yi.zhang@...wei.com,
        houtao1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH] scsi: ata: Fix a race condition between scsi error
 handler and ahci interrupt


On 2023/8/10 10:49, Damien Le Moal wrote:
> On 8/10/23 10:48, linan666@...weicloud.com wrote:
>> From: Li Nan <linan122@...wei.com>
>>
> 
> Please explain the problem first instead of starting with a function call
> timeline, which cannot be analyzed without explanations.
> 
>> interrupt                            scsi_eh
>>
>> ahci_error_intr
>>    =>ata_port_freeze
>>      =>__ata_port_freeze
>>        =>ahci_freeze (turn IRQ off)
>>      =>ata_port_abort
>>        =>ata_port_schedule_eh
>>          =>shost->host_eh_scheduled++;
>>          host_eh_scheduled = 1
>>                                       scsi_error_handler
>>                                         =>ata_scsi_error
>>                                           =>ata_scsi_port_error_handler
>>                                             =>ahci_error_handler
>>                                             . =>sata_pmp_error_handler
>>                                             .   =>ata_eh_thaw_port
>>                                             .     =>ahci_thaw (turn IRQ on)
>> ahci_error_intr                            .
>>    =>ata_port_freeze                        .
>>      =>__ata_port_freeze                    .
>>        =>ahci_freeze (turn IRQ off)         .
>>      =>ata_port_abort                       .
>>        =>ata_port_schedule_eh               .
>>          =>shost->host_eh_scheduled++;      .
>>          host_eh_scheduled = 2              .
>>                                             =>ata_std_end_eh
>>                                               =>host->host_eh_scheduled = 0;
>>
>> 'host_eh_scheduled' is now 0, so the scsi eh thread will not be scheduled
>> again, and the ata port remains frozen and will never be re-enabled.
>>
>> If EH thread is already running, no need to freeze port and schedule
>> EH again.
>>
>> Reported-by: luojian <luojian5@...wei.com>
>> Signed-off-by: Li Nan <linan122@...wei.com>
>> ---
>>   drivers/ata/libahci.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
>> index e2bacedf28ef..0dfb0b807324 100644
>> --- a/drivers/ata/libahci.c
>> +++ b/drivers/ata/libahci.c
>> @@ -1840,9 +1840,17 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
>>   
>>   	/* okay, let's hand over to EH */
>>   
>> -	if (irq_stat & PORT_IRQ_FREEZE)
>> +	if (irq_stat & PORT_IRQ_FREEZE) {
>> +		/*
>> +		 * EH already running, this may happen if the port is
>> +		 * thawed in the EH. But we cannot freeze it again
>> +		 * otherwise the port will never be thawed.
>> +		 */
>> +		if (ap->pflags & (ATA_PFLAG_EH_PENDING |
>> +			ATA_PFLAG_EH_IN_PROGRESS))
>> +			return;
> 
> This is definitely not correct because EH may have been scheduled for a
> non-fatal action like a device revalidate or to get sense data for successful
> commands. With this change, the port will NOT be frozen when a hard error IRQ
> comes while EH is waiting to start, that is, while EH waits for all commands to
> complete first.
> 

Yeah, we should find a better way to fix it. Do you have any suggestions?
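
To make the objection above concrete, here is a toy model (not kernel code)
of the early return proposed in the patch. Everything prefixed with toy_ or
TOY_ is a hypothetical stand-in invented for this sketch; the only assumption
is that a non-fatal EH request marks EH as pending without freezing the port,
as described in the quoted review comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ATA_PFLAG_EH_PENDING / ATA_PFLAG_EH_IN_PROGRESS. */
#define TOY_PFLAG_EH_PENDING     (1u << 0)
#define TOY_PFLAG_EH_IN_PROGRESS (1u << 1)

/* Toy port: only the state needed to show the problem. */
struct toy_ap {
	unsigned int pflags;
	bool frozen;
};

/* EH requested for a non-fatal action: marked pending, port NOT frozen. */
static void toy_schedule_nonfatal_eh(struct toy_ap *ap)
{
	ap->pflags |= TOY_PFLAG_EH_PENDING;
}

/* A freeze-worthy IRQ handled with the early return from the patch. */
static void toy_freeze_irq_patched(struct toy_ap *ap)
{
	if (ap->pflags & (TOY_PFLAG_EH_PENDING | TOY_PFLAG_EH_IN_PROGRESS))
		return;			/* patched path: skip the freeze */
	ap->frozen = true;
	ap->pflags |= TOY_PFLAG_EH_PENDING;
}

/* Returns true if a hard error arriving while EH is pending is not frozen. */
static bool toy_hard_error_left_unfrozen(void)
{
	struct toy_ap ap = { 0, false };

	toy_schedule_nonfatal_eh(&ap);
	assert(ap.pflags & TOY_PFLAG_EH_PENDING);
	toy_freeze_irq_patched(&ap);	/* hard error while EH waits to start */

	return !ap.frozen;
}
```

Calling toy_hard_error_left_unfrozen() from a small main() returns true in
this model: the port stays unfrozen after the hard error, which is the case
the review comment points at.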

> Furthermore, if you get an IRQ that requires the port to be frozen, it means
> that you had a failed command. In that case, the drive is in error state per
> ATA specs and stops all communication until a read log 10h command is issued.
> So you should never ever see 2 error IRQs one after the other. If you do, it
> very likely means that you have buggy hardware.
> 
> How do you get into this situation ? What adapter and disk are you using ?
> 

 > How do you get into this situation ?
The first IRQ is an I/O error; the second IRQ is caused by a momentary drop
of the disk link.

 > What adapter and disk are you using ?
It is a disk developed by our company, but we think the same issue 
exists when using other disks.
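
The interleaving from the patch description can be replayed with a toy model
(not kernel code). It keeps only a counter standing in for
shost->host_eh_scheduled and a frozen flag; all toy_ names are hypothetical,
and the real EH wakeup logic depends on more state than this sketch tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: only the two fields that matter for the race. */
struct toy_port {
	int  host_eh_scheduled;	/* stands in for shost->host_eh_scheduled */
	bool frozen;		/* true while the port IRQ is turned off */
};

/* Interrupt side: freeze the port, then request EH. */
static void toy_error_intr(struct toy_port *p)
{
	p->frozen = true;		/* ahci_freeze: IRQ off */
	p->host_eh_scheduled++;		/* ata_port_schedule_eh */
}

/* EH side, step 1: thaw the port partway through error handling. */
static void toy_eh_thaw(struct toy_port *p)
{
	p->frozen = false;		/* ahci_thaw: IRQ on */
}

/* EH side, step 2: finish EH by clearing the counter unconditionally. */
static void toy_eh_end(struct toy_port *p)
{
	p->host_eh_scheduled = 0;	/* ata_std_end_eh */
}

/* In this model EH runs again only if a request is still recorded. */
static bool toy_eh_will_run(const struct toy_port *p)
{
	return p->host_eh_scheduled != 0;
}

/* Replay the timeline; returns true if the port ends up stuck frozen. */
static bool toy_replay_race(void)
{
	struct toy_port p = { 0, false };

	toy_error_intr(&p);	/* first error IRQ: count = 1, frozen */
	assert(p.host_eh_scheduled == 1);
	toy_eh_thaw(&p);	/* EH is running and thaws the port */
	toy_error_intr(&p);	/* second error IRQ: count = 2, frozen again */
	toy_eh_end(&p);		/* EH finishes: count = 0, second request lost */

	return p.frozen && !toy_eh_will_run(&p);
}
```

toy_replay_race() returns true in this model: the port is frozen and no EH
run is recorded that could thaw it again.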

>>   		ata_port_freeze(ap);
>> -	else if (fbs_need_dec) {
>> +	} else if (fbs_need_dec) {
>>   		ata_link_abort(link);
>>   		ahci_fbs_dec_intr(ap);
>>   	} else
> 

-- 
Thanks,
Nan
