linux-kernel - Re: [PATCH v2 2/6] scsi: libsas: Add sas_ata_device_link_abort()

Message-ID: <4b471300-a912-c3c0-ead4-7165c57cbbf8@huawei.com>
Date:   Fri, 2 Sep 2022 17:19:37 +0100
From:   John Garry <john.garry@...wei.com>
To:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        <jejb@...ux.ibm.com>, <martin.petersen@...cle.com>,
        <jinpu.wang@...ud.ionos.com>, <yangxingui@...wei.com>
CC:     <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linuxarm@...wei.com>, <hare@...e.de>
Subject: Re: [PATCH v2 2/6] scsi: libsas: Add sas_ata_device_link_abort()

Hi Damien,

>>>
>>> But the pm8001 manual and current driver indicate that the
>>> OPC_INB_SATA_ABORT command should be sent after read log ext when
>>> handling NCQ error, regardless of an autopsy. I send OPC_INB_SATA_ABORT
>>> in ata_eh_reset() -> pm8001_I_T_nexus_reset() -> pm8001_send_abort_all()
>> You lost me: ata_eh_recover() will call ata_eh_reset() only if the
>> ATA_EH_RESET action flag is set. So are you saying that even though it
>> is not needed, you still need to set ATA_EH_RESET for pm8001?
> 
> As below, it was the only location I found suitable to call 
> pm8001_send_abort_all().
> 
> However I am not really sure it is required now. For pm8001 NCQ error 
> handling we require 2x steps:
> - read log ext
> - Send OPC_INB_SATA_ABORT - we do this in pm8001_send_abort_all()
> 
> pm8001_send_abort_all() sends OPC_INB_SATA_ABORT in "device abort all" 
> mode, meaning any IO in the HBA is aborted for the device. But we are 
> also earlier in EH sending OPC_INB_SATA_ABORT for individual IOs in 
> sas_eh_handle_sas_errors() -> sas_scsi_find_task() -> 
> pm8001_abort_task() -> sas_execute_internal_abort_single() -> ... 
> send_abort_task()
> 
> So I don't think that the pm8001_send_abort_all() call has any effect, 
> as we're already aborting any outstanding IO earlier.
> 
> Admittedly the order of the 2x steps is different, but 
> OPC_INB_SATA_ABORT does not send any protocol message to the disk, so 
> would not affect anything subsequently read with read log ext.
> 
> Having said all that, it may be wise to still send
> pm8001_send_abort_all().

Have you had a chance to consider all this yet?

I am now starting to think that it is not necessary to call 
pm8001_send_abort_all(), and instead rely only on 
sas_eh_handle_sas_errors() -> sas_scsi_find_task() -> 
pm8001_abort_task() -> sas_execute_internal_abort_single() -> ... -> 
send_abort_task() to abort each outstanding IO. Then we would not have 
the issue of forcing the reset in $subject (to lead to calling 
pm8001_send_abort_all()).

This idea could simply be tested by stubbing pm8001_send_abort_all()
(and dropping the |= ATA_EH_RESET in $subject).

Thanks,
John