linux-kernel - Re: [RFC PATCH v2 00/18] scsi: scsi_error: Introduce new error handle mechanism

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <374f2f3c-e0f0-cd28-4b43-fa46a1fd5002@huawei.com>
Date:   Wed, 27 Sep 2023 17:39:38 +0800
From:   Wenchao Hao <haowenchao2@...wei.com>
To:     Mike Christie <michael.christie@...cle.com>,
        Christoph Hellwig <hch@...radead.org>
CC:     "James E . J . Bottomley" <jejb@...ux.ibm.com>,
        "Martin K . Petersen" <martin.petersen@...cle.com>,
        <linux-scsi@...r.kernel.org>, Hannes Reinecke <hare@...e.de>,
        <linux-kernel@...r.kernel.org>, <louhongxiang@...wei.com>,
        <lixiaokeng@...wei.com>
Subject: Re: [RFC PATCH v2 00/18] scsi: scsi_error: Introduce new error handle
 mechanism

On 2023/9/27 1:37, Mike Christie wrote:
> On 9/26/23 7:57 AM, Wenchao Hao wrote:
>> On 2023/9/26 1:54, Mike Christie wrote:
>>> On 9/25/23 10:07 AM, Wenchao Hao wrote:
>>>> On 2023/9/25 22:55, Christoph Hellwig wrote:
>>>>> Before we add another new error handling mechanism we need to fix the
>>>>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>>>>> reset handlers hasn't made a lot of progress in the last five years and
>>>>> we'll need to urgently fix that first before adding even more
>>>>> complexity.
>>>>>
>>>> I observed Hannes's patches posted about one year ago, it has not been
>>>> applied yet. I don't know if he is still working on it.
>>>>
>>>> My patches do not depend much on that work, I think the conflict can be
>>>> solved fast between two changes.
>>>
>>> I think we want to figure out Hannes's patches first.
>>>
>>> For a new EH design we will want to be able to do multiple TMFs in parallel
>>> on the same host/target right?
>>>
>>
>> It's not necessary to do multiple TMFs in parallel, it's ok to make sure
>> each TMFs do not affect each other.
>>
>> For example, we have two devices: 0:0:0:0 and 0:0:0:1
>>
>> Both of them request device reset, they do not happened in parallel, but
>> would in serial. If 0:0:0:0 is performing device reset in progress, 0:0:0:1
>> just wait 0:0:0:0 to finish.
> 
> I see. I guess we still get the benefit of not having to stop other devices
> when doing TMFs.
> 

Yes, it's better to support multiple TMFs in parallel than just run in serial.
I would wait for Hannes's changes to be applied and send my change again.

> I think we still want a common way to allocate/free and manage resources
> drivers will use during this time. Maybe have a init_device/target/cmd/eh_priv
> and exit_device/target/eh_priv (I'm not sure of the name, but something similar
> to the init_cmd_priv/exit_cmd_priv we have for normal commands.
> 
> scsi-ml then calls into the new eh with the priv data. Drivers don't have to
> do the preallocation and worry if it's per device/target/host.
> 
> I'm not 100% sure about the low level details. Check out how Hannes's is
> handling tag management for TMFs as well.
> 
> 
>>
>>> The problem is that we need to be able to make forward progress in the EH
>>> path and not fail just because we can't allocate memory for a TMF related
>>> struct. To accomplish this now, drivers will use mempools, preallocate TMF
>>> related structs/mem/tags with their scsi_cmnd related structs, preallocate
>>> per host/target/device related structs or ignore what I wrote above and just
>>> fail.
>>>
>>> Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
>>> when it's not needed. That seems nice because after that, then for your new
>>> EH we can begin to standardize on how to handle preallocation of drivers
>>> resources needed to perform TMFs for your new EH. It could be a per
>>> device/target/host callout to allow drivers to preallocate, then scsi-ml calls
>>> into the drivers with that data. It doesn't have to be exactly like that or
>>> anything close. It would be nice for drivers to not have to think about this
>>> type of thing and scsi-ml just to handle the resource management for us when
>>> there are multiple TMFs in progress.
>>>
>>
>