Message-ID: <d590fde9-69bc-0b9c-c907-0b90838e5f94@huawei.com>
Date: Tue, 18 Jun 2024 21:10:34 +0800
From: yangxingui <yangxingui@...wei.com>
To: John Garry <john.g.garry@...cle.com>, <yanaijie@...wei.com>,
<jejb@...ux.ibm.com>, <martin.petersen@...cle.com>,
<damien.lemoal@...nsource.wdc.com>
CC: <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linuxarm@...wei.com>, <prime.zeng@...ilicon.com>,
<chenxiang66@...ilicon.com>, <kangfenglong@...wei.com>
Subject: Re: [PATCH v3] scsi: libsas: Fix exp-attached end device cannot be
scanned in again after probe failed
Hi John,
On 2024/6/18 20:08, John Garry wrote:
> On 18/06/2024 12:45, yangxingui wrote:
>> Hi, John,
>>
>> Thanks for your reply.
>>
>> On 2024/6/18 16:55, John Garry wrote:
>>> On 13/06/2024 13:23, Xingui Yang wrote:
>>>
>>> Sorry for delay in responding and asking further questions.
>> No problem.
>>>
>>>> We found that the exp-attached end device is judged as broadcast
>>>> flutter when it reconnects after its probe failed, as follows:
>>>>
>>>> [78779.654026] sas: broadcast received: 0
>>>> [78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10
>>>> [78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has changed
>>>> [78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
>>>> [78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached
>>>> [78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 500e004aaaaaaa05 (stp)
>>>> [78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found
>>>> [78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0
>>>> [78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>>>> ...
>>>> [78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
>>>> [78835.171344] sas: sas_probe_sata: for exp-attached device 500e004aaaaaaa05 returned -19
>>>> [78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone
>>>> [78835.187487] sas: broadcast received: 0
>>>> [78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10
>>>> [78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has changed
>>>> [78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
>>>> [78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05
>>>> [78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 500e004aaaaaaa05 (stp)
>>>> [78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter
>>>> [78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0
>>>>
>>>> The cause of the problem is that the related ex_phy's attached_sas_addr
>>>> was not cleared after the end device probe failed. To solve this
>>>> problem, a function sas_ex_unregister_end_dev() is defined to clear the
>>>> ex_phy information and unregister the end device after the exp-attached
>>>> end device probe fails.
>>>
>>> Can you just manually clear the ex_phy's attached_sas_addr at the
>>> appropriate point (along with calling sas_unregister_dev())? It seems
>>> that we are taking a heavy-handed approach in calling
>>> sas_unregister_devs_sas_addr(), which does the clearing and much more.
>>
>> I just tried it and it worked. If we only clear ex_phy's
>> attached_sas_addr, there is no need to call sas_destruct_ports(). We
>> are currently using sas_unregister_devs_sas_addr() which will add the
>> port to sas_port_del_list, so we need to call sas_destruct_ports()
>> separately to delete the port.
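The difference between the two cleanup paths can be sketched as follows. Again this is a hypothetical model with made-up names: model_unregister_by_addr() only loosely follows what this thread describes sas_unregister_devs_sas_addr() as doing (clearing the address and queuing the port on sas_port_del_list), and model_destruct_ports() stands in for sas_destruct_ports(). It assumes that clearing only the address leaves phy->port attached, so nothing is queued and no deferred deletion is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sas_port: only tracks deferred deletion. */
struct model_port {
	bool on_del_list; /* queued on parent->port->sas_port_del_list */
	bool deleted;     /* removed by the model destruct step */
};

/* Hypothetical stand-in for an expander phy. */
struct model_phy {
	uint64_t attached_sas_addr;
	struct model_port *port;
};

/* Lighter path suggested in review: the device is still unregistered
 * elsewhere (not modelled); here only the recorded address is cleared
 * and the port stays attached to the phy. */
static void model_clear_addr_only(struct model_phy *phy)
{
	assert(phy != NULL);
	phy->attached_sas_addr = 0;
}

/* Heavier path: clear the address, detach the port and queue it for
 * deferred deletion instead of deleting it immediately. */
static void model_unregister_by_addr(struct model_phy *phy)
{
	assert(phy != NULL);
	phy->attached_sas_addr = 0;
	if (phy->port) {
		phy->port->on_del_list = true;
		phy->port = NULL;
	}
}

/* Stand-in for draining the deferred-deletion list. */
static void model_destruct_ports(struct model_port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ports[i].on_del_list) {
			ports[i].on_del_list = false;
			ports[i].deleted = true;
		}
	}
}
```

In this model, only the heavier path leaves a port sitting on the deletion list, which is why that path needs a separate model_destruct_ports() call before the port is actually gone.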
>>
>> Should we also delete the port after the device probe fails?
>
> I'm not sure. Please check it.
>
> sas_fail_probe() would still call sas_unregister_dev(), as required.
>
> And you said that the sas_fail_probe() call would be asynchronous
> to sas_revalidate_domain(). I actually expected you to have the new
> call to sas_destruct_ports() at the top of sas_revalidate_domain(),
> like v2, but it is in sas_probe_devices().
>
> Anyway, please check whether you require this additional call to delete
> the port.
>
Sorry, my previous description of the process was wrong. The correct
sequence is:
1. REVALIDATING DOMAIN.
2. A new device is attached; the port is created, etc.
3. done REVALIDATING DOMAIN.
4. At the @out label, parent->port->sas_port_del_list is handled.
5. sas_probe_devices() runs.
6. If the device probe fails in step 5, sas_unregister_devs_sas_addr() is
   called, which adds phy->port->list to parent->port->sas_port_del_list.
   The port is not deleted at this point.
7. Next REVALIDATING DOMAIN.
8. A new device is attached.
9. Creating the new port fails, as the port already exists.
So v3 deletes the port at the end of sas_probe_devices(). If we follow
your suggestion and don't use sas_unregister_devs_sas_addr(), then we
don't need to call sas_destruct_ports().
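The ordering problem in the sequence above can be modelled as below. This is a sketch under stated assumptions, not the v3 patch: the model_* names are hypothetical, and it assumes that a port queued on sas_port_del_list still occupies its name until the list is drained, which is what makes creating the new port fail. Because the list is only drained at the end of a revalidation pass, a port queued by a failed probe is still present when the next pass tries to attach the device; draining it right after the probe, as v3 does at the end of sas_probe_devices(), avoids the collision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one expander phy slot. */
struct model_slot {
	bool port_exists;    /* a port with this phy's name is registered */
	bool phy_has_port;   /* phy->port points at that port */
	bool queued_for_del; /* port sits on sas_port_del_list */
};

/* New device attached: allocate a port if the phy has none. Fails when a
 * port with the same name has not been deleted yet. */
static bool model_attach_new_device(struct model_slot *s)
{
	if (s->phy_has_port)
		return true;        /* reuse the existing port */
	if (s->port_exists)
		return false;       /* "port already exists" */
	s->port_exists = true;
	s->phy_has_port = true;
	return true;
}

/* Drain the deferred-deletion list (the @out handling). */
static void model_destruct_ports(struct model_slot *s)
{
	if (s->queued_for_del) {
		s->queued_for_del = false;
		s->port_exists = false;
	}
}

/* One revalidation pass: discovery first, list drained only at @out. */
static bool model_revalidate_domain(struct model_slot *s)
{
	bool ok = model_attach_new_device(s);
	model_destruct_ports(s);
	return ok;
}

/* Probe fails: the port is detached and queued, not deleted. With
 * drain_after_probe set, the list is drained immediately (v3 approach). */
static void model_probe_fails(struct model_slot *s, bool drain_after_probe)
{
	assert(s->phy_has_port);
	s->phy_has_port = false;
	s->queued_for_del = true;
	if (drain_after_probe)
		model_destruct_ports(s);
}
```

Running a pass, a failed probe without draining, and a second pass makes the second model_revalidate_domain() return false; passing drain_after_probe as true makes it succeed.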
Thanks,
Xingui