Date: Wed, 19 Jun 2024 09:01:33 +0800
From: yangxingui <yangxingui@...wei.com>
To: John Garry <john.g.garry@...cle.com>, <yanaijie@...wei.com>,
	<jejb@...ux.ibm.com>, <martin.petersen@...cle.com>,
	<damien.lemoal@...nsource.wdc.com>
CC: <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<linuxarm@...wei.com>, <prime.zeng@...ilicon.com>,
	<chenxiang66@...ilicon.com>, <kangfenglong@...wei.com>
Subject: Re: [PATCH v3] scsi: libsas: Fix exp-attached end device cannot be
 scanned in again after probe failed

Hi, John

On 2024/6/18 23:21, John Garry wrote:
> On 18/06/2024 14:10, yangxingui wrote:
>>>>>
>>>>>> We found that it is judged as broadcast flutter when the 
>>>>>> exp-attached end
>>>>>> device reconnects after probe failed, as follows:
>>>>>>
>>>>>> [78779.654026] sas: broadcast received: 0
>>>>>> [78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10
>>>>>> [78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has 
>>>>>> changed
>>>>>> [78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated 
>>>>>> BROADCAST(CHANGE)
>>>>>> [78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached
>>>>>> [78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 
>>>>>> 500e004aaaaaaa05 (stp)
>>>>>> [78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found
>>>>>> [78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, 
>>>>>> res 0x0
>>>>>> [78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>>>>>> ...
>>>>>> [78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 
>>>>>> failed: 0 tries: 1
>>>>>> [78835.171344] sas: sas_probe_sata: for exp-attached device 
>>>>>> 500e004aaaaaaa05 returned -19
>>>>>> [78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone
>>>>>> [78835.187487] sas: broadcast received: 0
>>>>>> [78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10
>>>>>> [78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has 
>>>>>> changed
>>>>>> [78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated 
>>>>>> BROADCAST(CHANGE)
>>>>>> [78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05
>>>>>> [78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 
>>>>>> 500e004aaaaaaa05 (stp)
>>>>>> [78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter
>>>>>> [78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, 
>>>>>> res 0x0
>>>>>>
>>>>>> The cause of the problem is that the related ex_phy's 
>>>>>> attached_sas_addr was
>>>>>> not cleared after the end device probe failed. In order to solve 
>>>>>> the above
>>>>>> problem, a function sas_ex_unregister_end_dev() is defined to 
>>>>>> clear the
>>>>>> ex_phy information and unregister the end device after the 
>>>>>> exp-attached end
>>>>>> device probe failed.
>>>>>
>>>>> Can you just manually clear the ex_phy's attached_sas_addr at the 
>>>>> appropriate point (along with calling sas_unregister_dev())? It 
>>>>> seems that we are using a heavy-handed approach in calling 
>>>>> sas_unregister_devs_sas_addr(), which does the clearing and much more.
>>>>
>>>> I just tried it and it worked. If we only clear ex_phy's 
>>>> attached_sas_addr, there is no need to call sas_destruct_ports(). We 
>>>> are currently using sas_unregister_devs_sas_addr() which will add 
>>>> the port to sas_port_del_list, so we need to call 
>>>> sas_destruct_ports() separately to delete the port.
>>>>
>>>> Should we also delete the port after the device probe fails?
>>>
>>> I'm not sure. Please check it.
>>>
>>> sas_fail_probe() would still call sas_unregister_dev(), as required.
>>>
>>> And you said that the sas_fail_probe() call would be 
>>> asynchronous to sas_revalidate_domain(). I actually expected you to 
>>> have the new call to sas_destruct_ports() at the top of 
>>> sas_revalidate_domain(), like v2, but it is in sas_probe_devices().
>>>
>>> Anyway, please check whether you require this additional call to 
>>> delete the port.
>>>
>> Sorry, the previous process description was wrong.
>> The correct sequence is:
>>
>> 1. REVALIDATING DOMAIN
>> 2. new device attached, create port, etc.
>> 4. done REVALIDATING DOMAIN
>> 5. @out, handle parent->port->sas_port_del_list
>> 6. sas_probe_devices()
>> 7. if the device probe fails in step 6 and 
>> sas_unregister_devs_sas_addr() is called, phy->port->list is added to 
>> parent->port->sas_port_del_list // port won't be deleted
>>
>> 8. next, REVALIDATING DOMAIN
>> 9. new device attached
>> 10. new port creation fails, as the port already exists.
>>
>>
>> So, v3 deletes the port at the end of sas_probe_devices(). And if we 
>> follow your suggestion and don't use sas_unregister_devs_sas_addr(), 
>> then we don't need to call sas_destruct_ports().
> 
> I am finding it hard to follow you now.
I'm sorry for that. ^-^
> 
> Can you show the complete change which you think that we now require to 
> fix this issue?
> 
Okay, I'll send an updated version.
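
To make the flutter case from the log easier to follow, here is a rough
user-space sketch. It is not libsas code and not the patch itself: the
toy_* names, the struct layout and the clear_addr flag are all made up for
illustration, and the check is reduced to an address comparison, which I
assume is the essential part of what rediscovery does. It only shows that a
stale attached_sas_addr makes the same device look like broadcast flutter
on reattach, and that clearing it on probe failure lets the device be seen
as new again.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SAS_ADDR_SIZE 8

/* Hypothetical, simplified stand-in for the per-phy expander state. */
struct toy_ex_phy {
	uint8_t attached_sas_addr[SAS_ADDR_SIZE];
	bool has_dev;	/* a device is currently registered on this phy */
};

enum toy_result { TOY_NEW_DEVICE, TOY_FLUTTER };

/*
 * Model of rediscovering a phy after BROADCAST(CHANGE): if the reported
 * address equals the one already recorded, treat it as flutter and do
 * nothing; otherwise record the address and register the device.
 */
static enum toy_result toy_rediscover(struct toy_ex_phy *phy,
				      const uint8_t *addr)
{
	if (memcmp(phy->attached_sas_addr, addr, SAS_ADDR_SIZE) == 0)
		return TOY_FLUTTER;

	memcpy(phy->attached_sas_addr, addr, SAS_ADDR_SIZE);
	phy->has_dev = true;
	return TOY_NEW_DEVICE;
}

/*
 * Model of a failed probe: the device is always unregistered; the recorded
 * address is cleared only when clear_addr is true (the proposed behaviour).
 */
static void toy_fail_probe(struct toy_ex_phy *phy, bool clear_addr)
{
	phy->has_dev = false;
	if (clear_addr)
		memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
}
```

With a zeroed phy, calling toy_rediscover() with 500e004aaaaaaa05 registers
the device. After toy_fail_probe(&phy, false), a second toy_rediscover()
returns TOY_FLUTTER and the device stays unregistered, which matches the
log above. After toy_fail_probe(&phy, true), the same call returns
TOY_NEW_DEVICE.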

Thanks,
Xingui
