Date: Thu, 18 Apr 2024 10:42:49 +0800
From: yangxingui <yangxingui@...wei.com>
To: Jason Yan <yanaijie@...wei.com>, <john.g.garry@...cle.com>,
	<jejb@...ux.ibm.com>, <martin.petersen@...cle.com>,
	<damien.lemoal@...nsource.wdc.com>
CC: <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<linuxarm@...wei.com>, <prime.zeng@...ilicon.com>,
	<chenxiang66@...ilicon.com>, <kangfenglong@...wei.com>
Subject: Re: [PATCH] scsi: libsas: Fix exp-attached end device cannot be
 scanned in again after probe failed

Hi Jason,

On 2024/4/18 9:46, Jason Yan wrote:
> On 2024/4/17 15:47, yangxingui wrote:
>>
>>
>> On 2024/4/17 9:46, Jason Yan wrote:
>>> Hi Xingui,
>>>
>>> On 2024/4/16 11:07, Xingui Yang wrote:
>>>> We found that when an exp-attached end device reconnects after its
>>>> probe has failed, the event is judged to be broadcast flutter and
>>>> rediscovery exits directly.
>>>
>>> Can you please describe how to reproduce this issue in detail?
>> The test steps we currently use simulate link abnormalities and adjust 
>> the rate of the remote phy while running I/O on all disks.
>>
>> When the SATA disk is probed and the IDENTIFY command is sent to the 
>> disk, the rate returned by the expander is abnormal, causing the SATA 
>> disk probe to fail. There may be many reasons for a device probe 
>> failure, however, including expander or disk instability or link 
>> abnormalities.
>>
>>>
>>> Thanks,
>>> Jason
>>>
>>>>
>>>> [78779.654026] sas: broadcast received: 0
>>>> [78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10
>>>> [78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has changed
>>>> [78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
>>>> [78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached
>>>> [78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 500e004aaaaaaa05 (stp)
>>>> [78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found
>>>> [78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0
>>>> [78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>>>> ...
>>>> [78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
>>>> [78835.171344] sas: sas_probe_sata: for exp-attached device 500e004aaaaaaa05 returned -19
>>>> [78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone
>>>> [78835.187487] sas: broadcast received: 0
>>>> [78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10
>>>> [78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has changed
>>>> [78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
>>>> [78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05
>>>> [78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 500e004aaaaaaa05 (stp)
>>>> [78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter
>>>> [78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0
>>>>
>>>> The cause of the problem is that the related ex_phy information was not
>>>> cleared after the end device probe failed. To solve this, a function
>>>> sas_ex_unregister_end_dev() is defined to clear the ex_phy information
>>>> and unregister the end device when the probe of an exp-attached end
>>>> device fails.
>>>>
>>>> As SATA devices are probed asynchronously, a SATA device probe may fail
>>>> after REVALIDATING DOMAIN is done. Once the port has been added to the
>>>> sas_port_del_list, it is not deleted until the end of the next
>>>> REVALIDATING DOMAIN, when sas_destruct_ports() is called. If the end
>>>> device reconnects, the new REVALIDATING DOMAIN therefore triggers a
>>>> warning about creating a duplicate port. The previous destroy_list and
>>>> sas_port_del_list should thus be handled before REVALIDATING DOMAIN.
>>>>
>>>> Signed-off-by: Xingui Yang <yangxingui@...wei.com>
>>>> ---
>>>>   drivers/scsi/libsas/sas_discover.c |  2 ++
>>>>   drivers/scsi/libsas/sas_expander.c | 16 ++++++++++++++++
>>>>   drivers/scsi/libsas/sas_internal.h |  6 +++++-
>>>>   3 files changed, 23 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>>>> index 8fb7c41c0962..aae90153f4c6 100644
>>>> --- a/drivers/scsi/libsas/sas_discover.c
>>>> +++ b/drivers/scsi/libsas/sas_discover.c
>>>> @@ -517,6 +517,8 @@ static void sas_revalidate_domain(struct work_struct *work)
>>>>       struct sas_ha_struct *ha = port->ha;
>>>>       struct domain_device *ddev = port->port_dev;
>>>> +    sas_destruct_devices(port);
>>>> +    sas_destruct_ports(port);
>>>>       /* prevent revalidation from finding sata links in recovery */
>>>>       mutex_lock(&ha->disco_mutex);
>>>>       if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
>>>> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
>>>> index f6e6db8b8aba..6ae1f4aaaf61 100644
>>>> --- a/drivers/scsi/libsas/sas_expander.c
>>>> +++ b/drivers/scsi/libsas/sas_expander.c
>>>> @@ -1856,6 +1856,22 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>>>>       }
>>>>   }
>>>> +void sas_ex_unregister_end_dev(struct domain_device *dev)
>>>> +{
>>>> +    struct domain_device *parent = dev->parent;
>>>> +    struct expander_device *parent_ex = &parent->ex_dev;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < parent_ex->num_phys; i++) {
>>>> +        struct ex_phy *phy = &parent_ex->ex_phy[i];
>>>> +
>>>> +        if (sas_phy_match_dev_addr(dev, phy)) {
>>>> +            sas_unregister_devs_sas_addr(parent, i, true);
>>>> +            break;
>>>> +        }
>>>> +    }
>>>
>>> Did you mean this end device is a wide-port end device? How could 
>>> this happen?
>>
>> No, the end device described here is a non-expander device, such as a 
>> SATA/SAS disk, but these devices are exp-attached.
> 
> If it is not a wide port, why do they have the same sas address here? 
> Why do you add this function to unregister these PHYs? And the last 
> parameter of sas_unregister_devs_sas_addr() indicates the last PHY of 
> the wide port, but you always pass true, which is irrational.
A non-expander end device such as a SATA disk does not have a wide port, 
so only one ex_phy corresponds to it. This function finds that ex_phy by 
matching dev->sas_addr, then clears the ex_phy information and 
unregisters the end device. Since it is the only phy for that device, it 
is also the last one, so true is passed as the last argument of 
sas_unregister_devs_sas_addr().
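
To make the lookup concrete, here is a minimal standalone sketch. It is not
the kernel code: the toy_* structures and functions are hypothetical
stand-ins for struct expander_device, struct ex_phy and the libsas helpers,
and the flutter check is reduced to an address comparison only (the real
rediscovery path considers more than the address). It only illustrates why
a stale attached address makes a reconnect look like flutter, and why
clearing the matching phy avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SAS_ADDR_SIZE 8
#define TOY_MAX_PHYS  8

/* Hypothetical, simplified stand-in for libsas struct ex_phy. */
struct toy_ex_phy {
	uint8_t attached_sas_addr[SAS_ADDR_SIZE];
	bool has_dev;	/* a device is currently registered behind this phy */
};

/* Hypothetical, simplified stand-in for libsas struct expander_device. */
struct toy_expander {
	int num_phys;
	struct toy_ex_phy phys[TOY_MAX_PHYS];
};

/* Return the index of the phy whose attached address equals addr, or -1. */
static int toy_find_phy(const struct toy_expander *ex, const uint8_t *addr)
{
	for (int i = 0; i < ex->num_phys; i++) {
		if (memcmp(ex->phys[i].attached_sas_addr, addr,
			   SAS_ADDR_SIZE) == 0)
			return i;
	}
	return -1;
}

/*
 * Clear the single phy that matches the end device address.
 * Returns true if a matching phy was found and cleared.
 */
static bool toy_unregister_end_dev(struct toy_expander *ex,
				   const uint8_t *addr)
{
	int i = toy_find_phy(ex, addr);

	if (i < 0)
		return false;

	memset(ex->phys[i].attached_sas_addr, 0, SAS_ADDR_SIZE);
	ex->phys[i].has_dev = false;
	return true;
}

/*
 * Simplified flutter test: the address reported on rediscovery equals
 * the address already recorded for this phy, so nothing appears to
 * have changed.
 */
static bool toy_is_flutter(const struct toy_expander *ex, int phy_id,
			   const uint8_t *reported)
{
	assert(phy_id >= 0 && phy_id < ex->num_phys);
	return memcmp(ex->phys[phy_id].attached_sas_addr, reported,
		      SAS_ADDR_SIZE) == 0;
}
```

In this model, if the probe fails and toy_unregister_end_dev() is not 
called, toy_is_flutter() returns true when the same disk reconnects on 
that phy. After the phy is cleared, the same reconnect is no longer 
treated as flutter.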

Thanks,
Xingui
