[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e33272f8-1ff5-561f-60a3-b4d24fe27c6b@huawei.com>
Date: Wed, 17 Apr 2024 09:46:56 +0800
From: Jason Yan <yanaijie@...wei.com>
To: Xingui Yang <yangxingui@...wei.com>, <john.g.garry@...cle.com>,
<jejb@...ux.ibm.com>, <martin.petersen@...cle.com>,
<damien.lemoal@...nsource.wdc.com>
CC: <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linuxarm@...wei.com>, <prime.zeng@...ilicon.com>,
<chenxiang66@...ilicon.com>, <kangfenglong@...wei.com>
Subject: Re: [PATCH] scsi: libsas: Fix exp-attached end device cannot be
scanned in again after probe failed
Hi Xingui,
On 2024/4/16 11:07, Xingui Yang wrote:
> We found that it is judged as broadcast flutter and exits directly when the
> exp-attached end device reconnects after the end device probe failed.
Can you please describe how to reproduce this issue in detail?
Thanks,
Jason
>
> [78779.654026] sas: broadcast received: 0
> [78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10
> [78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has changed
> [78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
> [78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached
> [78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 500e004aaaaaaa05 (stp)
> [78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found
> [78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0
> [78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> ...
> [78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
> [78835.171344] sas: sas_probe_sata: for exp-attached device 500e004aaaaaaa05 returned -19
> [78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone
> [78835.187487] sas: broadcast received: 0
> [78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10
> [78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has changed
> [78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
> [78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05
> [78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 500e004aaaaaaa05 (stp)
> [78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter
> [78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0
>
> The cause of the problem is that the related ex_phy information was not
> cleared after the end device probe failed. In order to solve the above
> problem, a function sas_ex_unregister_end_dev() is defined to clear the
> ex_phy information and unregister the end device when the exp-attached end
> device probe failed.
>
> As the sata device is an asynchronous probe, the sata device may probe
> failed after done REVALIDATING DOMAIN. Then after the port is added to the
> sas_port_del_list, the port will not be deleted until the end of the next
> REVALIDATING DOMAIN and sas_destruct_ports() is called. A warning about
> creating a duplicate port will occur in the new REVALIDATING DOMAIN when
> the end device reconnects. Therefore, the previous destroy_list and
> sas_port_del_list should be handled before REVALIDATING DOMAIN.
>
> Signed-off-by: Xingui Yang <yangxingui@...wei.com>
> ---
> drivers/scsi/libsas/sas_discover.c | 2 ++
> drivers/scsi/libsas/sas_expander.c | 16 ++++++++++++++++
> drivers/scsi/libsas/sas_internal.h | 6 +++++-
> 3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> index 8fb7c41c0962..aae90153f4c6 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -517,6 +517,8 @@ static void sas_revalidate_domain(struct work_struct *work)
> struct sas_ha_struct *ha = port->ha;
> struct domain_device *ddev = port->port_dev;
>
> + sas_destruct_devices(port);
> + sas_destruct_ports(port);
> /* prevent revalidation from finding sata links in recovery */
> mutex_lock(&ha->disco_mutex);
> if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index f6e6db8b8aba..6ae1f4aaaf61 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -1856,6 +1856,22 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
> }
> }
>
> +void sas_ex_unregister_end_dev(struct domain_device *dev)
> +{
> + struct domain_device *parent = dev->parent;
> + struct expander_device *parent_ex = &parent->ex_dev;
> + int i;
> +
> + for (i = 0; i < parent_ex->num_phys; i++) {
> + struct ex_phy *phy = &parent_ex->ex_phy[i];
> +
> + if (sas_phy_match_dev_addr(dev, phy)) {
> + sas_unregister_devs_sas_addr(parent, i, true);
> + break;
> + }
> + }
Did you mean this end device is a wide-port end device ? How could this
happen?
> +}
> +
> static int sas_discover_bfs_by_root_level(struct domain_device *root,
> const int level)
> {
> diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
> index 3804aef165ad..434f928c2ed8 100644
> --- a/drivers/scsi/libsas/sas_internal.h
> +++ b/drivers/scsi/libsas/sas_internal.h
> @@ -50,6 +50,7 @@ void sas_discover_event(struct asd_sas_port *port, enum discover_event ev);
>
> void sas_init_dev(struct domain_device *dev);
> void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev);
> +void sas_ex_unregister_end_dev(struct domain_device *dev);
>
> void sas_scsi_recover_host(struct Scsi_Host *shost);
>
> @@ -145,7 +146,10 @@ static inline void sas_fail_probe(struct domain_device *dev, const char *func, i
> func, dev->parent ? "exp-attached" :
> "direct-attached",
> SAS_ADDR(dev->sas_addr), err);
> - sas_unregister_dev(dev->port, dev);
> + if (dev->parent && !dev_is_expander(dev->dev_type))
> + sas_ex_unregister_end_dev(dev);
> + else
> + sas_unregister_dev(dev->port, dev);
> }
>
> static inline void sas_fill_in_rphy(struct domain_device *dev,
>
Powered by blists - more mailing lists