lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 30 Jan 2019 17:53:58 +0000
From:   John Garry <john.garry@...wei.com>
To:     Jason Yan <yanaijie@...wei.com>, <martin.petersen@...cle.com>,
        <jejb@...ux.vnet.ibm.com>
CC:     <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <zhaohongjiang@...wei.com>, <hare@...e.com>,
        <dan.j.williams@...el.com>, <jthumshirn@...e.de>, <hch@....de>,
        <huangdaode@...ilicon.com>, <chenxiang66@...ilicon.com>,
        <xiexiuqi@...wei.com>, <tj@...nel.org>, <miaoxie@...wei.com>,
        Xiaofei Tan <tanxiaofei@...wei.com>,
        Ewan Milne <emilne@...hat.com>, Tomas Henzl <thenzl@...hat.com>
Subject: Re: [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks

On 30/01/2019 08:24, Jason Yan wrote:
> The work flow of revalidation now is scanning expander phy by the
> sequence of the phy and check if the phy have changed. This will leads
> to an issue of swapping two sas disks on one expander.
>
> Assume we have two sas disks, connected with expander phy10 and phy11:
>
> phy10: 5000cca04eb1001d  port-0:0:10
> phy11: 5000cca04eb043ad  port-0:0:11
>
> Swap these two disks, and imaging the following scenario:
>
> revalidation 1:

What does "revalidation 1" actually mean?

>   -->phy10: 0 --> delete phy10 domain device
>   -->phy11: 5000cca04eb043ad (no change)

so is disk 11 still inserted at this stage?

> revalidation done
>
> revalidation 2:

is port-0:0:10 deleted now?

>   -->step 1, check phy10:
>   -->phy10: 5000cca04eb043ad   --> add to wide port(port-0:0:11) (phy11
>        address is still 5000cca04eb043ad now)

So this should not happen. How are you physcially swapping them such 
that phy11 address is still 5000cca04eb043ad? I don't see how this would 
be true at revalidation 1.

>
>   -->step 2, check phy11:
>   -->phy11: 0  --> phy11 address is 0 now, but it's part of wide
>        port(port-0:0:11), the domain device will not be deleted.
> revalidation done
>
> revalidation 3:
>   -->phy10, 5000cca04eb043ad (no change)
>   -->phy11: 5000cca04eb1001d --> try to add port-0:0:11 but failed,
>        port-0:0:11 already exist, trigger a warning as follows
> revalidation done
>
> [14790.189699] sysfs: cannot create duplicate filename
> '/devices/pci0000:74/0000:74:02.0/host0/port-0:0/expander-0:0/port-0:0:11'
> [14790.201081] CPU: 25 PID: 5031 Comm: kworker/u192:3 Not tainted
> 4.16.0-rc1-191134-g138f084-dirty #228
> [14790.210199] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI
> Nemo 2.0 RC0 - B303 05/16/2018
> [14790.219323] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain
> [14790.225404] Call trace:
> [14790.227842]  dump_backtrace+0x0/0x18c
> [14790.231492]  show_stack+0x14/0x1c
> [14790.234798]  dump_stack+0x88/0xac
> [14790.238101]  sysfs_warn_dup+0x64/0x7c
> [14790.241751]  sysfs_create_dir_ns+0x90/0xa0
> [14790.245835]  kobject_add_internal+0xa0/0x284
> [14790.250092]  kobject_add+0xb8/0x11c
> [14790.253570]  device_add+0xe8/0x598
> [14790.256960]  sas_port_add+0x24/0x50
> [14790.260436]  sas_ex_discover_devices+0xb10/0xc30
> [14790.265040]  sas_ex_revalidate_domain+0x1d8/0x518
> [14790.269731]  sas_revalidate_domain+0x12c/0x154
> [14790.274163]  process_one_work+0x128/0x2b0
> [14790.278160]  worker_thread+0x14c/0x408
> [14790.281897]  kthread+0xfc/0x128
> [14790.285026]  ret_from_fork+0x10/0x18
> [14790.288598] ------------[ cut here ]------------
>
> At last, the disk 5000cca04eb1001d is lost.
>
> The basic idea of fix this issue is to let the revalidation first scan
> all phys, and then unregisterring devices. Only when no devices need to
> be unregisterred, go to the next step to discover new devices. If there
> are devices need unregister, unregister those devices and tell the
> revalidation event processor to retry again. The next revalidation will
> process the discovering of the new devices.
>
> Tested-by: Chen Liangfei <chenliangfei1@...wei.com>
> Signed-off-by: Jason Yan <yanaijie@...wei.com>
> CC: Xiaofei Tan <tanxiaofei@...wei.com>
> CC: chenxiang <chenxiang66@...ilicon.com>
> CC: John Garry <john.garry@...wei.com>
> CC: Johannes Thumshirn <jthumshirn@...e.de>
> CC: Ewan Milne <emilne@...hat.com>
> CC: Christoph Hellwig <hch@....de>
> CC: Tomas Henzl <thenzl@...hat.com>
> CC: Dan Williams <dan.j.williams@...el.com>
> CC: Hannes Reinecke <hare@...e.com>
> ---
>  drivers/scsi/libsas/sas_expander.c | 146 +++++++++++++++++++++++++------------
>  1 file changed, 100 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index e781941a7088..073bb5c6e353 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -2056,7 +2056,7 @@ static bool sas_process_flutter(struct domain_device *dev, struct ex_phy *phy,
>  	return true;
>  }
>
> -static int sas_unregister(struct domain_device *dev, int phy_id, bool last,
> +static int sas_ex_unregister(struct domain_device *dev, int phy_id, bool last,
>  			      bool *retry)
>  {
>  	struct expander_device *ex = &dev->ex_dev;
> @@ -2108,21 +2108,8 @@ static int sas_unregister(struct domain_device *dev, int phy_id, bool last,
>  	return 0;
>  }
>
> -/**
> - * sas_rediscover - revalidate the domain.
> - * @dev:domain device to be detect.
> - * @phy_id: the phy id will be detected.
> - *
> - * NOTE: this process _must_ quit (return) as soon as any connection
> - * errors are encountered.  Connection recovery is done elsewhere.
> - * Discover process only interrogates devices in order to discover the
> - * domain.For plugging out, we un-register the device only when it is
> - * the last phy in the port, for other phys in this port, we just delete it
> - * from the port.For inserting, we do discovery when it is the
> - * first phy,for other phys in this port, we add it to the port to
> - * forming the wide-port.
> - */
> -static void sas_rediscover(struct domain_device *dev, const int phy_id,
> +
> +static void sas_ex_unregister_device(struct domain_device *dev, const int phy_id,
>  			   bool *retry)
>  {
>  	struct expander_device *ex = &dev->ex_dev;
> @@ -2131,31 +2118,70 @@ static void sas_rediscover(struct domain_device *dev, const int phy_id,
>  	int i;
>  	bool last = true;	/* is this the last phy of the port */
>
> -	pr_debug("ex %016llx phy%d originated BROADCAST(CHANGE)\n",
> -		 SAS_ADDR(dev->sas_addr), phy_id);
> -
> -	if (SAS_ADDR(changed_phy->attached_sas_addr) != 0) {
> -		for (i = 0; i < ex->num_phys; i++) {
> -			struct ex_phy *phy = &ex->ex_phy[i];
> +	for (i = 0; i < ex->num_phys; i++) {
> +		struct ex_phy *phy = &ex->ex_phy[i];
>
> -			if (i == phy_id)
> -				continue;
> -			if (SAS_ADDR(phy->attached_sas_addr) ==
> -			    SAS_ADDR(changed_phy->attached_sas_addr)) {
> -				pr_debug("phy%d part of wide port with phy%d\n",
> -					 phy_id, i);
> -				last = false;
> -				break;
> -			}
> +		if (i == phy_id)
> +			continue;
> +		if (SAS_ADDR(phy->attached_sas_addr) ==
> +		    SAS_ADDR(changed_phy->attached_sas_addr)) {
> +			pr_debug("phy%d part of wide port with phy%d\n",
> +				 phy_id, i);
> +			last = false;
> +			break;
>  		}
> -		res = sas_unregister(dev, phy_id, last, retry);
> -	} else
> -		res = sas_discover_new(dev, phy_id);
> +	}
> +	res = sas_ex_unregister(dev, phy_id, last, retry);
>
>  	pr_debug("ex %016llx phy%d discover returned 0x%x\n",
>  		 SAS_ADDR(dev->sas_addr), phy_id, res);
>  }
>
> +static int sas_ex_try_unregister(struct domain_device *dev, u8 *changed_phy,
> +				 int nr, bool *retry)
> +{
> +	struct expander_device *ex = &dev->ex_dev;
> +	int unregistered = 0;
> +	struct ex_phy *phy;
> +	int i;
> +
> +	for (i = 0; i < nr; i++) {
> +		pr_debug("ex %016llx phy%d originated BROADCAST(CHANGE)\n",
> +			 SAS_ADDR(dev->sas_addr), changed_phy[i]);
> +
> +		phy = &ex->ex_phy[changed_phy[i]];
> +
> +		if (SAS_ADDR(phy->attached_sas_addr) == 0)
> +			continue;
> +
> +		sas_ex_unregister_device(dev, changed_phy[i], retry);
> +		changed_phy[i] = 0xff;
> +		unregistered++;
> +	}
> +	return unregistered;
> +}
> +
> +static void sas_ex_register(struct domain_device *dev, u8 *changed_phy,
> +			    int nr)
> +{
> +	struct expander_device *ex = &dev->ex_dev;
> +	struct ex_phy *phy;
> +	int res = 0;
> +	int i;
> +
> +	for (i = 0; i < nr; i++) {
> +		if (changed_phy[i] == 0xff)
> +			continue;
> +
> +		phy = &ex->ex_phy[changed_phy[i]];
> +
> +		res = sas_discover_new(dev, changed_phy[i]);
> +
> +		pr_debug("ex %016llx phy%d register returned 0x%x\n",
> +			 SAS_ADDR(dev->sas_addr), changed_phy[i], res);
> +	}
> +}
> +
>  /**
>   * sas_ex_revalidate_domain - revalidate the domain
>   * @port_dev: port domain device.
> @@ -2170,21 +2196,49 @@ void sas_ex_revalidate_domain(struct domain_device *port_dev, bool *retry)
>  {
>  	int res;
>  	struct domain_device *dev = NULL;
> +	u8 changed_phy[MAX_EXPANDER_PHYS];
> +	struct expander_device *ex;
> +	int unregistered = 0;
> +	int phy_id;
> +	int nr = 0;
> +	int i = 0;
>
>  	res = sas_find_bcast_dev(port_dev, &dev);
> -	if (res == 0 && dev) {
> -		struct expander_device *ex = &dev->ex_dev;
> -		int i = 0, phy_id;
> -
> -		do {
> -			phy_id = -1;
> -			res = sas_find_bcast_phy(dev, &phy_id, i, true);
> -			if (phy_id == -1)
> -				break;
> -			sas_rediscover(dev, phy_id, retry);
> -			i = phy_id + 1;
> -		} while (i < ex->num_phys);
> +	if (res || !dev)
> +		return;
> +
> +	memset(changed_phy, 0xff, MAX_EXPANDER_PHYS);
> +	ex = &dev->ex_dev;
> +
> +	do {
> +		phy_id = -1;
> +		res = sas_find_bcast_phy(dev, &phy_id, i, true);
> +		if (phy_id == -1)
> +			break;
> +		changed_phy[nr++] = phy_id;
> +		i = phy_id + 1;
> +	} while (i < dev->ex_dev.num_phys);
> +
> +	if (nr == 0)
> +		return;
> +
> +	unregistered = sas_ex_try_unregister(dev, changed_phy, nr, retry);
> +
> +	if (unregistered > 0) {
> +		struct ex_phy *phy;
> +
> +		for (i = 0; i < nr; i++) {
> +			if (changed_phy[i] == 0xff)
> +				continue;
> +			phy = &ex->ex_phy[changed_phy[i]];
> +			phy->phy_change_count = -1;
> +		}
> +		ex->ex_change_count = -1;
> +		*retry = true;
> +		return;
>  	}
> +
> +	sas_ex_register(dev, changed_phy, nr);
>  }
>
>  void sas_smp_handler(struct bsg_job *job, struct Scsi_Host *shost,
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ