[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aa72bd76-2af5-202d-8a2c-afb5a700b6c0@linux.ibm.com>
Date: Thu, 30 Dec 2021 18:55:40 +0100
From: Steffen Maier <maier@...ux.ibm.com>
To: Wenchao Hao <haowenchao@...wei.com>,
"James E . J . Bottomley" <jejb@...ux.ibm.com>,
"Martin K . Petersen" <martin.petersen@...cle.com>,
linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
James Smart <james.smart@...adcom.com>,
Dick Kennedy <dick.kennedy@...adcom.com>,
Nilesh Javali <njavali@...vell.com>,
GR-QLogic-Storage-Upstream@...vell.com,
Hannes Reinecke <hare@...e.de>,
Martin Wilck <martin.wilck@...e.com>,
linux-s390 <linux-s390@...r.kernel.org>,
Benjamin Block <bblock@...ux.ibm.com>,
Ming Lei <ming.lei@...hat.com>
Cc: Zhiqiang Liu <liuzhiqiang26@...wei.com>,
Feilong Lin <linfeilong@...wei.com>, Wu Bo <wubo40@...wei.com>
Subject: Re: [PATCH] scsi: Do not break scan luns loop if add single lun
failed
On 12/26/21 00:29, Wenchao Hao wrote:
> Failed to add a single lun does not mean all luns are unaccessible,
> if we break the scan luns loop, the other luns reported by REPORT LUNS
> command would not be probed any more.
>
> In this case, we might loss some luns which are accessible.
Could you please add more details about the specific use case, where this
actually was a problem, for my understanding?
>
> Signed-off-by: Wenchao Hao <haowenchao@...wei.com>
> ---
> drivers/scsi/scsi_scan.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 23e1c0acdeae..fee7ce082103 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1476,13 +1476,13 @@ static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflag
> lun, NULL, NULL, rescan, NULL);
> if (res == SCSI_SCAN_NO_RESPONSE) {
> /*
> - * Got some results, but now none, abort.
> + * Got some results, but now none, abort this lun
abort => skip ?
> */
> sdev_printk(KERN_ERR, sdev,
> "Unexpected response"
> " from lun %llu while scanning, scan"
> " aborted\n", (unsigned long long)lun);
That message would no longer be correct with your change, as it would not abort
the scan any more.
> - break;
> + continue;
> }
> }
> }
Wouldn't this change existing semantics for LLDDs intentionally returning
-ENXIO from their slave_alloc() callback in certain cases?:
> static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
...
> if (shost->hostt->slave_alloc) {
> ret = shost->hostt->slave_alloc(sdev);
> if (ret) {
> /*
> * if LLDD reports slave not present, don't clutter
> * console with alloc failure messages
> */
> if (ret == -ENXIO)
> display_failure_msg = 0;
> goto out_device_destroy;
...
> out_device_destroy:
> __scsi_remove_device(sdev);
> out:
> if (display_failure_msg)
> printk(ALLOC_FAILURE_MSG, __func__);
> return NULL;
scsi_probe_and_add_lun() [such as called by scsi_report_lun_scan() for the case
at hand] converts this case into a SCSI_SCAN_NO_RESPONSE return value.
> static int scsi_probe_and_add_lun(struct scsi_target *starget,
...
> int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
...
> sdev = scsi_alloc_sdev(starget, lun, hostdata);
> if (!sdev)
> goto out;
...
> out:
> return res;
Such as being used by zfcp:
> static int zfcp_scsi_slave_alloc(struct scsi_device *sdev)
> {
...
> unit = zfcp_unit_find(port, zfcp_scsi_dev_lun(sdev));
> if (unit)
> put_device(&unit->dev);
>
> if (!unit && !(allow_lun_scan && npiv)) {
> put_device(&port->dev);
> return -ENXIO;
^^^^^^
which implements an initiator-based LUN masking that is necessary for shared
HBAs virtualized without NPIV.
https://www.ibm.com/docs/en/linux-on-systems?topic=devices-manually-configured-fcp-luns
While things might still work, as zfcp now "just" gets (much) more callbacks to
slave_alloc() it has to end with -ENXIO, the user may get flooded with the
error(!) sdev_printk on "Unexpected response from LUN ..." in
scsi_report_lun_scan().
In the worst case, we could get this message now 64k - 1 times in a zfcp
scenario connected to IBM DS8000 storage being able to map (all) 64k volumes to
a single initiator (HBA), where the user via zfcp sysfs decided to use only the
first lun reported (for the vHBA).
Other LLLDs also seem to intentionally return -ENXIO from slave_alloc()
callbacks, such as but not limited to lpfc or qla2xxx:
> int fc_slave_alloc(struct scsi_device *sdev)
> {
> struct fc_rport *rport = starget_to_rport(scsi_target(sdev));
>
> if (!rport || fc_remote_port_chkready(rport))
> return -ENXIO;
> static int
> qla2xxx_slave_alloc(struct scsi_device *sdev)
> {
> struct fc_rport *rport = starget_to_rport(scsi_target(sdev));
>
> if (!rport || fc_remote_port_chkready(rport))
> return -ENXIO;
--
Mit freundlichen Gruessen / Kind regards
Steffen Maier
Linux on IBM Z and LinuxONE
https://www.ibm.com/privacy/us/en/
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Powered by blists - more mailing lists