linux-kernel - Re: [PATCH] scsi: libiscsi: Set expecting_cc_ua flag when stop

Message-ID: <ead203fc-abf5-49b1-b34c-64b97d3fecd6@oracle.com>
Date: Fri, 11 Oct 2024 09:48:45 -0500
From: michael.christie@...cle.com
To: Xiang Zhang <hawkxiang.cpp@...il.com>, lduncan@...e.com, cleech@...hat.com,
        ames.Bottomley@...senPartnership.com, martin.petersen@...cle.com,
        james.smart@...adcom.com, ram.vegesna@...adcom.com,
        njavali@...vell.com
Cc: open-iscsi@...glegroups.com, linux-scsi@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] scsi: libiscsi: Set expecting_cc_ua flag when stop_conn

CC'ing the fibre channel experts because they might have the same issue.

On 10/11/24 3:18 AM, Xiang Zhang wrote:
> After stop_conn is called, the initiator needs to recover the session and reconnect to the target. The target then rebuilds the session info and marks an ASC_POWERON_RESET UA sense for the SCSI devices belonging to the target (device reset). After recovery, the first SCSI command (scmd) sent to the target gets ASC_POWERON_RESET (UA sense) + SAM_STAT_CHECK_CONDITION (status) in the response.
> According to the SCSI code path "scsi_done --> scsi_complete --> scsi_decide_disposition --> scsi_check_sense", if expecting_cc_ua = 0, an scmd response with ASC_POWERON_RESET (UA sense) skips the "cmd->retries <= cmd->allowed" check and fails directly. This causes SCSI to return io_error to the upper layer without a retry.

Just want to make sure I understand the problem.

Does the failure only happen with tape or passthrough or if removable is
set?

For commands coming from sd, scsi_io_completion will end up calling
scsi_io_completion_action, which sees the UNIT_ATTENTION and retries.
I'm not saying we shouldn't do a fix like you did below. Just want to
make sure I understand the case you describe above.
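
To make the case concrete, here is a rough userspace model of the
UNIT_ATTENTION branch of scsi_check_sense as I read it. It is only a
sketch: the model_* names and the DISP_* enum are made up for
illustration, the real function handles many more sense keys and
ASC/ASCQ pairs, and I am assuming ASC_POWERON_RESET means ASC/ASCQ
29h/00h. DISP_SUCCESS here just means "pass the UA up to the completion
path", where the upper layer decides whether to retry or fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel definitions; not the real structures. */
enum model_disposition { DISP_SUCCESS, DISP_NEEDS_RETRY };

#define MODEL_UNIT_ATTENTION 0x06 /* sense key */
#define MODEL_ASC_POWERON    0x29 /* power on, reset, or bus device reset */
#define MODEL_ASC_MEDIA_CHG  0x28 /* medium may have changed */

struct model_sdev {
	bool expecting_cc_ua; /* models sdev->expecting_cc_ua */
};

struct model_sense {
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/*
 * Models only the expecting_cc_ua check. If a UA is expected and it is
 * not a media change (28h/00h), the flag is consumed and the command is
 * retried. Otherwise the UA is passed up to the completion path.
 */
enum model_disposition model_check_sense(struct model_sdev *sdev,
					 const struct model_sense *sense)
{
	assert(sdev != NULL && sense != NULL);

	if (sense->sense_key != MODEL_UNIT_ATTENTION)
		return DISP_SUCCESS; /* other sense keys are out of scope */

	if (sdev->expecting_cc_ua &&
	    !(sense->asc == MODEL_ASC_MEDIA_CHG && sense->ascq == 0x00)) {
		sdev->expecting_cc_ua = false; /* one-shot */
		return DISP_NEEDS_RETRY;
	}

	return DISP_SUCCESS;
}
```

With expecting_cc_ua clear, a 29h/00h UA comes back as DISP_SUCCESS and
the outcome depends on who issued the command. With the flag set, the
same UA is retried once and the flag is cleared.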

> If we set expecting_cc_ua=1 in fail_scsi_tasks, SISC will retry the scmd that gets an ASC_POWERON_RESET response. The second request for that scmd can succeed, because the target clears ASC_POWERON_RESET from the device's pending ua_sense_list after the first scmd request.

What does "SISC" stand for?

> 
> Signed-off-by: Xiang Zhang <hawkxiang.cpp@...il.com>
> ---
>  drivers/scsi/libiscsi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 0fda8905eabd..317e57be32b3 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -629,9 +629,10 @@ static void __fail_scsi_task(struct iscsi_task *task, int err)
>  		conn->session->queued_cmdsn--;
>  		/* it was never sent so just complete like normal */
>  		state = ISCSI_TASK_COMPLETED;
> -	} else if (err == DID_TRANSPORT_DISRUPTED)
> +	} else if (err == DID_TRANSPORT_DISRUPTED) {
>  		state = ISCSI_TASK_ABRT_SESS_RECOV;
> -	else
> +		sc->device->expecting_cc_ua = 1;

The failure case can happen with other transports like fibre channel,
right? If it's common, I think we want this in the core scsi code.

For iscsi, we want to set expecting_cc_ua whenever we call
scsi_block_targets() or whenever we return DID_TRANSPORT_DISRUPTED or
DID_TRANSPORT_FAILFAST.
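
Something like the following is what I have in mind for the host byte
part, written as a standalone userspace sketch rather than a patch. The
sketch_* names are hypothetical and do not exist in the kernel. Only the
host byte values are taken from the SCSI headers. Where such a check
would actually live (libiscsi, the transport classes, or the core) is
the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host byte values as defined in the kernel's SCSI headers. */
#define DID_OK                  0x00
#define DID_ERROR               0x07
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* Stand-in for struct scsi_device; only the flag under discussion. */
struct sketch_sdev {
	bool expecting_cc_ua;
};

/* Hypothetical helper: does this host byte mean the transport dropped? */
bool sketch_host_byte_implies_ua(int host_byte)
{
	return host_byte == DID_TRANSPORT_DISRUPTED ||
	       host_byte == DID_TRANSPORT_FAILFAST;
}

/*
 * Hypothetical failure hook: when a command is failed because the
 * transport went away, note that the device may report a UA after the
 * session or port is recovered.
 */
void sketch_fail_cmd(struct sketch_sdev *sdev, int host_byte)
{
	assert(sdev != NULL);

	if (sketch_host_byte_implies_ua(host_byte))
		sdev->expecting_cc_ua = true;
}
```

The scsi_block_targets() case is not covered by this sketch, since it
is not tied to a host byte on a completing command.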

FC developers, I'm not sure if that's the case for you. For example, if
your driver called fc_remote_port_delete -> scsi_block_targets but then
the issue was resolved quickly, like for a quick cable pull, and you
called fc_remote_port_add, could there be cases where you did not get an
I_T Nexus loss/reset type of issue?

Or is it the case that any time an FC driver calls fc_remote_port_delete,
you will expect a UA after calling fc_remote_port_add again?