[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
<SJ0PR11MB5896CBD4F94402C6615520A0C3212@SJ0PR11MB5896.namprd11.prod.outlook.com>
Date: Wed, 20 Nov 2024 00:03:42 +0000
From: "Karan Tilak Kumar (kartilak)" <kartilak@...co.com>
To: Hannes Reinecke <hare@...e.de>, "Sesidhar Baddela (sebaddel)"
<sebaddel@...co.com>
CC: "Arulprabhu Ponnusamy (arulponn)" <arulponn@...co.com>, "Dhanraj Jhawar
(djhawar)" <djhawar@...co.com>, "Gian Carlo Boffa (gcboffa)"
<gcboffa@...co.com>, "Masa Kai (mkai2)" <mkai2@...co.com>, "Satish Kharat
(satishkh)" <satishkh@...co.com>, "Arun Easi (aeasi)" <aeasi@...co.com>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>, "martin.petersen@...cle.com"
<martin.petersen@...cle.com>, "linux-scsi@...r.kernel.org"
<linux-scsi@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v5 09/14] scsi: fnic: Modify IO path to use FDLS
On Monday, October 28, 2024 5:55 PM, Karan Tilak Kumar (kartilak) wrote:
>
> On Thursday, October 24, 2024 12:05 AM, Hannes Reinecke <hare@...e.de> wrote:
> >
> > On 10/18/24 18:14, Karan Tilak Kumar wrote:
> > > Modify IO path to use FDLS.
> > > Add helper functions to process IOs.
> > > Remove unused template functions.
> > > Cleanup obsolete code.
> > > Refactor old function definitions.
> > >
> > > Reviewed-by: Sesidhar Baddela <sebaddel@...co.com>
> > > Reviewed-by: Arulprabhu Ponnusamy <arulponn@...co.com>
> > > Reviewed-by: Gian Carlo Boffa <gcboffa@...co.com>
> > > Reviewed-by: Arun Easi <aeasi@...co.com>
> > > Signed-off-by: Karan Tilak Kumar <kartilak@...co.com>
> > > @@ -2274,11 +2494,11 @@ static int fnic_clean_pending_aborts(struct fnic *fnic,
> > > int fnic_device_reset(struct scsi_cmnd *sc)
> > > {
> > > struct request *rq = scsi_cmd_to_rq(sc);
> > > - struct fc_lport *lp;
> > > struct fnic *fnic;
> > > struct fnic_io_req *io_req = NULL;
> > > struct fc_rport *rport;
> > > int status;
> > > + int count = 0;
> > > int ret = FAILED;
> > > unsigned long flags;
> > > unsigned long start_time = 0;
> > > @@ -2289,31 +2509,61 @@ int fnic_device_reset(struct scsi_cmnd *sc)
> > > DECLARE_COMPLETION_ONSTACK(tm_done);
> > > bool new_sc = 0;
> > > uint16_t hwq = 0;
> > > + struct fnic_iport_s *iport = NULL;
> > > + struct rport_dd_data_s *rdd_data;
> > > + struct fnic_tport_s *tport;
> > > + u32 old_soft_reset_count;
> > > + u32 old_link_down_cnt;
> > > + int exit_dr = 0;
> > >
> > > /* Wait for rport to unblock */
> > > fc_block_scsi_eh(sc);
> > >
> > > /* Get local-port, check ready and link up */
> > > - lp = shost_priv(sc->device->host);
> > > + fnic = *((struct fnic **) shost_priv(sc->device->host));
> > > + iport = &fnic->iport;
> > >
> > > - fnic = lport_priv(lp);
> > > fnic_stats = &fnic->fnic_stats;
> > > reset_stats = &fnic->fnic_stats.reset_stats;
> > >
> > > atomic64_inc(&reset_stats->device_resets);
> > >
> > > rport = starget_to_rport(scsi_target(sc->device));
> > > +
> > > + spin_lock_irqsave(&fnic->fnic_lock, flags);
> > > FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host, fnic->fnic_num,
> > > - "fcid: 0x%x lun: 0x%llx hwq: %d mqtag: 0x%x flags: 0x%x Device reset\n",
> > > + "fcid: 0x%x lun: %llu hwq: %d mqtag: 0x%x flags: 0x%x Device reset\n",
> > > rport->port_id, sc->device->lun, hwq, mqtag,
> > > fnic_priv(sc)->flags);
> > >
> > > - if (lp->state != LPORT_ST_READY || !(lp->link_up))
> > > + rdd_data = rport->dd_data;
> > > + tport = rdd_data->tport;
> > > + if (!tport) {
> > > + FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host, fnic->fnic_num,
> > > + "Dev rst called after tport delete! rport fcid: 0x%x lun: %llu\n",
> > > + rport->port_id, sc->device->lun);
> > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > > + goto fnic_device_reset_end;
> > > + }
> > > +
> > > + if (iport->state != FNIC_IPORT_STATE_READY) {
> > > + FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host, fnic->fnic_num,
> > > + "iport NOT in READY state");
> > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > > goto fnic_device_reset_end;
> > > + }
> > > +
> > > + if ((tport->state != FDLS_TGT_STATE_READY) &&
> > > + (tport->state != FDLS_TGT_STATE_ADISC)) {
> > > + FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host, fnic->fnic_num,
> > > + "tport state: %d\n", tport->state);
> > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > > + goto fnic_device_reset_end;
> > > + }
> > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > >
> > Please check if returning FAST_IO_FAIL here wouldn't be a better option.
> > Most of the time a device reset is triggered by a command timeout, which
> > typically happens due to a transport issue (eg link is down or something).
> > So returning 'FAILED' will just escalate to host reset, and that can
> > take for a really long time trying to abort all commands.
>
> Thanks for your insights Hannes.
> We're investing this currently.
>
Thanks again for this comment, Hannes.
We reviewed this internally and as we understand it,
returning a FAST_IO_FAIL status would complete the scsi_cmnd.
This would cause issues if the IO is still active in firmware.
We do take care of cleaning up such I/Os in the higher error
escalation path (host reset) though, which would be induced
by returning the 'FAILED' return status here.
Regards,
Karan
Powered by blists - more mailing lists