Message-ID: <CO6PR18MB44199B7F8AABE14ED404CF61D8879@CO6PR18MB4419.namprd18.prod.outlook.com>
Date: Thu, 23 Mar 2023 10:17:25 +0000
From: Manish Rangankar <mrangankar@...vell.com>
To: Zheng Hacker <hackerzheng666@...il.com>,
Mike Christie <michael.christie@...cle.com>
CC: Zheng Wang <zyytlz.wz@....com>,
Nilesh Javali <njavali@...vell.com>,
GR-QLogic-Storage-Upstream <GR-QLogic-Storage-Upstream@...vell.com>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"1395428693sheep@...il.com" <1395428693sheep@...il.com>,
"alex000young@...il.com" <alex000young@...il.com>
Subject: RE: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
qedi_remove due to race condition
> -----Original Message-----
> From: Zheng Hacker <hackerzheng666@...il.com>
> Sent: Thursday, March 23, 2023 9:15 AM
> To: Mike Christie <michael.christie@...cle.com>
> Cc: Zheng Wang <zyytlz.wz@....com>; Nilesh Javali <njavali@...vell.com>;
> Manish Rangankar <mrangankar@...vell.com>; GR-QLogic-Storage-
> Upstream <GR-QLogic-Storage-Upstream@...vell.com>;
> jejb@...ux.ibm.com; martin.petersen@...cle.com; linux-
> scsi@...r.kernel.org; linux-kernel@...r.kernel.org;
> 1395428693sheep@...il.com; alex000young@...il.com
> Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> qedi_remove due to race condition
>
> External Email
>
> ----------------------------------------------------------------------
> On Tue, 21 Mar 2023 at 00:11, Mike Christie <michael.christie@...cle.com> wrote:
> >
> > On 3/18/23 3:13 AM, Zheng Wang wrote:
> > > qedi_probe() calls __qedi_probe(), which binds &qedi->recovery_work
> > > to qedi_recovery_handler() and &qedi->board_disable_work to
> > > qedi_board_disable_work().
> > >
> > > When qedi_schedule_recovery_handler() is called, it eventually calls
> > > schedule_delayed_work() to start the work.
> > >
> > > When qedi_remove() is called to remove the driver, the following
> > > sequence may occur:
> > >
> > > CPU0                      CPU1
> > >
> > >                           |qedi_recovery_handler
> > > qedi_remove               |
> > > __qedi_remove             |
> > > iscsi_host_free           |
> > > scsi_host_put             |
> > > //free shost              |
> > >                           |iscsi_host_for_each_session
> > >                           |//use qedi->shost
> > >
> > > Fix it by finishing the work before cleanup in qedi_remove().
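
For readers less familiar with the workqueue API, the binding and scheduling described above follow the standard delayed-work pattern; a minimal sketch with illustrative stand-in names (not the real qedi structures):

#include <linux/kernel.h>
#include <linux/workqueue.h>

/* Illustrative stand-in for the relevant part of struct qedi_ctx. */
struct sketch_qedi {
	struct delayed_work recovery_work;
};

static void sketch_recovery_handler(struct work_struct *work)
{
	struct sketch_qedi *qedi =
		container_of(work, struct sketch_qedi, recovery_work.work);

	/* ... the real handler eventually walks qedi->shost's sessions ... */
}

/* Like __qedi_probe(): bind the work item to its handler. */
static void sketch_probe(struct sketch_qedi *qedi)
{
	INIT_DELAYED_WORK(&qedi->recovery_work, sketch_recovery_handler);
}

/* Like qedi_schedule_recovery_handler(): arm the work; the handler then
 * runs asynchronously on a system workqueue. */
static void sketch_schedule(struct sketch_qedi *qedi)
{
	schedule_delayed_work(&qedi->recovery_work, 0);
}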
> > >
> > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > > Signed-off-by: Zheng Wang <zyytlz.wz@....com>
> > > ---
> > > drivers/scsi/qedi/qedi_main.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> > > index f2ee49756df8..25223f6f5344 100644
> > > --- a/drivers/scsi/qedi/qedi_main.c
> > > +++ b/drivers/scsi/qedi/qedi_main.c
> > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
> > >  	int rval;
> > >  	u16 retry = 10;
> > >
> > > +	/*cancel work*/
> >
> > This comment is not needed. The names of the functions you are calling
> > already have "cancel" and "work" in them, so we know. If you want to add
> > a comment, explain why the cancel calls are needed here.
> >
>
> Hi,
>
> Sorry for my late reply, and thanks for your advice. I will remove it in the
> next version of the patch.
>
> >
> > > +	cancel_delayed_work_sync(&qedi->recovery_work);
> > > +	cancel_delayed_work_sync(&qedi->board_disable_work);
> >
> >
> > How do you know that, after you have called cancel_delayed_work_sync,
> > schedule_recovery_handler or schedule_hw_err_handler can't be called
> > again? I don't know the qed driver well, but it looks like you could
> > still have operations running, so after you cancel here one of those
> > ops could schedule the work again.
> >
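
To make the window concrete, here is a minimal sketch of the ordering being questioned; the struct and comments below are illustrative stand-ins, not the actual qedi code:

#include <linux/workqueue.h>

/* Illustrative stand-in for the relevant part of struct qedi_ctx. */
struct sketch_qedi_remove {
	struct delayed_work recovery_work;
};

/*
 * Cancelling the work while the qed event source is still live leaves a
 * window: an event arriving after the cancel can re-arm the work, and the
 * handler may then run against a host that __qedi_remove() has already freed.
 */
static void sketch_cancel_too_early(struct sketch_qedi_remove *qedi)
{
	cancel_delayed_work_sync(&qedi->recovery_work);

	/*
	 * A recovery event delivered by qed at this point would effectively do
	 *
	 *	schedule_delayed_work(&qedi->recovery_work, 0);
	 *
	 * so the cancel above does not close the race on its own.
	 */

	/* ... iscsi_host_free()/scsi_host_put() then free the shost ... */
}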
>
> Sorry, I don't know how to make sure there is no further scheduling, but I
> do think this is important. Maybe someone else can give us advice.
>
> Best regards,
> Zheng
> >
The best place to call cancel_delayed_work_sync() is after qedi_ops->stop(qedi->cdev) and
qedi_ops->ll2->stop(qedi->cdev); after these qed calls, the firmware will not post any events to the qedi driver.
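
Roughly, inside __qedi_remove() (a sketch only: qedi_ops, cdev and the two work fields are the driver's real names used in this thread, but the body below is illustrative, not the actual function):

static void sketch_remove_ordering(struct qedi_ctx *qedi)
{
	/* 1. Stop qed first; after these calls the firmware no longer posts
	 *    events to qedi, so nothing can re-arm the work items.
	 */
	qedi_ops->stop(qedi->cdev);
	qedi_ops->ll2->stop(qedi->cdev);

	/* 2. Now the cancels are final. */
	cancel_delayed_work_sync(&qedi->recovery_work);
	cancel_delayed_work_sync(&qedi->board_disable_work);

	/* 3. Only then tear down and free the iSCSI host
	 *    (iscsi_host_free()/scsi_host_put() as in the commit message).
	 */
}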
Thanks,
Manish