Open Source and information security mailing list archives
 
Message-ID: <SN6PR04MB4640F34CAA25B3CB58F94CABFC630@SN6PR04MB4640.namprd04.prod.outlook.com>
Date:   Sun, 12 Jul 2020 10:04:55 +0000
From:   Avri Altman <Avri.Altman@....com>
To:     Stanley Chu <stanley.chu@...iatek.com>
CC:     "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "martin.petersen@...cle.com" <martin.petersen@...cle.com>,
        "alim.akhtar@...sung.com" <alim.akhtar@...sung.com>,
        "jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
        "bvanassche@....org" <bvanassche@....org>,
        "beanhuo@...ron.com" <beanhuo@...ron.com>,
        "asutoshd@...eaurora.org" <asutoshd@...eaurora.org>,
        "cang@...eaurora.org" <cang@...eaurora.org>,
        "matthias.bgg@...il.com" <matthias.bgg@...il.com>,
        "linux-mediatek@...ts.infradead.org" 
        <linux-mediatek@...ts.infradead.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kuohong.wang@...iatek.com" <kuohong.wang@...iatek.com>,
        "peter.wang@...iatek.com" <peter.wang@...iatek.com>,
        "chun-hung.wu@...iatek.com" <chun-hung.wu@...iatek.com>,
        "andy.teng@...iatek.com" <andy.teng@...iatek.com>,
        "chaotian.jing@...iatek.com" <chaotian.jing@...iatek.com>,
        "cc.chou@...iatek.com" <cc.chou@...iatek.com>
Subject: RE: [PATCH v3] scsi: ufs: Cleanup completed request without interrupt
 notification



> 
> Hi Avri,
> 
> On Thu, 2020-07-09 at 08:31 +0000, Avri Altman wrote:
> > >
> > > If somehow no interrupt notification is raised for a completed request
> > > and its doorbell bit is cleared by host, UFS driver needs to cleanup
> > > its outstanding bit in ufshcd_abort().
> > Theoretically, this case is already accounted for -
> > See line 6407: a proper error is issued and eventually the outstanding req is
> > cleared.
> >
> > Can you go over the scenario you are addressing line by line,
> > and explain why ufshcd_abort does not account for it?
> 
> Sure.
> 
> If a request using tag N is completed by UFS device without interrupt
> notification till timeout happens, ufshcd_abort() will be invoked.
> 
> Since request completion flow is not executed, current status may be
> 
> - Tag N in hba->outstanding_reqs is set
> - Tag N in doorbell register is not set
> 
> In this case, ufshcd_abort() flow would be
> 
> - This log is printed: "ufshcd_abort: cmd was completed, but without a
> notifying intr, tag = N"
> - This log is printed: "ufshcd_abort: Device abort task at tag N"
> - If hba->req_abort_skip is zero, QUERY_TASK command is sent
> - Device responds "UPIU_TASK_MANAGEMENT_FUNC_COMPL"
> - This log is printed: "ufshcd_abort: cmd at tag N not pending in the
> device."
> - Doorbell tells that tag N is not set, so the driver goes to label
> "out" with this log printed: "ufshcd_abort: cmd at tag %d successfully
> cleared from DB."
> - In label "out" section, no cleanup will be made, and then ufshcd_abort
> exits
> - This request will be re-queued to request queue by SCSI timeout
> handler
> 
> Now an inconsistent state shows up: a request is "re-queued" but its
> corresponding resource in the UFS layer is not cleared. The flow below
> will then trigger bad things:
> 
> - A new request with tag M is finished
> - Interrupt is raised and ufshcd_transfer_req_compl() finds that both
> tag N and tag M are eligible for the completion flow
> - The post-processing flow for tag N will be executed while its request
> is still alive
> 
> I am sorry that the messages below apply only to old kernels in the
> non-blk-mq case. However, the above scenario will also trigger bad
> things in the blk-mq case.

Ok.  Thanks.
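
The scenario described above can be walked through with a small user-space
model. This is only a sketch under stated assumptions, not kernel code: the
struct and every model_* function are hypothetical stand-ins, the two bitmaps
merely mimic hba->outstanding_reqs and the doorbell register, and the
completion rule (outstanding but no longer in the doorbell) is a simplified
reading of the flow in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_TAGS 32u

/* Hypothetical model of the two bitmaps discussed in the thread. */
struct model_hba {
    uint32_t outstanding_reqs; /* driver side: tags the driver believes are in flight */
    uint32_t doorbell;         /* device side: tags the device has not finished yet */
};

static uint32_t model_bit(unsigned int tag)
{
    assert(tag < MODEL_NUM_TAGS);
    return UINT32_C(1) << tag;
}

/* Driver issues a request: both bits get set. */
static void model_issue(struct model_hba *hba, unsigned int tag)
{
    hba->outstanding_reqs |= model_bit(tag);
    hba->doorbell |= model_bit(tag);
}

/* Device finishes a request: only the doorbell bit is cleared. */
static void model_device_complete(struct model_hba *hba, unsigned int tag)
{
    hba->doorbell &= ~model_bit(tag);
}

/*
 * Abort path for a tag whose doorbell bit is already clear.
 * cleanup == false mimics "goto out" (outstanding bit left set),
 * cleanup == true mimics "goto cleanup" (outstanding bit cleared).
 * A tag still set in the doorbell is outside this model and left alone.
 */
static void model_abort(struct model_hba *hba, unsigned int tag, bool cleanup)
{
    if (!(hba->doorbell & model_bit(tag)) && cleanup)
        hba->outstanding_reqs &= ~model_bit(tag);
}

/* Completion handler: post-process every tag that is outstanding
 * but no longer in the doorbell. Returns the bitmap it completed. */
static uint32_t model_transfer_req_compl(struct model_hba *hba)
{
    uint32_t completed = hba->outstanding_reqs & ~hba->doorbell;

    hba->outstanding_reqs &= ~completed;
    return completed;
}

/* Returns true if the stale tag N is completed together with tag M. */
static bool model_scenario_hits_stale_tag(bool cleanup)
{
    struct model_hba hba = { 0, 0 };
    const unsigned int n = 3, m = 5;

    model_issue(&hba, n);
    model_device_complete(&hba, n);  /* completed, but no interrupt arrives */
    model_abort(&hba, n, cleanup);   /* timeout handler runs the abort path */
    /* The SCSI layer now re-queues the request; it is still "alive". */
    model_issue(&hba, m);
    model_device_complete(&hba, m);  /* this time the interrupt is delivered */
    return (model_transfer_req_compl(&hba) & model_bit(n)) != 0;
}
```

In this model, model_scenario_hits_stale_tag(false) reports that tag N is
swept up by the completion of tag M, while model_scenario_hits_stale_tag(true)
does not, which is the difference between leaving at "out" and at "cleanup".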

> 
> >
> > >
> > > Otherwise, the system may crash through the abnormal flow below:
> > >
> > > After this request is requeued by SCSI layer with its
> > > outstanding bit set, the next completed request will trigger
> > > ufshcd_transfer_req_compl() to handle all "completed outstanding
> > > bits". At this point, the "abnormal outstanding bit" will be detected
> > > and the "requeued request" will be chosen to execute request
> > > post-processing flow. This is wrong and blk_finish_request() will
> > > BUG_ON because this request is still "alive".
> > >
> > > It is worth mentioning that before ufshcd_abort() cleans the timed-out
> > > request, the driver needs to check again whether this request has really
> > > not been handled by __ufshcd_transfer_req_compl() yet, because the
> > > interrupt may arrive very late, just before the cleaning.
> > What do you mean? Why checking the outstanding reqs isn't enough?
> >
> > >
> > > Signed-off-by: Stanley Chu <stanley.chu@...iatek.com>
> > > ---
> > >  drivers/scsi/ufs/ufshcd.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > index 8603b07045a6..f23fb14df9f6 100644
> > > --- a/drivers/scsi/ufs/ufshcd.c
> > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > @@ -6462,7 +6462,7 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
> > >                         /* command completed already */
> > > >                         dev_err(hba->dev, "%s: cmd at tag %d successfully cleared from DB.\n",
> > >                                 __func__, tag);
> > > -                       goto out;
> > > +                       goto cleanup;
> > But you've arrived here only if (!(test_bit(tag, &hba->outstanding_reqs))) -
> > See line 6400.
> >
> > >                 } else {
> > >                         dev_err(hba->dev,
> > >                                 "%s: no response from device. tag = %d, err %d\n",
> > > @@ -6496,9 +6496,14 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
> > >                 goto out;
> > >         }
> > >
> > > +cleanup:
> > > +       spin_lock_irqsave(host->host_lock, flags);
> > > +       if (!test_bit(tag, &hba->outstanding_reqs)) {
Is this needed? It was already checked in line 6439.

Thanks,
Avri

> > > +               spin_unlock_irqrestore(host->host_lock, flags);
> > > +               goto out;
> > > +       }
> > >         scsi_dma_unmap(cmd);
> > >
> > > -       spin_lock_irqsave(host->host_lock, flags);
> > >         ufshcd_outstanding_req_clear(hba, tag);
> > >         hba->lrb[tag].cmd = NULL;
> > >         spin_unlock_irqrestore(host->host_lock, flags);
> > > --
> > > 2.18.0
