lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAd53p7H2C+LenmzynRHqnxd-qgc2+p9ZLKidVUOAtXMyU2rmQ@mail.gmail.com>
Date:   Thu, 1 Jun 2023 23:55:21 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Damien Le Moal <dlemoal@...nel.org>
Cc:     jejb@...ux.ibm.com, martin.petersen@...cle.com,
        bblock@...ux.ibm.com, acelan.kao@...onical.com,
        linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/2] ata: libata: Defer rescan on suspended device

On Sun, May 7, 2023 at 5:23 PM Damien Le Moal <dlemoal@...nel.org> wrote:
>
> On 2023/05/03 0:04, Kai-Heng Feng wrote:
> > During system resume, if an EH is schduled after ATA host is resumed
> > (i.e. ATA_PFLAG_PM_PENDING cleared), but before the disk device is
> > fully resumed, the device_lock hold by scsi_rescan_device() is never
> > released so the dpm_resume() of the disk is blocked forerver.
> >
> > That's because scsi_attach_vpd() is expecting the disk device is in
> > operational state, as it doesn't work on suspended device.
> >
> > To avoid such deadlock, defer rescan if the disk is still suspended so
> > the resume process of the disk device can proceed. At the end of the
> > resume process, use the complete() callback to schedule the rescan task.
> >
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
> > ---
> > v4:
> >  - No change.
> >
> > v3:
> >  - New patch to resolve undefined pm_suspend_target_state.
> >
> > v2:
> >  - Schedule rescan task at the end of system resume phase.
> >  - Wording.
> >
> >  drivers/ata/libata-core.c | 11 +++++++++++
> >  drivers/ata/libata-eh.c   | 11 +++++++++--
> >  include/linux/libata.h    |  1 +
> >  3 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > index 8bf612bdd61a..bdd244bdb8a2 100644
> > --- a/drivers/ata/libata-core.c
> > +++ b/drivers/ata/libata-core.c
> > @@ -5093,6 +5093,16 @@ static int ata_port_pm_poweroff(struct device *dev)
> >       return 0;
> >  }
> >
> > +static void ata_port_pm_complete(struct device *dev)
> > +{
> > +     struct ata_port *ap = to_ata_port(dev);
> > +
> > +     if (ap->pflags & ATA_PFLAG_DEFER_RESCAN)
> > +             schedule_work(&(ap->scsi_rescan_task));
> > +
> > +     ap->pflags &= ~ATA_PFLAG_DEFER_RESCAN;
>
> Is this called with the port lock held ? Otherwise, there is a race with
> ata_eh_revalidate_and_attach() and we may end up never actually revalidating the
> drive. At the very least, I think that ATA_PFLAG_DEFER_RESCAN needs to be
> cleared before calling schedule_work().

OK. Maybe all of these are unnecessary if the rescanning can wait for
disk device to be resumed.

>
> > +}
> > +
> >  static const unsigned int ata_port_resume_ehi = ATA_EHI_NO_AUTOPSY
> >                                               | ATA_EHI_QUIET;
> >
> > @@ -5158,6 +5168,7 @@ static const struct dev_pm_ops ata_port_pm_ops = {
> >       .thaw = ata_port_pm_resume,
> >       .poweroff = ata_port_pm_poweroff,
> >       .restore = ata_port_pm_resume,
> > +     .complete = ata_port_pm_complete,
> >
> >       .runtime_suspend = ata_port_runtime_suspend,
> >       .runtime_resume = ata_port_runtime_resume,
> > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> > index a6c901811802..0881b590fb7e 100644
> > --- a/drivers/ata/libata-eh.c
> > +++ b/drivers/ata/libata-eh.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/blkdev.h>
> >  #include <linux/export.h>
> >  #include <linux/pci.h>
> > +#include <linux/suspend.h>
> >  #include <scsi/scsi.h>
> >  #include <scsi/scsi_host.h>
> >  #include <scsi/scsi_eh.h>
> > @@ -2983,8 +2984,14 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
> >                        */
> >                       ehc->i.flags |= ATA_EHI_SETMODE;
> >
> > -                     /* schedule the scsi_rescan_device() here */
> > -                     schedule_work(&(ap->scsi_rescan_task));
> > +                     /* Schedule the scsi_rescan_device() here.
>
> Code style: please start multi-line comment with a line starting with "/*"
> without text after it.

OK. Will do.

>
> > +                      * Defer the rescan if it's in process of
> > +                      * suspending or resuming.
> > +                      */
> > +                     if (pm_suspend_target_state != PM_SUSPEND_ON)
>
> Why ? Shouldn't this be "pm_suspend_target_state == PM_SUSPEND_ON" ? Because if
> the device is already resumed, why would we need to defer the rescan ?

"pm_suspend_target_state != PM_SUSPEND_ON" means the system is not
"ON" state. For this particular issue, it means the system is still in
resume process.

>
> > +                             ap->pflags |= ATA_PFLAG_DEFER_RESCAN;
> > +                     else
> > +                             schedule_work(&(ap->scsi_rescan_task));
> >               } else if (dev->class == ATA_DEV_UNKNOWN &&
> >                          ehc->tries[dev->devno] &&
> >                          ata_class_enabled(ehc->classes[dev->devno])) {
> > diff --git a/include/linux/libata.h b/include/linux/libata.h
> > index 311cd93377c7..1696c9ebd168 100644
> > --- a/include/linux/libata.h
> > +++ b/include/linux/libata.h
> > @@ -189,6 +189,7 @@ enum {
> >       ATA_PFLAG_UNLOADING     = (1 << 9), /* driver is being unloaded */
> >       ATA_PFLAG_UNLOADED      = (1 << 10), /* driver is unloaded */
> >
> > +     ATA_PFLAG_DEFER_RESCAN  = (1 << 16), /* peform deferred rescan on system resume */
>
> Do we really need a new flag ? Can't we use ATA_PFLAG_PM_PENDING correctly ?
> From the rather sparse commit message description, it sounds like this flag is
> being cleared too early. Not sure though. Need to dig further into this.

Defer clearing ATA_PFLAG_PM_PENDING doesn't seem to be trivial.
I'll send a new version which IMO is a lot simpler.

Kai-Heng

>
> >       ATA_PFLAG_SUSPENDED     = (1 << 17), /* port is suspended (power) */
> >       ATA_PFLAG_PM_PENDING    = (1 << 18), /* PM operation pending */
> >       ATA_PFLAG_INIT_GTM_VALID = (1 << 19), /* initial gtm data valid */
>
> --
> Damien Le Moal
> Western Digital Research
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ