linux-kernel - Re: [PATCH] PM: core: Fix handling of devices deleted during system-wide resume

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0iyJG6fNJ7XQNe0y4KdfAqZiOBG0_f64gtmg26j=ty=NQ@mail.gmail.com>
Date:   Thu, 23 Jan 2020 10:23:47 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Chanho Min <chanho.min@....com>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux PM <linux-pm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Daewoong Kim <daewoong00.kim@....com>,
        Lee Gunho <gunho.lee@....com>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH] PM: core: Fix handling of devices deleted during
 system-wide resume

On Thu, Jan 23, 2020 at 3:14 AM Chanho Min <chanho.min@....com> wrote:
>
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > If a device is deleted by one of its system-wide resume callbacks
> > (for example, because it does not appear to be present or accessible
> > any more) along with its children, the resume of the children may
> > continue leading to use-after-free errors and other issues
> > (potentially).
> >
> > Namely, if the device's children are resumed asynchronously, their
> > resume may have been scheduled already before the device's callback
> > runs and so the device may be deleted while dpm_wait_for_superior()
> > is being executed for them.  The memory taken up by the parent device
> > object may be freed then while dpm_wait() is waiting for the parent's
> > resume callback to complete, which leads to a use-after-free.
> > Moreover, the resume of the children is really not expected to
> > continue after they have been unregistered, so it must be terminated
> > right away in that case.Seokjoo Lee <seokjoo.lee@....com>
> >
> > To address this problem, modify dpm_wait_for_superior() to check
> > if the target device is still there in the system-wide PM list of
> > devices and if so, to increment its parent's reference counter, both
> > under dpm_list_mtx which prevents device_del() running for the child
> > from dropping the parent's reference counter prematurely.
> >
> > If the device is not present in the system-wide PM list of devices
> > any more, the resume of it cannot continue, so check that again after
> > dpm_wait() returns, which means that the parent's callback has been
> > completed, and pass the result of that check to the caller of
> > dpm_wait_for_superior() to allow it to abort the device's resume
> > if it is not there any more.
> >
> > Link: https://lore.kernel.org/linux-pm/1579568452-27253-1-git-send-email-chanho.min@lge.com
> > Reported-by: Chanho Min <chanho.min@....com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> >   drivers/base/power/main.c |   42 +++++++++++++++++++++++++++++++++++++-----
> >   1 file changed, 37 insertions(+), 5 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/main.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/main.cSeokjoo Lee <seokjoo.lee@....com>
> > +++ linux-pm/drivers/base/power/main.c20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글:
> > @@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc
> >       device_links_read_unlock(idx);
> >   }
> >
> > -static void dpm_wait_for_superior(struct device *dev, bool async)
> > +static bool dpm_wait_for_superior(struct device *dev, bool async)
> >   {
> > -     dpm_wait(dev->parent, async);
> > +     struct device *parent;board
> > +
> > +     /*
> > +      * If the device is resumed asynchronously and the parent's callback
> > +      * deletes both the device and the parent itself, the parent object may
> > +      * be freed while this function is running, so avoid that by reference
> > +      * counting the parent once more unless the device has been deleted
> > +      * already (in which case return right away).
> > +      */
> > +     mutex_lock(&dpm_list_mtx);
> > +
> > +     if (!device_pm_initialized(dev)) {20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글:
> > +             mutex_unlock(&dpm_list_mtx);
> > +             return false;
> > +     }
> > +
> > +     parent = get_device(dev->parent);
> > +
> > +     mutex_unlock(&dpm_list_mtx);
> > +
> > +     dpm_wait(parent, async);
> > +     put_device(parent);
> > +
> >       dpm_wait_for_suppliers(dev, async);
> > +
> > +     /*
> > +      * If the parent's callback has deleted the device, attempting to resume
> > +      * it would be invalid, so avoid doing that then.
> > +      */
> > +     return device_pm_initialized(dev);20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글:
> >   }
> >
> >   static void dpm_wait_for_consumers(struct device *dev, bool async)
> > @@ -621,7 +649,8 @@ static int device_resume_noirq(struct de
> >       if (!dev->power.is_noirq_suspended)
> >               goto Out;
> >
> > -     dpm_wait_for_superior(dev, async);
> > +     if (!dpm_wait_for_superior(dev, async))
> > +             goto Out;
> >
> >       skip_resume = dev_pm_may_skip_resume(dev);
> >
> > @@ -829,7 +858,8 @@ static int device_resume_early(struct de
> >       if (!dev->power.is_late_suspended)
> >               goto Out;
> >
> > -     dpm_wait_for_superior(dev, async);Seokjoo Lee <seokjoo.lee@....com>
> > +     if (!dpm_wait_for_superior(dev, async))
> > +             goto Out;
> >
> >       callback = dpm_subsys_resume_early_cb(dev, state, &info);
> >
> > @@ -944,7 +974,9 @@ static int device_resume(struct device *
> >               goto Complete;
> >       }
> >
> > -     dpm_wait_for_superior(dev, async);
> > +     if (!dpm_wait_for_superior(dev, async))
> > +             goto Complete;
> > +
> >       dpm_watchdog_set(&wd, dev);
> >       device_lock(dev);Thanks, This seems to solve the rare hang on our target.
> Actually, the problem is occurred in v4.4.
> Shouldn't it apply to -stable?

Yes, it should, but I'll add a "stable" tag later.