linux-kernel - Re: [PATCH 2/2][RESEND v3] PM-runtime: change the tracepoints to cover all usage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200716014946.GB4588@chenyu-office.sh.intel.com>
Date:   Thu, 16 Jul 2020 09:49:46 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Len Brown <len.brown@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Michal Miroslaw <mirq-linux@...e.qmqm.pl>,
        Zhang Rui <rui.zhang@...el.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2][RESEND v3] PM-runtime: change the tracepoints to
 cover all usage_count

On Wed, Jul 15, 2020 at 05:47:36PM +0200, Rafael J. Wysocki wrote:
> On Wed, Jul 15, 2020 at 8:26 AM Chen Yu <yu.c.chen@...el.com> wrote:
> >
> > Commit d229290689ae ("PM-runtime: add tracepoints for usage_count changes")
> > has added some tracepoints to monitor the change of runtime usage, and
> > there is something to improve:
> > 1. There are some places that adjust the usage count not
> >    been traced yet. For example, pm_runtime_get_noresume() and
> >    pm_runtime_put_noidle()
> > 2. The change of the usage count will not be tracked if decreased
> >    from 1 to 0.
> >
> > This patch intends to adjust the logic to be consistent with the
> > change of usage_counter, that is to say, only after the counter has
> > been possibly modified, we record it. Besides, all usage changes will
> > be shown using rpm_usage even if included by other trace points.
> > And these changes has helped track down the e1000e runtime issue.
> >
> > Reviewed-by: Michał Mirosław <mirq-linux@...e.qmqm.pl>
> > Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> > ---
> >  drivers/base/power/runtime.c | 38 +++++++++++++++++++++++-------------
> >  1 file changed, 24 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 85a248e196ca..5789d2624513 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -1004,10 +1004,11 @@ int __pm_runtime_idle(struct device *dev, int rpmflags)
> >         int retval;
> >
> >         if (rpmflags & RPM_GET_PUT) {
> > -               if (!atomic_dec_and_test(&dev->power.usage_count)) {
> > -                       trace_rpm_usage_rcuidle(dev, rpmflags);
> > +               bool non_zero = !atomic_dec_and_test(&dev->power.usage_count);
> > +
> > +               trace_rpm_usage_rcuidle(dev, rpmflags);
> 
> It looks like you could move the trace event before the atomic variable check.
> 
> The ordering between the two doesn't matter, because usage_count may
> change between the check and the trace event anyway.
> 
> But then what is the trace event useful for in the first place?
>
Thanks for explanation, I've changed my mind, it seems that we should not
trace the counter because we don't know who the operator is due to race condition.
> > +               if (non_zero)
> >                         return 0;
> > -               }
> >         }
> >
> >         might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);
> > @@ -1038,10 +1039,12 @@ int __pm_runtime_suspend(struct device *dev, int rpmflags)
> >         int retval;
> >
> >         if (rpmflags & RPM_GET_PUT) {
> > -               if (!atomic_dec_and_test(&dev->power.usage_count)) {
> > -                       trace_rpm_usage_rcuidle(dev, rpmflags);
> > +               bool non_zero = !atomic_dec_and_test(&dev->power.usage_count);
> > +
> > +               trace_rpm_usage_rcuidle(dev, rpmflags);
> 
> And the same comments apply here.
> 
> > +               if (non_zero)
> >                         return 0;
> > -               }
> > +
> >         }
> >
> >         might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);
> > @@ -1073,8 +1076,10 @@ int __pm_runtime_resume(struct device *dev, int rpmflags)
> >         might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe &&
> >                         dev->power.runtime_status != RPM_ACTIVE);
> >
> > -       if (rpmflags & RPM_GET_PUT)
> > +       if (rpmflags & RPM_GET_PUT) {
> >                 atomic_inc(&dev->power.usage_count);
> 
> So the reason why things like that don't work is because the atomic
> variable can change again between the inc and the trace event.
> 
Got it.
> > +               trace_rpm_usage_rcuidle(dev, rpmflags);
> > +       }
> >
> >         spin_lock_irqsave(&dev->power.lock, flags);
> >         retval = rpm_resume(dev, rpmflags);
> > @@ -1433,6 +1438,7 @@ void pm_runtime_forbid(struct device *dev)
> >
> >         dev->power.runtime_auto = false;
> >         atomic_inc(&dev->power.usage_count);
> 
> Analogously here.
> 
> > +       trace_rpm_usage_rcuidle(dev, 0);
> >         rpm_resume(dev, 0);
> >
> >   out:
> > @@ -1448,16 +1454,17 @@ EXPORT_SYMBOL_GPL(pm_runtime_forbid);
> >   */
> >  void pm_runtime_allow(struct device *dev)
> >  {
> > +       bool is_zero;
> > +
> >         spin_lock_irq(&dev->power.lock);
> >         if (dev->power.runtime_auto)
> >                 goto out;
> >
> >         dev->power.runtime_auto = true;
> > -       if (atomic_dec_and_test(&dev->power.usage_count))
> > +       is_zero = atomic_dec_and_test(&dev->power.usage_count);
> > +       trace_rpm_usage_rcuidle(dev, RPM_AUTO | RPM_ASYNC);
> > +       if (is_zero)
> >                 rpm_idle(dev, RPM_AUTO | RPM_ASYNC);
> > -       else
> > -               trace_rpm_usage_rcuidle(dev, RPM_AUTO | RPM_ASYNC);
> 
> The change of ordering is pointless for the reasons outlined above.
> 
> And so on.
> 
> > -
> >   out:
> >         spin_unlock_irq(&dev->power.lock);
> >  }
> > @@ -1523,9 +1530,8 @@ static void update_autosuspend(struct device *dev, int old_delay, int old_use)
> >                 /* If it used to be allowed then prevent it. */
> >                 if (!old_use || old_delay >= 0) {
> >                         atomic_inc(&dev->power.usage_count);
> > -                       rpm_resume(dev, 0);
> > -               } else {
> >                         trace_rpm_usage_rcuidle(dev, 0);
> > +                       rpm_resume(dev, 0);
> >                 }
> >         }
> >
> > @@ -1533,8 +1539,10 @@ static void update_autosuspend(struct device *dev, int old_delay, int old_use)
> >         else {
> >
> >                 /* If it used to be prevented then allow it. */
> > -               if (old_use && old_delay < 0)
> > +               if (old_use && old_delay < 0) {
> >                         atomic_dec(&dev->power.usage_count);
> > +                       trace_rpm_usage_rcuidle(dev, 0);
> > +               }
> >
> >                 /* Maybe we can autosuspend now. */
> >                 rpm_idle(dev, RPM_AUTO);
> > @@ -1741,12 +1749,14 @@ void pm_runtime_drop_link(struct device *dev)
> >  void pm_runtime_get_noresume(struct device *dev)
> >  {
> >         atomic_inc(&dev->power.usage_count);
> > +       trace_rpm_usage_rcuidle(dev, 0);
> >  }
> 
> This actually kind of makes sense, as a matter of tracing the
> pm_runtime_get_noresume() usage, but not as a matter of tracing the
> atomic variable value.
> 
Okay, I'll re-iterate the code and re-think about it on how to
track the runtime process more robustly.

Thanks,
Chenyu
> >  EXPORT_SYMBOL_GPL(pm_runtime_get_noresume);
> >
> >  void pm_runtime_put_noidle(struct device *dev)
> >  {
> >         atomic_add_unless(&dev->power.usage_count, -1, 0);
> > +       trace_rpm_usage_rcuidle(dev, 0);
> >  }
> >  EXPORT_SYMBOL_GPL(pm_runtime_put_noidle);
> >
> > --