linux-kernel - Re: [PATCH 2/2][RESEND v3] PM-runtime: change the tracepoints to cover all usage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0j9Jb5c7AQT611Zjep+jx+iTqKB8-KbPCKb8SCXMqEiJQ@mail.gmail.com>
Date:   Wed, 15 Jul 2020 17:47:36 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Chen Yu <yu.c.chen@...el.com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Len Brown <len.brown@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Michal Miroslaw <mirq-linux@...e.qmqm.pl>,
        Zhang Rui <rui.zhang@...el.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2][RESEND v3] PM-runtime: change the tracepoints to
 cover all usage_count

On Wed, Jul 15, 2020 at 8:26 AM Chen Yu <yu.c.chen@...el.com> wrote:
>
> Commit d229290689ae ("PM-runtime: add tracepoints for usage_count changes")
> has added some tracepoints to monitor the change of runtime usage, and
> there is something to improve:
> 1. There are some places that adjust the usage count not
>    been traced yet. For example, pm_runtime_get_noresume() and
>    pm_runtime_put_noidle()
> 2. The change of the usage count will not be tracked if decreased
>    from 1 to 0.
>
> This patch intends to adjust the logic to be consistent with the
> change of usage_counter, that is to say, only after the counter has
> been possibly modified, we record it. Besides, all usage changes will
> be shown using rpm_usage even if included by other trace points.
> And these changes has helped track down the e1000e runtime issue.
>
> Reviewed-by: Michał Mirosław <mirq-linux@...e.qmqm.pl>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> ---
>  drivers/base/power/runtime.c | 38 +++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 85a248e196ca..5789d2624513 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1004,10 +1004,11 @@ int __pm_runtime_idle(struct device *dev, int rpmflags)
>         int retval;
>
>         if (rpmflags & RPM_GET_PUT) {
> -               if (!atomic_dec_and_test(&dev->power.usage_count)) {
> -                       trace_rpm_usage_rcuidle(dev, rpmflags);
> +               bool non_zero = !atomic_dec_and_test(&dev->power.usage_count);
> +
> +               trace_rpm_usage_rcuidle(dev, rpmflags);

It looks like you could move the trace event before the atomic variable check.

The ordering between the two doesn't matter, because usage_count may
change between the check and the trace event anyway.

But then what is the trace event useful for in the first place?

> +               if (non_zero)
>                         return 0;
> -               }
>         }
>
>         might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);
> @@ -1038,10 +1039,12 @@ int __pm_runtime_suspend(struct device *dev, int rpmflags)
>         int retval;
>
>         if (rpmflags & RPM_GET_PUT) {
> -               if (!atomic_dec_and_test(&dev->power.usage_count)) {
> -                       trace_rpm_usage_rcuidle(dev, rpmflags);
> +               bool non_zero = !atomic_dec_and_test(&dev->power.usage_count);
> +
> +               trace_rpm_usage_rcuidle(dev, rpmflags);

And the same comments apply here.

> +               if (non_zero)
>                         return 0;
> -               }
> +
>         }
>
>         might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);
> @@ -1073,8 +1076,10 @@ int __pm_runtime_resume(struct device *dev, int rpmflags)
>         might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe &&
>                         dev->power.runtime_status != RPM_ACTIVE);
>
> -       if (rpmflags & RPM_GET_PUT)
> +       if (rpmflags & RPM_GET_PUT) {
>                 atomic_inc(&dev->power.usage_count);

So the reason why things like that don't work is because the atomic
variable can change again between the inc and the trace event.

> +               trace_rpm_usage_rcuidle(dev, rpmflags);
> +       }
>
>         spin_lock_irqsave(&dev->power.lock, flags);
>         retval = rpm_resume(dev, rpmflags);
> @@ -1433,6 +1438,7 @@ void pm_runtime_forbid(struct device *dev)
>
>         dev->power.runtime_auto = false;
>         atomic_inc(&dev->power.usage_count);

Analogously here.

> +       trace_rpm_usage_rcuidle(dev, 0);
>         rpm_resume(dev, 0);
>
>   out:
> @@ -1448,16 +1454,17 @@ EXPORT_SYMBOL_GPL(pm_runtime_forbid);
>   */
>  void pm_runtime_allow(struct device *dev)
>  {
> +       bool is_zero;
> +
>         spin_lock_irq(&dev->power.lock);
>         if (dev->power.runtime_auto)
>                 goto out;
>
>         dev->power.runtime_auto = true;
> -       if (atomic_dec_and_test(&dev->power.usage_count))
> +       is_zero = atomic_dec_and_test(&dev->power.usage_count);
> +       trace_rpm_usage_rcuidle(dev, RPM_AUTO | RPM_ASYNC);
> +       if (is_zero)
>                 rpm_idle(dev, RPM_AUTO | RPM_ASYNC);
> -       else
> -               trace_rpm_usage_rcuidle(dev, RPM_AUTO | RPM_ASYNC);

The change of ordering is pointless for the reasons outlined above.

And so on.

> -
>   out:
>         spin_unlock_irq(&dev->power.lock);
>  }
> @@ -1523,9 +1530,8 @@ static void update_autosuspend(struct device *dev, int old_delay, int old_use)
>                 /* If it used to be allowed then prevent it. */
>                 if (!old_use || old_delay >= 0) {
>                         atomic_inc(&dev->power.usage_count);
> -                       rpm_resume(dev, 0);
> -               } else {
>                         trace_rpm_usage_rcuidle(dev, 0);
> +                       rpm_resume(dev, 0);
>                 }
>         }
>
> @@ -1533,8 +1539,10 @@ static void update_autosuspend(struct device *dev, int old_delay, int old_use)
>         else {
>
>                 /* If it used to be prevented then allow it. */
> -               if (old_use && old_delay < 0)
> +               if (old_use && old_delay < 0) {
>                         atomic_dec(&dev->power.usage_count);
> +                       trace_rpm_usage_rcuidle(dev, 0);
> +               }
>
>                 /* Maybe we can autosuspend now. */
>                 rpm_idle(dev, RPM_AUTO);
> @@ -1741,12 +1749,14 @@ void pm_runtime_drop_link(struct device *dev)
>  void pm_runtime_get_noresume(struct device *dev)
>  {
>         atomic_inc(&dev->power.usage_count);
> +       trace_rpm_usage_rcuidle(dev, 0);
>  }

This actually kind of makes sense, as a matter of tracing the
pm_runtime_get_noresume() usage, but not as a matter of tracing the
atomic variable value.

>  EXPORT_SYMBOL_GPL(pm_runtime_get_noresume);
>
>  void pm_runtime_put_noidle(struct device *dev)
>  {
>         atomic_add_unless(&dev->power.usage_count, -1, 0);
> +       trace_rpm_usage_rcuidle(dev, 0);
>  }
>  EXPORT_SYMBOL_GPL(pm_runtime_put_noidle);
>
> --